Revision 1.0, 07/22/2003
Questions & Comments : Please mail Prabhat Hegde
Supporting Indic Scripts in Mozilla
Introduction
This is a two-part document that
outlines Indic support in Mozilla. It covers the issues
involved in non-BiDi CTL support in Mozilla on Unix platforms. Although
the design and implementation are cross-platform, non-Unix platforms
have not been considered for implementation.
Complex Text Script Support (CTL)
The presentation of Indian language
scripts requires contextual processing for display and editing. This
output technology is called Complex Text Layout (CTL). CTL features
enable providers (such as OS/X-Windows/CDE-Motif, Gnome/Gtk/Pango
based applications, or cross-platform applications such as Mozilla or
Star/OpenOffice) to support writing systems that require a complex set
of transformations between the logical (stored) and
physical/visual (displayed or printed) representations of text data.
CTL support also defines the behaviour of character
combinations and shaping, text edit operations and, if needed,
component orientation.
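To make the logical/visual distinction concrete, here is a minimal Python sketch (not Mozilla code) of one such transformation: in Devanagari, the vowel sign I (U+093F) is stored after its consonant but drawn to its left. The function name to_visual_order and the single-cluster input are assumptions made for illustration; a real shaper also handles conjuncts, reph and font-specific glyph selection.

```python
# Illustrative only: reorder one Devanagari cluster from logical (stored)
# order to visual (display) order. Only the pre-base vowel sign I is handled.

VOWEL_SIGN_I = "\u093F"  # DEVANAGARI VOWEL SIGN I, drawn left of its base


def to_visual_order(cluster: str) -> str:
    """Move the pre-base vowel sign, if present, to the front of the cluster."""
    if VOWEL_SIGN_I in cluster:
        rest = cluster.replace(VOWEL_SIGN_I, "", 1)
        return VOWEL_SIGN_I + rest
    return cluster


logical = "\u0915\u093F"           # KA + VOWEL SIGN I, as stored
visual = to_visual_order(logical)  # VOWEL SIGN I + KA, as drawn
print([f"U+{ord(c):04X}" for c in visual])
```

The stored text is never modified; the reordered string exists only on the way to the font, which is why edit operations must map screen positions back to logical offsets.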
Mozilla's non-Bidi CTL support currently provides for:
- Context Sensitive Shaping and Rendering on Unix platforms using
non-intelligent (non-OT) fonts encoded in "sun.unicode.india-0"
encoding. A set of free fonts in this encoding is available for
download at:
http://developer.sun.com/techtopics/global/index.html
- Support for edit operations including Cursor positioning and
Selection.
- Printing through XPrint.
- The three features above are available for the Core-X and XFT2
backends.
- Currently Hindi and Tamil (TISCII) are supported, but it is
fairly easy to extend support to other Indic scripts using the same
architecture.
Future enhancements will
provide the same feature set using OpenType fonts, which are the
future direction for Indic script fonts.
Mozilla also supports BiDi CTL; more information is available
at:
http://www.langbox.com/bidimozilla
Indic Script Support Issues in the Browser (*nix only)
Input Method support in Mozilla is
handled by the underlying platform. This leaves the following issues:
- Output/Rendering support, which includes:
  - Font Handling
  - Printing
  - CTL issues that cover:
    - Character-Clustering
    - Context-Sensitive Shaping
    - Joining
    - Split Glyphs
    - Character-Reordering
    - Edit Operations
- Data Exchange, including handling ISCII-91 data and Mail
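As an illustration of the "Split Glyphs" issue listed above, the following Python sketch (not Mozilla code) uses the Tamil vowel sign O (U+0BCA). It is stored as one character after the consonant, but is drawn as two pieces, one on each side of the base. The function name split_glyph_order is hypothetical, and relying on Unicode canonical decomposition to find the two pieces is an assumption of this sketch, not the method used by the shapers described here.

```python
import unicodedata

# Illustrative only: compute the display order for a base consonant followed
# by a two-part vowel sign. One-part signs are returned unchanged.


def split_glyph_order(base: str, vowel_sign: str) -> str:
    """Place the two halves of a split vowel sign around the base."""
    parts = unicodedata.normalize("NFD", vowel_sign)
    if len(parts) == 2:
        left, right = parts
        return left + base + right
    return base + vowel_sign


# TAMIL LETTER KA + TAMIL VOWEL SIGN O
display = split_glyph_order("\u0B95", "\u0BCA")
print([f"U+{ord(c):04X}" for c in display])
```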
Font handling is an issue for Indic
scripts because fonts lack standardization (repertoire, layout and
encoding) and there is no uniform typographic framework across
Windows, Unix and Unix variants. Similarly, data exchange (item 2
above) is an issue since ISCII-91 and its variants are not registered
MIME charsets with IANA. However, the scarcity of localized
applications, and of the legacy data that would come with them, means
that this is less of an issue.
Design goals for Indic Support in Mozilla (*nix Only)
The original design and
implementation goals were:
- Provide non-BiDi CTL script support for languages such as Thai
and Hindi using Core-X11 fonts on *nix platforms.
- Do not regress existing builds.
- Leverage existing code
- Provide pluggable interfaces to support additional scripts.
- Localize code-changes as much as possible.
Design approach
The design approach to solving Indic CTL script issues is fairly
localized, since these scripts affect layout less than BiDi CTL
scripts do. The approach involves:
- In the I18n (Encoder) layer
  - Identify groups of input characters that form a logical
    character.
  - Convert ISCII<->Unicode UCS-2 for ISCII support.
- In the Layout layer
  - Identify cluster boundaries, i.e. groups of Unicode chars that
    form a logical unit of display.
  - During painting/measuring/sizing of text, do not split chars
    in a way that loses cluster-boundary information.
  - While performing edit operations such as cursor positioning,
    do not lose cluster-boundary information, i.e. do not split by
    chars.
- In the Graphics (gfx) layer
  - Identify cluster boundaries, i.e. groups of Unicode chars that
    form a logical unit of display.
  - Always perform a glyph-generation/context-sensitive shaping
    operation before performing a measuring/drawing operation.
  - Perform glyph generation.
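The cluster-boundary step that both the Layout and gfx layers depend on can be sketched as follows. This is a deliberately simplified Python illustration, not the rule set used by the actual shapers: it assumes that combining marks attach to the preceding character and that a letter directly following a Devanagari virama (U+094D) stays in the same cluster. The function name cluster_boundaries is hypothetical.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA, joins consonants into a conjunct


def cluster_boundaries(text: str) -> list:
    """Return the start offset of every cluster, followed by len(text)."""
    boundaries = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        is_mark = category in ("Mn", "Mc")  # vowel signs, virama, etc.
        joins_previous = i > 0 and text[i - 1] == VIRAMA and category == "Lo"
        if i == 0 or not (is_mark or joins_previous):
            boundaries.append(i)
    boundaries.append(len(text))
    return boundaries


# HA + VOWEL SIGN I, then NA + VIRAMA + DA + VOWEL SIGN II
word = "\u0939\u093F\u0928\u094D\u0926\u0940"
bounds = cluster_boundaries(word)
clusters = [word[a:b] for a, b in zip(bounds, bounds[1:])]
print(bounds, len(clusters))
```

Measuring, painting and caret placement would then operate on whole entries of clusters rather than on individual characters.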
Implementation approach
The design goals described above mapped to the following implementation
approach.
- Provide a compile-time switch to enable Indic builds, so that the
support does not need to be turned on by default.
- Leverage the Core-X11 shaper API from Pango (www.pango.org), since
it is simple and cross-platform in nature.
- Leverage nsIUnicodeEncoder & gfx.
- Create a light-weight CTL API to handle the clustering used for edit
operations.
- Use pango.modules to add additional scripts.
- Support no more than two non-OT fonts.
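The pluggable-shaper idea behind the list above can be pictured as a small registry that maps a Unicode block to a shaping routine. The Python sketch below is only an analogy for what a module list such as pango.modules makes possible; the names register_shaper and find_shaper and the placeholder shaper are hypothetical and are not part of Pango or Mozilla. The block ranges are the standard Unicode blocks for each script.

```python
from typing import Callable, List, Optional, Tuple

Shaper = Callable[[str], str]

# Each entry: (first code point, last code point, shaper name, shaper function)
_SHAPERS: List[Tuple[int, int, str, Shaper]] = []


def register_shaper(first: int, last: int, name: str, shaper: Shaper) -> None:
    """Add a shaper responsible for the code point range [first, last]."""
    _SHAPERS.append((first, last, name, shaper))


def find_shaper(ch: str) -> Optional[Tuple[str, Shaper]]:
    """Return (name, shaper) for the script that owns `ch`, or None."""
    cp = ord(ch)
    for first, last, name, shaper in _SHAPERS:
        if first <= cp <= last:
            return name, shaper
    return None


def placeholder_shaper(cluster: str) -> str:
    """Stand-in for real glyph generation; returns its input unchanged."""
    return cluster


register_shaper(0x0900, 0x097F, "hindi", placeholder_shaper)  # Devanagari
register_shaper(0x0E00, 0x0E7F, "thai", placeholder_shaper)   # Thai
# Supporting another script is one more registration, with no core changes:
register_shaper(0x0B80, 0x0BFF, "tamil", placeholder_shaper)  # Tamil
```

Text outside every registered range gets no shaper, which corresponds to falling through to the ordinary non-CTL drawing path.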
In addition, Jungshik Shin re-used the
shapers to extend CTL functionality to the XFT backend.
The implementation approach in the font layer will vary depending on
whether support needs to be extended to OpenType fonts. My
recommendation for supporting OpenType fonts is to re-use the
light-weight APIs created for the original implementation, in
combination with ICU and Mozilla XFT2. The effort involved
in doing so is fairly trivial, since ICU already supports Indic scripts
and ICU layout can be built separately from the parent ICU. Specifically,
it would involve at least the following tasks in Mozilla:
- Create an ICU-layout XPCOM component.
- Access OpenType Tables using (or adding to) Mozilla XFT API.
- Obtain Cluster-Boundary information from OpenType fonts using
ICU+(adding/extending)Mozilla XFT.
- Create additional shapers that use ICU layout for
Glyph-Generation.
- Modify gfx to call the shapers.
Code that accesses the tables in the
font (TrueType/OpenType/...) and rasterizes the outlines is already
present in Mozilla-XFT. Code also exists in Mozilla to move the raster
to the X server.
Implementation Details
Implementation details for the Mozilla CTL code that covers Thai and
Indic scripts are as follows:
- Building:
Use --enable-CTL switch.
Safe with Core-X11 and with --enable-default-toolkit=gtk2
- Complete set of newly created sources
Search LXR using the SUNCTL macro
- External Interfaces
components/libctl.so
pango.modules
libmozpango.so
libmozpango-thaix.so
libmozpango-dvngx.so
.. .. ..
.. .. ..
- Lightweight CTL API
There are additional APIs that need to be checked in as part of
bugzilla ID
http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/nsULE.h
- Adding fonts - Reference the following:
http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/charsetData.properties#161
http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/charsetTitles.properties#112
- Creating and adding CTLized Encoders - Reference
http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/nsUnicodeToSunIndic.cpp
http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/nsUnicodeToThaiTTF.cpp
http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/nsCtlLEModule.cpp
- Creating and adding CTLized Shapers
http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/pangoLite/pango.modules
http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/thaiShaper/
http://lxr.mozilla.org/seamonkey/source/intl/ctl/src/hindiShaper/
- Encoder Changes
None at the moment since encoders in nsCTLLEModule register themselves.
- Layout Changes
Interfaces such as PeekOffset,
GetFocusOffset and GetPointFromOffset will be among those
affected in order to handle edit operations. All interfaces that deal
with obtaining text buffer offsets from a screen position are expected
to be affected. Currently the changes are localized to the
#ifdef SUNCTL portions of nsTextFrame.cpp, but this is expected to
spread to more interfaces in the following:
- Graphics Changes
All entry points to the drawing and measurement APIs are expected to be
affected.
nsFontMetricsGTK.cpp, fontEncoding.properties -- XFT Support
nsFontMetricsXlib.cpp -- Core X & XPrint
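The effect of the layout changes on edit operations can be summarized with a short Python sketch: once cluster boundaries are known, caret movement snaps to them instead of stepping one character at a time. This is an illustration under stated assumptions, not the logic in nsTextFrame.cpp; the function names and the precomputed boundaries list are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List


def next_caret_offset(boundaries: List[int], offset: int) -> int:
    """Return the first boundary after `offset`, clamped to the last one."""
    i = bisect_right(boundaries, offset)
    return boundaries[i] if i < len(boundaries) else boundaries[-1]


def prev_caret_offset(boundaries: List[int], offset: int) -> int:
    """Return the last boundary before `offset`, clamped to the first one."""
    i = bisect_left(boundaries, offset) - 1
    return boundaries[i] if i >= 0 else boundaries[0]


# Sorted boundaries for a 6-character buffer holding two clusters,
# covering offsets [0, 2) and [2, 6).
boundaries = [0, 2, 6]
print(next_caret_offset(boundaries, 2), prev_caret_offset(boundaries, 6))
```

An offset that falls inside a cluster, for example one derived from a mouse click, is moved to a neighbouring boundary by the same two functions, so a selection can never cut a cluster in half.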
Current Status
Bugs
Open Issues
References
Downloads
Contacts