

Developing additional input methods

Thessalonica’s input methods (also called layouts) are described in standard OpenOffice.org XML registry files (with the .xcu extension). Information from these files is merged into the OpenOffice.org registry each time you install the package with the unopkg utility (or the OpenOffice.org graphical extension manager), so it can easily be retrieved with the standard registry access methods defined in the OpenOffice.org API. To extend Thessalonica with additional input methods, prepare your own .xcu file similar to those already available (they are stored in the ‘InputMethods’ subdirectory), package it together with Thessalonica and reinstall the package. If you are not familiar with the .xcu format, refer to the OpenOffice.org developer’s guide.

Each input method definition should follow the rules described in the InputMethods.xcs schema definition file. Some of these rules are discussed below.

Thessalonica’s input method description format

Since Thessalonica’s input method descriptions are standard .xcu files, they should have the oor:component-data root node with the following attributes:

xmlns:oor="http://openoffice.org/2001/registry" 
xmlns:xs="http://www.w3.org/2001/XMLSchema" 
oor:name="InputMethods" 
oor:package="org.openoffice.comp.thessalonica"

The oor:name and oor:package attributes mean that the information from this file is merged into the OpenOffice.org registry file called InputMethods.xcu and is accessible under the path org.openoffice.comp.thessalonica.InputMethods/.

The oor:component-data node has only one child, called Root, which may have several children. Each of these children describes a separate input method. It is recommended to put the description of each input method into its own file, although OpenOffice.org will merge them all together.
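
As a minimal sketch of the structure just described, a file holding a single input method might look as follows. The node name MyLayout is a placeholder, and the oor:op="replace" attribute on it is an assumption carried over from the rule nodes in Thessalonica's own files; check InputMethods.xcs and the bundled layouts for the exact form.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<oor:component-data xmlns:oor="http://openoffice.org/2001/registry"
                    xmlns:xs="http://www.w3.org/2001/XMLSchema"
                    oor:name="InputMethods"
                    oor:package="org.openoffice.comp.thessalonica">
   <node oor:name="Root">
      <!-- One child node per input method; "MyLayout" is a hypothetical name -->
      <node oor:name="MyLayout" oor:op="replace">
         <!-- properties and child nodes of the input method go here -->
      </node>
   </node>
</oor:component-data>
```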

Each node describing an input method should have several properties and child nodes, listed below:

string Title

This property specifies a displayable name for this input method, used in the GUI.

string ImageID

This property specifies an image which should be displayed on the Thessalonica toolbar when this input method is active (this feature does not work in OpenOffice.org 2.1, however). Note that the value is not a direct URL pointing to your image, but an identifier corresponding to an entry in the Images section of the Addons.xcu file (this file resides in the root directory of the Thessalonica package and describes the various GUI elements installed by Thessalonica). So if you have created a new image file for your input method, you have to edit both your input method description file and Addons.xcu. You can also associate your input method with one of the predefined image files (their identifiers have the form comp.thessalonica.*Image, where * stands for a script name). It is not necessary to create a separate image for each input method: it is normal practice for several input methods designed for one particular script (e. g. Greek) to share the same image.

boolean ComplexScript

This property specifies whether this input method is designed for a “complex” script (right-to-left scripts, such as Hebrew or Arabic, are considered complex). It is necessary because OpenOffice.org uses separate sets of formatting attributes for Western, complex and Asian scripts, so Thessalonica has to know whether to set “Western” or “complex” formatting before switching to each particular input method.

(node) Format

This node contains a list of formatting properties (namely Family, Bold, Italic, Size and Language) which should be used by default when your input method is enabled (note that these properties may correspond either to “Western” or to “complex” formatting, depending on the value of the ComplexScript flag). In most cases you will assign the default "(No changes)" value to the Bold, Italic and Size properties, but it is important to select correct values for the font Family and Language, although users can always change these settings via the Keyboard Manager dialog. You should remember that:

(node) Rules

This node is the most important one: it should contain a set of rules which will be applied to the keyboard input. The syntax of such rules is described below.
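
Putting the properties and child nodes listed above together, an input method node might be sketched as follows. This is an illustration only: the name MyLayout, the title and the image identifier (chosen to fit the comp.thessalonica.*Image pattern) are assumptions, and the contents of Format are left out because their exact types are defined in InputMethods.xcs.

```xml
<node oor:name="MyLayout" oor:op="replace">
   <prop oor:name="Title" oor:type="xs:string">
      <value>My layout</value>
   </prop>
   <prop oor:name="ImageID" oor:type="xs:string">
      <!-- Assumed identifier; it must match an entry in the Images section of Addons.xcu -->
      <value>comp.thessalonica.GreekImage</value>
   </prop>
   <prop oor:name="ComplexScript" oor:type="xs:boolean">
      <!-- false: use the "Western" set of formatting attributes -->
      <value>false</value>
   </prop>
   <node oor:name="Format">
      <!-- Family, Bold, Italic, Size and Language; see InputMethods.xcs for their types -->
   </node>
   <node oor:name="Rules">
      <!-- one child node per intercepted keyboard character -->
   </node>
</node>
```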

Each node representing a rule corresponds to a single character produced by the keyboard and intercepted by Thessalonica. Since rules are accessed by name, each rule should have a unique name identifying the Unicode character it corresponds to, so it was natural to base the naming scheme for these rules on the Adobe Glyph List (AGL). However, since it would be difficult to put all glyph names defined in the AGL into a relatively small program, the naming convention was slightly simplified: you can use standard meaningful names only for ASCII characters, while all other characters are accessed by their hexadecimal Unicode code points preceded by “uni” (e. g. “uni0410”, “uni1F00”).
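
To illustrate the two naming forms, here is a hypothetical pair of rules (not taken from any bundled layout). The first is named after an ASCII character using its glyph name, the second after a non-ASCII character using the “uni” form. The String and Comment properties used here are described in the list that follows.

```xml
<!-- Rule for the ASCII character "a", addressed by its glyph name -->
<node oor:name="a" oor:op="replace">
   <prop oor:name="String" oor:type="xs:string">
      <value>α</value>
   </prop>
   <prop oor:name="Comment" oor:type="xs:string">
      <value>GREEK SMALL LETTER ALPHA</value>
   </prop>
</node>
<!-- Rule for U+0410 (CYRILLIC CAPITAL LETTER A), addressed as "uni" plus the hexadecimal code -->
<node oor:name="uni0410" oor:op="replace">
   <prop oor:name="String" oor:type="xs:string">
      <value>Α</value>
   </prop>
   <prop oor:name="Comment" oor:type="xs:string">
      <value>GREEK CAPITAL LETTER ALPHA</value>
   </prop>
</node>
```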

Note that there is currently no limitation that would require designing your input method in such a way that it works on top of the standard US English (or any other Latin-based) keyboard, since issue 22546, referenced in previous versions of this manual, was resolved in OpenOffice.org 2.0. Thus any Unicode character can be correctly intercepted by OpenOffice.org and handled by Thessalonica.

Besides its unique name, each rule has the following properties and child nodes:

string String

This is a Unicode string which should be inserted into your document instead of the keyboard character which was intercepted.

string Comment

Nothing more than a comment, but, like any other comment, it should be used to make the whole XML file more legible. If the String property contains a single Unicode character, it is recommended to put the canonical name of that character into Comment.

(node) After

This is again a set of rules. However, each rule in this block corresponds to a Unicode character in your document after which pressing this key should have a special meaning.

So when the user hits a key, Thessalonica first checks whether there is a rule defined for the character produced by this key. If such a rule exists and has an After block, Thessalonica also checks which character precedes the insertion point. If the After block contains a rule for this character, the character is replaced with the String specified in that rule. Otherwise, the String defined in the top-level rule is inserted. Note that each rule in an After block may have its own set of After rules, and so on, so that your input method may consider a string of any length preceding the insertion point before selecting the text to be inserted.

To show how an After set works, let’s take the following example from the definition of the GreekBabel input method:

<node oor:name="quotesingle" oor:op="replace">
   <prop oor:name="String" oor:type="xs:string">
      <value>´</value>
   </prop>
   <prop oor:name="Comment" oor:type="xs:string">
      <value>GREEK OXIA</value>
   </prop>
   <node oor:name="After">
      <node oor:name="uni1FBF" oor:op="replace">
         <prop oor:name="String" oor:type="xs:string">
            <value>῎</value>
         </prop>
         <prop oor:name="Comment" oor:type="xs:string">
            <value>GREEK PSILI AND OXIA</value>
         </prop>
      </node>
      <node oor:name="uni1FFE" oor:op="replace">
         <prop oor:name="String" oor:type="xs:string">
            <value>῞</value>
         </prop>
         <prop oor:name="Comment" oor:type="xs:string">
            <value>GREEK DASIA AND OXIA</value>
         </prop>
      </node>
      <node oor:name="uni00A8" oor:op="replace">
         <prop oor:name="String" oor:type="xs:string">
            <value>΅</value>
         </prop>
         <prop oor:name="Comment" oor:type="xs:string">
            <value>GREEK DIERESIS AND OXIA</value>
         </prop>
      </node>
   </node>
</node>

Here pressing the ['] key after Greek psili, dasia or dieresis deletes the previous character and replaces it with the combination of breathing or dieresis with Greek acute (oxia). Otherwise, the Greek acute itself is inserted into the document.

