Project acronym: QUESTION-HOW
Project Full Title: Quality Engineering Solutions via Tools, Information and Outreach for the New Highly-enriched Offerings from W3C: Evolving the Web in Europe
Project/Contract No. IST-2000-28767
Workpackage 1, Deliverable D1.5
Project Manager: Daniel Dardailler <danield@w3.org>
Author of this document: Vincent Quint <quint@w3.org>
Created: 28 August 2002. Last updated: 28 August 2002.
This document is a summary of the achievements of the project "Enhancement of the Amaya Web browser/editor for Internationalization and XML".
The goal of the project was to develop Amaya, W3C's Web browser and editor, in two directions simultaneously. One development track is Internationalization, the other is XMLization. The motivation is to allow Web users to create and display pages in many different languages, including those using non-Latin scripts. This will broaden the use of Amaya to new geographic areas. While the coverage increases, the technologies implemented in Amaya are also extended to support and promote the latest developments in XML technologies.
Amaya is a Web client that acts both as a browser and as an authoring tool, the two features being seamlessly integrated. Web pages are edited in WYSIWYG mode, i.e. the user interacts with a formatted document, as in most word processors. The editor maintains a structured representation of the document (a DOM tree). Every user command is first performed on this tree, and the part of the tree that has been modified is reformatted and redisplayed. Several views of each document may be open simultaneously, to provide a more complete representation of the document being edited. In addition to the formatted view, Amaya can display the source code and/or the DOM tree. All views can be edited. The formatted view and the tree view always reflect the current status of the document, while the source view is refreshed on request.
To allow users to really edit the Web, Amaya provides direct access to remote Web sites through the HTTP 1.1 protocol, both for reading and writing Web pages (the HTTP Get and Put methods are used). Thus users can edit pages that are stored on remote servers exactly in the same way they work on local files.
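As an illustration of the read/write cycle described above, the following minimal sketch builds the two kinds of request with Python's standard library. It is not Amaya's code: the function names, the example URL and the Content-Type value are assumptions made for the example, and nothing is actually sent over the network.

```python
import urllib.request


def build_get(url: str) -> urllib.request.Request:
    """Request used to read a page from a remote server."""
    return urllib.request.Request(url, method="GET")


def build_put(url: str, markup: str, charset: str = "utf-8") -> urllib.request.Request:
    """Request used to write an edited page back to the same URL."""
    body = markup.encode(charset)
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        # Assumed media type for the example; a real client would use
        # the type of the document being saved.
        headers={"Content-Type": f"application/xhtml+xml; charset={charset}"},
    )


# Hypothetical URL, for illustration only.
read_request = build_get("http://example.org/page.html")
write_request = build_put("http://example.org/page.html", "<html>...</html>")
# urllib.request.urlopen(write_request) would send it; the server must accept PUT.
```

The same URL is used for both operations, which is what makes editing a remote page look like editing a local file.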
Several document formats are supported natively by Amaya: HTML, XHTML, MathML, and SVG. This allows authors to edit various types of Web pages, including scientific documents with lots of mathematical expressions, and structured graphics.
CSS style sheets are also supported. Not only are documents formatted according to their style sheets, but users can also create and edit style sheets in a WYSIWYG mode.
Amaya is available on several platforms: Windows (95, 98, NT, 2000, XP), Unix (Linux, Solaris, AIX, etc.) and Mac OS X. It is an Open Source endeavour.
Developments in the Internationalization area cover several aspects:
On-line help is provided to assist users while they are working. Help pages are available in French, English, German, and Spanish (not complete yet).
In addition to scripts, other aspects of languages are handled, such as hyphenation: words are hyphenated according to the language. Version 6.2 can hyphenate Dutch, English, Finnish, French, German, Italian, Portuguese, Spanish, and Swedish.
The editor is able to spell check document contents in English and French.
All these features are available on all supported platforms.
The international structure of W3C was a great help in these developments. The Team from Keio University has greatly contributed to the support of Japanese. This joint effort is now continuing, in order to introduce other Asian languages into Amaya.
At the specification level, Unicode is used by XML to allow XML files to contain text written in any script. This feature applies to all XML languages, including XHTML, MathML and SVG.
In Amaya, internationalization has been implemented in the spirit of XML. The common ground provided by XML to these document formats has been adapted to Unicode and, as a consequence, all formats (XHTML, MathML, SVG) have immediately taken advantage of Unicode.
An important part of the internationalization effort was dedicated to the support of Unicode. All documents handled by Amaya are now represented internally in UTF-16. This makes it possible to represent any document, even the most complex multilingual documents, in a uniform way. This also makes processing these documents simpler.
The initial version of Amaya supported only 8-bit characters. All internal structures and functions have been changed to handle multi-byte characters.
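To make the idea of a uniform internal representation concrete, here is a small sketch in Python (not Amaya's C code). The helper names `utf16_units` and `to_internal` are hypothetical; the sketch only shows that text arriving in different external encodings ends up as the same sequence of UTF-16 code units, and that characters outside the Basic Multilingual Plane take two units.

```python
from typing import List


def utf16_units(text: str) -> List[int]:
    """Return the UTF-16 code units of text (big-endian, no byte order mark)."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def to_internal(data: bytes, source_encoding: str) -> List[int]:
    """Decode bytes from an external encoding and re-express them as UTF-16 units."""
    return utf16_units(data.decode(source_encoding))


# The same word stored in two different 8-bit-oriented encodings.
latin1_bytes = b"caf\xe9"
utf8_bytes = "caf\u00e9".encode("utf-8")

# Both yield identical internal code units.
same = to_internal(latin1_bytes, "iso-8859-1") == to_internal(utf8_bytes, "utf-8")

# Latin, accented Latin, a CJK ideograph and a mathematical script letter
# (U+1D4B3, outside the BMP, hence a surrogate pair).
mixed = utf16_units("a\u00e9\u65e5\U0001d4b3")
```

Once every document is held in this single form, the rest of the processing no longer needs to know which encoding the file used.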
Although it uses UTF-16 internally, Amaya does not require documents to be encoded in UTF-16. It provides conversion from and to several other encodings:
To display all these characters, the support of fonts has been considerably extended. In particular, to get access to a large variety of characters in many different styles and sizes, TrueType fonts have been introduced. This also makes it possible to use the same fonts on different platforms and makes font management easier.
Handling several writing directions in the same block of text has required the layout process to be redesigned. The main change was to implement the Unicode bidi algorithm. This makes it possible to set characters properly within a block of text such as a paragraph with several chunks of text in different writing directions (for instance English, Arabic and Hebrew).
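The starting point of the bidi algorithm is the directional class that Unicode assigns to each character. The sketch below, written in Python with the standard `unicodedata` module, splits a string into left-to-right and right-to-left runs. It is a deliberate simplification and not the full Unicode bidi algorithm: neutral characters such as spaces are simply attached to the preceding run, and numbers, embeddings and mirroring are ignored. The function names are hypothetical.

```python
import unicodedata
from typing import List, Optional, Tuple


def strong_direction(ch: str) -> Optional[str]:
    """Return 'ltr' or 'rtl' for strongly directional characters, else None."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "ltr"
    if bidi_class in ("R", "AL"):  # Hebrew-like and Arabic-like letters
        return "rtl"
    return None  # neutral or weak character


def directional_runs(text: str, base: str = "ltr") -> List[Tuple[str, str]]:
    """Split text into runs of one direction (neutrals join the preceding run)."""
    runs: List[List[str]] = []
    current = base
    for ch in text:
        current = strong_direction(ch) or current
        if runs and runs[-1][0] == current:
            runs[-1][1] += ch
        else:
            runs.append([current, ch])
    return [(direction, chunk) for direction, chunk in runs]


# English text, three Hebrew letters (alef, bet, gimel), English text.
runs = directional_runs("abc \u05d0\u05d1\u05d2 def")
```

A layout engine would then place each right-to-left run with its characters in reverse visual order, while keeping the stored (logical) order unchanged.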
More work was required to align blocks correctly in a page according to the writing direction. With this new approach to page formatting, complex structures such as tables or nested lists can now be displayed equally well in Arabic, Hebrew or Latin documents.
The above-mentioned extensions were required to display multilingual documents, but to also allow editing, some more changes have been made. Cursor movements now follow the logical order of characters. For instance, when the cursor is at the end of a chunk of text in English followed by some text in Arabic, moving to the next character actually puts the cursor at the other end of the Arabic text.
Various input methods have to be supported to allow users to enter characters from different scripts. These methods are usually provided by the operating system, but Amaya was extended to allow the user to select among the methods available. This feature has been used to enter Japanese characters.
The internationalization effort concerns two aspects of HTTP 1.1:
%hh
Multi-lingual XHTML pages can be displayed and edited freely. In fact, with the extensions mentioned previously for the support of Unicode, this was quite easy to achieve. But XHTML has a few internationalization features that have been implemented:
xml:lang is used to control hyphenation and spell checking.
dir and bdo are used to control the bidi algorithm.
The CSS style sheet language can be used to specify the style of any document, whatever the format used ((X)HTML, XML, SVG, MathML). Like XHTML, CSS has a few internationalization features. In particular, the properties direction and unicode-bidi have been implemented to control the bidi algorithm.
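The xml:lang and dir attributes mentioned above for XHTML apply to an element and to everything inside it, unless a descendant overrides them. A processor that wants to choose a hyphenation dictionary or a writing direction for a given piece of text therefore has to resolve the inherited value. The following Python sketch shows that resolution on a small XHTML fragment; the function name and the sample document are assumptions made for the example, and Amaya's own implementation is not shown here.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined and always bound to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_settings(element, lang="", direction="ltr"):
    """Yield (local name, language, direction) for each element, with inheritance."""
    lang = element.get(XML_LANG, lang)
    direction = element.get("dir", direction)
    yield element.tag.split("}")[-1], lang, direction
    for child in element:
        yield from effective_settings(child, lang, direction)


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">'
    "<body><p>Hello</p>"
    '<p xml:lang="he" dir="rtl">\u05e9\u05dc\u05d5\u05dd <em>x</em></p>'
    "</body></html>"
)
settings = list(effective_settings(doc))
```

In this fragment the em element carries no attribute of its own, yet it is resolved as Hebrew and right-to-left because of its parent paragraph.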
The SVG graphics format provides full support for internationalized text. A large part of the implementation was realized by the general support of Unicode, but there are also some SVG-specific features that had to be implemented in addition:
The switch element and the systemLanguage attribute allow several alternatives of some text to be presented in different languages, according to user preferences. This is an interesting provision for adaptable graphics.
The direction and unicode-bidi attributes, similar to the corresponding CSS properties, make it possible to control the bidi algorithm.
Unicode provides MathML with a very wide variety of characters and symbols. Again, the basic Unicode support in Amaya makes it possible to handle all these characters, but more effort was required to cope with the special needs of mathematical expressions. While most platforms offer a number of fonts for displaying usual scripts (Latin, Greek, Arabic, etc.), they are very limited regarding mathematical symbols. To address this issue, the support of the ESSTIX fonts was added to Amaya.
ESSTIX offers a consistent set of 17 fonts that are freely available in several formats, including TrueType. With these fonts, Amaya can display almost every mathematical expression represented in MathML. The support of these fonts required mapping tables to be created, as the original encoding was not Unicode.
Annotea is an application built on top of Amaya to allow users to create and share annotations attached to any part of any Web page.
Annotea was extended to fully support UTF-8 in all the RDF data that describes an annotation. Users can now annotate any kind of document, regardless of its encoding. The name of the annotation author is also stored in UTF-8.
Amaya natively supported some XML languages (XHTML, MathML, SVG), but it did not know what to do with documents using other XML languages: it just displayed their source code and allowed the user to edit the code as plain text. The goal of the XML activity in Amaya was to add support for generic XML, i.e. to allow users to see any XML document properly formatted and to edit it in the same way they edit XHTML pages.
The first step in supporting generic XML was to revisit the whole sequence of document processing from downloading to publishing, and to adapt each step to generic XML documents.
Generic XML documents can be loaded either locally or remotely (through HTTP), like any other document. They are then parsed by the same parser (Expat), but parsing does not check elements and attributes; it only makes sure that the document is well-formed (in the XML sense). It simply builds a DOM tree that matches the source file. A formatted representation of the document is then created, relying only on the DOM tree and the CSS style sheets attached to the document. This is different from documents in the natively supported formats: Amaya knows how to format them and style sheets are used only to make changes to the default layout and style. For generic XML documents, if there is no style sheet, Amaya uses its own heuristics to format the document as a sequence of blocks. This provides a very simple layout, but it is enough to read and edit the document. Two other views can be displayed: one showing the DOM tree, the other showing the source code. The three views are handled in the same way as for XHTML, MathML or SVG documents. They can be used to edit the document. Finally, generic XML documents can be saved locally or remotely, like any other document.
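The parsing step described above can be sketched with the Python binding of Expat (`xml.parsers.expat`). This is only an illustration under stated assumptions: Amaya uses Expat from C and builds its own tree structure, whereas the nested dictionaries and the function names below are hypothetical. The point is that the parser accepts any element and attribute names and rejects only documents that are not well-formed.

```python
import xml.parsers.expat


def parse_generic_xml(source: str) -> dict:
    """Build a simple tree from any well-formed XML; raise ExpatError otherwise."""
    holder = {"name": None, "attrs": {}, "children": []}
    stack = [holder]

    def start(name, attrs):
        # No check against a DTD or schema: every element is accepted.
        node = {"name": name, "attrs": attrs, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)

    def end(name):
        stack.pop()

    def text(data):
        if data.strip():
            stack[-1]["children"].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver contiguous text in one piece
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text
    parser.Parse(source, True)
    return holder["children"][0]


def is_well_formed(source: str) -> bool:
    """Return True if Expat accepts the document, False otherwise."""
    try:
        parse_generic_xml(source)
        return True
    except xml.parsers.expat.ExpatError:
        return False


tree = parse_generic_xml('<memo priority="high"><to>Ann</to><body>Hi</body></memo>')
```

A formatter could then walk such a tree and lay out each element as a block when no style sheet is attached.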
XML namespaces allow several XML languages to be used simultaneously in a single document. This feature was already available in the initial version of Amaya, which allowed MathML expressions or SVG graphics to be included within XHTML documents. With the support of generic XML, this was extended to any XML language. Also, the natively supported languages were offered more possibilities to mix together. It is now possible to include MathML expressions or fragments of XHTML text within SVG graphics, even when graphics are themselves included within XHTML pages.
Another possibility is to have islands of MathML, SVG, or XHTML in generic XML documents. Those islands are then processed with the full semantics (formatting, editing) of their language.
Finally, a generic XML document can use several namespaces, some of them being known (XHTML, SVG, MathML), others being unknown (generic XML). The tree view shows clearly where namespaces change in the DOM tree.
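Locating the points where the namespace changes is a simple tree walk. The sketch below uses Python's `xml.etree.ElementTree`, which stores each element name as `{namespace-uri}local-name`. The sample document, its `http://example.org/report` namespace and the function names are hypothetical; the three known namespace URIs are those of XHTML, SVG and MathML.

```python
import xml.etree.ElementTree as ET

KNOWN_LANGUAGES = {
    "http://www.w3.org/1999/xhtml": "XHTML",
    "http://www.w3.org/2000/svg": "SVG",
    "http://www.w3.org/1998/Math/MathML": "MathML",
}


def split_tag(tag: str):
    """Split an ElementTree tag into (namespace URI, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def namespace_changes(element, parent_uri=None):
    """Yield (local name, language) for each element that enters a new namespace."""
    uri, local = split_tag(element.tag)
    if uri != parent_uri:
        yield local, KNOWN_LANGUAGES.get(uri, "generic XML")
    for child in element:
        yield from namespace_changes(child, uri)


doc = ET.fromstring(
    '<report xmlns="http://example.org/report"'
    ' xmlns:h="http://www.w3.org/1999/xhtml"'
    ' xmlns:m="http://www.w3.org/1998/Math/MathML">'
    "<section><h:p>text</h:p><m:math><m:mi>x</m:mi></m:math></section>"
    "</report>"
)
changes = list(namespace_changes(doc))
```

Each reported element is the root of an island that can be handed to the processing appropriate for its language.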
Most editors for XML documents handle only source code and use the document DTD or schema to constrain the user. In Amaya a different approach is taken. Although the source code is available and can be edited, the emphasis is put on WYSIWYG editing. The user interacts mainly with the formatted representation of the document. Also, the notion of well-formedness introduced by XML is used to allow the user to manipulate the document freely, without any DTD or schema.
Editing several documents simultaneously is a basic feature of Amaya. This is especially useful for copying some parts of a local or remote document into another document. This also works well for generic XML. Given the multi-namespace feature, it is easy to copy or move pieces of a document into another while preserving their structure.
The current version of Amaya allows users to perform a number of operations on the formatted view. They can edit the content of any generic XML document, either directly or through a search/replace command. They can edit attributes (create or remove attributes, change their value), but the creation of new elements is currently limited and needs further work. Obviously, any modification can be done using the source view and such changes can be reflected in the other views at the user's request, but that was not the main goal of these developments.
All editing commands for generic XML documents are built on top of the same basic commands used for other types of documents. Therefore they benefit from the same advantages, which include an unlimited undo/redo mechanism.
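One common way to obtain unlimited undo/redo when all commands go through the same basic operations is to record, for each operation, a pair of actions that applies and reverts it. The Python sketch below illustrates this general technique only; it is not a description of Amaya's internal mechanism, and the `History` class and `set_attribute` helper are hypothetical.

```python
class History:
    """Unbounded undo/redo stacks of (do, undo) action pairs."""

    def __init__(self):
        self._undo = []
        self._redo = []

    def apply(self, do, undo):
        do()
        self._undo.append((do, undo))
        self._redo.clear()  # a new edit invalidates the redo stack

    def undo(self):
        if self._undo:
            do, undo = self._undo.pop()
            undo()
            self._redo.append((do, undo))

    def redo(self):
        if self._redo:
            do, undo = self._redo.pop()
            do()
            self._undo.append((do, undo))


_MISSING = object()


def set_attribute(history, attrs, name, value):
    """Basic command: set an attribute, remembering how to restore the old state."""
    old = attrs.get(name, _MISSING)

    def do():
        attrs[name] = value

    def undo():
        if old is _MISSING:
            attrs.pop(name, None)
        else:
            attrs[name] = old

    history.apply(do, undo)


history = History()
attributes = {}
set_attribute(history, attributes, "lang", "fr")
set_attribute(history, attributes, "lang", "de")
```

Because every higher-level command is expressed through such basic commands, it becomes undoable without extra work.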
When implementing XML, it makes sense to also implement some closely related technologies. For instance, XML relies on XLink to add hyperlinking semantics to structured documents. XLink was already implemented in Amaya for handling links in SVG and MathML, and to relate documents and annotations in Annotea. This implementation was extended to allow generic XML documents to use it. It is then possible to create new links in any XML document and to use them as easily as in XHTML pages.
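Because XLink is expressed with attributes in a dedicated namespace, links can be recognized in a document whose element names are otherwise unknown. The following Python sketch collects the elements that carry an xlink:href attribute. It assumes, for simplicity, that every such element is a simple link; extended links and the other XLink attributes are ignored, and the sample document and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
XLINK_HREF = "{" + XLINK_NS + "}href"


def simple_links(root):
    """Return (local element name, target URI) for each element with xlink:href."""
    links = []
    for element in root.iter():
        href = element.get(XLINK_HREF)
        if href is not None:
            links.append((element.tag.split("}")[-1], href))
    return links


doc = ET.fromstring(
    '<catalog xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<item xlink:type="simple" xlink:href="http://example.org/a.xml">A</item>'
    "<item>B</item>"
    "</catalog>"
)
links = simple_links(doc)
```

Only the first item is reported as a link, although both items use the same, unknown element name.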
CSS is another technology used in conjunction with XML. As explained above, it is used to format generic XML documents, but it would be very useful to be able to create or update CSS style sheets while editing an XML document. Work is in progress to allow that, by reusing the CSS editing feature which is already available for XHTML, MathML and SVG. This will be especially useful to improve the layout of documents that come without any CSS style sheet.
None, work is done.