Internationalization and Localization of XML: Introducing "ITS"
WARNING!
- THIS WILL BE A BORING PRESENTATION!
- IF YOU EXPECT GROUNDBREAKING TECHNOLOGY, LEAVE THE ROOM!
Promise
- If you expect unexciting but useful new combinations of existing technology, please stay!
Overview
- Background
- Users and Usages of ITS
- Basic Concepts of ITS
- Overview of ITS "Data Categories"
- Development of the ITS Specification
i18n and l10n
- ITS: "data categories" and their implementation as markup for internationalization and localization purposes
- Internationalization (i18n): Make your XML ready for worldwide use!
- Localization (l10n): Adapt your XML to specific audiences!
Targets of ITS
- i18n example target: a schema (XML DTD, XML Schema, RELAX NG) with an @its:translate attribute
<!ELEMENT p ...>
<!ATTLIST p its:translate (yes|no) #IMPLIED>
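For comparison, here is a minimal sketch of the same declaration in RELAX NG (XML syntax). The `<text/>` content is only a placeholder for the content model that the DTD line above elides.

```xml
<element name="p"
    xmlns="http://relaxng.org/ns/structure/1.0"
    xmlns:its="http://www.w3.org/2005/11/its">
  <!-- Optional ITS translatability attribute, like #IMPLIED in the DTD -->
  <optional>
    <attribute name="its:translate">
      <choice>
        <value>yes</value>
        <value>no</value>
      </choice>
    </attribute>
  </optional>
  <!-- Placeholder for the real content model of p -->
  <text/>
</element>
```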
Targets of ITS
- l10n example question: which <string> elements need translation?
<resources>
<section id="Homepage">
[...]
<keyvalue_pairs>
<string>Page</string>
<string>ABC Corporation - Policy Repository</string>
<string>Footer_Last</string>
<string>Pages</string>
<string>bgColor</string>
<string>NavajoWhite</string>
<string>title</string>
<string>List of Available Policies</string>
</keyvalue_pairs>
</section>
</resources>
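One possible answer, as a sketch. It assumes that in each pair the first <string> is a key and the second a value, and that color values such as "NavajoWhite" must not be translated. Because the last matching rule wins, the general rule comes first. The rules can be kept apart from the document, so the resource file itself does not change.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <!-- Assumption: odd-positioned strings are keys; keys are not translated -->
  <its:translateRule translate="no" selector="//keyvalue_pairs/string"/>
  <!-- Even-positioned strings are values: translate them -->
  <its:translateRule translate="yes"
      selector="//keyvalue_pairs/string[position() mod 2 = 0]"/>
  <!-- Exception: the value of the "bgColor" key is a color name -->
  <its:translateRule translate="no"
      selector="//keyvalue_pairs/string[preceding-sibling::string[1] = 'bgColor']"/>
</its:rules>
```

Under these assumptions, only "ABC Corporation - Policy Repository", "Pages" and "List of Available Policies" are marked for translation.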
User: Schema Developers
- Schema developers using the "translatability" data category: add the @its:translate attribute to your schema
- or: ...
User: Schema Developers
- ... specify separate rules for translatability (no change of schema or document):
<its:rules its:version="1.0" ...>
<its:ns prefix="myns" uri="http://www.example.com/myschema"/>
<its:translateRule translate="yes" selector="//myns:p"/>
<!-- All p elements should be translated -->
</its:rules>
Users: Content Producers
- Content producers and architects using the "translatability" data category: use the @its:translate attribute in your document
<book its:version="1.0" ...>
<head>...</head>
<body>
<p>And he said: you need a new
<quote its:translate="no">T-Model</quote>
</p>
</body>
</book>
Users: Content Producers
- ... specify ITS rules separately
<text>
<head>
<its:rules xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
<its:translateRule translate="no" selector="//dt"/>
</its:rules>
</head>
<body> ...
<p> ... <dl><dt>...</dt><dd>...</dd></dl></p>
</body>
</text>
Basic Concepts of ITS: Summary
- ITS "data categories"
- Selection of nodes locally (example: @its:translate attribute)
- Selection of nodes globally (example: <its:translateRule> element)
- Adding information to selected nodes / pointing to existing information
- Inheritance / precedence definitions for selected nodes
Notion of "Data Categories"
- "Data categories" defined conceptually in prose:
The data category translatability expresses information about
whether the content of an element or attribute should be
translated or not. The values of this data category are "yes" (translatable)
or "no" (not translatable).
- ... implemented in various schema languages
- ... implemented globally (e.g. <its:translateRule> element) or
- ... implemented locally (e.g. @its:translate attribute)
Selection of Nodes
- Selection local: mainly via attributes.
- Uses data-category-specific defaults for inheritance
- Example: "Translatability information pertains to elements, not attributes."
<article xmlns="http://docbook.org/ns/docbook"
... its:version="1.0"
its:translate="yes">
<info id="a001">
<title>An example article</title>
<author its:translate="no">
<personname>
<firstname>John</firstname>
<surname>Doe</surname>
</personname>
<address>...</address>
</author>
</info>
</article>
Selection of Nodes
<topic id="myTopic" xml:lang="en-us"
xmlns="myvocabulary.com">
<title>Using ITS</title>
<prolog>
<its:rules ... its:version="1.0">
<its:ns prefix="my" uri="myvocabulary.com"/>
<its:translateRule selector="//my:term" translate="no"/>
</its:rules>
</prolog>
<body>
<p>An <term>ITS namespace</term> definition exists...</p>
</body>
</topic>
- No (or minimal) impact on documents / schemas
- Data categories can pertain to more than one node
- ... and to attributes as well
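Here is a sketch of a global rule that selects an attribute rather than an element. The my:img element and its @alt attribute are hypothetical and do not appear in the topic example above.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <its:ns prefix="my" uri="myvocabulary.com"/>
  <!-- Attributes are not translatable by default; opt this one in -->
  <its:translateRule selector="//my:img/@alt" translate="yes"/>
</its:rules>
```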
So again: what is selection?
- Selection local: similar to the CSS @style attribute
- Selection global: similar to the CSS <style> element
- Differences: data-category-specific defaults, and XPath instead of CSS selectors
Adding ITS information vs. Pointing
- Translatability: adding information ("translate yes" or "translate no") to nodes in documents
- Other data categories: adding and / or pointing to existing information
- Example: adding localization information via the <locInfoRule> element with a <locInfo> child:
<its:rules its:version="1.0">
<its:locInfoRule locInfoType="alert" selector="//body/p[1]">
<its:locInfo>This p element has to be handled carefully</its:locInfo>
</its:locInfoRule>
</its:rules>
Adding ITS information vs. Pointing
- ... pointing to existing localization information via the <locInfoRule> element with the @locInfoPointer attribute:
<its:rules its:version="1.0" ...>
<its:locInfoRule locInfoType="alert"
locInfoPointer="@locn-alert" selector="//*"/>
<its:locInfoRule locInfoType="description"
locInfoPointer="@locn-note" selector="//*"/>
</its:rules>
- Pointing to existing information = saying "this value is a value of an ITS data category"
- Useful for specifying ITS "semantics" of values, e.g. for authoring tools ("highlight all ITS localization information")
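Here is a document fragment these two rules could apply to, as a sketch. The doc and para elements and the attribute values are invented for illustration. No ITS markup appears in the document itself; the rules give the existing attributes their ITS meaning.

```xml
<doc>
  <!-- @locn-alert is read as ITS localization information of type "alert" -->
  <para locn-alert="Do not translate the product name.">...</para>
  <!-- @locn-note is read as ITS localization information of type "description" -->
  <para locn-note="Keep this sentence short.">...</para>
</doc>
```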
Precedence between Selections
- Example: local selection has precedence over global selection
- Conflicts between rule elements (e.g. <translateRule>): the last rule element wins
<text>
<its:rules its:version="1.0">
<its:translateRule translate="yes" selector="//p"/>
<its:translateRule translate="no" selector="//p[@transinfo='no-trans']"/>
</its:rules>
<body>
<p its:translate="no"> ... <dl><dt>...</dt><dd>...</dd></dl></p>
</body>
</text>
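The precedence rules can be traced on a few variants of the body above. This is a sketch: each comment gives the translatability that follows from the two rules and any local attribute.

```xml
<body>
  <!-- Matched only by the first rule: translate="yes" -->
  <p> ... </p>
  <!-- Matched by both rules; the last rule wins: translate="no" -->
  <p transinfo="no-trans"> ... </p>
  <!-- Local its:translate beats the first rule: "no"; dl, dt, dd inherit "no" -->
  <p its:translate="no"> ... <dl><dt>...</dt><dd>...</dd></dl></p>
  <!-- Local its:translate beats both rules: translate="yes" -->
  <p transinfo="no-trans" its:translate="yes"> ... </p>
</body>
```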
Overview of ITS Data Categories
- Translatability (previous slides)
- Localization Information (previous slides)
- Terminology
- Directionality
- Ruby
- Language Information
- Elements within text
Terminology
- Used to mark up terms, to increase terminological consistency
- Optional reference to further information about the term: @termInfoRef attribute
- Available globally (by pointing or adding) or locally
<its:rules its:version="1.0">
<its:termRule selector="//body/p[1]/span"
termInfoRef="http://example.com/termdatabase/#x142539"/>
</its:rules>
<its:rules its:version="1.0">
<its:termRule selector="//body/p[1]/span"
termInfoRefPointer="@myTermRef"/>
</its:rules>
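Here is a document these term rules could apply to, as a sketch with an invented sentence. With the second rule, the reference into the term database comes from the existing @myTermRef attribute and is not written into the rule itself.

```xml
<text>
  <body>
    <!-- //body/p[1]/span selects this span as a term -->
    <p>Every document needs a
      <span myTermRef="http://example.com/termdatabase/#x142539">namespace</span>
      declaration.</p>
  </body>
</text>
```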
Directionality
- Expresses the directionality of a piece of text
- Available globally (only by adding) or locally
<its:rules its:version="1.0">
<its:dirRule dir="rtl" selector="//body/p[1]/quote[@xml:lang='he']"/>
<!-- Some Hebrew quotation -->
</its:rules>
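Here is the local counterpart, as a sketch. It assumes two things: the local attribute is named its:dir and takes the same values as @dir on <its:dirRule>, and the its prefix is declared on the root element. The "..." stands for the Hebrew text.

```xml
<p>As the saying goes:
  <!-- Assumed local attribute: right-to-left directionality -->
  <quote xml:lang="he" its:dir="rtl">...</quote>
</p>
```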
Ruby
- Provides a short annotation of the associated base text, e.g. for pronunciation
- Available globally (by pointing or adding) or locally
<text its:version="1.0">
<head> ... </head>
<body>
<p>This is about the
<its:ruby>
<its:rb>W3C</its:rb>
<its:rt>World Wide Web Consortium</its:rt>
</its:ruby>
</p>
</body>
</text>
Language Information
- Expresses that a given piece of content (selected by the langPointer attribute) carries language information as defined by RFC 3066bis
- Available only globally, and only by pointing
<its:rules its:version="1.0">
<its:langRule selector="//p" langPointer="@mylangattribute"/>
</its:rules>
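Here is a document the rule above could apply to, as a sketch with invented content. The value of @mylangattribute on each p is read as ITS language information, that is, as a language tag in the sense of RFC 3066bis, just like an xml:lang value.

```xml
<text>
  <body>
    <!-- ITS language information: German -->
    <p mylangattribute="de">Guten Tag!</p>
    <!-- ITS language information: French -->
    <p mylangattribute="fr">Bonjour !</p>
  </body>
</text>
```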
Elements Within Text
- Adds information globally (for segmentation purposes) about how elements should affect the flow of the content
<its:rules its:version="1.0">
<its:withinTextRule withinText="yes" selector="//b | //em | //i"/>
<its:withinTextRule withinText="no" selector="//p"/>
<its:withinTextRule withinText="nested" selector="//p/footnote/p"/>
</its:rules>
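- @withinText="yes": the element is part of the text flow of its parent element
- @withinText="no": the element splits the text flow of its parent; its content is a separate flow
- @withinText="nested": the element is part of its parent's text flow, but its content is an independent flow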
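Here is a sketch of how the three rules above would classify a small invented document. The em and b elements match the first rule ("yes") and stay in their paragraph's flow. The comments cover the p elements. The footnote element matches no rule, so it keeps the data category's default.

```xml
<text>
  <body>
    <!-- Matches //p: withinText="no", a text flow of its own -->
    <p>This is <em>very</em> important.<footnote>
        <!-- Matches //p and //p/footnote/p; the later rule wins: "nested" -->
        <p>See also the <b>appendix</b>.</p>
      </footnote></p>
  </body>
</text>
```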
Basis: The ODD Language
- ITS specification and schemas: developed in the ODD ("One Document Does it all") language
- ODD document: Markup Declarations and Documentation (literate programming style)
- Markup Declarations: (RELAX NG based) content models, "macros", hierarchical class system, element "modules"
- From ODD document: Generation of schemas (XML DTD, XML Schema, RELAX NG) and documentation
Benefit for ITS
- ODD eases potential localization of ITS specification
- Changes to markup declarations are reflected in both the text and the schemas
- Generation of schema modules eases task of integrating ITS into schemas
- Consistency checking becomes much easier: no hand-crafted schemas!
ODD Elements
- Schemas: <schemaSpec>
- Elements: <elementSpec>
- Classes: <classSpec>
- Attribute lists: <attList>
- Attribute definitions: <attDef>
ODD Example: Element Declaration
<elementSpec ident="rules" ns="http://www.w3.org/2005/11/its">
<desc>Container for global rules.</desc>
<classes>
<memberOf key="att.xlink"/>
</classes>
<content>
<rng:group>
<rng:zeroOrMore>
<rng:ref name="ns"/>
</rng:zeroOrMore>
<rng:zeroOrMore>
<rng:choice>
<rng:ref name="translateRule"/>
<rng:ref name="locInfoRule"/>
<rng:ref name="termRule"/>
<rng:ref name="dirRule"/> ...
</rng:choice>
</rng:zeroOrMore>
</rng:group>
</content> ...
</elementSpec>
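ODD declares attributes with <attList> and <attDef>, which are listed above. Here is a minimal sketch for the translateRule element. It assumes the same rng prefix binding as in the example above, TEI-style <valList> for the closed value list, and empty element content. It is not the actual ITS source.

```xml
<elementSpec ident="translateRule" ns="http://www.w3.org/2005/11/its">
  <desc>Global rule for the translatability data category.</desc>
  <content>
    <rng:empty/>
  </content>
  <attList>
    <!-- XPath expression selecting the nodes the rule applies to -->
    <attDef ident="selector" usage="req">
      <datatype>
        <rng:text/>
      </datatype>
    </attDef>
    <!-- Closed list of values: yes | no -->
    <attDef ident="translate" usage="req">
      <valList type="closed">
        <valItem ident="yes"/>
        <valItem ident="no"/>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```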
Transforming ODD
- Documentation extraction in HTML, XSL-FO, LaTeX
- Generation of RELAX NG schemas and XML DTDs
- Via trang: XML Schema generation
Questions to the Audience
- Will you integrate ITS markup in your schema?
- Will you implement ITS in your editor (e.g. highlighting of translatable text)?
- Will you process ITS markup (e.g. term extraction)?
- If you answered "no" to any question, why?
- What data categories are important / not important for you?