Internationalization Tag Set (ITS)

1 Introduction

This section is informative.

This document defines data categories and their implementation as a schema that can be used with new and existing schemas to support the internationalization and localization of schemas and documents. An implementation is provided for three schema languages: XML DTD [XML 1.0], XML Schema [XML Schema] and RELAX NG [RELAX NG]. The document provides examples of how ITS can be used with existing vocabularies.

Requirements for internationalization and localization related to markup are formulated in [ITS REQ]. Not all of these requirements are addressed in this document.

The Working Group will cover some of the requirements in a separate document on techniques for internationalization and localization of schemas and XML instances.

1.1 Background: Motivation for ITS

Content or software that is authored in one language (i.e. source language) is often made available in additional languages. This is done through a process called localization, where the original material is translated and adapted to the target audience.

From the viewpoints of feasibility, cost, and efficiency, it is important that the original material should be suitable for localization. This is achieved by appropriate design and development, and the corresponding process is referred to as internationalization. For a detailed explanation of the terms "localization" and "internationalization", see [l10n i18n].

The increasing usage of XML as a medium for documentation-related content (e.g. DocBook [DocBook], a format for writing structured documentation, well suited to computer hardware and software manuals) and software-related content (e.g. the eXtensible User Interface Language [XUL]) creates challenges and opportunities in the domain of XML internationalization and localization.

The following examples sketch one of the issues that currently hinder efficient XML-related localization: the lack of a standard, declarative mechanism for identifying which parts of an XML instance need to be translated (the text in bold face shows the parts that need to be localized). Tools often cannot make this identification automatically.

Example 1: Document with partially localizable content

PhaseCode should not be translated; the title attribute sometimes has to be translated and sometimes must not be translated.

<Manual>
 <Info>
  <PhaseCode>Review Level</PhaseCode>
  <FormNo>8U81-GS-52C</FormNo>
  <Name>Owner's Manual</Name>
  ...
 </Info>
 <Section id="0" title="#Introduction#">
  <Ltitle id="005" title="#ZOOM#">
   <Mtitle id="00501" title="Getting started" option="no" cols="1">
    <MultiCol cols="1">
     <Text>Some text to localize</Text>
     ...
    </MultiCol>
   </Mtitle>
  </Ltitle>...
 </Section>
</Manual>

Example 2: Document with partially localizable content

The file name in the first component element should not be translated.

<dialogue xml:lang="en-gb">
 <rsrc id="123">
  <component id="456" type="image">
   <data type="text">images/cancel.gif</data>
   <data type="coordinates">12,20,50,14</data>
  </component>
  <component id="789" type="caption">
   <data type="text">Cancel</data>
   <data type="coordinates">12,34,50,14</data>
  </component>
 </rsrc>
</dialogue>

Example 3: Document with partially localizable content

In the example below, there is no clear mechanism for determining which string elements need to be translated.

<resources>
 <section id="Homepage">
  <arguments>
   <string>page</string>
   <string>childlist</string>
  </arguments>
  <variables>
   <string>POLICY</string>
   <string>Corporate Policy</string>
  </variables>
  <keyvalue_pairs>
   <string>Page</string>
   <string>ABC Corporation - Policy Repository</string>
   <string>Footer_Last</string>
   <string>Pages</string>
   <string>bgColor</string>
   <string>NavajoWhite</string>
   <string>title</string>
   <string>List of Available Policies</string>
  </keyvalue_pairs>
 </section>
</resources>

1.2 Out of Scope

The data categories and their implementation as a schema do not address document-external mechanisms or data formats for describing localization-relevant information over and above what is appropriate for inclusion in the format itself. Such mechanisms and data formats, also sometimes called XML Localization Properties, are out of the scope of this document. However, this document specifies a methodology for applying localization properties and information about internationalization and localization to various places in schemas and instance documents. See Section 4: Selection of ITS information.

1.3 Important Design Principles

Abstraction via data categories: ITS defines data categories as a description of information for internationalization and localization of XML schemas and documents. This description is independent of its implementation e.g. using an element or attribute. See Section 3.3: Data category for a definition of the term data categories, Section 5: Description of Data Categories for the definition of the various ITS data categories, and Section 7: Markup Declarations for the data category implementations.

Selection mechanisms, here exemplified by the translatability data category: Content authors need a simple way to express whether the content of an element or attribute should be translated or not, e.g. a translate attribute. On the other hand, for translations of large document sets based on the same schema, a specification of defaults for translatability and exceptions from the defaults is important (e.g. all p elements should be translated, but not p elements inside of an index element). This specification responds to these requirements by introducing mechanisms for specifying ITS information in XML documents or schemas, see Section 4: Selection of ITS information. This method also provides a means for specifying ITS information for attributes (a task for which no standard means yet exists). The ITS mechanisms for selection are:

usable for both XML schemas and XML instances
usable locally (at the XML node to which the information pertains) or globally (not at the XML node to which the information pertains)

Extensibility: It may be useful or necessary to extend the set of information available for internationalization or localization purposes beyond what is provided by ITS. This specification does not define a general extension mechanism, since ordinary XML mechanisms (e.g. XML Namespaces [XML Names]) may be used.

Ease of integration:

ITS follows the example from section 4 of [XLink 1.1], by providing mostly global attributes for the implementation of ITS data categories. Avoiding elements for ITS purposes as much as possible ensures ease of integration into existing markup schemes, see section 3.14 in [ITS REQ]. Only for some requirements do additional child elements have to be used, see for example Section 5.6: Ruby.
ITS has no dependency on technologies which are yet to be developed
ITS fits with existing work in the W3C architecture (e.g. use of XPath [XPath 1.0] as a selection mechanism)

1.4 Development of this Specification

This specification has been developed using the ODD (One Document Does it all) language of the Text Encoding Initiative ([TEI]). This is a literate programming language for writing XML schemas, with three characteristics:

The element and attribute set is specified using an XML vocabulary which includes support for macros (like DTD entities, or schema patterns), a hierarchical class system for attributes and elements, and creation of modules.
The content models for elements and attributes are written using embedded RELAX NG XML notation.
Documentation for elements, attributes, value lists etc is written inline, along with examples and other supporting material.

XSLT transforms are provided by the TEI to extract documentation in HTML, XSL FO or LaTeX forms, and to generate RELAX NG documents and DTDs. From the RELAX NG documents, James Clark's trang can be used to create XML Schema documents.

2 Basic Concepts

This section is informative.

Information (e.g. "translate this") captured by ITS markup (e.g. "its:translate='yes'") always pertains to one or more XML nodes (mainly element and attribute nodes). ITS markup explicitly or implicitly selects these XML node(s). ITS distinguishes two ways of selecting XML nodes: locally, and with global rules.

The mechanisms defined for ITS selection resemble those defined in [CSS2]. Element-specific ITS information can be compared to the style attribute in CSS, and ITS information in global rules is similar to the style element in CSS. In contrast to CSS, ITS uses XPath for identifying nodes.

the selection in an instance approach puts ITS markup in the relevant element of the host vocabulary (e.g. the author element in DocBook)
the rule-based approach puts the ITS markup in elements defined by ITS itself (i.e. the documentRule element)

ITS markup can be used with XML instances (e.g. a DocBook article), or schemas (e.g. an XSD for a proprietary document format). Since each usage defines some specific requirements, ITS markup in XML instances may look slightly different from ITS markup in schemas.

The following three examples sketch the distinction between the local and global approaches, and the difference between ITS in XML instances and schemas.

Example 4: ITS markup on elements in an XML instance

<article xmlns="http://docbook.org/ns/docbook"
 xmlns:its="http://www.w3.org/2005/11/its"
 its:translate="yes">
 <info>
  <title>An example article</title>
  <author its:translate="no">
   <personname>
    <firstname>John</firstname>
    <surname>Doe</surname>
   </personname>
   <affiliation>
    <address><email>foo@example.com</email></address>
   </affiliation>
  </author>
 </info>...
</article>

Example 5: ITS global markup in an XML instance

<dita:topic xmlns:dita="http://dita.oasis-open.org/architecture/2005/"
  xmlns:its="http://www.w3.org/2005/11/its"
  DITAArchVersion="1.0" 
  id="myTopic">
 <dita:title>ITS and Namespaces</dita:title>
 <its:documentRules>
   <its:ns its:prefix="dita" its:uri="http://dita.oasis-open.org/architecture/2005/"/>
   <its:documentRule its:translateSelector="//dita:term" its:translate="no" />
 </its:documentRules>
 <dita:body>
  <dita:p>An <dita:term>ITS namespace</dita:term> definition exists ....</dita:p>  
 </dita:body>
</dita:topic>

Example 6: ITS markup on elements in an XML Schema

<xs:schema>
 <xs:element name="term">
  <xs:annotation>
   <xs:appinfo>
    <its:schemaRule its:translate="no"/>
   </xs:appinfo>
  </xs:annotation> ...
 </xs:element> ...
</xs:schema>

The commonality in all of the examples above is the markup "its:translate='no'". This piece of ITS markup can be interpreted as follows:

it pertains to the data category translatability
the ITS data category attribute translate holds a value of "no"

The examples with global and local usage of ITS markup show that ITS data category attributes in some cases appear in elements defined by ITS itself: the documentRule element (embedded within a documentRules element), or the schemaRule element. It should come as no surprise that one difference between these two elements is where they are used:

documentRule : may appear in XML instances and schemas
schemaRule : may only appear in schemas

A less obvious, but important difference between documentRule and schemaRule is the following: in addition to one or more ITS data category attributes, documentRule contains a corresponding set of ITS selector attributes (in the example, translateSelector). As their name suggests, they select (or designate) one or more XML nodes (namely those to which a corresponding ITS data category attribute pertains). The values of ITS selector attributes are XPath absolute location paths. Information for the handling of namespaces in these path expressions is contained in the ITS element ns, which is a child of documentRules.

ITS selector attributes allow:

ITS data category attributes to appear in global rules (even outside of an XML instance or schema)
ITS data category attributes to pertain to sets of XML nodes (for example all p elements in an XML instance)
ITS markup to pertain to attributes
ITS markup to map to existing markup (for example the term element in DITA)

The power of ITS selector attributes comes at a price: rules related to overwriting/precedence, and inheritance, have to be established.

Example 7: Overwriting and Inheritance

<text>
 <head>
 <its:documentRules>
  <its:documentRule its:translate="yes" its:translateSelector="//p"/>
 </its:documentRules>
 </head>
 <body> ...
  <p its:translate="no"> ... <dl><dt>...</dt><dd>...</dd></dl></p>
 </body>
</text>

In this example, the ITS data category attribute translate appears twice: in a documentRule, and on a specific p element. Since the ITS selector attribute in the documentRule selects all p elements, the question arises which value of the translate data category applies to the p element that has local markup. ITS provides precedence and inheritance rules which answer questions like this. In the example, the value is "no" (that is, the content of the p element should not be translated).

3 Notation and Terminology

This section is normative.

3.1 Notation

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119].

The namespace URI that must be used by implementations of this specification is:

http://www.w3.org/2005/11/its

The namespace prefix used in this specification for this URI is "its". It is recommended that implementations of this specification use this prefix.

In addition, the following namespaces are used in this document:

http://www.w3.org/2001/XMLSchema for the XML Schema namespace, here used with the prefix "xs"
http://relaxng.org/ns/structure/1.0 for the RELAX NG namespace, here used with the prefix "rng"

3.2 Schema Language and Schema Annotation

[Definition: Schema language refers in this specification to XML DTD, XML Schema or RELAX NG.]

[Definition: Schema annotation is a schema language specific means to provide information about element, attribute, type etc. declarations. This information is not used by the schema processor, but for external, validation independent applications.]

3.3 Data category

[Definition: ITS defines data category as an abstract concept for a particular type of information for internationalization and localization of XML schemas and documents.] The concept of a data category is independent of its implementation in an XML environment (e.g. using an element or attribute).

For each data category, ITS distinguishes between the following:

the prose description, see Section 5: Description of Data Categories
schema language independent formalization, see Section 7: Markup Declarations
schema language specific implementations, see Appendix A: Schemas for ITS

Example 8: A data category and its implementation

The data category translatability conveys information as to whether a piece of content should be translated or not.

The simplest formalization of this prose description on a schema language independent level is a translate attribute with two possible values: "yes" and "no". An implementation on a schema language specific level would be the declaration of the translate attribute in e.g. an XML DTD, an XML Schema document or a RELAX NG document.

An alternative formalization on a schema language independent level is a schemaRule element which conveys with a translate attribute information about translatability. An implementation on a schema language specific level is the declaration of the schemaRule element.

3.4 Selection

[Definition: selection encompasses mechanisms to specify to what parts of an XML document or schema an ITS data category and its values should be applied.] Selection is discussed in detail in Section 4: Selection of ITS information.

4 Selection of ITS information

This section is normative.

4.1 Locations of Data Categories and Selection Mechanisms

Selections of ITS Information can appear in three places:

in a schema: ITS data categories are expressed as schema annotation, and the selection is the element or attribute declaration which is being annotated
global rules: the selection is realized as a selector attribute, which appears together with a data category attribute. The selector attribute contains an AbsoluteLocationPath as described in [XPath 1.0]
in an instance document: the selection is realized using a data category attribute, which is attached to the selected element node. There is no additional selector attribute. The default selection for each data category defines whether the selection covers attributes and child elements. See Section 5.1: Position and Default Selections of Data Categories.

The various selection mechanisms are defined in detail below.

4.1.1 Schema Annotation

In Schemas, selection of ITS information is realized with schema annotation. The selection for a data category depends on the position of the schema annotation. Since schema annotation mechanisms are schema language specific, the following definitions are made:

[Definition: selection of elements in XML Schema is expressed with an xs:appinfo element which is a direct child of the xs:element element and which contains a schemaRule element, which has one or more data category attributes.]

Example 9: Selection of elements in an XML Schema

<xs:element name="p">
 <xs:annotation>
  <xs:appinfo>
   <its:schemaRule its:translate="yes"/>
  </xs:appinfo>
 </xs:annotation> ...
</xs:element>

[Definition: selection of attributes in XML Schema is expressed with an xs:appinfo element which is a direct child of the xs:attribute element and which contains a schemaRule element, which has one or more data category attributes.]

Example 10: Selection of attributes in an XML Schema

<xs:attribute name="alt">
 <xs:annotation>
  <xs:appinfo>
   <its:schemaRule its:translate="yes"/>
  </xs:appinfo>
 </xs:annotation> ...
</xs:attribute>

[Definition: selection of elements in RELAX NG is expressed with a schemaRule element which is a direct child of the rng:element element, and which has one or more data category attributes.]

Example 11: Selection of elements in RELAX NG

<element name="p">
 <its:schemaRule its:translate="yes"/> ...
</element>

[Definition: selection of attributes in RELAX NG is expressed with a schemaRule element which is a direct child of the rng:attribute element, and which has one or more data category attributes.]

Example 12: Selection of attributes in RELAX NG

<attribute name="p">
 <its:schemaRule its:translate="yes"/> ...
</attribute>

As for XML DTD, this specification defines no selection mechanism within the DTD.

Note: To be able to select elements or attributes defined within an XML DTD, the mechanisms described in Section 4.1.2: Rule-based Selection can be used.

Several data categories for the same element or attribute declaration should be expressed on the same schemaRule element.

Example 13: Several data categories at the same element

<its:schemaRule its:translate="yes" its:locInfo="This has to be handled carefully"
its:locInfoType="alert"/>
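
The XML Schema annotation mechanism described above can be read back with a few lines of code. The following Python sketch is illustrative only and is not part of ITS: the sample schema, the function name schema_rules and the returned dictionary layout are assumptions made for this example. It follows the structure shown in Examples 9 and 10 (xs:annotation containing xs:appinfo containing schemaRule) and uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ITS = "http://www.w3.org/2005/11/its"

# Hypothetical sample schema, modelled on Examples 9 and 10.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:its="http://www.w3.org/2005/11/its">
 <xs:element name="p">
  <xs:annotation>
   <xs:appinfo>
    <its:schemaRule its:translate="yes"/>
   </xs:appinfo>
  </xs:annotation>
 </xs:element>
 <xs:attribute name="alt">
  <xs:annotation>
   <xs:appinfo>
    <its:schemaRule its:translate="no"/>
   </xs:appinfo>
  </xs:annotation>
 </xs:attribute>
</xs:schema>
"""


def schema_rules(schema_text):
    """Map (kind, name) of each annotated declaration to its ITS attributes."""
    root = ET.fromstring(schema_text)
    rule_path = f"{{{XS}}}annotation/{{{XS}}}appinfo/{{{ITS}}}schemaRule"
    its_prefix = f"{{{ITS}}}"
    found = {}
    for kind in ("element", "attribute"):
        for decl in root.iter(f"{{{XS}}}{kind}"):
            rule = decl.find(rule_path)
            if rule is None:
                continue  # declaration carries no ITS annotation
            # Keep only attributes in the ITS namespace, keyed by local name.
            found[(kind, decl.get("name"))] = {
                key[len(its_prefix):]: value
                for key, value in rule.attrib.items()
                if key.startswith(its_prefix)
            }
    return found


rules = schema_rules(SCHEMA)
print(rules)
```

Because schemaRule has no selector attribute, the selection is simply the declaration in which the annotation is found, which is why the sketch keys each result by the declaration's kind and name.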

4.1.2 Rule-based Selection

Rule-based selection is implemented using the documentRules element. It contains one or more documentRule elements. Each documentRule element has one or more data category attributes, and for each data category attribute a selector attribute which points to the selected information.

The naming convention for the selector attributes is data category + Selector, e.g. translateSelector. In rule-based selections, the value of the attribute must be an XPath expression which starts with "/", that is, it must be an AbsoluteLocationPath as described in [XPath 1.0]. This ensures that the selection is not relative to a specific location.

If namespaces [XML Names] are used in these XPath expressions, the following rules must be applied while processing XPath:

1. For each prefix, there must be an ns element as a child of the documentRules element. The ns element has two attributes: prefix (for the namespace prefix) and uri (for the namespace URI).
2. Element and attribute names without a prefix are interpreted as having no namespace.
3. To avoid a conflict with rule 2, default namespaces must not be used in the XPath expressions.

Example 14: XPath expressions with namespaces and without namespaces

The term element from the TEI is in a namespace http://www.tei-c.org/ns/1.0. The qterm element from DocBook is in no namespace.

<its:documentRules xmlns:its="http://www.w3.org/2005/11/its">
 <its:ns its:prefix="tei" its:uri="http://www.tei-c.org/ns/1.0"/>
 <its:documentRule its:translate="no" its:translateSelector="//tei:term"/>
 <its:documentRule its:translate="no" its:translateSelector="//qterm"/>
</its:documentRules>

Note: The usage of the ns element is motivated by [Schematron] and is compliant with the requirements on namespace bindings described in [Tag Namespace Finding].
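
To illustrate how the namespace rules interact with selector evaluation, the following Python sketch resolves prefixes only from ns elements and treats unprefixed names as being in no namespace. It is a rough illustration under several assumptions: the sample document and the function name select_untranslatable are invented for this example, the ns attributes are written as its:prefix and its:uri as in Example 14, and the standard library module xml.etree.ElementTree supports only a small XPath subset, so the sketch accepts only selectors beginning with "//".

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"

# Hypothetical sample document, modelled on Example 14.
DOC = """\
<text xmlns:its="http://www.w3.org/2005/11/its"
      xmlns:tei="http://www.tei-c.org/ns/1.0">
 <its:documentRules>
  <its:ns its:prefix="tei" its:uri="http://www.tei-c.org/ns/1.0"/>
  <its:documentRule its:translate="no" its:translateSelector="//tei:term"/>
  <its:documentRule its:translate="no" its:translateSelector="//qterm"/>
 </its:documentRules>
 <p>A <tei:term>namespace</tei:term> and a <qterm>plain term</qterm>.</p>
 <p>Ordinary text.</p>
</text>
"""


def its(name):
    """Return the expanded name of an ITS element or attribute."""
    return f"{{{ITS}}}{name}"


def select_untranslatable(doc_text):
    """Return the text of every element selected by a translate="no" rule."""
    root = ET.fromstring(doc_text)
    selected = []
    for rules in root.iter(its("documentRules")):
        # Prefix bindings come only from the ns children of documentRules,
        # never from xmlns declarations in the instance.
        bindings = {
            ns.get(its("prefix")): ns.get(its("uri"))
            for ns in rules.findall(its("ns"))
        }
        for rule in rules.findall(its("documentRule")):
            selector = rule.get(its("translateSelector"))
            if selector is None or rule.get(its("translate")) != "no":
                continue
            if not selector.startswith("//"):
                raise ValueError("this sketch only handles selectors starting with //")
            # ElementTree needs a relative path, so "//x" becomes ".//x".
            # Unprefixed names match elements in no namespace.
            for node in root.findall("." + selector, bindings):
                selected.append(node.text)
    return selected


untranslatable = select_untranslatable(DOC)
print(untranslatable)
```

A prefix that has no matching ns element causes ElementTree to raise an error, which mirrors the requirement that every prefix used in a selector must be declared in documentRules. A full implementation would need a complete XPath 1.0 engine.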

Selection can appear in a schema (e.g. as content of the xs:appinfo element), in an instance file or in a separate XML document. The precedence of the processing of the selection information depends on these variations. See also Section 4.2: Precedence between Selections.

Note: The difference between schemaRule and documentRule is that schemaRule has no selector attributes, e.g. no translateSelector attribute. The reason is that schemaRule always refers to the element or attribute declaration of which it is part. In contrast, documentRule can be used everywhere in a schema to express selection information. It is possible to use schemaRule and documentRule together in a schema.

Example 15: Example for using schemaRule and documentRules together in a schema.

<xs:schema>
 <xs:annotation>
  <xs:appinfo>
   <its:documentRules>
    <its:documentRule its:translate="no" its:translateSelector="//p[@editor='john']"/>
    <!-- This rule holds for p elements which are edited by John. -->
   </its:documentRules>
  </xs:appinfo>
 </xs:annotation>
 <xs:element name="p">
  <xs:annotation>
   <xs:appinfo>
    <its:schemaRule its:translate="yes"/>
    <!-- This rule holds for all p elements -->
   </xs:appinfo>
  </xs:annotation> ...
 </xs:element> ...
</xs:schema>

4.1.3 Selection in an Instance Document

In instance documents, selection of ITS information is realized only with data category attributes. What is selected depends on the data category. The necessary data category specific defaults are described in Section 5.1: Position and Default Selections of Data Categories.

Example 16: Defaults for various data categories

its:translate="no" at the head element means that the textual content of this element, including child elements but excluding attributes, should not be translated. its:translate="yes" at the body element means that the textual content of this element, including child elements but excluding attributes, should be translated.

its:dir="ltr" at the body element means that the directionality of the textual content of this element, including child elements and attributes, is "left-to-right".

<text>
 <head its:translate="no"> ... </head>
 <body its:translate="yes" its:dir="ltr"> ... </body>
</text>

4.2 Precedence between Selections

The following precedence order is defined for selections of ITS information in various positions (the first item in the list has the highest precedence):

Implicit selection in instance documents (data category attributes on a specific element)
Selections in instance documents (using a documentRules element)
Selections in an external file (using a documentRules element)
In a schema, selections expressed with a documentRules element
Selections expressed with schemaRule (See also the note in Section 4.1.2: Rule-based Selection)
Selections via defaults for data categories, see Section 5.1: Position and Default Selections of Data Categories

In case of conflicts between selections via multiple documentRule elements, the last selector has higher precedence.

Note: The precedence order fulfills the same purpose as the built-in template rules of [XSLT 1.0].

Example 17: Conflicts between selections of ITS information which are resolved using the precedence order

Due to the rules described above, the translatability information from the translate attribute on the p element has precedence over the translatability information in the first documentRule element. A conflict occurs for p elements inside of entry elements, because both documentRule elements select them. This conflict is resolved via the order of the documentRule elements (the last one has higher precedence).

<text>
 <head>
 <its:documentRules>
  <its:documentRule its:translate="yes" its:translateSelector="//p"/>
  <its:documentRule its:translate="no" its:translateSelector="//index/entry/p"/>
 </its:documentRules>
 </head>
 <body> ...
  <p its:translate="no"> ... </p>
 </body>
 <back><index>
  <entry><p> ... </p></entry>
 </index></back>
</text>
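
The precedence order can be sketched as a small resolution function. The following Python code is an illustration only: the level names in PRECEDENCE and the function name resolve are assumptions introduced for this example, and the caller is assumed to have already collected, in document order, every value that applies to a given node together with the position it came from.

```python
# Precedence levels from Section 4.2, highest first.
# The level names are hypothetical labels chosen for this sketch.
PRECEDENCE = [
    "local",           # data category attribute on the element itself
    "instance-rules",  # documentRules inside the instance document
    "external-rules",  # documentRules in an external file
    "schema-rules",    # documentRules inside a schema
    "schema-rule",     # schemaRule annotation in a schema
    "default",         # default for the data category
]


def resolve(candidates):
    """Pick the winning value from (source, value) pairs given in document order.

    Returns None when no candidate applies.
    """
    best_value = None
    best_rank = len(PRECEDENCE)
    for source, value in candidates:
        rank = PRECEDENCE.index(source)
        # "<=" lets a later candidate of equal rank replace an earlier one,
        # which models "the last selector has higher precedence" for
        # conflicting documentRule elements.
        if rank <= best_rank:
            best_value, best_rank = value, rank
    return best_value


# The p element in body of Example 17: rule "//p" plus a local attribute.
body_p = resolve([("instance-rules", "yes"), ("local", "no")])
# The p element inside index/entry: both documentRule elements match.
entry_p = resolve([("instance-rules", "yes"), ("instance-rules", "no")])
print(body_p, entry_p)
```

Both calls yield "no": the first because the local attribute outranks the rule, the second because the later documentRule overrides the earlier one.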

4.3 Mapping of ITS Data Categories to Existing Markup

Some markup schemes provide markup which can be used to express ITS data categories. ITS data categories can be mapped to such existing markup, using the selection mechanism described in Section 4.1.2: Rule-based Selection. In this way, there is no need to integrate ITS markup into documents.

Example 18: Mapping of the ITS data categories translatability and terminology to [Dita 1.0] markup

<topic 
    xmlns="http://dita.oasis-open.org/architecture/2005/" 
    xmlns:its="http://www.w3.org/2005/11/its" 
    DITAArchVersion="1.0" id="myTopic">
  <title>The ITS Topic</title>
    <its:documentRules>
      <its:ns prefix="dita"
        uri="http://dita.oasis-open.org/architecture/2005/"/>
      <its:documentRule its:translateSelector="//*[@translate='yes']"
        its:translate="yes"/>
      <its:documentRule its:translateSelector="//*[@translate='no']"
        its:translate="no"/>
      <its:documentRule its:termSelector="//dita:term" its:term="yes"/>
      <its:documentRule its:termSelector="//dita:dt" its:term="yes"/>
    </its:documentRules>
  <body>[...] 
  <dlentry id="tDataCat">
    <dt>Data category</dt>
    <dd>ITS defines <term>data
    category</term> as an abstract concept for a particular type
    of information for internationalization and localization of XML
    schemas and documents.
    </dd>
    </dlentry>[...]  
    <p>For the implementation of ITS, apply the rules in the order:</p>
    <ul>
      <li>Default</li>
      <li>Rules in the schema</li>
      <li>Rules in the instance document</li>
      <li>Local attributes </li>
    </ul>
    <p>
      <ph translate="no" xml:lang="fr">Et voilà !</ph> The last rule wins
    </p>
  </body>
</topic>
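
The following Python sketch shows one way such a mapping could be evaluated, so that host-vocabulary markup is read as ITS information without adding ITS attributes to the content elements. It is illustrative only: the shortened sample document and the function name apply_rules are assumptions, the ns attributes are written as its:prefix and its:uri as in Example 14, the host translate attribute is assumed to be in no namespace, and only selectors beginning with "//" are handled because the standard library supports just a subset of XPath.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"

# Hypothetical, shortened variant of Example 18.
DOC = """\
<topic xmlns="http://dita.oasis-open.org/architecture/2005/"
       xmlns:its="http://www.w3.org/2005/11/its" id="myTopic">
 <its:documentRules>
  <its:ns its:prefix="dita"
          its:uri="http://dita.oasis-open.org/architecture/2005/"/>
  <its:documentRule its:translateSelector="//*[@translate='no']"
                    its:translate="no"/>
  <its:documentRule its:termSelector="//dita:term" its:term="yes"/>
 </its:documentRules>
 <body>
  <p>ITS defines <term>data category</term> as an abstract concept.</p>
  <p><ph translate="no">ITS</ph> is an acronym.</p>
 </body>
</topic>
"""


def its(name):
    """Return the expanded name of an ITS element or attribute."""
    return f"{{{ITS}}}{name}"


def apply_rules(doc_text, categories=("translate", "term")):
    """Return {(local name, text): {data category: value}} for selected elements."""
    root = ET.fromstring(doc_text)
    by_node = {}
    for rules in root.iter(its("documentRules")):
        bindings = {
            ns.get(its("prefix")): ns.get(its("uri"))
            for ns in rules.findall(its("ns"))
        }
        for rule in rules.findall(its("documentRule")):
            for category in categories:
                # Selector attribute name = data category + "Selector".
                selector = rule.get(its(category + "Selector"))
                value = rule.get(its(category))
                if selector is None or value is None:
                    continue
                if not selector.startswith("//"):
                    raise ValueError("this sketch only handles selectors starting with //")
                for node in root.findall("." + selector, bindings):
                    by_node.setdefault(node, {})[category] = value
    return {
        (node.tag.split("}")[-1], "".join(node.itertext()).strip()): values
        for node, values in by_node.items()
    }


mapped = apply_rules(DOC)
print(mapped)
```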

5 Description of Data Categories

This section is normative.

5.1 Position and Default Selections of Data Categories

The following table summarizes the relations between data categories, location of their selection mechanisms, and default selections in instance documents.

Data category	Applicable in schema	Rule selection applicable	default selection in instance document
Translatability	+	+	Textual content of element, including content of child elements, but excluding attributes
Localization information	+	+	Textual content of element, including content of child elements, but excluding attributes
Terminology	+	+	Textual content of element, including content of child elements, but excluding attributes
Directionality	-	+	Textual content of element, including attributes and child elements
Ruby	-	+	Textual content of element, including content of child elements, but excluding attributes

Note: The data categories differ with respect to defaults in the instance document for compatibility reasons with existing standards and practices. For example, the dir attribute in [XHTML2] refers to the content of the element and all attributes and child elements. Hence, the data category of directionality selects the same information as the default. On the other hand, it is common practice that information about translatability refers only to textual content of an element. Hence, the data category of translatability selects as a default the same information.

5.2 Translatability

5.2.1 Definition

[Definition: The data category translatability expresses information about whether the content of an element or attribute should be translated or not.] The values of this data category are "yes" (translatable) or "no" (not translatable).

5.2.2 Implementation

Translatability can be expressed in a schema, in a set of rules, or on an individual element.

In a schema, translatability is expressed with a schemaRule element with a translate attribute. The attribute has the values "yes" or "no".

Example 19: Translatability expressed in a schema

<xs:element name="p">
 <xs:annotation>
  <xs:appinfo>
   <its:schemaRule its:translate="yes"/>
  </xs:appinfo>
 </xs:annotation> ...
</xs:element>

In rules, translatability is expressed with a documentRule element with a translate attribute. The attribute has the values "yes" or "no". In addition, a translateSelector attribute is required.

Example 20: Translatability expressed in document rules

<its:documentRules>
 <its:documentRule its:translate="yes" its:translateSelector="//p"/>
<!-- All p elements should be translated-->
</its:documentRules>

In an instance document, translatability is expressed with a translate attribute with the values "yes" or "no". The selection is the textual content of the element, including child elements, but excluding attributes.

Example 21: Translatability expressed in an instance document

The textual content of the body element, including the content of its child elements, should be translated. The content of the specified quote element, however, must not be translated.

<book>
 <head>...</head> <body its:translate="yes"> ...  
  <p>And he said: you need a new 
  <quote its:translate="no">motherboard</quote> 
 </p> ...  </body>
</book>
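
The selection behaviour for local translate attributes can be sketched as a recursive walk in which each element takes over the value of its nearest ancestor unless it carries its own its:translate attribute. The Python code below is only an illustration: the sample document (a slightly extended copy of Example 21) and the function name translatable_text are assumptions, and the walk starts at the body element, which has an explicit value, so that no default has to be assumed. Attribute values are never collected, in line with the default selection that excludes attributes.

```python
import xml.etree.ElementTree as ET

ITS_TRANSLATE = "{http://www.w3.org/2005/11/its}translate"

# Hypothetical sample document, modelled on Example 21.
DOC = """\
<book xmlns:its="http://www.w3.org/2005/11/its">
 <head>...</head>
 <body its:translate="yes">
  <p>And he said: you need a new <quote its:translate="no">motherboard</quote> today.</p>
 </body>
</book>
"""


def translatable_text(elem, inherited):
    """Yield the text chunks under elem whose effective translate value is "yes"."""
    # A local attribute overrides the value inherited from the parent.
    flag = elem.get(ITS_TRANSLATE, inherited)
    if flag == "yes" and elem.text and elem.text.strip():
        yield elem.text.strip()
    for child in elem:
        yield from translatable_text(child, flag)
        # Text following a child (its "tail") belongs to elem, not to the child.
        if flag == "yes" and child.tail and child.tail.strip():
            yield child.tail.strip()


body = ET.fromstring(DOC).find("body")
chunks = list(translatable_text(body, inherited=None))
print(chunks)
```

The content of the quote element is skipped, while the text of the surrounding p element before and after it is still collected.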

5.3 Localization Information

5.3.1 Definition

[Definition: The data category localization information is used to communicate information to localizers about a particular item of content.]

This data category has several purposes:

Tell the translator how to translate parts of the content
Expand on the meaning or contextual usage of a specific element, such as what a variable refers to or how a string will be used on the user interface
Clarify ambiguity and show relationships between items sufficiently to allow correct translation (e.g. in many languages it is impossible to translate the word "enabled" in isolation without knowing the gender, number and case of the thing it refers to.)
Indicate why a piece of text is emphasized (important, sarcastic, etc.)

Two types of informative notes are needed:

An alert contains information that the translator must read before translating a piece of text. Example: an instruction to the translator to leave parts of the text in the source language.
A description provides useful background information that the translator will refer to only if they wish. Example: a clarification of ambiguity in the source text.

5.3.2 Implementation

Localization information can be expressed in a schema, in rules, or on individual elements.

In a schema, localization information is expressed with a schemaRule element with a locInfo attribute. The type of the localization information is expressed with a locInfoType attribute with the values "alert" or "description".

Example 22: Localization information expressed in a schema

<xs:element name="p">
 <xs:annotation>
  <xs:appinfo>
   <its:schemaRule
its:locInfo="This has to be handled carefully" its:locInfoType="alert"/>
  </xs:appinfo>
 </xs:annotation> ...
</xs:element>

In rules, localization information is expressed with a documentRule element with the attributes locInfo and locInfoType. In addition, a locInfoSelector attribute is required.

Example 23: Localization information expressed in rules

<its:documentRules>
 <its:documentRule its:locInfo="This p element has to be handled carefully"
its:locInfoType="alert" its:locInfoSelector="/body/p[1]"/>
</its:documentRules>

In an instance document, localization information is expressed with the attributes locInfo and locInfoType. The selection is the textual content of the element, including child elements, but excluding attributes.

Example 24: Localization information expressed in an instance document

<book>
 <head>...</head>
 <body> ...
  <p its:locInfo="This p element has to be handled
   carefully" its:locInfoType="alert">And he said: you need a new
   <quote>motherboard</quote>
  </p> ...
 </body>
</book>
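
For comparison with the alerts shown above, a note of type "description" might look as follows. This is only a sketch: the note text and the paragraph content are made up for illustration, and only the attributes locInfo and locInfoType are taken from this specification.

```xml
<book xmlns:its="http://www.w3.org/2005/11/its">
 <body>
  <!-- Hypothetical content: the note gives optional background only -->
  <p its:locInfo="The word 'enabled' refers to the network adapter"
     its:locInfoType="description">Enabled</p>
 </body>
</book>
```

A translator can ignore such a note, whereas a note of type "alert" has to be read before translating.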

5.4 Terminology

5.4.1 Definition

The terminology data category is used to mark terms. This helps to increase consistency across different parts of the documentation. It is also helpful for translation.

5.4.2 Implementation

The terminology data category can be expressed in a schema, in rules or on individual elements.

In a schema, the terminology data category is expressed with a schemaRule element with a term attribute, which has the value "yes".

Example 25: The terminology data category expressed in a schema

<xs:element name="span">
 <xs:annotation>
  <xs:appinfo>
   <its:schemaRule its:term="yes"/>
<!-- All span elements are used to mark up terms-->
  </xs:appinfo>
 </xs:annotation> ...
</xs:element>

In rules, the terminology data category is expressed with a documentRule element with the term attribute, which has the value "yes". A termSelector attribute is required. In addition, an optional termRef attribute can be used to refer to external information about the term. The datatype of termRef is xs:anyURI.

Example 26: The terminology data category expressed in rules

<its:documentRules>
 <its:documentRule its:term="yes" its:termSelector="/body/p[1]/span"
its:termRef="http://example.com/termdatabase/#x142539"/>
</its:documentRules>

In an instance document, the terminology data category is expressed with a term attribute, which has the value "yes", and an optional termRef attribute. The selection is the textual content of the element, including content of child elements, but excluding attributes.

Example 27: The terminology data category expressed in an instance document

<book>
 <head>...</head>
 <body> ...
  <p>And he said: you need a new
   <quote its:term="yes">motherboard</quote>
  </p> ...
 </body>
</book>
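
The optional termRef attribute can be combined with term in an instance document as well. The following is a minimal sketch; it reuses the example URI from the rule-based example and assumes that the ITS namespace is declared on the root element.

```xml
<book xmlns:its="http://www.w3.org/2005/11/its">
 <body>
  <p>And he said: you need a new
   <!-- termRef points to external information about the term -->
   <quote its:term="yes"
     its:termRef="http://example.com/termdatabase/#x142539">motherboard</quote>
  </p>
 </body>
</book>
```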

5.5 Directionality

5.5.1 Definition

This data category expresses the directionality of a piece of text. Its values are "ltr", "rtl", "lro" or "rlo". This definition is compliant with the dir attribute in [XHTML2], except that [XHTML2] does not allow for rule-based selection.

5.5.2 Implementation

The dir attribute is used for the implementation of the directionality data category. It has the four values "ltr", "rtl", "lro" and "rlo".

Directionality can be expressed in rules or on individual elements.

Directionality is expressed in rules using a documentRule element with the dir attribute. In addition, a dirSelector attribute is required.

Example 28: Directionality expressed in rules

<its:documentRules>
 <its:documentRule its:dir="rtl" its:dirSelector="/body/p[1]/quote[@xml:lang='he']"/>
<!-- Some Hebrew quotation -->
</its:documentRules>
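
A single rule can also cover a whole document. The sketch below assumes that right-to-left content is consistently labelled with xml:lang; the choice of the language codes "he" and "ar" is an illustration, not something this specification prescribes.

```xml
<its:documentRules xmlns:its="http://www.w3.org/2005/11/its">
 <!-- Every element explicitly tagged as Hebrew or Arabic is right-to-left -->
 <its:documentRule its:dir="rtl"
   its:dirSelector="//*[@xml:lang='he' or @xml:lang='ar']"/>
</its:documentRules>
```

The selector is plain XPath 1.0, so an attribute test needs the @ axis abbreviation.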

In an instance document, directionality is expressed with a dir attribute. The selection is the textual content of the element, including all child elements and attributes.

Example 29: Directionality expressed in an instance document

<book>
 <head>...</head>
 <body> ...
  <p>And he said:
   <quote its:dir="rtl"> ... a Hebrew quotation ... </quote>
  </p> ...
 </body>
</book>

5.6 Ruby

5.6.1 Definition

The data category ruby is used for a run of text that is associated with another run of text, referred to as the base text. Ruby text is used to provide a short annotation of the associated base text. It is most often used to provide a reading (pronunciation) guide.

5.6.2 Implementation

Ruby can be expressed in an instance document with or without selections.

Ruby in an instance document without selections is realized with a ruby element which contains a rubyBase and a rubyText element.

Example 30: Ruby in an instance document without selection

<text>
 <head> ... </head>
 <body>
  <p>This is about the
    <its:ruby>
     <its:rubyBase>W3C</its:rubyBase>
     <its:rubyText>World Wide Web Consortium</its:rubyText>
    </its:ruby>.
  </p>
 </body>
</text>

Note: The structure of the content model for the ruby element without selection is identical with the structure of ruby in section 5.4 of [OpenDocument], and simple ruby markup as defined in section 1.2.1 in [Ruby-TR].

5.6.3 Handling Legacy Content

In legacy situations, where the element markup cannot be changed and one wants to apply ruby text to an attribute or to existing element content, the following approach can be used.

Ruby in an instance document with selections is expressed with a documentRule element with two attributes:

A rubyText attribute contains the ruby text (corresponding to the rubyText element in the case of no selections)
A rubySelector attribute contains the selector. It selects the ruby base text, corresponding to the rubyBase element in the case of no selection.

Example 31: Ruby with a documentRule element

<text ...>
 <head> ... </head>
  <its:documentRules>
   <its:documentRule its:rubyText="World Wide Web Consortium"
    its:rubySelector="/body/img[1]/@alt"/>
  </its:documentRules>
 <body>
  <img src="w3c_home.png" alt="W3C"/> ...
 </body>
</text>
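
The same mechanism can target existing element content instead of an attribute. The following sketch assumes a hypothetical abbr element in the host vocabulary and places the documentRules element inside head; neither is mandated by this specification.

```xml
<text xmlns:its="http://www.w3.org/2005/11/its">
 <head>
  <its:documentRules>
   <!-- The selector picks existing element content as the ruby base -->
   <its:documentRule its:rubyText="World Wide Web Consortium"
     its:rubySelector="/text/body/p[1]/abbr[1]"/>
  </its:documentRules>
 </head>
 <body>
  <p>This is about the <abbr>W3C</abbr>.</p>
 </body>
</text>
```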

6 Modularizations of ITS with Existing Markup Schemes

This section is informative.

[Ed. note: This section will be mostly written in a subsequent working draft. In a longer term, the working group plans to publish a separate document out of this section.]

Two topics are covered in this section:

How should ITS be integrated in specific markup schemes? For example, as for XHTML, it is helpful for the interoperability of ITS implementations to specify that the documentRules element will always be part of the content model of the head element.
How should ITS data categories be related to existing markup declarations in a schema, which fulfill identical or overlapping purposes? For example, [Dita 1.0] already has an attribute to indicate translatability of text, but without a mechanism for selection of information in documents and schemas.

6.1 ITS and XHTML 1.0

6.2 ITS and DocBook

TODO

6.3 ITS and Open Document Format 1.0

TODO

6.4 ITS and DITA 1.0

TODO

6.5 ITS and TEI

The TEI ([TEI]) is intended for literary and linguistic material, and is most often used for digital editions of existing printed material. It is also suitable, however, for general purpose writing. The P5 release of the TEI consists of 23 modules which can be combined together as needed.

6.5.1 Integration of ITS into TEI

The TEI is maintained as a single ODD document, and customizations of it are also written as ODD documents. These are processed using XSLT stylesheets to make a tailored user-level schema in XML DTD, XML Schema or RELAX NG.

The ITS additions involve two changes to TEI:

Allowing documentRules to appear in the TEI metadata section (the teiHeader).
Adding the ITS data category attributes to the TEI global attribute set.

Both of these can be easily achieved using standard techniques in ODD.

The body of a TEI/ITS customization consists of a schemaSpec which lists the modules to be included (this example includes six common ones):

Example 32: A schemaSpec element with modules to be included

<schemaSpec ident="tei-its" start="TEI">
 <moduleRef key="header"/>
 <moduleRef key="core"/>
 <moduleRef key="tei"/>
 <moduleRef key="textstructure"/>
 <moduleRef key="namesdates"/>
 <moduleRef key="msdescription"/> ...

In addition, we load the ITS schema (in its RELAX NG XML format, the language used by the TEI for expressing content models), and overload the definition of the TEI content class model.headerPart to include the ITS documentRules element:

Example 33: Inclusion of ITS documentRules into the TEI schema

<moduleRef url="its.rng">
 <content>
 <rng:define name="model.headerPart" combine="choice">
  <rng:ref name="documentRules"/>
 </rng:define>
 </content>
</moduleRef>

The content class determines which elements are allowed as children of teiHeader. Lastly, we change the definition of the global attribute class att.global to reference the ITS data category attributes (available from the ITS schema we loaded earlier):

Example 34: Addition of the ITS data category attributes to the global attributes

 <classSpec ident="att.global" type="atts" mode="change">
  <attList>
   <attRef name="att.selector.attributes"/>
   <attRef name="att.datacats.attributes"/>
  </attList>
 </classSpec>
... </schemaSpec>

When processed, this customization produces a schema which permits markup like this:

Example 35: Instance document which is valid against a schema TEI+ITS

<TEI
 xmlns:its="http://www.w3.org/2005/11/its"
 xmlns="http://www.tei-c.org/ns/1.0">
 <teiHeader>
  <fileDesc>
<!-- details of the file -->
  </fileDesc>
  <documentRules xmlns="http://www.w3.org/2005/11/its">
   <ns its:prefix="t" its:uri="http://www.tei-c.org/ns/1.0"/>
   <documentRule its:translate="no"
     its:translateSelector="//t:body/t:p/@*"/>
   <documentRule its:translate="yes"
     its:translateSelector="//t:body/t:p"/>
  </documentRules>
 </teiHeader>
 <text>
  <body>
   <p rend="normal">Hello  <hi>world</hi></p>
   <p rend="special">Goodbye</p>
   <p its:translate="no">This must not be translated</p>
  </body>
 </text>
</TEI>

In this example, a set of documentRule elements in the header supplies the rules, and the its:translate attribute on the last p element in the body of the text overrides them locally.

6.6 ITS and XML Spec

[XMLSPEC] is intended for W3C working drafts, notes, recommendations, and all other document types that fall under the category of technical reports. XML Spec is available in the formats of XML DTD, XML Schema and RELAX NG.

6.6.1 Integration of ITS into XML Spec

ITS has been integrated into xmlspec-i18n.dtd, a variant of version 2.9 of the XML Spec DTD that already supplies various internationalization and localization related features. For example, xmlspec-i18n.dtd has a translate attribute, which can be used for the same purposes as the ITS translate attribute. To keep them separate from the original XML Spec declarations, all additions are stored in two files, i18n-extensions.mod and i18n-elements.mod. xmlspec-i18n.dtd is used within the W3C Internationalization Activity for the creation of technical reports.

For the integration of ITS, the following modifications to the xmlspec-i18n.dtd have been made:

A new entity <!ENTITY % its SYSTEM "its.dtd"> and the entity call %its; have been added to xmlspec-i18n.dtd.
The existing XML Spec entity %common.att; has been modified. The ITS entities %att.datacats.attributes; and %att.selector.attributes; have been added to %common.att;. In this way, the data category attributes and the selector attributes can be used on any element defined in the XML Spec DTD.
The XML Spec entity %header.mdl; contains the content model of the header element. The ITS element documentRules has been added as the last element of this content model, so that documentRules can be used inside an XML Spec instance. The header element of the XML Spec DTD has been chosen as the place for documentRules to minimize the impact of ITS markup on XML Spec markup.
The ITS element ruby has been added to the XML Spec entity %p.pcd.mix;. In this way it is possible to use ruby as an inline element.

6.6.2 Relating ITS to Existing Markup in XML Spec

As mentioned before, xmlspec-i18n.dtd has its own existing markup declarations for various internationalization and localization related purposes. In the original XML Spec 2.9 DTD, there is a term element which fulfills the same purpose as the ITS term attribute.

To relate such existing XML Spec and xmlspec-i18n.dtd markup to ITS markup (see Section 4.3: Mapping of ITS Data Categories to Existing Markup), the following documentRules element has been created. [Ed. note: This is not an exhaustive list of mappings yet, but only a first attempt].

Example 36: Mapping ITS markup to XML Spec and xmlspec-i18n.dtd markup

<its:documentRules xmlns:its="http://www.w3.org/2005/11/its">
 <!--The following rules are for xmlspec-i18n.dtd-->
 <its:documentRule its:term="yes" its:termSelector="//qterm"/>
 <its:documentRule its:dir="ltr" its:dirSelector="//*[@dir='ltr']"/>
 <its:documentRule its:dir="rtl" its:dirSelector="//*[@dir='rtl']"/>
 <its:documentRule its:dir="lro" its:dirSelector="//*[@dir='lro']"/>
 <its:documentRule its:dir="rlo" its:dirSelector="//*[@dir='rlo']"/>
 <its:documentRule its:locInfo="" its:locInfoType="alert"
   its:locInfoSelector="//@locn-alert"/>
 <its:documentRule its:locInfo="" its:locInfoType="description"
   its:locInfoSelector="//@locn-note"/>
 <its:documentRule its:translate="yes" 
   its:translateSelector="//*[@translate='yes']"/>
 <its:documentRule its:translate="no" 
   its:translateSelector="//*[@translate='no']"/>
 <!--This rule is for the original XML Spec DTD-->
 <its:documentRule its:term="yes" its:termSelector="//term"/>
</its:documentRules>

Since neither XML Spec nor xmlspec-i18n.dtd defines a namespace, the mappings use XPath expressions with unqualified element and attribute names.
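
To make the effect of these mappings concrete, the sketch below shows a hypothetical fragment of a document written against xmlspec-i18n.dtd, with comments naming the rule that would match. Which elements actually allow the translate and dir attributes is defined by xmlspec-i18n.dtd and is not restated here; the placement on p is an assumption.

```xml
<div1>
 <head>Sample</head>
 <!-- matched by //qterm: treated like its:term="yes" -->
 <p>The <qterm>selector</qterm> is an XPath expression.</p>
 <!-- matched by //*[@translate='no']: treated like its:translate="no" -->
 <p translate="no">its:translateSelector</p>
 <!-- matched by //*[@dir='rtl']: treated like its:dir="rtl" -->
 <p dir="rtl"> ... a Hebrew quotation ... </p>
</div1>
```

The document itself carries no ITS markup; the ITS information is derived entirely from the rules.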

7 Markup Declarations

This section is normative.

7.1 Declaration of the Span Element

7.1.1 The ITS Span Element

The span element can be used if a markup scheme has no element to which data category attributes can be attached. span contains these attributes and serves as a hook for using them in XML documents.

Example 37: Using the span element

<text>
 <head>[...]</head>
 <body> ...
  <its:span its:translate="no"> ... </its:span>
 </body>
</text>

The span element contains the data category attributes.

span

[1]	`span`	::=	`element :span { span.content, span.attributes }`
[2]	`span.content`	::=	`text`
[3]	`span.attributes`	::=	`att.datacats.attributes, empty`

7.2 Declarations of General Datatypes

The data type data.selector is defined for selector attributes. Its value is an XPath expression [XPath 1.0]. The data type data.itsBoolean is defined for boolean values, e.g. to express translatability. The data type data.dirValues is used for the data category attribute dir. The data type data.locInfoType is used for the locInfoType attribute, which expresses the type of the localization information. The data type data.itsBooleanTrue is used for the term attribute.

data.selector

[4]	`data.selector`	::=	`text`

data.itsBoolean

[5]	`data.itsBoolean`	::=	`"yes" | "no"`

data.dirValues

[6]	`data.dirValues`	::=	`"ltr" | "rtl" | "lro" | "rlo"`

data.locInfoType

[7]	`data.locInfoType`	::=	`"alert" | "description"`

data.itsBooleanTrue

[8]	`data.itsBooleanTrue`	::=	`"yes"`
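
For readers more familiar with XML Schema, two of these data types could be rendered roughly as follows. This is an illustrative sketch only: it is not copied from the normative its.xsd, and the choice of xs:string as the base type is an assumption.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.w3.org/2005/11/its">
 <!-- Sketch of production [5] -->
 <xs:simpleType name="data.itsBoolean">
  <xs:restriction base="xs:string">
   <xs:enumeration value="yes"/>
   <xs:enumeration value="no"/>
  </xs:restriction>
 </xs:simpleType>
 <!-- Sketch of production [6] -->
 <xs:simpleType name="data.dirValues">
  <xs:restriction base="xs:string">
   <xs:enumeration value="ltr"/>
   <xs:enumeration value="rtl"/>
   <xs:enumeration value="lro"/>
   <xs:enumeration value="rlo"/>
  </xs:restriction>
 </xs:simpleType>
</xs:schema>
```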

7.3 Declarations of Data Categories

The attribute group att.datacats is used to express the ITS data categories. It makes use of the data types data.itsBoolean, data.itsBooleanTrue, data.dirValues and data.locInfoType.

att.datacats

[9]	`att.datacats.attributes`	::=	`att.datacats.attribute.translate, att.datacats.attribute.locInfo, att.datacats.attribute.locInfoType, att.datacats.attribute.term, att.datacats.attribute.termRef, att.datacats.attribute.dir, att.datacats.attribute.rubyText, empty`
[10]	`att.datacats.attribute.translate`	::=	`attribute translate { data.itsBoolean }?`
[11]	`att.datacats.attribute.locInfo`	::=	`attribute locInfo { text }?`
[12]	`att.datacats.attribute.locInfoType`	::=	`attribute locInfoType { data.locInfoType }?`
[13]	`att.datacats.attribute.term`	::=	`attribute term { data.itsBooleanTrue }?`
[14]	`att.datacats.attribute.termRef`	::=	`attribute termRef { xsd:anyURI }?`
[15]	`att.datacats.attribute.dir`	::=	`attribute dir { data.dirValues }?`
[16]	`att.datacats.attribute.rubyText`	::=	`attribute rubyText { text }?`

The elements ruby, rubyBase and rubyText are used for the implementation of the Ruby data category. If changing the element markup in an XML document is not possible, the rubyText and rubySelector attributes should be used.

rubyBase

[17]	`rubyBase`	::=	`element :rubyBase { rubyBase.content }`
[18]	`rubyBase.content`	::=	`text`

ruby

[19]	`ruby`	::=	`element :ruby { ruby.content }`
[20]	`ruby.content`	::=	`rubyBase, rubyText`

rubyText

[21]	`rubyText`	::=	`element :rubyText { rubyText.content }`
[22]	`rubyText.content`	::=	`text`

7.4 Declaration of Selector Attributes

The attribute group att.selector is used on the documentRule element to express the applicability of ITS information. It must not be used in other positions, e.g. on individual elements. It makes use of the data type data.selector.

att.selector

[23]	`att.selector.attributes`	::=	`att.selector.attribute.translateSelector, att.selector.attribute.locInfoSelector, att.selector.attribute.termSelector, att.selector.attribute.dirSelector, att.selector.attribute.rubySelector, empty`
[24]	`att.selector.attribute.translateSelector`	::=	`attribute translateSelector { data.selector }?`
[25]	`att.selector.attribute.locInfoSelector`	::=	`attribute locInfoSelector { data.selector }?`
[26]	`att.selector.attribute.termSelector`	::=	`attribute termSelector { data.selector }?`
[27]	`att.selector.attribute.dirSelector`	::=	`attribute dirSelector { data.selector }?`
[28]	`att.selector.attribute.rubySelector`	::=	`attribute rubySelector { data.selector }?`

7.5 Declarations of the SchemaRule and DocumentRules Elements

The schemaRule element contains rules for ITS information, to be used as schema annotation. It uses attributes from the ITS data categories.

schemaRule

[29]	`schemaRule`	::=	`element :schemaRule { schemaRule.content, schemaRule.attributes }`
[30]	`schemaRule.content`	::=	`empty`
[31]	`schemaRule.attributes`	::=	`att.datacats.attributes, empty`

The documentRules element contains zero or more ns elements, followed by one or more documentRule elements. The documentRule element contains attributes from the data category attributes, and the selector attributes.

documentRules

[32]	`documentRules`	::=	`element :documentRules { documentRules.content, documentRules.attributes }`
[33]	`documentRules.content`	::=	`ns*, documentRule+`
[34]	`documentRules.attributes`	::=	`empty`

ns

[35]	`ns`	::=	`element :ns { ns.content, ns.attributes }`
[36]	`ns.content`	::=	`empty`
[37]	`ns.attributes`	::=	`att.nsident.attributes, empty`

att.nsident

[38]	`att.nsident.attributes`	::=	`att.nsident.attribute.prefix, att.nsident.attribute.uri, empty`
[39]	`att.nsident.attribute.prefix`	::=	`attribute prefix { xsd:NCName }`
[40]	`att.nsident.attribute.uri`	::=	`attribute uri { xsd:anyURI }`

documentRule

[41]	`documentRule`	::=	`element :documentRule { documentRule.content, documentRule.attributes }`
[42]	`documentRule.content`	::=	`empty`
[43]	`documentRule.attributes`	::=	`att.selector.attributes, att.datacats.attributes, empty`
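
An instance that follows these productions could look like the sketch below: the ns elements come first, followed by at least one documentRule element, each pairing data category attributes with the matching selector attribute. The namespace URI and the element names code and term are hypothetical, and the its: prefix on the attributes follows the usage in the examples of this document.

```xml
<its:documentRules xmlns:its="http://www.w3.org/2005/11/its">
 <!-- ns elements come first (zero or more) -->
 <its:ns its:prefix="d" its:uri="http://example.com/mySchema"/>
 <!-- then one or more documentRule elements -->
 <its:documentRule its:translate="no" its:translateSelector="//d:code"/>
 <its:documentRule its:term="yes" its:termSelector="//d:term"
   its:termRef="http://example.com/termdatabase/#x142539"/>
</its:documentRules>
```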

8 Conformance

This section is normative.

Conformance to ITS falls into two categories: conformance to the ITS data categories (cf. Section 5: Description of Data Categories, including data category specific default selections) and conformance to selection mechanisms (cf. Section 4: Selection of ITS information).

8.1 Conformance to the ITS Data Categories and Data Category Specific Default Selection Mechanisms

[Ed. note: We still have to add conformance information for ruby, and possibly for directionality.]

An implementation of the ITS data categories is conformant if it supplies a schema which adopts the ITS data categories, with the following constraints:

The schema must allow the usage of the attribute group att.datacats at every element which is declared in the schema.
The interpretation of data category attributes in instance documents must be conformant to the data category specific default selections described in Section 5.1: Position and Default Selections of Data Categories.
The schema should allow the usage of the documentRules element at one or more elements in the schema.

The schemaRule element is to be used as a schema annotation. It is the responsibility of the schema processor to allow for such annotations.

Example 38: A schema which is conformant to the ITS data categories

<xs:schema xmlns:myns="http://example.com/mySchema"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:its="http://www.w3.org/2005/11/its"
targetNamespace="http://example.com/mySchema" elementFormDefault="qualified"
attributeFormDefault="unqualified">
 <xs:import namespace="http://www.w3.org/2005/11/its" schemaLocation="its.xsd"/>
 <xs:element name="document">
  <xs:complexType>
   <xs:sequence>
     <xs:element ref="myns:head"/>
     <xs:element ref="myns:body"/>
   </xs:sequence>
   <xs:attributeGroup ref="myns:commonAtts"/>
  </xs:complexType>
 </xs:element>
 <xs:attributeGroup name="commonAtts">
  <xs:attributeGroup ref="its:att.datacats.attributes"/>
  <xs:attributeGroup ref="its:att.selector.attributes"/>
 </xs:attributeGroup>
 <xs:element name="head">
  <xs:complexType>
   <xs:choice minOccurs="0" maxOccurs="unbounded">
     <xs:element ref="its:documentRules"/>
     <xs:element ref="its:documentRule"/>
   </xs:choice>
   <xs:attributeGroup ref="myns:commonAtts"/>
  </xs:complexType>
 </xs:element>
 <xs:element name="body">
  <xs:complexType>
   <xs:sequence>
    <xs:element ref="myns:para" maxOccurs="unbounded"/>
   </xs:sequence>
   <xs:attributeGroup ref="myns:commonAtts"/>
  </xs:complexType>
 </xs:element>
 <xs:element name="para">
  <xs:complexType mixed="true">
   <xs:attributeGroup ref="myns:commonAtts"/>
  </xs:complexType>
 </xs:element>
</xs:schema>
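
An instance document written against the schema above might look as follows. This is a sketch under the assumption that its.xsd declares the ITS elements and attributes in the ITS namespace, as the other examples in this document suggest; the paragraph content is made up.

```xml
<document xmlns="http://example.com/mySchema"
  xmlns:its="http://www.w3.org/2005/11/its">
 <head>
  <its:documentRules>
   <its:ns its:prefix="m" its:uri="http://example.com/mySchema"/>
   <!-- Rule-based selection: the second paragraph is not translated -->
   <its:documentRule its:translate="no"
     its:translateSelector="/m:document/m:body/m:para[2]"/>
  </its:documentRules>
 </head>
 <body>
  <para>This paragraph is translated.</para>
  <para>PART-NO-4711</para>
  <!-- Data category attribute used directly on an element -->
  <para its:term="yes">motherboard</para>
 </body>
</document>
```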

8.2 Conformance to Selection Mechanisms

Conformance to Selection Mechanisms encompasses conformance to the ITS data categories and data category specific default selection mechanisms, with the following changes:

The schema must allow the usage of the documentRules element in at least one element in the schema.
An application which processes ITS elements and attributes must process the selection mechanisms described in Section 4.2: Precedence between Selections.

A mandatory part of this conformance criterion is the usage of XPath. An application which processes ITS selection rules must be able to process XPath version 1.0 or higher. It is not required to support a specific host language of XPath, such as [XSLT 1.0].

W3C Working Draft 22 February 2006

Abstract

Status of this Document

Table of Contents

Appendices

1 Introduction

1.1 Background: Motivation for ITS

1.2 Out of Scope

1.3 Important Design Principles

1.4 Development of this Specification

2 Basic Concepts

3 Notation and Terminology

3.1 Notation

3.2 Schema Language and Schema Annotation

3.3 Data category

3.4 Selection

4 Selection of ITS information

4.1 Locations of Data Categories and Selection Mechanisms

4.1.1 Schema Annotation

4.1.2 Rule-based Selection

4.1.3 Selection in an Instance Document

4.2 Precedence between Selections

4.3 Mapping of ITS Data Categories to Existing Markup

5 Description of Data Categories

5.1 Position and Default Selections of Data Categories

5.2 Translatability

5.2.1 Definition

5.2.2 Implementation

5.3 Localization Information

5.3.1 Definition

5.3.2 Implementation

5.4 Terminology

5.4.1 Definition

5.4.2 Implementation

5.5 Directionality

5.5.1 Definition

5.5.2 Implementation

5.6 Ruby

5.6.1 Definition

5.6.2 Implementation

5.6.3 Handling Legacy Content

6 Modularizations of ITS with Existing Markup Schemes

6.1 ITS and XHTML 1.0

6.2 ITS and DocBook

6.3 ITS and Open Document Format 1.0

6.4 ITS and DITA 1.0

6.5 ITS and TEI

6.5.1 Integration of ITS into TEI

6.6 ITS and XML Spec

6.6.1 Integration of ITS into XML Spec

6.6.2 Relating ITS to Existing Markup in XML Spec

7 Markup Declarations

7.1 Declaration of the Span Element

7.1.1 The ITS Span Element

7.2 Declarations of General Datatypes

7.3 Declarations of Data Categories

7.4 Declaration of Selector Attributes

7.5 Declarations of the SchemaRule and DocumentRules Elements

8 Conformance

8.1 Conformance to the ITS Data Categories and Data Category Specific Default Selection Mechanisms

8.2 Conformance to Selection Mechanisms

A Schemas for ITS

B References

C References (Non-Normative)

D Revision Log (Non-Normative)

E Acknowledgements (Non-Normative)