Copyright © 2006 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document defines data categories and their implementation as a set of elements and attributes called the Internationalization Tag Set (ITS). ITS is designed to be used with new and existing schemas to support the internationalization and localization of schemas and documents. An implementation is provided for three schema languages: XML DTD, XML Schema and RELAX NG. The document provides examples of how ITS can be used with existing vocabularies. Feedback is especially appreciated on the mechanisms defined for the selection of ITS specific information in documents and schemas, and on the design of the individual data categories.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is an updated Public Working Draft of "Internationalization Tag Set (ITS)".
This document defines data categories and their implementation as a set of elements and attributes called the Internationalization Tag Set (ITS). ITS is designed to be used with new and existing schemas to support the internationalization and localization of schemas and documents. An implementation is provided for three schema languages: XML DTD, XML Schema and RELAX NG. The document provides examples of how ITS can be used with existing vocabularies. Since the last version of this document, basic concepts of ITS and the definitions of many data categories have been stabilized. Feedback is especially appreciated on the mechanisms defined for the selection of ITS specific information in documents and schemas, and on the design of the individual data categories.
This document was developed by the ITS Working Group, part of the W3C Internationalization Activity. The Working Group expects to advance this Working Draft to Recommendation Status. A list of changes to this document is available.
The Working Group is managing comments on this document using W3C's public Bugzilla system. We recommend using Bugzilla for making comments (instructions can be found at How to use the Issues Tracking System for the ITS Tagset Working Draft). If this is not feasible, comments may also be sent to www-i18n-comments@w3.org. Use "Comment on its tagset WD" in the subject line of your email. ITS tagset related comments and issues in Bugzilla and the www-i18n-comments archives are publicly available.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced under the 5 February 2004 W3C Patent Policy. The Working Group maintains a public list of patent disclosures made in connection with this document; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification must disclose the information in accordance with section 6 of the W3C Patent Policy.
This section is informative.
This document defines data categories and their implementation as a schema that can be used with new and existing schemas to support the internationalization and localization of schemas and documents. An implementation is provided for three schema languages: XML DTD [XML 1.0], XML Schema [XML Schema] and RELAX NG [RELAX NG]. The document provides examples of how ITS can be used with existing vocabularies.
Requirements for the internationalization and localization related to markup are formulated in [ITS REQ]. Not all of these requirements are addressed in this document, for example:
The Working Group will cover some of the requirements in a separate document on techniques for internationalization and localization of schemas and XML instances.
Content or software that is authored in one language (i.e. source language) is often made available in additional languages. This is done through a process called localization, where the original material is translated and adapted to the target audience.
From the viewpoints of feasibility, cost, and efficiency, it is important that the original material should be suitable for localization. This is achieved by appropriate design and development, and the corresponding process is referred to as internationalization. For a detailed explanation of the terms "localization" and "internationalization", see [l10n i18n].
The increasing usage of XML as a medium for documentation-related content (e.g. DocBook [DocBook], a format for writing structured documentation, well suited to computer hardware and software manuals) and software-related content (e.g. the eXtensible User Interface Language [XUL]) creates challenges and opportunities in the domain of XML internationalization and localization.
The following examples sketch one of the issues that currently hinder efficient XML-related localization: the lack of a standard, declarative mechanism which identifies which parts of an XML instance need to be translated (the text in bold face shows the parts that need to be localized). Tools often cannot automatically do this identification.
The content of the PhaseCode element should not be translated; the title attribute sometimes has to be translated and sometimes must not be translated.
<Manual> <Info> <PhaseCode>Review Level</PhaseCode> <FormNo>8U81-GS-52C</FormNo> <Name>Owner's Manual</Name> ... </Info> <Section id="0" title="#Introduction#"> <Ltitle id="005" title="#ZOOM#"> <Mtitle id="00501" title="Getting started" option="no" cols="1"> <MultiCol cols="1"> <Text>Some text to localize</Text> ... </MultiCol> </Mtitle> </Ltitle> ... </Section> </Manual>
The file name in the first component element (images/cancel.gif) would not be translated.
<dialogue xml:lang="en-gb"> <rsrc id="123"> <component id="456" type="image"> <data type="text">images/cancel.gif</data> <data type="coordinates">12,20,50,14</data> </component> <component id="789" type="caption"> <data type="text">Cancel</data> <data type="coordinates">12,34,50,14</data> </component> </rsrc> </dialogue>
In the example below, there is no clear mechanism allowing one to know which string elements need to be translated.
<resources> <section id="Homepage"> <arguments> <string>page</string> <string>childlist</string> </arguments> <variables> <string>POLICY</string> <string>Corporate Policy</string> </variables> <keyvalue_pairs> <string>Page</string> <string>ABC Corporation - Policy Repository</string> <string>Footer_Last</string> <string>Pages</string> <string>bgColor</string> <string>NavajoWhite</string> <string>title</string> <string>List of Available Policies</string> </keyvalue_pairs> </section> </resources>
The data categories and their implementation as a schema do not address document-external mechanisms or data formats for describing localization-relevant information over and above what is appropriate for inclusion in the format itself. Such mechanisms and data formats, also sometimes called XML Localization Properties, are out of the scope of this document. However, this document specifies a methodology for applying localization properties and information about internationalization and localization to various places in schemas and instance documents. See Section 4: Selection of ITS information.
Abstraction via data categories: ITS defines data categories as a description of information for internationalization and localization of XML schemas and documents. This description is independent of its implementation e.g. using an element or attribute. See Section 3.3: Data category for a definition of the term data categories, Section 5: Description of Data Categories for the definition of the various ITS data categories, and Section 7: Markup Declarations for the data category implementations.
Selection mechanisms, here exemplified by the translatability data category: Content authors need a simple way to express whether the content of an element or attribute should be translated or not, e.g. a translate attribute. On the other hand, for translations of large document sets based on the same schema, a specification of defaults for translatability and exceptions from the defaults is important (e.g. all p elements should be translated, but not p elements inside of an index element). This specification responds to these requirements by introducing mechanisms for specifying ITS information in XML documents or schemas, see Section 4: Selection of ITS information. This method also provides a means for specifying ITS information for attributes (a task for which no standard means yet exists). The ITS mechanisms for selection are:
useable for both XML schemas and XML instances
useable locally (at the XML node to which it pertains) or globally (not at the XML node to which it pertains)
Extensibility: It may be useful or necessary to extend the set of information available for internationalization or localization purposes beyond what is provided by ITS. This specification does not define a general extension mechanism, since ordinary XML mechanisms (e.g. XML Namespaces [XML Names]) may be used.
Ease of integration:
ITS follows the example from section 4 of [XLink 1.1], by providing mostly global attributes for the implementation of ITS data categories. Avoiding elements for ITS purposes as much as possible ensures ease of integration into existing markup schemes, see section 3.14 in [ITS REQ]. Only for some requirements do additional child elements have to be used, see for example Section 5.6: Ruby.
ITS has no dependency on technologies which are yet to be developed
ITS fits with existing work in the W3C architecture (e.g. use of XPath [XPath 1.0] as a selection mechanism)
This specification has been developed using the ODD (One Document Does it all) language of the Text Encoding Initiative ([TEI]). This is a literate programming language for writing XML schemas, with three characteristics:
The element and attribute set is specified using an XML vocabulary which includes support for macros (like DTD entities, or schema patterns), a hierarchical class system for attributes and elements, and creation of modules.
The content models for elements and attributes are written using embedded RELAX NG XML notation.
Documentation for elements, attributes, value lists etc is written inline, along with examples and other supporting material.
XSLT transforms are provided by the TEI to extract documentation in HTML, XSL FO or LaTeX forms, and to generate RELAX NG documents and DTDs. From the RELAX NG documents, James Clark's trang can be used to create XML Schema documents.
This section is informative.
Information (e.g. "translate this") captured by ITS markup (e.g. "its:translate='yes'") always pertains to one or more XML nodes (mainly element and attribute nodes). ITS markup explicitly or implicitly selects these XML node(s). ITS distinguishes two ways of selecting XML nodes: locally, and with global rules.
The mechanisms defined for ITS selection resemble those defined in [CSS2]. Element-specific ITS information can be compared to the style attribute in CSS, and ITS information in global rules is similar to the style element in CSS. In contrast to CSS, ITS uses XPath for identifying nodes.
the selection in an instance approach puts ITS markup in the relevant element of the host vocabulary (e.g. the author element in DocBook)
the rule-based approach puts the ITS markup in elements defined by ITS itself (i.e. the documentRule element)
ITS markup can be used with XML instances (e.g. a DocBook article), or schemas (e.g. an XSD for a proprietary document format). Since each usage defines some specific requirements, ITS markup in XML instances may look slightly different than ITS markup in schemas.
The following three examples sketch the distinction between the local and global approaches, and the difference between ITS in XML instances and schemas.
<article xmlns="http://docbook.org/ns/docbook" xmlns:its="http://www.w3.org/2005/11/its" its:translate="yes"> <info> <title>An example article</title> <author its:translate="no"> <personname> <firstname>John</firstname> <surname>Doe</surname> </personname> <affiliation> <address><email>foo@example.com</email></address> </affiliation> </author> </info>... </article>
<dita:topic xmlns:dita="http://dita.oasis-open.org/architecture/2005/" xmlns:its="http://www.w3.org/2005/11/its" DITAArchVersion="1.0" id="myTopic"> <dita:title>ITS and Namespaces</dita:title> <its:documentRules> <its:ns its:prefix="dita" its:uri="http://dita.oasis-open.org/architecture/2005/"/> <its:documentRule its:translateSelector="//dita:term" its:translate="no" /> </its:documentRules> <dita:body> <dita:p>An <dita:term>ITS namespace</dita:term> definition exists ....</dita:p> </dita:body> </dita:topic>
<xs:schema> <xs:element name="term"> <xs:annotation> <xs:appinfo> <its:schemaRule its:translate="no"/> </xs:appinfo> </xs:annotation> ... </xs:element> ... </xs:schema>
The commonality in all of the examples above is the markup "its:translate='no'". This piece of ITS markup can be interpreted as follows:
it pertains to the data category translatability
the ITS data category attribute translate holds a value of "no"
The examples with global and local usage of ITS markup show that ITS data category attributes in some cases appear in elements defined by ITS itself: the documentRule element (embedded within a documentRules element), or the schemaRule element. It should come as no surprise that one difference between these two elements is where they are used:
documentRule : may appear in XML instances and schemas
schemaRule : may only appear in schemas
A less obvious, but important difference between documentRule and schemaRule is the following: in addition to one or more ITS data category attributes, documentRule contains a corresponding set of ITS selector attributes (in the example, translateSelector). As their name suggests, they select (or designate) one or more XML nodes (namely those to which a corresponding ITS data category attribute pertains). The values of ITS selector attributes are XPath absolute location paths. Information on the handling of namespaces in these path expressions is contained in the ITS element ns, which is a child of documentRules.
ITS selector attributes allow:
ITS data category attributes to appear in global rules (even outside of an XML instance or schema)
ITS data category attributes to pertain to sets of XML nodes (for example all p elements in an XML instance)
ITS markup to pertain to attributes
ITS markup to map to existing markup (for example the term element in DITA)
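As an illustration of the third item in the list above, the following sketch shows a global rule whose selector addresses attribute nodes. The host vocabulary (text, img, alt) is hypothetical and used only for illustration; the ITS attributes are the ones introduced above.

```xml
<text xmlns:its="http://www.w3.org/2005/11/its">
  <head>
    <its:documentRules>
      <!-- Hypothetical host vocabulary: img elements carrying an alt attribute. -->
      <!-- The selector ends in an attribute step, so the rule pertains to the alt attribute nodes. -->
      <its:documentRule its:translate="yes" its:translateSelector="//img/@alt"/>
    </its:documentRules>
  </head>
  <body>
    <p>See the figure: <img src="images/cancel.gif" alt="Cancel button"/></p>
  </body>
</text>
```

Because the selector is an XPath absolute location path, the rule can sit in the head element, away from the attributes it describes.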
The power of ITS selector attributes comes at a price: rules related to overwriting/precedence, and inheritance, have to be established.
<text> <head> <its:documentRules> <its:documentRule its:translate="yes" its:translateSelector="//p"/> </its:documentRules> </head> <body> ... <p its:translate="no"> ... <dl><dt>...</dt><dd>...</dd></dl></p> </body> </text>
In this example, the ITS data category attribute translate appears twice: in a documentRule, and on a specific p element. Since the ITS selector attribute in the documentRule selects all p elements, the question arises as to which value of the translatability data category applies to the p element that has local markup. ITS provides precedence and inheritance rules which answer questions like this. In the example, the value is "no" (that is, the content of the p element should not be translated).
This section is normative.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119].
The namespace URI that must be used by implementations of this specification is:
http://www.w3.org/2005/11/its
The namespace prefix used in this specification for this URI is "its". It is recommended that implementations of this specification use this prefix.
In addition, the following namespaces are used in this document:
http://www.w3.org/2001/XMLSchema for the XML Schema namespace, here used with the prefix "xs"
http://relaxng.org/ns/structure/1.0 for the RELAX NG namespace, here used with the prefix "rng"
[Definition: Schema language refers in this specification to XML DTD, XML Schema or RELAX NG.]
[Definition: Schema annotation is a schema language specific means to provide information about element, attribute, type etc. declarations. This information is not used by the schema processor, but for external, validation independent applications.]
[Definition: ITS defines data category as an abstract concept for a particular type of information for internationalization and localization of XML schemas and documents.] The concept of a data category is independent of its implementation in an XML environment (e.g. using an element or attribute).
For each data category, ITS distinguishes between the following:
the prose description, see Section 5: Description of Data Categories
schema language independent formalization, see Section 7: Markup Declarations
schema language specific implementations, see Appendix A: Schemas for ITS
The data category translatability conveys information as to whether a piece of content should be translated or not.
The simplest formalization of this prose description on a schema language independent level is a translate attribute with two possible values: "yes" and "no". An implementation on a schema language specific level would be the declaration of the translate attribute in e.g. an XML DTD, an XML Schema document or a RELAX NG document.
An alternative formalization on a schema language independent level is a schemaRule element which conveys with a translate attribute information about translatability. An implementation on a schema language specific level is the declaration of the schemaRule element.
[Definition: selection encompasses mechanisms to specify to what parts of an XML document or schema an ITS data category and its values should be applied.] Selection is discussed in detail in Section 4: Selection of ITS information.
This section is normative.
Selections of ITS Information can appear in three places:
in a schema: ITS data categories are expressed as schema annotation, and the selection is the element or attribute declaration which is being annotated
global rules: the selection is realized as a selector attribute, which appears together with a data category attribute. The selector attribute contains an AbsoluteLocationPath as described in [XPath 1.0]
in an instance document: the selection is realized using a data category attribute, which is attached to the selected element node. There is no additional selector attribute. The default selection for each data category defines whether the selection covers attributes and child elements. See Section 5.1: Position and Default Selections of Data Categories.
The various selection mechanisms are defined in detail below.
In schemas, selection of ITS information is realized with schema annotation. The selection for a data category depends on the position of the schema annotation. Since schema annotation mechanisms are schema language specific, the following definitions are made:
[Definition: selection of elements in XML Schema is expressed with an xs:appinfo element which is a direct child of the xs:element element and which contains a schemaRule element, which has one or more data category attributes.]
<xs:element name="p"> <xs:annotation> <xs:appinfo> <its:schemaRule its:translate="yes"/> </xs:appinfo> </xs:annotation> ... </xs:element>
[Definition: selection of attributes in XML Schema is expressed with an xs:appinfo element which is a direct child of the xs:attribute element and which contains a schemaRule element, which has one or more data category attributes.]
<xs:attribute name="alt"> <xs:annotation> <xs:appinfo> <its:schemaRule its:translate="yes"/> </xs:appinfo> </xs:annotation> ... </xs:attribute>
[Definition: selection of elements in RELAX NG is expressed with a schemaRule element which is a direct child of the rng:element element, and which has one or more data category attributes.]
<element name="p"> <its:schemaRule its:translate="yes"/> ... </element>
[Definition: selection of attributes in RELAX NG is expressed with a schemaRule element which is a direct child of the rng:attribute element, and which has one or more data category attributes.]
<attribute name="p"> <its:schemaRule its:translate="yes"/> ... </attribute>
As for XML DTD, this specification defines no selection mechanism within the DTD.
Note: To be able to select elements or attributes defined within an XML DTD, the mechanisms described in Section 4.1.2: Rule-based Selection can be used.
Several data categories on the same element or attribute declaration should be expressed on the same schemaRule element.
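A minimal sketch of this recommendation, assuming a hypothetical term element declared with type xs:string: both the translatability and the terminology data categories are carried by a single schemaRule element instead of two separate ones.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:its="http://www.w3.org/2005/11/its">
  <!-- The element name and its type are illustrative assumptions. -->
  <xs:element name="term" type="xs:string">
    <xs:annotation>
      <xs:appinfo>
        <!-- One schemaRule holds both data category attributes for this declaration. -->
        <its:schemaRule its:translate="no" its:term="yes"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
```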
Rule-based selection is implemented using the documentRules element. It contains one or more documentRule elements. Each documentRule element has one or more data category attributes, and for each data category attribute a selector attribute which points to the selected information.
The naming convention for the selector attributes is data category + Selector, e.g. translateSelector. In ITS rules selections, the value of the attribute must be an XPath expression which starts with "/", that is, it must be an AbsoluteLocationPath as described in [XPath 1.0]. This ensures that the selection is not relative to a specific location.
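The naming convention can be sketched with the three selector attributes that appear in this document. The selected element names (code, title, dfn) and the note text are hypothetical and chosen only for illustration; each selector value is an absolute location path.

```xml
<its:documentRules xmlns:its="http://www.w3.org/2005/11/its">
  <!-- translate is paired with translateSelector -->
  <its:documentRule its:translate="no" its:translateSelector="//code"/>
  <!-- locInfo and locInfoType are paired with locInfoSelector -->
  <its:documentRule its:locInfo="Keep the product name in the source language"
                    its:locInfoType="alert"
                    its:locInfoSelector="//title"/>
  <!-- term is paired with termSelector -->
  <its:documentRule its:term="yes" its:termSelector="//dfn"/>
</its:documentRules>
```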
If namespaces [XML Names] are used in these XPath expressions, the following rules must be applied while processing XPath:
1. For each prefix, there must be an ns element as a child of the documentRules element. The ns element has two attributes prefix (for the namespace prefix) and uri (for the namespace URI).
2. Element and attribute names without a prefix are interpreted as having no namespace.
3. To avoid a conflict with rule 2., default namespaces must not be used in the XPath expressions.
The term element from the TEI is in the namespace http://www.tei-c.org/ns/1.0. The qterm element from DocBook is in no namespace.
<its:documentRules xmlns:its="http://www.w3.org/2005/11/its"> <its:ns its:prefix="tei" its:uri="http://www.tei-c.org/ns/1.0"/> <its:documentRule its:translate="no" its:translateSelector="//tei:term"/> <its:documentRule its:translate="no" its:translateSelector="//qterm"/> </its:documentRules>
Note: The usage of the ns element is motivated by [Schematron] and complies with the requirements on namespace bindings described in [Tag Namespace Finding].
Selection can appear in a schema (e.g. as content of the xs:appinfo element), in an instance file or in a separate XML document. The precedence of the processing of the selection information depends on these variations. See also Section 4.2: Precedence between Selections.
Note: The difference between schemaRule and documentRule is that schemaRule has no selector attributes, e.g. no translateSelector attribute. The reason is that schemaRule always refers to the element or attribute declaration of which it is part. In contrast, documentRule can be used anywhere in a schema to express selection information. It is possible to use schemaRule and documentRule together in a schema.
<xs:schema> <xs:annotation> <xs:appinfo> <its:documentRules> <its:documentRule its:translate="no" its:translateSelector="//p[@editor='john']"/> <!-- This rule holds for p elements which are edited by John. --> </its:documentRules> </xs:appinfo> </xs:annotation> <xs:element name="p"> <xs:annotation> <xs:appinfo> <its:schemaRule its:translate="yes"/> <!-- This rule holds for all p elements --> </xs:appinfo> </xs:annotation> ... </xs:element> ... </xs:schema>
In instance documents, selection of ITS information is realized only with data category attributes. What is selected depends on the data category. The necessary data category specific defaults are described in Section 5.1: Position and Default Selections of Data Categories.
its:translate="no" at the head element means that the textual content of this element, including child elements and attributes, should not be translated. its:translate="yes" at the body element means that the textual content of this element, including child elements, but excluding attributes, should be translated.
its:dir="ltr" at the body element means that the directionality of the textual content of this element, including child elements and attributes, is "left-to-right".
<text> <head its:translate="no"> ... </head> <body its:translate="yes" its:dir="ltr"> ... </body> </text>
The following precedence order is defined for selections of ITS information in various positions (the first item in the list has the highest precedence):
Implicit selection in instance documents (data category attributes on a specific element)
Selections in instance documents (using a documentRules element)
Selections in an external file (using a documentRules element)
In a schema, selections expressed with a documentRules element
Selections expressed with schemaRule (See also the note in Section 4.1.2: Rule-based Selection)
Selections via defaults for data categories, see Section 5.1: Position and Default Selections of Data Categories
In case of conflicts between selections via multiple documentRule elements, the last selector has higher precedence.
Note: The precedence order fulfills the same purpose as the built-in template rules of [XSLT 1.0].
Due to the rules described above, the translatability information from the translate attribute on the p element has precedence over the translatability information in the first documentRule element. A conflict occurs for p elements inside of entry elements, because both documentRule elements select them. This conflict is resolved via the order of the documentRule elements (the last one has higher precedence).
<text> <head> <its:documentRules> <its:documentRule its:translate="yes" its:translateSelector="//p"/> <its:documentRule its:translate="no" its:translateSelector="//index/entry/p"/> </its:documentRules> </head> <body> ... <p its:translate="no"> ... </p> </body> <back><index> <entry><p> ... </p></entry> </index></back> </text>
Some markup schemes provide markup which can be used to express ITS data categories. ITS data categories can be mapped to such existing markup, using the selection mechanism described in Section 4.1.2: Rule-based Selection. In this way, there is no need to integrate ITS markup into documents.
<topic xmlns="http://dita.oasis-open.org/architecture/2005/" xmlns:its="http://www.w3.org/2005/11/its" DITAArchVersion="1.0" id="myTopic"> <title>The ITS Topic</title> <its:documentRules> <its:ns prefix="dita" uri="http://dita.oasis-open.org/architecture/2005/"/> <its:documentRule its:translateSelector="//*[@dita:translate='yes']" its:translate="yes"/> <its:documentRule its:translateSelector="//*[@dita:translate='no']" its:translate="no"/> <its:documentRule its:termSelector="//dita:term" its:term="yes"/> <its:documentRule its:termSelector="//dita:dt" its:term="yes"/> </its:documentRules> <body>[...] <dlentry id="tDataCat"> <dt>Data category</dt> <dd>ITS defines <term>data category</term> as an abstract concept for a particular type of information for internationalization and localization of XML schemas and documents. </dd> </dlentry>[...] <p>For the implementation of ITS, apply the rules in the order:</p> <ul> <li>Default</li> <li>Rules in the schema</li> <li>Rules in the instance document</li> <li>Local attributes </li> </ul> <p> <ph translate="no" xml:lang="fr">Et voilà !</ph> The last rule wins </p> </body> </topic>
This section is normative.
The following table summarizes the relations between data categories, location of their selection mechanisms, and default selections in instance documents.
Data category | Applicable in schema | Rule selection applicable | Default selection in instance document |
Translatability | + | + | Textual content of element, including content of child elements, but excluding attributes |
Localization information | + | + | Textual content of element, including content of child elements, but excluding attributes |
Terminology | + | + | Textual content of element, including content of child elements, but excluding attributes |
Directionality | - | + | Textual content of element, including attributes and child elements |
Ruby | - | + | Textual content of element, including content of child elements, but excluding attributes |
Note: The data categories differ with respect to defaults in the instance document for reasons of compatibility with existing standards and practices. For example, the dir attribute in [XHTML2] refers to the content of the element and all attributes and child elements. Hence, the data category of directionality selects the same information as the default. On the other hand, it is common practice that information about translatability refers only to textual content of an element. Hence, the data category of translatability selects as a default the same information.
[Definition: The data category translatability expresses information about whether the content of an element or attribute should be translated or not.] The values of this data category are "yes" (translatable) or "no" (not translatable).
Translatability can be expressed in a schema, in a set of rules, or on an individual element.
In a schema, translatability is expressed with a schemaRule element with a translate attribute. The attribute has the values "yes" or "no".
<xs:element name="p"> <xs:annotation> <xs:appinfo> <its:schemaRule its:translate="yes"/> </xs:appinfo> </xs:annotation> ... </xs:element>
In rules, translatability is expressed with a documentRule element with a translate attribute. The attribute has the values "yes" or "no". In addition, a translateSelector attribute is required.
<its:documentRules> <its:documentRule its:translate="yes" its:translateSelector="//p"/> <!-- All p elements should be translated--> </its:documentRules>
In an instance document, translatability is expressed with a translate attribute with the values "yes" or "no". The selection is the textual content of the element, including child elements, but excluding attributes.
For the body element, its textual content and the content of all child elements should be translated. The content of the quote element, however, must not be translated.
<book> <head>...</head> <body its:translate="yes"> ... <p>And he said: you need a new <quote its:translate="no">motherboard</quote> </p> ... </body> </book>
[Definition: The data category localization information is used to communicate information to localizers about a particular item of content.]
This data category has several purposes:
Tell the translator how to translate parts of the content
Expand on the meaning or contextual usage of a specific element, such as what a variable refers to or how a string will be used on the user interface
Clarify ambiguity and show relationships between items sufficiently to allow correct translation (e.g. in many languages it is impossible to translate the word "enabled" in isolation without knowing the gender, number and case of the thing it refers to.)
Indicate why a piece of text is emphasized (important, sarcastic, etc.)
Two types of informative notes are needed:
An alert contains information that the translator must read before translating a piece of text. Example: an instruction to the translator to leave parts of the text in the source language.
A description provides useful background information that the translator will refer to only if they wish. Example: a clarification of ambiguity in the source text.
Localization information can be expressed in a schema, in rules, or on individual elements.
In a schema, localization information is expressed with a schemaRule element with a locInfo attribute. The type of the localization information is expressed with a locInfoType attribute with the values "alert" or "description".
<xs:element name="p"> <xs:annotation> <xs:appinfo> <its:schemaRule its:locInfo="This has to be handled carefully" its:locInfoType="alert"/> </xs:appinfo> </xs:annotation> ... </xs:element>
In rules, localization information is expressed with a documentRule element with the attributes locInfo and locInfoType. In addition, a locInfoSelector attribute is required.
<its:documentRules> <its:documentRule its:locInfo="This p element has to be handled carefully" its:locInfoType="alert" its:locInfoSelector="/body/p[1]"/> </its:documentRules>
In an instance document, localization information is expressed with the attributes locInfo and locInfoType. The selection is the textual content of the element, including child elements, but excluding attributes.
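The following is an illustrative sketch, not part of the original examples, of localization information on individual elements. It shows one note of each type. The element names text, body, p and name, as well as the note texts, are assumptions made for this illustration only.

```xml
<text xmlns:its="http://www.w3.org/2005/11/its">
  <body>
    <!-- alert: the translator must read this before translating -->
    <p its:locInfo="Leave the product name in the source language"
       its:locInfoType="alert">Install <name>SpeedWriter</name> first.</p>
    <!-- description: background information, consulted only if needed -->
    <p its:locInfo="'Enabled' refers to the firewall"
       its:locInfoType="description">Enabled</p>
  </body>
</text>
```

In both cases the note applies to the textual content of the p element, including the content of the child element name, but not to any attribute values.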
The terminology data category is used to mark terms. This helps to increase consistency across different parts of the documentation. It is also helpful for translation.
The terminology data category can be expressed in a schema, in rules or on individual elements.
In a schema, the terminology data category is expressed with a schemaRule element with a term attribute, which has the value "yes".
<xs:element name="span"> <xs:annotation> <xs:appinfo> <its:schemaRule its:term="yes"/> <!-- All span elements are used to mark up terms--> </xs:appinfo> </xs:annotation> ... </xs:element>
In rules, the terminology data category is expressed with a documentRule element with the term attribute, which has the value "yes". A termSelector attribute is required. In addition, an optional termRef attribute can be used to refer to external information about the term. The datatype of termRef is xs:anyURI.
<its:documentRules> <its:documentRule its:term="yes" its:termSelector="/body/p[1]/span" its:termRef="http://example.com/termdatabase/#x142539"/> </its:documentRules>
In an instance document, the terminology data category is expressed with a term attribute, which has the value "yes", and an optional termRef attribute. The selection is the textual content of the element, including content of child elements, but excluding attributes.
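As a sketch of how this could look in an instance document, the example below marks a single term and points to external information about it. The element names are assumed for illustration, and the termRef value reuses the placeholder URI from the rule-based example above.

```xml
<text xmlns:its="http://www.w3.org/2005/11/its">
  <body>
    <p>A <span its:term="yes"
        its:termRef="http://example.com/termdatabase/#x142539">motherboard</span>
      connects the components of a computer.</p>
  </body>
</text>
```

Only the content of the span element is selected as a term; the rest of the paragraph is unaffected.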
This data category expresses the directionality of a piece of text. Its values are "ltr", "rtl", "lro" or "rlo". This definition is compliant with the dir attribute in [XHTML2], except that [XHTML2] does not allow for rule-based selection.
The dir attribute is used for the implementation of the directionality data category. It takes one of the four values "ltr", "rtl", "lro" or "rlo".
Directionality can be expressed in rules or on individual elements.
Directionality is expressed in rules using a documentRule element with the dir attribute. In addition, a dirSelector attribute is required.
<its:documentRules> <its:documentRule its:dir="rtl" its:dirSelector="/body/p[1]/quote[@xml:lang='he']"/> <!-- Some Hebrew quotation --> </its:documentRules>
In an instance document, directionality is expressed with a dir attribute. The selection is the textual content of the element, including all child elements and attributes.
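A minimal sketch of directionality on an individual element might look as follows. The element names and the Hebrew sample text are assumptions chosen for illustration.

```xml
<text xmlns:its="http://www.w3.org/2005/11/its">
  <body>
    <p>The title is <quote xml:lang="he"
        its:dir="rtl">פעילות הבינאום, W3C</quote> in Hebrew.</p>
  </body>
</text>
```

Here the right-to-left direction applies to the quote element only, while the surrounding paragraph keeps its own directionality.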
The data category ruby is used for a run of text that is associated with another run of text, referred to as the base text. Ruby text is used to provide a short annotation of the associated base text. It is most often used to provide a reading (pronunciation) guide.
Ruby can be expressed in an instance document with or without selections.
Ruby in an instance document without selections is realized with a ruby element which contains a rubyBase and a rubyText element.
<text> <head> ... </head> <body> <p>This is about the <its:ruby> <its:rubyBase>W3C</its:rubyBase> <its:rubyText>World Wide Web Consortium</its:rubyText> </its:ruby>. </p> </body> </text>
Note: The structure of the content model for the ruby element without selection is identical with the structure of ruby in section 5.4 of [OpenDocument], and simple ruby markup as defined in section 1.2.1 in [Ruby-TR].
In legacy situations, where one cannot change the element markup but wants to apply ruby text to an attribute or to existing element content, the following approach can be used.
Ruby in an instance document with selections is expressed with a documentRule element with two attributes:
A rubyText attribute contains the ruby text (corresponding to the rubyText element in the case of no selections)
A rubySelector attribute contains the selector. It selects the ruby base text, corresponding to the rubyBase element in the case of no selection.
<text ...> <head> ... </head> <its:documentRules> <its:documentRule its:rubyText="World Wide Web Consortium" its:rubySelector="/text/body/img[1]/@alt"/> </its:documentRules> <body> <img src="w3c_home.png" alt="W3C"/> ... </body> </text>
This section is informative.
[Ed. note: This section will be mostly written in a subsequent working draft. In the longer term, the working group plans to publish a separate document out of this section.] Two topics are covered in this section:
How should ITS be integrated in specific markup schemes? For example, for XHTML, it is helpful for the interoperability of ITS implementations to specify that the documentRules element will always be part of the content model of the head element.
How should ITS data categories be related to existing markup declarations in a schema, which fulfill identical or overlapping purposes? For example, [Dita 1.0] already has an attribute to indicate translatability of text, but without a mechanism for selection of information in documents and schemas.
The TEI ([TEI]) is intended for literary and linguistic material, and is most often used for digital editions of existing printed material. It is also suitable, however, for general purpose writing. The P5 release of the TEI consists of 23 modules which can be combined together as needed.
The TEI is maintained as a single ODD document, and customizations of it are also written as ODD documents. These are processed using XSLT stylesheets to make a tailored user-level schema in XML DTD, XML Schema or RELAX NG.
The ITS additions involve two changes to TEI:
Allowing documentRules to appear in the TEI metadata section (the teiHeader).
Adding the ITS data category attributes to the TEI global attribute set.
Both of these can be easily achieved using standard techniques in ODD.
The body of a TEI/ITS customization consists of a schemaSpec which lists the modules to be included (this example includes six common ones):
<schemaSpec ident="tei-its" start="TEI"> <moduleRef key="header"/> <moduleRef key="core"/> <moduleRef key="tei"/> <moduleRef key="textstructure"/> <moduleRef key="namesdates"/> <moduleRef key="msdescription"/> ...
In addition, we load the ITS schema (in its RELAX NG XML format, the language used by the TEI for expressing content models), and overload the definition of the TEI content class model.headerPart to include the ITS documentRules:
<moduleRef url="its.rng"> <content> <rng:define name="model.headerPart" combine="choice"> <rng:ref name="documentRules"/> </rng:define> </content> </moduleRef>
The content class determines which elements are allowed as children of teiHeader. Lastly, we change the definition of the global attribute class att.global to reference the ITS data category attributes (available from the ITS schema we loaded earlier):
<classSpec ident="att.global" type="atts" mode="change"> <attList> <attRef name="att.selector.attributes"/> <attRef name="att.datacats.attributes"/> </attList> </classSpec> ... </schemaSpec>
When processed, this customization produces a schema which permits markup like this:
<TEI xmlns:its="http://www.w3.org/2005/11/its" xmlns="http://www.tei-c.org/ns/1.0"> <teiHeader> <fileDesc> <!-- details of the file --> </fileDesc> <documentRules xmlns="http://www.w3.org/2005/11/its"> <ns its:prefix="t" its:uri="http://www.tei-c.org/ns/1.0"/> <documentRule its:translate="no" its:translateSelector="//t:body/t:p/@*"/> <documentRule its:translate="yes" its:translateSelector="//t:body/t:p"/> </documentRules> </teiHeader> <text> <body> <p rend="normal">Hello <hi>world</hi></p> <p rend="special">Goodbye</p> <p its:translate="no">This must not be translated</p> </body> </text> </TEI>
In this example, a set of documentRule elements in the header provides the rules, and the last paragraph in the body of the text performs a specific override.
[XMLSPEC] is intended for W3C working drafts, notes, recommendations, and all other document types that fall under the category of technical reports. XML Spec is available in the formats of XML DTD, XML Schema and RELAX NG.
ITS has been integrated into xmlspec-i18n.dtd. This is a version of the XML DTD version 2.9 of XML Spec which already supplies various internationalization and localization related features. For example, there is an attribute translate in xmlspec-i18n.dtd, which can be used for the same purposes as the ITS translate attribute. To be able to separate them from original XML Spec declarations, all additions are stored in two separate files i18n-extensions.mod and i18n-elements.mod. Xmlspec-i18n.dtd is used within the W3C Internationalization Activity for the creation of technical reports.
For the integration of ITS, the following modifications to the xmlspec-i18n.dtd have been made:
A new entity <!ENTITY % its SYSTEM "its.dtd"> and the entity call %its; have been added to xmlspec-i18n.dtd.
The existing XML Spec entity %common.att; has been modified. The ITS entities %att.datacats.attributes; and %att.selector.attributes; have been added to %common.att;. In this way, the data category attributes and the selector attributes can be used on any element defined in the XML Spec DTD.
The XML Spec entity %header.mdl; contains the content model of the header element. The ITS element documentRules has been added as the last element to this content model. In this way, documentRules can be used inside an XML Spec instance. The header element of the XML Spec DTD has been chosen as the place for documentRules, to avoid the impact of ITS markup on XML Spec markup.
The ITS element ruby has been added to the XML Spec entity %p.pcd.mix;. In this way it is possible to use ruby as an inline element.
As mentioned before, xmlspec-i18n.dtd has its own existing markup declarations for various internationalization and localization related purposes. In the original XML Spec 2.9 DTD, there is a term element which fulfills the same purpose as the ITS term attribute.
To relate such existing XML Spec and xmlspec-i18n.dtd markup to ITS markup (see Section 4.3: Mapping of ITS Data Categories to Existing Markup), the following documentRules element has been created. [Ed. note: This is not an exhaustive list of mappings yet, but only a first attempt.]
<its:documentRules xmlns:its="http://www.w3.org/2005/11/its"> <!--The following rules are for xmlspec-i18n.dtd--> <its:documentRule its:term="yes" its:termSelector="//qterm"/> <its:documentRule its:dir="ltr" its:dirSelector="//*[@dir='ltr']"/> <its:documentRule its:dir="rtl" its:dirSelector="//*[@dir='rtl']"/> <its:documentRule its:dir="lro" its:dirSelector="//*[@dir='lro']"/> <its:documentRule its:dir="rlo" its:dirSelector="//*[@dir='rlo']"/> <its:documentRule its:locInfo="" its:locInfoType="alert" its:locInfoSelector="//@locn-alert"/> <its:documentRule its:locInfo="" its:locInfoType="description" its:locInfoSelector="//@locn-note"/> <its:documentRule its:translate="yes" its:translateSelector="//*[@translate='yes']"/> <its:documentRule its:translate="no" its:translateSelector="//*[@translate='no']"/> <!--This rule is for the original XML Spec DTD--> <its:documentRule its:term="yes" its:termSelector="//term"/> </its:documentRules>
Since neither XML Spec nor xmlspec-i18n.dtd defines a namespace, the mappings use XPath expressions with unqualified element and attribute names.
This section is normative.
The span element can be used if a markup scheme has no element to which data category attributes can be attached. span contains these attributes and serves as a hook for using them in XML documents.
<text> <head>[...]</head> <body> ... <its:span its:translate="no"> ... </its:span> </body> </text>
The span element contains the data category attributes.
[1] | span | ::= | element :span { span.content, span.attributes } |
[2] | span.content | ::= | text |
[3] | span.attributes | ::= | att.datacats.attributes, empty |
A data type data.selector is defined for selector attributes. Its value is an XPath expression [XPath 1.0]. A data type itsBoolean is defined for boolean values, e.g. to express translatability. The data type dirValues is used for the data category attribute dir. The data type locInfoType is used to express the type of the locInfo attribute. The data type itsBooleanTrue is used for the term attribute.
[4] | data.selector | ::= | text |
[5] | data.itsBoolean | ::= | "yes" | "no" |
[6] | data.dirValues | ::= | "ltr" | "rtl" | "lro" | "rlo" |
[7] | data.locInfoType | ::= | "alert" | "description" |
[8] | data.itsBooleanTrue | ::= | "yes" |
The attribute group att.datacats is used to express the ITS data categories. It makes use of the data type data.itsBoolean.
[9] | att.datacats.attributes | ::= | att.datacats.attribute.translate, att.datacats.attribute.locInfo, att.datacats.attribute.locInfoType, att.datacats.attribute.term, att.datacats.attribute.termRef, att.datacats.attribute.dir, att.datacats.attribute.rubyText, empty |
[10] | att.datacats.attribute.translate | ::= | attribute translate { data.itsBoolean }? |
[11] | att.datacats.attribute.locInfo | ::= | attribute locInfo { text }? |
[12] | att.datacats.attribute.locInfoType | ::= | attribute locInfoType { data.locInfoType }? |
[13] | att.datacats.attribute.term | ::= | attribute term { data.itsBooleanTrue }? |
[14] | att.datacats.attribute.termRef | ::= | attribute termRef { xsd:anyURI }? |
[15] | att.datacats.attribute.dir | ::= | attribute dir { data.dirValues }? |
[16] | att.datacats.attribute.rubyText | ::= | attribute rubyText { text }? |
The elements ruby, rubyBase and rubyText are used for the implementation of the ruby data category. If changing the element markup in an XML document is not possible, the rubyText and rubySelector attributes should be used.
[17] | rubyBase | ::= | element :rubyBase { rubyBase.content } |
[18] | rubyBase.content | ::= | text |
[19] | ruby | ::= | element :ruby { ruby.content } |
[20] | ruby.content | ::= | rubyBase, rubyText |
[21] | rubyText | ::= | element :rubyText { rubyText.content } |
[22] | rubyText.content | ::= | text |
The attribute group att.selector is used at the documentRule element to express applicability of ITS information. It must not be used in other positions, e.g. individual elements. It makes use of the data type data.selector.
[23] | att.selector.attributes | ::= | att.selector.attribute.translateSelector, att.selector.attribute.locInfoSelector, att.selector.attribute.termSelector, att.selector.attribute.dirSelector, att.selector.attribute.rubySelector, empty |
[24] | att.selector.attribute.translateSelector | ::= | attribute translateSelector { data.selector }? |
[25] | att.selector.attribute.locInfoSelector | ::= | attribute locInfoSelector { data.selector }? |
[26] | att.selector.attribute.termSelector | ::= | attribute termSelector { data.selector }? |
[27] | att.selector.attribute.dirSelector | ::= | attribute dirSelector { data.selector }? |
[28] | att.selector.attribute.rubySelector | ::= | attribute rubySelector { data.selector }? |
The schemaRule element contains rules for ITS information, to be used as schema annotation. It uses attributes from the ITS data categories.
[29] | schemaRule | ::= | element :schemaRule { schemaRule.content, schemaRule.attributes } |
[30] | schemaRule.content | ::= | empty |
[31] | schemaRule.attributes | ::= | att.datacats.attributes, empty |
The documentRules element contains zero or more ns elements, followed by one or more documentRule elements. The documentRule element contains attributes from the data category attributes, and the selector attributes.
[32] | documentRules | ::= | element :documentRules { documentRules.content, documentRules.attributes } |
[33] | documentRules.content | ::= | ns*, documentRule+ |
[34] | documentRules.attributes | ::= | empty |
[35] | ns | ::= | element :ns { ns.content, ns.attributes } |
[36] | ns.content | ::= | empty |
[37] | ns.attributes | ::= | att.nsident.attributes, empty |
[38] | att.nsident.attributes | ::= | att.nsident.attribute.prefix, att.nsident.attribute.uri, empty |
[39] | att.nsident.attribute.prefix | ::= | attribute prefix { xsd:NCName } |
[40] | att.nsident.attribute.uri | ::= | attribute uri { xsd:anyURI } |
[41] | documentRule | ::= | element :documentRule { documentRule.content, documentRule.attributes } |
[42] | documentRule.content | ::= | empty |
[43] | documentRule.attributes | ::= | att.selector.attributes, att.datacats.attributes, empty |
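To illustrate the content model above, the following sketch shows a documentRules element that binds one namespace prefix and then applies three rules. The prefix ex, the namespace URI and the element names code, term and quote are hypothetical; only the ITS elements and attributes are taken from the declarations in this section.

```xml
<its:documentRules xmlns:its="http://www.w3.org/2005/11/its">
  <!-- zero or more ns elements come first -->
  <its:ns its:prefix="ex" its:uri="http://example.com/mySchema"/>
  <!-- one or more documentRule elements follow; each pairs a
       data category attribute with its selector attribute -->
  <its:documentRule its:translate="no" its:translateSelector="//ex:code"/>
  <its:documentRule its:term="yes" its:termSelector="//ex:term"/>
  <its:documentRule its:dir="rtl" its:dirSelector="//ex:quote[@xml:lang='he']"/>
</its:documentRules>
```

The ns element supplies the namespace binding that the XPath expressions in the selector attributes rely on, since those expressions cannot otherwise resolve the prefix ex.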
This section is normative.
Conformance to ITS falls into two categories: conformance to the ITS data categories (cf. Section 5: Description of Data Categories, including data category specific default selections) and conformance to selection mechanisms (cf. Section 4: Selection of ITS information).
An implementation of the ITS data categories is conformant if it supplies a schema which adopts the ITS data categories, with the following constraints:
The schema must allow the usage of the attribute group att.datacats at every element which is declared in the schema.
The interpretation of data category attributes in instance documents must be conformant to the data category specific default selections described in Section 5.1: Position and Default Selections of Data Categories.
The schema should allow the usage of the documentRules element at one or more elements in the schema.
The schemaRule element is to be used as a schema annotation. It is the responsibility of the schema processor to allow for such annotations.
<xs:schema xmlns:myns="http://example.com/mySchema" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:its="http://www.w3.org/2005/11/its" targetNamespace="http://example.com/mySchema" elementFormDefault="qualified" attributeFormDefault="unqualified"> <xs:import namespace="http://www.w3.org/2005/11/its" schemaLocation="its.xsd"/> <xs:element name="document"> <xs:complexType> <xs:sequence> <xs:element ref="myns:head"/> <xs:element ref="myns:body"/> </xs:sequence> <xs:attributeGroup ref="myns:commonAtts"/> </xs:complexType> </xs:element> <xs:attributeGroup name="commonAtts"> <xs:attributeGroup ref="its:att.datacats.attributes"/> <xs:attributeGroup ref="its:att.selector.attributes"/> </xs:attributeGroup> <xs:element name="head"> <xs:complexType> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element ref="its:documentRules"/> <xs:element ref="its:documentRule"/> </xs:choice> <xs:attributeGroup ref="myns:commonAtts"/> </xs:complexType> </xs:element> <xs:element name="body"> <xs:complexType> <xs:sequence> <xs:element ref="myns:para" maxOccurs="unbounded"/> </xs:sequence> <xs:attributeGroup ref="myns:commonAtts"/> </xs:complexType> </xs:element> <xs:element name="para"> <xs:complexType mixed="true"> <xs:attributeGroup ref="myns:commonAtts"/> </xs:complexType> </xs:element> </xs:schema>
Conformance to Selection Mechanisms encompasses conformance to the ITS data categories and data category specific default selection mechanisms, with the following changes:
The schema must allow the usage of the documentRules element in at least one element in the schema.
An application which processes ITS elements and attributes must process the selection mechanisms described in Section 4.2: Precedence between Selections.
A mandatory part of this conformance criterion is the usage of XPath. An application which processes ITS selection rules must be able to process XPath version 1.0 or higher. It is not required to support a specific host language of XPath, such as [XSLT 1.0].
This log records major changes that have been made to this document since the previous publication (November 2005).
A section about basic concepts of the ITS tagset has been created.
Terminology has been modified: the terms for position of ITS information in situ versus dislocated have been replaced by selection in an instance document versus global, rule-based selection.
The definition of the directionality data category has been changed, to be compliant with various other specifications. See the comment on bidirectionality for further information.
Terminology within the text of this document and within the markup declarations has been modified: scope of ITS information has been replaced with selection of ITS information.
The schemaRules element has been removed. For ITS information as schema annotation, there is now only a schemaRule element.
All ITS attributes are now defined as qualified attributes. This leads to changes in the generated ITS schemas, for example the generation of parameter entities for prefixes in the XML DTD. This allows for easy changing of prefixes in element or attribute names.
The possibility of selector attributes in instance documents (in the previous draft this was called scope in an instance document) has been removed. Selection in an instance document now relies only on default selections of data categories. Due to this change, the definition of precedence between selections and conformance criteria have been simplified, and the issue on namespace requirements and selector values could be resolved.
Definitions of default selections of data categories have been modified.
An ns element has been added to the documentRules element to allow for specifying namespace bindings.
The implementation of the ruby data category has been modified, to reflect the removal of selector attributes in instance documents.
A section on mapping of ITS data categories to existing markup has been created.
Examples of integrating ITS markup into a TEI schema and into XML Spec have been created.
A span element has been created, see Section 7.1.1: The ITS Span Element.
The examples have been modified to reflect changes mentioned above.
For clarity, various sections have been reworded and re-structured, and the visualization of ITS markup within the text of this document has been modified.
Tracking of issues is now handled via Bugzilla.
A revision log has been added.
This document has been developed with contributions by the ITS Working Group. At the date of publication, the members of the Working Group were: Damien Donlon (Sun Microsystems), Martin Dürst (Invited Expert), Richard Ishida (W3C), Masaki Itagaki (Invited Expert), Christian Lieske (SAP AG), Naoyuki Nomura (Ricoh), Sebastian Rahtz (Invited Expert), François Richard (HP), Goutam Saha (CDAC), Felix Sasaki (W3C), Yves Savourel (ENLASO), Dianne Stoick (Boeing), Najib Tounsi (Ecole Mohammadia d'Ingénieurs Rabat (EMI)) and Andrzej Zydroń (Invited Expert).
A special thanks goes to Sebastian Rahtz who introduced us to the ODD language, which was used to create this document, and who provided the stylesheets to generate schemas and the XHTML version out of an ODD document. The generation of XHTML from ODD takes an intermediate step through the xmlspec-i18n.dtd, see Section 6.6: ITS and XML Spec.