Topics / Goals of this Presentation
- Describe purposes of the W3C Internationalization Tag Set (ITS) Working Group: Development of a tag set and guidelines for Internationalization (i18n) and Localization (l10n)
- Describe issues with the ITS requirements that arise in relation to schema languages
- Get feedback from the audience! Main questions:
- Is the purpose of ITS useful? Would you use ITS for your schema(s)?
- How would you solve the issues that arise in relation to schema languages?
Overview
- Scope of W3C ITS WG and ITS Requirements
- The Need to go Beyond Schemas; possibly useful Technologies
- ITS relevant Characteristics of Schema Languages
- Namespace Sectioning
- Schema Annotation
- Processing Models
- Questions to the Audience
Scope: Different types of XML Documents
- center on text e.g. OpenOffice
- focus on code e.g. XUL Mozilla
- mix prose and code e.g. DocBook
- include presentational aspects e.g. XHTML
Requirement: Bidirectional Text Support
- Issue: Correct rendering of character directionality cannot rely only on Unicode directionality properties for characters (LTR, RTL, indifferent, weakly typed)
- Correct display:
The title is "مفتاح معايير الويب!" in Arabic.
- Markup to indicate the directionality of text. Wrong display without markup (overall directionality is LTR):
The title is "مفتاح معايير الويب!" in Arabic.
Markup to specify directionality
- Markup to specify RTL directionality:
<p>The title is
"<span xml:lang="ar"
dir="rtl" lang="ar">مفتاح معايير الويب!</span>
"in Arabic.</p>
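Why character properties alone are not enough can be checked with a small Python sketch. It is not part of the ITS work; it only uses the standard unicodedata module to list the Unicode bidirectional class of a few characters from the example (the helper name bidi_classes is made up for illustration). The exclamation mark and the quotation mark are neutral, so their placement depends on the surrounding context, which is what the dir attribute makes explicit.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidirectional class) pairs for each character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Sample characters from the slide: a Latin letter, the Arabic letter
# MEEM (U+0645), and the neutral punctuation around the Arabic title.
for ch, cls in bidi_classes('T\u0645!"'):
    print(repr(ch), cls)
```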
Requirement: Support for Ruby Markup
- Issue: Annotation of additional information mainly used for Japanese / Chinese Text
- Visualization:
<p>これは
<ruby><rb>紙芝居</rb><rt>かみしばい</rt></ruby>です。</p>
Requirement: Span-like Element
- Issue: Add various i18n or l10n specific information
- Available in existing markup schemes like XHTML:
<code>System.out.println("
<span xml:lang="ja" translate="no">
W3C国際活動</span>");</code>
Requirement: Retrieving External Information
- Sample application: localized error message in a user interface documentation:
<para>If you create a typing error like "strs(s)",
you will get the message
<xref id="resfile.resx">
<subst>
<search>{0}</search>
<replace>&lt;Filename&gt;</replace>
</subst>
</xref>.</para>
Overview of Further Requirements
The Need to go Beyond Schemas (I)
- Issue: Specifying uniqueness of elements in a collection of documents → Not possible with schema languages
- Issue: Expressing information about e.g. translatability of attribute (sub)content → Not possible with schema languages. Example: (Non-)Translation of program code:
<window>
<box align="center">
<button label="hello xFly"
onclick="alert('Hello World');"/>
</box>
</window>
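One way to picture information that lives outside the schema is a rule table consulted by a localization tool. The Python sketch below is only an illustration under assumptions: the table TRANSLATABLE_ATTRS and the function translatable_attribute_values are hypothetical and not ITS syntax. It handles whole attribute values only; translatable sub-content inside an attribute (such as the string inside the onclick script) is not covered.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table kept outside the schema:
# (element name, attribute name) -> is the value translatable?
TRANSLATABLE_ATTRS = {
    ("button", "label"): True,
    ("button", "onclick"): False,
}

XUL_SAMPLE = """
<window>
  <box align="center">
    <button label="hello xFly" onclick="alert('Hello World');"/>
  </box>
</window>
"""

def translatable_attribute_values(root, rules):
    """Collect attribute values that the external rules mark as translatable."""
    found = []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            # Attributes without a rule are treated as not translatable.
            if rules.get((elem.tag, name), False):
                found.append((elem.tag, name, value))
    return found

units = translatable_attribute_values(ET.fromstring(XUL_SAMPLE), TRANSLATABLE_ATTRS)
print(units)
```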
The Need to go Beyond Schemas (II)
- Issue: ITS should not break processing chains. Example: XPath expressions might break due to ITS specific markup:
<code>System.out.println("
<span xml:lang="ja" translate="no">
W3C国際活動</span>");</code>
code/text()
- Result with <span> element:
System.out.println("");
- Expected result (as without the <span> element):
System.out.println("W3C国際活動");
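The effect can be reproduced with Python's standard ElementTree module. This is a sketch, not an XPath engine: direct_text is a hypothetical helper that concatenates only the text nodes that are direct children of an element (what code/text() selects), while all_text also descends into child elements such as <span>.

```python
import xml.etree.ElementTree as ET

SRC = ('<code>System.out.println("'
       '<span xml:lang="ja" translate="no">W3C国際活動</span>'
       '");</code>')

def direct_text(elem):
    """Concatenate only the text nodes that are direct children of elem."""
    parts = [elem.text or ""]
    parts.extend(child.tail or "" for child in elem)
    return "".join(parts)

def all_text(elem):
    """Concatenate all text, including text inside descendant elements."""
    return "".join(elem.itertext())

code = ET.fromstring(SRC)
print(direct_text(code))  # the content of <span> is missing
print(all_text(code))     # the content of <span> is included
```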
The Need to go Beyond Schemas (III)
- Issue: Versioning of existing schemas:
- Version "A" of a schema might have elements which fit ITS purposes
- Version "B" might not have the elements
- Sustainable, hard-wired integration of ITS markup into a schema becomes impossible
- Example: Markup for bidirectional text in various versions of (X)HTML
Deployment options for ITS
- extend existing schemas with ITS vocabulary
- schemas to be developed
- independent ITS schema
- need to work with different schema languages
- additions / fallback: guidelines for schema authors
→A single ITS schema in one schema language is insufficient
Status Quo:
A Word on Technologies
- In scope: review and survey of useful technologies for the realization of ITS
- Out of scope: Development of these technologies
- Danger: realizing ITS with too many, possibly not well accepted technologies would hinder the widespread adoption of ITS (e.g. in the localization industry)
Schema languages: ITS relevant Characteristics
- Namespaces
- Pattern-based descriptions
- Usefulness of modularization and typing mechanisms for ITS
Namespaces
- Important means to separate markup, e.g. ITS versus XHTML
- Not supported in XML DTDs
- Do not solve the versioning problem, e.g. how to describe relations of different XHTML versions to ITS
Pattern-Based Descriptions
- Complementary to grammar based schema languages
- Realizable e.g. with Schematron
- Sample use in ITS: By using XPath 2.0's collection() function, uniqueness in a collection of documents can be assured
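The kind of check meant here can be sketched in Python rather than Schematron. The sketch assumes that uniqueness refers to id attribute values and that the collection is given as a mapping from document names to XML source; both the function find_duplicate_ids and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def find_duplicate_ids(documents):
    """Map each id value used more than once to the documents that use it.

    `documents` is a dict of document name -> XML source string.
    """
    seen = defaultdict(list)
    for name, source in documents.items():
        for elem in ET.fromstring(source).iter():
            if "id" in elem.attrib:
                seen[elem.attrib["id"]].append(name)
    return {value: names for value, names in seen.items() if len(names) > 1}

# Two made-up documents that both use the id "intro".
docs = {
    "a.xml": '<doc><p id="intro"/><p id="usage"/></doc>',
    "b.xml": '<doc><p id="intro"/></doc>',
}
print(find_duplicate_ids(docs))
```

A grammar-based schema validates one document at a time, which is why a cross-document rule like this needs a pattern-based or programmatic check.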
Usefulness of Modularization and Typing Mechanisms for ITS
We will show the usage of existing mechanisms to realize the ITS locinfo data category
- XML DTDs: Marked sections
- XML Schema: Typing
- RELAX NG: Named patterns with ambiguous content models
XML DTDs: Marked Sections for ITS
- Parameter entities and marked sections for an XHTML versus an XHTML+ITS module:
<!-- Switch between the plain XHTML and the XHTML+ITS declaration of p -->
<!ENTITY % para "INCLUDE">
<!ENTITY % its.para "IGNORE">
<![%para;[
<!ELEMENT p (#PCDATA)>
<!ATTLIST p id ID #IMPLIED>
]]>
<![%its.para;[
<!ELEMENT p (#PCDATA)>
<!ATTLIST p id ID #IMPLIED
locinfo CDATA #IMPLIED>
]]>
- Advantage: Can be adjusted in the XML document
- Disadvantage: Modularization information exists only before validation
XML Schema: Typing for ITS
- XHTML type and ITS subtype
<xs:complexType name="paraContent" mixed="true">
<xs:attribute name="id" type="xs:ID"/>
</xs:complexType>
<xs:complexType name="itsParaContent">
<xs:complexContent>
<xs:extension base="paraContent">
<xs:attribute name="locinfo" type="xs:string"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:element name="p" type="itsParaContent"/>
- Typing information exists after validation (and is thus available e.g. for type-based queries)
doc("mydoc.xml")//element(*, itsParaContent?)
RELAX NG: Ambiguity for ITS
- Description of named patterns p-html and p-its in RELAX NG:
p-html =
element p { xhtml-p.content }
p-its =
element p { xhtml-p.content,
attribute locinfo { text } }
p = p-html | p-its
div = element div { p+ }
- Advantage: No restriction on ambiguity (no UPA Constraint): p-html and p-its are applicable in the same content model (e.g. <div>)
- Disadvantage: Named patterns exist only before validation (like parameter entities)
Assessing the Alternatives
- What is most important for the widespread adoption of ITS:
- Possibility to adjust ITS in the XML document (relying on parameter entities and marked sections of XML DTDs)?
- Ease of integration into existing schemas without ambiguity constraints (via named patterns of RELAX NG)?
- Availability of ITS specific information for and after validation (via typing information of XML Schema)?
- A possible, schema language independent answer to the schema integration / schema mapping problem: "namespace sectioning"
Namespace Sectioning (I)
- Separation of a document into element and attribute sections, e.g. for ITS and XHTML:
<h:html xmlns:h="http://www.w3.org/1999/xhtml"
xmlns="http://www.example.org/its"
translate="yes">
<h:head>...<h:meta translate="no">...
<span>...</span>
</h:html>
- Validation against an ITS schema (here RELAX NG):
default namespace = "http://www.example.org/its"
attribute translate { "yes" | "no" }?
element span { text }
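The idea of cutting a document into sections by namespace can be illustrated with a few lines of Python. This sketch does not implement NRL or any validation; it only groups element names by namespace URI, which is the first step before each section is handed to its own schema. The sample document and the function elements_by_namespace are assumptions made for the illustration, and attribute sections are left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """
<h:html xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns="http://www.example.org/its">
  <h:head><h:title>t</h:title></h:head>
  <h:body><span>text</span></h:body>
</h:html>
"""

def elements_by_namespace(root):
    """Group local element names by namespace URI, in document order."""
    groups = defaultdict(list)
    for elem in root.iter():
        # ElementTree writes qualified names as "{uri}local".
        if elem.tag.startswith("{"):
            uri, local = elem.tag[1:].split("}", 1)
        else:
            uri, local = "", elem.tag
        groups[uri].append(local)
    return dict(groups)

sections = elements_by_namespace(ET.fromstring(SAMPLE))
for uri, names in sections.items():
    print(uri, names)
```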
Namespace Sectioning (II)
ITS as Schema Annotation
- Issue: If an XHTML document contains ITS markup, validation against a plain XHTML schema is not possible:
<!DOCTYPE html
SYSTEM "xhtml-plus-its.dtd">
<html> ...
<its-span
locinfo="...">...</its-span>
...
</html>
- Solution: schema for ITS is enhanced with annotations which specify relations to XHTML
ITS as Schema Annotation:
Architectural Forms (Annex of HyTime)
<?IS10744:arch name="its-arch"
bridge-form="archbridge"
renamer-att="its-arch.atts"
dtd-system-id="xhtml1-transitional.dtd"?>
<!ELEMENT html (head, body)>
<!ATTLIST html its-arch NAME #FIXED "archbridge">
...
<!ELEMENT its-span (#PCDATA)>
<!ATTLIST its-span
its-arch NAME #FIXED "span"
its-arch.atts CDATA #FIXED "locinfo title">
- Annotation can be a basis for a transformation:
<its-span> to <span>,
and @locinfo to @title
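Such a transformation can be sketched in Python. The two mapping tables below are written by hand to mirror the annotations on the slide; a real architectural forms processor would derive them from the DTD, which this sketch does not attempt. All names (ELEMENT_MAP, ATTRIBUTE_MAP, to_base_vocabulary) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written mappings mirroring the annotations:
# its-span -> span, and on its-span: locinfo -> title.
ELEMENT_MAP = {"its-span": "span"}
ATTRIBUTE_MAP = {"its-span": {"locinfo": "title"}}

def to_base_vocabulary(root):
    """Rename ITS-specific elements and attributes to their XHTML counterparts."""
    for elem in root.iter():
        # Rename attributes first, while elem.tag still holds the ITS name.
        for old, new in ATTRIBUTE_MAP.get(elem.tag, {}).items():
            if old in elem.attrib:
                elem.set(new, elem.attrib.pop(old))
        elem.tag = ELEMENT_MAP.get(elem.tag, elem.tag)
    return root

doc = ET.fromstring(
    '<html><body><its-span locinfo="note">text</its-span></body></html>'
)
result = ET.tostring(to_base_vocabulary(doc), encoding="unicode")
print(result)
```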
ITS as Schema Annotation: Further approaches (I)
- RDF-based approach: statements about data categories (e.g. "span") and their realization
- No need to change the schemas
- Independent of namespace mechanism
- Applicable to all three schema languages (XML DTDs, XSD, RELAX NG)
http://example.com/its#span realizedAs
http://www.mySchema2.xsd#(/~element::span).
http://example.com/its#span realizedAs
"<!ELEMENT span (...)>".
http://example.com/its#span realizedAs
"element span { ... }".
ITS as Schema Annotation: Further approaches (II)
<purposeSpec>
<servesPurpose origVoc="span" its="its-span"/>
</purposeSpec>
- Not really schema annotation, but schema language specific abstraction mechanisms: XLink 1.1
A Word on Schema Annotation for ITS in General
- Common property of all approaches
- Approaches describe relations between schemas and/or instantiation of schemas inside the schema ("inline schema annotation") or outside of the schema (e.g. RDF based approach)
- Difference to fixed modularizations / application of typing mechanisms: No need to change the schemas
- Caveats:
- No stable implementations
- No widespread adoption
Processing Model for ITS
ITS will presumably be implemented with several processes
- "nrl" for namespace sectioning
- "validation" for validation
- "xslt" for transformations
- parallel execution of processes and merging of results are presumably needed
Sample Processing related to ITS
Technologies for Processing Models
- Various possible (emerging) standards for the description of processing models: Pipeline Definition Language, MT Pipeline, or XPL.
- Could solve issues which arise with requirements like retrieval of external information
- Question: Should ITS define a processing model with a specific technology, or would that hinder the adoption of ITS?
Instead of a Summary - Questions:
- Is the purpose of ITS useful? Would you use ITS to internationalize / localize your schema(s)?
- What do you prefer - Typing (XML Schema), no constraints on ambiguity (RELAX NG), ITS parametrization in XML document (XML DTDs)?
- Do we need a namespace for ITS, with the danger of not supporting DTDs?
- Do we need an abstract description of ITS, i.e. via schema annotation? What schema annotation mechanism would you propose (ArchForms, RDF, ITS-specific, ...), or what other abstraction mechanism?
- Do we need a processing model for ITS?
Please read
http://www.w3.org/TR/itsreq/
and send comments to
www-international@w3.org