
XML 1.0 and XML 1.1

Presentation for SML Working Group

C. M. Sperberg-McQueen

29 August 2007

Overview

  • XML 1.1 changes
  • how to refer to other specs
These are thoroughly intertwingled.

XML 1.1: Fear and trembling

XML 1.1 changes could break things:
  • Arbitrary new characters can be added
  • ... and be allowed in names of 1.1 elements.
  • Character references to C0 characters can be used.
  • Other specs refer (exclusively) to XML 1.0:
    • XML Schema 1.0 string, NCName, and related types
    • XPath 2.0 Functions and Operators*
Noah Mendelsohn, “Making the XML stack work with XML 1.1” Lightning talk, W3C Tech Plenary, March 2004. <URL:http://www.w3.org/2004/03/nmXML11.pdf>

XML 1.1: for the World Wide Web

XML 1.1 changes could make the Web more accessible:
  • Corrected / augmented treatment of some scripts that were incomplete in Unicode 2.0
  • Scripts for minority languages are being standardized now
  • ... e.g. the ancient Berber script, Tifinagh, encoded in Unicode 4.1.
  • XML 1.0 doesn't allow Berber names for elements, attributes, or even enumerated values. Nor Ethiopic (used in Ethiopia, Eritrea, Somalia). Nor Sinhala (Sri Lanka). Nor Myanmar (Myanmar). Nor Khmer (Cambodia). Nor Canadian Syllabics (Canada). Nor Thaana (Maldives). Nor Mongolian (Mongolia). That's 150 million people currently excluded.
What does the X in XML stand for, anyway?
Richard Ishida, “Web for Everyone?” Lightning talk, W3C Tech Plenary, March 2005. <URL:http://www.w3.org/2005/03/02-ishida-tech-plen/>

Side issue: are XML names supposed to be human-readable?

Perhaps human readability of XML names isn't important. Some say:
Tags are not part of the user interface.
But is that really so? Remember the proposal:
Let's restrict tags to ASCII, to keep the lexing tables small (8 bits).
And the counter-proposal which no one liked (why?):
Let's restrict tags to uppercase A-P, to keep them even smaller (four bits).

XML 1.1: what it does

XML 1.1: current status

Implemented, but not often used.
Future plans: possible XML 1.2, more compatible with 1.0?

XML identifiers

Q. What's a name?
A. The grammar says
Name ::= Letter (Letter | Digit | OtherNameChar)*
Q. What's a letter? a digit?
A. Whatever the Unicode property database says. (Life is too short to make those decisions ourselves.)
But — which version of the Unicode / ISO 10646 spec?
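A minimal sketch of why the version question matters, using Python's standard unicodedata module, which exposes both its current database and a frozen Unicode 3.2.0 snapshot (unicodedata.ucd_3_2_0). The helper name is_letter_or_digit and the choice of general categories are assumptions made for illustration; they are not the XML rule.

```python
import unicodedata

# General categories treated as "letter" or "digit" in this sketch.
# This is an illustrative assumption, not the definition used by XML.
LETTER_OR_DIGIT = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl", "Nd"}

def is_letter_or_digit(ch, ucd=unicodedata):
    """Classify ch using the given Unicode character database."""
    return ucd.category(ch) in LETTER_OR_DIGIT

TIFINAGH_YA = "\u2d30"  # Tifinagh was added to Unicode in version 4.1

old = is_letter_or_digit(TIFINAGH_YA, unicodedata.ucd_3_2_0)
new = is_letter_or_digit(TIFINAGH_YA)
print(unicodedata.ucd_3_2_0.unidata_version, old)
print(unicodedata.unidata_version, new)
```

The same character is unassigned under the 3.2.0 database and a letter under the current one, so "whatever the property database says" gives two different answers.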

Referring to other specs

When one spec refers normatively to another, how should it do so?
  • Refer to a specific version?
  • Refer to “the current” or “the most recent” version?
Concrete example: XML 1.0 reference to Unicode and ISO 10646.

Most-recent version references

Plus:
  • automatically up to date
Minus:
  • action at a distance:
    • Last night your parser was conformant.
    • Today the Unicode Consortium and ISO / IEC JTC1 SC2 released a new document, and your parser is non-conformant.
  • more variation among processors (possible interop problem)

Specific version references

Plus:
  • clear
  • specific
Minus:
  • frozen
  • gets outdated if the other spec changes (and specs stop changing only when dead)

Concretely

[84] Letter   ::= BaseChar | Ideographic

[85] BaseChar ::= [#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x00D6] 
                | [#x00D8-#x00F6] | [#x00F8-#x00FF] | [#x0100-#x0131]
                | [#x0134-#x013E] | [#x0141-#x0148] | [#x014A-#x017E]
                | [#x0180-#x01C3] | [#x01CD-#x01F0] | [#x01F4-#x01F5]
                | [#x01FA-#x0217] | [#x0250-#x02A8] | [#x02BB-#x02C1]
                | #x0386 | [#x0388-#x038A] | #x038C | [#x038E-#x03A1]
                | [#x03A3-#x03CE] | [#x03D0-#x03D6] | #x03DA | #x03DC
                | #x03DE | #x03E0 | [#x03E2-#x03F3] | [#x0401-#x040C]
                | [#x040E-#x044F] | [#x0451-#x045C] | [#x045E-#x0481]
                | [#x0490-#x04C4] | [#x04C7-#x04C8] | [#x04CB-#x04CC]
                | [#x04D0-#x04EB] | [#x04EE-#x04F5] | [#x04F8-#x04F9]
                | [#x0531-#x0556] | #x0559 | [#x0561-#x0586] 
                | [#x05D0-#x05EA] | [#x05F0-#x05F2] | [#x0621-#x063A] 
                | [#x0641-#x064A] | [#x0671-#x06B7] | [#x06BA-#x06BE] 
                | [#x06C0-#x06CE] | [#x06D0-#x06D3] | #x06D5 
                | [#x06E5-#x06E6] | [#x0905-#x0939] | #x093D 
                | [#x0958-#x0961] | [#x0985-#x098C] | [#x098F-#x0990] 
                | [#x0993-#x09A8] | [#x09AA-#x09B0] | #x09B2 
                | [#x09B6-#x09B9] | [#x09DC-#x09DD] | [#x09DF-#x09E1] 
                | [#x09F0-#x09F1] | [#x0A05-#x0A0A] | [#x0A0F-#x0A10] 
                | [#x0A13-#x0A28] | [#x0A2A-#x0A30] | [#x0A32-#x0A33] 
                | [#x0A35-#x0A36] | [#x0A38-#x0A39] | [#x0A59-#x0A5C] 
                | #x0A5E | [#x0A72-#x0A74] | [#x0A85-#x0A8B] | #x0A8D 
                | [#x0A8F-#x0A91] | [#x0A93-#x0AA8] | [#x0AAA-#x0AB0] 
                | [#x0AB2-#x0AB3] | [#x0AB5-#x0AB9] | #x0ABD | #x0AE0 
                | [#x0B05-#x0B0C] | [#x0B0F-#x0B10] | [#x0B13-#x0B28] 
                | [#x0B2A-#x0B30] | [#x0B32-#x0B33] | [#x0B36-#x0B39] 
                | #x0B3D | [#x0B5C-#x0B5D] | [#x0B5F-#x0B61] 
                | [#x0B85-#x0B8A] | [#x0B8E-#x0B90] | [#x0B92-#x0B95] 
                | [#x0B99-#x0B9A] | #x0B9C | [#x0B9E-#x0B9F] 
                | [#x0BA3-#x0BA4] | [#x0BA8-#x0BAA] | [#x0BAE-#x0BB5] 
                | [#x0BB7-#x0BB9] | [#x0C05-#x0C0C] | [#x0C0E-#x0C10] 
                | [#x0C12-#x0C28] | [#x0C2A-#x0C33] | [#x0C35-#x0C39] 
                | [#x0C60-#x0C61] | [#x0C85-#x0C8C] | [#x0C8E-#x0C90] 
                | [#x0C92-#x0CA8] | [#x0CAA-#x0CB3] | [#x0CB5-#x0CB9] 
                | #x0CDE | [#x0CE0-#x0CE1] | [#x0D05-#x0D0C] 
                | [#x0D0E-#x0D10] | [#x0D12-#x0D28] | [#x0D2A-#x0D39] 
                | [#x0D60-#x0D61] | [#x0E01-#x0E2E] | #x0E30 
                | [#x0E32-#x0E33] | [#x0E40-#x0E45] | [#x0E81-#x0E82] 
                | #x0E84 | [#x0E87-#x0E88] | #x0E8A | #x0E8D 
                | [#x0E94-#x0E97] | [#x0E99-#x0E9F] | [#x0EA1-#x0EA3] 
                | #x0EA5 | #x0EA7 | [#x0EAA-#x0EAB] | [#x0EAD-#x0EAE] 
                | #x0EB0 | [#x0EB2-#x0EB3] | #x0EBD | [#x0EC0-#x0EC4] 
                | [#x0F40-#x0F47] | [#x0F49-#x0F69] | [#x10A0-#x10C5] 
                | [#x10D0-#x10F6] | #x1100 | [#x1102-#x1103] 
                | [#x1105-#x1107] | #x1109 | [#x110B-#x110C]
                | [#x110E-#x1112] | #x113C | #x113E | #x1140 | #x114C 
                | #x114E | #x1150 | [#x1154-#x1155] | #x1159 
                | [#x115F-#x1161] | #x1163 | #x1165 | #x1167 | #x1169 
                | [#x116D-#x116E] | [#x1172-#x1173] | #x1175 | #x119E 
                | #x11A8 | #x11AB | [#x11AE-#x11AF] | [#x11B7-#x11B8] 
                | #x11BA | [#x11BC-#x11C2] | #x11EB | #x11F0 | #x11F9 
                | [#x1E00-#x1E9B] | [#x1EA0-#x1EF9] | [#x1F00-#x1F15] 
                | [#x1F18-#x1F1D] | [#x1F20-#x1F45] | [#x1F48-#x1F4D] 
                | [#x1F50-#x1F57] | #x1F59 | #x1F5B | #x1F5D 
                | [#x1F5F-#x1F7D] | [#x1F80-#x1FB4] | [#x1FB6-#x1FBC] 
                | #x1FBE | [#x1FC2-#x1FC4] | [#x1FC6-#x1FCC] 
                | [#x1FD0-#x1FD3] | [#x1FD6-#x1FDB] | [#x1FE0-#x1FEC] 
                | [#x1FF2-#x1FF4] | [#x1FF6-#x1FFC] | #x2126 
                | [#x212A-#x212B] | #x212E | [#x2180-#x2182] 
                | [#x3041-#x3094] | [#x30A1-#x30FA] | [#x3105-#x312C] 
                | [#xAC00-#xD7A3]

[86] Ideographic ::= [#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]

[87] CombiningChar ::= [#x0300-#x0345] | [#x0360-#x0361] | [#x0483-#x0486] 
                | [#x0591-#x05A1] | [#x05A3-#x05B9] | [#x05BB-#x05BD] 
                | #x05BF | [#x05C1-#x05C2] | #x05C4 | [#x064B-#x0652] 
                | #x0670 | [#x06D6-#x06DC] | [#x06DD-#x06DF] 
                | [#x06E0-#x06E4] | [#x06E7-#x06E8] | [#x06EA-#x06ED] 
                | [#x0901-#x0903] | #x093C | [#x093E-#x094C] | #x094D 
                | [#x0951-#x0954] | [#x0962-#x0963] | [#x0981-#x0983] 
                | #x09BC | #x09BE | #x09BF | [#x09C0-#x09C4] 
                | [#x09C7-#x09C8] | [#x09CB-#x09CD] | #x09D7 
                | [#x09E2-#x09E3] | #x0A02 | #x0A3C | #x0A3E | #x0A3F 
                | [#x0A40-#x0A42] | [#x0A47-#x0A48] | [#x0A4B-#x0A4D] 
                | [#x0A70-#x0A71] | [#x0A81-#x0A83] | #x0ABC 
                | [#x0ABE-#x0AC5] | [#x0AC7-#x0AC9] | [#x0ACB-#x0ACD] 
                | [#x0B01-#x0B03] | #x0B3C | [#x0B3E-#x0B43] 
                | [#x0B47-#x0B48] | [#x0B4B-#x0B4D] | [#x0B56-#x0B57] 
                | [#x0B82-#x0B83] | [#x0BBE-#x0BC2] | [#x0BC6-#x0BC8] 
                | [#x0BCA-#x0BCD] | #x0BD7 | [#x0C01-#x0C03] 
                | [#x0C3E-#x0C44] | [#x0C46-#x0C48] | [#x0C4A-#x0C4D] 
                | [#x0C55-#x0C56] | [#x0C82-#x0C83] | [#x0CBE-#x0CC4] 
                | [#x0CC6-#x0CC8] | [#x0CCA-#x0CCD] | [#x0CD5-#x0CD6] 
                | [#x0D02-#x0D03] | [#x0D3E-#x0D43] | [#x0D46-#x0D48] 
                | [#x0D4A-#x0D4D] | #x0D57 | #x0E31 | [#x0E34-#x0E3A] 
                | [#x0E47-#x0E4E] | #x0EB1 | [#x0EB4-#x0EB9] 
                | [#x0EBB-#x0EBC] | [#x0EC8-#x0ECD] | [#x0F18-#x0F19] 
                | #x0F35 | #x0F37 | #x0F39 | #x0F3E | #x0F3F 
                | [#x0F71-#x0F84] | [#x0F86-#x0F8B] | [#x0F90-#x0F95] 
                | #x0F97 | [#x0F99-#x0FAD] | [#x0FB1-#x0FB7] | #x0FB9 
                | [#x20D0-#x20DC] | #x20E1 | [#x302A-#x302F] | #x3099 
                | #x309A

[88] Digit    ::=  [#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9] 
                | [#x0966-#x096F] | [#x09E6-#x09EF] | [#x0A66-#x0A6F] 
                | [#x0AE6-#x0AEF] | [#x0B66-#x0B6F] | [#x0BE7-#x0BEF] 
                | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] | [#x0D66-#x0D6F] 
                | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29]


[89] Extender ::= #x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6 | #x3005 
                | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]
(The XML 1.0 definition)
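A hedged sketch of what a frozen, enumerated list means for an implementer: the production text is turned into a lookup table once and never changes. The example uses production [88] (Digit) because it is short; the names parse_ranges and in_ranges are hypothetical helpers, not part of any XML library.

```python
import re
import unicodedata
from bisect import bisect_right

# Production [88] Digit from XML 1.0, copied from the listing above.
DIGIT_PRODUCTION = """
[#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9]
| [#x0966-#x096F] | [#x09E6-#x09EF] | [#x0A66-#x0A6F]
| [#x0AE6-#x0AEF] | [#x0B66-#x0B6F] | [#x0BE7-#x0BEF]
| [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] | [#x0D66-#x0D6F]
| [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29]
"""

# Matches either a single code point (#xN) or a range (#xLO-#xHI).
TERM = re.compile(r"#x([0-9A-Fa-f]+)(?:-#x([0-9A-Fa-f]+))?")

def parse_ranges(production):
    """Turn '[#xLO-#xHI] | #xN | ...' into a sorted list of (lo, hi) pairs."""
    ranges = []
    for lo, hi in TERM.findall(production):
        ranges.append((int(lo, 16), int(hi or lo, 16)))
    return sorted(ranges)

def in_ranges(ch, ranges):
    """Binary-search sorted, non-overlapping ranges for ch's code point."""
    cp = ord(ch)
    i = bisect_right(ranges, (cp, 0x110000)) - 1
    return i >= 0 and ranges[i][0] <= cp <= ranges[i][1]

XML10_DIGIT = parse_ranges(DIGIT_PRODUCTION)

# ASCII digit, Arabic-Indic digit, Myanmar digit (added in Unicode 3.0).
for ch in ("7", "\u0663", "\u1040"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch),
          in_ranges(ch, XML10_DIGIT))
```

All three characters are decimal digits (category Nd) in a current Unicode database, but the Myanmar digit falls outside the enumerated list, which was fixed before that script was encoded.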

Leading the receiver

When passing to a teammate, don't aim the ball at their current position; aim it ahead of them.
In spec terms:
  • Anticipate how the other spec may change.
  • Retain specificity, but allow for its changes
  • ... by allowing for what might be added.

Leading the receiver (2)

XML 1.1 does this:
  • Allow code points where name characters may be placed in future.
  • Negotiate with Unicode Technical Committee and SC2 for commitments.
  • An instance of Postel's Law:
    Be conservative in what you send, but liberal in what you accept.
    =
    Send only current UCS, but accept present and future UCS.
    maybe =
    SHOULD send only things valid against current X (external spec), MUST accept current and (predicted) future X.

Concretely


[4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] 
                    | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] 
                    | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] 
                    | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] 
                    | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[4a] NameChar     ::= NameStartChar | "-" | "." | [0-9] | #xB7 
                    | [#x0300-#x036F] | [#x203F-#x2040]
[5] Name          ::= NameStartChar (NameChar)*
(The XML 1.1 definition)
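The productions above are short enough to transcribe directly. The sketch below pairs a liberal receiver check (production [5] as written) with a conservative sender check. The sender policy, and the function names is_xml11_name and is_conservative_name, are assumptions made for illustration: "conservative" is taken here to mean "every character is assigned in the Unicode database bundled with this Python".

```python
import unicodedata

# Productions [4] and [4a] of XML 1.1, transcribed from the listing above.
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Additional characters allowed after the first: "-", ".", [0-9], etc.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]

def _in(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)

def is_xml11_name(s):
    """Receiver side: accept anything matching production [5]."""
    if not s or not _in(ord(s[0]), NAME_START_RANGES):
        return False
    return all(
        _in(ord(c), NAME_START_RANGES) or _in(ord(c), NAME_EXTRA_RANGES)
        for c in s[1:]
    )

def is_conservative_name(s):
    """Sender side (hypothetical policy): additionally require that every
    character is assigned ("Cn" means unassigned) in the local database."""
    return is_xml11_name(s) and all(
        unicodedata.category(c) != "Cn" for c in s
    )

tifinagh = "\u2d30\u2d31"   # Tifinagh letters, encoded since Unicode 4.1
future = "\U000D0000"       # plane 13, unassigned at the time of writing

print(is_xml11_name(tifinagh), is_conservative_name(tifinagh))
print(is_xml11_name(future), is_conservative_name(future))
```

The receiver accepts both names; the sender would emit only the first. If the code point used for future is ever assigned, the sender check starts passing without any change to the grammar.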

Implementation-defined choice

A third option: limited implementation-defined choice.
XSDL 1.1, XQuery 1.0, XSLT 2.0, all allow either XML 1.0 or XML 1.1, or both, to be supported.
[XML Schema: Datatypes] defines some datatypes which depend on definitions in [XML 1.1] and [XML-Namespaces 1.1]; those definitions, and therefore the datatypes based on them, vary between version 1.0 ([XML 1.0], [XML-Namespaces 1.0]) and version 1.1 ([XML 1.1], [XML-Namespaces 1.1]) of those specifications. In any given schema-validity-assessment episode, the choice of the 1.0 or the 1.1 definition of those datatypes is implementation-defined.
Conforming implementations of this specification may provide either the 1.1-based datatypes or the 1.0-based datatypes, or both. If both are supported, the choice of which datatypes to use in a particular assessment episode should be under user control.
Cf. also “Processing XML 1.1 documents with XML Schema 1.0 processors” <URL:http://www.w3.org/TR/2005/NOTE-xml11schema10-20050511/>
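The selection rule in the quoted text can be sketched as a small function. Everything named here (XmlVersion, choose_datatype_basis, the fallback to 1.0 when the user states no preference) is hypothetical; the quoted text says only that the choice should be under user control when both are supported.

```python
from enum import Enum

class XmlVersion(Enum):
    V1_0 = "1.0"
    V1_1 = "1.1"

def choose_datatype_basis(supported, requested=None):
    """Pick the XML version whose definitions the datatypes are based on.

    supported: non-empty set of XmlVersion values the processor implements.
    requested: the user's choice, or None if the user expressed none.
    """
    if not supported:
        raise ValueError("a processor must support at least one version")
    if requested is not None:
        if requested not in supported:
            raise ValueError(f"XML {requested.value} datatypes not supported")
        return requested
    if len(supported) == 1:
        return next(iter(supported))
    # Both supported, no user preference: this default is an arbitrary
    # assumption of the sketch, not something the specification prescribes.
    return XmlVersion.V1_0

both = {XmlVersion.V1_0, XmlVersion.V1_1}
print(choose_datatype_basis(both, requested=XmlVersion.V1_1))
print(choose_datatype_basis({XmlVersion.V1_1}))
```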

Loose coupling / optional updates

A fourth option: open-ended choice.
ISO boilerplate reads:
The following normative documents contain provisions which, through reference in this text, constitute provisions of this [spec]. For dated references, subsequent amendments to, or revisions of, any of these publications do not apply. However, parties to agreements based on this [spec] are encouraged to investigate the possibility of applying the most recent editions of the normative documents indicated below. For undated references, the latest edition of the normative document referred to applies. Members of ISO and IEC maintain registers of currently valid International Standards.

What does this mean?

Concrete example

Each document in the model MUST be a well-formed XML document [XML 1.0]
Does this mean
  • “... must be well-formed XML, in current version (1.0) or any later version”?
  • “... must be well-formed XML, in current version (1.0) or anything else consumer and producer agree on”?
  • “... must be well-formed XML, in version 1.0; use of any later version of XML is non-conforming”?

Points in solution space

  • Single dated reference, fixed for all time.
  • Single reference to “most recent” version.
  • Implementation-defined choice of versions from a fixed set (“1.0, or 1.1, or both, but not 1.2 or anything newer”), i.e. explicit floor, explicit ceiling.
    N.B. full characterization of software requires more information.
  • Implementation-defined choice from an open set (“1.0 or any newer version”). Variations:
    • Require one specific version, allow others? (“1.0 and optionally any newer version”), i.e. floor, no ceiling.
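The floor / ceiling vocabulary lends itself to a small data structure. This is a sketch under stated assumptions: VersionPolicy and its field names are invented for illustration, versions are modeled as (major, minor) tuples, and the “most recent” option is deliberately left out because it depends on state outside the spec.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (1, 0) for XML 1.0

@dataclass(frozen=True)
class VersionPolicy:
    floor: Version
    ceiling: Optional[Version] = None  # None means "no ceiling"

    def allows(self, version: Version) -> bool:
        """True if a document in this version is within the policy."""
        if version < self.floor:
            return False
        return self.ceiling is None or version <= self.ceiling

# Single dated reference, fixed for all time.
FIXED = VersionPolicy(floor=(1, 0), ceiling=(1, 0))
# Choice from a fixed set: explicit floor, explicit ceiling.
FIXED_SET = VersionPolicy(floor=(1, 0), ceiling=(1, 1))
# Choice from an open set: floor, no ceiling.
OPEN_SET = VersionPolicy(floor=(1, 0))

for name, policy in (("fixed", FIXED), ("fixed set", FIXED_SET),
                     ("open set", OPEN_SET)):
    print(name, [policy.allows(v) for v in ((1, 0), (1, 1), (1, 2))])
```

The structure records which versions are permitted, not which ones a given processor must support; distinguishing “1.0 or any newer version” from “1.0 and optionally any newer version” would need that extra piece of information.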

Final thoughts

Q. Why have more than one spec?
A. Modularization, separation of concerns.
Q. What made the Web grow?
A. Loose coupling.
Q. What do you need (minimally) for interop?
A. ...