This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
There is a Note in G.1.1:

Note: [Unicode Database] is subject to future revision. For example, the mapping from code points to character properties might be updated. All ·minimally conforming· processors ·must· support the character properties defined in the version of [Unicode Database] cited in the normative references (Normative (§K.1)). However, implementors are encouraged to support the character properties defined in any future version.

I'm not sure that it is possible to do both. In Unicode 3.1, and therefore in XML Schema 1.0, the Ethiopic digits x1369-x1371 were in group Nd (and therefore matched \d). In Unicode 4.1 they have been moved to group No (so they no longer match \d). A given processor, unless it has configuration options to put this under user control -- which seems unduly onerous -- is either going to support the new version or the old. In one case, x1369 will match \d, in the other case it won't. In practice, it's quite likely to depend on which version of Java or .NET you are using.

So I think we should either pin things down so processors are required to support Unicode version 4.1 and no other, or we should remove the "must" from the above note, and make it implementation-defined which version of Unicode is used. (In any case, what is a "must" doing in a Note?)

Test case reS17 in the Microsoft regex test suite is relevant: its results depend on which version of Unicode you believe in.
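To make the version dependence concrete, here is a minimal sketch in Python. It is not an XSD processor, only an analogy: it assumes CPython's standard `unicodedata` module, which bundles a frozen Unicode 3.2.0 database (`unicodedata.ucd_3_2_0`) alongside the interpreter's current one. Unicode 3.2 stands in for the 3.1 data discussed above, and the helper name `categories` is made up for this illustration. Given the change described in the report, the Ethiopic digits should come out as Nd under the old database and No under the current one, and Python's own `\d` follows the current database.

```python
import re
import unicodedata

# Ethiopic digits one through nine, the code points named in the report.
ETHIOPIC_DIGITS = [chr(cp) for cp in range(0x1369, 0x1372)]


def categories(ch):
    """Return the General_Category of ch under two database versions.

    Keys are version strings: the frozen 3.2.0 database and whatever
    version the running interpreter ships (unicodedata.unidata_version).
    """
    return {
        "3.2.0": unicodedata.ucd_3_2_0.category(ch),
        unicodedata.unidata_version: unicodedata.category(ch),
    }


if __name__ == "__main__":
    for ch in ETHIOPIC_DIGITS:
        # Python's re module is not the XSD regex dialect; its \d is
        # shown only to illustrate that the answer tracks the database.
        matches_d = re.fullmatch(r"\d", ch) is not None
        print(f"U+{ord(ch):04X}", categories(ch), "matches \\d:", matches_d)
```

A single processor built on one such database gives only one of the two answers, which is the conflict the report describes.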
During the telcon of 2008-07-18 the WG decided to classify this bug as editorial, since the issue is with a "Note:", and notes are non-normative by definition. The WG instructed the editors to make a small change to the note to soften its strictness. One option discussed was to change the last sentence of the note to read as follows: However, implementors are encouraged to support the character properties defined in any future version, possibly with such support being engaged under user control.
See also http://lists.w3.org/Archives/Public/www-xml-schema-comments/2008OctDec/0076.html from James Clark
Reviewing this bug report with a view toward proposing a change to resolve it, I have come to believe that the changes made to appendix G.1.1 in connection with bug 5948 may already have addressed the concerns raised here (although not the concerns about 1.0 2E in the email from James Clark cited in comment 2).

The note on which the comment was originally raised has been split into a normative paragraph and a note. The current text is:

[Unicode Database] is subject to future revision. For example, the mapping from code points to character properties might be updated. All ·minimally conforming· processors ·must· support the character properties defined in the version of [Unicode Database] cited in the normative references (Normative (§K.1)). However, implementors are encouraged to support the character properties defined in any later versions. When the implementation supports multiple versions of the Unicode database, and they differ in salient respects (e.g. different properties are assigned to the same character in different versions of the database), then it is ·implementation-defined· which set of property definitions is used for any given assessment episode.

Note: In order to benefit from continuing work on the Unicode database, a conforming implementation might by default use the latest supported version of the character properties. In order to maximize consistency with other implementations of this specification, however, an implementation might choose to provide user options to specify the use of the version of the database cited in the normative references. The PropertyAliases.txt and PropertyValueAliases.txt files of the Unicode database may be helpful to implementors in this connection.

In addition, there is a later reference to changes in the Unicode database; the current text at that location now reads:

[Unicode Database] has been revised since XSD 1.0 was published, and is subject to future revision. In particular, the grouping of code points into blocks has changed, and may change again. All ·minimally conforming· processors must support the blocks defined in the version of [Unicode Database] cited in the normative references (Normative (§K.1)). However, implementors are encouraged to support the blocks defined in earlier and/or later versions of the Unicode Standard. When the implementation supports multiple versions of the Unicode database, and they differ in salient respects (e.g. different characters are assigned to a given block in different versions of the database), then it is ·implementation-defined· which set of block definitions is used for any given assessment episode.

In particular, the version of [Unicode Database] referenced in XSD 1.0 (namely, Unicode 3.1) contained the following blocks which have been renamed in the version cited in this specification. Since these block names may appear in regular expressions within XSD 1.0 schemas, implementors are encouraged to support the superseded block names in XSD 1.1 processors for compatibility, either by default or at user option:

#x0370 - #x03FF: Greek
#x20D0 - #x20FF: CombiningMarksforSymbols
#xE000 - #xF8FF: PrivateUse
#xF0000 - #xFFFFD: PrivateUse
#x100000 - #x10FFFD: PrivateUse

To see the text in context, consult the current CR document at http://www.w3.org/TR/xmlschema11-2/#charcter-classes or the current status-quo document at http://www.w3.org/XML/Group/2004/06/xmlschema-2/datatypes.html#charcter-classes

I'm marking this issue as needsReview to signal that I think the WG needs to consider whether this issue has already been resolved (and should have been so marked when we resolved bug 5948).
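As an illustration of the compatibility suggestion in the quoted text, a processor could keep a small table of the superseded block names and consult it when a user option allows. The following Python sketch is only one possible shape and is not taken from any actual implementation: the names `LEGACY_BLOCKS`, `in_legacy_block`, and the `allow_legacy` option are hypothetical, and treating the three PrivateUse rows as a single name covering three ranges is an assumption of the sketch. The ranges themselves are copied from the list quoted above.

```python
# Superseded XSD 1.0 (Unicode 3.1) block names and their code point
# ranges, copied from the list quoted in the comment above.
# Assumption: the three PrivateUse rows are merged under one name.
LEGACY_BLOCKS = {
    "Greek": [(0x0370, 0x03FF)],
    "CombiningMarksforSymbols": [(0x20D0, 0x20FF)],
    "PrivateUse": [
        (0xE000, 0xF8FF),
        (0xF0000, 0xFFFFD),
        (0x100000, 0x10FFFD),
    ],
}


def in_legacy_block(name, ch, allow_legacy=True):
    """Return True if character ch lies in the superseded block `name`.

    allow_legacy is a hypothetical user option: when it is False, or
    when the name is not in the table, a ValueError is raised instead
    of the name being silently accepted.
    """
    if not allow_legacy or name not in LEGACY_BLOCKS:
        raise ValueError(f"unknown block name: {name!r}")
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in LEGACY_BLOCKS[name])
```

With the option on, `in_legacy_block("Greek", "\u03b1")` accepts a name that an XSD 1.0 schema might use in `\p{IsGreek}`; with it off, the same call raises, which mirrors the "either by default or at user option" wording.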
WG agrees with MSM's assessment. Closing as overtaken.
The WG reported this bug as FIXED on 2010-06-24. We are closing this bug as requiring no further work. If there are issues remaining, you can reopen this bug and enter a comment to indicate the problem. Thanks very much for the feedback.