This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The whole expression in this test is "^(?:[^-z]+)$" (without quotes).

I am reporting this because either
a) the rules are unclear or ambiguous, and the test is correct under one reading of the spec, or
b) the test is not correct.

In XSD 1.0, the production rules of [17] charRange apply. In the accompanying text, the author states that the rules are ambiguous and then goes on to say that they are not:

1) The [, ], - and \ characters are not valid character ranges;
A: this does not apply

2) The ^ character is only valid at the beginning of a ·positive character group· if it is part of a ·negative character group·
A: this applies, and gets the meaning of negating the character group

3) The - character is a valid character range only at the beginning or end of a ·positive character group·.
A: ambiguous in this case, as the production rules do not allow this here.

[14]: posCharGroup ::= ( charRange | charClassEsc )+
[17]: charRange ::= seRange | XmlCharIncDash
[18]: seRange ::= charOrEsc '-' charOrEsc
[20]: charOrEsc ::= XmlChar | SingleCharEsc
[21]: XmlChar ::= [^\#x2D#x5B#x5D]
[22]: XmlCharIncDash ::= [^\#x5B#x5D]

Following these production rules, in part, we get:

4) it is a posCharGroup
5) it is a charRange
6) that range is "^" to "z"

Now back at rule (2) above. The "^" is only valid in this position if it is also part of a negative character group.

All in all, I think:

If the intention was "from ^ to z", then it should have been written as [\^-z]
If the intention was "not from ^ to z", then [^^-z] appears to be allowed (though [^\^-z] makes more sense to me)
If the intention was "either ^, - or z", then [\^\-z]
If the intention was "not - or z", then [^\-z]

I think that the expression as written does not fit the production rules or description and should raise FORX0002.
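For comparison, the four alternative spellings suggested above can be checked in a few lines. This is only a sketch: it uses Python's `re` module, which is not an XSD regular expression engine, on the assumption that it reads these fully escaped character classes the same way. The helper name `matches_class` is made up for illustration.

```python
import re


def matches_class(char_class: str, ch: str) -> bool:
    """Return True if the single character ch matches the character class.

    Uses Python's re module as a stand-in; this is NOT an XSD regex engine.
    """
    return re.fullmatch(char_class, ch) is not None


# "from ^ to z": a range from U+005E to U+007A
assert matches_class(r"[\^-z]", "^")
assert matches_class(r"[\^-z]", "a")
assert not matches_class(r"[\^-z]", "-")
assert not matches_class(r"[\^-z]", "A")

# "not from ^ to z": the negation of that range
assert not matches_class(r"[^^-z]", "a")
assert matches_class(r"[^^-z]", "-")
assert matches_class(r"[^^-z]", "A")

# "either ^, - or z": three literal characters
assert matches_class(r"[\^\-z]", "^")
assert matches_class(r"[\^\-z]", "-")
assert matches_class(r"[\^\-z]", "z")
assert not matches_class(r"[\^\-z]", "a")

# "not - or z": anything except hyphen and z
assert not matches_class(r"[^\-z]", "-")
assert not matches_class(r"[^\-z]", "z")
assert matches_class(r"[^\-z]", "a")
assert matches_class(r"[^\-z]", "^")
```

Each escaped spelling has a single reading in this engine, which is the point of the suggestion: escaping removes the dependency on how an unescaped "^" or "-" is interpreted.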
The rules for character ranges in XSD 1.0 are known to be a complete mess. XSD 1.1 indicates what the WG intended. Although we don't require support for XSD 1.1, in cases like this referring to the XSD 1.1 spec is the best way of sorting out the ambiguity. The fact that the Schema WG chose to fix this bug in XSD 1.1 but not to issue a correction for 1.0 shouldn't inhibit us, I think, from having tests that assume the corrected interpretation. The XSD 1.1 rules make it clear that [^-z] means "any character other than hyphen or "z"".
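The reading described above, where [^-z] means "any character other than hyphen or z", can be illustrated with the test's full expression. The sketch below uses Python's `re` module as a stand-in, on the assumption that it interprets this particular class the same way; it is not an XSD 1.1 regex implementation, and the name `accepted` is hypothetical.

```python
import re

# The expression from the test, compiled with Python's re module.
# Assumption: Python reads [^-z] as "not hyphen and not z", which matches
# the interpretation attributed to XSD 1.1 in this thread.
PATTERN = re.compile(r"^(?:[^-z]+)$")


def accepted(s: str) -> bool:
    """Return True if every character of s is something other than '-' or 'z'."""
    return PATTERN.match(s) is not None


assert accepted("abc")       # no hyphen, no z
assert accepted("^")         # the caret itself is an ordinary permitted character
assert not accepted("a-b")   # contains a hyphen
assert not accepted("xyz")   # contains a z
assert not accepted("")      # + requires at least one character
```

Under the alternative reading discussed in the report (a range from "^" to "z"), the string "abc" would be accepted by a positive group and "A" rejected, so the two readings are easy to tell apart with inputs like these.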
> The XSD 1.1 rules make it clear that [^-z] means "any character other than
> hyphen or "z"".

Ok, thanks. Is there room for a Note in the XP31 spec, along the lines of "Where XSD 1.0 shows ambiguity for character classes and ranges, refer to XSD 1.1. It is recommended that implementers prefer the production rule changes of XSD 1.1 over XSD 1.0 where such ambiguities arise."?
F+O 5.6.1 does say: Note: In [Schema 1.1 Part 2] the rules for the interpretation of hyphens within square brackets in a regular expression have been clarified; and the semantics of regular expressions are no longer tied to a specific version of Unicode.
I am marking this one as resolved as invalid. I could not find a better option in the list.
(In reply to O'Neil Delpratt from comment #4)
> I am marking this one as resolved as invalid. I could not find a better
> option in the list.

I agree. Let's leave it at that. I've closed the bug.