This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
QT approved comment: Section 3.3.18 (anyURI) appears to lack any definition of the value space, or any definition of identity and equality. (One gets the impression that the types defined later in the spec have received less attention from the editors, just as they have from this reviewer).
The WG discussed this briefly with QT at the October 2007 ftf meetings. The nature of the value space is currently entailed by the description of the lexical space and the description of the lexical mapping as the identity function. The nature of the value space should probably be stated explicitly. It does not need to change, it just needs to be clearly and explicitly stated.
Since this issue (in the WG's analysis as summarized in comment #1, at any rate) is directed at clarifying, rather than changing, the definition of the value space in question, I am tentatively marking it editorial. This may have the effect that the issue will be addressed after, rather than before, the next published working draft (but it also will have the effect of ensuring that the issue is not closed as WONTFIX before that WD is published).
At its call today, the XML Schema WG agreed to instruct the editors to revise the discussion of anyURI a) to refer non-normatively to the Note on Legacy Encoded IRIs (LEIRIs), and b) to mention explicitly that, since the lexical space of anyURI is essentially the set of strings and the lexical mapping is the identity, the value space is isomorphic to that of xsd:string.
A wording proposal intended to resolve this issue is at http://www.w3.org/XML/Group/2004/06/xmlschema-2/datatypes.b3264.html (member-only link)
Noah Mendelsohn has pointed out a flaw in the proposed wording, which also applies to the description of the lexical space. ("The set of possibly empty character sequences, huh? OK, here's a character sequence: 'http://www.w3.org/'. Could it possibly be empty? No, it's not empty, and it couldn't possibly be empty without no longer being itself: I can see characters right there. I guess it's not a member of the value space, or of the lexical space either.")
Perhaps it would be better to change the proposed new section on value space and the existing first sentence of the section on lexical mapping from:
  3.3.18.1 Value Space
  The value space of anyURI is the set of possibly empty character sequences.
  3.3.18.2 Lexical Mapping
  The ·lexical space· of anyURI is the set of possibly empty finite-length character sequences.
to
  3.3.18.1 Value Space
  The value space of anyURI is the set of finite-length sequences of zero or more characters.
  3.3.18.2 Lexical Mapping
  The ·lexical space· of anyURI is the set of finite-length sequences of zero or more characters.
The proposed amendment is modeled on the formulation used to describe the value space of string.
(In reply to comment #5)
> Noah Mendelsohn has pointed out a flaw in the proposed wording, which
> also applies to the description of the lexical space. ("The set of possibly
> empty character sequences, huh? OK, here's a character sequence:
> 'http://www.w3.org/'. Could it possibly be empty? No, it's not empty,
> and it couldn't possibly be empty without no longer being itself: I can
> see characters right there. I guess it's not a member of the value space,
> or of the lexical space either.")
>
> Perhaps it would be better to change the proposed new section on value
> space and the existing first sentence of the section on lexical mapping
IIRC, the "possibly empty" was directed by the WG some time ago (since some FLCS are required to be non-empty, and we wanted to be explicit at every occurrence). There are 8 occurrences of 'possibly empty' in the current spec. Either we expect readers to understand what's meant, or we must fix all eight. I suspect that anyone who comes up with Noah's "flaw" will understand what we meant. If we're going to fix things at this level of nit-pick, we've got a lot of other changes to be made too. Let's not go down that slippery slope.
OTOH, note that the lexical space is limited to finite sequences; the value space (by the proposed wording) is not. Since, when we ensured we were explicit about allowing or disallowing the empty string, we also chose to be careful to disallow infinite strings, the wording from Lexical Mapping should be used.
Having reviewed the occurrences of "possibly empty", I believe that it is only those occurring in the descriptions of hexBinary, base64Binary, and anyURI that are prone to the unsatisfactory reading identified by Noah. The others differ in syntax or context enough that I do not believe they need to change.
So I propose to amend my proposed amendment to the wording proposal to include analogous changes in hexBinary and base64Binary, from
  ... the set of possibly empty finite-length sequences of binary octets
to
  ... the set of finite-length sequences of zero or more binary octets
The press of time is a good reason for not going out of our way to find minor improvements to the spec and raise new issues about them. But in this case we have an issue and need to change the spec in either case; if we can do so without delaying the WG I don't see that we should not try to make the wording as clear as we can, in the time available.
Michael Sperberg-McQueen writes (after suggesting changes to the binary types as well as to anyURI):
> The press of time is a good reason for not going
> out of our way to find minor improvement to the
> spec and raise new issues about them. But in this
> case we have an issue and need to change the spec
> in either case; if we can do so without delaying
> the WG I don't see that we should not try to make
> the wording as clear as we can, in the time
> available.
As Michael knows but perhaps others do not, I made a point of making my original comment to Michael privately, just to ensure that there is no added burden on the working group of trying to satisfy >me< in particular with respect to this concern. I am grateful that he felt it worth the trouble to carry them forward to the working group. For the record, I am very pleased with Michael's proposal to update the descriptions of the binary types as well as the proposed new description for anyURI, but I have not formally asked for any of these changes. If other commentators and members of the working group are happy with changing or not changing these, then so am I. In short: don't let me slow you down. Thank you. Noah
One comment on the proposal in comment #5. I like the idea of following the value space description for string. I think we should go all the way and copy the entire sentence. "The ·value space· of anyURI is the set of finite-length sequences of zero or more characters (as defined in [XML]) that ·match· the Char production from [XML]." Otherwise anyURI could have values that can't be represented in XML and xs:string. (Unless that was intended by the proposal in comment #5.)
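The membership test implied by the sentence quoted in this comment can be sketched in Python. This is only an illustration, assuming the XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]); the function names are hypothetical and appear nowhere in the spec.

```python
def is_xml_char(ch: str) -> bool:
    """True if ch matches the Char production of XML 1.0 (assumed here)."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def in_anyuri_value_space(s: str) -> bool:
    """Hypothetical check: a finite sequence of zero or more Char characters.

    A Python str is always finite, and all() over an empty string is True,
    so the empty sequence is accepted.
    """
    return all(is_xml_char(ch) for ch in s)
```

Under this sketch the empty string and 'http://www.w3.org/' are both accepted, while a string containing U+0000 or U+FFFE is rejected, which is the same set of strings that xs:string admits.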
(In reply to comment #7)
> So I propose to amend my proposed amendment to the wording
> proposal to include analogous changes in hexBinary and
> base64Binary, from
>
> ... the set of possibly empty finite-length sequences of
> binary octets
>
> to
>
> ... the set of finite-length sequences of zero or more
> binary octets
>
> The press of time is a good reason for not going out of our
> way to find minor improvement to the spec and raise new
> issues about them. But in this case we have an issue and need
> to change the spec in either case; if we can do so without
> delaying the WG I don't see that we should not try to make
> the wording as clear as we can, in the time available.
In which case: Presumably a binary octet is a sequence of bits. A sequence of sequences of bits is not a sequence of bits, since a bit is not a sequence of bits. (Please, let's not violate the axiom of regularity!) Therefore, a finite-length sequence of zero or more binary octets is not a sequence of bits. What we want is the concatenation of all the terms of the sequence. So:
  ... the set of finite-length concatenations of sequences of zero or more binary octets.
or "the set of finite-length bit-strings of zero or more binary octets" (because concatenation is generally implied for strings but not sequences; that's one of the more important connotational distinctions between strings and sequences).
And that's why I didn't want to start the slippery slope.
(In reply to comment #7)
> So I propose to amend my proposed amendment to the wording
> proposal to include analogous changes in hexBinary and
> base64Binary, from
>
> ... the set of possibly empty finite-length sequences of
> binary octets
>
> to
>
> ... the set of finite-length sequences of zero or more
> binary octets
Every definition I've come across defining 'octet' in computer science contexts is effectively "bit-string of length 8". So 'binary' is redundant.
(In reply to comment #10)
> ... the set of finite-length concatenations of sequences of zero
> or more binary octets.
>
> or "the set of finite-length bit-strings of zero or more binary octets"
or, perhaps most concisely, "the set of finite-length concatenations of zero or more octets".
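The distinction being argued over, a sequence of octets versus the concatenation of their bits, can be made concrete with a small Python sketch. The variable names and sample values are illustrative only, and writing each octet most-significant bit first is an assumption of this sketch, not something the wording under discussion specifies.

```python
# Two sample octets (values chosen arbitrarily for illustration).
octets = [0b01001000, 0b01101001]

# Reading 1: a finite-length sequence of zero or more octets.
# Its length is counted in octets.
as_octet_sequence = bytes(octets)

# Reading 2: the concatenation of the octets into one bit-string.
# Assumption: each octet is written most-significant bit first.
as_bit_string = "".join(format(o, "08b") for o in octets)

# The empty sequence is a member under either reading.
empty_octets = bytes([])
empty_bits = "".join(format(o, "08b") for o in empty_octets)
```

The first value has length 2 (octets) and the second has length 16 (bits); the two readings describe the same data but are different mathematical objects.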
in re comment #11: do the sources you cite specify whether the high-order bit comes first or last? I had the impression that in network protocols the notion of octet was carefully formulated to remain agnostic on that point. (But IANAEE.) If that's so, then RFC 3548 may be being careful instead of careless when it describes the base 64 encoding as encoding sequences of octets, rather than sequences of bits.
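As a rough illustration of the octet-oriented view, Python's standard `base64` and `binascii` modules take octet sequences (`bytes`) as input and never expose individual bits to the caller. The helper names below are hypothetical, and the sketch says nothing normative about the schema datatypes themselves.

```python
import base64
import binascii


def to_base64_lexical(data: bytes) -> str:
    """Encode a sequence of octets with the standard base 64 alphabet."""
    return base64.b64encode(data).decode("ascii")


def to_hex_lexical(data: bytes) -> str:
    """Encode a sequence of octets as upper-case hexadecimal digits."""
    return binascii.hexlify(data).decode("ascii").upper()


# Sample input: two octets, and the empty sequence of octets.
sample = bytes([0x0F, 0xB7])
empty = b""
```

Both helpers accept `empty` and return the empty string, and decoding the output of either helper yields the original octets, regardless of how a given platform orders bits within an octet.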
This is not worth the time we're spending discussing it. Pick an option (MSM's, SG's, or mine) and go with it. I can live with any.
Decided:
- 3264 (XML Query and XSL WGs): xs:anyURI definition.
  http://www.w3.org/XML/Group/2004/06/xmlschema-2/datatypes.b3264.html
  Summary: the value space of anyURI needs to be specified.
  Note: some discussion in Bugzilla, some amendments proposed. One possible point of controversy: are base64Binary and hexBinary intended to encode sequences of bits or of octets?
  MSM's recommendations: fairly quick.
  - Amend as described in comment 7: in hexBinary and base64Binary, change
      ... the set of possibly empty finite-length sequences of binary octets
    to
      ... the set of finite-length sequences of zero or more binary octets
  - Amend as described in comment 9: in the new 3.3.18.1, read
      The ·value space· of anyURI is the set of finite-length sequences of zero or more characters (as defined in [XML]) that ·match· the Char production from [XML].
  - Amend as suggested in comment 5, in the light of comment 9: in 3.3.18.2 (anyURI lexical mapping) for
      The ·lexical space· of anyURI is the set of possibly empty finite-length character sequences.
    read
      The ·lexical space· of anyURI is the set of finite-length sequences of zero or more characters (as defined in [XML]) that ·match· the Char production from [XML].
  - And for the record, optionally reaffirm in the minutes that base64Binary encodes octet sequences, not (by itself) bit sequences.
As noted in comment 15, the XML Schema WG discussed this issue today and resolved it as described in comment 15. Michael, if you as the originator of the issue would report the disposition to the XSL and XML Query WGs, we'll be grateful. Close or reopen the issue in the usual way to signify agreement or disagreement with our disposition; if we don't hear from you or QT in 10 days or so, we'll assume you are happy with this disposition. Thank you.