This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The notion of identity with regard to targets of elements needs to be clarified. For example, the spec dictates that if multiple schemes are present, they must resolve to the same element. It is not easily specified how to determine whether two elements are in fact the same.
Agree that identity of element information items is not always possible to detect; and because we don't mandate XPath 2.0, we can't rely on node identity defined in "XQuery 1.0 and XPath 2.0 Data Model (XDM)". A concrete example: <p><c/><c/></p> Nothing in the infoset can tell the 2 "c" sub-elements apart. Suggestion for SML core: 1. SML processors MUST NOT treat two element information items as identical (or the same) if they can be distinguished using information in the infoset. (i.e. their infoset properties are not recursively pair-wise equal.) 2. If two element information items have pair-wise equal properties, SML processors MAY still treat them as different, if processors have other ways to distinguish them. Note: The above #2 could be read to allow a (not very useful) conforming SML processor to treat *all* resolvable references as invalid. But it is important to allow it, to be able to use existing XML processing services. (Many XML APIs can tell the difference between the 2 "c" elements in the above example.) It doesn't seem like SML-IF can be more deterministic, because documents are allowed to be included "by reference", and resolving these references may involve using (in Java terms) entity resolvers, which may return the same resource for different URIs, making it very difficult or impossible to tell whether 2 elements are meant to be the same. So I suggest that SML-IF should follow the same suggestion as that for SML core.
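A rough illustration of the <p><c/><c/></p> example, as a Python sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for "existing XML processing services". The helper name infoset_equal is hypothetical, and it compares only a simplified subset of infoset properties (name, attributes, character content, children), so it is not a full infoset comparison.

```python
import xml.etree.ElementTree as ET


def infoset_equal(a, b):
    """Recursively compare a simplified subset of infoset-like properties.

    Assumption: only element name, attributes, character content and
    children are compared; a real infoset comparison covers more properties.
    """
    if a.tag != b.tag or a.attrib != b.attrib or (a.text or "") != (b.text or ""):
        return False
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        # Text following a child belongs to the parent's content, so compare it here.
        if (x.tail or "") != (y.tail or "") or not infoset_equal(x, y):
            return False
    return True


root = ET.fromstring("<p><c/><c/></p>")
first_c, second_c = list(root)

# Nothing in the compared properties tells the two "c" elements apart ...
pairwise_equal = infoset_equal(first_c, second_c)
# ... yet the API still holds two distinct node objects.
same_object = first_c is second_c
```

Under suggestion #2, a processor built on such an API may treat the two "c" elements as different even though they compare pair-wise equal.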
Moved keyword "needsAgreement" from "status whiteboard" field to "keyword" field.
The actual text in the spec says "When a scheme or multiple schemes in a reference resolve to more than one target then the model is declared invalid." While the original author may not have intended this to allow for the cases where one or more schemes is not resolvable, it fits with that interpretation. In view of the discussions at the F2F and Sandy's comment below, I propose that we take the existing wording and strengthen it by adding something like "If the resolution of the reference results in multiple targets and the target elements are not the same then the model is declared invalid." Optionally, we could also add that how a validator determines this is implementation defined.
There are 2 different scenarios here and I think we have to address each separately: 1 - A (one) scheme resolves to more than one target (i.e., the target nodeset consists of >1 node). This is clearly invalid. 2 - Two schemes in the same SML reference each resolve to a valid target (single node). If the 2 targets are not the same, then the model is invalid. However, it is not so easy to determine if the 2 targets are the same or not. Sandy and MSM discussed one proposal during the Oct 16 meeting which I'll try to recap here: a. String-compare the uri path part. If the paths are the same, then xpath can tell us if the fragments are the same. If the fragments are not the same, the targets are not the same. If the paths are not the same, validators can stop here and say the targets are not the same. b. Optionally, if the paths are not the same, validators are free to implement further tests for equality if they choose. c. If all optional tests do not prove equivalence, then the targets are not the same. This applies to URIs. What do we say about EPRs? new schemes?
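Steps a through c above could be sketched roughly as follows in Python. This is only an illustration under several assumptions: "uri path part" is read as the URI without its fragment, plain string equality on the fragment stands in for the XPath-based fragment comparison, and the optional_tests parameter is a hypothetical hook for the further tests allowed in step b.

```python
from urllib.parse import urldefrag


def targets_same_by_uri(uri_a, uri_b, optional_tests=()):
    """Return True if the two URI references are taken to identify the same target."""
    doc_a, frag_a = urldefrag(uri_a)
    doc_b, frag_b = urldefrag(uri_b)
    if doc_a == doc_b:
        # Step a: same document part, so the fragments decide.
        # Assumption: string equality replaces a real XPath comparison.
        return frag_a == frag_b
    # Step b: a validator is free to apply further equality tests.
    for test in optional_tests:
        if test(uri_a, uri_b):
            return True
    # Step c: no test proved equivalence, so the targets are not the same.
    return False
```

A validator that implements no optional tests simply calls targets_same_by_uri(uri_a, uri_b) and stops at the string comparison.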
To restate the proposal from the F2F in a slightly different fashion, and to answer Ginny's questions. The proposal was: 1. There are cases where processors MUST treat nodes as "the same". The only known case is when the targets are identified using URIs in contexts where the URI has all the information about locating/identifying the target; then 2 targets are the same if they are identified by the same (codepoint-by-codepoint comparison) URI. SML URI scheme is one such example. EPR is not because URIs used in EPR don't have all the information. It should be clear from new schemes definitions whether they fall into this category or not. 2. There are cases where processors MUST treat nodes as "different". This happens when there is something available in the element information items for the targets that tells them apart. If there is an infoset property for which the 2 targets have different values, they are different. This applies recursively for complex-valued properties. 3. For all other cases, it's impl-dependent whether they treat nodes as different or same. This (especially #2) may sound like a time-consuming task. But I imagine in most implementations, this can be tested very easily. e.g. in DOM, if you only construct one DOM document for each model instance document, then the "==" comparison suffices.
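The three cases in this proposal can be read as a three-way classification. The Python sketch below is one possible reading, not text from the spec: the names Identity and classify are hypothetical, the two boolean inputs are assumed to be computed elsewhere, and checking case 1 before case 2 is an assumption because the proposal does not say which takes precedence.

```python
from enum import Enum


class Identity(Enum):
    SAME = "same"
    DIFFERENT = "different"
    IMPLEMENTATION_DEPENDENT = "implementation-dependent"


def classify(uri_a, uri_b, uris_are_complete, infoset_differs):
    """Classify two resolved targets according to the three proposed cases.

    uri_a, uri_b: URIs used by the two schemes (None if a scheme uses no URI).
    uris_are_complete: True when both URIs carry all information needed to
        locate the target (true for the SML URI scheme, not for EPR).
    infoset_differs: True when some infoset property tells the targets apart.
    """
    # Case 1: Python's str equality compares codepoint by codepoint.
    if uris_are_complete and uri_a is not None and uri_a == uri_b:
        return Identity.SAME
    # Case 2: something in the element information items distinguishes them.
    if infoset_differs:
        return Identity.DIFFERENT
    # Case 3: everything else is left to the implementation.
    return Identity.IMPLEMENTATION_DEPENDENT
```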
I like Sandy's original proposal (in comment# 1). Repeated below for easy reference: ---------- Suggestion for SML core: 1. SML processors MUST NOT treat two element information items as identical (or the same) if they can be distinguished using information in the infoset. (i.e. their infoset properties are not recursively pair-wise equal.) 2. If two element information items have pair-wise equal properties, SML processors MAY still treat them as different, if processors have other ways to distinguish them. ---------- If we talk about any specific scheme, it brings in additional complexity. Even for a scheme that we all know well, such as the SML uri scheme, it is not easy to define identity looking at the scheme data alone. We may have different URIs pointing to the same document. At runtime, an implementation will resolve 2 or more schemes to get 1 element from each resolution. At this point, the implementation has to decide looking at the element instances whether they are the same. Once the result of each resolution is obtained, the implementation does not really need to look at scheme data to decide whether the element instances are the same or not. It can apply the above criteria to decide equality. In most implementations, where each instance document has only 1 runtime DOM instance, this simply means invoking the == operator yielding a very fast comparison. Most implementations will never need to perform recursive pair-wise comparison. Since these criteria define equality in terms of element info items (leaving out scheme complexity), they are very easy to define and understand.
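The "1 runtime DOM instance per instance document" idea might look like the following Python sketch, using the standard library's xml.dom.minidom. The cache, the document key "doc-1" and the helper names are hypothetical. Python's `is` operator plays the role of the reference comparison that `==` performs on DOM nodes in Java.

```python
from xml.dom import minidom

_documents = {}  # hypothetical cache: document key -> parsed DOM document


def load_document(key, xml_text):
    """Parse each model document at most once so node identity is meaningful."""
    if key not in _documents:
        _documents[key] = minidom.parseString(xml_text)
    return _documents[key]


def nth_child_element(document, index):
    """Stand-in for resolving a scheme: pick a child element by position."""
    elements = [n for n in document.documentElement.childNodes
                if n.nodeType == n.ELEMENT_NODE]
    return elements[index]


XML = "<p><c/><c/></p>"

# Two resolutions that reach the first "c", and one that reaches the second.
target_a = nth_child_element(load_document("doc-1", XML), 0)
target_b = nth_child_element(load_document("doc-1", XML), 0)
target_c = nth_child_element(load_document("doc-1", XML), 1)

same_ab = target_a is target_b
same_ac = target_a is target_c
```

Because the document is parsed only once, the identity check separates the two "c" elements without any recursive pair-wise comparison.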
ok
In case people are converging on one of the 2 proposals, as listed in comment #5 and comment #6, the following example should show their difference: <ref sml:ref="true"> <sml:uri>file:///c:/abc/def.xml</sml:uri> <my_uri_scheme:uri>file:///c:/abc/def.xml</my_uri_scheme:uri> </ref> where "my uri scheme" is identical to SML URI scheme except that it restricts the references to the "file" scheme. Proposal in comment #5 requires that processors MUST treat the 2 schemes as resolving to the same target and the "at most one target" constraint is satisfied. Proposal in comment #6 leaves it to implementations.
I thought about this issue more and I see value in going with the proposal in comment# 5. The proposals in comment# 1 and comment# 5 are essentially identical except for one difference. The one in comment# 5 tightens the restriction on what must be treated equal when sufficient info is available. This is a good thing.
pls fix as per Comment #5
The following has been added to section 4.2.2 Consistent Reference Schemes. The complete section now reads: ---------------- 4.2.2 Consistent Reference Schemes An SML model MUST be declared invalid when a recognized scheme resolves to a target that's different from the target resolved to by another recognized scheme or when one recognized scheme resolves and another does not. To determine if two targets are the same or different, a model validator MUST obey the following rules. 1. A model validator MUST consider both targets to be the same when the scheme is defined such that all information required to locate the target is contained within the scheme and a case-sensitive, codepoint-by-codepoint comparison of the two reference scheme instances determines that the scheme representations are identical. This is the case with the 4.3.1 SML URI Scheme. Two targets MUST be considered the same if they are identified by the same URI as determined by a case-sensitive, codepoint-by-codepoint comparison. New schemes MUST state whether they fall into this category or not. 2. A model validator MUST consider both targets to be different when there is something available in the element information items for the targets that tells them apart. For example, if there is an infoset property for which the 2 targets have different values, they are different. This applies recursively for complex-valued properties. 3. For all other cases, it is implementation-defined whether to treat the targets as the same or not. ============ Note: the EPR scheme definition must be updated to comply with #1.
Based on a recommendation from Sandy, bullet #1 has been clarified. The section now reads: ================================ 4.2.2 Consistent Reference Schemes An SML model MUST be declared invalid when a recognized scheme resolves to a target that's different from the target resolved to by another recognized scheme or when one recognized scheme resolves and another does not. To determine if two targets are the same or different, a model validator MUST obey the following rules. 1. A model validator MUST consider both targets to be the same when (a) the scheme(s) used to locate the targets use URIs or IRIs, (b) these URIs or IRIs contain all information required to locate the targets, and (c) the two URIs or IRIs used to locate the targets are identical using a case-sensitive, codepoint-by-codepoint comparison. The 4.3.1 SML URI Scheme satisfies conditions (a) and (b). Whether new schemes satisfy these conditions will be clear from their scheme definitions. 2. A model validator MUST consider both targets to be different when there is something available in the element information items for the targets that tells them apart. For example, if there is an infoset property for which the 2 targets have different values, they are different. This applies recursively for complex-valued properties. 3. For all other cases, it is implementation-defined whether to treat the targets as the same or not.
Fix per comment #12 to include rewording of last sentence in bullet #1 to state that scheme authors should specify whether the scheme satisfies condition a and b.
Also change section title.
Changed last sentence in bullet #1 from: ------- Whether new schemes satisfy these conditions will be clear from their scheme definitions. ------- to: ------- Authors of new SML reference schemes MUST specify whether or not the scheme satisfies conditions (a) and (b). ------- Changed section title from: Consistent Reference Schemes to: Identical Targets
+1 for applied changes, as described in comments #12 and #15
Some organization changes suggested in http://lists.w3.org/Archives/Public/public-sml/2007Dec/0035.html
The following editorial (organizational) change was made per comment #17. This affects only 4.2.1 and 4.2.2 (and a newly created 4.2.3). ===================== 4.2.1 At Most One Target Every non-null reference MUST target at most one element in a model. When a recognized scheme in a reference resolves to more than one target then the model MUST be declared invalid. 4.2.2 Consistent References An SML model MUST be declared invalid when a recognized scheme resolves to a target that's different from the target resolved to by another recognized scheme or when one recognized scheme resolves and another does not. 4.2.3 Identical Targets To determine if two targets are the same or different, a model validator MUST obey the following rules. ... remainder of section unchanged....
#1 in 4.2.3 contains options a/b/c in the same paragraph. They should use a bulleted list. The same bullet requires scheme authors to specify whether their scheme satisfies a & b. If so, we must update the definitions of URI scheme and EPR scheme accordingly. Currently this is located in 4.2.3 (and only for URI scheme). This should be grouped together with the def of each scheme.
Suggest to merge this requirement "Authors of new SML reference schemes MUST specify whether or not the scheme satisfies conditions (a) and (b)." with the 3rd requirement in scheme definition: "3. An assertion whether the scheme can be used in an SML-IF [SML-IF 1.1] document to reference documents in the interchange set." They are essentially about the same condition: URIs/IRIs that contain all the information. I had hoped that we didn't need the 3rd requirement, because whether (a) and (b) are satisfied should be clear from the 2nd requirement in scheme definition: "2. The set of rules that, when evaluated, resolve the containing reference to a set of target element nodes." But it's now clear that many WG members feel the need to explicitly specify this requirement. Then maybe we should try to define a term for things that satisfy (a) and (b), then refer to it from all 3 places: 1. 3rd requirement of scheme definition (section 4.3) 2. target identity (section 4.2.3) 3. (In IF) references that get baseURI/alias treatment (IF section 5.3.3) Something like "self-contained URI" or "complete URI" or ... (Not good at coming up with names.)
I agree with comment #20 and with making an appropriate change to the URI scheme per comment #19.
Fix per comment #20 and comment #19.
=>New definition in spec: Target-complete URI A target-complete URI is a URI or IRI that contains all the information required to locate a target of an SML reference. ========= =>section 4.2.3, 1st point now reads: 4.2.3 Identical Targets To determine if two targets are the same or different, a model validator MUST obey the following rules. 1. A model validator MUST consider both targets to be the same when both of the following are true. a. The definition of the scheme(s) used to locate the targets specifies that the scheme uses target-complete URIs. [4.3 Reference Schemes] b. The two URIs or IRIs used to locate the targets are identical using a case-sensitive, codepoint-by-codepoint comparison. ======== =>section 4.3, rules for defining reference schemes now reads: All of the following MUST be defined for each SML reference scheme, 1. The set of rules that, when satisfied, identify a reference element as containing one and the only instance of the scheme within that reference element. 2. The set of rules that, when evaluated, resolve the containing reference to a set of target element nodes. 3. An assertion that states whether or not the scheme uses target-complete URIs. =========== =>section 5.3.3 in SML-IF now reads: When processing an SML-IF document, there are 3 categories of URI references that may need to be resolved: 1. schemaLocation attributes on xs:include and xs:redefine in schema documents, when they are model definition documents. 2. Target-complete URIs [SML 1.1] used in SML reference schemes. For a URI reference to be in this category, its non-fragment URI components have all the information to uniquely identify at most one model document that potentially contains the target(s) of the URI reference. 3. URI references used in SML reference schemes which are not in category #2. ...
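As a hedged illustration of how rule 1 of 4.2.3 ties into the new assertion in section 4.3, the Python sketch below models a scheme definition that carries a target-complete flag. All class, field and constant names are hypothetical. The EPR entry is marked as not target-complete only because the discussion above says URIs used in EPRs do not carry all the information, and "my_uri_scheme" is the made-up scheme from the earlier example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemeDefinition:
    name: str
    # The assertion required by item 3 of the scheme definition rules.
    uses_target_complete_uris: bool


@dataclass(frozen=True)
class SchemeInstance:
    scheme: SchemeDefinition
    uri: str


def must_be_same(a, b):
    """Rule 1: both schemes assert target-complete URIs (condition a) and the
    URIs are identical codepoint by codepoint, case-sensitively (condition b)."""
    return (
        a.scheme.uses_target_complete_uris
        and b.scheme.uses_target_complete_uris
        and a.uri == b.uri
    )


SML_URI = SchemeDefinition("sml:uri", True)
MY_URI = SchemeDefinition("my_uri_scheme:uri", True)  # hypothetical scheme
EPR = SchemeDefinition("epr", False)                  # assumption, see lead-in

ref_sml = SchemeInstance(SML_URI, "file:///c:/abc/def.xml")
ref_my = SchemeInstance(MY_URI, "file:///c:/abc/def.xml")
ref_epr = SchemeInstance(EPR, "file:///c:/abc/def.xml")
```

With these definitions, must_be_same(ref_sml, ref_my) holds, while a pair involving ref_epr falls through to rules 2 and 3.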
Per 1/3/08 meeting, fixed URI scheme to agree with new scheme definition requirements. Changed 3rd bullet point in URI scheme definition FROM: 3. The SML URI Scheme can be used in an SML-IF [SML-IF 1.1] document to reference documents from the interchange set. TO: 3. The SML URI Scheme's uri element contains a target-complete URI.
Some further changes suggested in http://lists.w3.org/Archives/Public/public-sml/2008Jan/0056.html
*** Bug 5387 has been marked as a duplicate of this bug. ***
Fix per comment #25 along with Kirk's response.
RE item 27: See comment regarding section 5.3.3 in SML-IF in: http://lists.w3.org/Archives/Public/public-sml/2008Jan/0058.html
Fixed per Sandy's proposal, comment #25. See SML diff at http://dev.w3.org/cvsweb/2007/xml/sml/build/sml.html.diff?r1=1.146&r2=1.147&f=h In SML-IF, the beginning of section 5.3.3 now reads: 5.3.3 URI Reference Processing When processing an SML-IF document, there are 3 categories of URI references that may need to be resolved: 1. schemaLocation attributes on xs:include and xs:redefine in schema documents, when they are model definition documents. 2. Target-complete URIs [SML 1.1] used in SML reference schemes. For a URI reference to be in this category, its non-fragment URI components have all the information to uniquely identify at most one model document that potentially contains the target(s) of the URI reference. 3. URI references used in SML reference schemes which are not in category #2.
All changes made to the editors' drafts look good. Note that the text quoted in comment #29 for SML-IF seems to be the old text. The editors' draft has the correct version.