4992 – Object identity needs to be clarified

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED FIXED

Alias:	None

Product:	SML
Classification:	Unclassified
Component:	Core
Version:	unspecified
Hardware:	PC other

Importance:	P2 normal
Target Milestone:	LC
Assignee:	Virginia Smith
QA Contact:	SML Working Group discussion list

URL:
Whiteboard:
Keywords:	resolved

Duplicates (1):	5387
Depends on:
Blocks:

Reported:	2007-08-29 17:37 UTC by James Lynn
Modified:	2008-02-07 20:56 UTC
CC List:	1 user

Description James Lynn 2007-08-29 17:37:52 UTC

The notion of identity with regard to targets of elements needs to be clarified. For example, the spec dictates that if multiple schemes are present, they must resolve to the same element. It is not easily specified how to determine whether two elements are in fact the same.

Comment 1 Sandy Gao 2007-09-19 20:23:54 UTC

Agree that identity of element information items is not always possible to detect; and because we don't mandate XPath 2.0, we can't rely on node identity defined in "XQuery 1.0 and XPath 2.0 Data Model (XDM)".

A concrete example:
<p><c/><c/></p>
Nothing in the infoset can tell the 2 "c" sub-elements apart.

Suggestion for SML core:
1. SML processors MUST NOT treat two element information items as identical (or the same) if they can be distinguished using information in the infoset. (i.e. their infoset properties are not recursively pair-wise equal.)
2. If two element information items have pair-wise equal properties, SML processors MAY still treat them as different, if processors have other ways to distinguish them.

Note: The above #2 could be read to allow a (not very useful) conforming SML processor to treat *all* resolvable references as invalid. But it is important to allow it, to be able to use existing XML processing services. (Many XML APIs can tell the difference between the 2 "c" elements in the above example.)
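
To make the note concrete, here is a minimal Python sketch using the standard-library xml.dom.minidom module (picked only for illustration; the thread does not name a particular API). It parses the example above and checks that the two "c" elements have the same name, no attributes, and no children, while the API still returns two distinct node objects.

```python
from xml.dom.minidom import parseString

doc = parseString("<p><c/><c/></p>")
first, second = doc.getElementsByTagName("c")

# Nothing in the content of the two elements tells them apart:
# same name, no attributes, no children.
same_name = first.tagName == second.tagName
no_attrs = not first.hasAttributes() and not second.hasAttributes()
no_children = not first.hasChildNodes() and not second.hasChildNodes()

# The DOM API nevertheless hands back two separate node objects.
distinct_objects = first is not second

print(same_name, no_attrs, no_children, distinct_objects)
```

A processor built on such an API has "other ways to distinguish" the elements in the sense of suggestion #2, even though the element content does not.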

It doesn't seem like SML-IF can be more deterministic, because documents are allowed to be included "by reference", and resolving these references may involve using (in Java terms) entity resolvers, which may return the same resource for different URIs, making it very difficult or impossible to tell whether 2 elements are meant to be the same.

So I suggest that SML-IF should follow the same suggestion as that for SML core.

Comment 2 Virginia Smith 2007-10-03 06:12:50 UTC

Moved keyword "needsAgreement" from "status whiteboard" field to "keyword" field.

Comment 3 James Lynn 2007-11-06 01:47:42 UTC

The actual text in the spec says "When a scheme or multiple schemes in a reference resolve to more than one target then the model is declared invalid." While the original author may not have intended this to allow for the cases where one or more schemes is not resolvable, it fits with that interpretation.
In view of the discussions at the F2F and Sandy's comment below, I propose that we take the existing wording and strengthen it by adding something like "If the resolution of the reference results in multiple targets and the target elements are not the same then the model is declared invalid." Optionally, we could also add that how a validator determines this is implementation defined.

Comment 4 Virginia Smith 2007-11-07 23:34:48 UTC

There are 2 different scenarios here and I think we have to address each separately:

1 - A (one) scheme resolves to more than one target (i.e., the target nodeset consists of >1 node). This is clearly invalid.

2 - Two schemes in the same SML reference each resolve to a valid target (single node). If the 2 targets are not the same, then the model is invalid. However, it is not so easy to determine if the 2 targets are the same or not. Sandy and MSM discussed one proposal during the Oct 16 meeting which I'll try to recap here:

a. String-compare the uri path part. If the paths are the same, then xpath can tell us if the fragments are the same. If the fragments are not the same, the targets are not the same. If the paths are not the same, validators can stop here and say the targets are not the same. 
b. Optionally, if the paths are not the same, validators are free to implement further tests for equality if they choose.
c. If all optional tests do not prove equivalence, then the targets are not the same.
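
As a rough sketch of steps a-c (not a definitive implementation), the comparison could look like the Python below. The function name targets_same and the extra_tests parameter are hypothetical, and a plain string comparison of fragments stands in for the XPath-based check mentioned in step a, which is a simplification.

```python
from urllib.parse import urldefrag


def targets_same(uri_a, uri_b, extra_tests=()):
    """Return True only when the two URIs are shown to identify one target."""
    doc_a, frag_a = urldefrag(uri_a)
    doc_b, frag_b = urldefrag(uri_b)
    if doc_a == doc_b:
        # Step a: same path part, so the fragments decide.
        # Assumption: string equality stands in for the XPath check.
        return frag_a == frag_b
    # Step b: optional, implementation-specific equality tests.
    # Step c: if none of them proves equivalence, the targets differ.
    return any(test(uri_a, uri_b) for test in extra_tests)


print(targets_same("file:///a.xml#x", "file:///a.xml#x"))
print(targets_same("file:///a.xml#x", "file:///b.xml#x"))
```

With no optional tests supplied, differing path parts always yield "not the same", which matches the point where step a says validators can stop.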

This applies to URIs. What do we say about EPRs? new schemes?

Comment 5 Sandy Gao 2007-11-08 17:13:54 UTC

To restate the proposal from the F2F in a slightly different fashion, and to answer Ginny's questions, the proposal was:

1. There are cases where processors MUST treat nodes as "the same".

The only known case is when the targets are identified using URIs in contexts where the URI has all the information about locating/identifying the target, then 2 targets are the same if they are identified by the same (codepoint-by-codepoint comparison) URI.

SML URI scheme is one such example. EPR is not because URIs used in EPR don't have all the information. It should be clear from new schemes definitions whether they fall into this category or not.

2. There are cases where processors MUST treat nodes as "different".

This happens when there is something available in the element information items for the targets that tells them apart. If there is an infoset property for which the 2 targets have different values, they are different. This applies recursively for complex-valued properties.

3. For all other cases, it's impl-dependent whether they treat nodes as different or same.


This (especially #2) may sound like a time-consuming task. But I imagine in most implementations, this can be tested very easily. e.g. in DOM, if you only construct one DOM document for each model instance document, then the "==" comparison suffices.
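
For illustration, the recursive test in #2 might be sketched as follows in Python with xml.dom.minidom. This is an assumption-laden simplification: the hypothetical helper distinguishable compares only names, attributes, and children, which is a subset of the infoset properties, and Python's "is" plays the role of the DOM "==" comparison.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def attribute_map(element):
    # Key attributes by (namespace, local name) so order does not matter.
    return {(a.namespaceURI, a.localName): a.value
            for a in element.attributes.values()}


def distinguishable(a, b):
    """True if some compared property differs (simplified infoset subset)."""
    if a.nodeType != b.nodeType:
        return True
    if a.nodeType == Node.ELEMENT_NODE:
        if (a.namespaceURI, a.localName) != (b.namespaceURI, b.localName):
            return True
        if attribute_map(a) != attribute_map(b):
            return True
        if len(a.childNodes) != len(b.childNodes):
            return True
        # Recurse pair-wise into the children.
        return any(distinguishable(x, y)
                   for x, y in zip(a.childNodes, b.childNodes))
    # Text, comment, and similar nodes: compare their character data.
    return a.nodeValue != b.nodeValue


doc = parseString('<p><c/><c/><c id="x"/></p>')
c1, c2, c3 = doc.getElementsByTagName("c")
print(distinguishable(c1, c2), distinguishable(c1, c3), c1 is c2)
```

When only one DOM document is built per model instance document, the identity check alone answers the question and the recursive walk is never needed.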

Comment 6 Kumar Pandit 2007-11-14 06:52:47 UTC

I like Sandy's original proposal (in comment# 1). Repeated below for easy reference:

----------
Suggestion for SML core:
1. SML processors MUST NOT treat two element information items as identical (or the same) if they can be distinguished using information in the infoset. (i.e. their infoset properties are not recursively pair-wise equal.)
2. If two element information items have pair-wise equal properties, SML processors MAY still treat them as different, if processors have other ways to distinguish them.
----------

If we talk about any specific scheme, it brings in additional complexity. Even for a scheme that we all know well, such as the SML uri scheme, it is not easy to define identity looking at the scheme data alone. We may have different URIs pointing to the same document. At runtime, an implementation will resolve 2 or more schemes to get 1 element from each resolution. At this point, the implementation has to decide looking at the element instances whether they are the same. Once the result of each resolution is obtained, the implementation does not really need to look at scheme data to decide whether the element instances are the same or not. It can apply the above criteria to decide equality. In most implementations, where each instance document has only 1 runtime DOM instance, this simply means invoking the == operator yielding a very fast comparison. Most implementations will never need to perform recursive pair-wise comparison.

Since these criteria define equality in terms of element info items (leaving out scheme complexity), they are very easy to define and understand.

Comment 7 John Arwe 2007-11-15 17:27:18 UTC

ok

Comment 8 Sandy Gao 2007-11-15 18:30:39 UTC

In case people are converging on one of the 2 proposals, as listed in comment #5 and comment #6, the following example should show their difference:

<ref sml:ref="true">
  <sml:uri>file:///c:/abc/def.xml</sml:uri>
  <my_uri_scheme:uri>file:///c:/abc/def.xml</my_uri_scheme:uri>
</ref>

where "my uri scheme" is identical to SML URI scheme except that it restricts the references to the "file" scheme.

Proposal in comment #5 requires that processors MUST treat the 2 schemes as resolving to the same target and the "at most one target" constraint is satisfied.

Proposal in comment #6 leaves it to implementations.
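
The check that the comment #5 proposal turns on can be sketched in Python as below. The namespace URIs are placeholders invented so that the example parses; they are not the real SML or "my uri scheme" namespaces. Python string equality compares code points, with no case folding or normalization.

```python
from xml.dom.minidom import parseString

# Placeholder namespace URIs, for illustration only.
REF = """<ref xmlns:sml="urn:example:sml"
     xmlns:my_uri_scheme="urn:example:my-uri-scheme"
     sml:ref="true">
  <sml:uri>file:///c:/abc/def.xml</sml:uri>
  <my_uri_scheme:uri>file:///c:/abc/def.xml</my_uri_scheme:uri>
</ref>"""

ref = parseString(REF).documentElement

# Collect the URI text carried by each scheme element.
uris = [child.firstChild.data
        for child in ref.childNodes
        if child.nodeType == child.ELEMENT_NODE]

# Under comment #5, identical URIs that carry all locating information
# mean the processor MUST treat the targets as the same.
must_be_same = len(uris) == 2 and uris[0] == uris[1]
print(must_be_same)
```

Under the comment #6 proposal the same strings would carry no obligation; the processor would compare the resolved elements instead.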

Comment 9 Kumar Pandit 2007-11-18 07:03:58 UTC

I thought about this issue more and I see value in going with the proposal in comment# 5. The proposals in comment# 1 and comment# 5 are essentially identical except for one difference. The one in comment# 5 tightens the restriction on what must be treated equal when sufficient info is available. This is a good thing.

Comment 10 Pratul Dublish 2007-11-19 21:46:09 UTC

pls fix as per Comment #5

Comment 11 Virginia Smith 2007-11-29 16:49:29 UTC

The following has been added to section 4.2.2 Consistent Reference Schemes. The complete section now reads:

----------------
4.2.2 Consistent Reference Schemes

An SML model MUST be declared invalid when a recognized scheme resolves to a target that's different from the target resolved to by another recognized scheme or when one recognized scheme resolves and another does not.

To determine if two targets are the same or different, a model validator MUST obey the following rules.

1. A model validator MUST consider both targets to be the same when the scheme is defined such that all information required to locate the target is contained within the scheme and a case-sensitive, codepoint-by-codepoint comparison of the two reference scheme instances determines that the scheme representations are identical. This is the case with the 4.3.1 SML URI Scheme. Two targets MUST be considered the same if they are identified by the same URI as determined by a case-sensitive, codepoint-by-codepoint comparison. New schemes MUST state whether they fall into this category or not.
2. A model validator MUST consider both targets to be different when there is something available in the element information items for the targets that tells them apart. For example, if there is an infoset property for which the 2 targets have different values, they are different. This applies recursively for complex-valued properties.
3. For all other cases, it is implementation-defined whether to treat the targets as the same or not.

============
Note: the EPR scheme definition must be updated to comply with #1.

Comment 12 Virginia Smith 2007-11-30 23:50:52 UTC

Based on a recommendation from Sandy, bullet #1 has been clarified. The section now reads:

================================
4.2.2 Consistent Reference Schemes

An SML model MUST be declared invalid when a recognized scheme resolves to a target that's different from the target resolved to by another recognized scheme or when one recognized scheme resolves and another does not.

To determine if two targets are the same or different, a model validator MUST obey the following rules.

   1. A model validator MUST consider both targets to be the same when (a) the scheme(s) used to locate the targets use URIs or IRIs, (b) these URIs or IRIs contain all information required to locate the targets, and (c) the two URIs or IRIs used to locate the targets are identical using a case-sensitive, codepoint-by-codepoint comparison. The 4.3.1 SML URI Scheme satisfies conditions (a) and (b). Whether new schemes satisfy these conditions will be clear from their scheme definitions.
   2. A model validator MUST consider both targets to be different when there is something available in the element information items for the targets that tells them apart. For example, if there is an infoset property for which the 2 targets have different values, they are different. This applies recursively for complex-valued properties.
   3. For all other cases, it is implementation-defined whether to treat the targets as the same or not.
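
One way to picture the three rules is as a function with a three-valued result. The Python below is only a sketch: the names Verdict and compare_targets are hypothetical, satisfies_a_and_b stands for conditions (a) and (b) holding for the scheme(s) involved, and infoset_differs stands for the outcome of the rule 2 test, which is computed elsewhere.

```python
from enum import Enum


class Verdict(Enum):
    SAME = "same"
    DIFFERENT = "different"
    IMPLEMENTATION_DEFINED = "implementation-defined"


def compare_targets(uri_a, uri_b, satisfies_a_and_b, infoset_differs):
    # Rule 1: conditions (a) and (b) hold and the URIs are identical
    # code point for code point (Python str equality does exactly that).
    if satisfies_a_and_b and uri_a is not None and uri_a == uri_b:
        return Verdict.SAME
    # Rule 2: something in the element information items differs.
    if infoset_differs:
        return Verdict.DIFFERENT
    # Rule 3: everything else is left to the implementation.
    return Verdict.IMPLEMENTATION_DEFINED


print(compare_targets("file:///a.xml#x", "file:///a.xml#x", True, False))
print(compare_targets("file:///a.xml#x", "file:///b.xml#x", True, False))
```

Checking rule 1 first is a design choice of this sketch; the quoted text does not say which rule wins if both appear to apply.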

Comment 13 Virginia Smith 2007-12-03 21:31:06 UTC

Fix per comment #12 to include rewording of last sentence in bullet #1 to state that scheme authors should specify whether the scheme satisfies condition a and b.

Comment 14 Virginia Smith 2007-12-03 21:45:01 UTC

Also change section title.

Comment 15 Virginia Smith 2007-12-03 22:13:14 UTC

Changed last sentence in bullet #1 from:
-------
Whether new schemes satisfy these conditions will be clear from their scheme definitions.
-------
to:
-------
Authors of new SML reference schemes MUST specify whether or not the scheme satisfies conditions (a) and (b).
-------

Changed section title 

from:  Consistent Reference Schemes
to:    Identical Targets

Comment 16 Valentina Popescu 2007-12-05 14:44:32 UTC

+1 for applied changes, as described in comments #12 and #15

Comment 17 Sandy Gao 2007-12-05 16:47:57 UTC

Some organization changes suggested in
http://lists.w3.org/Archives/Public/public-sml/2007Dec/0035.html

Comment 18 Virginia Smith 2007-12-05 22:27:51 UTC

The following editorial (organizational) change was made per comment #17. This affects only 4.2.1 and 4.2.2 (and a newly created 4.2.3).
=====================
4.2.1 At Most One Target

Every non-null reference MUST target at most one element in a model. When a recognized scheme in a reference resolves to more than one target then the model MUST be declared invalid.

4.2.2 Consistent References

An SML model MUST be declared invalid when a recognized scheme resolves to a target that's different from the target resolved to by another recognized scheme or when one recognized scheme resolves and another does not.

4.2.3 Identical Targets

To determine if two targets are the same or different, a model validator MUST obey the following rules. 

... remainder of section unchanged....
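
A minimal sketch of how a validator might apply 4.2.1 and 4.2.2 to a single reference follows. The function name reference_is_valid and its inputs are hypothetical: resolutions holds one list of resolved targets per recognized scheme, and same_target is whatever 4.2.3 comparison the validator uses.

```python
def reference_is_valid(resolutions, same_target):
    """Apply 4.2.1 and 4.2.2 to the per-scheme resolution results."""
    # 4.2.1: no recognized scheme may resolve to more than one target.
    if any(len(targets) > 1 for targets in resolutions):
        return False
    resolved = [targets[0] for targets in resolutions if targets]
    # 4.2.2: one scheme resolving while another does not is invalid.
    if resolved and len(resolved) != len(resolutions):
        return False
    # 4.2.2: every resolved target must be the same as the first one.
    return all(same_target(resolved[0], other) for other in resolved[1:])


same = lambda a, b: a is b
node = object()
print(reference_is_valid([[node], [node]], same))
print(reference_is_valid([[node], [object()]], same))
```

In this sketch a reference for which no scheme resolves is treated as valid, on the reading that "at most one target" includes zero targets.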

Comment 19 Kumar Pandit 2007-12-06 06:46:12 UTC

#1 in 4.2.3 contains options a/b/c in the same paragraph. They should use a bulleted list.

The same bullet requires scheme authors to specify whether their scheme satisfies a & b. If so, we must update the definitions of URI scheme and EPR scheme accordingly. Currently this is located in 4.2.3 (and only for URI scheme). This should be grouped together with the def of each scheme.

Comment 20 Sandy Gao 2007-12-06 14:31:20 UTC

Suggest to merge this requirement

"Authors of new SML reference schemes MUST specify whether or not the scheme
satisfies conditions (a) and (b)."

with the 3rd requirement in scheme definition:

"3. An assertion whether the scheme can be used in an SML-IF [SML-IF 1.1] document to reference documents in the interchange set."

They are essentially about the same condition: URIs/IRIs that contain all the information.

I had hoped that we didn't need the 3rd requirement, because whether (a) and (b) are satisfied should be clear from the 2nd requirement in scheme definition:

"2. The set of rules that, when evaluated, resolve the containing reference to a set of target element nodes."

But it's now clear that many WG members feel the need to explicitly specify this requirement. Then maybe we should try to define a term for things that satisfy (a) and (b), then refer to it from all 3 places:
1. 3rd requirement of scheme definition (section 4.3)
2. target identity (section 4.2.3)
3. (In IF) references that get baseURI/alias treatment (IF section 5.3.3)

Something like "self-contained URI" or "complete URI" or ... (Not good at coming up with names.)

Comment 21 Virginia Smith 2007-12-13 18:13:23 UTC

I agree with comment #20 and with making an appropriate change to the URI scheme per comment #19.

Comment 22 Virginia Smith 2007-12-13 19:19:08 UTC

Fix per comment #20 and comment #19.

Comment 23 Virginia Smith 2008-01-03 16:44:12 UTC

=>New definition in spec:

Target-complete URI
    A target-complete URI is a URI or IRI that contains all the information required to locate a target of an SML reference.

=========
=>section 4.2.3, 1st point now reads: 

4.2.3 Identical Targets

To determine if two targets are the same or different, a model validator MUST obey the following rules.

   1.   A model validator MUST consider both targets to be the same when both of the following are true.
         a.  The definition of the scheme(s) used to locate the targets specifies that the scheme uses target-complete URIs. [4.3 Reference Schemes]
         b.  The two URIs or IRIs used to locate the targets are identical using a case-sensitive, codepoint-by-codepoint comparison.

========
=>section 4.3, rules for defining reference schemes now reads:

All of the following MUST be defined for each SML reference scheme:

   1. The set of rules that, when satisfied, identify a reference element as containing one and the only instance of the scheme within that reference element.
   2. The set of rules that, when evaluated, resolve the containing reference to a set of target element nodes.
   3. An assertion that states whether or not the scheme uses target-complete URIs.
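
Read as a data structure, the three required items might be modelled as below. This is purely illustrative Python: the ReferenceScheme class, its field names, and the dictionary stand-in for a reference element are all assumptions, not anything defined by the spec.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ReferenceScheme:
    name: str
    # Requirement 1: does this reference element contain the scheme?
    recognizes: Callable[[dict], bool]
    # Requirement 2: resolve the reference to a set of targets.
    resolve: Callable[[dict], List[object]]
    # Requirement 3: does the scheme use target-complete URIs?
    uses_target_complete_uris: bool


# Toy stand-in for the SML URI scheme: the "reference element" is a dict
# and resolution just echoes the URI instead of fetching an element.
uri_scheme = ReferenceScheme(
    name="uri",
    recognizes=lambda ref: "uri" in ref,
    resolve=lambda ref: [ref["uri"]],
    uses_target_complete_uris=True,
)

ref = {"uri": "file:///c:/abc/def.xml"}
print(uri_scheme.recognizes(ref), uri_scheme.resolve(ref))
```

Keeping the third item as an explicit flag lets the identity rule in 4.2.3 consult the scheme definition rather than infer completeness from the resolution rules.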

===========
=>section 5.3.3 in SML-IF now reads:

When processing an SML-IF document, there are 3 categories of URI references that may need to be resolved:

   1. schemaLocation attributes on xs:include and xs:redefine in schema documents, when they are model definition documents.
   2. Target-complete URIs [SML 1.1] used in SML reference schemes. For a URI reference to be in this category, its non-fragment URI components have all the information to uniquely identify at most one model document that potentially contains the target(s) of the URI reference.
   3. URI references used in SML reference schemes which are not in category #2. 
...

Comment 24 Virginia Smith 2008-01-08 18:11:11 UTC

Per 1/3/08 meeting, fixed URI scheme to agree with new scheme definition requirements. Changed 3rd bullet point in URI scheme definition
FROM:
3. The SML URI Scheme can be used in an SML-IF [SML-IF 1.1] document to reference documents from the interchange set.

TO:

3. The SML URI Scheme's uri element contains a target-complete URI.

Comment 25 Sandy Gao 2008-01-17 15:38:44 UTC

Some further changes suggested in

http://lists.w3.org/Archives/Public/public-sml/2008Jan/0056.html

Comment 26 Virginia Smith 2008-01-17 19:40:25 UTC

*** Bug 5387 has been marked as a duplicate of this bug. ***

Comment 27 Virginia Smith 2008-01-17 19:58:35 UTC

Fix per comment #25 along with Kirk's response.

Comment 28 Kirk Wilson 2008-01-17 20:31:46 UTC

RE item 27:  See comment regarding section 5.3.3 in SML-IF in:

http://lists.w3.org/Archives/Public/public-sml/2008Jan/0058.html

Comment 29 Virginia Smith 2008-01-28 21:44:15 UTC

Fixed per Sandy's proposal, comment #25. See SML diff at
http://dev.w3.org/cvsweb/2007/xml/sml/build/sml.html.diff?r1=1.146&r2=1.147&f=h

In SML-IF, the beginning of section 5.3.3 now reads:
5.3.3 URI Reference Processing

When processing an SML-IF document, there are 3 categories of URI references that may need to be resolved:

   1. schemaLocation attributes on xs:include and xs:redefine in schema documents, when they are model definition documents.
   2. Target-complete URIs [SML 1.1] used in SML reference schemes. For a URI reference to be in this category, its non-fragment URI components have all the information to uniquely identify at most one model document that potentially contains the target(s) of the URI reference.
   3. URI references used in SML reference schemes which are not in category #2.

Comment 30 Sandy Gao 2008-01-31 18:34:47 UTC

All changes made to the editors' drafts look good.

Note that the text quoted in comment #29 for SML-IF seems to be the old text. The editors' draft has the correct version.