This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
I had always assumed that when we specified processor conformance profiles, i.e. what's now captured as Checklist of implementation-defined features and Terminology for implementation-defined features (§D), we would include an axis for schemaLocation hints. These might be captured in something like:

Appendix D.2.5: xsi:schemaLocation policies

Unconditionally follow xsi:schemaLocation
    Applies to a processor that dereferences every supplied xsi:schemaLocation, and which reflects a (fatal) processor-specific error if any one or more such references fail to resolve to schema documents for the appropriate namespace.

Conditionally follow xsi:schemaLocation
    Same as above, but no error is reflected if any one or more such references fail to resolve, resolve to something other than a schema document, or resolve to a schema document for the wrong namespace. If any of those conditions occur, then that schemaLocation is treated as if it were not supplied.

Unconditionally ignore xsi:schemaLocation
    Applies to a processor which in all cases ignores xsi:schemaLocation attributes in instance documents.

We might or might not also have a similar:

Appendix D.2.6: Policies for schemaLocation attributes on xsd:import

I think we should briefly discuss the possibility of including such terminology. We obviously have users who wish to have schemaLocation hints reliably followed or reliably ignored, and I think that providing this terminology will help in a) the documentation of processors providing such features and b) the specification of systems that use XML Schema and that depend on particular policies for schemaLocation handling.

Noah
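To make the three proposed policies concrete, here is a minimal Python sketch of how a processor might apply them to the hints found in an instance. Nothing in it comes from the specification: the names `HintPolicy`, `SchemaLocationError`, `apply_hints`, and the `resolve` callback are all hypothetical, and the callback is assumed to return the target namespace of the schema document found at a URI, or `None` when nothing usable is found.

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple


class HintPolicy(Enum):
    """Hypothetical names for the three policies proposed above."""
    FOLLOW_UNCONDITIONALLY = "unconditionally follow"
    FOLLOW_CONDITIONALLY = "conditionally follow"
    IGNORE_UNCONDITIONALLY = "unconditionally ignore"


class SchemaLocationError(Exception):
    """Stands in for the fatal, processor-specific error."""


# Assumption: resolve(uri) returns the target namespace of the schema
# document at `uri`, or None if the URI does not yield a schema document.
Resolver = Callable[[str], Optional[str]]


def apply_hints(
    hints: List[Tuple[str, str]],
    policy: HintPolicy,
    resolve: Resolver,
) -> List[Tuple[str, str]]:
    """Return the (namespace, uri) hints the processor actually uses."""
    if policy is HintPolicy.IGNORE_UNCONDITIONALLY:
        return []  # never dereference any hint

    accepted = []
    for namespace, uri in hints:
        found_namespace = resolve(uri)
        if found_namespace == namespace:
            accepted.append((namespace, uri))
        elif policy is HintPolicy.FOLLOW_UNCONDITIONALLY:
            raise SchemaLocationError(
                f"hint {uri!r} did not yield a schema for {namespace!r}"
            )
        # FOLLOW_CONDITIONALLY: treat the hint as if it were not supplied.
    return accepted
```

The only difference between the two "follow" policies in this sketch is what happens on a bad hint: one raises, the other silently drops the hint, which mirrors the wording of the proposal.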
This information (unconditionally or conditionally follow, or unconditionally ignore, schemaLocation hints) looks at first glance as if it were mostly covered by the current appendix D.2 (both in the most recent public working draft, and in the status quo documents). The list of possible component sources in D.2.1 includes schemaLocation hints, and the introductory prose in D.2 says "General-purpose processors SHOULD ... provide user control over which methods are used and how to fall back in case of failure." That covers the distinction drawn in the sample text in the description, does it not? But since the originator of the comment is clearly familiar with appendix D, I assume the comment is asking for something that is not in fact already there. I don't know what, though. Could you elaborate? D.2 does not provide terminology for "how to fall back in case of failure"; is the proposal in essence that appendix D should define standard terminology to distinguish fatal errors, non-fatal errors, warnings, or silence on the part of processors? I wonder: is something like that really needed? The topic worries me; I fear that discussion on that topic would prove to be a tar-pit; members of the WG and readers of WG minutes will recall that even the distinctions made in 5.2 between strict wildcard validation and lax wildcard validation struck some WG members as saying too much about the context within which validators operate. Trying to provide standard terminology for behavior when a URI does or doesn't resolve, or resolves to something unexpected, seems like a very high-cost, and relatively low-benefit, errand. If serious readers believe that D.2 as currently drafted requires either that references always succeed or that failures of reference never be errors, then it might be worth adding a sentence or two to dispel that confusion. 
If on the other hand the proposal is that we should provide such terminology, but ONLY for use in describing behavior vis-a-vis schemaLocation hints, and not for use when other locations are consulted, then I don't understand the motive for the lack of orthogonality. (In reviewing the relevant text just now, I note two points that need correction: in the intro to D, for "and to provide user control" read "and provide user control". And in the list of component sources, either edit the entry for schemaLocation hints to cover schemaLocation hints in schema documents [hints in the case of import, at least], or add a separate entry for them.)
During the WG telcon on 16 March, the WG adopted a wording proposal which addresses this issue by adding new terms to the appendix on terminology for process-variable behavior in schema construction. So I'm marking this as FIXED. Noah, as the originator, please change the status from RESOLVED to CLOSED to indicate your assent to the decision; if you don't do so, in a couple of weeks someone else will on your behalf.
I believe that there is in principle an opportunity to do better if we had the time and were so inclined, but I think that what the workgroup has agreed is an acceptable compromise. As a signal of the sorts of things I think one could do, one could have properties such as: schemaLocationHintsIgnored: if true, the processor guarantees not to dereference a URI as a result of its appearance in an instance schemaLocation. There are some other variants possible in principle. Given my feelings, I would have some temptation to resolve the issue with status LATER, but for several reasons I will mark it CLOSED: 1) I think this is in the spirit of what the workgroup has agreed, and I have no objection at all to that agreement. While this is not the only possible or most aggressive resolution, it is a reasonable one IMO. 2) The terminology in Appendix D in no way restricts the conformance profiles that others may choose to document. Accordingly, if any other conventions prove important, no changes to the recommendation will be necessary. 3) There is always the opportunity to open a new issue if new information obtained as a result of experience with Schema 1.1 suggests that further work would be beneficial. Noah
For the record, the reason the editors did not propose, and the WG did not consider, adding a keyword like schemaLocationHintsIgnored with the semantics described in comment #3 is that the proposition "this process ignores schema location hints in the instance" is already expressible with the existing vocabulary defined in appendix D (specifically, as noted in comment #1, in appendix D.2.1).
I'm feeling dense. I've looked at comment #1 and Appendix D.2.1 of the editor's draft, both before posting comment #3 and again. I am seeing how to say "Try to dereference hints" (schemaLocation hints in D.2.1); I am seeing how to say "There exist non-schemaLocation sources of some schema documents to be used" (hard-coded schema locations, named pairs, schema documents, etc.); I am seeing that some "schemas" (I presume for things like HTML) can be built into a validator (hard-coded schemas). What I'm still missing is how to say: if you see a schemaLocation in an instance, perhaps for some namespace not addressed by the other mechanisms, or perhaps for a namespace for which some declarations already were brought in by those other mechanisms, you MUST NOT dereference the schemaLocation URI and in any case MUST NOT use it as inspiration to change the schema you would have otherwise constructed. What am I missing?
Perhaps we are imagining different uses for the terms defined here. I expect them to be used in describing processors, and the requirement to be allowing the behaviors described in the initial description, in comment #3, and in comment #5 to be described in English prose using the terms defined in the spec.

So to use the existing terminology to say that schema location hints in the instance are not followed, it would suffice to say something like

    Schema location hints in the document instance are not followed.

or

    The --nohints option means that the processor should not follow schemaLocation hints in the document instance, even if no components for the namespace in question are available.

If you want something more elaborate, I can offer the following sample documentation for an imaginary processor named Figment which can be invoked with run-time options directing the various behaviors described.

... Figment schema construction options ...

Figment assembles a schema by looking for schema components in different places. In general, for each namespace used in the input document as the namespace of any element or attribute, Figment looks for schema components. The user may control where Figment looks for components by means of the --where and --how options:

--where=LOCATION

    LOCATION may be any of:

    cache: look in Figment's local schema cache
    cli: look for a location passed on the command line, using the --load option
    ask: ask the user by means of a prompt on stderr
    ns: dereference the namespace name
    hints: look in the locations indicated in xsi:schemaLocation attributes in the input

    The --where option can be given more than once on the command line. If no --where options are specified, the default behavior of Figment is equivalent to --where=cache --where=hints --where=ns. If any --where options are specified, Figment will look only in the indicated locations.

--how=METHOD

    METHOD may be:

    literal: Figment will attempt to dereference the URI given as a location. If that produces a schema document, Figment will read the schema document and load the components it defines.
    catalog: means that Figment will look up the URI in the OASIS XML catalog at /usr/local/Figment/catalog and attempt to dereference the location given by the catalog, if any.
    rddl: means that if dereferencing a URI produces a RDDL document, Figment will look for the well-known purpose Figment-validation, and follow the link given.

    The --how option can be given more than once. The order of options determines the order in which the methods are tried. If --eager=yes is specified, then all methods will be tried for each namespace; if --eager=no is specified, then later methods will be tried only if earlier methods don't succeed in finding a schema document. The default is --how=catalog --how=literal --how=rddl. The --how option does not affect searching in the Figment cache.

The --eager option controls what Figment does when it succeeds in finding a schema document which defines components for the namespace in question.

    --eager=yes means Figment will read and process the schema document it has found, and then continue looking for more components in the namespace, using other methods or in other locations, until there are no more places to look.
    --eager=no means Figment will read and process the schema document it has found, and stop looking for components for the namespace.

The --onfailure option controls what Figment does when searching for components for a given location fails to produce any schema documents for the namespace being sought.

    --onfailure=continue means that Figment will try the next location on the list.
    --onfailure=halt means that Figment will stop looking for components for this namespace and move on to the next namespace.
    --onfailure=error means that Figment will stop looking for components, issue an error message, and move on to the next namespace.
    --onfailure=fatal means that Figment will stop looking for components, issue an error message, and exit. No validation will be performed.

These options can be used to produce a variety of behaviors. The following examples are drawn from discussions of schema construction in public records of the XML Schema Working Group.

1) Unconditionally follow xsi:schemaLocation

    Applies to a processor that dereferences every supplied xsi:schemaLocation, and which reflects a (fatal) processor-specific error if any one or more such references fail to resolve to schema documents for the appropriate namespace.

    --where=hints --how=literal --onfailure=fatal

2) Conditionally follow xsi:schemaLocation

    Same as above, but no error is reflected if any one or more such references fail to resolve, resolve to something other than a schema document, or to a schema document for the wrong namespace. If any of those conditions occur, then that schemaLocation is treated as if it were not supplied.

    --where=hints --how=literal --onfailure=continue

3) Unconditionally ignore xsi:schemaLocation

    Applies to a processor which in all cases ignores xsi:schemaLocation attributes in instance documents.

    This one can be achieved using any set of options that does not include --where=hints. For example:

    --where=cache --where=ns --how=catalog --how=literal --eager=yes --onfailure=continue

... End of Figment schema construction options ...

I think example 3) illustrates that what is requested in comment #3 and comment #5 is possible. Or am I missing something?
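As a rough illustration of how the --where, --eager, and --onfailure options described in the Figment sample documentation interact, here is a small Python model of the lookup loop for a single namespace. Figment itself is imaginary, and everything in this sketch is an assumption made for illustration: the function `find_components`, the exception `FatalLookupError`, and the `lookup` callback (assumed to return the list of schema documents found for a namespace at a given location). The --how methods are left out to keep the sketch short, and no defaults are assumed for --eager or --onfailure because the sample documentation does not state any.

```python
from typing import Callable, List, Optional, Sequence


class FatalLookupError(Exception):
    """Models --onfailure=fatal: stop everything, no validation."""


# Assumption: lookup(location, namespace) returns the schema documents
# found for `namespace` at `location` (an empty list means failure).
Lookup = Callable[[str, str], List[str]]


def find_components(
    namespace: str,
    where: Sequence[str],
    lookup: Lookup,
    *,
    on_failure: str,
    eager: bool,
    errors: Optional[List[str]] = None,
) -> List[str]:
    """Search the --where locations in order for one namespace."""
    found: List[str] = []
    for location in where:
        documents = lookup(location, namespace)
        if documents:
            found.extend(documents)
            if not eager:
                break  # --eager=no: stop at the first success
            continue   # --eager=yes: keep looking elsewhere

        message = f"no schema document for {namespace!r} via {location!r}"
        if on_failure == "continue":
            continue   # try the next location on the list
        if on_failure == "halt":
            break      # silently give up on this namespace
        if on_failure == "error":
            if errors is not None:
                errors.append(message)
            break      # report, then give up on this namespace
        if on_failure == "fatal":
            raise FatalLookupError(message)
        raise ValueError(f"unknown --onfailure value: {on_failure!r}")
    return found
```

In this model, "unconditionally ignore xsi:schemaLocation" is simply any call whose `where` list omits `"hints"`, since the loop never consults a location that was not listed.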
Michael Sperberg-McQueen writes:

> Perhaps we are imagining different uses for the
> terms defined here. I expect them to be used in
> describing processors, and the requirement to be
> allowing the behaviors described in the initial
> description, in comment #3, and in comment #5 to
> be described in English prose using the terms
> defined in the spec.

Oops, yes we are. Were it earlier in the process I would suggest that would be a reason for reopening the issue. Given where we are in going to Last Call, I can let it go. FWIW, my intended use of the D.2 terminology would be more along the lines of saying in my processor documentation: "This processor implements schema location strategies XXX, YYY and ZZZ", where XXX, YYY, ZZZ are terms from D.2.x.

Interestingly, taken together, the text seems mildly contradictory on this point. D.2 says: "Conforming processors may implement any combination of the following strategies for locating schema components, in any order. They may also implement other strategies.", suggesting that terms like "hard coded schemas" are not just handy noun phrases for use in constructing sentences in conformance prose; they are something you can conform to, as I had expected. Then D.2.1 says: "Some terms describe how a processor identifies locations from which schema components can be sought:" which says, as you suggest, "these are just terms with definitions."

Anyway, I think it's late for rototilling this. I'm a little disappointed not to have noticed what appears to be a deeper misunderstanding than my original question about schemaLocation hints. Still, unless this exchange leads you or others to want to discuss again and clarify the draft, I'm willing to let it go in the interest of moving on.

Noah