This document is also available in these non-normative formats: XML.
Copyright © W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This finding attempts to address the questions "When should URNs or URIs
with novel URI schemes be used to
name information resources for the Web?" and "Should registries be provided for
such identifiers?". The answers given are "Rarely if ever" and "Probably not". Common arguments in favor
of such novel naming schemes are examined, and their properties compared with
those of the existing http: URI scheme.
Editorial note: HST | 2006-03-14 |
Further to a request from Roy Fielding, I had a brief look at XCAP, seems to be using http: URIs now, although it introduces a new Application UID registry, and uses ietf: URNs for its namespaces. . . If anyone (including Roy) remembers what Roy was particularly concerned at here, please let me know. |
This document has been produced by the W3C Technical Architecture Group (TAG). This finding addresses TAG issue URNsAndRegistries-50.
This is the second draft of this finding. This finding is an editorial draft, not yet accepted by the TAG.
Additional TAG findings, both accepted and in draft state, may also be available. The TAG expects to incorporate this and other findings into [what?] that will be published according to the process of the W3C Recommendation Track.
Editorial note: HST | 2005-03-29 |
Are we ready to tell the world what will follow AWWW? |
Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).
1 Introduction
2 Examining the need for new approaches to naming information resources
2.1 Persistence
2.2 Standardized
2.3 Protocol Independence
2.4 Location Independence
2.5 Structured names
2.6 Uniform access to metadata
2.7 Rich authority
2.8 Trusted resolution
3 The value of http: URIs
4 Case study: Naming namespaces
5 Detailed Illustration
In [AWWW] we find the following recommendations:
"A URI owner SHOULD NOT associate arbitrarily different URIs with the same resource."
"A specification SHOULD reuse an existing URI scheme (rather than create a new one) when it provides the desired properties of identifiers and their relation to resources."
"Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource."
"A URI owner SHOULD provide representations of the resource it identifies."
Recently, however, a number of proposals have emerged to create new
identification mechanisms for the Web. They propose new URN (sub-)namespaces or
URI schemes and provide
registries for instances thereof, in order to allow them to be used to identify
and retrieve information resources. This
would appear to be incompatible with [AWWW]'s simple positive
recommendations. In this finding we enumerate the arguments given in favor of
these new proposals, which often turn out to be arguments against
using http:
URIs, and explain why they are mistaken and how the above
principles can be understood to point the way constructively to alternative
designs which do in fact make use of http:
URIs.
This section is structured in terms of goals or requirements for resource
identification mechanisms which have been offered as justifications for
adopting a new approach. They are drawn from a number of recent proposals
([RFC 3688], [oasis URN], [XRI]),
abstracting, merging and summarizing them. [Definition: Throughout these summaries we will refer to instances of the proposed new identifier mechanism as NRIs.] In each case we state the requirements and examine the extent to which the existing http:-based identifier mechanism addresses them.
NRI Goal
The relation between NRIs and the information resource they identify should persist indefinitely.
Or, more realistically, that individual NRIs should manifest syntactically whether or not they are intended to persist indefinitely.
This goal is difficult to get to grips with, as it appears to mean different things in different contexts:
1. At its simplest, this is just a wish for an end to 404 Not Found, i.e. that you should always be able to resolve an NRI.
2. In the Information Science community, 'persistence' is a stronger requirement, namely, that what you get when you resolve an NRI should never change.
http: fact
http: URIs support persistence as well as
it is possible in practice to do so.
As has been frequently observed, achieving either of the numbered types of
persistence above is not a
technology issue but a management issue. It's up to the owners and operators
of the mechanisms which implement NRI resolution to enforce whatever degree
of persistence they choose. It follows that there is no difference here
between NRIs and http: URIs.
What of the more sophisticated reading, that an NRI should manifest its
minter's intentions with respect to persistence? That's just a
matter of naming conventions, and perfectly possible using http:. We could, for example, say that all
versionable/time-varying resources on our site are named with all lower-case
letters, and all persistent/stable/non-varying resources are named with all
upper-case letters.
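A minimal sketch of how such a convention might be checked mechanically follows. The function name, the example URIs and the exact rule (letters in the path only) are illustrative assumptions, not part of any specification.

```python
from urllib.parse import urlsplit


def persistence_intent(uri: str) -> str:
    """Classify a URI under a hypothetical site-local naming convention.

    Assumed convention (illustration only): a path whose letters are all
    upper-case names a persistent resource; a path whose letters are all
    lower-case names a time-varying one. Anything else is unspecified.
    """
    letters = [c for c in urlsplit(uri).path if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        return "persistent"
    if letters and all(c.islower() for c in letters):
        return "time-varying"
    return "unspecified"
```

Only the path is inspected, since domain names are case-insensitive and could not carry the distinction.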
NRI Goal
NRIs should be susceptible to standardization within administrative units.
This goal appears to be directed at guaranteeing certain invariants, for example with respect to the structure of identifiers and the availability of the resources they identify. This means they should not be creatable in a distributed or unsupervised fashion.
http: fact
Again, this is largely a management issue, not a
technical one. Whatever invariants are in view can as well be enforced on
(sub-parts of) http:-served resource collections as on those
identified via NRIs.
Nothing in a specification can stop people from uttering URIs of any kind. Domain names are as good, or as bad, at conveying ownership of a particular form of URI as URN namespaces or URI schemes.
Centralized authorities can be established for parts of domain space as easily as for areas "off the web", and enforcement mechanisms can be as effective. For example, my employers constrain the mechanisms by which web pages are accepted for serving from certain parts of their domain so as to enforce invariants both of path structure and content markup.
NRI Goal
Access to resources identified by NRIs should not be dependent on any particular protocol.
Exactly what this means is not clear: although it is listed as a requirement in several proposals, there is little or no discussion of why it should be a requirement for NRIs.
http: fact
http:
URIs are no more protocol-dependent
than any other identification mechanism.
For pure naming, that is, if retrieval is never intended,
http:
is as good as any NRI approach, because no protocol at
all is involved. If retrieval is anticipated, then any NRI
approach must specify a mapping to one or more
protocols. All existing NRI approaches in practice specify only one such
mapping, to the HTTP
protocol. So they are in exactly the
same position as http:
-- if for some reason in the
future the HTTP
protocol becomes unavailable or inappropriate,
both NRIs and http:
will have to specify a new mapping.
True protocol independence is difficult to imagine in practice, as many
protocols depend on a tight coupling between message formats and client/server
application models. Protocols which don't allow servers any escape mechanism
are thereby pretty much ruled out as transports for retrieval from NRIs (or
http:
URIs).
It's appropriate to note here that in cases where the necessary form of client/server interaction for a particular kind of information resource, for example streaming video, cannot be provided by the protocols normally associated with existing URI schemes, new schemes may be appropriate. Detailed discussion of this point can be found in [Schemes and Protocols]. But none of the NRI proposals are for resources of this kind.
NRI Goal
NRIs should not be locations.
Practical realities and administrative changes will always defeat any attempt to guarantee that the representation of a particular resource will always be stored in exactly the same host/server/filestore/directory/file. Any naming mechanism which equates locations in that sense with names is by construction inadequate. It follows that this goal is a sensible one.
http: fact
http:
URIs are not locations.
Misunderstanding of http: URIs as locations has a long and,
in part, justifiable history (they were, after all, originally called Uniform
Resource Locators). But it's no longer justifiable either in principle (the
RFC for URIs [RFC 3986] is quite clear on
the subject) or in practice (there's lots of software support for server-side
management of the
relationship between http: URIs and their representations). See
for example the classic [Cool URIs] for a more detailed discussion
of these points.
NRI Goal
NRIs should provide for structuring resource identifiers with shareable tags.
This requirement has only been suggested by the authors of [XRI]. It amounts to a wish to structure resource names using name/value pairs, with the names having some standardized, widely understood meaning. This requirement is related to requirements appealed to in the design of End Point References [EPRs], [TAG on EPRs].
http: fact
The query component of http:
URIs supports
non-hierarchical structured naming.
It is open to any naming authority to establish
conventions for the use of the query component of http:
URIs under
its control. Since the query component is already structured in terms of
simple name/value pairs, it is a good fit for the requirement.
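As a rough illustration, the name/value structure of the query component can be recovered with standard URI tooling. The URI below and the tag names `isbn` and `format` are hypothetical; a naming authority would choose its own.

```python
from urllib.parse import urlsplit, parse_qs


def name_value_pairs(uri: str) -> dict:
    """Return the name/value pairs carried in the query component of a URI.

    Each name maps to a list of values, since a name may be repeated.
    """
    return parse_qs(urlsplit(uri).query)


# Hypothetical URI using assumed tag names "isbn" and "format".
pairs = name_value_pairs(
    "http://library.example.com/book?isbn=0-395-36341-1&format=summary"
)
```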
NRI Goal
NRIs should provide access to metadata about a resource as well as to representations of it.
http: fact
[conneg does this for http: just fine]
NRI Goal
NRIs should allow for sophisticated authority models, including delegation.
http: fact
[not sure]
Editorial note: HST | 2006-03-14 |
Not clear what this means -- the XRI examples suggest it's mostly
about late-binding/encapsulation, e.g.
xri://shoreline.library.example.com/(urn:isbn:0-395-36341-1) and
xri://broadview.library.example.com/(urn:isbn:0-395-36341-1) are
given as examples of 'copies of the same book at two different libraries' |
NRI Goal
NRIs should support trusted resolution.
http: fact
[not sure]
Editorial note: HST | 2006-03-14 |
This appears only in XRIs, and appears to me to be self-contradictory -- it says a "trusted resolution protocol [is] independent of DNS". |
3 The value of http: URIs
The http:
URI scheme implements a two-part approach to
identifying resources. It combines a universal distributed naming scheme for
owners of resources with a hierarchical syntax for distinguishing
resources which share the same owner. Widely available mechanisms (DNS and web
servers, respectively) exist to support the use of http:
URIs to
not only identify but actually retrieve representations of information resources.
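The two-part decomposition described above can be sketched as follows. The function name and the example URI are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def decompose(uri: str) -> tuple:
    """Split an http: URI into (owner name, hierarchical path segments).

    The owner name is the DNS host name; the segments distinguish
    resources which share that owner.
    """
    parts = urlsplit(uri)
    owner = parts.hostname  # universal, distributed owner name (via DNS)
    segments = [s for s in parts.path.split("/") if s]  # owner-relative name
    return owner, segments
```

For a hypothetical URI such as `http://www.example.org/ns/foo`, the owner part is resolved through DNS and the remaining segments are interpreted by that owner's web server.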
Any requirement for naming resources, particularly if not only naming but
also retrieval of representations is in prospect, which admits of a similar
decomposition, that is, into a universal owner name and a hierarchical
owner-relative name, can almost certainly be satisfied by the
http: URI scheme. http: provides substantial benefits, in terms of installed
software base, user comprehension, scalability and, if required, security, at
very low cost.
Anyone developing an alternative approach, that is, some form of NRI,
should consider carefully whether that approach is either isomorphic to
http:, or makes covert appeal to http: for its
implementation. In either case, this strongly suggests that the fundamental
requirements of the new approach do in fact admit of the two-part description
given above, and therefore that http: itself would be a viable,
and therefore a preferred, way forward.
Editorial note: HST | 2006-03-21 |
This text from DO is currently homeless: A main advantage of http URIs is the use of DNS to allow decentralized creation of vocabularies, but this does bear the cost that humans can be confused by the mixing of location and identifiers. Another possibility is to create a scheme that does not have any protocol associated with it, which I was thinking of at one point. The reason that this does not work and I did not proceed is that it does not address the issue of humans needing to understand context and it does not allow the flexibility of providing a namespace document. |
In this section we look in some detail into some of the background assumptions for the utility of NRIs for one particular purpose, namely for naming namespaces.
A common reason given for needing NRIs for namespace
names is that an http: identifier appears to humans as a location and
hence dereferenceable. Another common reason is to come up with an
identifier that is location-independent or that is "movable" from one
location to another.
The first argument, that http: URIs are "locations", is based upon
an incomplete understanding of the use of URIs. Any datatype, in this case URIs, exists in a
context. The context will define the use of a URI,
and includes social and technical context. A URI on the side of a van
will convey the social meaning that it can be typed into a browser and
used. Other contexts for the use of URIs include namespace names,
references to documents, and identifiers for things. It is never
the case that a URI is simply "found" without a context.
The case of using NRIs for namespace names is enlightening. Imagine two
scenarios, one using an NRI as a namespace name and another using an http:
URI. The namespace
specification defines a context, which roughly speaking says that a
namespace name should not be considered dereferenceable. Any software
component that is written assuming that any namespace name must be
dereferenceable is violating the namespace specification. It may be that
the namespace owner has guaranteed that they will provide a document at
the namespace name, but such a guarantee can cover only a subset of the entire set of
namespace names. Clearly generic XML software should not be written to
assume dereferenceability of namespace names.
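The point can be illustrated with a small sketch of generic XML processing in which the namespace name is handled purely as a string and no retrieval is attempted. The document, the namespace name and the helper function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose namespace name happens to be an http: URI.
DOC = '<doc xmlns="http://example.org/ns/foo"><item/></doc>'


def namespace_of(tag: str) -> str:
    """Extract the namespace name from an ElementTree '{namespace}local' tag."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


# Parsing compares and stores the namespace name as an opaque string;
# nothing here attempts to dereference it.
root = ET.fromstring(DOC)
ns = namespace_of(root.tag)
```

Software of this kind works identically whether or not a document is ever served at the namespace name.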
It is natural for a human reading an XML document with a namespace name
that they do not know to want to understand more about the namespace.
This is why [AWWW] recommends providing a document at a namespace name
that provides both human and machine readable information. The use of
http: namespace names enables three separate scenarios:
an identifier can be created in a decentralized manner;
an identifier may be dereferenced by a person via a browser to aid understanding;
an identifier may be dereferenced by a computer and exploited for automatic processing by reason of its identifying schemas, WSDLs, policies, etc.
The last two of these are distinct interaction patterns, with and without human involvement. The software-only interaction pattern is clearly erroneous if it assumes that a namespace name is dereferenceable. It is unlikely that XML software written today requires this assumption to be valid, but much software definitely exists which exploits dereferenceability when it is present.
Contrasting with this is the approach of using an NRI. An NRI provides an identifier, though in some cases these are not decentralized. A human looking at an XML document with an NRI namespace name will not be confused about whether it is dereferenceable or not.
In the http:
identifier scenario, the "location" to be used for
knowledge is embedded in the identifier and available in a decentralized
manner via DNS. In the NRI identifier scenario, the "location" to be
used for knowledge is hardcoded somewhere in the application or in some
property of the NRI such as a URI scheme or URN (sub)scheme. It is substantially easier for software to use a
single identifier and existing DNS/HTTP infrastructure, than to use an
intermediary identifier and quite probably the existing DNS/HTTP
infrastructure.
Imagine that I create a URI scheme called nri that uses the exact same
syntax as the http scheme and specifically does not define a protocol.
I can create the nri://example.org/ns/foo URI and start to use it
as a namespace name. There is no confusion
about the name being dereferenceable. But what value is there? If
one of these URIs shows up somewhere in a document, how will the human
find out about the meaning? They must either try to examine the context
surrounding the URI datatype, in which case there is no benefit to nri:
versus http: as the work is the same, or they try to examine the
namespace name, but it's not dereferenceable so they can't do that. The
amount of work is either the same or more using an nri scheme.
If the scheme definition for nri says that it
is dereferenceable, and specifies a mechanism, then either that
mechanism is HTTP, or it will have to provide all the
functionality, and thus be heir to all the weaknesses, of HTTP.
In either case no benefit has been gained over just using the http
scheme itself.
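A short sketch makes the isomorphism concrete: generic URI parsing finds exactly the same authority and path in the hypothetical nri: name as in its http: counterpart, so the two differ only in the scheme label. The function names are assumptions for illustration, and nothing here implies that the nri scheme is registered or dereferenceable.

```python
from urllib.parse import urlsplit, urlunsplit


def same_structure(a: str, b: str) -> bool:
    """True if two URIs carry the same authority, path and query."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.netloc, pa.path, pa.query) == (pb.netloc, pb.path, pb.query)


def as_http(uri: str) -> str:
    """Rewrite a URI that uses http syntax under another scheme name to http:."""
    return urlunsplit(urlsplit(uri)._replace(scheme="http"))
```

Since the rewrite is purely mechanical, the nri: form conveys no information that the http: form lacks.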
Namespace names are just one example of a context of use. Any use of the
URI datatype in an XML document has the same issues. A provider of a
URI must specify how the URI will be used in each specific sub-context of their XML language, whether it
is intended as an identifier, a location, or both. Using an NRI instead of an http:
URI does not make the software's or the human's job any easier.
It's perhaps worth noting that all the NRI proposals include means to
transform an NRI into
a dereferenceable address via lookup using some form of registry server. This in turn requires the use of a
dereferenceable address for the server, or else all software intended
for use with NRIs must have the registry server location
"hard-coded". As far as we can tell all the NRI
proposals expect the results of server lookup to be an http: URI,
and also appear to use an http: URL to identify the location of the
registry server.
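The extra indirection can be sketched as follows. The registry is modelled as an in-memory table, and the identifier `urn:example:ns:foo`, its mapping and the function name are all hypothetical; a real deployment would reach the registry over the network, typically at an http: address of its own.

```python
# Hypothetical registry contents: each NRI maps to an http: URI.
REGISTRY = {"urn:example:ns:foo": "http://example.org/ns/foo"}


def resolve(identifier: str, registry: dict = REGISTRY) -> tuple:
    """Return (dereferenceable http: URI, number of registry lookups needed).

    An http: URI is already dereferenceable, so it needs no lookup.
    Any other identifier must first be mapped through the registry.
    """
    if identifier.startswith("http:"):
        return identifier, 0
    return registry[identifier], 1
```

In this sketch both kinds of identifier end at the same http: URI; the NRI merely takes one more step to get there.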
In this section we compare the sequence of messages between client,
registry(s) and server(s) involved in both simple and complex cases of
retrieval of resources named with NRIs and http:
URIs, in order
to elaborate on the assertions made above about their functional equivalence.
. . .[to be filled in?]. . .