The first "Good practice" note in AWWW gives the following advice:
To benefit from and increase the value of the World Wide Web, agents should provide URIs as identifiers for resources.
But existing Web technologies and specifications offer a wide range of options to anyone embarking on a project which involves the creation and management of Web-enabled names. As a result, for anyone attempting to follow the above advice, a question immediately arises, namely "What kind of URIs should agents provide?"
This document is intended to supplement AWWW by addressing this question, for which AWWW does not provide much help.
Following the precedent of AWWW, we proceed in terms of a simple scenario which supplies motivating context for a subsequent more detailed exploration of requirements for Web-enabled names and best practices for addressing those requirements.
Nadia and Dirk's work on film industry data (see AWWW section 4.2.2) has been successful up to a point, but their employers, a consortium of film studios called FSC, are not happy that they have used URIs from a widely-used public film database to identify films, actors, directors, etc. "There's no guarantee those URLs will always give the right information" says Robin, the project officer at FSC, "and anyway, FSC should control all of this. And also, we want names, not old-fashioned locations. Isn't there a way we can have a bunch of names that are clearly ours, and guarantee what they identify?"
At this point Dirk and Nadia both reply at once, each proposing a different style of name:
urn:fii:studio:megapictures
fii://studio/megapictures/
"Well, sounds like there are some choices to make. You two get your story straight, and bring me back a proposal with a rationale and some costings", says Robin.
So, who's right? What should the consensus proposal look like? Not surprisingly, that depends on the requirements, which need to be articulated in a lot more detail than is provided by Robin's off-the-cuff remarks. In what follows we'll explore the requirements space and the solution space, and conclude that in a large number of cases both Dirk and Nadia are on the wrong track, because http-scheme URIs can satisfy Robin's requirements while looking very attractive from a cost-benefit perspective.
What Robin says FSC wants for its web-enabled naming scheme is very close to what many groups and projects have specified as requirements in similar situations: they all want names which are identifiable, reliable and stable. The technology behind the names should allow delegation of naming authority and provide uniform access to metadata.
It's important to be clear what each of these requirements really means, so let's start with a more detailed list, making some key distinctions along the way, and reflecting a common rough order of importance:
Hereafter we'll use the phrase [URI] in the scheme to refer to the set of URIs identifiable as part of a naming scheme, and the phrase access a URI as shorthand for the retrieval of a representation of the resource identified by a URI.
If the distinction between resource stability and representation stability isn't clear, consider the difference between http://www.w3.org/ and http://www.w3.org/TR/1998/REC-xml-19980210. They each identify a resource, the W3C home page and the first edition of the XML language specification, respectively. The W3C observes the maxim "Cool URIs don't change", which is to say, it is committed to resource stability across the board. The specific consequence of that for the examples at hand is that the W3C is committed to maintaining the use of those two URIs to identify those two resources, in perpetuity. But the W3C is only committed to representation stability for the second of these two URIs. Indeed a significant contribution to the value of the first URI, which identifies the W3C home page, is that the representations which can be retrieved from it are not stable: they change on a regular basis, to provide up-to-date information about W3C activities. On the other hand, representation stability is important for the date-containing-URI names of W3C specifications: the W3C is committed to always providing exactly that original representation of the XML specification.
So, let's try a rewrite: If the distinction between resource stability and representation stability isn't clear, consider the difference between http://www.w3.org/ and http://web.archive.org/web/19990503021459/http://www13.w3.org/. They each identify a resource, the W3C home page as such and the W3C home page as it was at a particular date and time, respectively. The W3C observes the maxim "Cool URIs don't change", which is to say, it is committed to resource stability across the board. The specific consequence of that for the first example is that the W3C is committed to maintaining the use of that URI to identify the resource which is its home page, in perpetuity. But the W3C is not committed to representation stability for it. Indeed a significant contribution to the value of that URI is that the representations which can be retrieved from it are not stable: they change on a regular basis, to provide up-to-date information about W3C activities. On the other hand, the value of the second URI depends on representation stability: that is, retrieving from that URI will always give you the same representation the Wayback Machine retrieved from the W3C site on May 3rd 1999.
A link to my PGP public key, which includes a SHA1 digest of the representation it should retrieve. . ..
http: URIs to address the requirements
After looking into their requirements more carefully, as tabulated above, and investigating the space of possible solutions more thoroughly, Dirk and Nadia come to the conclusion that a carefully designed scheme using http: URIs can come as close to satisfying all their requirements as, if not closer than, either of their original suggestions, or any other non-http: scheme.
http: URIs aren't perfect, and in some cases there are trade-offs: fully satisfying one requirement may require some compromises with respect to another. The Challenges and tradeoffs section below introduces some of these, but a complete analysis is beyond the scope of this finding.
http: URIs are identifiable
The desire for branding to be evident in URIs is both widespread and understandable. URI identifiability is a form of advertising, where the admittedly modest impact of a single use of an identifiable URI is potentially magnified greatly by widespread replication. Identifiability is also a cornerstone of trust: brand recognition and successful URI access are mutually reinforcing.
RFC 3986 [ref], the standard which governs all URIs, provides for both a registry-based authority segment and a local, typically hierarchical, path segment in URIs, and recommends both, together with the use of the IANA Domain Name system [ref IANA] for the authority segment, for any URI scheme that intends to be global in scope. http: URIs do exactly that, and so clearly follow the RFC recommendation and thus satisfy the identifiability requirement, since all the participants in a web-targeted naming scheme can be assumed to already have domain names which are readily identifiable, or can come to be readily identifiable, as theirs.
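The authority/path split described above can be sketched in a few lines of Python using the standard library's urllib.parse. This is only an illustration: the domain fsc.example, the URI, and both function names are hypothetical, and the sub-domain check is one simple way a client might decide that a URI is identifiable as belonging to a given Domain Name owner.

```python
from urllib.parse import urlsplit

def split_authority_and_path(uri):
    """Return (host from the authority segment, path segment) of a URI."""
    parts = urlsplit(uri)
    return (parts.hostname or "", parts.path)

def is_identifiable_as(uri, owner_domain):
    """True if the URI's host is owner_domain or one of its sub-domains."""
    host, _ = split_authority_and_path(uri)
    host = host.lower().rstrip(".")
    owner = owner_domain.lower().rstrip(".")
    return host == owner or host.endswith("." + owner)

# Hypothetical name minted by FSC under a domain it controls.
uri = "http://names.fsc.example/studio/megapictures"
print(split_authority_and_path(uri))
print(is_identifiable_as(uri, "fsc.example"))
```

The check compares whole DNS labels, so a look-alike host such as notfsc.example is not treated as belonging to fsc.example.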
http: URIs are useable
Pervasive support for http: URIs is the foundation of the success of the Web today. A wide range of user agents, not only web browsers, recognize http: URIs and know how to access them using widely deployed software support for the DNS and HTTP protocols.
At the other end, again a wide range of server software is available, both free and commercial, ranging from fully-integrated website and document management systems with support for on-the-fly synthesis of documents to simple lightweight filesystem-backed servers.
With the exception of the legacy ftp: scheme and the non-Web file: scheme, no other URI scheme has anything like this degree of ubiquity.
Why is this hard? Dirk and Nadia's requirements all seem sensible, and we've had names for use on the Web for nearly 20 years now. What stands in the way of a naming scheme satisfying those requirements?
The stability of name ownership is at risk for at least two pretty much incorrigible reasons: the instability of human institutions, and the contingent nature of name registration.
It is in the very nature of owning anything that the first kind of risk inheres: the owner of a name may sell it, or give it away, or lease it, or indeed go out of business, or sell their business, or the relevant division. Any of these changes amounts to or results in a change of ownership.
The second kind of risk arises from the nature of names on the Web. Virtually all naming schemes used on the Web are based on a division of names into a global part, managed by a global registry, and a local part, typically involving some form of hierarchical decomposition. The syntax of URIs is designed to support this decomposition. The RFC which governs URIs [ref 3986] distinguishes between the authority segment (registry-based) and the path segment (local, hierarchical) of a URI, and recommends the use of the IANA Domain Name system [ref IANA] for the authority segment of any URI scheme that intends to be global in scope. Any naming scheme that follows this recommendation, and thus equates URI ownership to Domain Name ownership, such as the http: URI scheme, depends on the stability of ownership of Domain Names for URI owner stability. But Domain Names are not really owned, only leased, for fixed terms, with no guarantee of renewability, with the possibility of expropriation and with the in-principle risk, however unlikely in practice, that the Domain Name registration system itself may cease to function.
There is ultimately no way around this. In particular, there is no point in proposing naming schemes that use their own registries and/or lookup mechanisms (not involving IANA) solely in order to get around this, because the reasons IANA operates Domain Name registration the way it does, and the vulnerabilities that the Domain Name system has, are universal and inescapable, given the requirements it must satisfy. See the appendix below for a discussion of why this is the case.
The owner of a URI has the right to determine what it means, that is, what resource it identifies, and the responsibility to respond to requests to access representations thereof. It follows that any change of ownership may (but need not) mean a change here too. And even without a change in ownership, control over resource identity and/or responsibility for reliability may change if the owner delegates that control or responsibility, or changes an existing delegation.
The most common threat to both reliability and resource stability in a global plus local naming system is the single point of failure implicit in registry-based ownership. The technical aspect of this isn't the problem: multiple servers, aliasing, failover, etc. are all well-understood, widely and successfully deployed techniques. Rather, it's the management aspect. Even if no real or effective change in ownership occurs, the frailty of human institutions is once again the problem: a change in business focus, or loss of interest in the relevant aspect of their business, or just misconfiguration of a DNS entry or server, may compromise reliability or resource stability. That is, the (new) owner of some names in a scheme may stop responding to requests for what they see as old and irrelevant URIs (a failure of reliability), or, worse, may decide to re-use those 'old' URIs for different resources (a failure of resource stability). Users of the URIs will then no longer be able to access such URIs in the most obvious way, or will not get what they expect when they do.
Somewhat perversely, the main challenge with representation stability is that it is rarely, if ever, what is really wanted: tying a URI to a particular character sequence, to be interpreted as a particular media type, is a very strong constraint.
And if it really is what is wanted, an externally verifiable guarantee is probably wanted as well, which in turn at least compromises transparency, because it means that the URI for a representationally stable resource will have to include both the intended media type and a hash or checksum of the intended character sequence, as has become common practice, for example, in peer-to-peer sharing of anime [ref http://animechecker.sourceforge.net/].
The ultimate in delegation is a fully decentralised system, in which anyone can mint URIs in the scheme. The minimum necessary to avoid collisions is the use of a central registry such as the Domain Name system for the authority part of the scheme. The challenge here of course is that there is no place for any structure to ensure that minters of scheme URIs respect whatever constraints the scheme owners have specified to guarantee that other requirements on the scheme are satisfied.
Furthermore, the more entities actually mint scheme URIs, the more likely it is that one of them will undergo one of the status changes mentioned above under Challenges to owner stability.
So the fundamental challenge is to find the right point on the continuum from fully centralised to fully decentralised which delivers on all the other requirements.
The desire for branding to be evident in URIs is both widespread and understandable. URI identifiability is a form of advertising, where the admittedly modest impact of a single use of an identifiable URI is potentially magnified greatly by widespread replication.
Identifiability seems to follow naturally from delegation at the highest level: if different entities are free to mint URIs in the scheme, and Domain Names have a place in the scheme, then identifiability is provided. But the previous section suggests that a fully decentralised scheme is unlikely to satisfy other requirements, so a place for identifiability in a less than fully decentralised scheme has to be found.
Many of the requirements listed above are not essentially technical in nature. Rather they are social. That is, they impose conditions on the management of the names, not their essential nature. We'll start by looking at name management policy, then move on to specific mechanisms which can be deployed to assist in name management, or to some extent protect against potential breakdowns of name management policy.
In the simplest, and very common, situation with respect to URIs, resources and representations, the owner of a Domain Name is the implicit proprietor of a naming scheme, consisting of all URIs which use that Domain Name (and its sub-domains) as their authority. The owner decides what resources will be given names in that scheme, what those names will look like, and how representations will be stored and/or computed, and provides the necessary computing resources for storage, computation and servicing of access requests.
Once more than one party is involved, as is the case in the Dirk and Nadia FII scenario we are considering, choices arise with respect to each of the decisions and provisions just listed, and these choices in turn have implications for the requirements placed on the scheme. In what follows we consider what choices affect the stability and persistence requirements.
Three options arise here:
The same three options arise here:
Once any of those decisions or provisions is placed in hands other than the owner's, we have an instance of delegation. Almost any combination of retention of some aspects of control/provision and delegation of the rest is possible in principle; in practice we observe a small number of common patterns, which we will explore below.
Some patterns of delegation can go a long way towards mitigating the negative impact of institutional frailty on naming schemes. There are two primary delegation patterns we will look at: centralisation (or delegation upwards, from the members of a group to the group itself) and replication (or delegation downwards, from a group to its members).
The framework is theoretical, which it has to be in order to catch all the objections that are going to be thrown at it. The general reader (scheme designer) won't be able to apply it without help, so it will have to be augmented (as the existing draft finding was) with examples.
Expropriation on appeal and repossession are not necessarily the issues. One could assert that effective permanence happens sometimes in practice (e.g. scheme names, chemical element names) and could happen more often if we just figured out how, so this document has to answer:
http: scheme at least twice since its original binding. There is no guarantee that it won't change again.
[RFC 4395] defines the scheme registration process, and the IANA scheme registry tabulates the registered schemes. [RFC 4395] makes explicit provision for bindings to be changed, and defines the process whereby such changes are made. Insofar, then, as your question means "Is the binding of a URI scheme name to an RFC which defines it permanent", the answer is clearly "no". And note that at a deeper level, the IETF is free to change the whole story, by issuing a new RFC which obsoletes (their term) [RFC 4395]. Indeed [RFC 4395] obsoletes a predecessor which told a different story.
tag: URIs [ref?] are owned by their creator permanently, even if the host or email address is recycled. This is accomplished by putting timestamps in the URIs, much as publisher names are bound to particular corporations by affixing the year of publication in scholarly citations. Why can't we do the same thing for fii, maybe even use tag: URIs with some special protocol?
tag: URIs have no resolution mechanism. If they did, wouldn't there have to be an appeals process, which would operate in cases where one of the must nots of section 2.2 is alleged to have been violated?
(JAR's naming scheme: some subspace of tag: URIs resolvable through some cockamamie protocol... now convince JAR that he should use http: URIs instead.)
However, I agree that the XRI folks are not putting immortality out as a major issue for them, and my constituency may not be important enough to spend column-inches on right now.
There are evidently stable, managed sets of names in existence: the periodic table, the names of surface features of planets and satellites. What is it about names for use on the Web that precludes true stability for them? The combination of arbitrary, dereferenceable and identifiable seems to be the source of the problem. These three together mean there is real value in owning a name, and that there can, and therefore will, be dispute about what legal entity gets to use what name. This in turn requires a dispute resolution procedure for registered names, which in turn means expropriation must be possible. Because supporting dereferencing requires resources, owning names incurs costs, which means they will be abandoned, which in turn, along with the fact that name ownership has real value, means that it makes sense to lease, rather than sell, registration.
If we look at existing systems on the Web, that is URN namespaces and URI schemes, which do not rely (entirely) on IANA Domain Names, we find broadly speaking three cases:
Some URI schemes, such as doi (not registered), and URN namespaces (such as uuid) use opaque strings, typically numbers, either self-allocated (uuid) or via registries (doi). Such approaches may involve outright ownership (uuid), or may not (doi, at least from some registrars), and since they don't provide identifiability, need not provide for expropriation, but they are nonetheless heir to the other vulnerabilities of owned names.
Some, such as the info and xri (not registered) URI schemes, provide identifiability and operate their own registries, distinct from the IANA Domain Name registry (although their current lookup mechanisms do rely on the DNS system). The xri registry is parallel to IANA's in all the aspects relevant to lack of stability discussed above. The info registry is qualitatively quite different, as it is restricted to names for the operators of large public namespaces, and is clearly intended to operate in terms of dozens or at most hundreds of registrations. No appeal or expropriation mechanisms are defined for it, and since dereferencing is explicitly not required to be supported, the impact of a registered info name owner going out of business is not necessarily very great.
Some, such as the tag URI scheme and the NEWSML URN namespace, combine a Domain Name with a date, in an attempt to avoid the majority of the vulnerabilities we've identified. However, tag URIs explicitly do not support resolution, and NEWSML URN resolution is left unspecified in principle, and in practice seems not to be supported.
http://www.w3.org/1999/xhtml might be stable even if W3C loses ownership of w3.org . . .
In summary, a number of schemes exist whose vulnerability to the challenges to ownership stability identified above is reduced, but they all achieve this at the expense of one or more of Dirk and Nadia's other requirements.
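To make the "Domain Name plus date" idea behind the tag scheme concrete, here is a minimal Python sketch that pulls a tag: URI apart into its authority name, its date, and its scheme-specific part. The regular expression is a deliberately simplified approximation of the tag: URI syntax, not a full validator, and the example name under fsc.example is hypothetical.

```python
import re

# Simplified shape: tag:<authority>,<YYYY[-MM[-DD]]>:<specific>[#fragment]
TAG_PATTERN = re.compile(
    r"^tag:(?P<authority>[^,:]+),"
    r"(?P<date>\d{4}(?:-\d{2}(?:-\d{2})?)?)"
    r":(?P<specific>[^#]*)(?:#(?P<fragment>.*))?$"
)

def parse_tag_uri(uri):
    """Return (authority, date, specific) for a tag: URI, or raise ValueError."""
    match = TAG_PATTERN.match(uri)
    if match is None:
        raise ValueError("not a tag: URI: " + uri)
    return match.group("authority"), match.group("date"), match.group("specific")

# Hypothetical name: whoever held fsc.example in 2007 is the minting authority.
print(parse_tag_uri("tag:fsc.example,2007:studio/megapictures"))
```

The date is what ties the name to whoever held the Domain Name at that time, which is why a later change of domain ownership does not change who minted the name. Nothing in the parsed result says where a representation could be retrieved from, which is the limitation noted above.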
Or, "Put all your eggs in one basket, and watch that basket!" In its simplest form, centralisation means that all the participants in a common naming scheme agree that there will be only one repository of representations, and one domain name used for names. This is technically very simple, but has several major constraints which make it less likely to be a satisfactory solution:
It should be clear that it is straightforward to use http: URIs to implement this variant of delegation.
Or, "Split up, one of us is bound to survive!" Technical replication at the level of DNS cannot solve the problem of domain name loss. Only methods that involve a second domain name can do that.
In the same way that a web cache or proxy server provides an alternative to a standard DNS lookup plus hierarchy, a naming system may specify an arbitrary alternative algorithm for looking up its URIs. Such an algorithm may be invoked as an alternative to other methods that might contain unreliable steps.
Delegation here means two lookups plus hierarchy. The first lookup gets you to a naming-system-specific naming authority, whose name is known ahead of time to clients of the naming system. This authority itself implements the second lookup, which leads to the repository where the hierarchical part can be interpreted.
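The "two lookups plus hierarchy" pattern just described can be sketched as follows. This is an illustrative model only: the naming authority names.fii.example, the repository table, and the resolve function are all hypothetical, and in a real deployment the first lookup would be an ordinary DNS lookup and the second might be an HTTP redirect issued by the naming authority.

```python
from urllib.parse import urlsplit

# First lookup: the naming authority, known ahead of time to clients (hypothetical).
NAMING_AUTHORITY = "names.fii.example"

# Second lookup: a table kept by the naming authority, mapping the first
# path segment to the repository currently responsible for it (hypothetical).
REPOSITORIES = {
    "studio": "http://archive.megapictures.example/fii",
    "film": "http://films.fsc.example/catalogue",
}

def resolve(uri):
    """Map a scheme URI to the repository URI where its path is interpreted."""
    parts = urlsplit(uri)
    if parts.hostname != NAMING_AUTHORITY:
        raise ValueError("not a name in this scheme: " + uri)
    segments = parts.path.lstrip("/").split("/", 1)
    key = segments[0]
    rest = segments[1] if len(segments) > 1 else ""
    if key not in REPOSITORIES:
        raise KeyError("no repository registered for " + key)
    base = REPOSITORIES[key]
    # The remaining, hierarchical part is handed on to the repository unchanged.
    return base + "/" + rest if rest else base

print(resolve("http://names.fii.example/studio/megapictures"))
```

Because only the table changes when a repository moves or a participant drops out, the names themselves can stay fixed.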
The good news is that most of the drawbacks of the centralised approach have been removed:
There are downsides of this approach too:
The ARK naming system is a good example of a naming system along these lines using http: URIs.
It should be clear that there is no necessary association between centralisation of naming and centralisation of storage: a middle way that centralises naming, but leaves storage in the hands of content owners, is clearly possible. Doing things this way could also accommodate preservation of branding.
6.1. Name management policy
6.1.1. Providing owner stability
As things stand all that anyone, or any group, can do is to put carefully-designed mechanisms in place to ensure that all Domain Name registrations are legitimate (that is, not vulnerable to expropriation for cause), monitored for impending expiry, and renewed in a timely fashion.
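One small piece of such a mechanism, monitoring for impending expiry, might look like the following sketch. The domain names, the dates, and the 90-day warning period are made up for illustration, and the expiry dates are assumed to have been obtained already from the registrar's records; the sketch does not query any registry.

```python
from datetime import date, timedelta

def registrations_needing_renewal(registrations, today,
                                  warning_period=timedelta(days=90)):
    """Return names expiring within warning_period (or already expired), soonest first."""
    due = [(expiry, name) for name, expiry in registrations.items()
           if expiry - today <= warning_period]
    return [name for expiry, name in sorted(due)]

# Hypothetical registrations held by participants in the scheme.
registrations = {
    "fsc.example": date(2030, 6, 1),
    "megapictures.example": date(2030, 1, 15),
}
print(registrations_needing_renewal(registrations, today=date(2029, 12, 1)))
```

Run regularly, a check like this addresses only the "renewed in a timely fashion" part of the policy; legitimacy of the registrations still has to be established by other means.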
Providing persistence and resource stability
Assuming owner stability, good will, and continuing commitment to participation in the scheme, these requirements are entirely in the hands of the originators, operators, and participants in any naming scheme. They are nonetheless among the hardest to address well. Restricting naming authority to trusted participants whose corporate self-interest is evidently tied up with their commitment to maintain their web presence and not change what their URIs in the scheme mean is an obvious starting point, but deciding just how commitments are to be phrased and what sanctions, if any, are to be available to enforce those commitments is inevitably a difficult business.
Some degree of protection against this kind of failure of the scheme can be provided by delegation and/or replication, see below.
6.1.2. Providing representation stability
The solution is partly management, i.e. the imposition of participation requirements, and partly technical, e.g. including the representation's checksum in the URI itself, or providing the checksum in a metadata record. It's not a guarantee (nothing is), but it at least provides a way to check, and if a mismatch is detected the site operator can be shamed into fixing it (the same holds for checksum-in-the-URI). OBO [ref?] does this.
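The checksum-in-the-URI idea can be sketched as below. The URI layout (a trailing /sha256/<digest> path), the base URI, and both function names are assumptions made for illustration, and SHA-256 is used here simply as a readily available digest; nothing in this sketch is prescribed by any particular scheme.

```python
import hashlib

def mint_checksum_uri(base, representation):
    """Append the SHA-256 digest of the representation (bytes) to a base URI."""
    digest = hashlib.sha256(representation).hexdigest()
    return base.rstrip("/") + "/sha256/" + digest

def matches_checksum_uri(uri, representation):
    """Check retrieved bytes against the digest carried in the URI's last segment."""
    expected = uri.rsplit("/", 1)[-1]
    return hashlib.sha256(representation).hexdigest() == expected

original = b"<film id='example'>...</film>"
uri = mint_checksum_uri("http://names.fii.example/film/example", original)
print(uri)
print(matches_checksum_uri(uri, original))
print(matches_checksum_uri(uri, original + b" "))
```

A client that retrieves a representation can run the second function to detect a mismatch, although detection is all it gives: fixing the mismatch remains a management matter.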