Providing and discovering definitions of URIs

Editor's Draft 31 May 2011

This version:
http://www.w3.org/2001/tag/awwsw/issue57/20110531/
Latest version:
http://www.w3.org/2001/tag/awwsw/issue57/latest/
Previous version:
http://www.w3.org/2001/tag/awwsw/issue57/20110410/
Editor:
Jonathan A. Rees <rees@mumble.net>

This document is also available in these non-normative formats: XML.


Abstract

The specification governing Uniform Resource Identifiers (URIs) [rfc3986] allows URIs to mean anything at all, and this unbounded flexibility is exploited in a variety of contexts, notably the Semantic Web and Linked Data. To use a URI to mean something, an agent (a) selects a URI, (b) provides a definition of the URI in a manner that permits discovery by agents who encounter the URI, and (c) uses the URI. Subsequently other agents may not only understand the URI (by discovering and consulting the definition) but also use it themselves.

A few widely known methods are in use to help agents provide and discover URI definitions, including RDF fragment identifier resolution and the HTTP 303 redirect. Difficulties in using these methods have led to a search for new methods that are easier to deploy, and perform better, than the established ones. However, some of the proposed methods introduce new problems, such as incompatible changes to the way metadata is written. This report brings together in one place information on current and proposed practices, with analysis of benefits and shortcomings of each.

The purpose of this report is not to make recommendations but rather to initiate a discussion that might lead to consensus on the use of current and/or new methods.

Status of this Document

This document is an editor's copy that has no official standing.

This report has been developed by the AWWSW Task Group of the W3C Technical Architecture Group in order to provide background material for further discussion among those affected by this architectural question, and to help drive TAG issue 57 [issue-57] to a conclusion.

This version has not received review within the task group or the TAG. The content is the sole responsibility of the editor.

Publication of this draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced, or obsoleted by other documents at any time.

Please send comments on this document to the publicly archived TAG mailing list www-tag@w3.org (archive). The development of this report is discussed among the task group on the public-awwsw@w3.org mailing list, with archives at http://lists.w3.org/Archives/Public/public-awwsw/.

Table of Contents

1 Introduction
2 Use case scenarios
    2.1 Choosing a URI, providing a definition of the URI, using the URI
    2.2 Using a document as a definition by reference to its primary topic
3 General definition methods in current use
    3.1 Colocate definition and use
    3.2 Point to the document that contains the URI's definition
    3.3 Register a URI scheme or URN namespace
    3.4 Use the LSID getMetadata() method
    3.5 'Hash URI'
        3.5.1 'Hash URI' semantics is sensitive to media type
        3.5.2 The common 'hash URI' pattern fails with large namespaces
        3.5.3 Fragment identifiers are easily lost
        3.5.4 'Hash URIs' don't support REST architecture
    3.6 Absolute URI with HTTP 303 See Other redirect
        3.6.1 303 is difficult, sometimes impossible, to deploy
        3.6.2 303 leads to too many round trips
        3.6.3 303 makes the URI difficult to bookmark
        3.6.4 303 has no normative specification
4 Possible mitigations
    4.1 Use something other than a URI
    4.2 Absolute URI with site-specific discovery rules
    4.3 Absolute URI with new HTTP request or response
    4.4 Coerce an information resource to what it defines its URI to name
    4.5 Change what dereferenceable absolute URIs mean
5 Summary
6 Glossary
7 Acknowledgments
8 References

End Notes


1 Introduction

This is an old issue, and people are tired of it. — Sandro Hawke, January 2003 [disambiguating]

In any kind of discourse it is very useful for an agent to be able to provide a definition of a term, in such a way that other agents can discover and use that definition in order to make sense of utterances that use that term, and to compose new ones.

Example: Definition discovery
Definition of "EQ 018"

Suppose that Alice, in communication with Bob, uses the term "EQ 018" to mean the Loma Prieta earthquake, as in "Alice was in the laboratory during EQ 018". If Bob does not know what "EQ 018" means, he will have to find out. He might be able to ask Alice directly, although this may be impossible, as Alice might be too busy, or otherwise unavailable. Lacking that option he does some research, consulting a dictionary or similar resource (reference book, database, search engine) in order to obtain the explanation of Alice's use of the term "EQ 018".

In this report, the terms to be defined are assumed to be URIs. URIs can be used to mean all sorts of things in many different technical contexts. Contexts of special interest to this report are those processed by machine, including the RDF and OWL family of languages. The question may appear to be limited to RDF and its derivatives, but to the extent that there is supposed to be a single meaning for each URI common to RDF and Web architecture [webarch], the issue transcends RDF.

The nature of definitions need not concern us here - many forms are familiar, including translation between languages (e.g. providing an English or Spanish phrase equivalent to a URI), descriptions (the URI refers to an entity possessing some set of properties), explanation by example, axiomatic method, and so on. Also not of concern here are the many ways in which meaning can fail as a result of what a definition says or doesn't say about the URI in question, or the particular way in which a URI is used. Our concern is only with the method by which definitions are conveyed, and with meaning only to the extent the method impinges on interpretation.

When the term to be defined is a URI, discovery methods include, in addition to those already mentioned, network protocols such as HTTP that involve the URI directly.

Definition discovery is similar to Web dereference in that in both cases one starts with a URI and ends with a document. The two must not be confused, however, since dereference often yields a document that does not define the URI, and even when it does it may not be recognized as such either normatively or by one of the parties involved. At present, by convention, a dereferenceable URI refers to the information resource on the Web at that URI (see [ir]), and that may not be what a putative definition of the URI defines or describes.

The reason we define definition discovery methods is interoperability: so that each URI is understood in the same way by everyone. In principle, we only need consensus on methods such as the ones surveyed here for URIs that are to be shared widely. If agents that use a URI in one way never use it in communication with agents that use it in another way, then it is OK for the URI to have distinct senses in the two communities, and there is no problem to be solved - each community can use the URI in its own way, and there will be no confusion.

The operative word here is "if". Isolation is fragile and means lost opportunities for synergy and unintended reuse. All the arguments in favor of a World Wide Web, which depends on the global nature of the URI vocabulary, apply here.

This report presents discovery methods in current use, reports some criticisms of them, and describes some additional discovery methods that have been proposed to address the criticisms.

2 Use case scenarios

Use cases need to be presented as being independent of any particular solution to be used, in order that the solution space can be explored without bias. This leads to some frustrating vagueness in the following, but the vagueness is intentional and necessary.

2.1 Choosing a URI, providing a definition of the URI, using the URI

Alice wants to refer to a particular earthquake. Alice "mints" a new URI (one that is not yet in use) with the purpose of using that URI to refer to the earthquake. Alice publishes a document containing a definition of the URI, i.e. a document that would lead a reader to understand that the URI refers to the earthquake.

Bob then learns of Alice's URI and its definition, and uses the URI in a document of his own.

Subsequently Carol encounters Bob's document. Wanting to know what the URI means, she is led somehow to Alice's published definition, which she reads. She is enlightened.

Any method for implementing this use case would need to explain: what kind of URI Alice should use (syntactic constraints); where and how Alice should publish the definition so that it can be found; and how Carol might come to discover Alice's definition, given the URI.

2.2 Using a document as a definition by reference to its primary topic

Editorial note (2011-04-14)
Consider dropping this use case, and explain the situation in some less prominent way. The only evidence we have for this situation is from Hugh Glaser's message, and most of the discussion in this document does not apply to this case.

Bob desires to refer to Chicago. He finds a Web page on the Web at 'http://example/about-chicago' (provided by, say, Alice) that consists of a description of Chicago, and wants to use it for the purpose of referring to Chicago. He chooses a URI and associates it with Alice's Web page in such a way that Bob's URI will be understood as referring to Chicago.

Carol encounters Bob's URI, is led to 'http://example/about-chicago' and thence to Alice's description of Chicago, and then somehow understands that Bob's URI is meant to refer to Chicago.

Any method for implementing this use case would need to explain: what are the syntactic constraints on the URI Bob chooses; what Bob needs to do to associate his URI with the document about Chicago; and how Carol comes to discover and use that association.

(This differs from the previous use case in that the document about Chicago was not written with the purpose of defining Bob's URI. In fact Bob's URI doesn't even occur in it. Rather than look in the document for a definition mentioning Bob's URI, Carol must determine the topic of the document and take the topic as the meaning of Bob's URI.)

3 General definition methods in current use

This section describes currently accepted methods for providing and discovering definitions of URIs.

3.1 Colocate definition and use

One way to lead someone encountering a URI to a definition of the URI is to make sure that the definition of the URI occurs in each document in which the URI occurs. This makes the definition easy to find, since anyone who encounters the URI will have in hand the definition that they need. The form of the URI in this case is arbitrary.

This method treats URIs similarly to blank nodes in RDF, which have to stay close to their own definition, since they are scoped to a graph. An example of the application of this approach would be the use of a URI in an OWL ontology file that defines that URI.

Criticism: In RDF, this method is fragile in the same way as are blank nodes, because use and definition can get separated, e.g. when uses of the URI are deposited into a triple store and then retrieved by a query. Carrying a definition around with a reference does not help in the common case where an out-of-context reference is needed (as one would want in, say, a Semantic Web).

3.2 Point to the document that contains the URI's definition

When using a URI, provide, again in the document in which the URI occurs, a reference to a document that carries a definition of the URI. This is the approach taken by OWL; the document containing the URI is related to the one from which the definition of the URI should be obtained via the owl:imports relation.[1]

The rdfs:isDefinedBy property might also be used for this purpose, but it probably isn't.

Criticism: Like the previous approach, this one is good so far as it goes, but it suffers in similar ways. The URI and the link to its definition can get separated, or keeping the definition link close to the occurrence of the URI may prove to be too difficult for applications.

3.3 Register a URI scheme or URN namespace

In principle, one could create a new URI scheme or URN namespace, in which case the registration document would constitute a definition (although perhaps not on its own; often there is delegation of some kind to other documents). A recent example is RFC 5870 for URIs defined to name geographic locations. Another is the definition of the URI about:blank, which is in progress as of this writing. A "tdb:" (thing-described-by) URI scheme has also been proposed. [TBD: cite Masinter] See [rfc4395] and [rfc3406] for details.

Criticism: The review process for new URI schemes and URN namespaces is probably too stringent for all but a very few definition discovery applications. There would likely be poor protocol support for discovering definitions in a new URI scheme or URN namespace. It is possible, manually, to look up a scheme or namespace in the appropriate registry, but few client applications are able to do this, and the resulting document is not machine actionable in any standard way. One could attempt to modify all Web clients to understand the new scheme, but this would be difficult.

3.4 Use the LSID getMetadata() method

A URN namespace for which there is a general definition method is the 'lsid' namespace.[2] URIs beginning 'urn:lsid:' are called LSIDs. [lsid] LSIDs have an associated SOAP-based protocol that has separate methods for dereference (getData) and discovery (getMetadata). According to the LSID specification, an LSID for which the getData method yields nonempty content refers to a representation, while the LSID could refer to anything at all if getData yields empty content. In the latter case the information yielded by the getMetadata method generally constitutes, or at least contains, a definition of the LSID.

For clients lacking an LSID protocol implementation, HTTP/LSID gateways are available.

Criticism: LSIDs rely on an unregistered URN namespace, calling their consensus status into question and making them impossible to understand through the usual chain of IETF URI specifications. The LSID protocol itself is poorly deployed. As currently used, LSIDs have the same vulnerabilities as http: URIs, since they rely on the domain name system for both authority and resolution. What advantage they have over http: URIs (below) for the application is not clear.

3.5 'Hash URI'

With this method, the URI must be a 'hash URI', i.e. must contain a hash character '#'. (For historical reasons the part of the URI following '#' is called the 'fragment identifier', even when it is null.) The definition of the URI is placed in the document on the Web at the URI that is the pre-hash stem of the URI.

Example: 'Hash URI'

The interpretation of a 'hash URI', say 'http://example/eq#eq018', depends (according to [rfc3986]) on the media types of representations of the information resource on the Web at its stem URI 'http://example/eq'. For media type application/rdf+xml, the media type registration defers to the content of the representation; that is, the representation itself gets to arbitrarily define what the 'hash URI' means.[3]
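
As a minimal sketch of the client side of this method, the following Python fragment splits a 'hash URI' into the stem that would be dereferenced and the fragment identifier that is interpreted against whatever representation comes back. The function name definition_location is hypothetical and chosen only for illustration.

```python
from urllib.parse import urldefrag

def definition_location(uri):
    """Split a 'hash URI' into (stem, fragment identifier).

    The stem is the URI a client would dereference to look for a
    definition; the fragment identifier is not sent to the server and is
    interpreted according to the media type of the representation.
    """
    stem, fragment = urldefrag(uri)
    return stem, fragment

stem, fragment = definition_location("http://example/eq#eq018")
print(stem)      # http://example/eq
print(fragment)  # eq018
```

The sketch covers only the syntactic step; deciding what the fragment identifier means still depends on the representation retrieved from the stem URI.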

Criticism: Using 'hash URIs' in this way is a retrofit of an existing architecture intended for locating parts (fragments) of documents to definition discovery. As such the mechanism has some rough edges. Some of the objections to the use of 'hash URIs' are as follows.

3.5.1 'Hash URI' semantics is sensitive to media type

If there is content negotiation, session sensitivity, etc., then the definition that is intended and sought may not be present in the representation that is accessed. Worse, the definition that is found may be incompatibly different from the one that is meant. For example, if there is an application/rdf+xml representation and a text/html representation, then the former may define the URI to name an earthquake, while the latter may define it to name an HTML element.

Response: The answer to this objection is that a server that wants to avoid risking such confusion shouldn't do this. A server should either avoid content negotiation (CN) completely, or, if it must use CN, make sure that the URI is defined in all representations, and in the same way in all of them.

At present the only media type registration that supports defining 'hash URIs' in arbitrary ways is application/rdf+xml. Since this media type has no human-friendly presentation and is not enabled for XSLT, many providers (e.g. FOAF, dx.doi.org) use CN between HTML and RDF so that access in a browser delivers information that is useful to a human. E.g. if you access FOAF without special CN parameters you will not get discoverable definitions of its non-element fragids.

The advent of RDFa, which should eliminate the need for HTML/RDF CN, may create an opportunity to smooth this inconsistency over.

3.5.2 The common 'hash URI' pattern fails with large namespaces

When a large number of URIs are formed by combining a fixed "namespace" prefix with many suffixes using hash as a connector, there will be a single underlying document at the pre-hash URI that must provide definitions of all of the large number of URIs. This is an unacceptable performance hit for the server, the network, and the client. Absolute URIs don't have this problem as the response can be specific to each URI.

Response: The answer to this has been reported a number of times [degraauw]. For a set of namespace members a, b, c, ... instead of using URIs

  http://example/ns#a  http://example/ns#b  http://example/ns#c ...

use URIs that look like

  http://example/ns/a#_  http://example/ns/b#_  http://example/ns/c#_ ...

where _ is a common suffix of your choice, perhaps even empty:

  http://example/ns/a#  http://example/ns/b#  http://example/ns/c# ...
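
The difference between the two patterns can be illustrated with a short Python sketch that counts how many distinct stem documents each pattern implies for the same set of namespace members. The helper names shared_document_uri and per_member_uri are assumptions made for this illustration.

```python
from urllib.parse import urldefrag

def shared_document_uri(namespace, member):
    # Common pattern: every member hangs off one stem document.
    return f"{namespace}#{member}"

def per_member_uri(namespace, member, suffix="_"):
    # Alternative pattern: each member has its own stem document;
    # the suffix is a fixed fragment identifier and may be empty.
    return f"{namespace}/{member}#{suffix}"

members = ["a", "b", "c"]
shared_stems = {urldefrag(shared_document_uri("http://example/ns", m)).url
                for m in members}
per_member_stems = {urldefrag(per_member_uri("http://example/ns", m)).url
                    for m in members}

print(len(shared_stems))      # 1
print(len(per_member_stems))  # 3
```

With the second pattern a request for one member's stem need only return the definition for that member, at the cost of one document per member.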

3.5.3 Fragment identifiers are easily lost

Harry Halpin [halpin] says that fragment identifiers are often lost during document preparation and cut/paste operations.

Response: It's not obvious that this should be the case. More detail is needed on this objection. Use cases would help.

3.5.4 'Hash URIs' don't support REST architecture

Manu Sporny says that hash URIs should work with HTTP PUT, POST, and DELETE methods; they don't.

Response: More information needed. Why not use a separate dereferenceable URI for REST controls related to the referent and/or documentation of a hash URI?

3.6 Absolute URI with HTTP 303 See Other redirect

In the 2002-2005 time period, when 'hash URIs' were advanced as the recommended method for definition discovery, a demand arose for a discovery method applicable to absolute URIs. This led to the invention of the following protocol.

In this approach, one mints an absolute (i.e. hashless) http: URI, puts a definition of it on the Web at a second URI, and then arranges for a GET request of the first URI to redirect, using a 303 'See Other' status code, to the second URI. The first URI is not dereferenceable, and therefore does not name the information resource at that URI (since there is none). The first URI then gets its meaning according to the document on the Web at the second URI. [Draft note: TBD: cite HTTPbis]

Example: 303 redirect

Alice chooses 'http://example/eq018' as the way she will refer to a particular earthquake. At 'http://example/about-eq018' she publishes text and/or RDF that defines 'http://example/eq018', explaining the URI by providing details about the earthquake (date, location). For the URI 'http://example/eq018', which will not be dereferenceable (since otherwise, it would refer to the information resource at that URI [ir], not the earthquake), she arranges that a GET request yields a 303 redirect with a Location: header specifying 'http://example/about-eq018' as the redirect target.

Those encountering 'http://example/eq018' will attempt to dereference it, but this will fail, with a 303 redirect delivered instead. The 303 redirect indicates that the document at 'http://example/about-eq018' provides a definition of the URI 'http://example/eq018'.
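
The server side of this arrangement might be sketched in Python as follows, using only the standard library. The DEFINITIONS table, the respond_to_get function, and the handler class name are hypothetical; a real deployment would more likely use the redirect facilities of an existing web server.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical table: path of the URI being defined -> URI of the
# document that carries its definition.
DEFINITIONS = {"/eq018": "http://example/about-eq018"}

def respond_to_get(path):
    """Return (status, headers) for a GET request on the given path."""
    target = DEFINITIONS.get(path)
    if target is None:
        return 404, {}
    # 303 See Other: the definition is to be found at the Location URI.
    return 303, {"Location": target}

class DefinitionRedirectHandler(BaseHTTPRequestHandler):
    """Answers GET with the status and headers chosen by respond_to_get."""

    def do_GET(self):
        status, headers = respond_to_get(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
```

To try the handler locally, one could pass it to http.server.HTTPServer together with an address such as ("", 8000) and call serve_forever() on the result.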

Another pattern is to use a 303 redirect to a document whose primary topic is the intended referent, similar to the Chicago use case (2.2 Using a document as a definition by reference to its primary topic). This could, in theory, lead to ambiguities, as the primary topic of the document and the entity referred to using the URI might be different things.

Criticism: Again, a number of objections to this approach have been raised:

3.6.1 303 is difficult, sometimes impossible, to deploy

Deploying a 303 redirect requires giving the correct directive to a web server, for example adding a Redirect line to .htaccess in Apache HTTPD. Unfortunately many hosting solutions do not allow this, putting this manner of publishing definitions off limits to many who would otherwise like to use it.

Response: Web publishers whose ISP does not permit them to set up a 303 redirect, or for whom the overhead such as expertise acquisition is prohibitive in some other way, could choose to use a service that provides 303 redirects to a location of their choosing. One such service is purl.org, operated by OCLC, which permits anyone to set up a 303 or other redirect from their domain. The URI to be defined would have to have the form http://purl.org/..., while the URI for the document carrying the definition could be anything at all.

Unfortunately, use of a redirect service makes one dependent on two service providers instead of one, making one's definitions more vulnerable than if only one provider were involved.

3.6.2 303 leads to too many round trips

To get definitions of N URIs by redirecting through 303 responses, you need to do 2N HTTP requests. This is a frustrating and apparently gratuitous performance hit for those interested in publishing and accessing large numbers of definitions.
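
The request count can be checked with a small simulation. The Python sketch below uses an in-memory stand-in for the network (the REDIRECTS and DOCUMENTS tables and the simulated_get function are assumptions, not a real HTTP client) and counts the requests needed to discover definitions for five URIs.

```python
# Simulated server state: each URI to be defined redirects (303) to the
# document carrying its definition.
REDIRECTS = {f"http://example/eq{n:03d}": f"http://example/about-eq{n:03d}"
             for n in range(1, 6)}
DOCUMENTS = {target: f"definition of {uri}"
             for uri, target in REDIRECTS.items()}

def simulated_get(uri):
    """Stand-in for an HTTP GET: returns (status, location or content)."""
    if uri in REDIRECTS:
        return 303, REDIRECTS[uri]
    return 200, DOCUMENTS[uri]

def discover(uri):
    """Follow 303 redirects; return (definition, number of requests made)."""
    requests = 0
    while True:
        status, payload = simulated_get(uri)
        requests += 1
        if status == 303:
            uri = payload
        else:
            return payload, requests

total = sum(discover(uri)[1] for uri in REDIRECTS)
print(total)  # 10 requests for 5 URIs
```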

Response: See 4.2 Absolute URI with site-specific discovery rules.

3.6.3 303 makes the URI difficult to bookmark

"Redirection has in fact very confusing side effects; as we expect the semantic web to work seamlessly with the web, it is very odd that a semantic web uri cannot be copy pasted to a browser without seeing it change to something that is not the same as before." [tumarello]

Response: The location bar issue is discussed here. [TBD: citation] The content from the redirect target does not originate from the referent of the original URI, so an interface that suggests otherwise is guilty of misattribution. The best answer to this is that an additional user interface element should be added to browsers that provides access to the original URI.

3.6.4 303 has no normative specification

"The hash 303 redirect method in common use has not received adequate review such as W3C recommendation track; in fact it is not really documented at all in any adequate form." [halpin]

Response: The IETF HTTP working group has taken on this issue. HTTPbis's new text for GET/303 specifies the pattern, which is now in common use in RDF deployment.

4 Possible mitigations

With 'hash URIs' and the 303 redirect identified as the sources of current difficulties, a number of new methods have been suggested to get around their problems.

4.1 Use something other than a URI

Editorial note (2011-04-14)
This section derives from JAR's TAG F2F presentation slides. The purpose of talking about this idea is mainly to remind people that the problem is one of notational engineering, not philosophy. This doesn't work very well, though, and I will probably flush this section.

URIs are just one kind of term that might be used to refer to something. If defining a URI is too difficult or costly, then perhaps one might do without. In RDF serializations such as Turtle, for example, we have blank node notation:

  [ foaf:isPrimaryTopicOf <http://example/about-chicago> ] 

Here we have managed to refer to Chicago without defining a new URI; we have simply referred indirectly using a URI that refers to an information resource according to a generic method (see [ir]).

A more concise alternative is syntactic sugar:

  *<http://example/about-chicago> 

which might be supported in a hypothetical new RDF serialization as a shorthand for the previous example. (The asterisk is meant to be suggestive of indirection in the C programming language.)

Criticism: This is good as far as it goes, but does not meet the demand for defined URIs.

4.2 Absolute URI with site-specific discovery rules

The network round-trip (303 redirect) to map the URI whose definition is to be discovered to the URI of the information resource that defines it can be avoided if we know a general rule that maps one kind of URI to the other, as such a rule can be applied on the client without server involvement. It is too much to hope that a single rule could work uniformly for all URIs whose definition might be sought, but an individual host may have a rule that applies for URIs at that host.

The "well known URIs" protocol gives a place where a file containing such rules can be stored [rfc5988]. The rule might be stored in a well-known file 'definition-rule', as in 'http://example/.well-known/definition-rule'. To obtain a definition of 'http://example/eq018', obtain the definition-rule file for its host. Then if the rule says to map 'http://example/{path}' to, say, 'http://example/{path}.about', a definition of 'http://example/eq018' can be sought by dereferencing 'http://example/eq018.about'.
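
Since the format of the definition-rule file is not specified, the following Python sketch simply assumes that a rule has already been read and consists of a pair of templates that share a single '{path}' variable. The function name apply_rule and this two-template representation are assumptions made for illustration.

```python
def apply_rule(match_template, target_template, uri):
    """Apply a client-side rule mapping a URI to the URI of its definition.

    Both templates contain the placeholder '{path}'. Returns None when the
    URI does not match match_template, i.e. when the rule does not apply.
    """
    prefix, _, suffix = match_template.partition("{path}")
    if len(uri) < len(prefix) + len(suffix):
        return None
    if not (uri.startswith(prefix) and uri.endswith(suffix)):
        return None
    path = uri[len(prefix):len(uri) - len(suffix)]
    return target_template.replace("{path}", path)

rule = ("http://example/{path}", "http://example/{path}.about")
print(apply_rule(*rule, "http://example/eq018"))
# http://example/eq018.about
```

Because the mapping is computed on the client, only the request for the definition document itself touches the network once the rule for the host is known.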

When the mapping is cached, this reduces the number of round trips from two (in the 303 case) to one.

This would be a new protocol and the name and format of the definition-rule file would have to be pinned down. One option might be to use the link-template feature of the host-meta file [rfc5988].

Looking for a definition-rule file for every host that has URIs for which definitions need to be discovered would be expensive if only a few of them have such files, but with some cleverness the number of such failed requests can probably be kept small. The details would have to be worked out, but this approach could be a boon to bulk consumers of absolute URI definitions.

Criticism: Web site authors without write access to the appropriate .well-known file would not be able to take advantage of this facility.

4.3 Absolute URI with new HTTP request or response

To reduce the number of round trips, we might use a new HTTP method to request a definition of a URI, or the server could use a new status code to indicate that what it is returning is a definition of the request URI.

The URIQA specification [uriqa] defines MGET, a new HTTP request method. An MGET request on a URI yields a response containing a definition of that URI.

In response to GET of a URI, a server might provide a definition in a non-success response. (A successful response would mean that the URI refers to the information resource at the URI.) Possibilities for HTTP response status codes that might signal this situation: 203 Non-Authoritative Information, a new 2xx status (e.g. 209), a new 3xx status (e.g. 308), or a variety of 4xx codes. (Placing the definition in the content of a redirect response (status codes 301, 302, 303, and 307) is unsatisfactory as the content would not be displayed in a Web browser.)

[The Link: header or other HTTP response header might play a role here? TBD: explain and cite Web Linking and HTTPbis.]

Any of these options would mean fewer round trips than following a 303 redirect.

Criticism: Although they reduce the expected number of round trips, all such methods are generally as difficult, or more difficult, to deploy than 303 redirects.

4.4 Coerce an information resource to what it defines its URI to name

[Draft note: We are trying to represent Ed Summers's proposal, which others have echoed, in this section.]

Currently we use a dereferenceable absolute URI, e.g. 'http://example/eq018', to refer to the information resource at that URI, IR('http://example/eq018') (see [ir]). To use an absolute URI to refer to anything else, one uses an HTTP 303 redirect. To address performance and deployment difficulties with 303 redirects, it has been suggested that the same URI be used for two purposes: to refer to the information resource at that URI, and to refer to some entity given by a definition that is carried by the information resource itself.

Example: Combining metadata and data using the same URI

Suppose that Alice wants to record some information about an earthquake. She publishes a definition containing the following so that it's on the Web at the URI 'http://example/eq018':

  <http://example/eq018> eq:magnitude 6.9.
  <http://example/eq018> eq:epicenter <geo:37.040,-121.877>. 

Bob then comes along and writes the following metadata about IR('http://example/eq018') in the usual way, i.e. using the URI to refer to the information resource, based on what information is accessed via that URI:

  <http://example/eq018> dc:creator "Alice".
  <http://example/eq018> dc:title 
    "Loma Prieta earthquake URI definition".

Suppose that Carol encounters both bits of RDF (or either) and needs to make sense of them. She is aware that 'http://example/eq018' might be used in both ways - in metadata, with the intent that the metadata is about IR('http://example/eq018'); and also related to an earthquake as described in IR('http://example/eq018'). For each use of 'http://example/eq018' she (or her software) needs to determine which sense is supposed to apply.

In general, what agents using this protocol need - both those composing statements and those deciphering them - is an agreed rule for classifying each occurrence of a URI u as referring either to the information resource IR(u) or to what the content at IR(u) describes.

There are probably many ways in which one might accomplish this; the following method is provided for illustration. Suppose that it "makes sense" or "is appropriate" for the subject of a particular property to be an information resource. For example, the subject of Dublin Core properties might be seen as "making sense" when the subject is an information resource. The judgment of "making sense" might be made according to an asserted or inferred domain constraint, or it might simply be by fiat (asserted). Call such a property a subject-IRS property. A property that is not subject-IRS would be subject-NIRS. Similarly, we would have object-IRS and object-NIRS properties.

We now consider an implicit coercion to be applied any time an information resource occurs in an NIRS position of a subject- or object-NIRS property, replacing the information resource with something (the same thing each time, for a given URI) that it describes. That is, the information resource acts as a proxy for what it is about.

In the example, dc:creator and dc:title would be classified as subject-IRS object-NIRS, while eq:epicenter and eq:magnitude would be classified as subject-NIRS object-NIRS.
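
The illustrative rule can be sketched in Python as a lookup table plus a function that decides, for each triple, how the subject URI is to be read. The table contents follow the classification just given; the names PROPERTY_CLASSES and subject_sense, and the representation of triples as tuples of strings, are assumptions.

```python
# Classification of properties: property -> (subject position, object position).
PROPERTY_CLASSES = {
    "dc:creator":   ("IRS", "NIRS"),
    "dc:title":     ("IRS", "NIRS"),
    "eq:epicenter": ("NIRS", "NIRS"),
    "eq:magnitude": ("NIRS", "NIRS"),
}

def subject_sense(triple):
    """Say how the subject URI of a (subject, property, object) triple is read."""
    _, prop, _ = triple
    subject_class, _ = PROPERTY_CLASSES[prop]
    if subject_class == "IRS":
        return "the information resource IR(u)"
    # NIRS position: coerce the information resource to what it describes.
    return "what IR(u) describes"

alice = ("http://example/eq018", "eq:magnitude", "6.9")
bob = ("http://example/eq018", "dc:creator", "Alice")
print(subject_sense(alice))  # what IR(u) describes
print(subject_sense(bob))    # the information resource IR(u)
```

A property missing from the table raises KeyError here, which mirrors the requirement that the classification be shared by everyone who uses the property.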

To avoid mistakes, the IRS/NIRS classifications of all properties would have to be understood in the same way by both Bob and Carol, i.e. the classifications would have to be part of the shared meaning of all properties.

Criticism: This approach presents a couple of challenges.

First, not all subject or object positions of properties are easily classified as IRS vs. NIRS. For example, the object of "likes" and the subject of "is located at" are not obviously either IRS or NIRS. Even if the choice is agreed, no matter what the choice is, meanings that required the other choice would be difficult to express - you would have to revert to a mode of expression that did not involve a 200 response (hash, 303, blank node, etc.).

Second, this method, by design, creates the illusion that the URI refers to what the information resource is about (e.g. an earthquake). Because predicates that already possess meaning are being reinterpreted, there is a risk that an agent will draw unsound conclusions. For example, if two URIs u, v refer to distinct descriptions of the same thing, and one then writes <u> owl:sameAs <v>, then one incorrectly implies that the two descriptions are identical. A similar problem arises with functional properties.

4.5 Change what dereferenceable absolute URIs mean

Under this proposal, some dereferenceable URIs - call them "indirect" URIs - would get their meaning according to a definition found in the information resource (document, usually) at the URI, rather than referring to that information resource [ir]. This approach avoids the deployment and performance difficulties of 303 redirects. Defining an indirect URI is easy — it is the same as publishing any Web document — and access to its definition is also easy, not requiring an indirection step.

How does one learn whether a URI is indirect or not? One might like to say that an indirect URI is one that dereferences to a definition of itself, and that all others are direct. But this criterion is not machine actionable as stated, both because the definition might be couched in an arbitrary language or notation (the number of RDF serializations is increasing steadily), and because even for a known notation it may not be obvious how to distinguish content that contains a definition of a particular URI from content that doesn't. One actionable approximation that has been proposed is as follows: If IR(u) has an associated representation with media type 'application/rdf+xml', then take u to be defined by IR(u), otherwise take u to refer to IR(u). This rule would generate false positives (e.g. documents not containing u) and false negatives (e.g. those defining the URI only in an associated text/owl-manchester representation), but it illustrates the idea.

In order to compose or use metadata, an agent would first check whether a URI is direct by requesting an application/rdf+xml representation. If the URI is direct, the agent could compose or use metadata in the usual way (at some risk that the URI might change status in the future from direct to indirect). If the URI is indirect, the agent would write or interpret the metadata in some other way.
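The approximate rule described above (treat u as indirect when IR(u) has an 'application/rdf+xml' representation) might be sketched as follows. The function names and the tuple encoding of referents are hypothetical. The sketch takes the list of available media types as input rather than performing the HTTP request, so how that list is obtained is left open.

```python
RDF_XML = "application/rdf+xml"


def is_indirect(media_types):
    """Approximate rule: u is indirect if IR(u) has an RDF/XML representation.

    `media_types` holds the Content-Type values of the representations
    associated with IR(u); parameters such as charset are ignored.
    """
    return any(mt.split(";")[0].strip().lower() == RDF_XML for mt in media_types)


def referent_of(uri, media_types):
    """Hypothetical reading of `uri` under the proposed rule."""
    if is_indirect(media_types):
        return ("defined-by-content-at", uri)  # meaning comes from the definition
    return ("IR", uri)  # direct: the URI refers to the information resource
```

As noted above, a rule of this kind yields both false positives and false negatives. The sketch inherits them and is meant only to show where the direct/indirect decision would sit in an agent.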

This proposal would need to compensate for the loss of a way to refer to information resources on the Web at indirect URIs. A standard way to refer to IR(u) is needed in a variety of circumstances:

  1. when u is an indirect URI
  2. when it is not known whether u is direct or indirect
  3. when the cost of determining whether u is direct or indirect is judged to be too high
  4. when it is desired not to impose on others the cost of determining whether u is direct

Although direct URIs might still be used to refer to their information resources, the risks and costs of doing so would probably lead people to stop using them.

In any case, there are many design alternatives for referring to an information resource without depending on its URI being direct. For example, the Turtle term

  [ ir:onWebAt "http://example/eq018"^^xsd:anyURI ] 

could be a new way to refer to IR('http://example/eq018'), which we formerly referred to in Turtle as '<http://example/eq018>'. [TBD: Reference Halpin and Presutti's closed access ESWC 2009 paper.] A local shorthand could be defined to the same effect:

  :about-eq018 ir:onWebAt "http://example/eq018"^^xsd:anyURI . 

(Note that either a 'hash' URI or a 303 URI could be used to refer to an information resource — defined perhaps in this way.)
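To illustrate how an agent composing metadata might choose between the two notations, here is a minimal Python sketch. The function name and the 'status' argument are hypothetical, and ir:onWebAt is the illustrative property used above, not an established vocabulary term.

```python
def refer_to_ir(uri, status="unknown"):
    """Return a Turtle term meant to refer to IR(uri).

    `status` is 'direct', 'indirect', or 'unknown' (a hypothetical encoding).
    Only a URI known to be direct is written in the traditional way; in every
    other case the term avoids depending on how the URI is interpreted.
    """
    if status == "direct":
        return f"<{uri}>"
    return f'[ ir:onWebAt "{uri}"^^xsd:anyURI ]'
```

An agent that does not want to impose the cost of checking on its readers could call the function with the default status everywhere, at the price of more verbose metadata.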

Yet another possible replacement notation would be syntactic sugar:

  &<http://example/eq018> 

which might be supported in a hypothetical new RDF serialization. (The ampersand is meant to be suggestive of the address-of operator in the C programming language.)

Alternatively, the referring document could just assert that a URI is direct:

  <http://example/eq018> ir:onWebAt "http://example/eq018"^^xsd:anyURI . 

This would be an instance of 3.1 Colocate definition and use. However, this runs some interoperability risk as there may be other agents that interpret the same URI as indirect. [4]

To avoid the need for the clumsy ir:onWebAt notation, some convention might be used to provide a URI (other than u) to refer to IR(u), when one is available. One way to do this would be with a Link: HTTP response header [rfc5988].
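A consumer of such a convention would need to pull the alternative URI out of the Link: header. The following Python sketch shows one way this might look. The relation name used in the usage note is a made-up extension relation, since no relation type has been chosen for this purpose, and the parser is deliberately simplified: it does not handle commas or semicolons inside quoted parameter values.

```python
import re


def find_link_target(link_header, rel):
    """Return the first target URI in a Link header value whose rel matches.

    Simplified parsing: each link is '<target>' followed by ';'-separated
    parameters, and links are separated by commas.
    """
    for match in re.finditer(r"<([^>]*)>\s*((?:;\s*[^,;]+)*)", link_header):
        target, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            # A rel parameter may list several space-separated relation types.
            if name.lower() == "rel" and rel in value.strip('"').split():
                return target
    return None
```

For example, given the header value '<http://example/eq018-doc>; rel="http://example/rel/information-resource"' (both URIs are hypothetical), a call with that relation name returns 'http://example/eq018-doc', which could then be used in metadata in place of the ir:onWebAt form.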

Criticism: Currently it is easy to write and interpret Web metadata (meaning metadata written using a dereferenceable absolute URI to refer to the information resource at that URI). This proposal makes metadata more complicated, fragile, and costly, and forces all existing producers and consumers of Web metadata to be updated to be aware of indirect URIs.

It is likely that there is deployed content that would be interpreted differently under the proposed rule than at present. This would be hard to know, and inconsistencies could be consequential, such as the assignment of authorship or a copyright license to the wrong information resource. More complex and costly heuristics would reduce misinterpretation but would not eliminate it.

As most of the Web (e.g. HTTP clients and servers) will continue to adhere to the current interpretation of dereferenceable absolute URIs, the proposed rule introduces a split in the URI namespace, with two communities interpreting the same URIs in incompatible ways. This runs contrary both to the Web architecture and to the Web Consortium's "One Web" vision.

5 Summary

The following table summarizes some of the current and proposed definition discovery methods, evaluating each against a set of criteria, as explained in the key below.

Method                       compatible?  robust?  easy to deploy?  min round trips  sound?
Hash                         +            -        +                1                +
Absolute + 303               +            +        -                2                +
Absolute + discovery-rule    +            +        ?                1+ε              +
Absolute + new HTTP          +            +        -                1                +
Coerce                       +            +        +                1                -
Take at face value           -            +        +                1                +

compatible?
Does it avoid assigning a new, incompatible definition to existing URIs?
robust?
Is the URI free of fragment identifiers that can get lost or misinterpreted?
easy to deploy?
Can a publisher with a file-upload-only hosting solution use this method?
min round trips
How many network round trips are needed to find a definition, assuming (a) the definition is not cached and (b) the /.well-known/host-meta cache misses with probability ε?
sound?
Is the method likely to respect deployed axioms and inference rules (i.e. is safe with respect to logical soundness)?

6 Glossary

This section defines terms that are used in this report. An attempt has been made to avoid gratuitous differences from the way these terms are used elsewhere, but in a few cases choice of terminology has been difficult and words with other meanings (such as "definition") are given technical definitions. These definitions are not being proposed for general adoption.

[Draft comment: All terminology choices are provisional; for most of them I am testing the waters to see how well the word works, and am prepared to change.]

absolute
A URI is absolute if it contains no hash '#' sign. This usage is a bit unintuitive but is used for consistency with RFC 3986 [rfc3986].
associated with
[Draft note: This is too sketchy. TBD.] "Association" of a representation with an information resource is by fiat according to each particular information resource. See [ir].
definition
A document or document part that provides information about the meaning of a URI or other kind of term. This term is not meant to be either rigorous or exclusive. The "information" could be provided in any human-readable or machine-readable language, or combination of languages. It needn't be successful, specific, or comprehensive in defining the term in the ordinary sense of "defining". Rather, the term as used here refers to the role it plays in discovery. We might more accurately say "putative definition". [Draft note: Alan R: Is a sound recording a possible definition?]
dereferenceable
A URI is dereferenceable if there is at least one representation that is authorized as the result of a retrieval operation. (This definition is derived from [rfc3986] section 1.2.2, which also applies 'dereference' to operations such as POST.) In particular, absolute http: URIs are dereferenceable if some HTTP method or equivalent is successful (yields a 2xx response). Some URIs belonging to some other URI schemes are also dereferenceable.
http: URI
A URI whose scheme (the part before the colon) is 'http' or 'https'.
information resource
Roughly speaking, something that is appropriate as the subject of metadata. See [ir].
IR(u)
IR(u) is shorthand for the information resource on the Web at URI u. For example, if 'http://example/image23' is dereferenceable, then IR('http://example/image23') is the information resource on the Web at that URI.
metadata
Information about information, or about an information resource. In RDF, metadata might be written using vocabularies such as Dublin Core, FOAF, or CC REL.
on the Web at
When a URI is dereferenceable, "the information resource on the Web at a URI" (abbreviated IR(that URI), see above) is the information resource whose associated representations are the ones obtained by dereferencing that URI (or more precisely, the ones that are authorized for dereferences of that URI). See [ir] for a rigorous definition.
refer
For the purposes of this report, reference is just one way to mean. There may be ways to mean other than to refer, but none are specified here.
representation
Content (an octet sequence) tagged with media type and perhaps other information meant to guide interpretation of the content. "Representation" is used as a term of art; these representations don't necessarily "represent" anything at all. Similar to "entity" in RFC 2616. [TBD: citation] See [ir] for a treatment of representations and information resources.

7 Acknowledgments

David Booth, Michael Hausenblas, Nathan Rixham, and Alan Ruttenberg contributed to the creation of this report. Pat Hayes and Henry S. Thompson participated in discussions. Timothy Danford gave some helpful suggestions on a draft.

8 References

issue-57
Issue 57. W3C Technical Architecture Group, 2007-2011. (See http://www.w3.org/2001/tag/group/track/issues/57.)
rfc3986
T. Berners-Lee, R. Fielding, L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. RFC 3986, IETF, 2005. (See http://www.ietf.org/rfc/rfc3986.txt.)
disambiguating
Sandro Hawke. Disambiguating RDF Identifiers. W3C, January 2003. (See http://www.w3.org/2002/12/rdf-identifiers/.)
webarch
Ian Jacobs and Norman Walsh, editors. Architecture of the World Wide Web, Volume One. W3C Recommendation, December 2004. (See http://www.w3.org/TR/webarch/.)
ir
Jonathan A. Rees, editor. Information resources and Web metadata. Editor's draft, W3C, 2011. (See http://www.w3.org/2001/tag/awwsw/ir/20110517/.)
rfc4395
T. Hansen, T. Hardie, and L. Masinter. Guidelines and Registration Procedures for New URI Schemes. RFC 4395, IETF, 2006. (See http://www.ietf.org/rfc/rfc4395.txt.)
rfc3406
L. Daigle, D.W. van Gulik, R. Iannella, and P. Faltstrom. Uniform Resource Names (URN) Namespace Definition Mechanisms. RFC 3406, IETF, 2002. (See http://www.ietf.org/html/rfc3406.txt.)
lsid
Life Sciences Identifiers Specification. Object Management Group, 2004. (See http://www.omg.org/cgi-bin/doc?dtc/04-05-01.pdf.)
rfc5988
M. Nottingham. Web Linking. RFC 5988, IETF, 2010. (See http://www.ietf.org/rfc/rfc5988.txt.)
hostmeta
E. Hammer-Lahav. Web Host Metadata. Internet-draft, IETF, 2010. (See http://tools.ietf.org/html/draft-hammer-hostmeta-13.)
uriqa
Patrick Stickler. The URI Query Agent Protocol. Nokia, 2010. (See http://sw.nokia.com/uriqa/URIQA.html.)
halpin
Harry Halpin. Reversing HTTP Range 14 and SemWeb Cool URIs decision. Email to public-awwsw list, 2011. (See http://lists.w3.org/Archives/Public/public-awwsw/2011Jan/0021.html.)
degraauw
Marc de Graauw. The #referent convention. Blog post, 2007. (See http://www.marcdegraauw.com/2007/02/20/the-referent-convention/.)
tumarello
Giovanni Tumarello. http-range-14 303 issue, request for reopening the discussion. Email to www-tag list, 2007. (See http://lists.w3.org/Archives/Public/www-tag/2007Jul/0034.html.)

End Notes

[1]
More precisely, the definition will be found in the imports closure of the document containing the URI.
[2]
Unfortunately the 'lsid' URN namespace is not in the IANA registry. Someone encountering an LSID may need to do a search in order to locate the LSID specification and consequently determine what the LSID means. In addition each LSID contains an "authority" field whose meaning is not assigned by the LSID specification, requiring even more research on the part of someone trying to understand an LSID.
[3]
If IR('http://example/eq') (the information resource at URI 'http://example/eq') has multiple representations, it is important that all representations provide definitions of every URI that needs one, and that corresponding definitions in different representations be compatible with one another. (See [webarch] section 3.2.)
[4]

One might think that the notation for referring to information resources could relate the information resource to the referent of u (written '<http://example/eq018>' in Turtle) instead of to the URI u itself (written '"http://example/eq018"^^xsd:anyURI'):

  [ rdfs:isDefinedBy <http://example/eq018> ] 

However, the meaning of this expression is then sensitive to the interpretation of the URI 'http://example/eq018', which is what is in doubt and is therefore what the notation has to avoid depending on. If two URIs, say 'http://example/eq018' and 'http://example/earthquake571', both refer to the same thing (whatever it is), there might be two distinct information resources IR('http://example/eq018') and IR('http://example/earthquake571') satisfying this relationship, with no way for the property, which is defined on the interpretations of the URIs and not on the URIs themselves, to choose between them.