TagIssue57Analysis

From W3C Wiki

See also TagIssue57Home

An attempt to characterize the TagIssue57Responses proposals along the lines suggested by Interoperability of referential uses of hashless URIs.

Setup

The setup for the comparative characterizations is as follows: We have a sender (someone writing some RDF, e.g. in email or in a triple store) and a receiver.

There is content retrievable using a hashless retrieval-enabled (i.e. 2xx) http: URI U. The sender may or may not have control over HTTP responses to requests for U; in the general case they don't, so let's assume they don't.

The general scenario is that the sender wants to express some meaning, and wants to refer to one of the following subjects:

  1. The content retrieved using U (generically speaking, if there is conneg).
  2. Whatever is referred to in the content as <U> (assuming the content contains statements with <U> as subject).
  3. The content's primary topic (assuming the content has a primary topic).

Often subject 2 will make sense even when subject 3 is unclear or ambiguous, and subject 3 will make sense in the absence of any statement in the content that uses the URI U. It is easy for subject 3 to be different from subject 2, e.g. if subject 2 is the same as subject 1. Subjects 1, 2 and 3 all being different is unlikely but a logical possibility, e.g. if the page's primary topic is the Loma Prieta earthquake but the page contains embedded RDF saying that the URI refers to Loma Prieta (the location).

(There is another possible subject: What the URI "identifies" according to some putative "authority", within the bounds of what's allowed by an HTTP specification. This covers the REST / HTTPbis theory of the world, in which the content encodes the state of the resource (perhaps in a resource- or URI-specific manner??) but does not necessarily describe the resource. The REST resource would be the same as subject 2 if subject 2 is defined and respects an HTTP spec, and if the content speaks for the "authority", but that is not a foregone conclusion. As the method of "representing" the REST "state" is in general private to the "authority", i.e. not knowable by anyone else, we won't consider this possibility further, as it's not useful for communication with anyone other than those privy to secret details.)

We are evaluating possible prior agreements between the sender and the receiver that would prescribe how these various meanings can be expressed.

The best any prior agreement can do is to say that <U> refers to one of the subjects (perhaps a different one depending on circumstances); the other two subjects must then be referred to without using <U>. According to some of the proposals, a second URI can sometimes be discovered that refers to one of the other subjects.

Let's assume that regardless of the proposal, there is some agreed reliable but inconvenient (non-URI or URI plus extra information) way to refer to any of the three subjects, e.g.:

  1. [w:contentUri "U"]
  2. [w:descriptionUri "U"]
  3. [foaf:isPrimaryTopicOf [w:contentUri "U"]]

where

@prefix w: <https://www.w3.org/2001/tag/2012/04/issue57#>.

There are many variations on these, including use of a sender-defined URI instead of blank node notation and various ways one might use <U> instead of "U". These are distracting design details that we needn't get into until an overall approach is agreed.
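To make the three inconvenient forms concrete, here is a minimal Python sketch that renders them as Turtle for a given U. The function names are hypothetical; the w: namespace is the one given above and foaf: is the standard FOAF namespace. For subject 3 it uses foaf:isPrimaryTopicOf, the FOAF inverse of foaf:primaryTopic, because in Turtle [foaf:primaryTopic X] denotes something whose primary topic is X rather than X's primary topic.

```python
W = "https://www.w3.org/2001/tag/2012/04/issue57#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Prefix declarations to place at the top of a Turtle document.
PREFIXES = f"@prefix w: <{W}> .\n@prefix foaf: <{FOAF}> .\n"


def turtle_string(u: str) -> str:
    """Quote U as a Turtle string literal, escaping backslashes and double quotes."""
    return '"' + u.replace("\\", "\\\\").replace('"', '\\"') + '"'


def inconvenient_ref(subject: int, u: str) -> str:
    """Blank-node form for subject 1 (content), 2 (what the content calls <U>)
    or 3 (the content's primary topic). Hypothetical helper, for illustration."""
    lit = turtle_string(u)
    if subject == 1:
        return f"[ w:contentUri {lit} ]"
    if subject == 2:
        return f"[ w:descriptionUri {lit} ]"
    if subject == 3:
        return f"[ foaf:isPrimaryTopicOf [ w:contentUri {lit} ] ]"
    raise ValueError(f"unknown subject {subject}")
```

Any of these strings can be used wherever a subject or object term is allowed, provided PREFIXES is declared first.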

For each of subjects 1, 2 and 3, then, the prescribed method of expression falls into one of three categories:

  • use <U>
  • use <V> where V is discovered somehow
  • inconvenient: use blank node notation, sender-defined URI, etc.

Characterizations of proposals

Here are some proposal characterizations based on this framework. The proposals are listed at TagIssue57Responses.

For each proposal we say what the sender has to write for each of the three subjects, followed by what the receiver has to do to understand it. Strictly speaking, what the receiver does determines what the sender has to write, but presenting the same information in both ways helps to tease out the nature of the proposal.

"No conflict with subject 2" is shorthand for: Either the content does not say anything about what U refers to (roughly speaking, the URI U does not occur in the content), or it uses U to refer to the primary topic of the content.

Retract

No bare use of U is reliable at all, so to be clear the sender must always use a form of reference that is inconvenient:

  1. Inconvenient.
  2. Inconvenient.
  3. Inconvenient.

On receiving a hashless http: URI the receiver has no idea how to interpret it.

No change and its enhancements

Status quo / a representation is always content:

  1. Write <U>.
  2. Inconvenient. (This suggests that the linkee does not buy into the agreement that the sender and receiver have.)
  3. Inconvenient ([foaf:isPrimaryTopicOf <U>]).

If there is a representation (2xx), a hashless http: URI refers generically to the content; otherwise, a 303 or one of its proposed optimizations (new status code, publish rewrite rules) may help.
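A minimal Python sketch of this agreement, with hypothetical function names: the receiver rule returns the subject number from the Setup list (1, 2 or 3), or None where the status quo leaves U uninterpreted, and the sender rule is the list above. Handling of non-2xx responses (303 and its proposed optimizations) is deliberately left out.

```python
from typing import Optional


def status_quo_referent(status: int) -> Optional[int]:
    """Receiver rule: the subject a hashless http: URI U refers to,
    given the status code of GET U, or None if the rule says nothing."""
    if 200 <= status < 300:
        # A representation exists: U refers generically to the content.
        return 1
    # No representation: the content cannot settle the question; a 303 or
    # one of its proposed optimizations would have to be consulted instead.
    return None


def status_quo_sender(subject: int) -> str:
    """Sender rule: only subject 1 may be written as <U>."""
    return "<U>" if subject == 1 else "inconvenient"
```

Note how the sender rule follows directly from the receiver rule: since <U> can only ever mean subject 1, the other two subjects have nowhere else to go.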

Proposal 25

(content opt-out):

  1. If there is a Document: V header, write <V>. Otherwise, write <U>.
  2. If there is a Document: V header, write <U>. Otherwise, inconvenient (the content apparently doesn't respect the agreement).
  3. If there is a Document: V header and no conflict with subject 2, write <U>. Otherwise inconvenient.

If in response to GET U there is Document: V, then U refers to what the content says it refers to; if the content doesn't say, to the content's primary topic; and if there is no primary topic, to the content (???). Without a Document: header, U refers to the content.
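The receiver rule for proposal 25 can be sketched as follows. The Content class is a toy assumption: it records only what the content says <U> refers to and the content's primary topic, which in practice would have to be extracted from the content itself. The function returns a subject number from the Setup list; when the Document: V header is present, V itself refers to the content (subject 1).

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Content:
    """Toy model of the content retrieved using U (an illustrative assumption)."""
    says_what_u_is: Optional[str] = None  # what the content says <U> refers to
    primary_topic: Optional[str] = None   # the content's primary topic, if any


def proposal25_referent(headers: Mapping[str, str], content: Content) -> int:
    """Return the subject number (1, 2 or 3) that U refers to under proposal 25."""
    # HTTP header field names are case-insensitive.
    names = {name.lower() for name in headers}
    if "document" not in names:
        return 1  # no opt-out: U refers to the content
    if content.says_what_u_is is not None:
        return 2  # what the content says U refers to
    if content.primary_topic is not None:
        return 3  # otherwise, the content's primary topic
    return 1      # the fallback marked "???" in the text
```

With the Loma Prieta example from the Setup, Content(says_what_u_is="Loma Prieta (the location)", primary_topic="Loma Prieta earthquake") yields subject 2 when the header is present and subject 1 when it is absent.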

Look for contradiction

  1. If content is consistent with <U> = content, write <U>. Otherwise, inconvenient.
  2. If content is consistent with <U> != content, write <U>. Otherwise, inconvenient.
  3. Inconvenient.

Receiver: If content is consistent with <U> = content, then content, otherwise what the content says.

(What if consistency is undecidable or very difficult to determine?)
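A sketch of this rule in Python, assuming the content is available as a set of (subject, predicate, object) string triples. The consistency test is passed in as a function precisely because a real one may be undecidable or very expensive. The default toy_consistent_with_u_is_content and its NON_CONTENT_CLASSES set are hypothetical stand-ins that only detect <U> being typed with a class assumed to be disjoint from content.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical classes assumed disjoint from "content"; a real check would
# need a reasoner and the vocabularies' actual axioms.
NON_CONTENT_CLASSES = {"ex:Person", "ex:Place", "ex:Earthquake"}


def toy_consistent_with_u_is_content(triples: Set[Triple], u: str) -> bool:
    """Toy check: inconsistent only if the content types <U> with a non-content class."""
    return not any(
        s == u and p == "rdf:type" and o in NON_CONTENT_CLASSES
        for s, p, o in triples
    )


def look_for_contradiction_referent(
    triples: Set[Triple],
    u: str,
    consistent: Callable[[Set[Triple], str], bool] = toy_consistent_with_u_is_content,
) -> int:
    """Return 1 (the content) if <U> = content is consistent, else 2 (what the content says)."""
    return 1 if consistent(triples, u) else 2
```

Because the check is pluggable, a receiver could substitute a real reasoner, but the open question above remains: the answer is only as reliable, and as cheap, as that function.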

No Longer Implies

Here is what the proposal as written implies:

  1. If the content contains <U> describedby <V>, write <V>. Otherwise inconvenient.
  2. If the content contains <U> describedby <V>, write <U>. Otherwise inconvenient.
  3. Inconvenient.

The receiver looks for the description link. If it's found with U as the target (<X> describedby <U>), then U refers to the content. If it's found with U as the source (<U> describedby <V>), then U refers to what the content says it refers to. Otherwise no interpretation of U is specified.

That is, there is no default rule.

But I'm not sure this is what is really intended.
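The rule as written can be sketched like this, again modeling the content as (subject, predicate, object) string triples and using the plain string "describedby" as shorthand for the full property IRI (both assumptions). The proposal text does not say what happens if both link directions occur; the order checked below is an arbitrary choice.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]
# Shorthand for the describedby property; a real check would match its full IRI.
DESCRIBEDBY = "describedby"


def no_longer_implies_referent(triples: Set[Triple], u: str) -> Optional[int]:
    """Return 1 if the content contains <X> describedby <U>, 2 if it contains
    <U> describedby <V>, and None (no interpretation specified) otherwise."""
    links = [(s, o) for s, p, o in triples if p == DESCRIBEDBY]
    # Arbitrary order: the "U is the target" case is checked first.
    if any(target == u for _, target in links):
        return 1
    if any(source == u for source, _ in links):
        return 2
    return None  # no default rule
```

The final None is the point made above: without a description link, this reading of the proposal gives the receiver nothing to go on.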

Always description

  1. Shouldn't happen ("the content" is considered incoherent).
  2. Write <U>? (Not clear.)
  3. If no conflict with subject 2, write <U>. Otherwise inconvenient. (This is not clear from the proposal.)

If the content describes X and only X, then the URI refers to X. Otherwise, what U refers to is unclear.
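A sketch of this rule, assuming (as a toy simplification) that the content "describes" exactly the things that appear as subjects of its (subject, predicate, object) triples. Deciding what a piece of content describes is the hard part and is not addressed here; the helper described_things is hypothetical.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def described_things(triples: Set[Triple]) -> Set[str]:
    """Toy assumption: the content describes whatever appears as a subject."""
    return {s for s, _, _ in triples}


def always_description_referent(triples: Set[Triple]) -> Optional[str]:
    """Return X if the content describes X and only X; otherwise None (unclear)."""
    described = described_things(triples)
    if len(described) == 1:
        return next(iter(described))
    return None
```

Content that mentions two things, or none, falls straight into the "unclear" case.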

Punning

That is, the choice of referent depends on the context of use. There are many variants of this proposal; the one below is just one of them. For RDF, "context" could mean the entire graph in which the URI is embedded, or the referent could vary from statement to statement.

  1. If only subject 1 makes sense in context, then write <U>. Otherwise inconvenient.
  2. If only subject 2 makes sense in context, then write <U>. Otherwise inconvenient.
  3. If only subject 3 makes sense in context, then write <U>. Otherwise inconvenient.

If subject N makes sense in context and every other subject is either the same as subject N or makes no sense, then U refers to subject N. Otherwise, what U refers to is unclear.
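Once the hard judgments are made, the variant above reduces to a small check. In this sketch, which subjects "make sense in context" and which of them coincide are supplied as inputs (hypothetical parameters), not worked out by the code; the function returns the subject number, or None when the referent is unclear.

```python
from typing import FrozenSet, Optional, Set


def punning_referent(makes_sense: Set[int], same: Set[FrozenSet[int]]) -> Optional[int]:
    """Return subject N (1, 2 or 3) if N makes sense in context and every other
    subject either is the same as N or makes no sense; otherwise None.

    `same` holds the pairs of subjects that coincide, e.g. frozenset({2, 3}).
    """
    for n in sorted(makes_sense):
        if all(
            m == n or frozenset({n, m}) in same or m not in makes_sense
            for m in (1, 2, 3)
        ):
            return n
    return None
```

When two subjects that make sense coincide (say 2 and 3), either number names the same referent; the sketch simply reports the lower one.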

Don't use http

This is very similar to "retract".

  1. Write <newscheme1:U> (similar to duri:)
  2. Write <newscheme2:U>
  3. Write <newscheme3:U> (similar to tdb:)

On receiving a hashless http: URI: The meaning would be unclear, since under this proposal hashless http: URIs should not be used in RDF.