Persistent Reference Interventions

JAR's notes for TAG F2F, 4 February 2011.

In 2011, http://w3.org/2001/tag/2011/02/intervention.html referred to this document. What does it refer to now?

Problem statement

Alice wants to write a reference to a document, perhaps because what the document says bears on a scholarly argument she is making. She wants the reference to carry enough information that Bob, who will check her argument 100 years from now (long after Alice has retired), will be able to use his automated assistant (the equivalent of a web browser) to find the referenced document.

She also cares about today's readers, and the same reference needs to work now, or soon. (She is willing to wait until we have figured out how to help her.)

Because success is not under her control, Alice will have to take a leap of faith and trust, to varying extents, all the organizations and/or social institutions involved in bringing it about.

What string does Alice use for the reference, and what would other agents working on her behalf (perhaps the author or publisher of the document, or a library) have to do to assure her that her reference is likely to work in 100 years?

(Obviously, the document itself must persist over time for any of this to work. As long as a few libraries or archives (or their equivalents) think it might be important, it will be kept, and there is a good chance that scholars will be able to find it with a bit of digging. But the existence of copies does not by itself create a reliable system of actionable references.)

Analysis

I've talked about the nature of the problem and the solution space elsewhere.

It probably doesn't pay to do much technical design until the appropriate social and organizational intervention points have been identified. But please do think about peer-to-peer HTTP, failover, verifiability, the role of DNSSEC, Zooko's triangle, and Bitcoin. Settling on requirements, and then on an implementation strategy, will require substantial research.
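Of the topics just listed, verifiability is the easiest to make concrete. One possible approach, assumed here rather than proposed by these notes, is to carry a cryptographic digest of the cited document in the reference itself, so that Bob can check whatever copy he eventually obtains against what Alice saw. A minimal Python sketch; `VerifiableReference`, `verify` and `first_verified` are hypothetical names, and SHA-256 is just one possible hash:

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class VerifiableReference:
    """Hypothetical reference format: a locator plus a digest of the cited bytes."""
    uri: str
    sha256_hex: str


def digest(content: bytes) -> str:
    """Hex-encoded SHA-256 digest of a document's bytes."""
    return hashlib.sha256(content).hexdigest()


def verify(ref: VerifiableReference, content: bytes) -> bool:
    """True if these bytes, from any source, are the document the reference cites."""
    return digest(content) == ref.sha256_hex


def first_verified(ref: VerifiableReference,
                   candidates: Iterable[bytes]) -> Optional[bytes]:
    """Return the first candidate copy that matches the digest, or None.

    Candidates might be copies fetched from the origin server, a mirror, or
    an archive; fetching them is outside the scope of this sketch.
    """
    for content in candidates:
        if verify(ref, content):
            return content
    return None


# Usage: Alice records the digest when she writes the reference.
original = b"The text Alice cited."
ref = VerifiableReference("http://example.org/paper", digest(original))
print(verify(ref, original))  # True
```

A digest lets Bob check a copy he has already found, from whatever archive, but it does not help him find one; it complements an actionable locator rather than replacing it.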

This week I've been thinking about everything in terms of accountability - who has promised to do what - and obviously incentives and economics are important here too.

Intervention point checklist

In order to get actionable persistence, we will need a miracle. We may be able to summon one miracle, but summoning two is probably beyond our reach. So the question is, what miracle do we ask for?

Below, "actionable" means "actionable (or dereferenceable) in 2011". Some strings that are not actionable now may become actionable in the future.

The following steps are written as progressive refinements of an initially unspecified reference system design. Each step removes some layer of risk.

*** = research area

Non-machine-friendly reference (e.g. author/title/publisher/date)?

Reliable dereference of this kind of reference is beyond the current state of the art and unlikely to happen.

Hybrid approach?

That is, use a non-machine-friendly reference as "what's really meant," augmented with a machine-friendly one that says "this will probably work, but I won't be held accountable for it".
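As a data structure, a hybrid reference might look like the following minimal Python sketch. The class and field names are hypothetical; the point is only that the two parts carry different levels of commitment:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HybridReference:
    """Hypothetical hybrid reference with two levels of commitment."""
    citation: str                  # author/title/publisher/date: what is really meant
    locator: Optional[str] = None  # machine-friendly URI, offered on a best-effort basis

    def render(self) -> str:
        """Format for a human reader, with the locator (if any) as an aside."""
        if self.locator is None:
            return self.citation
        return f"{self.citation} <{self.locator}>"


ref = HybridReference("A. Author, A Title, Some Press, 2011",
                      "http://example.org/a-title")
print(ref.render())  # A. Author, A Title, Some Press, 2011 <http://example.org/a-title>
```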

This is probably the status quo "best practice" if we do nothing.

This would have a high cost in certain applications, such as XML and RDF.

This reduces the user's accountability for the automated form, and therefore does not really solve the problem.

Not a URI (e.g. handle)?

Inject the space of non-URI references into URI space, e.g. by prepending a fixed prefix. The problem then reduces to the case of persistent actionable URIs.
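A minimal Python sketch of such an injection, assuming one fixed prefix per identifier system. The DOI proxy (`https://doi.org/`) and the Handle proxy (`https://hdl.handle.net/`) are real services used here only as examples; the `PREFIXES` table and `inject` function are hypothetical:

```python
from urllib.parse import quote

# Hypothetical table: one fixed, well-publicized prefix per identifier system.
# The proxies are real services, used here only as examples.
PREFIXES = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
}


def inject(system: str, identifier: str) -> str:
    """Turn a non-URI identifier into an http(s) URI by prepending a fixed prefix.

    Characters not allowed in a URI path are percent-encoded; '/' is kept
    because DOI and Handle syntax uses it to separate prefix and suffix.
    """
    if system not in PREFIXES:
        raise ValueError(f"no injection prefix is registered for {system!r}")
    return PREFIXES[system] + quote(identifier, safe="/")


print(inject("doi", "10.1000/182"))  # https://doi.org/10.1000/182
```

Because the mapping is a fixed prefix, it is trivially reversible, which is what lets the non-URI form be shown to people while the URI form is used wherever software needs to act.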

The injection method should be well publicized among users of the reference system.

You may want to present the reference in non-URI form. That's OK, but please "use actionable http: manifestations of any non-natively actionable URIs in actionable contexts" (DOI best practice).

[*** Henry Thompson identified this as a research area at the Oct F2F]

Don't trust the IETF in its role as "owner" of the URI scheme namespace?

Why not?

URI scheme, or URN namespace, not registered?

Document and register it.

Not actionable (urn:, duri:, info:, etc.)?

Update all Web clients so that they can dereference these URIs somehow. (miracle)
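What "somehow" might look like inside a client: a minimal Python sketch, assuming each client ships a table that rewrites non-actionable URIs into http(s) URIs it can fetch. The table, the function names, and the placeholder host `resolver.example` are hypothetical; `info:doi/` is used only as an example namespace, mapped onto the DOI proxy:

```python
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import quote

# Hypothetical client-side table: one rewrite rule per non-actionable scheme
# or, for urn: and info:, per namespace. "resolver.example" is a placeholder
# host; the DOI proxy is real but is used here only as an example target.
Rewriter = Callable[[str], str]

RESOLVERS: Dict[str, Rewriter] = {
    "info:doi": lambda rest: "https://doi.org/" + quote(rest, safe="/"),
    "urn:isbn": lambda rest: "https://resolver.example/isbn/" + quote(rest),
}


def namespace_key(uri: str) -> Tuple[str, str]:
    """Split a URI into (lookup key, remainder).

    urn:NID:rest -> ("urn:nid", rest); info:ns/rest -> ("info:ns", rest);
    anything else -> (scheme, everything after the first colon).
    """
    scheme, _, tail = uri.partition(":")
    scheme = scheme.lower()
    if scheme == "urn":
        nid, _, rest = tail.partition(":")
        return f"urn:{nid.lower()}", rest
    if scheme == "info":
        ns, _, rest = tail.partition("/")
        return f"info:{ns.lower()}", rest
    return scheme, tail


def to_actionable(uri: str) -> Optional[str]:
    """Return an http(s) URI the client can fetch, or None if no rule applies."""
    if uri.lower().startswith(("http://", "https://")):
        return uri  # already actionable
    key, rest = namespace_key(uri)
    rewrite = RESOLVERS.get(key)
    return rewrite(rest) if rewrite is not None else None


print(to_actionable("info:doi/10.1000/182"))  # https://doi.org/10.1000/182
```

The code is trivial; the miracle is getting every deployed client to ship such a table and keep it current for a century.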

Also see above under "not a URI".

Problem with http: scheme as specified?

Maybe linking http: to DNS ("governed by a potential HTTP origin server") exposes the URI user to unacceptable risk. We could revise the http: registration to allow an exception for whatever the persistent subspace is, with agreement from the main players (ICANN, registrars, client vendors, etc.).

Listen carefully to Dan Connolly's lecture on the inevitability of "ownership," and Roy Fielding's on "authority" (http: is not HTTP).

Or, get general agreement on an approach without revising the registration, and plan to revise it when it becomes necessary.

Don't trust DNS root server managers?

Need to get them on board. Not sure who they listen to. To bypass them, one would have to talk to the hardware vendors. (huge miracle, bad idea)

IANA/ICANN (root management) not on board?

That is, are they unwilling to help guarantee that domain reassignment never happens in persistence space (except in cases such as administrator corruption; there are many details to work out)?

Persuade them. Need to work out exactly what they're being asked to do. Need to involve others. (miracle) *** (research this)

Maybe persuade IETF and then use their influence over IANA. (Cf. IETF imposition of the '.invalid' rule.)
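One reading of "domain reassignment never happens in persistence space" is a state rule that a registry would enforce. The toy Python model below illustrates that reading only and is not a worked-out proposal; the suffix `persist.example`, the `Registry` class, and the successor mechanism are all hypothetical. Ordinary names lapse and return to the pool; persistent names can only pass to a designated successor:

```python
from typing import Dict, Iterable


class Registry:
    """Toy domain registry (hypothetical, for illustration only)."""

    def __init__(self, persistent_suffixes: Iterable[str]) -> None:
        self.persistent_suffixes = tuple(persistent_suffixes)
        self.holders: Dict[str, str] = {}     # name -> current holder
        self.successors: Dict[str, str] = {}  # name -> designated successor

    def is_persistent(self, name: str) -> bool:
        return any(name == s or name.endswith("." + s)
                   for s in self.persistent_suffixes)

    def register(self, name: str, holder: str) -> None:
        if name in self.holders:
            raise ValueError(f"{name} is already held by {self.holders[name]}")
        self.holders[name] = holder

    def designate_successor(self, name: str, successor: str) -> None:
        self.successors[name] = successor

    def lapse(self, name: str) -> None:
        """The holder stops paying or disappears."""
        if not self.is_persistent(name):
            del self.holders[name]  # ordinary name: back in the pool for anyone
            return
        if name not in self.successors:
            # Never release a persistent name; escalate to people instead.
            raise RuntimeError(f"{name} lapsed with no designated successor")
        self.holders[name] = self.successors.pop(name)


r = Registry(["persist.example"])
r.register("doc.persist.example", "Library")
r.designate_successor("doc.persist.example", "Archive")
r.lapse("doc.persist.example")
print(r.holders["doc.persist.example"])  # Archive
```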

PIR (the .org administrator) or other existing TLD administrator not on board?

New TLD(s) by arrangement with ICANN/IANA. Set up or commandeer an organization to administer it.

Suitable second-level domain or domains on board?

Yes, if we get this far, e.g. w3.org, doi.org.

Maintenance and repair

Over time the best-laid plans will go wrong. For each 'resolution trustee' (organization trusted to do their part in resolution):

Get the institution to make a credible commitment, ideally a contractual one, to the persistent URIs, or at least a commitment not to get in the way of them
Make sure the trustee has its own succession plan (what happens if the trustee loses interest, goes away, etc.)
Make sure that other trustees up- and downstream have some way to replace a delinquent trustee
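The last two points above amount to a repair rule for a delegation chain: if a trustee defaults, the trustee upstream of it re-points the delegation to a successor that has inherited its records. A toy Python model of that rule, with all names hypothetical; it assumes the delinquent trustee's records are still obtainable (e.g. escrowed), which in practice may be the hard part:

```python
from typing import Dict, List


class Trustee:
    """Hypothetical resolution trustee: one level of a delegation tree, as in DNS."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.active = True                        # False: lost interest, went away, ...
        self.children: Dict[str, "Trustee"] = {}  # label -> downstream trustee
        self.records: Dict[str, str] = {}         # label -> final location

    def replace_child(self, label: str, successor: "Trustee") -> None:
        """Upstream repair: give a delinquent child's data to a successor and re-point.

        Assumes the old trustee's records are still obtainable (e.g. escrowed).
        """
        old = self.children[label]
        successor.records.update(old.records)
        successor.children.update(old.children)
        self.children[label] = successor


def _active(node: Trustee) -> Trustee:
    """Fail loudly if resolution reaches a trustee that has stopped serving."""
    if not node.active:
        raise LookupError(f"resolution trustee {node.name!r} is delinquent")
    return node


def resolve(root: Trustee, labels: List[str]) -> str:
    """Follow labels (most significant first); the last label names a record."""
    node = root
    for label in labels[:-1]:
        node = _active(node).children[label]
    return _active(node).records[labels[-1]]


# Usage: root -> TLD -> site, then the site trustee goes away and is replaced.
root, tld, site = Trustee("root"), Trustee("tld"), Trustee("site")
root.children["example"] = tld
tld.children["persist"] = site
site.records["doc"] = "http://archive.example/doc.pdf"
site.active = False
tld.replace_child("persist", Trustee("successor"))
print(resolve(root, ["example", "persist", "doc"]))  # http://archive.example/doc.pdf
```

Before the repair, `resolve` raises `LookupError` at the inactive trustee; after it, the same reference resolves again without Alice's string changing.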

Potential failures include natural disaster, demise or replacement of technical infrastructure, contract dispute, legal action, commercial capture.

A community may some day have to set up its own alternative resolution service if the trustees fail to serve it. Let's hope it doesn't come to that.

Change log:

2011-02-04 Added reference to HTTPbis, and allude to subtlety of "ownership" and "authority" ideas.
2011-02-04 Added IETF to web of trust
2011-02-04 Added 'hybrid approach'