Re: Link subjects and targets

To: gramlich@riesling.eng.sun.com (Wayne C. Gramlich), w3c-collab-annotation@w3.org
Subject: Re: Link subjects and targets
From: JCMA@ai.mit.edu (John C. Mallery)
Date: Sun, 29 Oct 1995 02:02:49 -0500
From JCMA@ai.mit.edu Sun Oct 29 02:02:58 1995
At 7:32 PM 10/28/95, Wayne C. Gramlich wrote:

>
>    Spanning embedded annotation:
>	An annotation that is attached to a contiguous span
>	within a target object/document (i.e. between two points.)

Would "region" be better understood than "spanning"? Are we considering
the case of 3D, or N-D volumes (i.e., N points)?

If there are no objections, these definitions can be stuffed into
our glossary:

    http://www.sunlabs.sun/people/wayne.gramlich/work/w3c/annote/glossary.html

Even when I add the .com to the host name, my browser cannot get a DNS resolution
for this URL.

>
>Specifying locations in immutable documents is quite easy, a simple
>byte offset will do.  Specifying locations in mutable documents
>is more tricky, since the document contents can change and thereby
>change the absolute byte offset of the location.  For both text
>and HTML documents, some sort of pattern based (i.e. reg. exp.)
>will work.  When HTML named anchors (i.e. <A NAME="...">) are
>available in mutable documents, they are good candidates for location
>specification as well.  No matter what, annotations attached to
>mutable documents may become broken if the document is changed
>sufficiently.

Suppose we have the same immutable document in several formats (e.g., text, HTML, PDF)
and we wish to attach a link to a point or 2-D region. How do we align the byte-offset
anchoring approach across formats? How do we know we're referring to the same logical region?
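
To make the alignment problem concrete, here is a minimal sketch in Python. The
two sample strings and the helper name are hypothetical, made up only for
illustration. The same logical word sits at a different byte offset in each
format, so an offset recorded against one rendition says nothing reliable about
the other.

```python
# Hypothetical sample: one sentence rendered as plain text and as HTML.
text_version = b"Annotations attach to a span of the document."
html_version = (
    b"<html><body><p>Annotations attach to a <b>span</b> "
    b"of the document.</p></body></html>"
)


def offset_of(document: bytes, phrase: bytes) -> int:
    """Return the byte offset of the first occurrence of phrase, or -1."""
    return document.find(phrase)


phrase = b"span"
text_offset = offset_of(text_version, phrase)
html_offset = offset_of(html_version, phrase)

# Reusing the plain-text offset against the HTML bytes lands on other content.
misaligned = html_version[text_offset:text_offset + len(phrase)]

print(text_offset, html_offset, misaligned)
```

A pattern-style location (here, a literal search for the phrase) survives the
format change in this toy case, but only because the phrase happens to appear
unbroken in both renditions.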

What happens when someone generates a version of the same thing in another format?
How do we extrapolate from what we know about existing attachments to the new document
format? Can we?

I don't really like the variable width syntax.  I would rather keep it consistent by always using
a triple (target relation annotation) and then providing the attachment as a subsyntax to
either target or annotation. (This will be consistent with what the link group has in mind.)

This triple approach keeps the semantically important information on the same level.

It allows the location substructure to be hacked independently of the top-level syntax,
and hence without breaking parsers of that top-level syntax.

It allows additional top-level arguments to be added in the future, e.g. constraints on
the attachment relation.  Constraints might allow a browser to suppress any attachments
the user didn't want to see (e.g. X-rated comments).  Constraints might also provide a way for
the issuer of the annotation to make statements about the relation instance that
constitutes the annotation.

Thus, each specification of an attachment location is:

Attachment :: (url | urn  attachment-location)

attachment-location::  (start | end | point-location | region-location)
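
As a rough illustration of the triple with the location kept as a subsyntax,
here is a sketch in Python. All class and field names are hypothetical, the use
of integer offsets for points and regions is an assumption, and the optional
constraints argument simply stands in for the future top-level additions
mentioned above.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class PointLocation:
    offset: int  # assumption: a point is a single byte offset


@dataclass(frozen=True)
class RegionLocation:
    start: int  # assumption: a region is a pair of byte offsets
    end: int


# "start" and "end" are kept as plain keywords; the other two are structured.
Location = Union[str, PointLocation, RegionLocation]


@dataclass(frozen=True)
class Attachment:
    ref: str                     # a URL or URN
    location: Location = "start"


@dataclass(frozen=True)
class Link:
    # The triple always has the same three top-level slots.
    target: Attachment
    relation: str
    annotation: Attachment
    # Hypothetical later addition; parsers of the triple are unaffected.
    constraints: Tuple[str, ...] = ()


link = Link(
    target=Attachment("http://example.org/paper.html", RegionLocation(120, 180)),
    relation="comments-on",
    annotation=Attachment("http://example.org/notes/1.html"),
)
print(link.relation, link.target.location)
```

The point of the sketch is only that changing PointLocation or RegionLocation
leaves the shape of Link untouched.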

>
>    Locations:
>	Immutable Text Document
>		Byte-offset | Pattern
>	Mutable Text Document
>		Pattern
>	Immutable HTML Document
>		Byte-offset | Pattern | #Name
>	Mutable HTML Document
>		Pattern | #Name

How far do you think you can get with byte offset and pattern?  It would be great if
it goes a very long way, but that could be optimistic.

Certainly, immutable documents are a critical simplification.  We will need a way
for people to assert that their document will never change, no matter what insults
someone attaches to it.

The point of the PDI fragment syntax was to allow extensibility rather than assuming
we can know all the answers in advance (ISO style).

What happens with images, sound, or video with the location syntax proposed above?

How do I draw a circle around a part of an image and attach my annotation to it?

How many formats are there?  

What do we do about each new format as it comes along?

How do we embed someone else's existing location syntax?

The idea in the PDI fragment syntax was to use a default for each mime type that most
people would use but to allow a named location syntax to be provided for other schemes, including
possibly non-standard or proprietary schemes.
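
One way to picture the default-per-MIME-type idea with named overrides is a
small registry, sketched here in Python. This is not the PDI syntax itself. The
syntax names, the default table, and the resolve function are all hypothetical,
and the choice of defaults is arbitrary.

```python
import re
from typing import Callable, Dict, Optional

# A location syntax maps (document bytes, location spec) to a byte offset.
LocationSyntax = Callable[[bytes, str], Optional[int]]


def byte_offset(document: bytes, spec: str) -> Optional[int]:
    offset = int(spec)
    return offset if 0 <= offset <= len(document) else None


def regex_pattern(document: bytes, spec: str) -> Optional[int]:
    match = re.search(spec.encode("ascii"), document)
    return match.start() if match else None


# Named syntaxes; a proprietary scheme would register itself here.
SYNTAXES: Dict[str, LocationSyntax] = {
    "byte-offset": byte_offset,
    "pattern": regex_pattern,
}

# Assumed defaults, one per MIME type.
DEFAULT_SYNTAX: Dict[str, str] = {
    "text/plain": "byte-offset",
    "text/html": "pattern",
}


def resolve(document: bytes, mime_type: str, spec: str,
            syntax: Optional[str] = None) -> Optional[int]:
    """Resolve spec with the named syntax, or the MIME type's default."""
    name = syntax if syntax is not None else DEFAULT_SYNTAX[mime_type]
    if name not in SYNTAXES:
        raise ValueError(f"unknown location syntax: {name}")
    return SYNTAXES[name](document, spec)


print(resolve(b"hello world", "text/plain", "6"))
print(resolve(b"hello world", "text/plain", "world", syntax="pattern"))
```

Naming the syntax explicitly also pins down which engine interprets a pattern,
and new formats or legacy schemes become additional registry entries rather
than changes to the top-level syntax.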

Regardless of parsing consistency, any specific syntax needs to address the extensibility and
legacy/proprietary issues.

On the pattern question, we had better specify the language/engine for the search specification.
Suppose I want to say: use my Java-capable j-random parser and run it to some particular point,
with my j-random specification driving the parser.  What happens if pattern=regx?

BTW, I am not wedded to the PDI URN concept.  It is two years old and I have better ideas
now, just as it was better than my semantically overloaded earlier idea.