Copyright © 2019-2020 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.
Selecting part of a resource on the Web is a ubiquitous action. Over the years several selection techniques have been developed, usually in conjunction with the media type of the resource. Often these selections are expressed as fragment identifiers [url], but that is not always the case.
This document relies on existing selection techniques, providing a common model and syntax defined by the Web Annotation Data Model [annotation-model]. That specification developed a JSON-based approach to select targets or bodies of various types of Web Resources. This foundational model has been extended by adding selector types applicable to collective resources and a new model component for describing positions in text and byte streams.
What the Locator draft defines is a way to express various types of locations. However, do we need to formally define a data structure (using, e.g., IDL) that should be returned when acting on those locations?
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
Due to the lack of practical business cases for Web Publications, and the consequent lack of commitment to implement the technology, the Publishing Working Group has chosen to discontinue the work on Web Publications, archive the work in the form of a Working Group Note, and focus on other areas of interest. As a consequence, the present document has also been discontinued and is being published as a Working Group Note. The public record of the group's discussions is available in the group's archive of meeting minutes.
This document was still a work in progress at the time of its publication. As a result, anyone seeking to extend the Web Annotation Data Model [annotation-model] to select targets and bodies of various types of Web resources should read the approach and proposals outlined in this document with an abundance of caution. It is being published to archive the work and allow incubation, should interest emerge in the future to resume its development.
This document was published by the Publishing Working Group as a Working Group Note.
GitHub Issues are preferred for discussion of this specification. Alternatively, you can send comments to our mailing list. Please send them to public-publ-wg@w3.org (archives).
Publication as a Working Group Note does not imply endorsement by the W3C Membership.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 1 August 2017 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation.
This document is governed by the 15 September 2020 W3C Process Document.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST, MUST NOT, RECOMMENDED, and SHOULD in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This specification relies on a subset of JSON terms originally defined as part of the Web Annotation Data Model [annotation-model] and Vocabulary [annotation-vocab]. This specification extends the definitions of some of these terms and defines additional terms in order to satisfy additional use cases, but all uses conforming to original definitions remain valid. In order to ensure backward compatibility, implementations of this specification MAY ignore any JSON terms not defined in this specification (directly or by reference to the Web Annotation Data Model and Vocabulary) and MUST NOT treat as invalid any JSON term encountered that is not defined in this specification.
This section is non-normative.
Selecting a part of a resource on the Web is a ubiquitous action. Similarly, referencing a position in a resource representation is often necessary to support curation of and discourse about Web resources. Interactive editing of a resource, highlighting an area on the screen, adding an annotation to a specific point in a resource, or defining a bookmark to a location or a section of a long document are all examples that involve selection or positioning within a resource.
Over the years several techniques for selection have been developed, usually in conjunction with the media type of the resource. These include referring to a unique identifier within a resource, defining a time interval for an audio or video track, identifying an element within the DOM tree for an XML source, or using CSS style elements to locate and select content. Often these selections are expressed as fragment identifiers [url], but that is not always the case.
This document relies on a selection technique defined by the Web Annotation Data Model [annotation-model] providing a common model and syntax for such selections. That specification developed a formalism based on JSON [json] to select targets or bodies of various types of Web Resources. The model relies on the concepts of Selectors, encapsulating in a JSON object the various ways selections have been defined for different media types, and States, encapsulating selections based on HTTP requests and responses. The model also includes a way to combine and/or refine selections, a feature that may greatly improve the efficiency of applications relying on complex selections. A selection or a state specifier, as described in the original model and used in this document, may also have its own unique identity in the form of a URL. This URL SHOULD be dereferenceable and return the selection/state specifier definition itself.
Using the URL of the selection definition, instead of the reference to the “complete” resource, could be seen as akin to a server-side redirection, returning part of a resource.
The Web Annotation Working Group has also published a separate Working Group Note entitled “Selectors and States” [selectors-states]. That note extracts the selector model from the full Web Annotation Data Model [annotation-model] to make it more palatable for users who are not necessarily interested in other aspects of the Annotation Model. Although normatively this specification refers to the [annotation-model], readers should probably consult the [selectors-states] note for a better understanding of the underlying concepts. Note, however, that the [selectors-states] Working Group Note also includes a proposal for a fragment identifier syntax; that syntax has not been used in the current specification.
The example below shows the usage of a Text Quote Selector referring to a specific portion of text in a resource:
{
"source": "http://example.org/page1",
"selector": {
"type": "TextQuoteSelector",
"exact": "annotation",
"prefix": "this is an ",
"suffix": " that has some"
}
}
The next example shows the usage of refinement: a specific portion of text is selected from a paragraph; the latter is identified via a “traditional” fragment identifier.
{
"source": "http://example.org/page1",
"selector": {
"type": "FragmentSelector",
"value": "para5",
"refinedBy": {
"type": "TextQuoteSelector",
"exact": "Selected Text",
"prefix": "text before the ",
"suffix": " and text after it"
}
}
}
See the [selectors-states] Note for more examples of the usage of the Web Annotation Model for selection.
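As an illustration of how a consumer might resolve the Text Quote Selector shown above, the following is a minimal sketch in Python. It assumes the resource has already been reduced to a plain text string and uses exact substring matching; the function name `find_text_quote` and the sample page text are hypothetical and not part of any specification. The text normalization that the Web Annotation Data Model discusses is not attempted here.

```python
from typing import Optional, Tuple


def find_text_quote(text: str, exact: str, prefix: str = "",
                    suffix: str = "") -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the first occurrence of `exact` that is
    immediately preceded by `prefix` and immediately followed by `suffix`.
    Returns None when no such occurrence exists."""
    needle = prefix + exact + suffix
    hit = text.find(needle)
    if hit == -1:
        return None
    start = hit + len(prefix)
    return start, start + len(exact)


# Hypothetical page content, invented for this illustration only.
page = "As you can see, this is an annotation that has some text in it."
selector = {
    "type": "TextQuoteSelector",
    "exact": "annotation",
    "prefix": "this is an ",
    "suffix": " that has some",
}
span = find_text_quote(page, selector["exact"],
                       selector["prefix"], selector["suffix"])
```

The prefix and suffix serve only to disambiguate the match; the returned offsets cover the `exact` text alone.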
The approach defined by the [annotation-model] has been extended in this specification by adding selector types applicable to collective resources and a new model component for describing positions in text and byte streams. It provides methods for selecting a segment of a collective resource (e.g., a “Web Publication” [wpub]) that itself contains or is composed of other discrete and individually identifiable resources, even when the segment of interest spans parts of more than one included resource. The common model for selection as described in this specification makes it easier to provide generic and interoperable tools and APIs to handle selections in various applications.
More specifically, this document extends the Web Annotation Data Model by adding three new selectors, namely the Embedded Resource Selector, the Span Selector, and the Multi Resource Selector.
These changes aim at addressing the particular requirements of resource collections on the Web, like Web Applications or Web Publications [wpub].
Additionally, the current document augments the Web Annotation Data Model of selectors and states with a new class of specifier, Positions. Two position specifiers are defined:
Although defined in conjunction with Web Publications, the techniques described in this document can be used for any type of Web Resource.
This section is normative
Wherever appropriate, this document relies on terminology defined by the note on “Publishing and Linking on the Web” [publishing-linking], including, in particular, user, user agent, browser, and address. Furthermore, the document also relies on some additional terms defined by the “Web Publication” [wpub], including a URL.
source term, and MAY express a contextual relationship to an additional Web Resource through a scope term. The original [annotation-model] document used the term Specific Resource as a generic term encompassing usages that go beyond selection. This specification uses the term “Locator” as an alias. This term is formally defined in the [annotation-model]; this (somewhat shortened) specification is provided as a convenient reference only. Note, however, that the Position Specifier is not part of the original specification, and has been added by this specification.
A Resource that specifies a location in or a portion of another Web Resource. It does this using Specifiers that can be any of: a Selector, a State, or a Position.
Specifiers MAY be External Web Resources with their own URLs, such as in the example for the Selector construction; however, it is RECOMMENDED that they be included in full within the Locator representation to avoid requiring unnecessary network interactions to retrieve all of the information.
Term | Type | Description |
---|---|---|
id | Property | The identity of the Locator. A Locator SHOULD have exactly 1 URL that identifies it. |
source | Relationship | The relationship between a Locator and the resource that it is a more specific representation of, i.e., the Source. There MUST be exactly 1 source relationship associated with a Locator. The source resource MAY be described in detail as discussed in the Web Annotation Data Model [annotation-model] or it MAY simply be identified by the resource’s URL. |
scope | Relationship | The relationship between a Locator and an additional resource other than the source that provides scope or context for the Locator. There MAY be 0 or more scope relationships for each Locator. When the source is part of a group or collection of resources that has its own URL, scope MAY be used to record this URL. |
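The cardinality constraints in the table above could be checked along the following lines. This is only a sketch under stated assumptions: the Locator is assumed to be available as a parsed JSON object (a Python `dict`), the function name `check_locator` and the wording of its messages are hypothetical, and a JSON array is taken to mean "more than one value". Terms not listed in the table are ignored rather than rejected.

```python
from typing import Any, Dict, List


def check_locator(locator: Dict[str, Any]) -> List[str]:
    """Collect findings for the id and source terms of a Locator.
    An empty list means that neither constraint was violated."""
    findings = []
    source = locator.get("source")
    # source: exactly 1 is required; it may be a URL string or a description.
    if source is None or isinstance(source, list):
        findings.append("MUST: exactly 1 source is required")
    # id: a single URL is expected, but its absence is not an error.
    if not isinstance(locator.get("id"), str):
        findings.append("SHOULD: exactly 1 id URL is expected")
    # scope: 0 or more are allowed, so nothing needs to be checked.
    return findings


complete = check_locator({
    "id": "http://example.org/locator1",  # hypothetical identifier
    "source": "http://example.org/page1",
    "scope": "http://example.org/collection",  # hypothetical collection URL
})
missing_source = check_locator({"id": "http://example.org/locator1"})
```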
This section is non-normative.
This term is formally defined in the [annotation-model]; this (somewhat shortened) specification is provided as a convenient reference only.
The definition of 'Selector' as used in this specification differs from the normative definition of the term in [css3-selectors]. The text / terminology in this specification regarding Selectors may need to be revised.
"Selectors" is normatively defined in W3C space by https://www.w3.org/TR/css3-selectors/ and the Locators spec should probably use another vocabulary. Locations, matchers, Pointer, etc.
Selection of part of a Web Resource requires two distinct entities:
A Selector specifies how to determine the Segment from within the Source resource. The nature of the Selector is dependent on the selection technique chosen (which determines the class of the Selector) and the media-type of the Source, as the methods to describe Segments from various media-types differ. The Source and the Selector(s) are encapsulated in a Locator.
Example Use Case: Qitara wants to associate a selection of text in a web page with a slice of a dataset. She selects both using her client, and creates Locators with Selectors for both entities before associating them with one another.
Term | Type | Description |
---|---|---|
selector | Relationship | The relationship between a Locator and a Selector. There MAY be 0 or more selector relationships associated with a Locator. Multiple Selectors SHOULD select the same content; however, some Selectors will not have the same precision as others. User Agents MUST pick one of the described segments, if they are different. |
{
"source": "http://example.org/page1",
"selector": "http://example.org/paraselector1"
}
This section is normative
For some use cases it is required to identify a resource that is part of a collection or group of resources, where that collection has its own identity on the Web (and can be identified via its own URL). An example is selecting a resource that is a chapter of a Web Publication [wpub] or Packaged Web Publication [pwpub]. Given the URL of such a collective resource as the value of source, an Embedded Resource Selector can be used to select and identify an item within the collection, e.g., the chapter, through its value relationship. This Selector is usually used in conjunction with additional Selectors, e.g., through refinement.
Example Use Case: Janine wants to select the cover image of a Web Publication, which is linked to the Web Publication as a whole. She uses an Embedded Resource Selector to designate the image, with the Web Publication’s address as the Source for the selector.
Term | Type | Description |
---|---|---|
type | Relationship | The class of the Selector. Embedded Resource Selectors MUST have exactly 1 type and the value MUST be EmbeddedResourceSelector. |
value | Relationship | The URL [url] of the resource within the collection or group of resources identified by the Source. An EmbeddedResourceSelector MUST have exactly 1 value property. The URL MAY be a relative URL, with the value of Source serving as a base URL. |
{
"source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
"selector": {
"type": "EmbeddedResourceSelector",
"value": "https://dauwhe.github.io/html-first/MobyDickNav/images/book-cover.jpg"
}
}
A frequent usage of refinement is in combination with an Embedded Resource Selector to denote the fact that a particular selection is related to, e.g., a Web Publication. For example:
{
"source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
"selector": {
"type": "EmbeddedResourceSelector",
"value": "MobyDickNav/html/c001.html",
"refinedBy": {
"type": "CssSelector",
"value": "#elemid > .elemclass + p"
}
}
}
Note the usage of a relative URL in the example; it is considered good practice to use relative URLs when applicable.
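Resolving a relative value against the Source, as in the example above, can be sketched with the Python standard library. The function name `resolve_embedded_resource` is hypothetical, and the sketch assumes that source is given as a plain URL string rather than as a detailed resource description.

```python
from urllib.parse import urljoin


def resolve_embedded_resource(locator: dict) -> str:
    """Return the absolute URL of the resource designated by an
    Embedded Resource Selector, using the Source as the base URL."""
    selector = locator["selector"]
    if selector.get("type") != "EmbeddedResourceSelector":
        raise ValueError("expected an EmbeddedResourceSelector")
    # urljoin leaves an absolute value untouched and resolves a relative one.
    return urljoin(locator["source"], selector["value"])


chapter_url = resolve_embedded_resource({
    "source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
    "selector": {
        "type": "EmbeddedResourceSelector",
        "value": "MobyDickNav/html/c001.html",
    },
})
```

With these inputs, `chapter_url` is `https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html`.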
This section is normative
This section is predicated on the assumption that Packaged Web Publications (PWPs) rely on a packaging format of a media type intended exclusively for packaging PWPs or only for packaging PWPs and EPUBs. If the Working Group decides that PWPs should be packaged using a more generic packaging format with a more generally used media type, or if the Group for any other reason decides not to pursue registering this fragment identifier scheme, then this section (including subsection) should be removed.
The document consists of two parts: a description of, essentially, the Selector Model as defined by the Web Annotation Data Model, and a reformulation of that data model in the form of Fragment ID-s. It is not clear, at this moment, whether the standardization of fragment identifiers is necessary, or whether the JSON based structure fulfills the needs of the requirements. If the latter, we can remove the relevant section, and the only possible normative extension in the document is described in issue #4.
For some simple use cases involving Packaged Web Publications, it may be more convenient or more consistent with past practice to express a simple Embedded Resource selection as a fragment identifier [url] that can be appended to the URL of the collective resource, i.e., the source associated with the Embedded Resource selection. (An informative precedent for this approach is the International Digital Publishing Forum Recommended Specification, EPUB Canonical Fragment Identifiers 1.1 [cfi], which defines a fragment identifier serialized model for selecting and positioning within resources of the application/epub+zip media type.)
A mapping for serializing simple Embedded Resource selections as fragment identifiers is defined below. This mapping allows the Segment (of interest) to be expressed in a single URL. Note that this mapping is valid only if the URL of the Source is the URL of a Packaged Web Publication and does not itself already include a fragment identifier of its own.
An Embedded Resource Selector is serialized as a fragment identifier using a function-like syntax, i.e.:
The source for the selection is the base URL, to which a # character is appended.
The # character is followed by ERS (in lieu of a function name), followed by a single 'parameter' enclosed in parentheses.
The 'parameter' is the value from the JSON serialization of the Embedded Resource Selector. The parameter MAY be an absolute URL or relative to the base URL.
The value of the URL 'parameter' appearing in an ERS fragment identifier SHOULD be percent-encoded [rfc3986]. The encoding is a MUST for characters that may make the URL ambiguous, namely:
character | code |
---|---|
space | %20 |
= | %3D |
, | %2C |
# | %23 |
A fragment identifier is defined for a specific media type. This means that, formally, the fragment identifier syntax and semantics defined in this section must be registered for any PWP media type(s) by IANA. Until such a registration is done, these fragment identifiers have the potential to conflict with other fragment identifier schemes specified by media type registrations.
The example below is semantically equivalent to the example on the usage of an Embedded Resource Selector:
https://dauwhe.github.io/html-first/MobyDick.pwpub#ERS(
https://dauwhe.github.io/html-first/MobyDickNav/images/cover.jpg)
(A new line character has been introduced into the Example above to facilitate readability; in real usage such new line characters are not allowed in a URL.)
The usage of a fragment identifier may also make the usage of explicit refinement unnecessary. The example below, which incidentally uses a relative rather than absolute URL, is semantically equivalent to the example combining an embedded resource selector with refinement:
{
"source": "https://dauwhe.github.io/html-first/MobyDick.pwpub#ERS(MobyDickNav/html/c001.html)",
"selector": {
"type": "CssSelector",
"value": "#elemid > .elemclass + p"
}
}
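The serialization mapping described in this section can be sketched as follows. This is an illustration only: the function name `ers_fragment_url` is hypothetical, and the sketch escapes only the four characters for which encoding is a MUST, leaving all other characters of the value unchanged. A fuller implementation would likely apply general percent-encoding as well.

```python
# Characters whose percent-encoding is mandatory inside the ERS 'parameter'.
MANDATORY_ESCAPES = {" ": "%20", "=": "%3D", ",": "%2C", "#": "%23"}


def ers_fragment_url(source: str, value: str) -> str:
    """Serialize an Embedded Resource selection as a single URL of the
    form <source>#ERS(<value>)."""
    # The mapping is valid only if the Source carries no fragment of its own.
    if "#" in source:
        raise ValueError("the Source URL already includes a fragment identifier")
    encoded = "".join(MANDATORY_ESCAPES.get(ch, ch) for ch in value)
    return f"{source}#ERS({encoded})"


package = "https://dauwhe.github.io/html-first/MobyDick.pwpub"
simple = ers_fragment_url(package, "MobyDickNav/html/c001.html")
# A value carrying its own fragment identifier needs the mandatory escapes.
nested = ers_fragment_url(package, "MobyDickNav/images/cover.jpg#xywh=50,50,640,480")
```

Here `simple` reproduces the source value used in the example above, while `nested` ends in `#ERS(MobyDickNav/images/cover.jpg%23xywh%3D50%2C50%2C640%2C480)`.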
This section is non-normative.
The fragment identifier serialization mapping of Embedded Resource Selectors generally does not support refinement, except that the URL 'parameter' may include its own fragment identifier, appropriate to the media type of the resource identified by the URL. The following example illustrates such a pattern. (To increase readability, the percent encoding has been omitted from the example.)
https://dauwhe.github.io/html-first/MobyDick.pwpub#ERS(MobyDickNav/images/cover.jpg#xywh=50,50,640,480)
The URL above is the result of mapping the JSON-serialized Embedded Resource Selector below; note that the link to the Media Fragments URI 1.0 Recommendation [media-frags] cannot be mapped to the fragment identifier serialization, so the mapping is not entirely lossless.
{
"source": "https://dauwhe.github.io/html-first/MobyDick.pwpub",
"selector": {
"type": "EmbeddedResourceSelector",
"value": "https://dauwhe.github.io/html-first/MobyDickNav/images/cover.jpg",
"refinedBy": {
"type": "FragmentSelector",
"conformsTo": "http://www.w3.org/TR/media-frags/",
"value": "xywh=50,50,640,480"
}
}
}
This section is normative
Selections from a group of resources (e.g., the group of resources which comprise a Web Publication [wpub]) may be extensive and may span member resource boundaries. For resource-spanning selections that are continuous in some ordering of the group of resources, a Span Selector can be used to identify the beginning and the end of the selection using Embedded Resource Selectors, refined as appropriate. Embedded Resource Selectors (without refinement) are also used in enumerating any intervening resources between the beginning and end of the selection that are included in the selection. A Span Selection MUST span at least two resources. (For continuous selections wholly contained within a single resource, use a Range Selector.)
In the absence of refinement, the selection consists of the member resource identified by the startSelector property (the first resource in the selection), the member resource identified by the endSelector (the last resource in the selection), and the intervening member resource(s) (in some ordering of the group) between the starting and ending member resources as enumerated by the selectors property. If the startSelector is refined with another selector, then only the part of the first resource from the start of the refined selection to that resource's end (i.e., including what is identified by refinement) is included in the span selection. If the endSelector is refined with another selector, then only the part of the last resource prior to the start of the refined selection (i.e., excluding what is identified by refinement) is included in the selection.
The ordering of resources does not make use of external features, like the default reading order in a Web Publication [wpub]. The order is exclusively established via the selectors property.
Example Use Case: Misha wants to comment on text in a Web Publication that spreads over several constituent resources. He selects the start and the end of the selection in two different resources; his User Agent calculates the Span Selector using a series of Embedded Resource Selectors, with the first selection as the start and the last selection as the end, to provide a continuous span.
Term | Type | Description |
---|---|---|
type | Relationship | The class of the Selector. Span Selectors MUST have exactly 1 type and the value MUST be SpanSelector. |
startSelector | Relationship | The Selector which describes the inclusive starting point of the span. There MUST be exactly 1 startSelector associated with a Span Selector and it MUST be an Embedded Resource Selector, which MAY be refined with other selectors. |
selectors | Relationship | Provides an ordered, possibly empty, list of Embedded Resource Selectors, which identify intermediate resources subsumed in the full selection. These Embedded Resource Selectors MUST NOT be refined with other selectors. There MAY be at most 1 selectors relationship associated with a Span Selector. In the absence of a selectors relationship, a user agent SHOULD assume that the start and end resources are contiguous. |
endSelector | Relationship | The Selector which describes the exclusive ending point of the span. There MUST be exactly 1 endSelector associated with a Span Selector and it MUST be an Embedded Resource Selector, which MAY be refined with other selectors. |
The current design requires an explicit list of selectors for the "intermediate" resources. This is to avoid making the selection dependent on an implicit reading order for a Web Publication. Is that the right choice? It would indeed simplify to rely on implicit order, but there are quite some discussions in the WG whether that is a viable assumption...
Span Selectors can be used to describe selections that span multiple embedded resources - e.g., multiple resources included in a Web Publication. For instance, a Span Selector can be used to select from the last paragraph of Chapter 2, through all of Chapter 3, and into Chapter 4 up until (but not including) the 5th paragraph.
A Multi Selector could be used to specify the same selection, but would do so as an ordered list of paragraphs and chapters, without the benefit of saying the selection was continuous. So, a Multi Selector might do this by listing: the last paragraph of Chapter 2, Chapter 3, paragraph 1 of Chapter 4, paragraph 2 of Chapter 4, paragraph 3 of Chapter 4, and paragraph 4 of Chapter 4.
A Span Selector is therefore more succinct. Are there additional use cases that would benefit from availability of a Span Selector? Is the succinctness worth having an additional type of Selector? See also the earlier discussion of #25 when we were considering the functionality of Span Selector as an extension of Range Selector (rather than a separate, new selector type).
What are the use cases for selections that span multiple resources contained within a single source resource, e.g., selection spans parts of chapter2.html and chapter3.html within the same Web Publication? For these use cases does it matter if the selection is discontinuous or continuous (in some reading order of the Web Publication)?
{
"source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
"selector": {
"type": "SpanSelector",
"startSelector": {
"type": "EmbeddedResourceSelector",
"value": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
"refinedBy" : {
"type": "TextQuoteSelector",
"exact": "Call me Ishmael.",
"suffix": "Some years ago"
}
},
"selectors": [{
"type": "EmbeddedResourceSelector",
"value": "https://dauwhe.github.io/html-first/MobyDickNav/html/c002.html"
},{
"type": "EmbeddedResourceSelector",
"value": "https://dauwhe.github.io/html-first/MobyDickNav/html/c003.html"
}],
"endSelector": {
"type": "EmbeddedResourceSelector",
"value": "https://dauwhe.github.io/html-first/MobyDickNav/html/c004.html",
"refinedBy": {
"type": "TextQuoteSelector",
"exact": "He commenced dressing",
"suffix": " at top"
}
}
}
}
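The way a Span Selector establishes its own resource ordering can be sketched as below. The function name `span_resources` is hypothetical, the Span Selector is assumed to be a parsed JSON object, and the sketch only enumerates the member resources in order; it does not evaluate the refinements on the start and end selectors, nor does it verify that the span covers at least two distinct resources.

```python
from typing import Any, Dict, List


def span_resources(span: Dict[str, Any]) -> List[str]:
    """List the resources covered by a Span Selector, in selection order:
    the start resource, any intermediate resources, then the end resource."""
    if span.get("type") != "SpanSelector":
        raise ValueError("expected a SpanSelector")
    middle = span.get("selectors", [])  # may be absent or empty
    for selector in middle:
        if "refinedBy" in selector:
            raise ValueError("intermediate selectors must not be refined")
    parts = [span["startSelector"], *middle, span["endSelector"]]
    for selector in parts:
        if selector.get("type") != "EmbeddedResourceSelector":
            raise ValueError("expected an EmbeddedResourceSelector")
    return [selector["value"] for selector in parts]


base = "https://dauwhe.github.io/html-first/MobyDickNav/html/"
moby_span = {
    "type": "SpanSelector",
    "startSelector": {
        "type": "EmbeddedResourceSelector",
        "value": base + "c001.html",
        "refinedBy": {"type": "TextQuoteSelector", "exact": "Call me Ishmael."},
    },
    "selectors": [
        {"type": "EmbeddedResourceSelector", "value": base + "c002.html"},
        {"type": "EmbeddedResourceSelector", "value": base + "c003.html"},
    ],
    "endSelector": {
        "type": "EmbeddedResourceSelector",
        "value": base + "c004.html",
        "refinedBy": {"type": "TextQuoteSelector", "exact": "He commenced dressing"},
    },
}
ordered = span_resources(moby_span)
```

Because the order comes solely from the selector itself, no knowledge of the publication's reading order is needed to produce `ordered`.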
This section is normative
For some use cases it is required to identify a segment (of interest) that spans multiple selections, possibly over multiple members of a group of resources (e.g., spanning a subset of the resources which comprise a Web Publication [wpub]). A Multi Resource Selection can be used to identify such a segment of interest by creating an ordered list of selectors. A Multi Resource Selection identifies a collection of discrete selections, whether within a single resource or spread over several resources included in a single Source. If the segment of interest spans more than one resource, these selectors MUST all be Embedded Resource Selectors, each of which MAY be refined.
Example Use Case: Rachel is writing a summative assessment question with hints pointing back to the textbook. The question draws on material presented in Chapter 2, a-head 3; Chapter 4, a-head 6; and Chapter 7, a-head 8. She uses the Multi Resource Selector to define a single link to add to the hints section of her assessment question that references Section 2.3, Section 4.6, and Section 7.8, but nothing in between them.
Term | Type | Description |
---|---|---|
type | Relationship | The class of the Selector. Multi Resource Selectors MUST have exactly 1 type and the value MUST be MultiResourceSelector. |
selectors | Relationship | A list of Selectors. There MUST be exactly 1 selectors list associated with a Multi Resource Selector. The list MUST have at least 2 elements. |
{
"source": "https://textbook.example.org/",
"selector": {
"type": "MultiResourceSelector",
"selectors": [{
"type" : "EmbeddedResourceSelector",
"value": "https://textbook.example.org/section2.html",
"refinedBy": {
"type": "CssSelector",
"value": "body>section:nth-of-type(3)"
}
},{
"type": "EmbeddedResourceSelector",
"value": "https://textbook.example.org/section4.html",
"refinedBy": {
"type": "CssSelector",
"value": "body>section:nth-of-type(6)"
}
},{
"type": "EmbeddedResourceSelector",
"value": "https://textbook.example.org/section7.html",
"refinedBy": {
"type": "CssSelector",
"value": "body>section:nth-of-type(8)"
}
}]
}
}
A Position object describes a Locus (of Interest) within a stream representation of a Web Resource.
A Position specifier requires knowledge of two distinct entities:
Example Use Case: Allen, while reading chapter 1 of a digital edition of Moby Dick, generates (as a separate resource with its own URL) a Position specifier to note the position in the digital text stream representation where the first page break in chapter 1 of a Moby Dick print edition occurred.
Term | Type | Description |
---|---|---|
position | Relationship | The relationship between the Locator and a Position specifier. A Locator MAY have 0 or 1 position relationships. |
When processing a Locator that includes Selector and/or State specifier(s), the Position specifier (if present) MUST be processed last.
{
"scope": "https://dauwhe.github.io/html-first/MobyDick.wpub",
"source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
"position": {
"id": "http://example.org/printPageBreak-c1.1"
}
}
The TextStreamPosition specifier describes an inter-character position in a text stream by recording the number of characters that precede the position. The property value is used to record this character count. A value of 0 would describe the position immediately before the first character, a value of 1 would describe the position immediately after the first character and before the second character (i.e., the position between the first two characters of the stream), and so on. For example, if the text stream was “abcdefghijklmnopqrstuvwxyz” and the value was 7, then the position referenced would be the position between "g" and "h". If n is the length of the text stream, then a value equal to n denotes the position immediately following the last character in the text stream.
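The counting rule for value can be illustrated with a short sketch. It assumes that characters are counted the way Python indexes strings (by Unicode code point), which this section does not itself mandate; the function name `split_at_position` and the decision to reject out-of-range values are likewise assumptions made for the illustration.

```python
from typing import Tuple


def split_at_position(text: str, value: int) -> Tuple[str, str]:
    """Split a text stream at an inter-character position, returning the
    characters that precede the position and those that follow it."""
    if not 0 <= value <= len(text):
        raise ValueError("value must lie between 0 and the stream length")
    return text[:value], text[value:]


stream = "abcdefghijklmnopqrstuvwxyz"
before, after = split_at_position(stream, 7)
# before ends with "g" and after starts with "h"
at_end = split_at_position(stream, len(stream))
```

A value of 0 yields an empty first part, and a value equal to the stream length yields an empty second part.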
In some situations, it is important to preserve which side of a position a location reference points to. For example, when resolving a text stream position in a dynamically paginated environment, it could make a difference if a position is attached to the content before or after the location being referenced (e.g., to determine whether to display the verso or recto side at a page break). In a TextStreamPosition object, the bias property MAY be used to attach a position reference to the character preceding the position identified by the value property ("bias": "before") or to the character following the position ("bias": "after"). For example, if the text stream was “abcdefghijklmnopqrstuvwxyz”, the value was 7, and the bias was before, then the position referenced would be the position between "g" and "h" and would be attached to "g".
The property bias is only meaningful when some type of break (e.g., a page break or line break) falls or might fall at the position specified by the TextStreamPosition specifier.
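How a consumer might interpret bias can be sketched as follows. The function name `attached_character` is hypothetical, characters are assumed to be counted as Python code points, and returning `None` when no bias is given or when no character exists on the requested side is a choice made for this illustration rather than behaviour defined by this document.

```python
from typing import Optional


def attached_character(text: str, value: int,
                       bias: Optional[str] = None) -> Optional[str]:
    """Return the character that a TextStreamPosition is attached to,
    or None when no bias is given or no character exists on that side."""
    if not 0 <= value <= len(text):
        raise ValueError("value must lie between 0 and the stream length")
    if bias is None:
        return None
    if bias == "before":
        # Attach to the character preceding the position, if there is one.
        return text[value - 1] if value > 0 else None
    if bias == "after":
        # Attach to the character following the position, if there is one.
        return text[value] if value < len(text) else None
    raise ValueError("bias must be 'before' or 'after'")


stream = "abcdefghijklmnopqrstuvwxyz"
stays_with = attached_character(stream, 7, "before")
moves_with = attached_character(stream, 7, "after")
```

If a break were inserted at position 7, a position with bias `before` would remain with "g" ahead of the break, and one with bias `after` would travel with "h" past it.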
Example Use Case: George notices that a letter is missing between characters 322 and 323 in the text of an HTML file he is reading and decides he wants to mark the position of the missing letter by generating a TextStreamPosition specifier so that the letter can be inserted later during editing. He also wants to ensure that if a hyphen and line break should be dynamically inserted in this position the missing character would follow the break, and so he uses the bias property to associate the position reference with the character that follows the position being referenced (i.e., character 323).
Is this a valid example as regards bias? Or is this kind of situation better handled by an application rather than conflating a side-bias property with the position reference? If the example is not compelling enough, we need a better, more concrete, real-world use case for side-bias! Otherwise we should drop side-bias.
From Section 3.1.4 (Character Offset) of EPub 3.1 CFI:
"For XML character data, the offset is zero-based and always refers to a position between characters, so 0 means before the first character and a number equal to the total UTF-16 length means after the last character. A character offset value greater than the UTF-16 length of the available text must not be specified."
And from Section 3.1.9 (Side-Bias) of EPub 3.1 CFI:
"In some situations, it is important to preserve which side of a location a reference points to. For example, when resolving a location in a dynamically paginated environment, it would make a difference if a location is attached to the content before or after it (e.g., to determine whether to display the verso or recto side at a page break)."
Assuming these are real feature requirements, I don't think we have anything precisely equivalent in Web Anno data model. Putting aside for a moment whether something called a fragment identifier can be used to specify a location, how might we be able to address a need for these functionalities?
Regarding the first bit, I do note that in Web Anno we do not specify a meaning for a TextPositionSelector or DataPositionSelector having the same value for both start and end. We do talk about "Position 0 would be immediately before the first character[/byte]". So in this doc could we specify an interpretation that if the document was "abcdefghijklmnopqrstuvwxyz", the start was 4, and the end was 4, we are specifying the location immediately before the character 'e'? For completeness should we specify what to do if end (or start) is greater than the length of the normalized text?
Regarding the second bit, side-bias, I have no idea other than to suggest that this is not something a locator or fragment identifier should have to worry about - it's something the consumer of the locator should be responsible for.
From Section 3.1.9 (Side-Bias) of EPub 3.1 CFI:
"In some situations, it is important to preserve which side of a location a reference points to. For example, when resolving a location in a dynamically paginated environment, it would make a difference if a location is attached to the content before or after it (e.g., to determine whether to display the verso or recto side at a page break)."
No one has come forward with a use case justifying this feature. If no use case is available in a timely fashion (before the end of November 2017), the feature will be removed from FPWD. Can always be added back if a use case subsequently emerges.
This issue was split off from issue #9.
Term | Type | Description |
---|---|---|
type | Relationship | The class of the Position specifier. A Text Stream Position specifier MUST have exactly 1 type and the value MUST be TextStreamPosition. |
value | Property | The count of characters in the text stream preceding the Locus (of interest). Each TextStreamPosition MUST have exactly 1 value property, and it MUST be a non-negative integer less than or equal to the number of characters in the text stream. |
bias | Property | This property is used to associate a position reference with either the character that precedes it or the character that follows it. Each TextStreamPosition MAY include 0 or 1 bias properties, and the value of bias MUST be either before or after. |
The text MUST be selected and normalized in the same way as for the Text Quote Selector before counting the number of characters to determine the value to be used.
{
  "scope": "https://dauwhe.github.io/html-first/MobyDick.wpub",
  "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
  "position": {
    "type": "TextStreamPosition",
    "value": 322,
    "bias": "after"
  }
}
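As a rough illustration of the constraints in the table above, the following Python sketch checks the position object of a locator like the one in the example. The function name check_text_stream_position and the text length of 1000 characters are hypothetical; the length would in practice come from the selected and normalized text, which this sketch does not compute.

```python
import json
from typing import Any, Dict, List


def check_text_stream_position(position: Dict[str, Any], text_length: int) -> List[str]:
    """Return a list of constraint violations; an empty list means none were found."""
    problems = []
    if position.get("type") != "TextStreamPosition":
        problems.append("type MUST be TextStreamPosition")
    value = position.get("value")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        problems.append("value MUST be an integer")
    elif not 0 <= value <= text_length:
        problems.append("value MUST be between 0 and the number of characters")
    if "bias" in position and position["bias"] not in ("before", "after"):
        problems.append("bias MUST be either before or after")
    return problems


locator = json.loads("""
{
  "scope": "https://dauwhe.github.io/html-first/MobyDick.wpub",
  "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
  "position": {"type": "TextStreamPosition", "value": 322, "bias": "after"}
}
""")

# text_length=1000 is an assumed figure used only for this illustration.
print(check_text_stream_position(locator["position"], text_length=1000))  # []
```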
Similar to the TextStreamPosition specifier, the DataStreamPosition specifier describes a position between two bytes in a byte stream representation of a resource by recording the number of bytes that precede the position. The value property is used to record this byte count. A value of 0 would describe the position immediately before the first byte, a value of 1 would describe the position immediately after the first byte and before the second byte (i.e., the position between the first two bytes of the stream), and so on. If n is the length of the byte stream, then a value equal to n denotes the position immediately following the last byte in the stream.
Example Use Case: Paul's data processing application fails after processing the first 401 bytes of a resource's byte stream representation. Before exiting, the application generates a DataStreamPosition object to record the position where processing was interrupted. This will facilitate resumption of processing after the processing bug is resolved.
Term | Type | Description |
---|---|---|
type | Relationship | The class of the Position specifier. A Data Stream Position specifier MUST have exactly 1 type and the value MUST be DataStreamPosition. |
value | Property | The count of bytes in the byte stream preceding the Locus (of interest). Each DataStreamPosition MUST have exactly 1 value property, and it MUST be a non-negative integer less than or equal to the number of bytes in the stream. |
{
  "source": "https://example.org/MyData.json",
  "position": {
    "type": "DataStreamPosition",
    "value": 401
  }
}
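One way a consumer might use such a position to resume processing, as in Paul's use case, is sketched below. The function name split_at_position and the 512-byte sample stream are hypothetical and stand in for the real resource; only the type and value constraints from the table above are checked.

```python
from typing import Any, Dict, Tuple


def split_at_position(data: bytes, position: Dict[str, Any]) -> Tuple[bytes, bytes]:
    """Split a byte stream into the bytes before and after a DataStreamPosition."""
    if position.get("type") != "DataStreamPosition":
        raise ValueError("type MUST be DataStreamPosition")
    value = position.get("value")
    if not isinstance(value, int) or isinstance(value, bool):
        raise ValueError("value MUST be an integer")
    if not 0 <= value <= len(data):
        raise ValueError("value MUST be between 0 and the number of bytes")
    # value bytes precede the position; everything else follows it.
    return data[:value], data[value:]


# A made-up 512-byte stream standing in for the resource's representation.
stream = bytes(range(256)) * 2
processed, remaining = split_at_position(
    stream, {"type": "DataStreamPosition", "value": 401}
)
print(len(processed), len(remaining))  # 401 111
```

A value equal to the stream length yields an empty remainder, matching the position immediately following the last byte.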
Unlike Selector and State specifiers, a Position specifier cannot be refined (since it identifies a point in a stream rather than a part of a resource or a representation of a resource). However, it may be easier, more reliable, more accurate, or less brittle to resolve a Position specifier in the context of a part of a resource defined by a Selector and/or a representation of a resource described by a State. Thus a Position specifier may be used to refine a Selector or State (including refined Selectors or States), as long as the Position specifier is the final refinement step processed.
Example Use Case: Deren is one of several people who are collaboratively editing an HTML file. He needs to generate a TextStreamPosition specifier identifying a position where the word "Mister" needs to be inserted in one of the paragraphs he alone has been assigned to edit. He doesn't want to specify a text character count within the text stream representation of the entire HTML file, since he knows other edits are in progress that could affect that character count. So instead he uses a CssSelector to select the paragraph of interest and then refines this selection with a TextStreamPosition specifier to reference the position within the paragraph where the insertion is needed.
{
  "scope": "https://dauwhe.github.io/html-first/MobyDick.wpub",
  "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
  "selector": {
    "type": "CssSelector",
    "value": "p:nth-child(2)",
    "refinedBy": {
      "type": "TextStreamPosition",
      "value": 8
    }
  }
}
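The rule that a Position specifier must be the final refinement step can be checked mechanically. The sketch below walks a refinedBy chain and reports whether any Position specifier is itself refined. The function name position_is_final is hypothetical, and the sketch assumes each refinedBy holds a single specifier object rather than a list.

```python
from typing import Any, Dict, Optional

POSITION_TYPES = {"TextStreamPosition", "DataStreamPosition"}


def position_is_final(specifier: Dict[str, Any]) -> bool:
    """Return True if no Position specifier in the refinedBy chain is itself refined."""
    current: Optional[Dict[str, Any]] = specifier
    while current is not None:
        refined_by = current.get("refinedBy")
        if current.get("type") in POSITION_TYPES and refined_by is not None:
            return False  # a Position specifier cannot be refined further
        current = refined_by
    return True


selector = {
    "type": "CssSelector",
    "value": "p:nth-child(2)",
    "refinedBy": {"type": "TextStreamPosition", "value": 8},
}
print(position_is_final(selector))  # True

invalid = {
    "type": "TextStreamPosition",
    "value": 8,
    "refinedBy": {"type": "CssSelector", "value": "p"},
}
print(position_is_final(invalid))  # False
```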
Editorial Changes
Non-editorial Changes
This section is non-normative.
The editors would like to thank the members of the Publishing Working Group for their contributions to this specification:
The Working Group would also like to thank the members of the Digital Publishing Interest Group for all the hard work they did paving the road for this specification.