XPointer Framework

1 Introduction

This specification defines the XML Pointer Language (XPointer) Framework, an extensible system for XML addressing that underlies additional XPointer scheme specifications. The framework is intended to be used as a basis for fragment identifiers for any resource whose Internet media type is one of text/xml, application/xml, text/xml-external-parsed-entity, or application/xml-external-parsed-entity. Other XML-based media types are also encouraged to use this framework in defining their own fragment identifier languages.

Many types of XML-processing applications need to address into the internal structures of XML resources using URI references, for example, the XML Linking Language [XLink], XML Inclusions [XInclude], the Resource Description Framework [RDF], and SOAP 1.2 [SOAP12]. This specification does not constrain the types of applications that utilize URI references to XML resources, nor does it constrain or dictate the behavior of those applications once they locate the desired information in those resources.

1.1 Notation

[Definition: The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC 2119].]

The formal grammar for the XPointer Framework is given using simple Extended Backus-Naur Form (EBNF) notation, as described in the XML Recommendation [XML].

1.2 Terminology

[Definition: pointer ]: A string conforming to this specification. This specification defines the syntax and semantics of pointers.
[Definition: pointer part ]: A portion of a pointer that provides a scheme name and some pointer data that conforms to the definition of that scheme. The XPointer processor evaluates a pointer part to identify zero or more subresources within an XML resource.
[Definition: scheme ]: A specialized pointer data format that has a name and is defined in a specification.
[Definition: XPointer processor ]: A software component that identifies subresources of an XML resource by applying a pointer to it. This specification defines the behavior of XPointer processors.
[Definition: application ]: A software component that incorporates or uses an XPointer processor because it needs to access XML subresources. The occurrence and usage of XPointers, and the behavior to be applied to resources and subresources obtained by processing those XPointers, are governed by the definition of each application's corresponding data format (which could be XML-based or non-XML-based). For example, HTML [HTML] Web browsers and XInclude processors are applications that might use XPointer processors.
[Definition: error ]: A violation of the syntactic rules of this specification, or the failure of a pointer to identify subresources.
[Definition: namespace binding context ]: A binding of XML-namespace-defined [XML-Names] namespace prefixes to their associated namespace names.

2 Conformance

This specification defines a framework; it does not currently define a minimum conformance level for XPointer processors. Thus, the information in this section defines conformance requirements only for the framework portion of any minimum conformance level.

XPointer processors depend on the ability of applications to reverse any fragment identifier encoding and escaping (see 4 Character Escaping).

XPointer processor behaviour depends on the availability of certain information from an XML resource: in the terms provided by the [Infoset], the information items and properties tabulated below may be relevant. The presence of some of these items and properties depends in turn on conformant DTD or XML Schema processing: conformant XPointer processors are not required to do such processing, but if they do, shorthand pointer processing will take advantage of the information thus provided (see 3.2 Shorthand Pointer).

From the XML Information Set itself [Infoset]:
- document information item
  - [document element] property
    Note that if the XML resource is not a document but rather an external parsed entity, this property will not be reported. Rather, the information set is effectively extended to report the one or more top-level elements in the entity as ordered "root element" properties for the entity.
- element information item
  - [attributes] property
  - [children] property
- attribute information item
  - [attribute type] property
  - [normalized value] property
From the XML Schema post-schema validation information set (PSVI) [XMLSchema], the following properties of attribute information items and element information items:
- [schema normalized value] property
- Either:
  - [member type definition] property
  - [type definition] property
  - and the [name], [target namespace] and [base type definition] properties of the value thereof
  or:
  - [member type definition namespace] property
  - [member type definition name] property
  - [type definition namespace] property
  - [type definition name] property

Software components claiming to be XPointer processors must conform to this XPointer Framework specification and any other specifications that, together with this specification, define the minimum conformance level for XPointer, and may conform to additional XPointer scheme specifications. XPointer processors must document the additional scheme specifications to which they conform. Specifications that depend on XPointer processing should document the schemes they require and support.

Conforming XPointer processors must report XPointer Framework errors to the application. Applications are free to terminate or recover from XPointer Framework errors in any fashion.

3 Language and Processing

This section describes the XPointer Framework and the behavior of XPointer processors with respect to the framework.

An XPointer processor takes as input an XML resource and a string to be used as a pointer (for example, a fragment identifier, with escaping reversed, taken from the URI reference that was used to access the resource), attempts to evaluate the pointer with respect to the resource, and produces as output an identification of subresources, or one or more errors.

3.1 Syntax

If a string used as a pointer does not adhere to the syntax defined in this section, it is an error.

The symbol S is defined in [XML]. The symbols NCName and QName are defined in [XML-Names].

XPointer Framework Syntax

[1]	`Pointer`	::=	`Shorthand \| SchemeBased`
[2]	`Shorthand`	::=	`NCName`
[3]	`SchemeBased`	::=	`PointerPart (S? PointerPart)*`
[4]	`PointerPart`	::=	`SchemeName '(' SchemeData ')'`
[5]	`SchemeName`	::=	`QName`
[6]	`SchemeData`	::=	`EscapedData*`
[7]	`EscapedData`	::=	`NormalChar \| '^(' \| '^)' \| '^^' \| '(' SchemeData ')'`
[8]	`NormalChar`	::=	`UnicodeChar - [()^]`
[9]	`UnicodeChar`	::=	`[#x0-#x10FFFF]`

As shown in the above productions, the end of a pointer part is signaled by the right parenthesis ")" character that balances the left parenthesis "(" character that began the part. If either a left or a right parenthesis occurs in scheme data without being balanced by its counterpart, it must be escaped with a circumflex (^) character preceding it. Escaping pairs of balanced parentheses is allowed. Any literal occurrences of the circumflex must be escaped with an additional circumflex (that is, ^^). Any other use of a circumflex is an error.

3.2 Shorthand Pointer

A shorthand pointer, formerly known as a barename, consists of an NCName alone. It identifies at most one element in the resource's information set; specifically, the first one (if any) in document order that has a matching NCName as an identifier. The identifiers of an element are determined as follows:

If an element information item has an attribute information item among its [attributes] that is a schema-determined ID, then it is identified by the value of that attribute information item's [schema normalized value] property;
If an element information item has an element information item among its [children] that is a schema-determined ID, then it is identified by the value of that element information item's [schema normalized value] property;
If an element information item has an attribute information item among its [attributes] that is a DTD-determined ID, then it is identified by the value of that attribute information item's [normalized value] property.
An element information item may also be identified by an externally-determined ID value.

If no element information item is identified by a shorthand pointer's NCName, the pointer is in error.

Note:

An element information item might be identified by multiple values, in a document with more than one of DTD-determined IDs, schema-determined IDs, and externally-determined IDs. In such documents, a loss of interoperability might result if the identifier values for a particular element are not always the same.

[Definition: An element or attribute information item is a schema-determined ID if and only if one of the following is true:]

It has a [member type definition] or [type definition] property whose value in turn has [name] equal to ID and [target namespace] equal to http://www.w3.org/2001/XMLSchema;
It has a [base type definition] whose value has that [name] and [target namespace];
It has a [base type definition] whose value has a [base type definition] whose value has that [name] and [target namespace], and so on following the [base type definition] property recursively;
It has a [type definition name] equal to ID and a [type definition namespace] equal to http://www.w3.org/2001/XMLSchema;
It has a [member type definition name] equal to ID and a [member type definition namespace] equal to http://www.w3.org/2001/XMLSchema.

[Definition: An attribute information item is a DTD-determined ID if and only if it has a [type definition] property whose value is equal to ID.]

[Definition: An externally-determined ID is a string, representing an element identifier, whose value is determined by the application through mechanisms outside the scope of this specification.]

Note:

A shorthand pointer provides, for resources with XML-based media types, a rough analog of HTML fragment identifier behavior. However, if ID typing information is not available because no DTD, schema, or application-specific information is available, the pointer will not identify any element. There are several ways to make element identification more reliable. For example, the creator of a resource can use an internal DTD subset to indicate the presence of ID-typed attributes, and the creator of a pointer can, instead of a shorthand pointer, use a scheme-based pointer or provide one or more schemes that address the desired element in other ways.

Note:

The above definitions are not affected by whether or not the value which identified an element information item is unique within the document, because neither [XML] or [XMLSchema] require this for the assignment of type ID.

3.3 Scheme-Based Pointer

A scheme-based pointer consists of one or more pointer parts, optionally separated by white space (S). Each part has a scheme name and contains, within parentheses, data (EscapedData) conforming to the named scheme. If the scheme data contains parentheses, they must be either balanced or escaped.

When multiple pointer parts are provided, an XPointer processor must evaluate them in left-to-right order. If the XPointer processor does not support the scheme used in a pointer part, it skips that pointer part. If a pointer part does not identify any subresources, evaluation continues and the next pointer part, if any, is evaluated. The result of the first pointer part whose evaluation identifies one or more subresources is reported by the XPointer processor as the result of the pointer as a whole, and evaluation stops. If no pointer part identifies subresources, it is an error.

In the following example, if the 'xpointer' pointer part is not understood or fails to identify any subresources, the 'element' pointer part is evaluated. If the 'xpointer' pointer part identifies subresources, the 'element' pointer part is not evaluated.

#xpointer(id('boy-blue')/horn[1])element(boy-blue/3)

A scheme name consists syntactically of an optional Prefix and a LocalPart, as defined in [XML-Names]. Abstractly, scheme names are a tuple consisting of the LocalPart and the namespace name corresponding to that Prefix in the namespace binding context. If the namespace binding context contains no corresponding prefix, or if the (namespace name, LocalPart) pair does not correspond to a scheme name supported by the XPointer processor, the pointer part is skipped.

This specification reserves all unqualified scheme names for definition in additional XPointer schemes defined in W3C Recommendations. The use of QNames as scheme names provides a general framework for extensibility by other XML-based media types wishing to use this framework in defining their own fragment identifier languages. The definition of any scheme to be used in conjunction with the XPointer framework must specify a name for the scheme, consisting of a (namespace name, LocalPart) pair.

3.4 Namespace Binding Context

Scheme specifications may define ways to bind XML namespaces [XML-Names] prefixes to namespace names for the purpose of interpreting the prefixes of scheme names, or element names, attribute names and other QNames appearing in pointer parts. These bindings contribute to a namespace binding context that applies to all pointer parts to the right of the pointer part making the binding, unless exceptions are explicitly made by the schemes in question. The documentation for any namespace-binding scheme must specify whether its bindings remain in effect for later pointer parts. The documentation for every scheme must specify whether it uses the namespace binding context.

In the following example, the 'xmlns' scheme (see [XPtrXmlns]) is used to add a (prefix/namespace name) binding to the namespace binding context. The XPointer processor uses this information to ascertain whether img:rect denotes the name of a scheme that it supports.

#xmlns(img=http://example.org/image)img:rect(10,10,50,50)

The initial namespace binding context prior to evaluation of the first pointer part consists of a single entry: the xml prefix bound to the namespace name http://www.w3.org/XML/1998/namespace. The namespace binding context is subject to the following constraints; attempts to violate these constraints will have no effect on the namespace binding context:

The xml prefix is bound to the namespace name http://www.w3.org/XML/1998/namespace. It must not be bound to any other namespace name.
The namespace name http://www.w3.org/XML/1998/namespace is bound to the prefix xml. It must not be bound to any other prefix.
The xmlns prefix must not be bound to a namespace name.
The namespace name http://www.w3.org/2000/xmlns/ must not be bound to any prefix.

Prefixes beginning with the three-letter sequence x, m, l, in any case combination, are reserved. Users should not use them except as defined by XML and XML-related specifications.

4 Character Escaping

The set of characters for XPointers is [Unicode]. However, the XPointer language is designed to be used in the context of URI references [RFC 2396] and IRI references [IRI], which require encoding and escaping of certain characters. XPointers and IRI references containing XPointers also often appear in XML documents and external parsed entities, which impose some escaping requirements of their own when the encoding limits the repertoire that can be used directly. Other contexts might require additional escaping to be applied to XPointers. Also, because some characters are significant to XPointer processing, escaping is needed to use these characters in their ordinary sense.

4.1 Escaping Contexts

The following contexts require various types of escaping to be applied to XPointers:

A. Escaping of XPointer-significant characters

As described in 3.1 Syntax, unbalanced parentheses and occurrences of the circumflex must be escaped.

B. Escaping and Encoding of reserved IRI characters

[Definition: An Internationalized Resource Identifier, or IRI is a protocol element that extends the syntax of URIs to a much wider repertoire of Unicode characters [Unicode].] IRI references allow a superset of the characters of fully escaped URI references, but must have normal occurrences of the percent sign (%) escaped because it is the character used for escaping in URIs and IRIs.

Thus, when a pointer is inserted into an IRI reference, any occurrences of percent signs (%) must be escaped. Other characters may be escaped as well, though it is not recommended. Characters are escaped as follows:

Each character to be escaped is converted to UTF-8 [RFC 2279] as one or more bytes.
The resulting bytes are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).
The original character is replaced by the resulting character sequence.

For example % becomes %25.

C. Escaping and Encoding of reserved URI characters

IRI references can be converted to URI references for consumption by URI resolvers. The disallowed characters in URI references include all non-ASCII characters, plus the excluded characters listed in Section 2.4 of [RFC 2396], except for the number sign (#) and percent sign (%) and the square bracket characters re-allowed in [RFC 2732]. Disallowed characters are escaped as follows:

Each disallowed character is converted to UTF-8 [RFC 2279] as one or more bytes.
The resulting bytes are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).
The original character is replaced by the resulting character sequence.

D. XML escaping

If a pointer appears in an XML document or external parsed entity, any characters not expressible in the encoding used must be escaped as character references, and any characters that are significant to XML processing at the point where they appear must be escaped by an appropriate mechanism such as character references or entity references. This escaping is reversed when the XML document or entity is parsed. It is not recommended that URI references (rather than the more general IRI references) be placed in XML documents. If for some reason this proves unavoidable, the same escaping mechanism applies.

Since the XPointer processor will reverse only the escaping of XPointer-significant characters (A), the application must reverse any other encodings or escapings (such as B, C, or D) that the pointer was subject to. If the result passed to the XPointer processor does not conform to the syntactic rules for XPointers in this specification, it is an error.

4.2 Examples of Escaping

The following table shows the escaping in various contexts of an XPointer containing an unbalanced parenthesis, double quotation marks, and spaces. These examples use the 'xpointer' scheme (see [XPtrXPointer]), since it allows string literals in its scheme data.

Context	Notation
Initial Scheme Data	The `xpointer` scheme data as it was initially created: string-range(//P,"my favorite smiley :-)")
A. XPointer	With the unbalanced parenthesis in the scheme data escaped, as required by this specification: xpointer(string-range(//P,"my favorite smiley :-^)"))
B. Pointer in IRI reference	Same as A (no percent sign found that needs escaping): #xpointer(string-range(//P,"my favorite smiley :-^)"))
C. IRI reference converted to URI reference	With occurrences of the double quotatation marks (`%22`), spaces (`%20`), and circumflexes (`%5E`) escaped: #xpointer(string-range(//P,%22my%20favorite%20smiley%20:-%5E)%22))
D. IRI reference in XML document	Double quotation marks escaped using XML's predefined `"` entity reference (assuming that the pointer appears in an IRI reference in a double-quoted attribute value): #xpointer(string-range(//P,"my favorite smiley :-^)"))

The following table shows the escaping of an XPointer containing accented characters in various contexts. The XML document is assumed to be encoded in US-ASCII, which does not allow the letter "é" to appear directly.

Context	Notation
Initial Scheme Data	The `xpointer` scheme data as it was initially created: id('résumé')
A. XPointer	The XPointer (no circumflexes or unbalanced parentheses in scheme data that need escaping): xpointer(id('résumé'))
B. Pointer in IRI reference	Same as A (no percent sign found that needs escaping): #xpointer(id('résumé'))
C. IRI reference converted to URI reference	With occurrences of the letter "é" (`%C3%A9`) escaped: #xpointer(id('r%C3%A9sum%C3%A9'))
D. IRI reference in XML document	Represented in the `US-ASCII` encoding; accented letters are escaped with XML character references: #xpointer(id('résumé'))

XPointer Framework

W3C Recommendation 25 March 2003

Abstract

Status of this Document

Table of Contents

Appendix

1 Introduction

1.1 Notation

1.2 Terminology

2 Conformance

3 Language and Processing

3.1 Syntax

XPointer Framework Syntax

3.2 Shorthand Pointer

3.3 Scheme-Based Pointer

3.4 Namespace Binding Context

4 Character Escaping

4.1 Escaping Contexts

4.2 Examples of Escaping

A References

A.1 Normative References

A.2 Non-Normative References