The presentation of this document has been augmented to identify changes from a previous version. Three kinds of changes are highlighted: newly added text, changed text, and deleted text.


W3C

XML Base (Second Edition)

W3C Proposed Edited Recommendation 20 March 2008

This version:
http://www.w3.org/TR/2008/PER-xmlbase-20080320/
Latest version:
http://www.w3.org/TR/xmlbase/
Previous versions:
http://www.w3.org/TR/2001/REC-xmlbase-20010627/
http://www.w3.org/TR/2006/PER-xmlbase-20061220/
Editors:
Jonathan Marsh, Microsoft <jmarsh@microsoft.com>
Richard Tobin, University of Edinburgh <richard@inf.ed.ac.uk>

This document is also available in these non-normative formats: HTML with diff markup and XML.


Abstract

This document describes a facility, similar to that of HTML BASE, for defining base URIs for parts of XML documents.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document has been produced by the W3C XML Core Working Group as part of the W3C XML Activity. The English version of this specification is the only normative version. However, for translations of this document, see http://www.w3.org/2003/03/Translations/byTechnology?technology=xmlbase

This document is a Proposed Edited Recommendation of the W3C. This second edition is not a new version of XML Base; its purpose is to clarify a number of issues that have become apparent since the first edition was published. Some of these were first published as separate errata (http://www.w3.org/2001/06/xmlbase-errata), others were published in a public editor's draft in November 2006 (http://www.w3.org/XML/2006/11/xmlbase-2e/Overview.html), and a PER in December 2006 (http://www.w3.org/TR/2006/PER-xmlbase-20061220/).

This PER normatively references the draft of a replacement for RFC 3987 (here called RFC 3987 bis) for the definition of the term Legacy Extended IRI. It will not advance to Recommendation status until the replacement RFC is published, at which point the reference will be updated accordingly.

W3C Advisory Committee Members are invited to send formal review comments to the W3C Team until 30 June 2008. Advisory Committee Representatives should consult their WBS questionnaires. The public is invited to send comments on this document to www-xml-linking-comments@w3.org; public archives are available.

There is no implementation report or test suite for this specification, but there is a document describing methods of testing XML Base conformance.

Publication as a Proposed Edited Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document is governed by the 24 January 2002 CPP as amended by the W3C Patent Policy Transition Procedure. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
2 Terminology
3 xml:base Attribute
    3.1 URI Reference Encoding and Escaping
4 Resolving Relative URIs
    4.1 Relation to RFC 3986
    4.2 Granularity of base URI information
    4.3 Matching URIs with base URIs
    4.4 Interpretation of same-document references
5 Conformance

Appendices

A References
B References (Non-Normative)
C Impacts on Other Standards (Non-Normative)
D Changes since the first edition (Non-Normative)


1 Introduction

The XML Linking Language [XLink] defines Extensible Markup Language (XML) 1.0 [XML] constructs to describe links between resources. One of the stated requirements on XLink is to support HTML [HTML 4.01] linking constructs in a generic way. The HTML BASE element is one such construct which the XLink Working Group has considered. BASE allows authors to explicitly specify a document's base URI for the purpose of resolving relative URIs in links to external images, applets, form-processing programs, style sheets, and so on.

This document describes a mechanism for providing base URI services to XLink, but as a modular specification so that other XML applications benefiting from additional control over relative URIs but not built upon XLink can also make use of it. The syntax consists of a single XML attribute named xml:base.

The deployment of XML Base is through normative reference by new specifications, for example XLink and the XML Infoset. Applications and specifications built upon these new technologies will natively support XML Base. The behavior of xml:base attributes in applications based on specifications that do not have direct or indirect normative reference to XML Base is undefined.

It is expected that a future RFC for XML Media Types will specify XML Base as the mechanism for establishing base URIs in the media types it defines.

2 Terminology

[Definition: The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC 2119].]

The terms base URI and relative URI are used in this specification as they are defined in [RFC 3986].

3 xml:base Attribute

The attribute xml:base may be inserted in XML documents to specify a base URI other than the base URI of the document or external entity. The value of this attribute is interpreted as a Legacy Extended IRI (LEIRI) as defined in the successor to RFC 3987 [RFC 3987 bis].

Note:

This PER will not become an Edited Recommendation until the successor to RFC 3987 is published, and the reference above will be amended accordingly.

In namespace-aware XML processors, the "xml" prefix is bound to the namespace name http://www.w3.org/XML/1998/namespace as described in Namespaces in XML [XML Names]. Note that xml:base can still be used by non-namespace-aware processors.

An example of xml:base in a simple document containing XLinks follows. XLink normatively references XML Base for interpretation of relative URI references in xlink:href attributes.

<?xml version="1.0"?>
<doc xml:base="http://example.org/today/"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <paragraph>See <link xlink:type="simple" xlink:href="new.xml">what's
      new</link>!</paragraph>
    <paragraph>Check out the hot picks of the day!</paragraph>
    <olist xml:base="/hotpicks/">
      <item>
        <link xlink:type="simple" xlink:href="pick1.xml">Hot Pick #1</link>
      </item>
      <item>
        <link xlink:type="simple" xlink:href="pick2.xml">Hot Pick #2</link>
      </item>
      <item>
        <link xlink:type="simple" xlink:href="pick3.xml">Hot Pick #3</link>
      </item>
    </olist>
  </body>
</doc>

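The URIs in this example resolve to full URIs as follows:

  • "what's new" resolves to the URI "http://example.org/today/new.xml"

  • "Hot Pick #1" resolves to the URI "http://example.org/hotpicks/pick1.xml"

  • "Hot Pick #2" resolves to the URI "http://example.org/hotpicks/pick2.xml"

  • "Hot Pick #3" resolves to the URI "http://example.org/hotpicks/pick3.xml"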
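The resolution of each link in the example can be reproduced mechanically. The Python sketch below is illustrative rather than normative: it walks a trimmed copy of the example document with xml.etree.ElementTree, resolves each xml:base value against the base URI the element inherits, and then resolves each xlink:href against the resulting base using urllib.parse.urljoin, which follows RFC 3986 resolution for simple cases like these. The retrieval URI http://example.org/doc.xml is a hypothetical value; it has no effect here because the doc element carries an absolute xml:base.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports xml:base and xlink:href under their namespace names.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Trimmed copy of the example document (head and non-link paragraph omitted).
DOC = """<doc xml:base="http://example.org/today/"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <paragraph>See <link xlink:type="simple" xlink:href="new.xml">what's
      new</link>!</paragraph>
    <olist xml:base="/hotpicks/">
      <item><link xlink:type="simple" xlink:href="pick1.xml">Hot Pick #1</link></item>
      <item><link xlink:type="simple" xlink:href="pick2.xml">Hot Pick #2</link></item>
      <item><link xlink:type="simple" xlink:href="pick3.xml">Hot Pick #3</link></item>
    </olist>
  </body>
</doc>"""


def resolve_links(elem, inherited_base):
    """Yield (link text, absolute URI) for each xlink:href at or below elem."""
    base = inherited_base
    if XML_BASE in elem.attrib:
        # The xml:base value is itself resolved against the inherited base.
        base = urljoin(inherited_base, elem.attrib[XML_BASE])
    if XLINK_HREF in elem.attrib:
        text = " ".join("".join(elem.itertext()).split())  # normalize whitespace
        yield text, urljoin(base, elem.attrib[XLINK_HREF])
    for child in elem:
        yield from resolve_links(child, base)


# Hypothetical retrieval URI; overridden by the absolute xml:base on <doc>.
results = list(resolve_links(ET.fromstring(DOC), "http://example.org/doc.xml"))
for text, uri in results:
    print(f"{text} -> {uri}")
```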

Note:

This specification does not give the xml:base attribute any special status as far as XML validity is concerned. In a valid document the attribute must be declared in the DTD, and similar considerations apply to other schema languages.

3.1 URI Reference Encoding and Escaping

The set of characters allowed in xml:base attributes is the same as for XML, namely [Unicode]. However, some Unicode characters are disallowed from URI references, and thus processors must encode and escape these characters to obtain a valid URI reference from the attribute value.

The disallowed characters include all non-ASCII characters, plus the excluded characters listed in Section 2.4 of [RFC2396], except for the number sign (#) and percent sign (%) characters and the square bracket characters re-allowed in [RFC 2732]. Disallowed characters must be escaped as follows:

  1. Each disallowed character is converted to UTF-8 [RFC 2279] as one or more bytes.

  2. Any bytes corresponding to a disallowed character are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).

  3. The original character is replaced by the resulting character sequence.
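The three escaping steps can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: it assumes the disallowed set is exactly the non-ASCII characters plus the RFC 2396 excluded characters (control characters, space, and the delimiter and "unwise" characters), with the number sign, percent sign and square brackets left alone as stated above. The function name escape_disallowed is hypothetical.

```python
# Excluded printable ASCII characters from RFC 2396 section 2.4.3, minus
# '#', '%', '[' and ']', which this section leaves unescaped.
EXCLUDED_ASCII = set(' <>"{}|\\^`')


def escape_disallowed(value: str) -> str:
    """Percent-escape the characters this section disallows in URI references."""
    out = []
    for ch in value:
        code = ord(ch)
        # Controls (0x00-0x1F, 0x7F), all non-ASCII, and the excluded set.
        if code < 0x20 or code > 0x7E or ch in EXCLUDED_ASCII:
            # Step 1: convert the character to UTF-8 bytes.
            # Step 2: write each byte as %HH in hexadecimal.
            out.append("".join(f"%{byte:02X}" for byte in ch.encode("utf-8")))
        else:
            out.append(ch)
    # Step 3: each disallowed character has been replaced by its escape sequence.
    return "".join(out)


print(escape_disallowed("http://example.org/caf\u00e9 menu/#top"))
# http://example.org/caf%C3%A9%20menu/#top
```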

The value of an xml:base attribute is a Legacy Extended IRI and may contain characters not allowed in URIs. In accordance with the principle that percent-encoding must occur as late as possible in the processing chain, applications which provide access to the base URI of an element should calculate and return the value without escaping.

4 Resolving Relative URIs

4.1 Relation to RFC 3986

RFC 3986 [RFC 3986] provides for base URI information to be embedded within a document. The rules for determining the base URI can be summarized as follows (highest priority to lowest):

  1. The base URI is embedded in the document's content.

  2. The base URI is that of the encapsulating entity (message, document, or none).

  3. The base URI is the URI used to retrieve the entity.

  4. The base URI is defined by the context of the application.

Note:

The term "entity" in points #2 and #3 above uses the RFC 3986 meaning of the term. Elsewhere in this document the term "entity" is used in the XML sense.

This document specifies the details of rule #1 for embedding base URI information in the specific case of XML documents.

4.2 Granularity of base URI information

Relative URIs appearing in an XML document are always resolved relative to either an element, a document entity, or an external entity. There is no provision for finer granularity, such as per-attribute, per-character, or per-entity base information. Neither internal entities, whether declared in the internal subset or in an external DTD, nor freestanding text (text not enclosed in an element) in an external entity, are considered to set a base URI separate from the base URI in scope for the entity reference.

The base URI of a document entity or an external entity is determined by RFC 3986 rules, namely, that the base URI is the URI used to retrieve the document entity or external entity.

The base URI of an element is:

  1. the base URI specified by an xml:base attribute on the element, if one exists, otherwise

  2. the base URI of the element's parent element within the document entity or external entity, if one exists, otherwise

  3. the base URI of the document entity or external entity containing the element.

The base URI of an element bearing an xml:base attribute with a value that is not a valid Legacy Extended IRI is application dependent.
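The three rules for the base URI of an element can be modelled directly. The sketch below assumes a hypothetical object model (the Entity and Element classes are illustrative, not part of any API) in which each element records the document entity or external entity it appears in; elements that come from internal entities are simply assigned to the entity containing the reference. Following Section 4.3, an xml:base value is resolved against the base URI the element would otherwise inherit, using urllib.parse.urljoin for RFC 3986 resolution. Values that are not valid LEIRIs are not checked.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin


@dataclass(eq=False)
class Entity:
    """Hypothetical: a document entity or external entity."""
    uri: str  # URI used to retrieve the entity (its base URI per RFC 3986)


@dataclass(eq=False)
class Element:
    """Hypothetical element model holding only what the base URI rules need."""
    entity: Entity  # document or external entity the element appears in
    parent: Optional["Element"] = None
    xml_base: Optional[str] = None  # xml:base value, if the attribute is present


def inherited_base(elem: Element) -> str:
    """Base URI the element would have without its own xml:base."""
    # Rule 2: the parent's base URI, if the parent is in the same entity.
    if elem.parent is not None and elem.parent.entity is elem.entity:
        return base_uri(elem.parent)
    # Rule 3: otherwise, the base URI of the containing entity.
    return elem.entity.uri


def base_uri(elem: Element) -> str:
    # Rule 1: xml:base takes precedence, resolved against the inherited base.
    if elem.xml_base is not None:
        return urljoin(inherited_base(elem), elem.xml_base)
    return inherited_base(elem)


doc = Entity("http://example.org/dir/doc.xml")
ch1 = Entity("http://example.org/dir/chapters/ch1.xml")

root = Element(doc, xml_base="http://example.com/base/")
chapter = Element(ch1, parent=root)  # top-level element of an external entity
figure = Element(ch1, parent=chapter, xml_base="../images/")

print(base_uri(root))     # http://example.com/base/
print(base_uri(chapter))  # http://example.org/dir/chapters/ch1.xml
print(base_uri(figure))   # http://example.org/dir/images/
```

The root element's absolute xml:base does not reach chapter, because chapter is the top-level element of a different external entity; figure's relative xml:base is then resolved against the external entity's own URI.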

4.3 Matching URIs with base URIs

The base URI corresponding to a given relative URI appearing in an XML document is determined as follows:

  • The base URI for a URI reference appearing in text content is the base URI of the element containing the text.

  • The base URI for a URI reference appearing in an xml:base attribute is the base URI of the parent element of the element bearing the xml:base attribute, if one exists within the document entity or external entity, otherwise the base URI of the document entity or external entity containing the element.

  • The base URI for a URI reference appearing in any other attribute value, including default attribute values, is the base URI of the element bearing the attribute.

  • The base URI for a URI reference appearing in the content of a processing instruction is the base URI of the parent element of the processing instruction, if one exists within the document entity or external entity, otherwise the base URI of the document entity or external entity containing the processing instruction.
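The difference between the rule for xml:base attributes and the rule for other attributes is easiest to see when both appear on the same element. The sketch below is illustrative only: because XML Base does not say which attributes or content hold URIs, it assumes, purely for the example, that every attribute and each element's leading text are URI references. Processing instructions are not modelled; their references would use the parent element's base, as text content does.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Assumption for illustration: every attribute and all non-blank text are URIs.
DOC = """<doc xml:base="http://example.org/a/">
  <sec xml:base="b/" href="c.xml">d.xml</sec>
</doc>"""


def resolve_all(elem, inherited, out):
    """Record (element, location, absolute URI) for references at or below elem."""
    # A reference in xml:base is resolved against the inherited (parent's) base.
    if XML_BASE in elem.attrib:
        own = urljoin(inherited, elem.attrib[XML_BASE])
        out.append((elem.tag, "xml:base", own))
    else:
        own = inherited
    # Any other attribute uses the base URI of the element bearing it.
    for name, value in elem.attrib.items():
        if name != XML_BASE:
            out.append((elem.tag, name, urljoin(own, value)))
    # Text content uses the element's base (only leading text; tails ignored).
    if elem.text and elem.text.strip():
        out.append((elem.tag, "text", urljoin(own, elem.text.strip())))
    for child in elem:
        resolve_all(child, own, out)
    return out


rows = resolve_all(ET.fromstring(DOC), "http://example.org/doc.xml", [])
for row in rows:
    print(row)
```

Here xml:base="b/" on sec is resolved against the parent's base, giving http://example.org/a/b/, while href="c.xml" on the same element is resolved against that new base.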

Note:

The presence of xml:base attributes might lead to unexpected results in the case where the attribute value is provided, not directly in the XML document entity, but via a default attribute. For instance, such a declaration in an external entity might not be read by software which is based on a non-validating XML processor. Defaulting attributes through an external mechanism such as XML Schema may also lead to unexpected results; even if a validating processor is used by the application, the addition of defaulted attributes subsequent to creation of the infoset can cause xml:base attributes to get out of sync with the [base URI] infoset property. For these reasons, xml:base values should be provided either directly in the XML document instance or via default attributes declared in the internal subset of the DTD.

4.4 Interpretation of same-document references

RFC 3986 defines certain relative URI references, in particular the empty string and those of the form #fragment, as same-document references. Dereferencing of same-document references is handled specially. However, their use as the value of an xml:base attribute does not involve dereferencing, and XML Base processors should resolve them in the usual way. In particular, xml:base="" does not reset the base URI to that of the containing document.

Note:

Some existing processors do treat these xml:base values as resetting the base URI to that of the containing document, so the use of such values is strongly discouraged.
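The difference between resolving these values and resetting the base URI can be shown with urllib.parse. In this minimal sketch (the function name resolve_xml_base and the URIs are hypothetical), an empty xml:base yields the inherited base URI and "#frag" yields that base plus a fragment; neither yields the URI of the containing document.

```python
from urllib.parse import urldefrag, urljoin


def resolve_xml_base(inherited_base: str, value: str) -> str:
    """Resolve an xml:base value against the base URI it inherits.

    RFC 3986 never carries the base's fragment into the result; urldefrag
    strips it explicitly because urljoin returns the base unchanged for "".
    """
    return urljoin(urldefrag(inherited_base).url, value)


document_uri = "http://example.org/docs/index.xml"
# Base in effect for some element, e.g. after an ancestor's xml:base="/shared/".
inherited = urljoin(document_uri, "/shared/")

print(resolve_xml_base(inherited, ""))       # http://example.org/shared/
print(resolve_xml_base(inherited, "#frag"))  # http://example.org/shared/#frag
```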

5 Conformance

An application conforms to XML Base if it calculates base URIs in accordance with the conditions set forth in this specification.

A References

RFC 2119
RFC 2119: Key words for use in RFCs to Indicate Requirement Levels. Internet Engineering Task Force, 1997.
RFC 2279
RFC 2279: UTF-8, a transformation format of ISO 10646. Internet Engineering Task Force, 1998.
RFC 3986
RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. Internet Engineering Task Force, 2005.
RFC 3987 bis
RFC 2732
RFC 2732: Format for Literal IPv6 Addresses in URL's. Internet Engineering Task Force, 1999.
Unicode
The Unicode Standard. The Unicode Consortium.
XML
Extensible Markup Language (XML) 1.0. Tim Bray et al. World Wide Web Consortium.
XML Names
Namespaces in XML 1.0. Tim Bray et al. World Wide Web Consortium.

B References (Non-Normative)

HTML 4.01
HTML 4.01 Specification. Dave Raggett, Arnaud Le Hors, Ian Jacobs, editors. World Wide Web Consortium, 1999.
XLink
XML Linking Language (XLink). Steve DeRose, Eve Maler, David Orchard, and Ben Trafford, editors. World Wide Web Consortium, 2000.
XML Datatypes
XML Schema Part 2: Datatypes. Paul V. Biron, Ashok Malhotra, editors. World Wide Web Consortium Working Draft.
XHTML
XHTML(TM) 1.0: The Extensible HyperText Markup Language. Steven Pemberton, et al. World Wide Web Consortium, 2000.
XML Infoset
XML Information Set. John Cowan and Richard Tobin, editors. World Wide Web Consortium, 1999.
XPath
XML Path Language (XPath). James Clark and Steven DeRose, editors. World Wide Web Consortium, 1999.
XSLT
XSL Transformations. James Clark, editor. World Wide Web Consortium, 1999.

C Impacts on Other Standards (Non-Normative)

This section has been deleted.

XML Base defines a mechanism for embedding base URI information within an XML document. It does not define a mechanism to recognize which content or attribute values might contain URIs. This is only known by the specifications or applications assigning semantics to the vocabulary.

It is the intention of XML Base that future specifications and revisions of XML vocabularies identify which parts of the XML document are considered to be URIs, and provide normative reference to this specification in order to ensure that relative URIs are treated consistently across XML documents.

The impacts of XML Base on other standards (as of the publication date of this document) are described below.

  • XML 1.0 [XML] uses URI references in the system identifiers for external entities. Since these declarations appear outside of the document element (in an internal subset or external DTD), the scoping rules for xml:base prevent these URIs from being affected by the value of xml:base.

  • The XML Infoset [XML Infoset] defines the base URI property of element information items. The latest Infoset specification supports XML Base for purposes of determining the value of this property. Interfaces, applications, and specifications referencing this infoset property will support XML Base natively.

  • Namespaces in XML [XML Names] uses URI references, which as currently defined should not be resolved relative to the base URI defined by xml:base for the purposes of namespace identification. Higher level processes which dereference namespace URIs are not covered by the namespaces specification and might at their option specify that xml:base is honored for the purposes of fetching resources at those URIs.

  • The XPath [XPath] data model preserves neither base URI information nor the boundaries of external entities, and thus is insufficient to allow relative URI references within these entities to be resolved correctly. This includes relative URI references in xml:base attributes.

  • The XSLT [XSLT] extensions to the XPath data model do provide for base URI information to be retained, but define this information in a way that precludes support for XML Base. Future XSLT versions might want to require support for XML Base.

  • XML Schema Part 2: Datatypes [XML Datatypes] defines a uriReference primitive datatype. The XML Datatypes specification might want to require that applications recognizing this datatype and resolving such URIs be aware of XML Base.

  • The XLink [XLink] specification requires support for XML Base.

  • XHTML [XHTML] uses URI references beyond those expressible in XLink. These URI references might be resolved by an application relative to the base URI defined by XML Base. The XHTML specification might want to describe its level of support for XML Base.

D Changes since the first edition (Non-Normative)

  1. The published errata (see http://www.w3.org/2001/06/xmlbase-errata) have been incorporated;

  2. The definition of URI reference has been switched from RFC 2396 to RFC 3986;

  3. The xml:base attribute has been redescribed as a Legacy Extended IRI, but this does not change its syntax (the December 2006 PER used the term "XML Resource Identifier" which was to be defined in an XLink revision, but that plan has been superseded by the definition of LEIRI in RFC 3987 bis);

  4. Implementations are now encouraged to return base “URIs” without escaping non-URI characters;

  5. The meanings of xml:base="" and xml:base="#frag" have been clarified;

  6. The expected reference to XML Base in the forthcoming XML Media Types RFC (“son of 3023”) has been noted;

  7. It has been clarified that normal validity rules apply to the xml:base attribute;

  8. The out-of-date appendix describing effects on other standards has been removed;

  9. Various minor editorial changes have been made.