This document specifies a processing model and syntax for general purpose inclusion. Inclusion is accomplished by merging a number of XML Infosets into a single composite Infoset. Specification of the XML documents (infosets) to be merged and control over the merging process is expressed in XML-friendly syntax (elements, attributes, URI References).

Status of this document

The XML Core Working Group, with this 2000 October 26 XInclude working draft, invites comment on this specification.

The W3C Membership and other interested parties are invited to review the specification, provide comment, and report early implementation experience. The area of work covered by this specification was outlined in the XML Inclusion Proposal (XInclude), W3C Note of 23 November 1999 [XInclude]. The purpose of publishing this draft is to update the community on our progress in this area and to solicit feedback on the current draft. It should be noted that the WG plans to take this specification to a Last Call review in the near future.

While the WG has decided to publish this working draft, outstanding issues remain as noted in the draft. Based on feedback from the user community, this draft has returned to the element-based syntax of the March 22nd working draft.

Comments on this document should be sent to www-xml-xinclude-comments@w3.org, which is publicly archived. While we welcome implementation experience reports, the XML Core Working Group will not allow early implementation to constrain its ability to make changes to this specification prior to final release.

It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C working drafts can be found at http://www.w3.org/TR/.

3. Processing Model

Inclusion as defined in this document is a specific type of XML Information Set [XML Infoset] transformation.

[Definition: ] The input for the inclusion transformation consists of a source infoset. [Definition: ] The output, called the result infoset, is a new infoset which merges the source infoset with the infosets of resources identified by URI references appearing in include elements. Thus a mechanism to resolve URIs and return the identified resources as infosets is assumed. Well-formed XML entities that do not have defined infosets (e.g. an external entity file with multiple top-level elements) are outside the scope of this specification, either for use as a source infoset or the result infoset.

Inclusion is indicated by the presence of include elements in the source infoset. [Definition: ] An include element is any element matching the syntactic requirements set forth in this specification. [Definition: ] The information items located by the include element's URI reference are called the included items. The result infoset is essentially a copy of the source infoset, with each include element replaced by its corresponding included items.

3.1. The Include Location

The value of the href attribute is interpreted as a URI reference. The set of characters allowed in an href attribute is the same as for XML, namely [Unicode]. However, some Unicode characters are disallowed from URI references, and thus processors must encode and escape these characters to obtain a valid URI reference from the attribute value.

The disallowed characters include all non-ASCII characters, plus the excluded characters listed in Section 2.4 of [IETF RFC 2396], except for the number sign (#) and percent sign (%) characters and the square bracket characters re-allowed in [IETF RFC 2732]. Disallowed characters must be escaped as follows:

Each disallowed character is converted to UTF-8 [IETF RFC 2279] as one or more bytes.
Any octets corresponding to a disallowed character are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).
The original character is replaced by the resulting character sequence.
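
The escaping steps above can be sketched in Python as follows. This is a non-normative illustration: the names escape_href and EXCLUDED_ASCII are hypothetical, and the set of excluded ASCII characters reflects our reading of Section 2.4.3 of [IETF RFC 2396] (control characters, space, delimiters, and "unwise" characters) with #, %, [ and ] left out as described above.

```python
# Excluded ASCII characters per our reading of RFC 2396 section 2.4.3,
# minus '#', '%', '[' and ']', which this specification leaves untouched.
EXCLUDED_ASCII = (
    {chr(c) for c in range(0x00, 0x20)}   # control characters
    | {chr(0x7F), " "}                    # DEL and space
    | set('<>"')                          # delimiters (without '#' and '%')
    | set("{}|\\^`")                      # "unwise" characters (without '[' and ']')
)


def escape_href(href: str) -> str:
    """Return href with every disallowed character replaced by %HH escapes."""
    pieces = []
    for ch in href:
        if ord(ch) > 0x7F or ch in EXCLUDED_ASCII:
            # Convert the character to UTF-8, escape each byte as %HH,
            # and substitute the result for the original character.
            pieces.append("".join(f"%{byte:02X}" for byte in ch.encode("utf-8")))
        else:
            pieces.append(ch)
    return "".join(pieces)
```

Under these assumptions, escape_href("my file.xml") yields "my%20file.xml", while the characters #, %, [ and ] pass through unchanged.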

The base URI for relative URIs is the base URI of the include element as specified in XML Base [XML Base]. [Definition: ] The URI resulting from resolution to absolute URI form is called the include location.
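
As a rough illustration of resolving an escaped href to an include location, the sketch below uses Python's urllib.parse.urljoin. The function name include_location and the example.org URIs are hypothetical, and urljoin follows the standard library's own URI resolution rules, which we assume are close enough to those of [IETF RFC 2396] for typical relative references. Determining the base URI itself (for instance from xml:base attributes) is not shown.

```python
from urllib.parse import urljoin


def include_location(base_uri: str, escaped_href: str) -> str:
    """Resolve an already-escaped href against the include element's base URI."""
    return urljoin(base_uri, escaped_href)


# Hypothetical base URI of the include element.
BASE = "http://example.org/docs/main.xml"

sibling = include_location(BASE, "parts/ch1.xml")
with_fragment = include_location(BASE, "../common.xml#xpointer(a/b)")
```

Here sibling is "http://example.org/docs/parts/ch1.xml" and with_fragment is "http://example.org/common.xml#xpointer(a/b)"; the fragment part is carried through resolution unchanged.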

3.2. Included Items when parse="xml"

When parse="xml", the include location is dereferenced and the resource is fetched. This resource is treated as an XML resource, and is parsed into an information set. [Definition: ] Include elements in this infoset are recursively processed to create the acquired infoset.

Issue (XInclude-70-time-dependent-resources): URIs accessed at different times (say, during an "XInclude run" on two identical include elements in the same document) may produce different results. Do we need to say anything about this?

Resources that are unavailable for any reason (for example, the resource doesn't exist, connection difficulties or security restrictions prevent it from being fetched, the URI scheme isn't a fetchable one, or the XPointer contains a syntax error) result in an error. Resources that contain non-well-formed XML result in an error.

Issue (XInclude-58-invalid-xml): This implies the use of a non-validating parser, or at least makes no provision for surfacing of validation errors. Is this underspecified?

The fragment part of the URI reference is interpreted as an XPointer [XPointer] when parse="xml". The XPointer indicates that a subresource, or part of the acquired resource, is the target for inclusion.

Issue (XInclude-68-mime-xpointer): When are XPointers allowed? When the resource is of type text/xml or application/xml? Or when parse="xml"?

Issue (XInclude-69-non-xpointer-fragments): What is the behavior of fragments for non-XML resources? Do we ignore fragments which aren't XPointers or do we throw an error?

The set of included items is derived from the acquired infoset as follows:

3.2.1. Document Information Items

An include location might identify the document node (for instance, a URI reference without an XPointer, or an XPointer specifically locating the document root). In this case, the set of included items is the [children] of the acquired infoset's document information item, except for the document type declaration information item child, if one exists.

Issue (XInclude-60-top-level-whitespace): The Infoset does not provide for whitespace outside the document element to be preserved. Accordingly, this whitespace will be stripped by XInclude. If this isn't desirable, the Infoset will have to make provision to expose the whitespace.

Issue (XInclude-61-wrap-document): Do we wrap this in a document entity to preserve base URI and charset? What is the relationship between the document information item and the document entity? The minimal infoset doesn't have to have a document entity.

Ed. note: Add example of ignorable and non-ignorable whitespace.

Issue (XInclude-56-doctype): You don't state (as far as I could tell) what should happen to doctype nodes in included documents. [Donald Ball http://lists.w3.org/Archives/Public/www-xml-xinclude-comments/2000Jul/0014.html].

3.2.2. Multiple Nodes

An include location with an XPointer might identify a subresource that consists of more than a single node. In this case the set of included items is the set of information items from the acquired infoset corresponding to the nodes referred to by the XPointer, in the order in which they appear in the acquired infoset.

If the document element in the source infoset is an include element, it is an error to attempt to replace it with more than a single element.

3.2.3. Range Locations

An include location with an XPointer might identify a location set that represents a range or a set of ranges.

Each range corresponds to a set of information items in the acquired infoset. [Definition: ] An information item is said to be selected by a range if it occurs after (in document order) the starting point of the range and before the ending point of the range. [Definition: ] An information item is said to be partially selected by a range if it contains only the starting point of the range, or only the ending point of the range. By definition, a character information item cannot be partially selected.

[Definition: ] An information item is potentially included if it is either selected or partially selected by the range. The [children] property of selected information items is not modified. The [children] property of partially selected information items is the set of information items that are in turn either selected or partially selected, and so on.

The set of included items is the union, in document order with duplicates removed, of the potentially included information items corresponding to each range.
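
The definitions above can be sketched for a single range as follows. This is a non-normative simplification under several assumptions: information items are modeled by a hypothetical Item class carrying integer start and end positions in document order, text is represented by leaf items already split at the range boundaries (rather than by individual character information items), and the union over several ranges is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    name: str
    start: int                      # position where the item begins
    end: int                        # position where the item ends
    children: List["Item"] = field(default_factory=list)


def is_selected(item: Item, r_start: int, r_end: int) -> bool:
    # The whole item lies between the two points of the range.
    return r_start <= item.start and item.end <= r_end


def is_partially_selected(item: Item, r_start: int, r_end: int) -> bool:
    # The item contains exactly one of the two points of the range.
    contains_start = item.start < r_start < item.end
    contains_end = item.start < r_end < item.end
    return contains_start != contains_end


def prune(item: Item, r_start: int, r_end: int) -> Optional[Item]:
    """Return the potentially included form of item, or None."""
    if is_selected(item, r_start, r_end):
        return item                 # [children] is not modified
    if is_partially_selected(item, r_start, r_end):
        kept = []
        for child in item.children:
            pruned = prune(child, r_start, r_end)
            if pruned is not None:
                kept.append(pruned)
        return Item(item.name, item.start, item.end, kept)
    return None


def included_items(root: Item, r_start: int, r_end: int) -> List[Item]:
    """Collect the outermost potentially included items in document order."""
    result: List[Item] = []

    def walk(item: Item) -> None:
        pruned = prune(item, r_start, r_end)
        if pruned is not None:
            result.append(pruned)
        else:
            for child in item.children:
                walk(child)

    walk(root)
    return result
```

An element that contains both points of the range is neither selected nor partially selected, so the walk descends into it; this is why, in the range example of Appendix C.2, the two p elements are included but the enclosing chapter element is not.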

3.2.4. Element, Comment, and Processing Instruction Information Items

An include location that contains an XPointer might identify an element node, a comment node, or a processing instruction node, respectively representing an element information item, a comment information item, or a processing instruction information item. In this case the set of included items consists of the information item corresponding to the element, comment, or processing instruction node in the acquired infoset.

3.2.5. Attribute and Namespace Declaration Information Items

An include location that contains an XPointer might identify an attribute node or a namespace node. Identifying such a node is an error.

3.2.6. Encodings

Note that the character encodings of the including and included resources can be different. This does not affect the resulting infoset, but may need to be taken into account during any subsequent serialization.

Issue (XInclude-67-determining-encoding): When a document is fetched using HTTP, it may have an encoding value in the HTTP header. When a document that is fetched by that or any other means is an XML document, it may (but need not) contain an <?xml?> declaration specifying an encoding. But if a document is fetched by nfs:, afs:, file:, ftp:, and does not contain an <?xml ... encoding='...'?> declaration or is to be included as text, what encoding does it use?
There is a clear need for xinclude:encoding. The value of this attribute is an EncName as defined in the XML 1.0 specification, section 4.3.3, rule [81], specifying how the resource is to be translated. [http://lists.w3.org/Archives/Public/www-xml-xinclude-comments/2000Jul/0021.html]

3.2.7. Inclusion Loops

When recursively processing an include element, it is an error to process an include element with an include location that has already been processed in the inclusion chain.

Issue (XInclude-59-include-location-simplification): This reuses the definition of 'include location', which is both absolutized and canonicalized. We previously decided only absolutization was necessary. Is character escaping harmful in this case? It simplifies the spec...

In other words, the following are all legal:

An include element with a parse="text" attribute may reference the document containing the include element.
An include element may identify a different part of the same local resource.
Two non-nested include elements may identify a resource which itself contains an include element.

The following are illegal:

An include element with a parse="xml" (or no specified parse value) pointing to itself or any ancestor thereof.
An include element pointing to any include element or ancestor thereof which has already been processed at a higher level.
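
One possible way to detect inclusion loops is to carry the chain of include locations along during recursive processing, as in the non-normative sketch below. The names InclusionLoopError, process_includes and includes_in are hypothetical. The mapping includes_in stands in for fetching and parsing: for each location it lists the include locations of the parse="xml" include elements found in that resource. Treating the URI of the source document as the first entry in the chain is a simplifying assumption.

```python
class InclusionLoopError(Exception):
    """Raised when an include location reappears in its own inclusion chain."""


def process_includes(location, includes_in, chain=()):
    """Return the locations processed, in order, or raise on a loop."""
    if location in chain:
        raise InclusionLoopError(" -> ".join(chain + (location,)))
    visited = [location]
    for target in includes_in.get(location, []):
        # Each branch carries its own chain, so two non-nested include
        # elements may identify the same resource without error.
        visited.extend(process_includes(target, includes_in, chain + (location,)))
    return visited
```

Because the chain holds complete include locations, two references to different parts of the same resource are distinct entries under this sketch, and parse="text" inclusions never enter the chain because they are not processed recursively.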

3.3. Included Items when parse="text"

When parse="text", the include location is dereferenced and the resource is fetched. This resource is treated as a plain text resource.

[Definition: ] A range of characters (the selected range) may be identified by a fragment identifier. The fragment identifier is interpreted using the fragment identifier syntax defined for the media type text/plain. In the absence of a fragment identifier, the selected range contains all the characters in the document.

NOTE: There is currently no standard defining fragment identifiers for the media type text/plain. So it is currently an error to specify a fragment identifier when parse="text".

The set of characters in the selected range is converted to a set of included items as follows:

An entity start marker information item. The [entity] property is set to an entity declaration information item with the following property values:
- [entity type] is "external general entity".
  
  Issue (XInclude-66-entity-type): This seems a bit untrue. Should we define a new type of entity instead of reusing an old one?
- [name] is null.
- [system identifier] is the include location for the include.
- [public identifier] is null.
- [base URI] is the include location for the include.
- [notation] is null.
- [content] is null.
- [charset] The name of the character encoding in which the entity is expressed.
  
  Issue (XInclude-62-text-encoding): How is the encoding determined for text? We don't want to look inside the text for a format-specific indication. Is it adequate to state "This property is derived from a MIME header"?
For each character in the selected range, a character information item is created. The character code is set to the character code representing the character in ISO 10646 encoding. The [element content whitespace] flag is set to false.
An entity end marker information item. The [entity] property is set to the [entity] property of the corresponding entity start marker.
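
The conversion above can be sketched as follows. This is a non-normative illustration in which information items are modeled as plain Python dictionaries; the function name text_included_items and the "kind" key are hypothetical, and the charset value is simply passed in by the caller, since how it is determined is an open issue.

```python
def text_included_items(text: str, include_location: str, charset: str) -> list:
    """Build the included items for a parse="text" inclusion."""
    entity = {
        "entity type": "external general entity",
        "name": None,
        "system identifier": include_location,
        "public identifier": None,
        "base URI": include_location,
        "notation": None,
        "content": None,
        "charset": charset,
    }
    items = [{"kind": "entity start marker", "entity": entity}]
    for ch in text:
        items.append({
            "kind": "character",
            "character code": ord(ch),      # ISO 10646 code point
            "element content whitespace": False,
        })
    # The end marker shares the [entity] property of the start marker.
    items.append({"kind": "entity end marker", "entity": entity})
    return items
```

For a selected range of n characters, this sketch produces n + 2 items: the start marker, one character information item per character, and the end marker.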

Resources that resolve to something other than text when parse="text" result in an error.

Issue (XInclude-45-fail-text): It is easy to see how to fail a non-xml resource - it's not well-formed. Is there a similarly well-defined mechanism for determining the success of a parse="text" inclusion? Or do we need to rely on the media type text/*? (We intentionally don't rely on text/xml, as we want to enable things like image/svg.)

3.4. Creating the Result Infoset

The result infoset is a copy of the source infoset, with each include element replaced as follows:

The information item for the include element is found. [Definition: ] The [parent] property of this item refers to an information item called the include parent. The [children] property of the include parent is modified by removing the include element information item from it. In place of the include element, the following information items are inserted in order:

An Include Start Marker information item. The [parent] property of this item is set to the include parent. The [include element] property is set to the information item representing the include element.
The included items. The [parent] property of each included item is set to the include parent.
An Include End Marker information item. The [parent] property of this item is set to the include parent. The [include element] property is set to the information item representing the include element.
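
The replacement step can be sketched as below, again modeling information items as dictionaries. The function name replace_include and the "kind" key are hypothetical. Note one deliberate simplification: the sketch modifies the include parent in place, whereas the result infoset is defined as a copy of the source infoset.

```python
def replace_include(include_parent: dict, include: dict, included_items: list) -> None:
    """Replace include in its parent's [children] with markers and items."""
    children = include_parent["children"]
    # Locate the include element by identity, not by equality.
    index = next(i for i, child in enumerate(children) if child is include)

    start = {"kind": "include start marker",
             "parent": include_parent, "include element": include}
    end = {"kind": "include end marker",
           "parent": include_parent, "include element": include}
    for item in included_items:
        item["parent"] = include_parent

    # Remove the include element and insert the three parts in order.
    children[index:index + 1] = [start, *included_items, end]
```

After the call, the include element no longer appears in the [children] of the include parent but remains reachable through the [include element] property of both markers.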

Intra-document references within include elements must be resolved against the source infoset. The effect of this is that the order in which include elements are processed does not affect the result.

In the following example, the second include always points to the first <xinclude:include> element and not to itself, regardless of the order in which the includes are processed.

<x xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <xinclude:include href="something.xml"/>
  <xinclude:include href="#xpointer(x/xinclude:include[1])"
             parse="text"/>
</x>

Issue (XInclude-57-sax): In section 3.1 you state that internal xpointer references must be resolved against the original source document. That's not so hard to do in DOM (though expensive if you do it merely by cloning the original document) but I think it's going to be quite tricky to do it in SAX. [Donald Ball http://lists.w3.org/Archives/Public/www-xml-xinclude-comments/2000Jul/0014.html].

Issue (XInclude-36-infoset-entities): The infoset exposes entity information items http://www.w3.org/TR/xml-infoset#infoitem.entity. XInclude does not define whether entity information items are copied via the infoset or not.

Issue (XInclude-55-entity-fixup): But it occurs to me that if entity start/ends are preserved by xinclusion, then dummy entity start/end items should be inserted around the included nodes to ensure that they are balanced (in the same way that unbalanced element structure gets fixed up). [Richard Tobin: http://lists.w3.org/Archives/Member/w3c-xml-core-wg/2000JulSep/0017.html (W3C Members only)]

3.4.1. Properties Preserved by the Infoset

As an infoset transformation, XInclude operates on the logical structure of XML documents, not on their text serialization. All properties of an information item other than those specifically modified by this proposal are preserved during inclusion.

3.4.1.1. Namespace Declarations

A source infoset might contain namespace declaration information items. The namespace URI property is considered to be part of the element information item, and merging infosets preserves the namespace of the item. The outcome can therefore differ from a simple cut and paste of XML text source. A serialized result infoset might contain additional namespace declarations when including a sub-resource.

For example, the following document:

<foo xmlns:x="uri1">
 <xinclude:include xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"
            href="common.xml#xpointer(a/b)"/>
</foo>

including a node from common.xml:

<a xmlns:x="uri2">
  <b>
    <x:a/>
  </b>
</a>

results in a document that could be serialized as:

<foo xmlns:x="uri1">
  <b xmlns:x="uri2">
    <x:a/>
  </b>
</foo>

This differs from a text-level copy and paste in that it retains the integrity of the items from the uri2 namespace. A straight copy and paste could result in either the remapping of element names to an unintended namespace, or a document that is not well-formed with respect to namespaces.

Serialization, and specifically where additional namespace declarations might appear, is not constrained by this specification.
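
The difference between an infoset-level merge and a text-level paste can be observed with Python's xml.etree.ElementTree, which stores each element name together with its namespace URI. The sketch below is non-normative: it does not implement XInclude, the element include-here is a hypothetical stand-in for the include element, and the XPointer is replaced by a direct lookup of the b element.

```python
import xml.etree.ElementTree as ET

HOST = '<foo xmlns:x="uri1"><include-here/></foo>'
COMMON = '<a xmlns:x="uri2"><b><x:a/></b></a>'


def merge_as_infoset() -> ET.Element:
    """Replace the placeholder with the parsed b element from COMMON."""
    host = ET.fromstring(HOST)
    b = ET.fromstring(COMMON).find("b")
    host[0] = b                     # names keep the namespace they were parsed with
    return host


def merge_as_text() -> ET.Element:
    """Paste the serialized b element into the host text, then parse."""
    pasted = HOST.replace("<include-here/>", "<b><x:a/></b>")
    return ET.fromstring(pasted)


infoset_tag = merge_as_infoset().find("b")[0].tag
text_tag = merge_as_text().find("b")[0].tag
```

In this sketch infoset_tag is "{uri2}a", whereas text_tag is "{uri1}a": the text-level paste silently remaps the element to the namespace bound to the x prefix in the including document.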

Issue (XInclude-52-infoset-properties): We specifically say that the namespace name property of an element is preserved when the infosets are merged. What about the in-scope namespaces property? This seems to be needed so that qnames in the included nodes can be resolved. What about the "declared namespaces" property? More generally, should there be a list of infoset properties that must be preserved or deleted?

Issue (XInclude-63-accidental-scoping): If we preserve the in-scope namespaces property, we may encounter the situation where an element has fewer in-scope namespaces than its parent. There is no syntax for "undeclaring" namespaces. If a result infoset is serialized and then reparsed, it will not be identical to the original result infoset. On the other hand, it is unlikely (impossible) that any of the extra in-scope namespaces will actually be referred to within the included context. Are there any situations where this information is harmful?

3.4.1.2. Base URI

The base URI property of the acquired infoset is not changed by merging. Thus relative URI references in the included infoset resolve to the same URI despite being included into a document with a potentially different base URI in effect. A serialized result infoset may need to add xml:base attributes to indicate this fact.

3.5. Infoset Extensions

3.5.1. Include Start Marker

[Definition: ] An include start marker information item marks the start of a set of information items resulting from an inclusion.

An include start marker information item has the following properties:

[include element]: The element information item representing the include element causing the inclusion.
[parent]: The element information item which contains this information item in its [children] property.

3.5.2. Include End Marker

[Definition: ] An include end marker information item marks the end of a set of information items resulting from an inclusion.

An include end marker information item has the same properties as an include start marker. The values of these properties are the same as those of the corresponding include start marker.

Appendices

A. References

IETF RFC 2119: RFC 2119: Key words for use in RFCs to Indicate Requirement Levels. Internet Engineering Task Force, 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
IETF RFC 2279: RFC 2279: UTF-8, a transformation format of ISO 10646. Internet Engineering Task Force, 1998. (See http://www.ietf.org/rfc/rfc2279.txt.)
IETF RFC 2396: RFC 2396: Uniform Resource Identifiers. Internet Engineering Task Force, 1998. (See http://www.ietf.org/rfc/rfc2396.txt.)
IETF RFC 2732: RFC 2732: Format for Literal IPv6 Addresses in URL's. Internet Engineering Task Force, 1999. (See http://www.ietf.org/rfc/rfc2732.txt.)
Unicode: The Unicode Consortium. The Unicode Standard. (See http://www.unicode.org/unicode/standard/standard.html.)
XML: Tim Bray, Jean Paoli, and C.M. Sperberg-McQueen, editors. Extensible Markup Language (XML) 1.0. World Wide Web Consortium, 1998. (See http://www.w3.org/TR/REC-xml.)
XML Base: Jonathan Marsh, editor. XML Base. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/xmlbase.)
XML Infoset: John Cowan and David Megginson, editors. XML Information Set. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/xml-infoset.)
XML Names: Tim Bray, Dave Hollander, and Andrew Layman, editors. Namespaces in XML. Textuality, Hewlett-Packard, and Microsoft. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/REC-xml-names/.)
XPointer: Steve DeRose, Ron Daniel, Eve Maler, editors. XML Pointer Language (XPointer). World Wide Web Consortium, 1999. (See http://www.w3.org/TR/xptr.)

B. References (Non-Normative)

XInclude: Jonathan Marsh, David Orchard, editors. XML Inclusion Proposal (XInclude). World Wide Web Consortium, 1999. (See http://www.w3.org/TR/1999/NOTE-xinclude-19991123.)
XLink: Steve DeRose, Eve Maler, David Orchard, and Ben Trafford, editors. XML Linking Language (XLink). World Wide Web Consortium, 2000. (See http://www.w3.org/TR/xlink/.)

C. Examples (Non-Normative)

C.1. Basic Inclusion Example

The following XML document contains an include element which points to an external document.

<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>120 MHz is adequate for an average home user.</p>
  <xinclude:include href="disclaimer.xml"/>
</document>

disclaimer.xml contains:

<?xml version='1.0'?>
<disclaimer>
  <p>The opinions represented herein represent those of the individual
  and should not be interpreted as official policy endorsed by this
  organization.</p>
</disclaimer>

The infoset resulting from resolving inclusions on this document could be serialized as:

<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>120 MHz is adequate for an average home user.</p>
  <disclaimer>
  <p>The opinions represented herein represent those of the individual
  and should not be interpreted as official policy endorsed by this
  organization.</p>
</disclaimer>
</document>
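
For readers who want to experiment, the basic case above can be approximated with a few lines of Python. This non-normative sketch handles only whole-document inclusion with parse="xml": it ignores XPointers, base URI resolution, loop detection, and the case where the document element is itself an include element. The names resolve_includes and load are hypothetical, and the in-memory dictionary DOCS (with abbreviated content) stands in for fetching resources by URI.

```python
import xml.etree.ElementTree as ET

XINCLUDE_NS = "http://www.w3.org/1999/XML/xinclude"
INCLUDE_TAG = "{" + XINCLUDE_NS + "}include"

# Hypothetical in-memory resources, keyed by href.
DOCS = {
    "disclaimer.xml": "<disclaimer><p>The opinions ...</p></disclaimer>",
}

MAIN = (
    '<document xmlns:xinclude="' + XINCLUDE_NS + '">'
    "<p>Some text.</p>"
    '<xinclude:include href="disclaimer.xml"/>'
    "</document>"
)


def resolve_includes(elem: ET.Element, load) -> None:
    """Replace each include child with the document element it references."""
    for index, child in enumerate(list(elem)):
        if child.tag == INCLUDE_TAG:
            included = ET.fromstring(load(child.get("href")))
            resolve_includes(included, load)    # process nested includes first
            included.tail = child.tail          # keep following text in place
            elem[index] = included
        else:
            resolve_includes(child, load)


root = ET.fromstring(MAIN)
resolve_includes(root, DOCS.__getitem__)
child_names = [child.tag for child in root]
```

After processing, child_names is ["p", "disclaimer"], mirroring the serialized result shown above.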

C.2. Range Inclusion Example

The following illustrates the results of including a range specified by an XPointer.

<?xml version='1.0'?>
<document>
  <p>The relevant excerpt is:</p>
  <quotation>
    <xinclude:include xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"
       href="source.xml#xpointer(string-range(chapter/p[1],'Sentence 2') to 
                                 string-range(chapter/p[2]/i,'3.',0,11))"/>
  </quotation>
</document>

source.xml contains:

<chapter>
  <p>Sentence 1.  Sentence 2.</p>
  <p><i>Sentence 3.  Sentence 4.</i>  Sentence 5.</p>
</chapter>

The infoset resulting from resolving inclusions on this document could be serialized as:

<?xml version='1.0'?>
<document>
  <p>The relevant excerpt is:</p>
  <quotation>
    <p>Sentence 2.</p>
  <p><i>Sentence 3.</i></p>
  </quotation>
</document>

C.3. Textual Inclusion Example

The following XML document includes a "working example" into a document.

<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>The following is the source of the "data.xml" file:</p>
  <example><xinclude:include href="data.xml" parse="text"/></example>
</document>

data.xml contains:

<?xml version='1.0'?>
<data>
  <item><![CDATA[Brooks & Sheilds]]></item>
</data>

The infoset resulting from resolving inclusions on this document could be serialized as:

<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>The following is the source of the "data.xml" file:</p>
  <example>&lt;?xml version='1.0'?&gt;
&lt;data&gt;
  &lt;item&gt;&lt;![CDATA[Brooks &amp; Sheilds]]&gt;&lt;/item&gt;
&lt;/data&gt;</example>
</document>
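
The escaping visible in the serialized result follows from the included characters being plain character information items: when the result infoset is written out as XML, markup-significant characters have to be escaped. A minimal sketch using Python's xml.sax.saxutils.escape is shown below; the function name serialize_text_inclusion is hypothetical.

```python
from xml.sax.saxutils import escape


def serialize_text_inclusion(text: str) -> str:
    """Escape &, < and > so the included text survives serialization as XML."""
    return escape(text)


line = serialize_text_inclusion("<item><![CDATA[Brooks & Sheilds]]></item>")
```

Here line is "&lt;item&gt;&lt;![CDATA[Brooks &amp; Sheilds]]&gt;&lt;/item&gt;", matching the corresponding line of the serialized example.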

D. Open Issues List (Non-Normative)

A tabulation of open issues flagged above follows:

XML Inclusions (XInclude) Version 1.0

W3C Working Draft 26 October 2000

Table of Contents

1. Introduction

1.1. Relationship to XLink

1.2. Relationship to XML External Entities

1.3. Relationship to DTDs

1.4. Relationship to XML Schemas

1.5. Relationship to Grammar-Specific Inclusions

2. Terminology

3. Processing Model

3.1. The Include Location

3.2. Included Items when parse="xml"

3.2.1. Document Information Items

3.2.2. Multiple Nodes

3.2.3. Range Locations

3.2.4. Element, Comment, and Processing Instruction Information Items

3.2.5. Attribute and Namespace Declaration Information Items

3.2.6. Encodings

3.2.7. Inclusion Loops

3.3. Included Items when parse="text"

3.4. Creating the Result Infoset

3.4.1. Properties Preserved by the Infoset

3.4.1.1. Namespace Declarations

3.4.1.2. Base URI

3.5. Infoset Extensions

3.5.1. Include Start Marker

3.5.2. Include End Marker

4. Syntax

5. Conformance

5.1. Markup Conformance

5.2. Application Conformance

Appendices

A. References

B. References (Non-Normative)

C. Examples (Non-Normative)

C.1. Basic Inclusion Example

C.2. Range Inclusion Example

C.3. Textual Inclusion Example

D. Open Issues List (Non-Normative)