Copyright © 2000 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document specifies a processing model and syntax for general purpose inclusion. Inclusion is accomplished by merging a number of XML Infosets into a single composite Infoset. Specification of the XML documents (infosets) to be merged and control over the merging process is expressed in XML-friendly syntax (elements, attributes, URI References).
The XML Core Working Group, with this 2000 October 26 XInclude working draft, invites comment on this specification.
The W3C Membership and other interested parties are invited to review the specification, provide comment, and report early implementation experience. The area of work covered by this specification was outlined in the XML Inclusion Proposal (XInclude), W3C Note of 23 November 1999 [XInclude]. The purpose of publishing this draft is to update the community on our progress in this area and to solicit feedback on the current draft. It should be noted that the WG plans to take this specification to a Last Call review in the near future.
While the WG has decided to publish this working draft, outstanding issues remain as noted in the draft. Based on feedback from the user community, this draft has returned to the element-based syntax of the March 22nd working draft.
Comments on this document should be sent to www-xml-xinclude-comments@w3.org, which is publicly archived. While we welcome implementation experience reports, the XML Core Working Group will not allow early implementation to constrain its ability to make changes to this specification prior to final release.
It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C working drafts can be found at http://www.w3.org/TR/.
Many programming languages provide an inclusion mechanism to facilitate modularity. Markup languages also often have need of such a mechanism. This proposal introduces a generic mechanism for merging XML documents (as represented by their information sets) for use by applications that need such a facility. The syntax leverages existing XML constructs - elements, attributes, and URI references.
The requirements used to guide the development of XInclude may be found in the XML Inclusion Proposal W3C Note of 23 November 1999 [XInclude].
XInclude differs from the linking features described in the XML Linking Language [XLink], specifically links with the attribute value show="embed". Such links provide a media-type independent syntax for indicating that a resource is to be embedded graphically within the display of the document. XLink does not specify a specific processing model, but simply facilitates the detection of links and recognition of associated metadata by a higher level application.
XInclude, on the other hand, specifies a media-type specific (XML into XML) transformation. It defines a specific processing model for merging information sets. XInclude processing occurs at a low level, often by a generic XInclude processor which makes the resulting information set available to higher level applications.
Simple node inclusion as described in this specification differs from transclusion, which preserves contextual information such as style.
There are a number of differences between XInclude and XML external entities [XML] which make them complementary technologies.
Processing of external entities (as with the rest of DTDs) occurs at parse time. XInclude operates on information sets and thus is orthogonal to parsing.
Declaration of external entities requires a DTD or internal subset. This places a set of dependencies on inclusion, for instance, the syntax for the DOCTYPE declaration requires that the document element be named - clearly orthogonal to inclusion in many cases. Validating parsers must have a complete content model defined. XInclude is orthogonal to validation and the name of the document element.
External entities provide a level of indirection - the external entity must be declared and named, and separately invoked. XInclude uses direct references. Applications which generate XML output incrementally can benefit from not having to pre-declare inclusions.
The syntax for an internal subset is cumbersome to many authors of simple well-formed XML documents. XInclude syntax is based on familiar XML constructs.
XInclude defines no relationship to DTD validation. XInclude describes an infoset-to-infoset transformation and not a change in XML 1.0 parsing behavior. XInclude does not define a mechanism for DTD validation of the resulting infoset.
XInclude defines no relationship to the augmented infosets produced by applying an XML Schema. Such an augmented infoset can be supplied as the input infoset, or such augmentation may be applied to the infoset resulting from the inclusion.
Special-purpose inclusion mechanisms have been introduced into specific XML grammars. XInclude provides a generic mechanism for recognizing and processing inclusions, and as such can offer a simpler overall authoring experience, greater performance, and less code redundancy.
[Definition: ] The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [IETF RFC 2119].
Inclusion as defined in this document is a specific type of XML Information Set [XML Infoset] transformation.
[Definition: ] The input for the inclusion transformation consists of a source infoset. [Definition: ] The output, called the result infoset, is a new infoset which merges the source infoset with the infosets of resources identified by URI references appearing in include elements. Thus a mechanism to resolve URIs and return the identified resources as infosets is assumed. Well-formed XML entities that do not have defined infosets (e.g. an external entity file with multiple top-level elements) are outside the scope of this specification, either for use as a source infoset or the result infoset.
Inclusion is indicated by the presence of include elements in the source infoset. [Definition: ] An include element is any element matching the syntactic requirements set forth in this specification. [Definition: ] The information items located by the include element's URI reference are called the included items. The result infoset is essentially a copy of the source infoset, with each include element replaced by its corresponding included items.
The value of the href attribute is interpreted as a URI reference. The set of characters allowed in an href attribute is the same as for XML, namely [Unicode]. However, some Unicode characters are disallowed from URI references, and thus processors must encode and escape these characters to obtain a valid URI reference from the attribute value.
The disallowed characters include all non-ASCII characters, plus the excluded characters listed in Section 2.4 of [IETF RFC 2396], except for the number sign (#) and percent sign (%) characters and the square bracket characters re-allowed in [IETF RFC 2732]. Disallowed characters must be escaped as follows:
Each disallowed character is converted to UTF-8 [IETF RFC 2279] as one or more bytes.
Any octets corresponding to a disallowed character are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).
The original character is replaced by the resulting character sequence.
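The three escaping steps above can be sketched in Python as follows. This is a minimal illustration, not a normative algorithm: the names EXCLUDED_ASCII, is_disallowed and escape_href are hypothetical, and the excluded ASCII set is this sketch's reading of Section 2.4 of [IETF RFC 2396] with '#', '%', '[' and ']' left untouched.

```python
# ASCII characters excluded by RFC 2396 section 2.4.3 (space, delimiters and
# "unwise" characters), minus '#', '%', '[' and ']', which remain allowed here.
EXCLUDED_ASCII = set(' <>"{}|\\^`')


def is_disallowed(ch: str) -> bool:
    """Return True for control characters, excluded ASCII and all non-ASCII."""
    code = ord(ch)
    return code < 0x20 or code >= 0x7F or ch in EXCLUDED_ASCII


def escape_href(value: str) -> str:
    """Escape the disallowed characters of an href attribute value."""
    pieces = []
    for ch in value:
        if is_disallowed(ch):
            # Convert to UTF-8 bytes, write each byte as %HH, then substitute
            # the resulting sequence for the original character.
            pieces.append("".join("%{:02X}".format(b) for b in ch.encode("utf-8")))
        else:
            pieces.append(ch)
    return "".join(pieces)
```

Existing percent signs are passed through unchanged, so a value that is already escaped is not escaped a second time.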
The base URI for relative URIs is the base URI of the include element as specified in XML Base [XML Base]. [Definition: ] The URI resulting from resolution to absolute URI form is called the include location.
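As a rough illustration of how an include location is obtained, the following Python sketch resolves an href against a base URI with the standard library. The function name include_location and the example.org URIs are assumptions made for this example; note also that Python's urljoin follows later URI resolution rules than [IETF RFC 2396], so edge cases may differ.

```python
from urllib.parse import urljoin


def include_location(base_uri: str, href: str) -> str:
    """Resolve an (already escaped) href against the include element's base URI."""
    return urljoin(base_uri, href)


# Hypothetical base URI of the including document.
base = "http://example.org/docs/main.xml"
location = include_location(base, "parts/disclaimer.xml")
```

Here the base URI is passed in by the caller; a real processor would compute it from the include element according to [XML Base].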
When parse="xml", the include location is dereferenced and the resource is fetched. This resource is treated as an XML resource, and is parsed into an information set. [Definition: ] Include elements in this infoset are recursively processed to create the acquired infoset.
Issue (XInclude-70-time-dependent-resources): URIs accessed at different times (say, during an "XInclude run" on two identical include elements in the same document) may produce different results. Do we need to say anything about this?
Resources that are unavailable for any reason result in an error. Examples include a resource that doesn't exist, connection difficulties or security restrictions that prevent it from being fetched, a URI scheme that isn't fetchable, and a syntax error in an XPointer. Resources that contain non-well-formed XML also result in an error.
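One way to surface both error conditions is sketched below in Python. The names XIncludeError and acquire_xml, and the caller-supplied fetch function, are hypothetical; the draft does not say how errors are reported, and this sketch simply raises a single exception type for an unavailable resource and for a resource that is not well-formed.

```python
import xml.etree.ElementTree as ET


class XIncludeError(Exception):
    """Raised when an included resource cannot be acquired (hypothetical name)."""


def acquire_xml(location, fetch):
    """Fetch and parse a resource; `fetch` is a caller-supplied callable."""
    try:
        data = fetch(location)
    except OSError as exc:
        # Missing resource, connection problem, permission problem, and so on.
        raise XIncludeError("cannot fetch %s: %s" % (location, exc)) from exc
    try:
        return ET.fromstring(data)
    except ET.ParseError as exc:
        # The resource was fetched but is not well-formed XML.
        raise XIncludeError("%s is not well-formed: %s" % (location, exc)) from exc
```

XPointer syntax errors are not covered here, since the standard library has no XPointer parser.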
Issue (XInclude-58-invalid-xml): This implies the use of a non-validating parser, or at least makes no provision for surfacing of validation errors. Is this underspecified?
The fragment part of the URI reference is interpreted as an XPointer [XPointer] when parse="xml". The XPointer indicates that a subresource, or part of the acquired resource, is the target for inclusion.
Issue (XInclude-68-mime-xpointer): When are XPointers allowed? When the resource is of type text/xml or application/xml? Or when parse="xml"?
Issue (XInclude-69-non-xpointer-fragments): What is the behavior of fragments for non-XML resources? Do we ignore fragments which aren't XPointers or do we throw an error?
The set of included items is derived from the acquired infoset as follows:
An include location might identify the document node (for instance, a URI reference without an XPointer, or an XPointer specifically locating the document root). In this case, the set of included items is the [children] of the acquired infoset's document information item, except for the document type declaration information item child, if one exists.
Issue (XInclude-60-top-level-whitespace): The Infoset does not provide for whitespace outside the document element to be preserved. Accordingly, this whitespace will be stripped by XInclude. If this isn't desirable, the Infoset will have to make provision to expose the whitespace.
Issue (XInclude-61-wrap-document): Do we wrap this in a document entity to preserve base URI and charset? What is the relationship between the document information item and the document entity? The minimal infoset doesn't have to have a document entity.
Ed. note: Add example of ignorable and non-ignorable whitespace.
Issue (XInclude-56-doctype): You don't state (as far as I could tell) what should happen to doctype nodes in included documents. [Donald Ball http://lists.w3.org/Archives/Public/www-xml-xinclude-comments/2000Jul/0014.html].
An include location with an XPointer might identify a subresource that consists of more than a single node. In this case the set of included items is the set of information items from the acquired infoset corresponding to the nodes referred to by the XPointer, in the order in which they appear in the acquired infoset.
If the document element in the source infoset is an include element, it is an error to attempt to replace it with more than a single element.
An include location with an XPointer might identify a location set that represents a range or a set of ranges.
Each range corresponds to a set of information items in the acquired infoset. [Definition: ] An information item is said to be selected by a range if it occurs after (in document order) the starting point of the range and before the ending point of the range. [Definition: ] An information item is said to be partially selected by a range if it contains only the starting point of the range, or only the ending point of the range. By definition, a character information item cannot be partially selected.
[Definition: ] An information item is potentially included if it is either selected or partially selected by the range. The [children] property of selected information items is not modified. The [children] property of partially selected information items is the set of information items that are in turn either selected or partially selected, and so on.
The set of included items is the union, in document order with duplicates removed, of the potentially included information items corresponding to each range.
An include location that contains an XPointer might identify an element node, a comment node, or a processing instruction node, respectively representing an element information item, a comment information item, or a processing instruction information item. In this case the set of included items consists of the information item corresponding to the element, comment, or processing instruction node in the acquired infoset.
An include location that contains an XPointer might identify an attribute node or a namespace node. Identifying such a node is an error.
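For the range case described above, the selected and partially selected definitions can be illustrated with a small Python sketch. It rests on a simplifying assumption that is not part of this specification: every information item is modelled as a span of character offsets, and the range is given by two offsets. The names Item, classify and potentially_included, and the offsets used, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    start: int  # offset of the first character covered by the item
    end: int    # offset just past the last character covered by the item


def classify(item, range_start, range_end):
    """Classify an item against a range under the offset model."""
    if range_start <= item.start and item.end <= range_end:
        return "selected"
    contains_start = item.start < range_start < item.end
    contains_end = item.start < range_end < item.end
    # Partially selected: contains exactly one of the two range end points.
    if contains_start != contains_end:
        return "partially selected"
    return "excluded"


def potentially_included(items, range_start, range_end):
    """Names of the items that are selected or partially selected."""
    return [i.name for i in items if classify(i, range_start, range_end) != "excluded"]


# A chapter with two paragraphs; the range starts inside the first paragraph
# and ends inside an inline element of the second one.
items = [Item("chapter", 0, 58), Item("p[1]", 0, 23),
         Item("p[2]", 23, 58), Item("i", 23, 46)]
kept = potentially_included(items, 12, 34)
```

The chapter item contains both end points of the range, so it is neither selected nor partially selected, while both paragraphs and the inline element are partially selected.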
Note that the character encodings of the including and included resources can be different. This does not affect the resulting infoset, but may need to be taken into account during any subsequent serialization.
Issue (XInclude-67-determining-encoding): When a document is fetched using HTTP, it may have an encoding value in the HTTP header. When a document that is fetched by that or any other means is an XML document, it may (but need not) contain an <?xml?> declaration specifying an encoding. But if a document is fetched by nfs:, afs:, file:, ftp:, and does not contain an <?xml ... encoding='...'?> declaration or is to be included as text, what encoding does it use? There is a clear need for xinclude:encoding. The value of this attribute is an EncName as defined in the XML 1.0 specification, section 4.3.3, rule [81], specifying how the resource is to be translated. [http://lists.w3.org/Archives/Public/www-xml-xinclude-comments/2000Jul/0021.html]
When recursively processing an include element, it is an error to process an include element with an include location that has already been processed in the inclusion chain.
Issue (XInclude-59-include-location-simplification): This reuses the definition of 'include location', which is both absolutized and canonicalized. We previously decided only absolutization was necessary. Is character escaping harmful in this case? It simplifies the spec...
In other words, the following are all legal:
An include element with a parse="text" attribute may reference the document containing the include element.
An include element may identify a different part of the same local resource.
Two non-nested include elements may identify a resource which itself contains an include element.
The following are illegal:
An include element with a parse="xml" (or no specified parse value) pointing to itself or any ancestor thereof.
An include element pointing to any include element or ancestor thereof which has already been processed at a higher level.
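The inclusion-chain rule can be sketched in Python as follows. This is an illustration under several assumptions: resources are held in an in-memory dictionary keyed by include location, href values are treated as already absolute, XPointers are ignored, only the document element of each resource is included, and the names expand and InclusionLoopError are hypothetical.

```python
import xml.etree.ElementTree as ET

XINCLUDE_NS = "http://www.w3.org/1999/XML/xinclude"
INCLUDE_TAG = "{%s}include" % XINCLUDE_NS


class InclusionLoopError(Exception):
    """An include location reappeared in its own inclusion chain."""


def expand(location, resources, chain=()):
    """Parse resources[location] and recursively process its include elements."""
    if location in chain:
        raise InclusionLoopError(" -> ".join(chain + (location,)))
    root = ET.fromstring(resources[location])
    _expand_children(root, resources, chain + (location,))
    return root


def _expand_children(parent, resources, chain):
    for index, child in enumerate(list(parent)):
        if child.tag == INCLUDE_TAG and child.get("parse", "xml") == "xml":
            included = expand(child.get("href"), resources, chain)
            included.tail = child.tail
            parent[index] = included  # one-for-one replacement in this sketch
        else:
            # parse="text" includes and ordinary elements are left in place.
            _expand_children(child, resources, chain)
```

Because the chain is passed down the recursion rather than shared globally, two sibling include elements may name the same resource without triggering the error, while a resource that includes itself or an ancestor is rejected.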
When parse="text", the include location is dereferenced and the resource is fetched. This resource is treated as a plain text resource.
[Definition: ] A range of characters (the selected range) may be identified by a fragment identifier. The syntax of the fragment identifier is interpreted using the syntax of the fragment identifier for the media type text/plain. In the absence of a fragment identifier, the selected range contains all the characters in the document.
NOTE: There is currently no standard defining fragment identifiers for the media type text/plain, so it is currently an error to specify a fragment identifier when parse="text".
The set of characters in the selected range is converted to a set of included items as follows:
An entity start marker information item. The [entity] property is set to an entity declaration information item with the following property values:
[entity type] is "external general entity".
Issue (XInclude-66-entity-type): This seems a bit untrue. Should we define a new type of entity instead of reusing an old one?
[name] is null.
[system identifier] is the include location for the include.
[public identifier] is null.
[base URI] is the include location for the include.
[notation] is null.
[content] is null.
[charset] The name of the character encoding in which the entity is expressed.
Issue (XInclude-62-text-encoding): How is the encoding determined for text? We don't want to look inside the text for a format-specific indication. Is it adequate to state "This property is derived from a MIME header"?
For each character in the selected range, a character information item is created. The character code is set to the character code representing the character in ISO 10646 encoding. The [element content whitespace] flag is set to false.
An entity end marker information item. The [entity] property is set to the [entity] property of the corresponding entity start marker.
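The conversion listed above can be sketched in Python, modelling information items as plain dictionaries keyed by property name. The dictionary layout, the "kind" key and the function name text_to_items are assumptions of this sketch; how the charset value is determined is an open issue in this draft, so it is simply passed in by the caller.

```python
def text_to_items(text, include_location, charset):
    """Convert the selected range of a text resource into included items."""
    entity = {
        "entity type": "external general entity",
        "name": None,
        "system identifier": include_location,
        "public identifier": None,
        "base URI": include_location,
        "notation": None,
        "content": None,
        "charset": charset,
    }
    items = [{"kind": "entity start marker", "entity": entity}]
    for ch in text:
        items.append({
            "kind": "character",
            "character code": ord(ch),  # ISO 10646 code point
            "element content whitespace": False,
        })
    # The end marker shares the [entity] property of its start marker.
    items.append({"kind": "entity end marker", "entity": entity})
    return items
```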
Resources that resolve to something other than text when parse="text" result in an error.
Issue (XInclude-45-fail-text): It is easy to see how to fail a non-xml resource - it's not well-formed. Is there a similarly well-defined mechanism for determining the success of a parse="text" inclusion? Or do we need to rely on the media type text/*? (We intentionally don't rely on text/xml, as we want to enable things like image/svg.)
The result infoset is a copy of the source infoset, with each include element replaced as follows:
The information item for the include element is found. [Definition: ] The [parent] property of this item refers to an information item called the include parent. The [children] property of the include parent is modified by removing the include element information item from it. In place of the include element, the following information items are inserted in order:
An Include Start Marker information item. The [parent] property of this item is set to the include parent. The [include element] property is set to the information item representing the include element.
The included items. The [parent] property of each included item is set to the include parent.
An Include End Marker information item. The [parent] property of this item is set to the include parent. The [include element] property is set to the information item representing the include element.
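The replacement steps above can be sketched in Python with information items modelled as dictionaries. The dictionary layout and the function name replace_include are assumptions of this sketch, and the function is meant to be applied to a copy of the source infoset, since the result infoset is a new infoset.

```python
def replace_include(include_parent, include_element, included_items):
    """Replace an include element with start marker, included items, end marker."""
    children = include_parent["children"]
    # Locate the include element by identity, not by value.
    position = next(i for i, c in enumerate(children) if c is include_element)
    start = {"kind": "include start marker",
             "parent": include_parent,
             "include element": include_element}
    end = {"kind": "include end marker",
           "parent": include_parent,
           "include element": include_element}
    for item in included_items:
        item["parent"] = include_parent
    children[position:position + 1] = [start, *included_items, end]
```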
Intra-document references within include elements must be resolved against the source infoset. The effect of this is that the order in which include elements are processed does not affect the result.
In the following example, the second include always points to the first <xinclude:include> element and not to itself, regardless of the order in which the includes are processed.
<x xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <xinclude:include href="something.xml"/>
  <xinclude:include href="#xpointer(x/xinclude:include[1])" parse="text"/>
</x>
Issue (XInclude-57-sax): In section 3.1 you state that internal xpointer references must be resolved against the original source document. That's not so hard to do in DOM (though expensive if you do it merely by cloning the original document) but I think it's going to be quite tricky to do it in SAX. [Donald Ball http://lists.w3.org/Archives/Public/www-xml-xinclude-comments/2000Jul/0014.html].
Issue (XInclude-36-infoset-entities): The infoset exposes entity information items http://www.w3.org/TR/xml-infoset#infoitem.entity. XInclude does not define whether entity information items are copied via the infoset or not.
Issue (XInclude-55-entity-fixup): But it occurs to me that if entity start/ends are preserved by xinclusion, then dummy entity start/end items should be inserted around the included nodes to ensure that they are balanced (in the same way that unbalanced element structure gets fixed up). [Richard Tobin: http://lists.w3.org/Archives/Member/w3c-xml-core-wg/2000JulSep/0017.html (W3C Members only)]
As an infoset transformation, XInclude operates on the logical structure of XML documents, not on their text serialization. All properties of an information item other than those specifically modified by this proposal are preserved during inclusion.
A source infoset might contain namespace declaration information items. The namespace URI property is considered to be part of the element information item, and merging infosets preserves the namespace of the item. The outcome can therefore differ from a simple cut and paste of XML text source. A serialized result infoset might contain additional namespace declarations when including a sub-resource.
For example, the following document:
<foo xmlns:x="uri1">
  <xinclude:include xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"
    href="common.xml#xpointer(a/b)"/>
</foo>
including a node from common.xml:
<a xmlns:x="uri2">
  <b>
    <x:a/>
  </b>
</a>
results in a document that could be serialized as:
<foo xmlns:x="uri1">
  <b xmlns:x="uri2">
    <x:a/>
  </b>
</foo>
This differs from a text-level copy and paste in that it retains the integrity of the items from the uri2 namespace. A straight copy and paste could result in either the remapping of element names to an unintended namespace, or a document that is not well-formed with respect to namespaces.
Serialization, and specifically where additional namespace declarations might appear, is not constrained by this specification.
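The difference between an infoset-level merge and a text-level paste can be observed with Python's ElementTree, which stores each element name together with its namespace name. This is only a sketch: the include element itself is not processed, and the b element is moved by hand to stand in for the inclusion.

```python
import xml.etree.ElementTree as ET

# Infoset-style merge: the element from common.xml keeps its namespace name.
source = ET.fromstring('<foo xmlns:x="uri1"/>')
common = ET.fromstring('<a xmlns:x="uri2"><b><x:a/></b></a>')
source.append(common.find("b"))
merged_name = source[0][0].tag

# Text-level paste: the same characters are re-read under the prefix binding
# of the including document, so the element lands in a different namespace.
pasted = ET.fromstring('<foo xmlns:x="uri1"><b><x:a/></b></foo>')
pasted_name = pasted[0][0].tag

# Serializing and reparsing the merged tree keeps the original namespace.
reparsed_name = ET.fromstring(ET.tostring(source))[0][0].tag
```

ElementTree chooses its own prefixes and declaration positions when serializing, which is consistent with serialization not being constrained here.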
Issue (XInclude-52-infoset-properties): We specifically say that the namespace name property of an element is preserved when the infosets are merged. What about the in-scope namespaces property? This seems to be needed so that qnames in the included nodes can be resolved. What about the "declared namespaces" property? More generally, should there be a list of infoset properties that must be preserved or deleted?
Issue (XInclude-63-accidental-scoping): If we preserve the in-scope namespaces property, we may encounter the situation where an element has fewer in-scope namespaces than its parent. There is no syntax for "undeclaring" namespaces. If a result infoset is serialized and then reparsed, it will not be identical to the original result infoset. On the other hand, it is unlikely (impossible) that any of the extra in-scope namespaces will actually be referred to within the included context. Are there any situations where this information is harmful?
The base URI property of the acquired infoset is not changed by merging. Thus relative URI references in the included infoset resolve to the same URI despite being included into a document with a potentially different base URI in effect. A serialized result infoset may need to add xml:base attributes to indicate this fact.
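A serializer might record the preserved base URI along the following lines. This Python sketch is an assumption about one possible approach, not a requirement of this specification; the function name mark_base and the example.org URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def mark_base(included_root, included_base, including_base):
    """Add xml:base to an included element whose base URI differs."""
    if included_base != including_base and XML_BASE not in included_root.attrib:
        included_root.set(XML_BASE, included_base)
    return included_root


disclaimer = ET.fromstring('<disclaimer><a href="terms.html"/></disclaimer>')
mark_base(disclaimer,
          "http://example.org/legal/disclaimer.xml",
          "http://example.org/docs/main.xml")
serialized = ET.tostring(disclaimer, encoding="unicode")
```

With the attribute in place, the relative reference terms.html still resolves against the included document's location after serialization.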
[Definition: ] An include start marker information item marks the start of a set of information items resulting from an inclusion.
An include start marker information item has the following properties:
[Definition: ] An include end marker information item marks the end of a set of information items resulting from an inclusion.
An include end marker information item has the same properties as an include start marker. The values of these properties are the same as those of the corresponding include start marker.
XInclude defines a namespace associated with the URI http://www.w3.org/1999/XML/xinclude. For convenience the prefix "xinclude" is used within this specification to indicate this namespace URI.
The XInclude namespace contains a single element, xinclude:include, which serves as the include element.
This element has the following attributes:
Issue (XInclude-64-no-parse-attribute): http://lists.w3.org/Archives/Public/www-xml-xinclude-comments/2000Aug/0000.html: If xinclude:parse is optional, please be more explicit about what the default value is (now it's only specified in the DTD fragment). Suggested resolution: Add 'When omitted, the value of "xml" is implied (even in the absence of a default value declaration). Values other than "xml" and "text" are errors.'
Issue (XInclude-65-id-attribute): As part of the return to element syntax, I re-introduced the ID attribute. Is this OK?
Attributes from other namespaces may be placed on the xinclude:include element. Unqualified attribute names are reserved for future versions of this specification.
The content of the xinclude:include element is not defined by this specification.
The following DTD fragment illustrates a sample declaration for the xinclude:include element:
<!ELEMENT xinclude:include EMPTY>
<!ATTLIST xinclude:include
    xmlns:xinclude CDATA #FIXED "http://www.w3.org/1999/XML/xinclude"
    href CDATA #REQUIRED
    parse (xml|text) "xml"
    id ID #IMPLIED
>
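A processor that does not rely on a DTD would have to apply the same constraints itself. The following Python sketch assumes that a missing href or an unrecognized parse value is an error and that parse defaults to "xml", as the sample declaration suggests; the function name read_include_attributes is hypothetical.

```python
def read_include_attributes(attrs):
    """Return (href, parse) for an include element's unqualified attributes."""
    if "href" not in attrs:
        raise ValueError("include element requires an href attribute")
    parse = attrs.get("parse", "xml")  # default taken from the sample DTD
    if parse not in ("xml", "text"):
        raise ValueError("unsupported parse value: %r" % parse)
    return attrs["href"], parse
```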
Issue (XInclude-31-which-namespace): The authors suggest that the xml: namespace should be the namespace of the include element. The use of the xml: namespace allows all XML documents to reference the inclusion mechanism without requiring additional namespace declarations to support inclusion. As inclusion is useful to most or all XML vocabularies, we suggest that it is reasonable to add to the xml: namespace.
Issue (XInclude-71-versioning): What is our versioning strategy? Do we need features now, for example a version attribute, to enable XInclude 2.0?
An element information item is XInclude-conformant if it meets the syntactic requirements for include elements defined in this specification. This specification imposes no particular constraints on DTDs; conformance applies only to elements and attributes.
An application conforms to XInclude if it:
supports XML 1.0, XML Namespaces, the XML Information Set, and XML Base
observes the mandatory conditions (must) set forth in this specification, and for any optional conditions (should and may) it chooses to observe, observes them in the way prescribed
performs markup conformance testing according to all the conformance constraints appearing in this specification.
The following XML document contains an include element which points to an external document.
<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>120 Mz is adequate for an average home user.</p>
  <xinclude:include href="disclaimer.xml"/>
</document>
disclaimer.xml contains:
<?xml version='1.0'?>
<disclaimer>
  <p>The opinions represented herein represent those of the individual and should not be interpreted as official policy endorsed by this organization.</p>
</disclaimer>
The infoset resulting from resolving inclusions on this document could be serialized as:
<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>120 Mz is adequate for an average home user.</p>
  <disclaimer>
    <p>The opinions represented herein represent those of the individual and should not be interpreted as official policy endorsed by this organization.</p>
  </disclaimer>
</document>
The following illustrates the results of including a range specified by an XPointer.
<?xml version='1.0'?>
<document>
  <p>The relevant excerpt is:</p>
  <quotation>
    <xinclude:include xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"
      href="source.xml#xpointer(string-range(chapter/p[1],'Sentence 2') to string-range(chapter/p[2]/i,'3.',0,11))"/>
  </quotation>
</document>
source.xml contains:
<chapter>
  <p>Sentence 1. Sentence 2.</p>
  <p><i>Sentence 3. Sentence 4.</i> Sentence 5.</p>
</chapter>
The infoset resulting from resolving inclusions on this document could be serialized as:
<?xml version='1.0'?>
<document>
  <p>The relevant excerpt is:</p>
  <quotation>
    <p>Sentence 2.</p>
    <p><i>Sentence 3.</i></p>
  </quotation>
</document>
The following XML document includes a "working example" into a document.
<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>The following is the source of the "data.xml" file:</p>
  <example><xinclude:include href="data.xml" parse="text"/></example>
</document>
data.xml contains:
<?xml version='1.0'?>
<data>
  <item><![CDATA[Brooks & Sheilds]]></item>
</data>
The infoset resulting from resolving inclusions on this document could be serialized as:
<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>The following is the source of the "data.xml" file:</p>
  <example>&lt;?xml version='1.0'?&gt;
&lt;data&gt;
  &lt;item&gt;&lt;![CDATA[Brooks &amp; Sheilds]]&gt;&lt;/item&gt;
&lt;/data&gt;</example>
</document>
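Because a parse="text" inclusion contributes character information items rather than markup, a serializer has to escape any markup characters in the included text. The following Python sketch shows this with ElementTree; the function name include_as_text is hypothetical, and assigning the text directly to an element stands in for the inclusion step.

```python
import xml.etree.ElementTree as ET


def include_as_text(element, text):
    """Place the included characters in `element` and serialize it."""
    element.text = text
    return ET.tostring(element, encoding="unicode")


serialized = include_as_text(
    ET.Element("example"),
    "<item><![CDATA[Brooks & Sheilds]]></item>",
)
```

The serializer writes the angle brackets and the ampersand as character references, so reparsing the output yields the same characters rather than new elements.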
A tabulation of open issues flagged above follows: