Copyright © 2000 W3C ® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document specifies a processing model and syntax for general purpose inclusion. Inclusion is accomplished by merging a number of XML Infosets into a single composite Infoset. Specification of the XML documents (infosets) to be merged and control over the merging process uses an XML-friendly syntax (elements, attributes, URI References). The general purpose inclusion mechanism is usable in well-formed but not necessarily valid XML documents.
The XML Core Working Group, with this 2000 March 14 XInclude working draft, invites comment on this specification.
The W3C Membership and other interested parties are invited to review the specification, provide comment, and report early implementation experience. The area of work covered by this specification was outlined in the XML Inclusion Proposal (XInclude), W3C Note of 23 November 1999. The purpose of publishing this draft is to update the community on our progress in this area and to solicit feedback on the current draft. It should be noted that the WG plans to take this specification to a Last Call review in the near future.
While the WG has decided to publish this working draft, outstanding issues remain as noted in the draft.
Comments on this document should be sent to www-xml-xinclude-comments@w3.org, which is publicly archived. While we welcome implementation experience reports, the XML Core Working Group will not allow early implementation to constrain its ability to make changes to this specification prior to final release.
It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C working drafts can be found at http://www.w3.org/TR/.
Many programming languages provide an inclusion mechanism to facilitate modularity. Markup languages also often have need of such a mechanism. This proposal introduces a generic mechanism for merging XML documents (as represented by their information sets). The syntax leverages existing XML constructs - elements, attributes, and URI references.
XInclude differs from the linking features described in the XML Linking Language [XLink], specifically links with the attribute value show="embed". Such links provide a media-type independent syntax for indicating that a resource is to be embedded graphically within the display of the document. XLink does not specify a specific processing model, but simply facilitates the detection of links and recognition of associated metadata by a higher level application.
XInclude, on the other hand, specifies a media-type specific (XML into XML) transformation. It defines a specific processing model for merging information sets. XInclude processing occurs at a low level, often by a generic XInclude processor which makes the resulting information set available to higher level applications.
Simple node inclusion as described in this specification differs from transclusion, which preserves contextual information such as style.
There are a number of differences between XInclude and XML external entities [XML] which make them complementary technologies.
Processing of external entities (as with the rest of DTDs) occurs at parse time. XInclude operates on information sets and thus is orthogonal to parsing.
Declaration of external entities requires a DTD or internal subset. This places a set of dependencies on inclusion, for instance, the syntax for the DOCTYPE declaration requires that the document element be named - clearly orthogonal to inclusion in many cases. Validating parsers must have a complete content model defined. XInclude is orthogonal to validation and the name of the document element.
External entities provide a level of indirection - the external entity must be declared and named, and separately invoked. XInclude uses direct references. Applications which generate XML output incrementally can benefit from not having to pre-declare inclusions.
The syntax for an internal subset is cumbersome to many authors of simple well-formed XML documents. XInclude syntax is based on familiar XML constructs.
Note also that XInclude together with XPointer [XPointer] can replace certain forms of internal entities, although XInclude syntax is not optimized for this purpose.
Special purpose inclusion mechanisms have been introduced into specific XML grammars. XInclude provides a generic mechanism for recognizing and processing inclusions, and as such can offer a simpler overall authoring experience, greater performance, and less code redundancy.
The following requirements have been used in the design of XInclude:
The inclusion mechanism syntax shall be specifiable in XML element, XML attribute and URI reference syntax.
The inclusion mechanism shall be independent of XML validation.
The inclusion mechanism shall not require a DTD or internal subset.
The results of inclusion -- that is the process of merging an infoset with another infoset -- shall be provided for use by an application.
The result of an inclusion shall accommodate XML 1.0 and XML Namespaces.
It shall be possible to resolve relative URI references in an included document (for instance nested includes, XLinks, and stylesheet PIs).
It may be possible to replace many instances of entities with the inclusion mechanism.
Inclusion as defined in this document is a specific type of infoset transformation. A source infoset is transformed into a result infoset using the processing model specified in this document.
The infosets used or created by an XInclude processor support all required information items and properties as specified in the XML Infoset [XML Infoset], and may support any optional properties as well. In addition, XInclude requires the Base URI property to be surfaced on information items. This property is optional in the XML Infoset.
The input for the inclusion transformation consists of a source infoset. The output is a new infoset which merges the source infoset with the infosets of resources identified by URI references appearing in xinclude:include elements. Thus a mechanism to resolve URLs and return the identified resources as infosets is assumed. There is no attempt to preserve information in the result infoset indicating where inclusion has been performed - for this information the original infoset must be examined.
The existence of an include is asserted by an include element.
When performing inclusions, an XInclude processor identifies an xinclude:include element in the source infoset and acquires the resource specified. The information set for the resource is created and merged with the source infoset. This process is repeated until all xinclude:include elements have been processed. The order in which include elements are processed is not defined by this specification. Intra-document references within include elements must be resolved against the original infoset, instead of resolving them against some intermediate state.
In the following example, the infoset representing something.xml will appear twice.
    <x xmlns:xinclude="...">
      <xinclude:include href="something.xml"/>
      <xinclude:include href="#xpointer(x/xinclude:include[1])"/>
    </x>
The value of the href attribute on an xinclude:include is combined with the base URI of the xinclude:include element as specified in XML Base [XMLBase]. The resource identified by the full URI reference is acquired and an infoset created, either by parsing the resource as XML, or by converting it into an infoset consisting of a single text information item. This latter behavior allows the inclusion of "working examples" into explanatory text. Which of the two methods for creating an infoset is to be used is determined by the parse attribute, which may take the values "xml", "text", or "cdata".
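The resolution step above can be sketched in Python. This is only an illustration under stated assumptions: the function name `resolve_include` and the example.org URIs are hypothetical, the base URI is assumed to have been computed already according to XML Base, and the default of "xml" for the parse attribute follows the attribute declaration given later in this document.

```python
from urllib.parse import urljoin, urldefrag

# Values of the parse attribute named in this draft.
VALID_PARSE_VALUES = ("xml", "text", "cdata")


def resolve_include(base_uri, href, parse="xml"):
    """Combine href with the base URI of the include element.

    Returns (resource_uri, fragment, parse). The fragment, if any, is the
    XPointer part and is returned unevaluated.
    """
    if parse not in VALID_PARSE_VALUES:
        raise ValueError("unsupported parse value: %r" % (parse,))
    full_reference = urljoin(base_uri, href)
    resource_uri, fragment = urldefrag(full_reference)
    return resource_uri, fragment, parse


# Hypothetical base URI; a relative href and a same-document reference.
external = resolve_include("http://example.org/docs/main.xml",
                           "common.xml#xpointer(a/b)")
local = resolve_include("http://example.org/docs/main.xml",
                        "#xpointer(x/uri/text())", parse="text")
```

A same-document reference resolves to the including document itself, so only the fragment distinguishes it from a plain reference to that document.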
Note that the character encodings of the including and included resources can be different. This does not affect the resulting infoset, but may need to be taken into account during any subsequent serialization.
Resources that are unavailable for any reason result in an error. Resources that resolve to non-well-formed XML given the parse="xml" option result in an error. Resources that resolve to something other than text when parse="text" or parse="cdata" is specified result in an error.
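The three error conditions can be illustrated with a small acquisition routine. This is a sketch, not a normative algorithm: the names `acquire`, `fails`, `InclusionError` and the in-memory `RESOURCES` table are hypothetical, and treating "something other than text" as "bytes that do not decode as UTF-8" is an assumption made only for the example.

```python
import xml.etree.ElementTree as ET


class InclusionError(Exception):
    """Raised for any of the error conditions described above."""


def acquire(resources, uri, parse="xml"):
    """Return a parsed element (parse='xml') or a string (text/cdata)."""
    if uri not in resources:
        raise InclusionError("resource unavailable: " + uri)
    data = resources[uri]
    if parse == "xml":
        try:
            return ET.fromstring(data)
        except ET.ParseError as exc:
            raise InclusionError("not well-formed XML: " + uri) from exc
    if parse in ("text", "cdata"):
        try:
            # Assumption: text resources are UTF-8 encoded.
            return data.decode("utf-8")
        except UnicodeDecodeError as exc:
            raise InclusionError("not text: " + uri) from exc
    raise InclusionError("unknown parse value: " + parse)


def fails(resources, uri, parse):
    """True if acquiring the resource raises InclusionError."""
    try:
        acquire(resources, uri, parse)
    except InclusionError:
        return True
    return False


# Hypothetical resources standing in for retrieved documents.
RESOURCES = {
    "ok.xml": b"<disclaimer/>",
    "broken.xml": b"<disclaimer>",
    "binary.bin": b"\xff\xfe\x00",
}
```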
Any xinclude:include elements in this infoset are recursively processed.
Issue (XInclude:03-nesting-optimization): The proposal implies that the destination documents are knitted before inclusion, which we agree is the right behaviour, but we need some way to optimise this (including an element which doesn't have any links in it, but is in a document which does, should not require following all the links in that document). [Richard Tobin]
When processing nested xinclude:include elements with parse="xml", it is an error to include a resource that contains an xinclude:include containing a URI reference that has already been processed in the inclusion chain.
In other words, the following are all legal inclusions:
An inclusion with parse="text" or parse="cdata" may reference itself.
An inclusion may identify a different part of the same local resource.
Two non-nested inclusions may identify a resource which itself contains a legal inclusion.
The following are illegal inclusions:
An inclusion of the xinclude:include element itself or any ancestor thereof.
An inclusion of any xinclude:include element or ancestor thereof which has already been processed by a higher-level inclusion.
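One way to picture the inclusion chain is as the list of URI references currently being processed on the path from the top-level document down to the current include. The sketch below is a simplification under assumptions: it compares whole URI references only (it does not analyse ancestors selected by an XPointer), it considers parse="xml" inclusions only, and the `includes` table and the names `walk` and `has_loop` are hypothetical.

```python
def walk(includes, uri, chain=()):
    """Visit uri and everything it includes, depth first.

    includes maps a URI reference to the list of URI references it includes
    with parse='xml'. Raises ValueError if a reference reappears in its own
    inclusion chain. Returns the visited references in processing order.
    """
    if uri in chain:
        raise ValueError("inclusion loop: " + " -> ".join(chain + (uri,)))
    visited = [uri]
    for target in includes.get(uri, []):
        visited.extend(walk(includes, target, chain + (uri,)))
    return visited


def has_loop(includes, uri):
    """True if processing uri would encounter an inclusion loop."""
    try:
        walk(includes, uri)
    except ValueError:
        return True
    return False


# Two non-nested inclusions of b.xml, which itself contains a legal inclusion.
LEGAL = {"a.xml": ["b.xml", "b.xml"], "b.xml": ["c.xml"]}
# b.xml includes a.xml again while a.xml is still being processed.
ILLEGAL = {"a.xml": ["b.xml"], "b.xml": ["a.xml"]}
```

Because the chain is passed down the recursion rather than stored globally, the same resource may be included from two sibling positions without being reported as a loop.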
An XInclude processor is by definition aware of XML Namespaces [XML Names], and performs namespace processing as described in the Infoset WD. The namespace URI is thus considered part of the element information item, and merging the infosets preserves the namespace of the item. This can produce a different result than a simple cut and paste of XML sources. A serialized result infoset may thus contain additional namespace declarations when including a sub-resource.
For example, the following document:
    <foo xmlns:x="uri1">
      <xinclude:include href="common.xml#xptr(a/b)"/>
    </foo>
including a node from common.xml:
    <a xmlns:x="uri2">
      <b>
        <x:a/>
      </b>
    </a>
results in a document that could be serialized as:
    <foo xmlns:x="uri1">
      <b xmlns:x="uri2">
        <x:a/>
      </b>
    </foo>
This differs from a text-level copy and paste in that it retains the integrity of the items from the uri2 namespace. A straight copy and paste could result in either the remapping of element names to an unintended namespace, or a document that is not well-formed with respect to namespaces.
Applications performing serialization of the result infoset are not constrained on where they place the namespace declarations, as long as the result preserves the namespaces of the included items.
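The effect can be observed with any namespace-aware tree API. The following sketch uses Python's xml.etree.ElementTree purely as an illustration; it moves the b element by hand instead of processing an include element, and the function name `include_b_into_foo` is hypothetical. The serializer is free to choose its own prefix and declaration placement, which is why the check is made on the namespace name and not on the serialized text.

```python
import xml.etree.ElementTree as ET


def include_b_into_foo():
    """Merge <b> from common.xml into <foo> and report the result."""
    common = ET.fromstring('<a xmlns:x="uri2"><b><x:a/></b></a>')
    foo = ET.fromstring('<foo xmlns:x="uri1"/>')
    # The element carries its namespace name with it, not its prefix.
    foo.append(common.find("b"))
    serialized = ET.tostring(foo, encoding="unicode")
    reparsed = ET.fromstring(serialized)
    return serialized, reparsed.find("b")[0].tag


serialized, included_tag = include_b_into_foo()
# included_tag uses ElementTree's {namespace}local-name notation.
```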
The acquired infoset is merged with the source infoset to create a new infoset by replacing the information items representing the xinclude:include elements with information items in the acquired infoset. The xinclude:include element, its attributes and any children, are not represented in the result infoset.
The base URI property of the acquired infoset is not changed as a result of merging the infosets.
Issue (XInclude:02-base-uri-syntax): A reserialised document will lose the base URL information; do we need an [xinclude:base-url] attribute that can be added to any element? [Richard Tobin]
Issue (XInclude:36-infoset-entities): The infoset exposes entity information items http://www.w3.org/TR/xml-infoset#infoitem.entity. XInclude does not define whether entity information items are copied via the infoset or not.
An acquired infoset will often represent a complete XML document. In this case the document information item does not appear in the resulting infoset. The top-level children of the document information item replace the xinclude:include element, in the order in which they appear in the acquired infoset. This applies to comments, processing instructions, and the document element.
The XML declaration in the included document is ignored. The document type declaration information item in the included document is ignored.
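A minimal sketch of whole-document merging follows, using Python's xml.etree.ElementTree. It is a simplification under assumptions: only the document element of the acquired document is merged (top-level comments and processing instructions are not retained by this parser by default), href values are looked up directly in a hypothetical in-memory `RESOURCES` table without base URI resolution, there is no loop detection, and an acquired document whose document element is itself an include is not handled. The disclaimer text is shortened from the example given later in this document.

```python
import xml.etree.ElementTree as ET

XINCLUDE_NS = "http://www.w3.org/1999/XML/xinclude"
INCLUDE_TAG = "{%s}include" % XINCLUDE_NS


def merge_includes(element, resources):
    """Replace include children of element, in place, and return element."""
    for index, child in enumerate(list(element)):
        if child.tag == INCLUDE_TAG:
            # The XML declaration and doctype of the resource are dropped
            # by the parser; only the document element is kept here.
            acquired = ET.fromstring(resources[child.get("href")])
            merge_includes(acquired, resources)  # nested includes
            acquired.tail = child.tail           # keep surrounding whitespace
            element[index] = acquired            # include element disappears
        else:
            merge_includes(child, resources)
    return element


SOURCE = """<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>120Mz is adequate for an average home user.</p>
  <xinclude:include href="disclaimer.xml"/>
</document>"""

RESOURCES = {
    "disclaimer.xml": "<disclaimer><p>Shortened disclaimer text.</p></disclaimer>",
}

result = merge_includes(ET.fromstring(SOURCE), RESOURCES)
child_tags = [child.tag for child in result]
```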
Ed. note: Add example of ignorable and non-ignorable whitespace.
An xinclude:include may identify a subresource that consists of more than a single information item. In this case these information items replace the information item representing xinclude:include in the order in which they appear in the included document.
If the document element in the source infoset is an xinclude:include, it is an error to attempt to replace it with more than a single element.
An href with an XPointer may identify an attribute or a collection of nodes containing an attribute. Attempting inclusion of attributes results in an error.
Issue (XInclude:32-include-attributes): Currently, it is not possible to set the value of an attribute through an include mechanism. This makes it difficult to generate XLinks, for example. Should a mechanism be developed to include text as attribute values?
source:
    <x>
      <uri>theUri</uri>
      <link xmlns:xlink="...">
        <xinclude:include href="#xpointer(x/uri/text())" as-an-attribute-named="xlink:href"/>
      </link>
    </x>
result:
    <x>
      <uri>theUri</uri>
      <link xlink:href="theUri" xmlns:xlink="..."/>
    </x>
Issue (XInclude:12-ignore-attributes): Should attempted inclusion of attributes be ignored instead of generating an error?
An href with an XPointer may identify a location set that represents a range or a set of ranges. Information items within these ranges appear in the result tree.
[Definition: ] An information item is said to be selected by a range if it occurs after (in document order) the starting point of the range and before the ending point of the range. [Definition: ] An information item is said to be partially selected by a range if it contains only the starting point of the range, or only the ending point of the range. By definition, a character information item cannot be partially selected.
A range is included by including in document order the set of information items selected or partially selected by the range. The children of selected information items are included. The children of partially selected information items are included if they in turn are either selected or partially selected.
A location set containing multiple ranges is included as if each range in the location set were included in order.
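The two definitions can be modelled with character offsets. The sketch below is an illustration only: representing each element by a (start, end) pair of offsets, treating range boundaries as inclusive for full selection, and the names `classify`, `SPANS` and `SELECTION` are all assumptions. The offsets model the source.xml document from the range example later in this document, with a range running from the start of "Sentence 2" to the end of "Sentence 3.".

```python
def classify(span, selection):
    """Classify an item occupying span against a range, both (start, end)."""
    item_start, item_end = span
    range_start, range_end = selection
    if range_start <= item_start and item_end <= range_end:
        return "selected"
    contains_start = item_start < range_start < item_end
    contains_end = item_start < range_end < item_end
    # Containing exactly one endpoint makes the item partially selected.
    if contains_start != contains_end:
        return "partially selected"
    return "not selected"


# Hypothetical offsets into the text of source.xml:
# "Sentence 1. Sentence 2." (0-23), "Sentence 3. Sentence 4." (23-46),
# " Sentence 5." (46-58).
SPANS = {
    "chapter": (0, 58),
    "p[1]": (0, 23),
    "p[2]": (23, 58),
    "i": (23, 46),
}
SELECTION = (12, 34)

result = {name: classify(span, SELECTION) for name, span in SPANS.items()}
```

Under this model the chapter element contains both endpoints and is therefore neither selected nor partially selected, which is consistent with its absence from the serialized result of the range example.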
[Definition: ] An XInclude processor is a class of XML processor that conforms to all the behavior of the XML and XML Namespaces Recommendations, and additionally supports the inclusion behavior specified in this document. For purposes of this document, the term "XInclude processor" includes all the functionality of an "XML processor".
Note that a simple application-defined switch would be sufficient to flip between XML processors and XInclude processors.
An XInclude processor may expose the base URI of a document, element, or processing instruction information item. This enables applications which resolve URI References to process them correctly. Two examples where this is necessary are XLink, and the xml-stylesheet processing instruction.
Issue (XInclude:14-exposing-base-url): Should exposure of this information be required? It appears necessary for applications that wish to operate on URIs in the result.
XML 1.0 validation is not performed on the results of the inclusion, nor on the included elements. The include mechanism introduces the notion of infoset validation. After all inclusions are completed, an include processor will validate the infoset against the original document's DTD if it contains a doctype declaration.
NOTE: The DTD or Schema used for validation may need to be adjusted when running a particular document through an XML processor instead of an XInclude processor. A validating XInclude document is not necessarily a validating XML document, and vice versa.
Issue (XInclude:15-validation-relationship): I do not believe that XInclude should hard-code its relationship to schema validation. If I want to write an application that does inclusion and then validates the resulting document, I should be allowed to. [Paul Prescod: http://lists.w3.org/Archives/Member/w3c-xml-linking-ig/1999Aug/0211.html (W3C Members only)]
Issue (XInclude:16-dtd-validation): Technically speaking, XInclude inclusion *cannot* occur before DTD validation. DTD validation is done by the XML processor: by definition it is accomplished before an information set is created. If you want DTD-syntax validation that works on information sets then you need to specify it yourself as the HyTime people did. SGML and XML just do not support it natively. [Paul Prescod: http://lists.w3.org/Archives/Member/w3c-xml-linking-ig/1999Aug/0211.html (W3C Members only)] From Ben Trafford: Couldn't you guys define a normative addition to the internal subset that would allow for XInclude validation, and then state that an XInclude-aware processor makes this addition to the infoset based on the parsing of the internal subset? Basically, a 'virtual internal subset'.
The intersection of IDs and IDREFS with the inclusion mechanism surfaces a few issues with respect to XML and inclusion infoset validation.
If an attribute declares an ID that has already been declared, processing is the same as if duplicate IDs had been encountered in a single XML document. This condition would be discovered during infoset validation, after all inclusions are performed. For example, processing could be halted and an appropriate error surfaced.
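A check of this kind could run over the merged tree once all inclusions are complete. The sketch below is an assumption-laden illustration: ElementTree has no knowledge of DTD attribute types, so it simply treats an attribute named "id" as an ID, and the function name `duplicate_ids` and the sample document are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def duplicate_ids(root, id_attribute="id"):
    """Return the sorted ID values that occur more than once under root."""
    counts = Counter(
        element.get(id_attribute)
        for element in root.iter()
        if element.get(id_attribute) is not None
    )
    return sorted(value for value, count in counts.items() if count > 1)


# Hypothetical result infoset after inclusion: id "a" now appears twice.
MERGED = ET.fromstring(
    '<document><p id="a"/><disclaimer id="b"><p id="a"/></disclaimer></document>'
)
duplicates = duplicate_ids(MERGED)
```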
ID rewriting is a possibility for inclusion of documents with nodes containing IDs. The following condition may occur: An including document contains an ID. The inclusion specified is to the subnode of a separate document. The separate document contains the same ID outside the scope of the inclusion, and the inclusion scope contains an IDREF to the ID. It is unclear whether this should be an error condition or not. It is conceivable that authors would design their modularity to use this aspect of IDs. It is also possible that IDs should be re-written to be local to the scope of the document.
This proposal suggests that ID rewriting should not be performed. In the previous use case, the document will pass infoset validation if the IDREF in the infoset after inclusion refers to an ID that is in the document.
Issue (XInclude:17-id-validation-redundant): ID validation is merely a schema validation issue and should not be separated out as its own "point." [Paul Prescod: http://lists.w3.org/Archives/Member/w3c-xml-linking-ig/1999Aug/0211.html (W3C Members only)]
The relationship between XInclude and other XML standards is defined by the concept of an 'XInclude processor'. Such a processor leverages XML 1.0 and XML Namespaces in its syntax, and uses the XML Infoset to describe a specific processing model. In general, XInclude processing should occur between the generation of an Infoset by a processor, and the consumption of that infoset by a higher-level application, so that the inclusion results are transparent to those applications.
Although XInclude may be implemented as an independent layer, it also may be implemented at a lower level with the same results, but with potentially greater performance.
The relationship between XInclude and DTD or XML Schema validation needs additional exploration (as noted by issues within this document). In particular DTD validation as defined in XML 1.0 does not support validation of the result infoset within this 'layered' strategy.
Issue (XInclude:27-schema): Are there any requirements in particular that the Schema WG has of XInclude? For instance, Schema has a facility for mapping included documents to the including document's namespace instead. We could provide this feature as well.
The syntax for specifying inclusion is an element similar to the simple links defined by XLink. XInclude defines a namespace associated with the URI http://www.w3.org/1999/XML/xinclude.
[Definition: ] The XInclude namespace contains a single element, the include element, or xinclude:include. This element has the following attributes:
href: A URI Reference containing the address of the resource to include.
parse: An enumeration specifying whether to include the resource as parsed XML or as text. A value of "xml" indicates that the resource should be parsed as XML and the infosets merged. A value of "text" indicates that the resource should be included as the contents of a text node. A value of "cdata" indicates that the resource should be included as the contents of a CDATA node or a sequence of CDATA nodes.
    <!ELEMENT xinclude:include EMPTY>
    <!ATTLIST xinclude:include
      href  CDATA            #REQUIRED
      parse (xml|text|cdata) "xml"
    >
Issue (XInclude:29-add-id-attribute): Should an id attribute be added to XInclude? If so, how is it given the ID datatype? [Paul Grosso in http://lists.w3.org/Archives/Member/w3c-xml-core-wg/2000JanMar/0290.html (W3C Members only)]
Issue (XInclude:30-allow-other-attributes): Should the permission to add non-XInclude attributes such as ID be made explicit? [John Cowan in http://lists.w3.org/Archives/Member/w3c-xml-core-wg/2000JanMar/0292.html (W3C Members only)]
Issue (XInclude:31-which-namespace): The authors suggest that the xml: namespace should be the namespace of the include element. The use of the xml: namespace allows all xml documents to reference the inclusion mechanism without requiring additional namespace declarations to support inclusion. As inclusion is useful to most or all xml vocabularies, we suggest that it is reasonable to add to the xml: namespace. The authors do not suggest a mechanism for the W3C to determine the body that works on the specification of the xml:include element.
Issue (XInclude:33-atribute-only-syntax): XInclude requires an XML element. This has implications for re-use in other vocabularies. It may be advantageous to have an attribute only syntax for XInclude to allow vocabularies the ability to create their own include elements. XLink, faced with a similar problem, chose to only support an attribute-based syntax.
An XInclude processor must support xml:base [XMLBase].
A non-validating XInclude processor will perform indistinguishably from a non-validating XML processor on documents that do not contain xinclude:include elements.
A validating XInclude processor will perform indistinguishably from a validating XML processor on documents that do not contain xinclude:include elements.
An XInclude processor shall process the xinclude:include element according to the semantics given in this specification.
A validating XInclude processor will validate the infoset that is a result of the inclusion rather than the source document.
An XInclude processor must be able to merge documents in any mix of encodings that they would otherwise support in isolation.
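The encoding requirement follows naturally once documents are handled as infosets: each resource is decoded according to its own declaration, and the merged tree holds characters, not bytes. The sketch below illustrates this with Python's xml.etree.ElementTree; the function name `merge_mixed_encodings` and the two sample documents are hypothetical, and the append stands in for real include processing.

```python
import xml.etree.ElementTree as ET

# The same character (U+00E9) stored in two different encodings.
UTF8_DOC = (
    '<?xml version="1.0" encoding="UTF-8"?><doc><p>caf\u00e9</p></doc>'
).encode("utf-8")
LATIN1_DOC = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'
).encode("iso-8859-1")


def merge_mixed_encodings(including_bytes, included_bytes):
    """Parse both byte strings and append the included document element."""
    including = ET.fromstring(including_bytes)
    including.append(ET.fromstring(included_bytes))
    return including


merged = merge_mixed_encodings(UTF8_DOC, LATIN1_DOC)
texts = [p.text for p in merged.iter("p")]
# A single output encoding is chosen only at serialization time.
serialized = ET.tostring(merged, encoding="utf-8")
```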
The following XML document contains an xinclude:include element which points to an external document.
    <?xml version='1.0'?>
    <document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
      <p>120Mz is adequate for an average home user.</p>
      <xinclude:include href="disclaimer.xml"/>
    </document>
disclaimer.xml contains:
    <?xml version='1.0'?>
    <disclaimer>
      <p>The opinions represented herein represent those of the individual and should not be interpreted as official policy endorsed by this organization.</p>
    </disclaimer>
The infoset resulting from resolving inclusions on this document could be serialized as:
    <?xml version='1.0'?>
    <document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
      <p>120Mz is adequate for an average home user.</p>
      <disclaimer>
        <p>The opinions represented herein represent those of the individual and should not be interpreted as official policy endorsed by this organization.</p>
      </disclaimer>
    </document>
The following illustrates the results of including a range specified by an XPointer.
    <?xml version='1.0'?>
    <document>
      <p>The relevant excerpt is:</p>
      <quotation>
        <xinclude:include xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"
          href="source.xml#xpointer(string-range(chapter/p[1],'Sentence 2') to string-range(chapter/p[2]/i,'3.',0,11))"/>
      </quotation>
    </document>
source.xml contains:
    <chapter>
      <p>Sentence 1. Sentence 2.</p>
      <p><i>Sentence 3. Sentence 4.</i> Sentence 5.</p>
    </chapter>
The infoset resulting from resolving inclusions on this document could be serialized as:
    <?xml version='1.0'?>
    <document>
      <p>The relevant excerpt is:</p>
      <quotation>
        <p>Sentence 2.</p>
        <p><i>Sentence 3.</i></p>
      </quotation>
    </document>
The following XML document includes a working example as text, once with parse="cdata" and once with parse="text".
    <?xml version='1.0'?>
    <document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
      <p>The following is the source of the "data.xml" file:</p>
      <example><xinclude:include href="data.xml" parse="cdata"/></example>
      <example><xinclude:include href="data.xml" parse="text"/></example>
    </document>
data.xml contains:
    <?xml version='1.0'?>
    <data> <item><![CDATA[Brooks & Sheilds]]></item> </data>
The infoset resulting from resolving inclusions on this document could be serialized as:
    <?xml version='1.0'?>
    <document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
      <p>The following is the source of the "data.xml" file:</p>
      <example><![CDATA[<data> <item><![CDATA[Brooks & Sheilds]]]]><![CDATA[></item> </data>]]></example>
      <example><data> <item><![CDATA[Brooks & Sheilds]]></item> </data></example>
    </document>
Note that CDATA notation can itself be escaped at the textual level by replacing occurrences of "]]>" with "]]]]><![CDATA[>". At the DOM level, this may mean several CDATA nodes may result from an inclusion, instead of just one.
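The textual escaping described in this note amounts to a single string replacement. The following is a minimal sketch; the function name `to_cdata_sections` is hypothetical, and the split point used is the one named in the note, which is only one of several possible choices.

```python
import xml.etree.ElementTree as ET


def to_cdata_sections(text):
    """Wrap text in CDATA markers, splitting wherever "]]>" occurs."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Text that itself contains a CDATA section, as in the data.xml example.
ORIGINAL = "<item><![CDATA[Brooks & Sheilds]]></item>"
escaped = to_cdata_sections(ORIGINAL)

# Parsing the escaped form yields the original characters again.
round_tripped = ET.fromstring("<example>" + escaped + "</example>").text
```

Parsing the escaped form back recovers the original text, which shows that the split produces adjacent CDATA sections whose contents concatenate to the included resource.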
Issue (XInclude:34-cdata-breaking): The above implies one way to split a CDATA section into parts, but other ways exist, e.g. splitting ]-]> instead of ]]->. Do we want to mandate a specific split point?
Issue (XInclude:35-multiple-cdata-nodes): It is unclear whether this is necessary. The CDATA start and end markers can be inserted around the include, and since the resource is acquired as text, there isn't really any necessity to double escape these. In any case the normative description should be worded in terms of CDATA markers.
A tabulation of open issues flagged above follows:
Issue (XInclude:37-next-number): Dummy issue used to record the next unused issue number.