XML processor profiles

1 Background

The XML specification [Extensible Markup Language (XML) 1.0 (Fifth Edition)] defines an XML processor as "a software module. . .used to read XML documents and provide access to their content and structure. . .on behalf of another module, called the application." XML applications are often defined by building on top of the [XML Information Set (Second Edition)] or other similar XML data models such as [XML Path Language (XPath) Version 1.0] or [XQuery 1.0 and XPath 2.0 Data Model (XDM)], understood as the output of an XML processor. Such definitions have suffered to some extent from an uncertainty inherent in using that kind of foundation, in that the mapping XML processors perform from XML documents to data model is not rigid. Some of this stems from the XML specification itself, which leaves open the possibility of reading and interpreting external entities, or not. Some stems from the growth of the XML family of specifications: if the input document includes uses of XInclude, for instance, it is not determined whether the data model reflects the included content or the unexpanded include elements.

This specification addresses this issue by defining several XML processor profiles, each of which fully determines a data model for any given XML document. It is intended as a resource for other specifications, which can by a single normative reference establish precisely what input processing they require.

The profiles defined here are appropriate for processing both XML 1.0 [Extensible Markup Language (XML) 1.0 (Fifth Edition)] and XML 1.1 [Extensible Markup Language (XML) 1.1 (Second Edition)] documents. References to XML or XML Namespaces below should be understood as references to 1.0 or 1.1 as required by the relevant document or application.

1.1 Terminology

[Definition: The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC 2119].]

The term base URI is used in this specification as it is defined in [RFC 3986].

2 XML processor profiles

All of the profiles describe the steps necessary to construct a data model from a well-formed and namespace well-formed XML document. This specification does not consider documents that are not namespace well-formed. Documents which are not well-formed are not XML.

2.1 The minimum XML processor profile

The minimum approach to the construction of a data model requires the following:

Processing of the document as required of conformant non-validating XML processors without reading any external markup declarations;
Maintenance of the base URI of each element in conformance with [XML Base].
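
The two requirements above can be illustrated with a minimal Python sketch. It assumes the standard-library xml.etree.ElementTree parser, which to our understanding does not fetch external markup declarations, and approximates [XML Base] by resolving each xml:base attribute against the inherited base URI. The function name base_uris, the sample document and the example.org URIs are hypothetical and not part of this specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Expanded name that ElementTree reports for the xml:base attribute.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def base_uris(root, document_uri):
    """Map each element to its base URI (hypothetical helper).

    An xml:base attribute is resolved against the base URI inherited
    from the parent; an element without one inherits it unchanged.
    """
    result = {}

    def walk(element, inherited):
        base = urljoin(inherited, element.get(XML_BASE, ""))
        result[element] = base
        for child in element:
            walk(child, base)

    walk(root, document_uri)
    return result


doc = """<doc xml:base="http://example.org/today/">
  <chapter xml:base="ch1/"><para/></chapter>
  <appendix/>
</doc>"""

root = ET.fromstring(doc)
uris = base_uris(root, "http://example.org/index.xml")
for element, uri in uris.items():
    print(element.tag, uri)
```

This sketch covers elements only and ignores edge cases in [XML Base], such as URI escaping of xml:base values; a conformant processor would need to handle those as well.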

2.2 The basic XML processor profile

The basic approach to the construction of a data model requires the following:

Processing of the document as required of conformant non-validating XML processors without reading any external markup declarations;
Maintenance of the base URI of each element in conformance with [XML Base];
Identification of all xml:id attributes as IDs as required by [xml:id Version 1.0].
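
The xml:id requirement, which is what distinguishes the basic profile from the minimum one, might be sketched as follows. The sketch assumes the standard-library ElementTree parser, which has no notion of attribute types, so ID-ness is recorded in a separate index. The function name index_xml_ids and the sample document are hypothetical, and raising an exception on a duplicate value is a choice made for this sketch only.

```python
import xml.etree.ElementTree as ET

# Expanded name that ElementTree reports for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def index_xml_ids(root):
    """Return a dict from normalized xml:id value to element (hypothetical helper)."""
    ids = {}
    for element in root.iter():
        value = element.get(XML_ID)
        if value is None:
            continue
        # ID-type normalization: drop leading and trailing spaces and
        # collapse runs of spaces into a single space.
        normalized = " ".join(part for part in value.split(" ") if part)
        if normalized in ids:
            # Choice made for this sketch: treat duplicates as a hard error.
            raise ValueError(f"duplicate xml:id value: {normalized!r}")
        ids[normalized] = element
    return ids


doc = '<doc><section xml:id=" intro "/><section xml:id="details"/><note id="n1"/></doc>'
ids = index_xml_ids(ET.fromstring(doc))
print(sorted(ids))
```

Note that the unprefixed id attribute on the note element is not indexed: only attributes named xml:id are identified as IDs by this step.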

2.3 The modest XML processor profile

The modest approach to the construction of a data model requires the following:

Processing of the document as required of conformant non-validating XML processors while reading all external markup declarations;
Maintenance of the base URI of each element in conformance with [XML Base];
Identification of all xml:id attributes as IDs as required by [xml:id Version 1.0].

2.4 The recommended XML processor profile

The recommended approach to the construction of a data model requires the following:

Processing of the document as required of conformant non-validating XML processors while reading and processing all external markup declarations;
Maintenance of the base URI of each element in conformance with [XML Base];
Identification of all xml:id attributes as IDs as required by [xml:id Version 1.0];
Replacement of all include elements in the XInclude namespace, and namespace, xml:base and xml:lang fixup of the result, as required for conformance to [XML Inclusions (XInclude) Version 1.0 (Second Edition)].

The following [XProc: An XML Pipeline Language] pipeline implements 2.4 The recommended XML processor profile when executed by a conformant XProc processor which:

Processes its input as required by the first point above;
Recognizes and preserves the ID type of all xml:id attributes in conformance with [xml:id Version 1.0].

Example: XProc pipeline which implements the recommended processor profile

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc">
  <p:xinclude fixup-xml-base="true" fixup-xml-lang="true"/>
</p:pipeline>
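
For readers without an XProc processor, the XInclude replacement step can be approximated in Python. This is a sketch under stated assumptions, not a conforming implementation of the profile: it uses the standard-library xml.etree.ElementInclude module, which, as far as we know, does not perform the xml:base and xml:lang fixup that the recommended profile requires. The RESOURCES table, the loader function and the file name chapter1.xml are hypothetical stand-ins for real resources.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory resources standing in for files or network resources.
RESOURCES = {
    "chapter1.xml": "<chapter><title>One</title></chapter>",
}


def loader(href, parse, encoding=None):
    """Resolve an include href from RESOURCES instead of the file system."""
    text = RESOURCES[href]
    return ET.fromstring(text) if parse == "xml" else text


doc = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="chapter1.xml"/>
</book>"""

root = ET.fromstring(doc)
before = [child.tag for child in root]
ElementInclude.include(root, loader=loader)  # replaces include elements in place
after = [child.tag for child in root]
print(before)
print(after)
```

Before the call the data model contains the include element itself, which is what a modest processor would report; afterwards it contains the included chapter element in its place.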

3 Invariants

Data models constructed in conformance with one of the profiles defined above will be guaranteed to share certain properties. The following sub-sections describe this in terms of invariants with respect to the information available in the data model.

3.1 Data model invariants within a given profile

Any two data models which are both constructed in conformance with the same profile from a given namespace-well-formed XML document will have exactly the same information with respect to the following information items and properties (per [XML Information Set (Second Edition)]):

The Document Information Item: [document element], [base URI], [character encoding scheme], [standalone], [version], [all declarations processed]
Element Information Items: [namespace name], [local name], [prefix], [children], [attributes], [namespace attributes], [in-scope namespaces], [base URI], [parent]
Attribute Information Items: [namespace name], [local name], [prefix], [normalized value], [specified], [attribute type], [references], [owner element]
Processing Instruction Information Items: [target], [content], [base URI], [notation], [parent]
Unexpanded Entity Reference Information Items: [name], [system identifier], [public identifier], [declaration base URI], [parent] (this type of information item will not occur at all when using 2.3 The modest XML processor profile or 2.4 The recommended XML processor profile, or if standalone="yes")
Character Information Items: [character code], [parent]
Comment Information Items: [content], [parent]
Namespace Information Items: [prefix], [namespace name]

3.1.1 Underspecified information

Whether the remaining information is present or, if present only partially, whether it is the same, depends on implementation-dependent properties, so no invariant can be guaranteed:

The Document Information Item: [children], [notations], [unparsed entities]
Character Information Items: [element content whitespace]
The Document Type Declaration Information Item: entirely or partially
Unparsed Entity Information Items: entirely or partially
Notation Information Items: entirely or partially

3.2 Data model variation between profiles

When two data models are constructed in conformance with two different profiles from a given namespace-well-formed XML document, the information contained therein will in some cases (depending on the specifics of the document in question) differ with respect to the following information items and properties (per [XML Information Set (Second Edition)]), leaving aside the items and properties identified as implementation-dependent above:

3.2.1 Between minimum and richer profiles

Attribute Information Items: [attribute type], [references] (these properties may vary for xml:id attributes)

And all the differences listed in the next two sections.

3.2.2 Between basic and richer profiles

Element Information Items: Entirely, in that where a basic processor reports an Unexpanded Entity Reference, richer ones will report the entity expansion, which may be or include entire elements.
Attribute Information Items: Entirely, for the same reason; or only with respect to [normalized value], [specified], [attribute type] and [references], where a basic processor has not processed the relevant declaration but a richer one has.
Processing Instruction Information Items: Entirely, per the Element case above
Unexpanded Entity Reference Information Items: Entirely, in the opposite sense to the Element case above
Character Information Items: Entirely, per the Element case above
Comment Information Items: Entirely, per the Element case above
Namespace Information Items: Entirely, per the Element case above

And all the differences listed in the next section.

3.2.3 Between modest and recommended profiles

Element Information Items: Entirely, in that where a modest processor reports an include element in the XInclude namespace, a recommended processor will report the result of XInclude processing, which may be or include entire elements.
Attribute Information Items: Entirely, for the same reason
Processing Instruction Information Items: Entirely, for the same reason
Character Information Items: Entirely, for the same reason
Comment Information Items: Entirely, for the same reason
Namespace Information Items: Entirely, for the same reason

4 Other profiles (non-normative)

The profiles defined here, particularly 2.4 The recommended XML processor profile, can be used as a starting point for the definition of further profiles. For example, the media type registrations for stylesheet languages applicable to XML such as application/xslt+xml or text/css might define a profile specifying appropriate <?xml-stylesheet type="[their media type]" . . .?> processing in addition to the processing required by 2.4 The recommended XML processor profile.

5 Conformance

Conformance is a matter for any specification which references this one to mandate, expressed in terms such as "Conforming implementations must construct input data models from XML documents as required by the recommended XML processor profile."

A References

A.1 Normative References

XML Information Set (Second Edition): XML Information Set (Second Edition), John Cowan and Richard Tobin, Editors. World Wide Web Consortium, 04 Feb 2004. This version is http://www.w3.org/TR/2004/REC-xml-infoset-20040204/. The latest version is available at http://www.w3.org/TR/xml-infoset/.
RFC 2119: RFC 2119: Key words for use in RFCs to Indicate Requirement Levels. Internet Engineering Task Force, 1997.
RFC 3986: RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. Internet Engineering Task Force, 2005.
XProc: An XML Pipeline Language: XProc: An XML Pipeline Language, Norman Walsh, Alex Milowski, and Henry S. Thompson, Editors. World Wide Web Consortium, 9 March 2010. This version is http://www.w3.org/TR/2010/REC-xproc-20100309/. The latest version is available at http://www.w3.org/TR/xproc/.
xml:id Version 1.0: xml:id Version 1.0, Norman Walsh, Daniel Veillard, and Jonathan Marsh, Editors. World Wide Web Consortium, 09 Sep 2005. This version is http://www.w3.org/TR/2005/REC-xml-id-20050909/. The latest version is available at http://www.w3.org/TR/xml-id/.
XML Inclusions (XInclude) Version 1.0 (Second Edition): XML Inclusions (XInclude) Version 1.0 (Second Edition), David Orchard, Jonathan Marsh, and Daniel Veillard, Editors. World Wide Web Consortium, 15 Nov 2006. This version is http://www.w3.org/TR/2006/REC-xinclude-20061115/. The latest version is available at http://www.w3.org/TR/xinclude/.
Extensible Markup Language (XML) 1.0 (Fifth Edition): Extensible Markup Language (XML) 1.0 (Fifth Edition), Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, et al., Editors. World Wide Web Consortium, 28 Nov 2008. This version is http://www.w3.org/TR/2008/REC-xml-20081126/. The latest version is available at http://www.w3.org/TR/xml/.
Extensible Markup Language (XML) 1.1 (Second Edition): Extensible Markup Language (XML) 1.1 (Second Edition), Tim Bray, John Cowan, Jean Paoli, et al., Editors. World Wide Web Consortium, 16 Aug 2006. This version is http://www.w3.org/TR/2006/REC-xml11-20060816/. The latest version is available at http://www.w3.org/TR/xml11/.
Namespaces in XML 1.0 (Third Edition): Namespaces in XML 1.0 (Third Edition), Tim Bray, Dave Hollander, Richard Tobin, and Andrew Layman, Editors. World Wide Web Consortium, 16 Aug 2006. This version is http://www.w3.org/TR/2006/REC-xml-names-20060816/. The latest version is available at http://www.w3.org/TR/xml-names/.
Namespaces in XML 1.1 (Second Edition): Namespaces in XML 1.1 (Second Edition), Tim Bray, Dave Hollander, Andrew Layman, and Richard Tobin, Editors. World Wide Web Consortium, 16 Aug 2006. This version is http://www.w3.org/TR/2006/REC-xml-names11-20060816/. The latest version is available at http://www.w3.org/TR/xml-names11/.
XML Base: XML Base (Second Edition), Jonathan Marsh, Editor. World Wide Web Consortium, 28 January 2009. This version is http://www.w3.org/TR/2001/REC-xmlbase-20090128/. The latest version is available at http://www.w3.org/TR/xmlbase/.

A.2 Non-normative References

XML Path Language (XPath) Version 1.0: XML Path Language (XPath) Version 1.0, James Clark and Steven DeRose, Editors. World Wide Web Consortium, 16 Nov 1999. This version is http://www.w3.org/TR/1999/REC-xpath-19991116/. The latest version is available at http://www.w3.org/TR/xpath/.