Known Issues with Canonical XML 1.0 (C14N/1.0)

1. Overview

Section 2.4 of the Canonical XML 1.0 [C14N10] Specification defines special treatment for attributes in the XML namespace when a representation of a document subset is generated. The processing specified assumes that attributes in the XML namespace are inherited by copying them from the nearest ancestor. The inheritance rule given is appropriate for the processing of the xml:space and xml:lang attributes, but not for xml:base, which needs a special inheritance mechanism, or for xml:id, which should not be inherited at all [XML-BASE-Problem].

Related problems exist in the Decryption Transform for XML Signature [XMLENCDEC] W3C Recommendation, which applies a modified C14N/1.0 algorithm and adds further rules for copying attributes in the XML namespace. These rules are based on the same assumptions as their counterparts in C14N/1.0.

2. Interaction with XML Base

The XML Base Recommendation [XMLBASE] defines the base URI of an element as the value of the element's xml:base attribute, the base URI of the element's parent element within the document or external entity, or the base URI of the document entity or external entity containing the element. In particular, the meaning of relative URI references in an xml:base attribute can depend on the chain of xml:base attributes along an element's ancestor axis.

The canonicalization of xml:base requires a more specific algorithm than just copying or inheriting the values of preceding xml:base attributes. The following cases must be taken into account:

xml:base values may consist of only a fragment identifier (this is a no-op)
xml:base values may be empty (this is a no-op)
xml:base values may be absolute or relative URI references

2.1 Inheriting xml:base values

Depending on the input node-set given to Canonical XML, one can canonicalize either a whole document or a subset of the document's nodes. For example, in [XMLDSIG], one can use either XPointer to dereference only parts of a document or the XPath Filter and XPath Filter 2.0 transforms to refer to the fragment of the document that one wants to sign.

Consider the following XML document (document 1):

<?xml version="1.0"?>
<a xml:lang="en"> 
  <b xml:base="http://www.example.org/pathseg1/" xml:lang="de">
	<c>
	</c>
  </b>
</a>

Figure 1: Sample XML document 1

We now canonicalize document 1 with the input node-set of C14N/1.0 being the element <c>. The element nodes along <c>'s ancestor axis are examined for the nearest occurrence of each attribute in the XML namespace, and these attributes are then merged into the attribute list of <c>.

<?xml version="1.0"?>
	<c xml:base="http://www.example.org/pathseg1/" xml:lang="de">
	</c>

Figure 2: Canonical form of sample XML document 1

The xml:base attribute on the <c/> element in the canonicalized node-set indeed contains the base URI of the <c/> element as present in document 1.

Up to now, there have been no problems with the simple duplication of xml:base for maintaining the inheritance. However, this is not always possible. Let's now consider the following XML document (document 2):

<?xml version="1.0"?>
<a xml:base="http://www.example.org/pathseg1/" xml:lang="en"> 
  <b xml:base="../pathsegA/" xml:lang="de" >
	<c>
	</c>
  </b>
</a>

Figure 3: Sample XML document 2

We now canonicalize document 2, the input node-set of C14N/1.0 being the element <c>:

<?xml version="1.0"?>
	<c xml:base="../pathsegA/" xml:lang="de">
	</c>

Figure 4: Canonical form of sample XML document 2

In the case of xml:lang, copying the parent's attribute allowed the context to be retained. In the case of xml:base, we have lost the context needed to resolve the relative URI reference. Thus, for a given node-set, the application of the C14N/1.0 inheritance rule can lead to xml:base attributes which specify a base URI that is different from the one in the original document context.
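The difference between the two base URIs can be checked with a few lines of Python. This is only a sketch: it uses `urllib.parse.urljoin` from the standard library as a stand-in for the resolution rules of [XMLBASE], the helper `resolve_chain` is a name introduced here, and `NEW_CONTEXT` is a hypothetical URI for wherever the canonicalized subset ends up being processed.

```python
from urllib.parse import urljoin

def resolve_chain(start, xml_base_values):
    """Resolve xml:base values in order, from the outermost ancestor inwards."""
    base = start
    for value in xml_base_values:
        base = urljoin(base, value)
    return base

# Document 2: <a xml:base="http://www.example.org/pathseg1/"> contains
# <b xml:base="../pathsegA/">, which contains <c>. The document's own URI
# does not matter here because <a> carries an absolute xml:base.
original_base = resolve_chain("", ["http://www.example.org/pathseg1/", "../pathsegA/"])

# Figure 4: <c xml:base="../pathsegA/"> with no ancestors left.
# NEW_CONTEXT is an assumption made for illustration only.
NEW_CONTEXT = "http://signer.example.com/docs/doc2.xml"
copied_base = resolve_chain(NEW_CONTEXT, ["../pathsegA/"])

print(original_base)  # http://www.example.org/pathsegA/
print(copied_base)    # depends on NEW_CONTEXT, no longer on <a>'s xml:base
```

Whatever value is assumed for `NEW_CONTEXT`, the copied relative reference is resolved against it rather than against the xml:base of <a>, which is the information the copy has dropped.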

2.2 Special values of xml:base

C14N/1.0 also does not specify how to process xml:base attributes that have no value or whose value is a same-document reference (section 4.2 [RFC 2396]). As indicated by Roy Fielding and Richard Tobin, such values should be treated as a no operation (noop) in xml:base.

Consider the following document located at (file:///tmp/doc.xml):

<?xml version="1.0"?>
<a xml:base="http://www.example.org/pathseg1/"> 
  <b xml:base="file.ext" xml:lang="de">
	<c xml:base="" >
	  <d xml:base="" href="file.ext#some-id1">
	  </d>
	  <e xml:base="#some-fragment" href="file.ext#some-id2">
	  </e>
	</c>
  </b>
</a>

Figure 5: Sample XML document 3

We now canonicalize document 3 with the input node-set of C14N/1.0 being the element <c> and all its descendants:

<?xml version="1.0"?>
	<c xml:base="">
	  <d xml:base="" href="#some-id1">
	  </d>
	  <e xml:base="#some-fragment" href="#some-id2">
	  </e>
	</c>

Figure 6: Incorrect canonical form of sample XML document 3

As there already exists an xml:base="" attribute in <c>, C14N/1.0 rules won't let <c> inherit xml:base="http://www.example.org/pathseg1/file.ext".

Let's now consider the case that the node that has xml:base="" is in the input node-set and that xml:base="" is considered as a no operation (noop). According to the C14N/1.0 rules, we would need to copy the value of the ancestor that is not in the input node-set. However, this would not suffice.

The inheritance rules of the XML Base Recommendation [XMLBASE, section 4] allow for successive use of relative references. Such successive relative references may lie outside the input node-set and hence not be rendered. An inheritance rule for xml:base would therefore have to combine xml:base="" with the xml:base values of its omitted ancestors. However, C14N/1.0 does not state this.

A correct canonicalization of element <c> and all its descendants that preserves the base URI from the original context would be as follows:

<?xml version="1.0"?>
	<c xml:base="http://www.example.org/pathseg1/file.ext" >
	  <d href="file.ext#some-id1">
	  </d>
	  <e href="file.ext#some-id2">
	  </e>
	</c>

Figure 7: Correct canonical form of sample XML document 3
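The base URI shown in Figure 7 can be reproduced with a short Python sketch that treats empty and fragment-only xml:base values as no-ops. The helper names `is_noop` and `effective_base` are introduced here for illustration, and `urllib.parse.urljoin` is assumed to be an adequate stand-in for the resolution rules of [XMLBASE]; this is not an algorithm taken from C14N/1.0.

```python
from urllib.parse import urljoin

def is_noop(value):
    """Empty and fragment-only xml:base values are same-document references."""
    return value == "" or value.startswith("#")

def effective_base(document_uri, xml_base_values):
    """Combine xml:base values from the outermost ancestor inwards, skipping no-ops."""
    base = document_uri
    for value in xml_base_values:
        if not is_noop(value):
            base = urljoin(base, value)
    return base

DOC = "file:///tmp/doc.xml"  # location of sample document 3

# xml:base values along the path <a>, <b>, <c>
c_base = effective_base(DOC, ["http://www.example.org/pathseg1/", "file.ext", ""])
# xml:base values along the path <a>, <b>, <c>, <e>
e_base = effective_base(DOC, ["http://www.example.org/pathseg1/", "file.ext", "", "#some-fragment"])

print(c_base)  # http://www.example.org/pathseg1/file.ext
print(e_base)  # http://www.example.org/pathseg1/file.ext
```

Because <d> and <e> end up with the same base URI as <c>, their no-op xml:base attributes can be dropped once <c> carries the combined value, as in Figure 7.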

3. Interaction with XML Id

The xml:id [XMLID] attribute is part of the XML Information Set [XMLINFOSET]. It allows any XML element to be associated with a unique identifier. Therefore, the value of a given xml:id attribute is unique within an XML document. The xml:id Recommendation was issued after Canonical XML 1.0 had become a Recommendation.

The recommended C14N/1.0 processing behavior that requires inheritance of attributes by copying them from the nearest ancestor can produce badly-formed documents with respect to the xml:id recommendation. Consider the following fragment of an XML document:

	<a xml:id="id_a">
	   <b />
	   <c />
	</a>

If we select the children of node <a> and apply the C14N/1.0 processing rules, both node <b> and node <c> would obtain a copy of <a>'s xml:id attribute. This produces a badly-formed XML document as two xml:id attributes have the same value:

	   <b xml:id="id_a" />
	   <c xml:id="id_a" />

Note that even if only one element inherited the xml:id attribute, the result would still be wrong: the xml:id attribute value would be assigned to the wrong element. For example, let's now select only node <b>. The C14N/1.0 processing would assign node <a>'s xml:id attribute value to node <b>:

	   <b xml:id="id_a" />

Therefore, C14N/1.0 cannot be applied to documents containing xml:id attributes. Inheritance of any xml:id attributes would produce a wrong or a badly-formed document.
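The duplication can be made visible with a small Python sketch. It does not call a real canonicalizer: the function `inherit_xml_attrs` is a hypothetical helper that only mimics the copying rule of C14N/1.0 section 2.4 for a single omitted parent, using `xml.etree.ElementTree` from the standard library to parse the fragment above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = "{%s}id" % XML_NS

doc = ET.fromstring('<a xml:id="id_a"><b/><c/></a>')

def inherit_xml_attrs(parent, child):
    """Copy the parent's xml:* attributes onto the child unless the child has its own."""
    merged = dict(child.attrib)
    for name, value in parent.attrib.items():
        if name.startswith("{%s}" % XML_NS) and name not in merged:
            merged[name] = value
    return merged

# Select the children of <a>; <a> itself is omitted from the node-set.
selected = {child.tag: inherit_xml_attrs(doc, child) for child in doc}

id_counts = Counter(attrs[XML_ID] for attrs in selected.values() if XML_ID in attrs)
duplicates = [value for value, count in id_counts.items() if count > 1]

print(selected["b"][XML_ID], selected["c"][XML_ID])  # id_a id_a
print(duplicates)                                    # ['id_a']
```

A rule that never copies xml:id would leave both `selected["b"]` and `selected["c"]` without the attribute, which avoids both the duplicate and the misattributed identifier.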

4. Implicit use of Canonical XML 1.0 by XML Signature

XML Signature [XMLDSIG] identifies the canonicalization method by a URI inside <ds:CanonicalizationMethod> on the <ds:SignedInfo> level. More importantly, the same is needed on the data object or <ds:Manifest> level by using a <ds:Transform> inside a <ds:Reference>. In the latter case, if no such <ds:Transform> is given on the data object level, and if a node-set is subject to a transformation that requires an octet stream or is to be hashed using the message digest, the XML Signature Reference Processing Model uses Canonical XML C14N/1.0 implicitly to convert a node-set into an octet stream.

If applications require processing according to a particular version of Canonical XML, then they should explicitly give the appropriate algorithm URI. Specifically, the following cases must be taken into account:

insert an explicit <ds:Transform> invoking a new version of Canonical XML before each <ds:Transform> that requires an octet stream as input, but is applied to a node-set
if the previous transform outputs a node-set, append a <ds:Transform> invoking a new version of Canonical XML as the last <ds:Transform> before the digest input.
use this URI inside <ds:CanonicalizationMethod>

Such an approach, however, will increase the size and the complexity of XML digital signatures. Future versions of XML Signature [XMLDSIG] should consider the use of <ds:CanonicalizationMethod> to specify a default node-set to octet stream conversion method for the XML Signature Reference Processing Model.

One should also note that considerable care will be needed when creating future signatures: every transform (including the digest) that requires an octet stream as input but is applied to a node-set will need to be preceded by a <ds:Transform> that invokes such a revised version of Canonical XML.

For further information, please refer to the companion note, "Using XML Digital Signatures in the 2006 XML Environment" [XMLDSIG2006], which describes in more detail how a revised canonicalization algorithm (C14N/1.1 or other) may be used with the current XML-SIG/1.0 Specification.

5. Further considerations for C14N/1.1

5.1 xml:base and URI reference simplification

Inheritance rules will also have to be able to deal with relative references that have "./" and "../" segments appearing in the values of xml:base.

According to the rules laid down in the XML Base Recommendation [XMLBASE, Section 4], relative references are resolved against the xml:base attribute of the element or element's ancestor. This implies that relative references are absolutized and normalized as specified in [RFC 2396, Section 5.2].

This operation can only be performed from the outermost to the innermost relative reference. Thus, there is no value in keeping dot-segments and dot-dot-segments when fixing up relative reference values of xml:base while defining an inheritance rule for canonicalizing xml:base attributes.

Some special considerations are needed. When normalizing a relative URI reference, it is crucial to keep the leading "../" segments of relative-path references. Otherwise, path segments of ancestors' xml:base URIs may not be removed appropriately. Another issue is that one could create erroneous output that looks similar to a network-path reference when normalizing an absolute-path reference. For instance, an incorrect normalization of "seg/.././/pseudo-networkpath/seg/file.ext" would be "//pseudo-networkpath/seg/file.ext".

Note: [RFC 3986, Section 4.2] defines the terms relative-path, network-path and absolute-path reference as used in this document.

The removal of dot-segments causes more logically equivalent documents to produce the same canonicalized output. Furthermore, XML Signatures [XMLDSIG] will benefit from such normalization as the likelihood of false negatives on signature validation decreases.
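The two pitfalls described in this section can be illustrated with a Python sketch of a normalizer for the path of a relative-path reference. The function name `normalize_relative_path` and the choice to guard ambiguous results with a "./" prefix are assumptions made here, not rules taken from a specification; the sketch handles the path component only and ignores query and fragment parts.

```python
def normalize_relative_path(path):
    """Collapse "." and ".." segments in the path of a relative-path reference.

    Leading ".." segments are kept, because only an ancestor's base URI can
    say which of its path segments they remove.
    """
    if path == "":
        return ""  # same-document reference, nothing to normalize
    if path.startswith("/"):
        raise ValueError("not a relative-path reference")
    out = []
    segments = path.split("/")
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == ".":
            pass
        elif segment == "..":
            if out and out[-1] != "..":
                out.pop()          # cancels the previous ordinary segment
            else:
                out.append("..")   # nothing left to cancel: keep it
        else:
            out.append(segment)
            continue
        if is_last:
            out.append("")  # a trailing "." or ".." names a directory: keep the slash
    result = "/".join(out)
    # Guard results that would otherwise read as a same-document reference,
    # an absolute-path or network-path reference, or a URI scheme.
    if result == "" or result.startswith("/") or ":" in out[0]:
        result = "./" + result
    return result

print(normalize_relative_path("x/../../c/"))
# ../c/
print(normalize_relative_path("seg/.././/pseudo-networkpath/seg/file.ext"))
# .//pseudo-networkpath/seg/file.ext
```

The second result keeps a leading "./" so that the empty path segment is not mistaken for the start of a network-path reference.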

5.2 An XML infoset strategy for canonicalizing XML base

As stated earlier in this note, the rules for the inheritance of xml:base require many considerations. Another, more straightforward approach would be to use a strategy based on the XML infoset [C14N-INFOSET], namely:

Use the name EII for an element information item to be canonicalized, and EIIC for the element information item corresponding to EII in the result of parsing the canonical serialization of the node-set containing EII.
Synthesize an xml:base attribute for EII iff the EIIC's [base URI] would otherwise be different from EII's [base URI].

This has the advantage that not only does it correctly produce

<a xml:base="http://example.org">
       <c xml:base="test/" />
</a>

from

<a xml:base="http://example.org">
   <b xml:base="test/">
       <c/>
   </b>
</a>

when <b>...</b> is filtered out, but it will also correctly produce

<a xml:base="http://example.org">
       <c xml:base="http://example.org/test/test/" />
</a>

from

<a xml:base="http://example.org">
   <b xml:base="test/">
       <c xml:base="test/" />
   </b>
</a>

when <b>...</b> is filtered out.

However, the rule cannot be stated that way, because C14N as written does not use the infoset. Canonical XML is currently defined on the XPath data model.
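The "synthesize iff different" rule of the infoset strategy can be sketched in Python as follows. The function `synthesize_xml_base` and its parameter names are hypothetical, `urllib.parse.urljoin` stands in for [base URI] computation, and returning the absolute base URI as the synthesized value is an assumption: the strategy as quoted only says when an attribute is needed, not which equivalent form to emit.

```python
from urllib.parse import urljoin

def synthesize_xml_base(eii_base, output_parent_base, own_xml_base=None):
    """Return an xml:base value to synthesize for EII, or None if none is needed.

    eii_base:           [base URI] of the element in the original document
    output_parent_base: base URI the element inherits in the canonical output
    own_xml_base:       the element's own xml:base attribute, if it has one
    """
    if own_xml_base is None:
        eiic_base = output_parent_base
    else:
        eiic_base = urljoin(output_parent_base, own_xml_base)
    return None if eiic_base == eii_base else eii_base

A_BASE = "http://example.org"
B_BASE = urljoin(A_BASE, "test/")

# First example: <c/> inside <b xml:base="test/">, with <b> filtered out.
first = synthesize_xml_base(B_BASE, A_BASE)

# Second example: <c xml:base="test/"/> inside <b xml:base="test/">, <b> filtered out.
C_BASE = urljoin(B_BASE, "test/")
second = synthesize_xml_base(C_BASE, A_BASE, own_xml_base="test/")

# Nothing filtered out: <c> keeps its parent, so no attribute is synthesized.
unchanged = synthesize_xml_base(C_BASE, B_BASE, own_xml_base="test/")

print(first)      # http://example.org/test/
print(second)     # http://example.org/test/test/
print(unchanged)  # None
```

In the first example the sketch yields the absolute form of the base URI, which resolves to the same base as the relative value "test/" shown above.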

6. References

[C14N10]: Canonical XML Version 1.0, J. Boyer. W3C Recommendation, 15 March 2001, http://www.w3.org/TR/xml-c14n.
[C14N-INFOSET]: An infoset-based strategy for canonicalizing xml:base, H. S. Thompson. XML-CORE Public Mailing list, 6 March 2006, http://lists.w3.org/Archives/Public/public-xml-core-wg/2006Mar/0005.html
[RFC2396]: Uniform Resource Identifiers (URI): Generic Syntax, T. Berners-Lee (MIT/LCS), R. Fielding (U.C. Irvine), L. Masinter (Xerox Corporation). August 1998, http://www.ietf.org/rfc/rfc2396.txt.
[RFC3986]: Uniform Resource Identifier (URI): Generic Syntax, T. Berners-Lee (W3C/MIT), R. Fielding (Day Software), L. Masinter (Adobe Systems). January 2005, http://www.ietf.org/rfc/rfc3986.txt.
[XMLBASE]: XML Base, J. Marsh. W3C Recommendation, 27 June 2001, http://www.w3.org/TR/xmlbase/.
[XMLDSIG2006]: Using XML Digital Signatures in the 2006 XML Environment, T. Roessler. W3C Draft Working Group Note, 10 August 2006, http://www.w3.org/TR/2006/WD-DSIG-usage-20060915/.
[XMLENCDEC]: Decryption Transform for XML Signature, M. Hughes, T. Imamura, H. Maruyama. W3C Recommendation, 10 December 2002, http://www.w3.org/TR/2002/REC-xmlenc-decrypt-20021210.
[XMLID]: xml:id Version 1.0, J. Marsh, D. Veillard, N. Walsh. W3C Recommendation, 9 September 2005, http://www.w3.org/TR/xml-id/.
[XMLINFOSET]: XML Information Set (Second Edition), J. Cowan and R. Tobin, editors. W3C Recommendation, 4 February 2004, http://www.w3.org/TR/xml-infoset/.
[XMLDSIG]: XML-Signature Syntax and Processing, D. Eastlake, J. Reagle, D. Solo, M. Bartel, J. Boyer, B. Fox, E. Simon. W3C Recommendation, 12 February 2002, http://www.w3.org/TR/xmldsig-core/.

7. Acknowledgments

This note is based on input from John Boyer, Roy Fielding, Larry Masinter, Thomas Roessler, the members of the XML Core Working Group, and the members of the xml-dsig mailing list.

W3C Working Draft 15 September 2006