XML-binary Optimized Packaging

1 Introduction

This specification defines the XML-binary Optimized Packaging (XOP) convention, a means of more efficiently serializing XML Infosets (see [XML InfoSet]) that have certain types of content.

A XOP package is created by placing a serialization of the XML Infoset inside of an extensible packaging format (such a MIME Multipart/Related, see [RFC 2387]). Then, selected portions of its content that are base64-encoded binary data are re-encoded (i.e., the data is decoded from base64) and placed into the package. The locations of those selected portions are marked in the XML with a special element that links to the packaged data using URIs.

In a number of important XOP applications, binary data need never be encoded in base64 form. If the data to be included is already available as a binary octet stream, then either an application or other software acting on its behalf can directly copy that data into a XOP package, at the same time preparing suitable linking elements for use in the root part; when parsing a XOP package, the binary data can be made available directly to applications, or, if appropriate, the base64 binary character representation can be computed from the binary data.

However, at the conceptual level, this binary data can be thought of as being base64-encoded in the XML Document. As this conceptual form might be needed during some processing of the XML Document (e.g., for signing the XML document), it is necessary to have a one to one correspondence between XML Infosets and XOP Packages. Therefore, the conceptual representation of such binary data is as if it were base64-encoded, using the canonical lexical form of XML Schema base64Binary datatype (see [XML Schema Part 2] 3.2.16 base64Binary). In the reverse direction, XOP is capable of optimizing only base64-encoded Infoset data that is in the canonical lexical form.

As a result, only element content can be optimized; attributes, non-base64-compatible character data, and data not in the canonical representation of the base64Binary datatype cannot be successfully optimized by XOP.

The remainder of this specification is organized in the following fashion:

Section Two describes the XOP Infoset, which preserves the non-optimized content and structure of the original XML Infoset.
Section Three specifies XOP's processing model.
Section Four of this specification describes the form of the XOP Package.
Section Five describes how XOP Documents are identified.
Section Six explores the security considerations of using the XOP convention.

1.1 Terminology

This specification uses terminology from the XML Infoset (see [XML InfoSet]) when discussing XML content and structure. This is only a convention for clear specification of XOP's behavior.

The following terms are used in this specification:

Original XML Infoset - An XML Infoset (see [XML InfoSet]) to be optimized.
Optimized Content - Content which has been removed from the XML Infoset.
XOP Infoset - The Original Infoset with any Extracted Content removed and replaced by xop:Includeelement information items.
XOP Document - A serialization of the XOP Infoset using any W3C recommendation-level version of XML.
XOP Package - A package containing the XOP Document and any Extracted Content. As a whole, the XOP Package is an alternate serialization of the Original Infoset.
Reconstituted XML Infoset - An XML Infoset that has been constructed from the parts of a XOP Package.

1.2 Example

Example shows an XML Infoset prior to XOP processing. Example shows the same Infoset, serialized using the XOP format in a MIME Multipart/Related package. The base64-encoded content of the m:photo and m:sig elements have been replaced by a xop:Include element, while the binary octets have been serialized in separate MIME parts. Note that those examples uses [media-type] to identify the media type of the content of the m:photo and m:sig elements.

Example: XML Infoset prior to XOP processing

<soap:Envelope
    xmlns:soap='http://www.w3.org/2003/05/soap-envelope' 
    xmlns:xop='http://www.w3.org/2003/12/xop/include' 
    xmlns:xmlmime='http://www.w3.org/2004/06/xmlmime'>
  <soap:Body>
    <m:data xmlns:m='http://example.org/stuff'>
      <m:photo xmlmime:content-type='image/png'>
        /aWKKapGGyQ=
      </m:photo>
      <m:sig xmlmime:content-type='application/pkcs7-signature'>
        Faa7vROi2VQ=
      </m:sig>
    </m:data>
  </soap:Body>
</soap:Envelope>

Example: XML Infoset serialized as a XOP package

MIME-Version: 1.0
Content-Type: Multipart/Related;boundary=MIME_boundary;
	      type=text/xml;start="<mymessage.xml@example.org>"
Content-Description: An XML document with my picture and signature in it

--MIME_boundary
Content-Type: text/xml; charset=UTF-8
Content-Transfer-Encoding: 8bit
Content-ID: <mymessage.xml@example.org>

<soap:Envelope
    xmlns:soap='http://www.w3.org/2003/05/soap-envelope'
    xmlns:xop='http://www.w3.org/2003/12/xop/include'
    xmlns:xmlmime='http://www.w3.org/2004/06/xmlmime'>
  <soap:Body>
    <m:data xmlns:m='http://example.org/stuff'>
      <m:photo xmlmime:content-type='image/png'>
        <xop:Include href='cid:http://example.org/me.png'/>
      </m:photo>
      <m:sig xmlmime:content-type='application/pkcs7-signature'>
        <xop:Include href='cid:http://example.org/my.hsh'/>
      </m:sig>
    </m:data>
  </soap:Body>
</soap:Envelope>

--MIME_boundary
Content-Type: image/png
Content-Transfer-Encoding: binary
Content-ID: <http://example.org/me.png>

// binary octets for png

--MIME_boundary
Content-Type: application/pkcs7-signature
Content-Transfer-Encoding: binary
Content-ID: <http://example.org/my.hsh>

// binary octets for signature

--MIME_boundary--

Example shows another serialization of the same XML Infoset, when it happens that the photo contained in the m:photo element is available on the web. Taking advantage of this, the serialization uses a Content-Location header instead of a Content-ID header to identify the MIME part containing the photo.

Example: XML Infoset serialized as a XOP package, using Content-Location

MIME-Version: 1.0
Content-Type: Multipart/Related;boundary=MIME_boundary;
	      type=text/xml;start="<mymessage.xml@example.org>"
Content-Description: An XML document with my picture and signature in it

--MIME_boundary
Content-Type: text/xml; charset=UTF-8
Content-Transfer-Encoding: 8bit
Content-ID: <mymessage.xml@example.org>

<soap:Envelope
    xmlns:soap='http://www.w3.org/2003/05/soap-envelope'
    xmlns:xop='http://www.w3.org/2003/12/xop/include'
    xmlns:xmlmime='http://www.w3.org/2004/06/xmlmime'>
  <soap:Body>
    <m:data xmlns:m='http://example.org/stuff'>
      <m:photo xmlmime:content-type='image/png'>
        <xop:Include href='http://example.org/me.png'/>
      </m:photo>
      <m:sig xmlmime:content-type='application/pkcs7-signature'>
        <xop:Include href='cid:http://example.org/my.hsh'/>
      </m:sig>
    </m:data>
  </soap:Body>
</soap:Envelope>

--MIME_boundary
Content-Type: image/png
Content-Transfer-Encoding: binary
Content-Location: http://example.org/me.png

// binary octets for png

--MIME_boundary
Content-Type: application/pkcs7-signature
Content-Transfer-Encoding: binary
Content-ID: <http://example.org/my.hsh>

// binary octets for signature

--MIME_boundary--

1.3 Notational Conventions

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC 2119].

This specification uses a number of namespace prefixes throughout; they are listed below. Note that the choice of any namespace prefix is arbitrary and not semantically significant.

Prefixes and Namespaces used in this specification.
Prefix	Namespace
Prefix	Notes
xop	"http://www.w3.org/2003/12/xop/include"
xop	A non-normative XML Schema [XML Schema Part 1], [XML Schema Part 2] document for the "http://www.w3.org/2003/12/xop/include" namespace and for [XML 1.0] can be found at http://www.w3.org/2000/xp/Group/3/06/Attachments/include.xsd.
xmlmime	"http://www.w3.org/2004/06/xmlmime"
xmlmime	The namespace for the content type attribute.
xs	"http://www.w3.org/2001/XMLSchema"
xs	The namespace of XML Schema data types [XML Schema Part 2].

Editorial note: HR
Note that the "http://www.w3.org/2004/06/xmlmime" URI is not final and will be changed.

2 XOP Infosets Constructs

XOP operates by extracting the Optimized Content from the Original Infoset to create the XOP Infoset. In particular, the character information item children of element information items to be optimized are removed and replaced with an element information item named xop:Include. The xop:Include element information item contains an attribute information item with a link to the part of the XOP Package that carries a binary representation of the data removed from the original element information item. Details of the construction and processing of XOP serializations are provided in 3 XOP's Processing Model.

The Infoset used as input to XOP processing MUST NOT contain any element information item with a [namespace name] property of "http://www.w3.org/2003/12/xop/include" and a [local name] property of Include. Infosets containing such element information items cannot be serialized using XOP.

The following subsections provide formal definitions for the element information item and attribute information items used to construct a XOP serialization. A non-normative XML Schema for [XML 1.0] serializations of those element information item and attribute information items can be found at http://www.w3.org/2000/xp/Group/3/06/Attachments/include.xsd.

2.1 `xop:Include` element information item

The xop:Include element information item has the following constraints on its properties:

Its [local name] MUST be Include.
Its [namespace name] MUST be "http://www.w3.org/2003/12/xop/include".
There MAY be zero or more namespace qualified element information items in its [children] property. Any such element information item MUST have a [namespace name] ant that [namespace name] MUST NOT be "http://www.w3.org/2003/12/xop/include". It MUST NOT change the semantics of processing the xop:Includeelement information items and MUST be ignored if it is not recognized.
There MAY be one or more attribute information items in its [attributes] property:
- A mandatory hrefattribute information items (see 2.2 href attribute information item).
- Other namespace qualified attribute information items; these MUST have a [namespace name], the value of which MUST NOT be "http://www.w3.org/2003/12/xop/include". Any such attribute information items MUST NOT change the semantics of processing the xop:Includeelement information items and MUST be ignored if not recognized.

2.2 `href` attribute information item

The href attribute information item has the following constraints on its properties:

Its [local name] MUST be href.
Its [namespace name] MUST be empty.
Its [normalized value] MUST be a representation of a URI (see [RFC 2396] as amended by [RFC2732]) referencing the part of the package containing the data logically included by the [owner element] (i.e., the xop:Includeelement information item). The [normalized value] MUST be a valid lexical form of the XML Schema xs:anyURI datatype (see [XML Schema Part 2]3.2.17 anyURI).
Its [owner element] MUST be the xop:Includeelement information item containing the attribute information item.

3 XOP's Processing Model

This section describes the processing model for creating XOP Packages and interpreting XOP Packages. Unless otherwise stated, the result of such processing MUST be semantically equivalent to performing the specified steps separately, and in the order given.

3.1 Creating XOP Packages

To create a XOP Package from an Original XML Infoset:

Ensure that the Original XML Infoset contains no element information item with a [namespace name] of "http://www.w3.org/2003/12/xop/include" and a [local name] of Include. As discussed in 2 XOP Infosets Constructs, XML Infosets with such element information items cannot be represented using XOP.
Create an empty package.
Identify within the Original XML Infoset the element information items to be optimized. To be optimized, the characters comprising the [children] of the element information item MUST be in the canonical form of xs:base64Binary (see [XML Schema Part 2]3.2.16 base64Binary) and MUST not contain any whitespace characters, preceding, inline with or following the non-whitespace content.
Create a XOP Infoset which is a copy of the Original XML Infoset, but with the [children] of each element information item identified in the previous step replaced by a xop:Includeelement information item (see 2.1 xop:Include element information item) constructed as follows:
1. Transform the replaced characters into binary data by processing them as base64-encoded data.
2. Serialize the binary data into a new part of the package, with appropriate metadata corresponding to the [normalized value] of the hrefattribute information item of the xop:Includeelement information item (see 2.2 href attribute information item).
3. If the element information item being optimized (i.e., the [parent] of the newly inserted xop:Includeelement information item) has a xmlmime:content-typeattribute information item, its value SHOULD be reflected appropriately in the part's metadata.
Serialize the resulting XOP Infoset into the package using any W3C recommendation-level version of XML (e.g., [XML 1.0], [XML 1.1]) and identify it as the root part according to the packaging mechanism's convention, labeling it with the appropriate media type, as described in 5 Identifying XOP Documents.

Additional parts MAY be added to the package to satisfy application specific requirements. Other content-specific metadata MAY be reflected in the packaging metadata as appropriate.

If content cannot be successfully encoded into the XOP Infoset, implementations SHOULD behave as if that portion of the Original XML Infoset was not nominated for optimization.

3.2 Interpreting XOP Packages

This section specifies the means by which the Original XML Infoset can be reconstructed from a XOP Package that has been prepared according to the rules of 3.1 Creating XOP Packages.

Note: conventions or error reporting mechanisms to be used in processing packages that incorrectly purport to be XOP Packages are beyond the scope of this specification.

To create a Reconstituted XML Infoset from a XOP Package:

Construct an XML Infoset by parsing the root part of the package as an XML document. The document MUST be parsed according to the level of the XML Recommendation identified by the XML declaration of that document. If no XML declaration is present, then the document MUST be parsed per [XML 1.0].
Using that XML Infoset, for each element information item which has as its [children] a xop:Includeelement information item (as defined in 2.1 xop:Include element information item):
1. Locate the part of the package corresponding to the URI in the xop:Include's hrefattribute information item (i.e., corresponding to the URI encoded in the attribute information item's [normalized value]).
2. Replace the element information item's [children] with a character information items representing the canonical base64 encoding of the entity body of the identified package part (i.e., effectively replace the xop:Includeelement information item with the data reconstructed from the package part).

4 XOP Packages

XOP is capable of using a variety of underlying packaging mechanisms. Such packaging mechanisms MUST be able to represent, with full fidelity all the parts created according to 3 XOP's Processing Model (see 3.1 Creating XOP Packages).

The subsection below specifies normatively how a particular packaging mechanism, MIME Multipart/Related, is used, but does not preclude the use of other packaging mechanisms with the XOP convention.

4.1 MIME Multipart/Related XOP Packages

This section describes how MIME Multipart/Related packaging (as specified in [RFC 2387]) is used with XOP.

The root MIME part is the root part of the package, and MUST be a serialization of the XOP Infoset using any W3C recommendation-level version of XML (e.g., [XML 1.0], [XML 1.1]), as defined below. This root MIME part MUST be identified with a media type that is specific to the XOP encoding of the Original XML Infoset's media type, as described in 5 Identifying XOP Documents. This media type MAY specify which version of XML was used for the serialization of the XOP Infoset.

Except for purposes of determining the root MIME part, as specified by [RFC 2387], ordering of MIME parts MUST NOT be considered significant to XOP processing or to the construction of the XOP Infoset.

Part metadata is reflected in MIME header fields. Specifically, if the URI used in the value of a xop:Include element's href attribute has a 'cid' scheme, the corresponding MIME part's Content-ID header field (see [RFC 2387] and [RFC 2392]) MUST have a corresponding field-value. Otherwise, the MIME part's Content-Location header field (see [RFC 2557]) MUST have a field-value identical to the URI in the value of the href attribute.

Furthermore, if a xmlmime:content-type attribute information item is found (as described in 3 XOP's Processing Model), it SHOULD be reflected in the MIME Content-Type header's field-value.

5 Identifying XOP Documents

XOP Packages can use any of a variety of packaging mechanisms, and furthermore can serialize any number of XML formats. To enable applications to identify the use of XOP as well as the original Infoset's media type, a distinct media type MUST be registered for each application of XOP to a particular XML application vocabulary. This media type will be used to label the XOP package's root part, while the package's media type itself is unchanged. Formats MAY also restrict which version of XML the media type allows to be used in the XOP Infoset.

For example, if the format identified by "application/soap+xml" is to be packaged as XOP serializations, then a XOP-specific media type (e.g., "application/soap_xop+xml") MUST be registered. A XOP Package using the Multipart/Related packaging mechanism and serializing such an Infoset would have a package media type of "multipart/related" and a root media type of "application/soap_xop+xml".

Processors wishing to identify the use of XOP or the media type of the resulting Output Infoset MAY examine the root media type of the package. However they MAY also rely on other mechanism for doing this identification.

Note that there is currently no convention for structuring the names of XOP-based media types.

6 Security Considerations

6.1 XOP Package Integrity

The integrity of Infosets optimized using XOP may need to be ensured. As XOP packages can be transformed to recover such Infosets (see 3.2 Interpreting XOP Packages, existing XML Digital Signature techniques can be used to protect them. Note, however, that a signature over the Infoset does not necessarily protect against modifications of other aspects of the XOP packaging; for example, an Infoset signature check might not protect against re-ordering of non-root parts.

6.2 XOP Package Confidentiality

The confidentiality of XOP Packages may need to be ensured. As such packages can be transformed to an XML Information Set, existing XML Encryption (see [XML Encryption]) techniques can be used to protect such packages. Any part of a package can be encrypted, whether it includes base64 characters or not. The resulting CipherData element information item can then be optimized because the content of such an element information item is base64 characters.

In future a transform algorithm for use with XML Encryption could provide a more efficient processing model where the raw octets are encrypted directly.

XML-binary Optimized Packaging

W3C Working Draft 8 June 2004

Abstract

Status of this Document

Table of Contents

Appendices

1 Introduction

1.1 Terminology

1.2 Example

1.3 Notational Conventions

2 XOP Infosets Constructs

2.1 `xop:Include` element information item

2.2 `href` attribute information item

3 XOP's Processing Model

3.1 Creating XOP Packages

3.2 Interpreting XOP Packages

4 XOP Packages

4.1 MIME Multipart/Related XOP Packages

5 Identifying XOP Documents

6 Security Considerations

6.1 XOP Package Integrity

6.2 XOP Package Confidentiality

A Relationship to other specifications

A.1 Dependencies

A.2 Payload

A.3 Extension

B References

B.1 Normative References

B.2 Informative References