Xerox Document Services Document Model

1 Introduction

1.1 Background

While electronic documents are a dominant and increasing part of the business world, paper documents have not disappeared, and are in fact increasing in absolute terms. Paper documents offer significant advantages in reading, understanding, and some kinds of editing, under certain legal conditions, and in the many other affordances of paper, yet electronic documents are undeniably the coin of the realm in today's business world. Increasingly, business is dealing with compound documents containing both paper and electronic forms. Storing copies of paper documents electronically gives the best of both worlds: the full ease of electronic document operations can be applied to paper documents, while paper access to electronic documents remains available when necessary and valuable. Capturing paper documents in a usable electronic form, and being able to print, copy, or otherwise operate on them from a desktop computer or a networked document appliance, is of great importance to business.

Documents are most valuable when they are situated, and can serve as memory association triggers. Many paper documents are situated by physical filing systems and human spatial memory, and not all of the information about the documents (meta-data) is in a form that is easily captured electronically. Electronic documents are usually situated by use of meta-data, or named properties of documents. Scanned electronic documents rapidly lose their value if the association between the document identity, document content, and document meta-data is lost.

Capturing paper documents has traditionally been an expensive business process. First, documents are scanned and saved to removable media or an extranet, then checked for quality and possibly re-scanned, then sent to a "coding bureau" to have meta-data typed in and associated, and finally shipped to a document repository. The total cost of this cycle is tremendous, as information present at each step is lost before the next and must be recreated at great cost. In summary, capturing meta-data for scanned documents is labor intensive, and is best done closest to the source of the documents, as it can be expensive to recover the meta-data at a later date, or by someone other than the document's owner.

Document Services re-envisions the paper-electronic boundary, and uses a capture technology that associates the meta-data with the document as soon as possible, when the document is still situated, and gives immediate feedback about the document quality, thus reducing the cost of both the capture and QA steps of document capture.

A typical paper business document achieves importance by being in the hands of a knowledge worker, who not only knows the value of the document, but also knows the context. Thus, he or she is an ideal person to capture the meta-data associated with the value and context of the document, and to approve the quality of its capture. Unfortunately, traditional document capture technologies are cumbersome and time-consuming, so it is not cost-effective to pay knowledge workers to handle their own documents. A Document Services-based system aims to reduce the cost of handling and capturing documents to produce rich repositories of electronic knowledge, at low cost, by integrating the handling of paper and electronic documents into the normal work practice of knowledge workers, with operations that are defined in their terms, rather than focused on traditional scanning procedures.

This specification defines one important part of a compound paper-electronic document processing system, the Xerox Document Services Document Model (XDSDM), which is an XML instance document modeling the documents under processing and holding their rendition and meta-data information. Other key components of a document processing system are mentioned here but are beyond the scope of this paper, notably a Document Service Orchestrator, which accepts a workflow definition XML document describing a process for performing document services. Document Services include capturing a document, adding meta-data to it, performing quality assurance, applying transformations such as OCR to both renditions and meta-data, storing the document in a document repository, and dispatching the document to a target such as a printer or e-mail address.

Manipulation of the XDSDM by document services is done through XPath expressions (see [XPath 1.0]), as is done in XForms (see [XForms 1.0]). In fact, the XDSDM itself is similar to the instance document in XForms, but instead of being modified through user interface controls, it is modified by document services.

Document Services that operate on the XDSDM can retrieve documents, metadata, and renditions from the model by using XPath expressions, and can update the model to add or change renditions or metadata through the use of XML Events (see [XML Events]) with DOM mutation actions, whose location is specified by XPath expressions and whose contents are described in the event payload. Unfortunately, the XML Events specification does not specify action handlers, and so the DOM mutation handlers are presently implemented as shorthand for an XSLT transformation (see [XSLT 1.0]): the action handler body is transformed into an XSLT transformation, which is then applied to the identified element document in the XDSDM, with the event payload available as the result of an XSLT extension function in XPath expressions.
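
As a rough illustration of the kind of XSLT transformation such a mutation handler might expand to, the following sketch copies the model unchanged except for one rendition, on which it sets the attribute src. It is a minimal sketch under stated assumptions, not the transformation actually generated by the implementation: the parameter newSrc is a hypothetical stand-in for the event payload (which the paper says is exposed through an extension function), and the id value and file URI are borrowed from the example in Appendix C.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:doc="http://www.example.com/document">

  <!-- Hypothetical stand-in for the event payload: the URI of the new content -->
  <xsl:param name="newSrc" select="'file://localhost/documents/273ffde.txt'"/>

  <!-- Identity rule: copy every node and attribute that no other rule matches -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Mutation: the XPath pattern locates the rendition to update -->
  <xsl:template match="doc:rendition[@id='ocr-rendition']">
    <xsl:copy>
      <!-- Keep the existing attributes, then set (or overwrite) src -->
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="src">
        <xsl:value-of select="$newSrc"/>
      </xsl:attribute>
      <!-- Keep any existing child elements, such as rendition information -->
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The identity rule plus one overriding template is a common XSLT 1.0 idiom for a localized change; it keeps the rest of the model, including foreign elements and attributes, intact.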

Issue (issue-xml-event-handlers):

XML Handlers

A recommendation for DOM Mutation and scripting in XML Event handlers would be most welcome.

The document content itself is not stored in the model, but is referred to by URI (see [RFC 2396]), and is compatible with the XForms 1.0 element upload and the XForms 1.0 submission method multipart-related serialization.

1.2 Documentation Conventions

Throughout this document, the following namespace prefixes and corresponding namespace identifiers are used:

doc: The Document Services Document Model namespace (http://www.example.com/document); see A.1 Schema for Document Model
ri: The Document Services Rendition Information namespace (http://www.example.com/rendition-info); see A.2 Schema for Rendition Information
xsd: The XML Schema namespace (http://www.w3.org/2001/XMLSchema); see [XML Schema part 1]
xsi: The XML Schema for instances namespace (http://www.w3.org/2001/XMLSchema-instance); see [XML Schema part 1]
my: Any user-defined namespace

2 Document Structure

The XDSDM is derived from the document model of [System 33], in which a document is separated into a triple:

an identity with a unique handle
a set of parallel renditions representing the content of the document, each rendition having a series of named properties
a set of named metadata items

In XDSDM, each of the items in this triple is represented by an element: the identity by the element document, the renditions by a sequence of elements rendition, and the metadata items by a containing element metadata. The content of the renditions themselves is not stored in the model, but is referenced by the attribute src on rendition.

An XDSDM element documents contains zero or more documents, each of which can have zero or more renditions (content), and zero or more pieces of metadata. XML Schema descriptions for the XDSDM instance, document, rendition, and metadata structures are given. These schemas use XML namespaces for extensibility. Other XML applications such as [Guidelines for implementing Dublin Core in XML] are used where appropriate.
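
The structure described above can be sketched as a small instance. This is an illustrative fragment only: the id values are made up for the example, the title and file URI are borrowed from Appendix C, and the id attributes on metadata and renditions are included because the example schema in A.1 marks them as required.

```xml
<doc:documents xmlns:doc="http://www.example.com/document"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Identity: one document with a handle unique within this model -->
  <doc:document id="doc-1">
    <!-- Named metadata items -->
    <doc:metadata id="doc-1-metadata">
      <dc:title>Purchase Order</dc:title>
    </doc:metadata>
    <!-- Parallel renditions; the content is referenced by URI, not embedded -->
    <doc:renditions id="doc-1-renditions">
      <doc:rendition id="doc-1-scan"
                     src="file://localhost/documents/03fbd3c8.tiff"/>
    </doc:renditions>
  </doc:document>
</doc:documents>
```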

2.1 Common Attributes

2.1.1 Attribute `doc:id`

Attribute doc:id is common to most elements in this proposal; however, the use of multiple namespaces complicates the question of which namespace should declare attribute id.

Issue (issue-id-attribute):

xml:id

An attribute xml:id added to the XML namespace would simplify matters greatly for XML applications using containing languages and multiple namespaces.

Foreign attributes are generally allowed, and services may ignore them.

2.1.2 Attribute `doc:mustUnderstand`

Services must process all elements and attributes in the following namespaces:

http://www.example.com/document
http://www.example.com/rendition-info
http://purl.org/dc/elements/1.1/

The attribute doc:mustUnderstand is used on any child element of metadata or rendition to indicate that any service processing the document must understand that element, and must not process the document if it does not. This concept is borrowed from [SOAP 1.2] and [XForms 1.0].
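
A hedged sketch of how this might look in an instance follows. The paper does not state the value space of doc:mustUnderstand, so the value "true" is an assumption modeled on SOAP 1.2; the my namespace URI and both my elements are hypothetical, and the example schema in A.1 does not itself declare the attribute.

```xml
<doc:metadata id="doc-1-metadata"
              xmlns:doc="http://www.example.com/document"
              xmlns:my="http://www.example.com/my">
  <!-- A service that does not understand my:retentionPolicy
       must not process the document -->
  <my:retentionPolicy doc:mustUnderstand="true">seven-years</my:retentionPolicy>
  <!-- No doc:mustUnderstand here: a service may ignore this element -->
  <my:internalNote>Scanned at the front desk</my:internalNote>
</doc:metadata>
```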

Issue (issue-mustUnderstand-attribute):

mustUnderstand

A common namespace for this concept would be beneficial to producers and services of loosely coupled multiple-namespace documents.

2.2 Elements related to documents

2.2.1 Element `documents`

XDSDM provides a containing element documents which holds a sequence of elements document.

Foreign attributes are allowed, and services may ignore them. Foreign elements are allowed, and processing is subject to 2.1.2 Attribute doc:mustUnderstand.

2.2.2 Element `document`

In XDSDM, the document identity is provided by element document, with the unique handle provided by attribute id, which is unique only to the particular model. The model is composed of an XML document containing a sequence of zero or more elements document.

Foreign attributes are allowed, and services may ignore them. Foreign elements are allowed, and processing is subject to 2.1.2 Attribute doc:mustUnderstand.

Issue (issue-foreign-elements-ordering):

Foreign Elements and Ordering

We would like element document to contain at most one element renditions and at most one element metadata, but any number of foreign elements. It is difficult to express the unordered choice of zero or one of these specified elements and at the same time allow any number of unordered foreign elements. This problem puts uncomfortable constraints on documents with multiple XML applications.

2.3 Elements related to renditions

2.3.1 Element `doc:renditions`

The element doc:renditions serves as a containing element for elements doc:rendition.

Foreign attributes are allowed, and services may ignore them. Foreign elements are allowed, and processing is subject to 2.1.2 Attribute doc:mustUnderstand.

2.3.2 Element `doc:rendition`

A document can have zero or more child elements doc:rendition. Each rendition is a whole rendition of the document, though the content type, quality, fidelity, and other attributes of the rendition may vary.

Foreign attributes are allowed, and services may ignore them. Foreign elements are allowed, and processing is subject to 2.1.2 Attribute doc:mustUnderstand.

2.3.3 Element `doc:renditionSequence`

Some documents are composed of an ordered sequence of renditions; for example, a document consisting of a scanned TIFF file [TIFF 6.0] followed by a PDF file [PDF 3.0] would have a doc:rendition containing a doc:renditionSequence containing a sequence of two doc:rendition elements.
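
The TIFF-then-PDF case might be written as follows. The id values and file URIs are invented for illustration; the nesting follows the description above and the example schema in A.1.

```xml
<doc:renditions id="doc-2-renditions"
                xmlns:doc="http://www.example.com/document"
                xmlns:ri="http://www.example.com/rendition-info">
  <!-- One whole rendition of the document, made of two ordered parts -->
  <doc:rendition id="doc-2-compound">
    <doc:renditionSequence id="doc-2-sequence">
      <!-- First part: the scanned pages -->
      <doc:rendition id="doc-2-part-1"
                     src="file://localhost/documents/cover.tiff">
        <ri:contentType>image/tiff</ri:contentType>
      </doc:rendition>
      <!-- Second part: the PDF that follows the scanned pages -->
      <doc:rendition id="doc-2-part-2"
                     src="file://localhost/documents/body.pdf">
        <ri:contentType>application/pdf</ri:contentType>
      </doc:rendition>
    </doc:renditionSequence>
  </doc:rendition>
</doc:renditions>
```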

Foreign attributes are allowed, and services may ignore them. Foreign elements are allowed, and processing is subject to 2.1.2 Attribute doc:mustUnderstand.

2.3.4 Namespace http://www.example.com/rendition-info

While A.1 Schema for Document Model provides basic information about the existence of renditions and the location of their content, it provides no information about the rendition itself. Elements from any namespace are allowed as children of doc:rendition, but for interoperability, this paper proposes a canonical set of rendition information elements in the namespace http://www.example.com/rendition-info.

While all renditions of a document are in some sense equivalent, they do have different properties; for example, an original scanned image will have near 100% fidelity to the paper document, but an OCR'd version of the document as plain text would have a low fidelity, perhaps 10%, and an uncorrected accuracy of perhaps 85%. The Schema for these and other common properties of renditions is given in A.2 Schema for Rendition Information.

2.4 Elements related to Meta Data

2.4.1 Element `doc:metadata`

The element doc:metadata specifies a sequence of any items in any other namespace. It is up to the application using the document model to place constraints on the type of metadata to be gathered; however, see 2.4.2 Dublin Core Elements.

Foreign attributes are allowed, and services may ignore them. Foreign elements are allowed, and processing is subject to 2.1.2 Attribute doc:mustUnderstand.

2.4.2 Dublin Core Elements

The [Guidelines for implementing Dublin Core in XML] specify an embedding of Dublin Core elements in XML, and it is proposed that Dublin Core metadata items be used where practical. Services must understand these elements, and must not process a document if not.

3 Glossary Of Terms

document service: [Definition: A generic term for systems and services that process documents, but in this paper used specifically to refer to services on scanned image documents and their derivatives. Services include document capture, transformation, and distribution. ]
document services document model: [Definition: A Document Services Document Model is a single element documents which serves as a container for a series of documents.]
document services-orchestrator: [Definition: A processor designed to apply a sequence of document services to a document.]
OCR: [Definition: A class of document service that accepts a rendition of mediaType image/* and produces a new rendition of type text/* (or similar coded type), optionally also producing new metadata.]
document repository: [Definition: A document storage and retrieval facility, such as a file server, web server, or other system.]
situated: [Definition: Situated documents obtain meaning from physical context. ]
target: [Definition: A destination for a document, such as a document repository or a printer.]
meta-data: [Definition: Data about a document, separate from its content; for example, the type of document (contract, letter, newspaper clipping) is meta-data.]
document: [Definition: In this paper, "document" refers to a scanned image document or a coded document derived from one.]
rendition: [Definition: A rendition of a document is a reference to the content of the document, as distinct from the location or identity of the document, or its meta-data. Documents can have multiple renditions, each with different properties; for example, there may be both an image and a text rendition of a document.]

A Schemas for Xerox Document Service Document Model

The example XML Schemas for XDSDM and related Rendition Information and Meta-Data namespaces are below:

A.1 Schema for Document Model

This is the XML Schema for the Document Model

<xs:schema xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:doc="http://www.example.com/document" targetNamespace="http://www.example.com/document" elementFormDefault="qualified">

  <xs:import namespace="http://purl.org/dc/elements/1.1/" schemaLocation="http://dublincore.org/schemas/xmls/qdc/2003/04/02/dc.xsd" />

  
  <xs:attributeGroup name="Common.Attributes">
    <xs:anyAttribute namespace="##other" />
  </xs:attributeGroup>

  
  <xs:element name="documents" type="doc:documentsType" />
  <xs:element name="document" type="doc:documentType" />
  <xs:element name="metadata" type="doc:metadataType" />
  <xs:element name="rendition" type="doc:renditionType" />
  <xs:element name="renditions" type="doc:renditionsType" />
  <xs:element name="renditionSequence" type="doc:renditionSequenceType" />

  
  <xs:complexType name="documentsType">
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:choice>
        <xs:element ref="doc:document" />
        <xs:any namespace="##other" />
      </xs:choice>
    </xs:sequence>
    <xs:attributeGroup ref="doc:Common.Attributes" />
    <xs:attribute name="id" type="xs:ID" use="optional" />
  </xs:complexType>

  <xs:complexType name="documentType">
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:choice>
        
        <xs:element ref="doc:renditions" />
        
        <xs:element ref="doc:metadata" />
        <xs:any namespace="##other" />
      </xs:choice>
    </xs:sequence>
    <xs:attributeGroup ref="doc:Common.Attributes" />
    
    <xs:attribute name="id" type="xs:ID" use="required" />
  </xs:complexType>

  <xs:complexType name="metadataType">
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:choice>
        <xs:any namespace="http://purl.org/dc/elements/1.1/" />
        <xs:any namespace="##other" />
      </xs:choice>
    </xs:sequence>
    <xs:attribute name="document" type="xs:IDREF" use="optional" />
    <xs:attribute name="id" type="xs:ID" use="required" />
  </xs:complexType>

  
  <xs:complexType name="renditionsType">
    <xs:sequence>
      <xs:element ref="doc:rendition" minOccurs="0" maxOccurs="unbounded" />
      <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
    <xs:attributeGroup ref="doc:Common.Attributes" />
    <xs:attribute name="document" type="xs:IDREF" use="optional" />
    <xs:attribute name="id" type="xs:ID" use="required" />
  </xs:complexType>

  
  <xs:complexType name="renditionSequenceType">
    <xs:sequence>
      <xs:element ref="doc:rendition" minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
    <xs:attributeGroup ref="doc:Common.Attributes" />
    <xs:attribute name="document" type="xs:IDREF" use="optional" />
    <xs:attribute name="id" type="xs:ID" use="required" />
  </xs:complexType>

  <xs:complexType name="renditionType">
    <xs:sequence>
      <xs:element ref="doc:renditionSequence" minOccurs="0" maxOccurs="unbounded" />
      <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
    <xs:attribute name="src" type="xs:anyURI" use="optional" />
    <xs:attribute name="document" type="xs:IDREF" use="optional" />
    <xs:attribute name="id" type="xs:ID" use="optional" />
    <xs:attributeGroup ref="doc:Common.Attributes" />
  </xs:complexType>

</xs:schema>

A.2 Schema for Rendition Information

This is the XML Schema for Rendition Information. Rendition Information is a common set of rendition properties that are expected to be understood by all services, but are not the exclusive set of properties.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ri="http://www.example.com/rendition-info" targetNamespace="http://www.example.com/rendition-info" elementFormDefault="qualified">
  
  
  <xs:element name="language" type="ri:languageType" />
  <xs:element name="typesetting" type="ri:typesettingType" />
  <xs:element name="filename" type="xs:string" />
  <xs:element name="xresolution" type="xs:decimal" />
  <xs:element name="yresolution" type="xs:decimal" />
  
  <xs:element name="fidelity" type="xs:decimal" />
  <xs:element name="accuracy" type="xs:decimal" />
  <xs:element name="contentType" type="xs:string" />
  <xs:element name="contentLength" type="xs:nonNegativeInteger" />
  <xs:element name="pageCount" type="xs:nonNegativeInteger" />
  
  <xs:simpleType name="languageType">
    <xs:restriction base="xs:string" />
  </xs:simpleType>
  <xs:simpleType name="typesettingType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="printed" />
      <xs:enumeration value="dotMatrix24" />
      <xs:enumeration value="dotMatrix9" />
      <xs:enumeration value="handPrint" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

B References

B.1 Normative References

XForms 1.0: XForms 1.0, M. Dubinko, et al., 2003. W3C Recommendation available at http://www.w3.org/TR/xforms/ .
RFC 2396: RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter, 1998. Available at http://www.ietf.org/rfc/rfc2396.txt.
XHTML Modularization: Modularization of XHTML, M. Altheim, et al., 2001. W3C Recommendation available at http://www.w3.org/TR/xhtml-modularization/ .
XML Base: XML Base, Jonathan Marsh, 2001. W3C Recommendation available at http://www.w3.org/TR/xmlbase/ .
XML 1.0: Extensible Markup Language (XML) 1.0 (Second Edition), Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, 2000. W3C Recommendation available at http://www.w3.org/TR/REC-xml
XML Names: Namespaces in XML, Tim Bray, Dave Hollander, Andrew Layman, 1999. W3C Recommendation available at http://www.w3.org/TR/REC-xml-names .
SOAP 1.2: SOAP Version 1.2 Part 0: Primer, Nilo Mitra, 2003. W3C Recommendation available at http://www.w3.org/TR/soap12-part0/ .
XPath 1.0: XML Path Language (XPath) Version 1.0, James Clark, Steve DeRose, 1999. W3C Recommendation available at http://www.w3.org/TR/xpath .
XSLT 1.0: XSL Transformations (XSLT) Version 1.0, James Clark, 1999. W3C Recommendation available at http://www.w3.org/TR/xslt .
XML Schema part 1: XML Schema Part 1: Structures, Henry S. Thompson, David Beech, Murray Maloney, Noah Mendelsohn, 2001. W3C Recommendation available at http://www.w3.org/TR/xmlschema-1/ .
XML Schema part 2: XML Schema Part 2: Datatypes, Paul V. Biron, Ashok Malhotra, 2001. W3C Recommendation available at http://www.w3.org/TR/xmlschema-2/ .

B.2 Informative References

XML Events: XML Events - An events syntax for XML, Steven Pemberton, T. V. Raman, Shane P. McCarron, 2003. W3C Recommendation available at http://www.w3.org/TR/xml-events/ .
XHTML 1.0: XHTML 1.0: The Extensible HyperText Markup Language - A Reformulation of HTML 4 in XML 1.0, Steven Pemberton, et al., 2000. W3C Recommendation available at http://www.w3.org/TR/xhtml1 .
XML Schema part 0: XML Schema Part 0: Primer, David C. Fallside, 2001. W3C Recommendation available at http://www.w3.org/TR/xmlschema-0/ .
System 33: Design and Implementation of the System 33 Document Service, Putz, Steve, 1993. Xerox PARC P93-00112. Available at http://wwww.parc.com/about/history/publications/bw-ps/system33.ps .
Guidelines for implementing Dublin Core in XML: Guidelines for implementing Dublin Core in XML, Powell, Andy, et al. Available at http://dublincore.org/documents/dc-xml-guidelines/ .
The W3C Workshop on Web Applications and Compound Documents: The W3C Workshop on Web Applications and Compound Documents. Available at http://www.w3.org/2004/04/webapps-cdf-ws/ .
TIFF 6.0: TIFF 6.0, Adobe Systems, Incorporated, 1992. Available at http://partners.adobe.com/asn/developer/pdfs/tn/TIFF6.pdf.
PDF 3.0: PDF Reference, Third Edition, Version 1.4. Adobe Systems, Incorporated, 2003. Addison-Wesley, ISBN 0-201-75839-3. Available at http://partners.adobe.com/asn/acrobat/docs/File_Format_Specifications/PDFReference.pdf .

C Xerox Document Service Document Model Use Example (Non-Normative)

This section presents an example use of the XDSDM. The first example shows a job document before OCR, and the second shows how the documents instance is updated by the OCR service.

C.1 Before OCR

<o:job xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:transformation="http://www.example.com/transformation" xmlns:template="http://www.example.com/template" xmlns:services="http://www.example.com/services" xmlns:rq="http://www.example.com/rendition-request" xmlns:ri="http://www.example.com/rendition-info" xmlns:rfc822="urn:IANA:namespace:rfc822" xmlns:repositories="http://www.example.com/repositories" xmlns:ocr="http://www.example.com/ocr" xmlns:o="http://www.example.com/orchestration" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:emx="urn:ietf:params:email-xml" xmlns:email="http://www.example.com/email" xmlns:doc="http://www.example.com/document" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:containers="http://www.example.com/containers">

  
  <o:data>
    
    <doc:documents>
      <doc:document id="input-document">
        <doc:metadata xmlns="">
          <dc:description xml:lang="en">Example.com</dc:description>
          <dc:title>Purchase Order</dc:title>
          <ClientNumber>7764</ClientNumber>
        </doc:metadata>
        <doc:renditions>
          <doc:rendition id="scanned-rendition" src="file://localhost/documents/03fbd3c8.tiff">
            <ri:contentType>image/tiff</ri:contentType>
            <ri:xresolution>300</ri:xresolution>
            <ri:yresolution>300</ri:yresolution>
            <ri:fidelity>100</ri:fidelity>
            <ri:pageCount>42</ri:pageCount>
            <ri:contentLength>12427305</ri:contentLength>
          </doc:rendition>
          <doc:rendition id="ocr-rendition" />
        </doc:renditions>
      </doc:document>
    </doc:documents>
  </o:data>

  
  <o:step>
    <containers:TransformDocument implementation="OCR">
      
      <containers:input document="InputDoc">
        <rq:renditionRequest>
          <ri:content-type>image/tiff image/*</ri:content-type>
          <rq:minimum><ri:resolution>300</ri:resolution></rq:minimum>
        </rq:renditionRequest>
      </containers:input>
      <containers:output rendition="OCR" />
      <action ev:event="containers:invoke">
        <containers:invoke>
          <template:template name="services:TransformData">
            <renditions>
              <template:copy select="ocr:bestRendition(renditions())" />
            </renditions>
            <transformation:renditionRequest xsi:type="ocr:ocrRenditionRequest">
              <ocr:recognizeText>
                <ri:language>en</ri:language>
                <ocr:textFormat>Searchable PDF</ocr:textFormat>
                <ocr:tradeOff>speed</ocr:tradeOff>
                <ri:typesetting>printed</ri:typesetting>
                <ocr:layout>auto</ocr:layout>
              </ocr:recognizeText>
            </transformation:renditionRequest>
          </template:template>
        </containers:invoke>
      </action>
    </containers:TransformDocument>
  </o:step>

  
  <o:step>
    <containers:SendDocument implementation="Email" groupName="Aaron">
      <containers:input document="InputDoc" rendition="instance('documentData')/id('OCR')" />
      <action ev:event="step">
        <containers:invoke>
          <template:template name="services:SendData">
            <emx:Message>
              <rfc822:subject><template:value select="metadata()/Title" /></rfc822:subject>
              <emx:content type="text/plain">
                <template:value select="metadata()/Description" />
              </emx:content>
              <emx:content>
                <template:attribute name="type" value="rendition()/@content-type" />
                <template:copy select="rendition()" />
              </emx:content>
              <rfc822:to>
                <emx:Address>
                  <emx:adrs>mailto:Fred.Derf@example.com</emx:adrs>
                  <emx:name>Fred Derf</emx:name>
                </emx:Address>
              </rfc822:to>
            </emx:Message>
          </template:template>
        </containers:invoke>
      </action>
    </containers:SendDocument>
  </o:step>

  <o:completion>
    <NamedEmailConfirmation>
      <ConfirmationAddress>
        <emx:Address>
          <emx:adrs>mailto:Fred.Derf@usa.xerox.com</emx:adrs>
          <emx:name>Fred Derf</emx:name>
        </emx:Address>
      </ConfirmationAddress>
    </NamedEmailConfirmation>
  </o:completion>

</o:job>

C.2 After OCR

<o:job xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:transformation="http://www.example.com/transformation" xmlns:template="http://www.example.com/template" xmlns:services="http://www.example.com/services" xmlns:rq="http://www.example.com/rendition-request" xmlns:ri="http://www.example.com/rendition-info" xmlns:rfc822="urn:IANA:namespace:rfc822" xmlns:repositories="http://www.example.com/repositories" xmlns:ocr="http://www.example.com/ocr" xmlns:o="http://www.example.com/orchestration" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:emx="urn:ietf:params:email-xml" xmlns:email="http://www.example.com/email" xmlns:doc="http://www.example.com/document" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:containers="http://www.example.com/containers">

  
  <o:data>
    
    <doc:documents>
      <doc:document id="input-document">
        <doc:metadata xmlns="">
          <dc:description xml:lang="en">Example.com</dc:description>
          <dc:title>Purchase Order</dc:title>
          <ClientNumber>7764</ClientNumber>
        </doc:metadata>
        <doc:renditions>
          <doc:rendition id="scanned-rendition" src="file://localhost/documents/03fbd3c8.tiff">
            <ri:contentType>image/tiff</ri:contentType>
            <ri:xresolution>300</ri:xresolution>
            <ri:yresolution>300</ri:yresolution>
            <ri:fidelity>100</ri:fidelity>
            <ri:pageCount>42</ri:pageCount>
            <ri:contentLength>12427305</ri:contentLength>
          </doc:rendition>
          <doc:rendition id="ocr-rendition" src="file://localhost/documents/273ffde.txt">
            <ri:contentType>text/plain</ri:contentType>
            <ri:contentLength>42930</ri:contentLength>
            <ri:fidelity>10</ri:fidelity>
            <ri:accuracy>90</ri:accuracy>
          </doc:rendition>
        </doc:renditions>
      </doc:document>
    </doc:documents>
  </o:data>

  
  <o:step>
    <containers:TransformDocument implementation="OCR">
      
      <containers:input document="InputDoc">
        <rq:renditionRequest>
          <ri:content-type>image/tiff image/*</ri:content-type>
          <rq:minimum><ri:resolution>300</ri:resolution></rq:minimum>
        </rq:renditionRequest>
      </containers:input>
      <containers:output rendition="OCR" />
      <action ev:event="containers:invoke">
        <containers:invoke>
          <template:template name="services:TransformData">
            <renditions>
              <template:copy select="ocr:bestRendition(renditions())" />
            </renditions>
            <transformation:renditionRequest xsi:type="ocr:ocrRenditionRequest">
              <ocr:recognizeText>
                <ri:language>en</ri:language>
                <ocr:textFormat>Searchable PDF</ocr:textFormat>
                <ocr:tradeOff>speed</ocr:tradeOff>
                <ri:typesetting>printed</ri:typesetting>
                <ocr:layout>auto</ocr:layout>
              </ocr:recognizeText>
            </transformation:renditionRequest>
          </template:template>
        </containers:invoke>
      </action>
    </containers:TransformDocument>
  </o:step>

  
  <o:step>
    <containers:SendDocument implementation="Email" groupName="Aaron">
      <containers:input document="InputDoc" rendition="instance('documentData')/id('OCR')" />
      <action ev:event="step">
        <containers:invoke>
          <template:template name="services:SendData">
            <emx:Message>
              <rfc822:subject><template:value select="metadata()/Title" /></rfc822:subject>
              <emx:content type="text/plain">
                <template:value select="metadata()/Description" />
              </emx:content>
              <emx:content>
                <template:attribute name="type" value="rendition()/@content-type" />
                <template:copy select="rendition()" />
              </emx:content>
              <rfc822:to>
                <emx:Address>
                  <emx:adrs>mailto:Fred.Derf@example.com</emx:adrs>
                  <emx:name>Fred Derf</emx:name>
                </emx:Address>
              </rfc822:to>
            </emx:Message>
          </template:template>
        </containers:invoke>
      </action>
    </containers:SendDocument>
  </o:step>

  <o:completion>
    <NamedEmailConfirmation>
      <ConfirmationAddress>
        <emx:Address>
          <emx:adrs>mailto:Fred.Derf@usa.xerox.com</emx:adrs>
          <emx:name>Fred Derf</emx:name>
        </emx:Address>
      </ConfirmationAddress>
    </NamedEmailConfirmation>
  </o:completion>

</o:job>

D Changelog (Non-Normative)

This section summarizes changes since the previous draft of this document.

Approved for Publication May 17, 2004.

E Acknowledgments (Non-Normative)

This model was produced with the participation of the following individuals:

Charlotte Baltus, Xerox
Julia Craig, Xerox
Rich Hyde, Xerox
Leigh L. Klotz, Jr., Xerox
Nizam Mohideen, Xerox (former)
William Stumbo, Xerox
David Tilley, Xerox
Aaron Witt, Xerox

F Production Notes (Non-Normative)

This document was encoded in the XMLspec DTD (which has documentation available). The XML sources were transformed using the xmlspec.xsl style sheet. The XML Schemas and examples were rendered with the xmlverbatim XSLT stylesheet. Emacs was used for editing. The XML was validated using XMLLint (part of the GNOME libxml package) and transformed using XSLTProc (part of the GNOME libxslt package).

Xerox Document Services Document Model

Workshop Position Paper 12 May 2004

