XML Pipeline Language (XPL) Version 1.0 (Draft)

1 Introduction

1.1 What is XPL?

[Definition: An XPL program, or program in the XPL language, is a well-formed XML document [XML 1.0] that conforms to the Namespaces in XML Recommendation [XML Namespaces 1.0]. Furthermore, the XML document must conform to the syntax of the XPL language described in this specification. ]

An XPL program defines orchestrated sequences of operations on XML Information Sets (Infosets). Individual operations are encapsulated within components called XML processors. Operations include production, consumption, and transformation of XML Infosets. An XPL program supports unconditional operations, and may also support conditions, loops, and changes of control following runtime errors.

1.2 Motivation

A growing number of specifications describe operations on XML documents. The best-known specification is [XSLT 1.0], a language designed to transform XML documents into other XML documents. There are other such specifications, including [XQuery] and validation languages such as [RELAX NG] and [XML Schema]. No current specification adequately addresses the interoperability of those specifications from the point of view of the XML Infosets they produce or consume. XPL addresses this problem.

2 Concepts

2.1 Terminology

[Definition: The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC 2119]. ]

[Definition: The term information set refers to the output of an [XML 1.0] or [XML 1.1] processor, expressed as a collection of information items and properties as defined by the [XML Infoset] specification. In this document the term Infoset is used as a synonym for information set. ]

2.2 Notation

In this document the specification of each XPL-defined element type is preceded by a summary of its syntax in the form of a model for elements of that element type. The meaning of syntax summary notation is as defined in [XSLT 2.0 Working Draft], section 2.2.

2.3 XPL Implementation

[Definition: A specific software product able to execute an XPL program according to the XPL specification is referred to as an XPL implementation. ]

The XPL specification does not put any requirement on the underlying software platform other than being able to execute an XPL program. In fact it is hoped that XPL will be implemented in various programming languages on various platforms.

2.4 Error Handling

The following definitions are borrowed from [XSLT 2.0 Working Draft] with minor adjustments.

[Definition: An error that is detected by examining an XPL program before execution starts is referred to as a static error. ]

[Definition: An error that is not detected until an XPL program is executed is referred to as a dynamic error. ]

[Definition: Some dynamic errors are classed as recoverable errors. When a recoverable error occurs, this specification allows the XPL implementation either to signal the error (by reporting the error condition and terminating execution) in the basic profile or to take a defined recovery action and continue processing when the Exception module is available. ]

[Definition: A dynamic error that is not recoverable is referred to as a non-recoverable dynamic error. When a non-recoverable dynamic error occurs, the XPL implementation must signal the error, and the execution of the XPL program fails. ]

2.5 Qualified Names

The following definitions are borrowed from [XSLT 2.0 Working Draft] with minor adjustments.

XML processors referred to by XPL are specified as a QName using the syntax for QName as defined in [XML Namespaces 1.0].

[Definition: A QName is always written in the form (NCName ":")? NCName, that is, a local name optionally preceded by a namespace prefix. When two QNames are compared, however, they are considered equal if the corresponding expanded-QNames are the same, as described below. ]

Because an atomic value of type xs:QName is sometimes referred to loosely as a QName, this specification also uses the term lexical QName to emphasize that it is referring to a QName in its lexical form rather than its expanded form. This term is used especially when strings containing lexical QNames are manipulated as run-time values.

[Definition: A lexical QName is a string representing a QName in the form (NCName ":")? NCName, that is, a local name optionally preceded by a namespace prefix. ]

[Definition: A string in the form of a lexical QName may occur as the value of an attribute node in an XPL program, or within an XPath expression contained in such an attribute node, or as the result of evaluating an XPath expression contained in such an attribute node. The element containing this attribute node is referred to as the defining element of the QName. ]

[Definition: An expanded-QName contains a pair of values, namely a local name and an optional namespace URI. It may also contain a namespace prefix. Two expanded-QNames are equal if the namespace URIs are the same (or both absent) and the local names are the same. The prefix plays no part in the comparison, but is used only if the expanded-QName needs to be converted back to a string. ]

If the QName has a prefix, then the prefix is expanded into a URI reference using the namespace declarations in effect on its defining element. The expanded-QName consisting of the local part of the name and the possibly null URI reference is used as the name of the object.

In the case of a prefixed QName used as the value of an attribute in the XPL program, or appearing within an XPath expression in the XPL program, it is a static error if the defining element has no namespace node whose name matches the prefix of the QName.
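The expansion and comparison rules above can be illustrated with a minimal Python sketch. The names `expand_qname`, `ExpandedQName`, and `StaticError`, as well as the example namespace URI, are hypothetical and not defined by XPL. The sketch assumes the in-scope namespace declarations of the defining element are available as a prefix-to-URI mapping, and it does not validate NCName syntax.

```python
from typing import Mapping, NamedTuple, Optional


class StaticError(Exception):
    """Hypothetical exception type standing in for an XPL static error."""


class ExpandedQName(NamedTuple):
    namespace_uri: Optional[str]  # None when the lexical QName has no prefix
    local_name: str


def expand_qname(lexical: str, in_scope: Mapping[str, str]) -> ExpandedQName:
    """Expand a lexical QName using the prefix bindings of its defining element."""
    prefix, sep, local = lexical.partition(":")
    if not sep:
        # No prefix: the namespace URI of the expanded-QName is absent.
        return ExpandedQName(None, lexical)
    if prefix not in in_scope:
        # The defining element has no namespace declaration for the prefix.
        raise StaticError(f"no namespace declaration for prefix {prefix!r}")
    return ExpandedQName(in_scope[prefix], local)


# Two different prefixes bound to the same URI yield equal expanded-QNames,
# because the prefix plays no part in the comparison.
bindings = {
    "a": "http://example.org/processors",
    "b": "http://example.org/processors",
}
same = expand_qname("a:xslt", bindings) == expand_qname("b:xslt", bindings)
```

Because the prefix is not stored in the tuple, equality depends only on the namespace URI and the local name.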

2.6 Namespace

All the elements defined by XPL must be in the http://www.orbeon.com/oxf/xpl namespace. For consistency, this specification uses the prefix p. It is recommended that authors of XPL programs use that prefix as well. However, the XPL implementation must consider only the namespace URI of an element to identify it as an XPL element.

3 XML Processors

3.1 Definition

[Definition: An XML processor is a component used in an XPL program and identified by a QName. ]

An XML processor is composed of:

A set of inputs and outputs, defining how the XML processor interfaces with an XPL program. See 3.3 Inputs and Outputs.
A behavior, defining tasks performed by the XML processor during the execution of an XPL program. See 3.4 Behavior.

A small set of XML processors must be provided by XPL implementations. See 9 Standard XML Processor Library. However, most of the XML processors provided by XPL implementations are implementation-defined. XPL implementations may also choose to be extensible and allow users of the implementation to provide their own XML processors.

The use of a QName to identify an XML processor has the following benefits:

Shorter to write by hand than full URIs. Once an XML namespace prefix mapping is done, a qualified name only requires typing a prefix, which can be as short as one character, and a local name.
Allows for logical grouping of processors. A single URI regroups the XML processors in a certain category, for example the XML processors implemented by a certain company.
Consistency with other specifications. [XSLT 2.0 Working Draft] for example, uses qualified names to identify stylesheet-defined objects such as functions.

The mapping between a particular QName and an XML processor implementation is implementation-dependent.

3.2 Instances

[Definition: An XML processor instance designates a specific use of a processor in a pipeline. ]

An XML processor used in an XPL program is always instantiated. Multiple instances of the same XML processor may occur in an XPL program.

XPL does not specify how an XML processor is instantiated or keeps state information during an XPL program execution.

3.3 Inputs and Outputs

Inputs and outputs connect an XML processor instance to the rest of the pipeline. Each input may provide an XML Infoset to the XML processor instance. Each output may provide an XML Infoset produced by the processor instance.

[Definition: A static input or output is an input or output exposed by an XML processor before the execution of the XPL program. ]

[Definition: A connected input or output is an input or output of an XML processor instance connected according to 6.4 Connections. ]

[Definition: A declared input or output of an XML processor instance is an input or output with name n where a p:input or p:output element with name n is a child of the processor instance's p:processor element. See 5.3 Processor Module. ]

[Definition: A dynamic input or output is a declared input or output of an XML processor instance which is not exposed by the XML processor before the execution of the XPL program. ]

Each XML processor instance has:

A set of inputs each identified by a name. No two inputs may have the same name. The set of inputs may be empty.
A set of outputs each identified by a name. No two outputs may have the same name. The set of outputs may be empty.

For each static input or output, the XML processor defines whether the input or output is:

Mandatory. Such an input or output must be declared in an XPL program. The XPL implementation should raise a static error if a statically defined mandatory input or output is not declared for a given XML processor instance.
Optional. Such an input or output may or may not be declared in an XPL program.

In the following table "N/A" is used to designate combinations that do not exist according to the above definitions.

		Static (Mandatory)	Static (Optional)	Dynamic
Declared	Connected	OK	OK	OK
Declared	Not Connected	OK	OK	OK
Not Declared	Connected	N/A	N/A	N/A
Not Declared	Not Connected	Invalid	OK	N/A

Note:

The mechanism by which an XML processor exposes inputs and outputs, and whether they are mandatory or optional to the XPL implementation is outside the scope of this specification.
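The table above can be read as a small decision function. The following Python sketch is an illustration only; the function name `classify` and the string labels are hypothetical and not part of XPL.

```python
def classify(kind: str, declared: bool, connected: bool) -> str:
    """Return "OK", "Invalid" or "N/A" for one input or output.

    kind is "mandatory" or "optional" (both static), or "dynamic".
    """
    if kind not in ("mandatory", "optional", "dynamic"):
        raise ValueError(f"unknown kind: {kind!r}")
    if not declared and connected:
        # A connection requires a p:input or p:output declaration.
        return "N/A"
    if declared:
        return "OK"
    # From here on: not declared and not connected.
    if kind == "dynamic":
        # A dynamic input or output is declared by definition.
        return "N/A"
    return "Invalid" if kind == "mandatory" else "OK"
```

The only invalid combination is a mandatory static input or output that is not declared.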

3.4 Behavior

The behavior of an XML processor is for the most part outside the scope of this specification, with the exception of the following aspects:

Change of control. The XPL implementation may give control to an XML processor instance, in association with at most one connected output. If the XML processor instance has at least one declared output, the XPL implementation will always give control to the XML processor in association with an output. If the XPL implementation gives control to the XML processor instance in association with an output, while in control of processing the XML processor instance must "produce" an XML Infoset associated with that output. If the XPL implementation does not associate an output, the XML processor does not generate an XML Infoset.
Generating an error. The XML processor instance, while in control of processing, may generate an error that is propagated to the XPL implementation. The XPL implementation then raises a dynamic error. Even if the XPL implementation expects a resulting XML Infoset, no such XML Infoset is provided by the XML processor. With the Exception module, the error is considered an exception and may be a recoverable dynamic error. This means that if an exception handler is available for that exception, it can be caught and processing may resume. Otherwise, the error is a non-recoverable dynamic error. With the basic profile, all errors raised by an XML processor are non-recoverable dynamic errors.

Example 1: An XML processor may generate an error if the XML Infoset read on any of its inputs does not conform to a format expected by the XML processor. Similarly, the interaction with other software may cause the XML processor to report errors.

Example 2: An XML processor may generate an error if the interaction with other software reports an error.
Reading inputs. The XML processor instance, while in control of processing, may read one or more of its connected inputs. "Reading an input" means that the XML processor instance asks the XPL implementation to produce the XML Infoset associated with that particular connected input.
Reading outputs. "Reading an output" means that the XPL implementation gives control to an XML processor instance in association with an output. When in control, the XML processor instance must produce the XML Infoset associated with that particular output. If the XPL implementation reads a dynamic output, the XML processor instance may generate a dynamic error if it does not support that output.

While in control of processing, an XML processor instance may perform other tasks, like interacting with other software. Such tasks are outside the scope of the present specification.

XPL does not specify the format or API used by the XPL implementation to provide XML Infosets to an XML processor, or how an XML processor returns XML Infosets to the XPL implementation.

4 XPL Program

4.1 Structure

An XPL program consists of:

Input and output parameters. Each input or output has a name. An input may provide the XPL program with an XML Infoset, and an output may produce an XML Infoset. The XPL program may have no inputs or outputs.
A sequence of statements. Statements are discussed in 4.3 Statements and sequences of statements in 4.4 Sequence of Statements.

Note:

Some use cases do not require that an XPL program have any inputs or outputs. It is important to note that information, whether represented as XML Infosets or not, is not necessarily exchanged with XPL programs through inputs and outputs, but possibly through other means. For example, an XML processor can access information by connecting to a relational database and return an XML Infoset to the XPL program.

4.2 XML Infoset Identifiers

[Definition: An XML Infoset identifier is an identifier that refers to a particular XML Infoset. Within the execution of a sequence of statements (see 4.4 Sequence of Statements), an XML Infoset identifier may be used multiple times. In that case it must always refer to the exact same XML Infoset. See 6.3 Output Invariance. ]

In XPL, XML Infoset identifiers are exposed by statements using the infoset attribute.

4.3 Statements

An XPL program statement is an element with the following characteristics:

Scoped XML Infoset Identifiers. A set of XML Infoset identifiers in scope at the point where the statement occurs in the XPL program. The set may be empty.
Exposed XML Infoset Identifiers. A set of XML Infoset identifiers exposed by the statement. There cannot be an intersection between the scoped XML Infoset identifiers and the exposed XML Infoset identifiers. For example, a statement can attempt to expose, with an infoset attribute on a p:output element, an XML Infoset identifier already present in the set of scoped XML Infoset identifiers. Such a condition must raise a static error. This is referred to as the no-collision rule.

The set of XML Infoset identifiers in scope for a statement of an XPL program, unless specified otherwise, consists of the union of the identifiers in scope for the previous statement and the identifiers exposed by the previous statement in document order.

If there is no such previous statement, the set of XML Infoset identifiers in scope for a statement of an XPL program, unless specified otherwise, consists of the set of identifiers specified by the infoset attributes of the XPL program inputs. In other words, this condition applies to the first statement of an XPL program.
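The scoping rules and the no-collision rule can be sketched as a single pass over a sequence of statements. In this Python illustration, `scopes_for_sequence` and `StaticError` are hypothetical names, and each statement is reduced to the set of identifiers it exposes; nested statements are not modeled.

```python
from typing import FrozenSet, Iterable, List


class StaticError(Exception):
    """Hypothetical exception type standing in for an XPL static error."""


def scopes_for_sequence(
    initial_scope: Iterable[str],
    exposed_per_statement: Iterable[Iterable[str]],
) -> List[FrozenSet[str]]:
    """Return the scope seen by each statement, followed by the final scope.

    initial_scope holds the identifiers in scope for the first statement,
    for example those given by the infoset attributes of the program inputs.
    """
    scope = frozenset(initial_scope)
    scopes = [scope]
    for exposed in exposed_per_statement:
        exposed = frozenset(exposed)
        collision = scope & exposed
        if collision:
            # No-collision rule: an exposed identifier is already in scope.
            raise StaticError(f"identifier(s) already in scope: {sorted(collision)}")
        # The next statement sees the previous scope plus what was exposed.
        scope = scope | exposed
        scopes.append(scope)
    return scopes
```

For a sequence of n statements the result has n + 1 entries; the last entry is the set of identifiers in scope after the last statement.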

In this specification, a statement consists of either a p:processor, p:choose or p:for-each element. A statement may contain nested statements. For example, p:choose may contain one or more p:processor elements.

4.4 Sequence of Statements

[Definition: A Sequence of Statements is an ordered collection of zero or more statements directly contained under a parent element such as p:pipeline or p:for-each. All the statements are elements sharing the same parent element. ]

Note:

The term sequence is here used with the meaning defined in [XPath 2.0 Working Draft], as an ordered collection of zero or more items (here, of elements). It does not imply that statements will be executed in the order they appear.

5 Syntax

5.1 Introduction

The XPL syntax is organized into modules. The Pipeline module and the Processor module constitute the basic framework of XPL. Other modules build on top of that framework to provide enhanced functionality. Modules are grouped into profiles; see 11 Conformance.

5.2 Pipeline Module

5.2.1 The `p:pipeline` Element

<p:pipeline
    version = "1.0" >
    <!-- Content: (p:input*, p:output*, (p:processor | p:choose | p:for-each)*) -->
</p:pipeline>

An XPL program always starts with a p:pipeline element. The p:pipeline element must be the root element of the XML document containing the XPL program.

The p:pipeline element has a mandatory version attribute. The value of the version attribute must be a valid instance of the type xs:decimal as defined in [XML Schema].

For this version of XPL, the value of the version attribute must be 1.0.
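As an illustration of the root element and version requirements, the following Python sketch uses the standard library `xml.etree.ElementTree` parser. The names `check_pipeline_root` and `StaticError` are hypothetical. The sketch compares the version value as a string, which is an assumption; this section does not say whether other xs:decimal spellings such as 1.00 are acceptable.

```python
import xml.etree.ElementTree as ET

XPL_NS = "http://www.orbeon.com/oxf/xpl"


class StaticError(Exception):
    """Hypothetical exception type standing in for an XPL static error."""


def check_pipeline_root(document: str) -> ET.Element:
    """Parse an XPL program and check its root element and version attribute."""
    root = ET.fromstring(document)
    # Only the namespace URI and local name matter; the prefix is irrelevant.
    if root.tag != f"{{{XPL_NS}}}pipeline":
        raise StaticError("root element must be pipeline in the XPL namespace")
    # Assumption: the version value is compared lexically after trimming.
    if root.get("version", "").strip() != "1.0":
        raise StaticError('version attribute must be present with value "1.0"')
    return root
```

A document that uses a prefix other than p passes the check as long as the prefix is bound to the XPL namespace URI.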

5.2.2 The `p:input` Element

<p:input
    name = ncname
    infoset? = ncname
    schema-uri? = uri-reference
    schema-href? = uri-reference />

The p:input element defines exactly one XPL program input. Zero or more p:input or p:output elements can be child elements of the p:pipeline element. They must occur before any other element in the XPL program.

The name attribute is mandatory. It identifies the input to external users of the XPL program. There cannot be two inputs with the same name, but it is possible to have an input and an output with the same name. It is a static error if two inputs have the same name.

The infoset attribute is optional. It is an XML Infoset identifier that identifies the XML Infoset associated with the input for use within the XPL program. In particular, the XML Infoset may be referenced with the infosetref attribute on p:input elements within p:processor elements.

If the infoset attribute is missing, the XPL program cannot read the input. It may make sense to declare an XPL program input without reading it so that external users of the XPL program are aware that the input may be used in a future version of the program.

The optional schema-href and schema-uri are documented in 5.6 Schema References. They validate the XML Infoset associated with the input as per 6.10 Schema References.

5.2.3 The `p:output` Element

<p:output
    name = ncname
    infosetref = infoset-reference
    schema-uri? = uri-reference
    schema-href? = uri-reference />

The p:output element within a p:pipeline element defines exactly one XPL program output. Zero or more p:input or p:output elements can be child elements of the p:pipeline element. They must occur before any other element in the XPL program.

The name attribute is mandatory. It identifies the output to external users of the XPL program. There cannot be two outputs with the same name, but it is possible to have an output and an input with the same name. It is a static error if two outputs have the same name.

The infosetref attribute is mandatory. It identifies the XML Infoset that must be produced by the XPL program in association with the output.

The Infoset reference contained by the infosetref attribute may refer to any XML Infoset identifier in scope after the last statement of the XPL program. See 8 Infoset Reference for the detailed syntax.

The optional schema-href and schema-uri are documented in 5.6 Schema References. They validate the XML Infoset associated with the output as per 6.10 Schema References.

Note:

The p:input and p:output elements can also be used within a p:processor element. In that context, they support different attributes and content.

5.3 Processor Module

5.3.1 The `p:processor` Element

<p:processor
    name = qname >
    <!-- Content: (p:input | p:output)* -->
</p:processor>

The p:processor element declares a statement that consists of a single XML processor instance.

The set of XML Infoset identifiers exposed by a p:processor statement consists of the set of identifiers declared with the infoset attribute on the nested p:output elements, if any.

The name attribute is mandatory. It is of type QName. The QName identifies a particular XML processor implementation. The prefix of the given QName must be in scope on the p:processor element.

More than one p:processor element with the same name attribute may be used in an XPL program. This translates into using several instances of the XML processor.

Note:

The presence of a p:processor statement in an XPL program does not guarantee that the processor is executed. 6 Processing Model details the conditions under which a processor is executed.

5.3.2 The `p:input` Element

<p:input
    name = ncname
    infosetref? = infoset-reference
    schema-uri? = uri-reference
    schema-href? = uri-reference >
    <!-- Content: (embedded-infoset)? -->
</p:input>

The p:input element connects an XML processor input identified by the mandatory name attribute.

If the infosetref attribute is missing, there must be an embedded Infoset as a child of the p:input element. In that case, there must be exactly one child element of the p:input element. If no element is present or more than one element is present, a static error must be raised. If the infosetref attribute is present and the p:input element has one or more children, a static error must be raised.
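The static checks described above can be sketched in Python with the standard library `xml.etree.ElementTree` module. The names `check_processor_input` and `StaticError` are hypothetical. The sketch only counts child elements and tests for the presence of the infosetref attribute; it does not check the syntax of the Infoset reference itself.

```python
import xml.etree.ElementTree as ET


class StaticError(Exception):
    """Hypothetical exception type standing in for an XPL static error."""


def check_processor_input(p_input: ET.Element) -> None:
    """Check a p:input element that is a child of a p:processor element."""
    has_ref = p_input.get("infosetref") is not None
    child_count = len(list(p_input))  # number of child elements
    if has_ref and child_count > 0:
        raise StaticError("infosetref and an embedded Infoset are mutually exclusive")
    if not has_ref and child_count != 1:
        raise StaticError("exactly one child element is required without infosetref")
```

The function returns nothing when the element is acceptable and raises otherwise, so it can be called once per p:input element while walking the program.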

The Infoset reference contained by the infosetref attribute may refer to any XML Infoset identifier in scope for the parent p:processor. See 8 Infoset Reference for the detailed syntax.

The optional schema-href and schema-uri are documented in 5.6 Schema References. They validate the XML Infoset associated with the input as per 6.10 Schema References.

The embedded XML Infoset, if any, is constructed from the single element under the p:input element as described in 7 Infoset Extraction.

5.3.3 The `p:output` Element

<p:output
    name = ncname
    infoset = ncname
    schema-uri? = uri-reference
    schema-href? = uri-reference >
</p:output>

The p:output element within a p:processor element connects an XML processor output identified by the mandatory name attribute.

The mandatory infoset attribute assigns an XML Infoset identifier to the particular output.

The identifier must follow the no-collision rule. This means that the identifier must not be present in the set of XML Infoset identifiers in scope for the parent p:processor statement. The XPL implementation must raise a static error if a collision is detected.

The optional schema-href and schema-uri are documented in 5.6 Schema References. They validate the XML Infoset associated with the output as per 6.10 Schema References.

Note:

The p:input and p:output elements can also be used within a p:pipeline element. In that context, they support different attributes and content.

5.4 Choose Module

5.4.1 The `p:choose` Element

<p:choose
    infosetref = infoset-reference >
    <!-- Content: (p:output*, p:when+, p:otherwise?) -->
</p:choose>

The p:choose element declares a statement used to execute different sequences of statements depending on conditions evaluated during the execution of the XPL program.

The content of a nested p:when or p:otherwise element is called a branch. As detailed in the processing model section, a branch may or may not be executed, and one branch of a p:choose element at most is executed.

Conditions are expressed by child p:when elements using XPath expressions. They are applied to an XML Infoset determined by the mandatory infosetref attribute of the p:choose element.

The set of XML Infoset identifiers exposed by a p:choose statement is defined by the nested p:output elements if any. If no p:output element is present, no XML Infoset identifier is exposed by the p:choose statement.

5.4.2 The `p:output` Element

<p:output
    infoset = ncname
    infosetref = infoset-reference
    schema-uri? = uri-reference
    schema-href? = uri-reference />

The p:output element defines which XML Infoset identifiers are exposed by the parent p:choose element. Each p:output element must have an infoset attribute which defines an XML Infoset identifier to expose.

The mandatory infosetref attribute determines the XML Infoset exposed. The Infoset reference may refer to XML Infoset identifiers scoped on the parent p:choose element as well as to XML Infoset identifiers scoped in the branches of the p:choose statement.

The optional schema-href and schema-uri are documented in 5.6 Schema References. They validate the XML Infoset produced by the infosetref attribute as per 6.10 Schema References.

5.4.3 The `p:when` Element

<p:when
    test = expression >
    <!-- Content: (p:processor | p:choose | p:for-each)* -->
</p:when>

A p:when element is always a child of a p:choose element. A sequence of multiple p:when elements can be present under a single p:choose element. There must be at least one p:when element.

The mandatory test attribute contains an XPath expression. The result of the expression must be castable to a boolean result. It is applied on the XML Infoset provided to the parent p:choose element.

The p:when element may support [XPath 1.0] or [XPath 2.0 Working Draft]. The details are specified in 11 Conformance.

5.4.4 The `p:otherwise` Element

<p:otherwise>
    <!-- Content: (p:processor | p:choose | p:for-each)* -->
</p:otherwise>

A p:otherwise element is always a child of a p:choose element. All p:when sibling elements must precede a p:otherwise element. A p:choose element must have at most one p:otherwise child element.

5.4.5 Branches

Each p:when or p:otherwise branch may contain a sequence of statements.

The set of scoped XML Infoset identifiers for the first statement in the branch consists of the set of scoped identifiers for the parent p:choose element.

The set of scoped XML Infoset identifiers after the last statement of a branch must contain the XML Infoset identifiers referred to by the infosetref attributes of all the p:output elements.

A static error must be raised if this condition is not met.
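The branch condition above amounts to a subset test per branch. In this Python sketch, `check_choose_outputs` and `StaticError` are hypothetical names. It assumes the identifiers referenced by the p:output infosetref attributes have already been extracted, and that the final scope of each branch has already been computed (it includes the identifiers scoped for the parent p:choose element).

```python
from typing import Iterable, Sequence


class StaticError(Exception):
    """Hypothetical exception type standing in for an XPL static error."""


def check_choose_outputs(
    output_refs: Iterable[str],
    branch_final_scopes: Sequence[Iterable[str]],
) -> None:
    """Check that every branch provides all identifiers the outputs refer to."""
    needed = frozenset(output_refs)
    for index, scope in enumerate(branch_final_scopes):
        missing = needed - frozenset(scope)
        if missing:
            raise StaticError(f"branch {index} does not provide: {sorted(missing)}")
```

Checking every branch, rather than any one of them, reflects that at most one branch is executed and that it is not known statically which one.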

Note:

The no-collision rule applies for statements within a branch. In other words an XML Infoset identifier exposed by a statement within a branch cannot override an identifier scoped for the corresponding p:choose element.

5.5 Repeat Module

5.5.1 The `p:for-each` Element

<p:for-each
    infosetref = infoset-reference
    select = expression
    schema-href? = uri-reference
    schema-uri? = uri-reference >
    <!-- Content: (p:output?, (p:processor | p:choose | p:for-each)*) -->
</p:for-each>

The p:for-each element declares a statement used to execute sequences of statements multiple times within the execution of the same XPL program.

The set of exposed XML Infoset identifiers consists of the single identifier defined by the optional embedded p:output element's infoset attribute. If there is no embedded p:output element, the set is empty.

The p:for-each element contains a sequence of statements. If an embedded p:output element is present, the sequence must contain at least one statement. It is a static error if this is not the case.

The set of scoped XML Infoset identifiers before the first statement in the embedded sequence of statements consists of the identifiers scoped for the p:for-each element.

The optional schema-href and schema-uri are documented in 5.6 Schema References. They validate the XML Infoset associated with the infosetref attribute as per 6.10 Schema References.

Note:

The no-collision rule applies for statements within a p:for-each element. In other words an XML Infoset identifier exposed within a p:for-each cannot override an identifier scoped for the corresponding p:for-each element.

5.5.2 The `p:output` Element

<p:output
    infosetref = infoset-reference
    infoset = ncname
    schema-uri? = uri-reference
    schema-href? = uri-reference >
</p:output>

The p:output element within a p:for-each element exposes an XML Infoset identifier to statements appearing after the current p:for-each statement.

The mandatory infoset attribute exposes an XML Infoset identifier. The identifier must follow the no-collision rule. This means that the identifier must not be present in the set of scoped XML Infoset identifiers for the parent p:for-each statement. The XPL implementation must raise a static error if a collision is detected.

The mandatory infosetref attribute determines the XML Infoset exposed. The Infoset reference may refer to XML Infoset identifiers scoped on the parent p:for-each element as well as to XML Infoset identifiers scoped after the last statement of the sequence of statements embedded within the p:for-each statement.

For the second category of XML Infoset identifiers, each identifier used by the expression in the infosetref attribute refers to a sequence of XML Infosets rather than a single XML Infoset as is usually the case. See 6.9 p:for-each Execution.

The optional schema-href and schema-uri are documented in 5.6 Schema References. They validate the XML Infoset produced by the infosetref attribute.

5.6 Schema References

The schema-href and schema-uri attributes are associated with a particular XML Infoset, depending on the element they appear on.

If the schema-href attribute is present, it must contain a URL referring to either a W3C XML Schema schema or a RELAX NG schema.

If the schema-uri attribute is present, it must contain a URI identifying either a W3C XML Schema schema or a RELAX NG schema. For W3C XML Schema schemas, it should specify the namespace URI that is the target namespace of the schema. How the URI is mapped to the actual schema is outside the scope of XPL.

If both attributes are present, a static error must be raised.

The use of schema-href makes it easy to use schemas bundled with an XPL program. The use of schema-uri allows the use of URIs, commonly used with XML Schema, which provide a level of abstraction hiding the actual storage location of schema files.

6 Processing Model

6.1 Introduction

The XPL processing model is based on declarative and lazy evaluation models. This means that, unlike most imperative programming languages, processing order is determined by first considering the results to be produced, and then walking down a chain of dependencies to determine what statements must be executed. The benefit of lazy evaluation and declarative programming is that they have more potential for allowing implementations with different levels of optimization.

6.2 Input Invariance

A processor, when given control, may read one or more of its inputs. The XPL implementation must make sure that, if a processor reads the same input several times over a single execution of a sequence of statements, the XML Infoset read is the same.

6.3 Output Invariance

Multiple statements may consume XML Infosets in an XPL program by referencing the same XML Infoset identifier. These consumers may request the XML Infoset at different times during program execution. The XPL implementation must make sure that the XML Infoset read by those different processors is the same for a given execution of a sequence of statements.

6.4 Connections

A processor input with name n is said to be connected when:

The corresponding p:processor element contains an embedded p:input element with an attribute name with value n.
The p:input element contains an embedded XML Infoset, or has a valid infosetref attribute. See 8 Infoset Reference.

A processor output with name n is said to be connected when:

The corresponding p:processor element contains an embedded p:output element with an attribute name with value n.
The p:output element exposes an XML Infoset identifier i.
There exists an expression in the XPL program referring to i.

6.5 XML Processor Execution

The execution of a processor always starts with an initialization phase. During the initialization phase, the XML processor:

Is given control by the XPL implementation, through a mechanism outside the scope of XPL.
Should reinitialize previous execution state if any.
May read one or more of its inputs, in any order and at any time.
May perform tasks unrelated to the XPL program, like interacting with other software.
The processor should return control to the XPL implementation after a finite amount of time. There may be some rare use cases where not returning control is desirable.

If a processor does not have any declared output, the execution of the processor terminates with the end of the initialization phase.

When a processor has at least one output, execution of the processor may resume at a later point when the XML Infoset associated with an output is requested by the XPL implementation. This is called a read phase. During such a phase, the processor:

Is given control by the XPL implementation, through a mechanism outside the scope of XPL.
May access execution state created during the initialization phase or previous read phases, if necessary.
May read one or more of its inputs, in any order and at any time.
May perform tasks unrelated to the XPL program, like interacting with other software.
The processor should return control to the XPL implementation after a finite amount of time, and return an XML Infoset associated with that output. There may be some rare use cases where not returning control is desirable.

There is no guarantee that an XML processor will read one or more of its inputs. This is entirely left to the implementor of the XML processor.

There is no guarantee that a processor in an XPL program will be initialized, or will have any read phase after an initialization phase. The only guarantee of execution is that if an XML processor does not have any outputs and is declared in a sequence of statements which is executed, then it will be initialized.
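
The two phases can be pictured with the following non-normative Python sketch. The class UppercaseProcessor, its initialize() and read() methods, and the read_input callback are hypothetical names; the mechanism by which an XPL implementation gives control to a processor is outside the scope of XPL. The sketch defers reading its input until a read phase, which is only one of the behaviors permitted above.

```python
from typing import Callable, Dict, List


class UppercaseProcessor:
    """Hypothetical processor with one input 'data' and one output 'data'."""

    def __init__(self, read_input: Callable[[str], str]) -> None:
        self._read_input = read_input
        self._state: Dict[str, str] = {}

    def initialize(self) -> None:
        # Initialization phase: discard state left over from a previous execution.
        self._state.clear()

    def read(self, output_name: str) -> str:
        # Read phase: reuse state from earlier read phases, read inputs only if needed.
        if output_name not in self._state:
            self._state[output_name] = self._read_input("data").upper()
        return self._state[output_name]


input_reads: List[str] = []


def supply(name: str) -> str:
    # Stand-in for the XPL implementation providing an input on request.
    input_reads.append(name)
    return "<a/>"


processor = UppercaseProcessor(supply)
processor.initialize()
reads_after_init = list(input_reads)  # no input has been read yet
result = processor.read("data")
```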

6.6 Sequence of Statements Execution

The execution of a sequence of statements always starts with an initialization phase. During the initialization phase:

The sequence is initialized. The meaning of this is largely implementation dependent. It may, for example, mean that execution state kept by the implementation for this sequence of statements is discarded, such as XML Infosets associated with some Infoset identifiers in the sequence. It may also mean notifying processor instances that they should reinitialize some internal state.
A list of statements in the sequence that do not have any declared output is established in document order (in the order they appear in the XPL program). Each statement in the list is then initialized in that order.
During the initialization of a statement, that statement may read one or more of its inputs. When doing so, control is passed to the XPL implementation again. The XPL implementation must then obtain the XML Infoset associated with that input, possibly by evaluating an expression. That expression may refer to one or more XML Infoset identifiers. The XPL implementation must obtain the XML Infosets associated with those identifiers in turn, by executing read phases on the statements exposing those identifiers. When XML Infosets associated with all the XML Infoset identifiers used by the expression are obtained, the XML Infoset associated with the input is complete, and control is returned to the statement.

If the sequence of statements exposes XML Infoset identifiers, XML Infosets associated with one or more of those identifiers may be requested by the XPL implementation. When such an identifier i is requested, a read phase is initiated on the sequence of statements:

The statement exposing the identifier i is initialized if and only if not already done during the execution of the sequence of statements.
The XML Infoset associated with the XML Infoset identifier i is requested from the statement by starting a read phase. The XPL implementation may decide to store the XML Infoset to fulfill further requests associated with i, and return the stored XML Infoset when possible. How this is achieved is implementation dependent, but in all cases output invariance must be guaranteed. See 6.3 Output Invariance.
Like during initialization, the statement may read one or more of its inputs. The same processing model applies.
The XML Infoset associated with the XML Infoset identifier i is forwarded to the requestor.
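
The following non-normative Python sketch shows the pull-driven order that results from these rules. It is deliberately simplified: each statement runs in a single step rather than in separate initialization and read phases, every declared input is read eagerly, and XML Infosets are represented as strings. The Statement and Sequence classes are hypothetical.

```python
from typing import Callable, Dict, List


class Statement:
    def __init__(self, name: str, reads: List[str], exposes: List[str],
                 body: Callable[[Dict[str, str]], Dict[str, str]]) -> None:
        self.name, self.reads, self.exposes, self.body = name, reads, exposes, body


class Sequence:
    def __init__(self, statements: List[Statement]) -> None:
        self.statements = statements
        self.trace: List[str] = []
        self._cache: Dict[str, str] = {}

    def initialize(self) -> None:
        self._cache.clear()
        self.trace.clear()
        for stmt in self.statements:   # document order
            if not stmt.exposes:       # only statements without declared outputs
                self._run(stmt)

    def request(self, infoset_id: str) -> str:
        # Storing the result keeps repeated requests consistent (output invariance).
        if infoset_id not in self._cache:
            producer = next(s for s in self.statements if infoset_id in s.exposes)
            self._cache.update(self._run(producer))
        return self._cache[infoset_id]

    def _run(self, stmt: Statement) -> Dict[str, str]:
        inputs = {i: self.request(i) for i in stmt.reads}  # pull dependencies first
        self.trace.append(stmt.name)
        return stmt.body(inputs)


received: List[str] = []
seq = Sequence([
    Statement("unused", [], ["x"], lambda _: {"x": "<x/>"}),
    Statement("source", [], ["a"], lambda _: {"a": "<a/>"}),
    Statement("wrap", ["a"], ["b"], lambda i: {"b": "<w>" + i["a"] + "</w>"}),
    Statement("sink", ["b"], [], lambda i: received.append(i["b"]) or {}),
])
seq.initialize()
```

In this sketch only the statement named sink is initialized directly. The statements named wrap and source run because sink depends on them, and the statement named unused never runs because nothing requests its output.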

6.7 XPL Program Execution

The execution of an XPL program consists in executing the sequence of statements contained under the p:pipeline element.

After the initialization phase, the XPL implementation may request XML Infosets associated with XPL program outputs. To achieve this, the infosetref attribute associated with the XPL program output to read is evaluated. This may cause one or more XML Infosets exposed by the sequence of statements to be requested, and therefore one or more read phases to be executed on the sequence of statements.

6.8 `p:choose` Execution

This section is relevant only if the Choose Module is implemented.

The execution of a p:choose statement consists in:

Obtaining the XML Infoset associated with the p:choose element's infosetref attribute. The expression is evaluated, and may cause requesting one or more XML Infosets in scope.
For each p:when element contained in the p:choose element, evaluate the expression contained in the test attribute on the XML Infoset obtained previously. Evaluation must be performed on p:when elements in document order. The result of the expression is converted to an xs:boolean type. If this is not possible, a dynamic error is generated. If the result of the boolean expression is true(), the branch is selected. Evaluation of further p:when elements must not be performed.
If the branches do not export any XML Infoset identifier, the selected branch is considered a sequence of statements and is simply initialized.
If the branches export at least one XML Infoset identifier, the selected branch is considered a sequence of statements and is executed like a regular sequence of statements.

Note:

Because of 6.3 Output Invariance, when a branch is selected during a particular execution, it remains selected for the rest of the execution. In particular, if a branch exports several XML Infoset identifiers, and the associated XML Infosets are requested, they are always requested from the same branch.
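
The selection rule can be sketched in Python as follows (non-normative). Test expressions are modeled as Python callables instead of XPath expressions, the conversion in to_boolean() follows the XPath 1.0 boolean() rules for numbers, strings, and node-sets, and the Choose class and its fields are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple


class DynamicError(Exception):
    """Stand-in for an XPL dynamic error."""


def to_boolean(value: object) -> bool:
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0 and value == value  # zero and NaN convert to false
    if isinstance(value, (str, list)):
        return len(value) > 0                 # empty string or node-set is false
    raise DynamicError(f"cannot convert {value!r} to a boolean")


class Choose:
    def __init__(self, whens: List[Tuple[str, Callable[[dict], object]]]) -> None:
        self.whens = whens
        self.evaluated: List[str] = []
        self._decided = False
        self._selected: Optional[str] = None

    def select(self, infoset: dict) -> Optional[str]:
        # The decision is made once per execution and then reused.
        if not self._decided:
            for label, test in self.whens:   # document order
                self.evaluated.append(label)
                if to_boolean(test(infoset)):
                    self._selected = label
                    break                    # later tests must not be evaluated
            self._decided = True
        return self._selected


choose = Choose([
    ("error-branch", lambda doc: doc["status"] == "error"),
    ("ok-branch", lambda doc: doc["status"]),
    ("late-branch", lambda doc: True),
])
selected = choose.select({"status": "ok"})
```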

6.9 `p:for-each` Execution

This section is relevant only if the Repeat Module is implemented.

The execution of a p:for-each statement consists in:

Obtaining the XML Infoset associated with the p:for-each element's infosetref attribute. The expression is evaluated, and may cause requesting one or more scoped XML Infosets.
Evaluating the expression on the p:for-each element's select attribute. This expression must return a sequence of elements called the iteration sequence. The iteration sequence may be empty. If it does not consist of a sequence of elements, a dynamic error must be raised. The iteration sequence may be calculated lazily, and the dynamic error may be raised only when the first item in the sequence that is not an element is found, if any.
If there is no nested p:output element, the embedded sequence of statements is executed once for each element in the iteration sequence. Because there is no read phase, this means that the sequence of statements is initialized once for each element in the iteration sequence. The function current() of 8 Infoset Reference is assigned a new XML Infoset built from the current element of the iteration sequence. See 7 Infoset Extraction.
If there is a nested p:output element, the expression on the infosetref attribute of the embedded p:output element, called the output expression, must be evaluated. It may refer to XML Infoset identifiers of two types:
- XML Infoset identifiers in scope for the p:for-each statement. Those are called external XML Infoset identifiers.
- XML Infoset identifiers exposed by the sequence of statements contained within the p:for-each statement. Those are called local XML Infoset identifiers.
A local XML Infoset identifier must be interpreted as a sequence of XML Infosets. Each execution of the sequence of statements produces the next value in the sequence. After initializing the embedded sequence of statements, a read phase is performed for each such identifier, in the order in which they appear in the output expression.

External XML Infoset identifiers refer to sequences of XML Infosets that contain only one item.

When the embedded sequence of statements has been executed a number of times equal to the number of elements in the iteration sequence, the output expression must have produced a complete XML Infoset. That XML Infoset is associated with the identifier on the infoset attribute of the p:output element.
If there is a p:output element within the p:for-each element but the iteration sequence is empty, the embedded sequence of statements is never executed. The output expression is evaluated with each local XML Infoset identifier replaced with an empty sequence.
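
A minimal, non-normative Python sketch of these steps follows. It assumes that the output expression aggregates a single local XML Infoset identifier under a new root element, that the embedded sequence of statements can be modeled as a function from the current element to one result element, and that elements are represented with xml.etree.ElementTree. The names for_each, iteration_sequence, and DynamicError are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, Iterator


class DynamicError(Exception):
    """Stand-in for an XPL dynamic error."""


def iteration_sequence(items: Iterable[object]) -> Iterator[ET.Element]:
    # Checked lazily: the error surfaces only when the offending item is reached.
    for item in items:
        if not isinstance(item, ET.Element):
            raise DynamicError(f"iteration item is not an element: {item!r}")
        yield item


def for_each(items: Iterable[object],
             body: Callable[[ET.Element], ET.Element],
             root_name: str) -> ET.Element:
    # One execution of the body per element; results accumulate in order.
    root = ET.Element(root_name)
    for current in iteration_sequence(items):
        root.append(body(current))
    return root


orders = ET.fromstring("<orders><order id='1'/><order id='2'/></orders>")
lines = for_each(
    orders.findall("order"),
    lambda current: ET.Element("line", {"ref": current.get("id")}),
    "lines",
)
```

With an empty iteration sequence the body is never called and the result is a root element without children, which corresponds to replacing the local identifier with an empty sequence.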

6.10 Schema References

Several elements support the optional schema-uri and schema-href attributes. The purpose of these attributes is to refer to a schema defined outside of the XPL program, and to use that schema to validate a particular XML Infoset. See 5.6 Schema References for information about the syntax. If neither of those attributes is present, the processing model is not influenced by this section.

The application must load the schema referred to by one of the two attributes if present. It is allowed to load the schema before execution, but at the latest the schema must be loaded as soon as the first parts of the XML Infoset associated with the schema are available.

It is a dynamic error if the schema cannot be loaded during execution. Optionally, if the XPL implementation loads the schema before execution, it may raise a static error instead.

If the XML Infoset associated with the schema read on the input does not validate against the schema, a dynamic error must be raised.

Note:

The XPL implementation is allowed to start providing parts of the XML Infoset to the consumer of the XML Infoset while validation is being performed. The constraint is that the XML Infoset must be valid according to the schema up to that point, and that when the XML processor has received the entire XML Infoset, it is guaranteed to have received an XML Infoset valid according to the provided schema. Such streamed validation may not be possible with all schema languages.

7 Infoset Extraction

XPL defines how an XML element el appearing anywhere in an XML Infoset (not necessarily as a document element) can be extracted to create a new XML Infoset with that element as document element.

A new empty XML Infoset is created. The document information item created contains:

A document element which contains the same information as el except that: it does not have any parent element; it must contain new namespace attributes information items for all the namespaces originally in scope on the element but not declared on the element.
No processing instructions, comments, or document type declaration information.
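
As a non-normative illustration, extraction can be approximated in Python with xml.etree.ElementTree. The function name extract is hypothetical. ElementTree stores expanded names rather than namespace declarations, so namespaces used in element and attribute names are declared again on serialization, but in-scope namespaces that are used only inside content (for example in QName-valued attributes) are not preserved by this sketch.

```python
import copy
import xml.etree.ElementTree as ET


def extract(el: ET.Element) -> ET.ElementTree:
    """Return a new document whose document element is a copy of el."""
    root = copy.deepcopy(el)
    # Text that followed el inside its former parent does not belong to the new document.
    root.tail = None
    return ET.ElementTree(root)


source = ET.fromstring(
    "<a xmlns:x='urn:x'><!-- dropped --><x:b k='v'><c/></x:b>tail</a>"
)
child = source.find("{urn:x}b")
document = extract(child)
```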

8 Infoset Reference

Some attributes in XPL contain expressions called infoset references. Such expressions refer to or construct an XML Infoset and provide a way to:

Reference external XML documents
Reference XML Infosets provided by processor outputs and pipeline inputs
Aggregate documents using the aggregate() function
Select part of a document using XPointer

The complete syntax of those expressions is described below in a Backus-Naur Form (BNF)-like syntax:

infosetref        ::= ( local_reference | uri_reference | aggregation | current ) [ xpointer ]
local_reference   ::= "#" infosetid
aggregation       ::= "aggregate(" qname "," agg_parameter ")"
current           ::= "current()"
agg_parameter     ::= infosetref [ "," agg_parameter ]
xpointer          ::= "#xpointer(" xpath_expression ")"
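
The following non-normative Python sketch parses expressions that follow this grammar into a small tree. It is an illustration under several assumptions: whitespace around commas is ignored, the XPath expression inside xpointer() is kept as an opaque string, parentheses inside XPath string literals are not handled, and neither QNames nor URI references are validated. The Ref class and the parse function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

XPOINTER = "#xpointer("


@dataclass(frozen=True)
class Ref:
    kind: str                        # "local", "uri", "aggregate" or "current"
    value: str = ""                  # identifier, URI reference, or root QName
    args: Tuple["Ref", ...] = ()     # aggregate() parameters
    xpointer: Optional[str] = None   # XPath expression, if an XPointer is present


def _closing_paren(text: str, open_pos: int) -> int:
    depth = 0
    for pos in range(open_pos, len(text)):
        if text[pos] == "(":
            depth += 1
        elif text[pos] == ")":
            depth -= 1
            if depth == 0:
                return pos
    raise ValueError("unbalanced parentheses in " + repr(text))


def _split_arguments(text: str) -> List[str]:
    # Split on commas that are not nested inside parentheses.
    parts, depth, start = [], 0, 0
    for pos, char in enumerate(text):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == "," and depth == 0:
            parts.append(text[start:pos].strip())
            start = pos + 1
    parts.append(text[start:].strip())
    return parts


def parse(text: str) -> Ref:
    text = text.strip()
    if text.startswith("aggregate("):
        end = _closing_paren(text, len("aggregate")) + 1
    elif text.startswith("current()"):
        end = len("current()")
    else:
        cut = text.find(XPOINTER, 1)
        end = len(text) if cut == -1 else cut
    head, tail = text[:end], text[end:]

    xpointer = None
    if tail:
        if not (tail.startswith(XPOINTER) and tail.endswith(")")):
            raise ValueError("unexpected text after reference: " + repr(tail))
        xpointer = tail[len(XPOINTER):-1]

    if head.startswith("aggregate("):
        qname, *params = _split_arguments(head[len("aggregate("):-1])
        if not params:
            raise ValueError("aggregate() needs at least one infoset reference")
        return Ref("aggregate", qname, tuple(parse(p) for p in params), xpointer)
    if head == "current()":
        return Ref("current", xpointer=xpointer)
    if head.startswith("#"):
        return Ref("local", head[1:], xpointer=xpointer)
    return Ref("uri", head, xpointer=xpointer)


parsed = parse(
    "aggregate(root, #a, doc.xml#xpointer(/x/y), current())#xpointer(/root/a[1])"
)
```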

8.1 Local Reference

A local reference starts with the # sign and is followed by an XML Infoset identifier. It refers to either a single XML Infoset, or a sequence of XML Infosets. Whenever a local reference is used, it must refer to a scoped XML Infoset identifier.

A local reference evaluates to the XML Infoset or XML Infoset sequence identified by the XML Infoset identifier.

Note:

The only instance where a local reference evaluates to an XML Infoset sequence instead of just one XML Infoset is within the p:output element contained in a p:for-each statement.

8.2 URI Reference

This definition is borrowed from [XSLT 2.0 Working Draft]:

[Definition: Within this specification, the term URI Reference, unless otherwise stated, refers to a string in the lexical space of the xs:anyURI data type as defined in [XML Schema]. ]

Note that this is a wider definition than the definition in [RFC2396]. For example, it does not require non-ASCII characters to be escaped.

URI references used to identify external resources must conform to the same rules as the locator attribute (href) defined in section 5.4 of [XLink]. If the URI reference is relative, then it is resolved (unless otherwise specified) against the base URI of the containing element node, according to the rules of [RFC2396], after first escaping all characters that need to be escaped to make it a valid RFC2396 URI reference.
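
The escape-then-resolve order described above can be sketched in Python as follows (non-normative). The function name resolve and the example URIs are hypothetical. The sketch percent-encodes disallowed characters as UTF-8 octets, keeps existing percent escapes intact, and relies on urllib.parse.urljoin, whose resolution algorithm follows the later RFC 3986 rather than RFC 2396; the two agree for ordinary relative paths such as the one shown.

```python
from urllib.parse import quote, urljoin

# Reserved and mark characters of RFC 2396, plus '#' (fragment separator)
# and '%' (so that existing escapes are not escaped twice).
SAFE = ";/?:@&=+$,-_.!~*'()#%"


def resolve(reference: str, base_uri: str) -> str:
    escaped = quote(reference, safe=SAFE)  # escape first
    return urljoin(base_uri, escaped)      # then resolve against the base URI


resolved = resolve(
    "schemas/résumé data.xml",
    "http://example.org/pipelines/main.xpl",
)
```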

8.3 Aggregation

The aggregate() function returns a new XML Infoset. In terms of [XQuery 1.0 and XPath 2.0 Data Model], it can be defined as follows:

fn:aggregate($root as xs:QName, $arg1 as node()*, ...) as document-node()

The resulting XML Infoset's document information item contains:

A document element with the QName specified as the first argument of the aggregate() function.
No processing instructions, comments, or document type declaration information.

The aggregate() function can take, in addition to the first argument specifying the QName, one or more additional arguments. Each of those arguments is of type node()*, in other words a sequence of nodes as defined by [XPath 2.0 Working Draft], with the exception that attribute nodes and namespace nodes are not allowed. The types returned are consistent with the acquired Infoset defined in [XInclude]. The Infoset associated with each node in the sequence is added as child information items to the document element information item.
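
A non-normative Python approximation of aggregate() is shown below. It uses xml.etree.ElementTree, handles element nodes only (text, comment, and processing instruction nodes passed as arguments are outside this sketch), and takes the root name as a plain string. The Python function name and signature are illustrative assumptions, not an API defined by XPL.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Iterable


def aggregate(root_name: str, *args: Iterable[ET.Element]) -> ET.ElementTree:
    """Build a new document whose root contains copies of all given elements."""
    if not args:
        raise ValueError("aggregate() requires at least one node sequence")
    root = ET.Element(root_name)
    for sequence in args:          # arguments in order
        for node in sequence:      # nodes of each sequence in order
            child = copy.deepcopy(node)
            child.tail = None      # drop text that followed the node in its source
            root.append(child)
    return ET.ElementTree(root)


a = ET.fromstring("<a/>")
b = ET.fromstring("<b><c/></b>")
combined = aggregate("root", [a], [b, a])
```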

8.4 Current XML Infoset

This section is relevant only if the Repeat Module is implemented.

The current() function evaluates to the current element of the iteration sequence within a p:for-each statement. In terms of [XQuery 1.0 and XPath 2.0 Data Model], it can be defined as follows:

fn:current() as document-node()

8.5 XPointer

XPointer expressions must support [XPointer Framework] and [XPointer xmlns() Scheme].

They must also support a subset of [XPointer xpointer() Scheme] working with standard XPath 1.0, or an extension of that subset working with XPath 2.0. This means that there is no requirement for XPL to support the XPath extensions proposed by [XPointer xpointer() Scheme].

9 Standard XML Processor Library

9.1 Introduction

XPL defines a set of standard XML processors, called the XPL standard XML processor library, that must be implemented by any conformant implementation. All the XML processors are identified with a QName with URI http://www.orbeon.com/oxf/xpl/standard. The recommended namespace prefix for that QName is xpl.

9.2 Identity Processor

9.2.1 Inputs and Outputs

<p:processor name="xpl:identity">
    <p:input name="data">...</p:input>
    <p:output name="data">...</p:output>
</p:processor>

The Identity processor is identified by a QName with URI http://www.orbeon.com/oxf/xpl/standard and local name identity. It statically exposes an input called data and an output called data as well.

9.2.2 Behavior

When a read phase is performed on the data output, the Identity processor must return the XML Infoset available on the data input. The Identity processor should obtain the XML Infoset during the read phase rather than during the initialization phase.

9.3 Pipeline Processor

9.3.1 Rationale

The Pipeline processor provides support for sub-pipelines in XPL. It allows XPL programs to be used and manipulated like XML processors. It allows building the equivalent of functions and procedures in other programming languages.

9.3.2 Inputs and Outputs

<p:processor name="xpl:pipeline">
    <p:input name="pipeline">...</p:input>
</p:processor>

The Pipeline processor is identified by a QName with URI http://www.orbeon.com/oxf/xpl/standard and local name pipeline. It statically exposes an input called pipeline. It may also dynamically expose other inputs and outputs as defined below.

9.3.3 Behavior

When initialized, the Pipeline processor instance must read its pipeline input. The pipeline input must contain an XPL program p conforming to this specification. If p doesn't conform to this specification, the Pipeline processor must raise a dynamic error.

p must not have an input called pipeline. If it does, the Pipeline processor must raise a dynamic error.

The Pipeline processor must connect the inputs of p to its own inputs with the same names, as follows. For each input i of p with name n:

If the Pipeline processor instance has a connected input j with name n, connect input i to j.
If the Pipeline processor instance does not have such a connected input, do nothing.

Connecting two inputs here means that when the Pipeline processor instance executes p, and p requests the XML Infoset associated with its input i, the Pipeline processor instance must obtain the XML Infoset provided to it on its input j and forward it to p unmodified.

If the Pipeline processor instance executes p and p requests an XML Infoset associated with an input i with name n and the Pipeline processor instance does not have an input j with name n, the Pipeline processor must raise a dynamic error.

The Pipeline processor must connect its own outputs to p's outputs with the same name, as follows. For each of its connected outputs o with name m:

If p has an output q with name m, connect output o to q.
If p does not have such an output, raise a dynamic error.

Connecting two outputs here means that when the Pipeline processor instance is requesting the XML Infoset associated with its output o with name m, the Pipeline processor instance must obtain the XML Infoset provided by p on its output q with name m.

When a read phase is started by the XPL implementation on one of the Pipeline processor's outputs o with name m, if it has any, the Pipeline processor must execute a read phase on the XPL program's output q with name m.
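
The connection rules above can be summarized by the following non-normative Python sketch. The Program and PipelineProcessor classes are hypothetical stand-ins: a program is reduced to a set of input names and a mapping from output names to functions, and XML Infosets are represented as strings. An input that p declares but never requests causes no error, whereas a connected output that p does not provide is rejected immediately.

```python
from typing import Callable, Dict, Set


class DynamicError(Exception):
    """Stand-in for an XPL dynamic error."""


ReadInput = Callable[[str], str]


class Program:
    """Hypothetical stand-in for the XPL program p read from the pipeline input."""

    def __init__(self, input_names: Set[str],
                 outputs: Dict[str, Callable[[ReadInput], str]]) -> None:
        self.input_names = input_names
        self.outputs = outputs


class PipelineProcessor:
    def __init__(self, program: Program,
                 inputs: Dict[str, Callable[[], str]],
                 connected_outputs: Set[str]) -> None:
        if "pipeline" in program.input_names:
            raise DynamicError("p must not have an input called pipeline")
        for m in connected_outputs:
            if m not in program.outputs:
                raise DynamicError(f"p has no output named {m!r}")
        self._program = program
        self._inputs = inputs

    def _read_input(self, n: str) -> str:
        # Only an input that p actually requests has to be connected.
        if n not in self._inputs:
            raise DynamicError(f"no connected input named {n!r}")
        return self._inputs[n]()  # forwarded to p unmodified

    def read(self, m: str) -> str:
        # A read phase on output m becomes a read phase on p's output m.
        return self._program.outputs[m](self._read_input)


program = Program(
    {"data", "config"},
    {"result": lambda read: "<r>" + read("data") + "</r>"},
)
processor = PipelineProcessor(program, {"data": lambda: "<d/>"}, {"result"})
result = processor.read("result")
```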

Note:

An XPL program interpreted by a Pipeline processor instance must behave according to this specification. In particular, such an XPL program does not have privileged access to XML Infoset identifiers in scope for the Pipeline processor. As is the case for any XPL program, the set of XML Infoset identifiers in scope at the beginning of the program is defined by the pipeline inputs only.

Note:

The Pipeline processor provides support for creating XPL programs dynamically. For example, an XSLT processor could generate an XML Infoset that is a valid XPL program, and send it to the pipeline input of an instance of the Pipeline processor.

9.4 Null Serializer

9.4.1 Inputs and Outputs

<p:processor name="xpl:null-serializer">
    <p:input name="data">...</p:input>
</p:processor>

The Null Serializer processor is identified by a QName with URI http://www.orbeon.com/oxf/xpl/standard and local name null-serializer. It statically exposes an input called data.

9.4.2 Behavior

When initialized, the Null Serializer processor reads the XML Infoset associated with its data input. It then discards it without further processing.

Note:

Here "discarding" means that the XML Infoset is no longer required by the XPL implementation. XPL implementations may however use the XML Infoset read for debugging or logging purposes, for example.

10 Inclusions

XPL does not have any particular construct for inclusions or imports. However, a compliant XPL implementation may support a subset of [XInclude]. If it does, it should support at a minimum the xi:include element with the parse attribute either missing or set to xml, and without any other attributes. It may support more of [XInclude]. XPL does not specify what schemes are supported by the href attribute.

If such support is present, xi:include elements may appear at any point in an XPL program allowed by the XInclude 1.0 specification. The resulting XML Infoset may have xml:base attributes resulting from the inclusion.

Whether the XPL implementation itself supports [XInclude] or not, it must allow xml:base attributes on any of its elements, and resolve relative URLs according to [XML Base].

11 Conformance

A conformant XPL implementation must at least implement the Basic XPL Profile described below. Implementors are however encouraged to implement the optional modules. A conformant implementation must document what modules it implements.

11.1 Basic Profile

The Basic XPL profile requires implementing all of this specification except the optional Choose, Repeat and Exception modules. The Pipeline and Processor modules must be implemented.

11.2 Full Profile

The full XPL profile requires implementing all of this specification including the Pipeline, Processor, Choose, Repeat and Exception modules.

11.3 XPath Support

In XPL, XPath is used in the following places:

Within XPointer's xpointer() scheme.
In the test attribute of the p:when element, if the Choose module is implemented.
In the select attribute of the p:for-each element, if the Repeat module is implemented.

XPL constructs using XPath may support one of the following versions of XPath:

XPath 1.0. In this case, [XPath 1.0] must be supported.
XPath 2.0. In this case, [XPath 2.0 Working Draft] must be supported. In this case XPL is the host language. It should not provide any in-scope schema definitions in the static context; it should provide the types defined by XML Schema as in-scope type definitions in the static context; it should not provide any in-scope variable.

An XPL implementation must document the version of XPath used. It may optionally allow users to configure which XPath version must be used when executing a pipeline. XPL itself does not have any provisions to configure, whether per XPL program or at a finer granularity, what version of XPath is used.

An [XPath 2.0 Working Draft] expression must raise a dynamic error if encountered by an XPL implementation supporting only [XPath 1.0]. Optionally, the XPL implementation may raise a static error if it is able to detect the error before execution.

12 Future Improvements

This section lists improvements that should be added to a future version of this specification.

12.1 Exception Module

The Exception module will support exception handling, based on a model found in other programming languages such as Java or WS-BPEL. Exception handling provides a good model to handle dynamic errors.

12.2 Repeat Module `p:while` Element

The p:while statement will provide a repetition statement based on the idea of a repetition until a certain condition is reached.

A References

A.1 Normative References

XML Infoset: Richard Tobin and John Cowan, editors. XML Information Set. World Wide Web Consortium, 2001. (See http://www.w3.org/TR/xml-infoset/.)
XML 1.0: Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, editors. Extensible Markup Language (XML) 1.0 (Second Edition). World Wide Web Consortium, 2000. (See http://www.w3.org/TR/2000/REC-xml-20001006.)
XML 1.1: Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau, John Cowan, editors. Extensible Markup Language (XML) 1.1. World Wide Web Consortium, 2004. (See http://www.w3.org/TR/xml11/.)
XML Base: Jonathan Marsh, editor. XML Base. World Wide Web Consortium, 2001. (See http://www.w3.org/TR/xmlbase/.)
XML Namespaces 1.0: Tim Bray, Dave Hollander, Andrew Layman, editors. Namespaces in XML. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/REC-xml-names/.)
XML Namespaces 1.1: Tim Bray, Dave Hollander, Andrew Layman, Richard Tobin, editors. Namespaces in XML 1.1. World Wide Web Consortium, 2004. (See http://www.w3.org/TR/2004/REC-xml-names11-20040204/.)
XML Schema: Henry S. Thompson, David Beech, Murray Maloney, et al., editors. XML Schema Part 1: Structures. World Wide Web Consortium, 2000. (See http://www.w3.org/TR/xmlschema-1/.)
XLink: Steve DeRose, Eve Maler, David Orchard, editors. XML Linking Language (XLink) Version 1.0. World Wide Web Consortium, 2001. (See http://www.w3.org/TR/xlink/.)
XPointer Framework: Paul Grosso, Eve Maler, Jonathan Marsh, Norman Walsh, editors. XPointer Framework. World Wide Web Consortium, 2003. (See http://www.w3.org/TR/xptr-framework/.)
XPointer xpointer() Scheme: Steven DeRose, Eve Maler, Ron Daniel Jr., editors. XPointer xpointer() Scheme. World Wide Web Consortium, 2002. (See http://www.w3.org/TR/2002/WD-xptr-xpointer-20021219/.)
XPointer xmlns() Scheme: Steven J. DeRose, Ron Daniel Jr., Eve Maler, Jonathan Marsh, editors. XPointer xmlns() Scheme. World Wide Web Consortium, 2003. (See http://www.w3.org/TR/xptr-xmlns/.)
XPath 1.0: James Clark, Steve DeRose, editors. XML Path Language (XPath) 1.0. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/xpath.html.)
XPath 2.0 Working Draft: Anders Berglund, Scott Boag, Don Chamberlin, Mary F. Fernández, Michael Kay, Jonathan Robie, Jérôme Siméon, editors. XML Path Language (XPath) 2.0 Working Draft. World Wide Web Consortium, 2004. (See http://www.w3.org/TR/2004/WD-xpath20-20041029/.)
RELAX NG: James Clark, editor. OASIS RELAX NG Technical Committee. OASIS. 2001. (See http://www.oasis-open.org/committees/relax-ng/.)
RFC 2119: S. Bradner, editor. Key words for use in RFCs to Indicate Requirement Levels. IETF (Internet Engineering Task Force), March 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
XInclude: Jonathan Marsh and David Orchard, editors. XML Inclusions (XInclude) Version 1.0. World Wide Web Consortium, 2001. (See http://www.w3.org/TR/xinclude/.)

A.2 Other References

RFC2396: T. Berners-Lee, R. Fielding, L. Masinter, editors. Uniform Resource Identifiers (URI): Generic Syntax. IETF RFC 2396. (See http://www.ietf.org/rfc/rfc2396.txt.)
XQuery: Don Chamberlin, James Clark, Daniela Florescu, et al., editors. XQuery 1.0: An XML Query Language. World Wide Web Consortium, 2001. (See http://www.w3.org/TR/xquery/.)
XSLT 1.0: James Clark, editor. XSL Transformations (XSLT) Version 1.0. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/xslt.)
XQuery 1.0 and XPath 2.0 Data Model: Mary Fernández, Ashok Malhotra, Jonathan Marsh, Marton Nagy, Norman Walsh, editors. XQuery 1.0 and XPath 2.0 Data Model. World Wide Web Consortium, 2004. (See http://www.w3.org/TR/xpath-datamodel/.)
XSLT 2.0 Working Draft: Michael Kay, editor. XSL Transformations (XSLT) Version 2.0 Working Draft. World Wide Web Consortium, 2004. (See http://www.w3.org/TR/2004/WD-xslt20-20041105/.)
XML Pipeline Definition Language: Norman Walsh, Eve Maler, editors. XML Pipeline Definition Language Version 1.0. World Wide Web Consortium, 2002. (See http://www.w3.org/TR/2002/NOTE-xml-pipeline-20020228/.)
XML Processing Model: Dmitry Lenkov, Norman Walsh, editors. XML Processing Model Requirements. World Wide Web Consortium, 2004. (See http://www.w3.org/TR/2004/NOTE-proc-model-req-20040405/NOTE-proc-model-req-20040405.xml.)

B Relationship with Other Specifications (Non-Normative)

This section discusses the relationship of XPL with other relevant specifications. It is not normative, but is there to acknowledge that those specifications were taken into account and to highlight similarities and differences.

B.1 February 2002 XML Pipeline Definition Language W3C Note

Work on XPL started in 2002 independently from [XML Pipeline Definition Language], a W3C Note published in February 2002. The similarities between XPL and the W3C Note may appear startling. This is easily explained by the fact that both initiatives aimed at solving a similar problem. The similarities however remain mostly at the surface. XPL has a different processing model, and proposes constructs different from those of the W3C Note.

Like the W3C Note, XPL does:

Define an XML pipeline language
Describe the equivalent of "processes"

The basics of XPL are arguably simpler than those of the W3C Note:

XPL does not follow a build system approach with concepts such as "targets" being "up to date". Rather, an XPL pipeline is executed, and may return zero, one or more XML Infosets. This approach is closer to the approach followed by most programming languages. Further, it is thought that the concept of target is not necessary: choosing two different targets can be reduced to choosing two different pipelines.
XPL remains declarative, in that the products of the execution of a pipeline determine the processing order.
XPL does not specify how "processes" ("processors" in the XPL terminology) are defined (W3C Note's process definitions). Such definitions are outside the scope of XPL and left to language implementors. The only assumption is that processors are declared and that they expose an XML qualified name (QName) that identifies them.
XPL processors only support inputs and outputs that are XML Infosets. There is no concept of additional parameters (W3C Note p:param) to pass to processors. The rationale for this decision is that it is always possible to pass such parameters in the form of an XML Infoset.
XPL does not limit the types of Infosets produced by processors.
XPL pipelines either entirely succeed or entirely fail (basic profile). When exception handling is added (Exception module), exceptions are handled accordingly.
There is no p:document element. Instead, a particular processor, such as the identity processor, can be used for the same purpose.

B.2 April 2004 XML Processing Model W3C Note

[XML Processing Model], a W3C Note published in April 2004, sets forth a number of requirements for an XML processing model. XPL hopes to answer all those requirements:

The language must be rich enough to address practical interoperability concerns. To achieve this goal, the language cannot be simplistic and needs to implement a number of basic features. The XPL specification defines a basic profile, and a number of modules that can be optionally implemented. XPL fulfills this requirement.
The language should be as small and simple as possible. XPL supports a fairly small set of features and the basic concepts remain simple. Whenever possible, existing specifications are leveraged (XPath, XPointer, for example). In addition XPL, like XSLT before it, may be written by hand, so the language tries to be reasonably concise. The notion of modules allows implementors to start small and then add more advanced features. XPL fulfills this requirement.
The language must allow the inputs, outputs, and other parameters of a component to be specified. In this specification, the language only uses XML Infosets to pass inputs and parameters to a component. It is thought that this makes XPL simpler. Our interpretation is that this is compatible with the requirement. XPL fulfills this requirement.
The language must define the basic minimal set of mandatory input processing options and associated error reporting options required to achieve interoperability. XPL supports static and dynamic errors. XPL fulfills this requirement.
Given a set of components and a set of documents, the language must allow the order of processing to be specified. XPL determines an execution order. Some details of the processing model are currently in the list of open issues, and will be addressed. XPL will fulfill this requirement.
It should be relatively easy to implement a conformant implementation of the language, but it should also be possible to build a sophisticated implementation that can perform parallel operations, lazy or greedy processing, and other optimizations. XPL fulfills this requirement.
The model should be extensible enough so that applications can define new processes and make them a component in a pipeline. XPL fulfills this requirement.
The model must provide mechanisms for addressing error handling and fallback behaviors. The Exception module will be designed for this purpose. XPL will fulfill this requirement.
The model could allow conditional processing so that different components are selected depending on run-time evaluation. XPL fulfills this requirement.
The model should not prohibit the existence of streaming pipelines. XPL fulfills this requirement.
The model should allow multiple inputs and multiple outputs for a component. XPL fulfills this requirement.
The model should allow any data set conforming to one of the W3C standards, such as XML 1.1, XSLT 1.0, XML Query 1.0, etc., to be specified as an input or output of a component. How this should be interpreted is not clear. Limiting inputs and outputs to XML Infosets makes the language simpler, while still not prohibiting the passing of non-XML Infoset data by encapsulating it within an XML Infoset, for example a simple root element containing character data. The interpretation is that XPL fulfills this requirement.
Information should be passed between components in a standard way, for example, as one of the data sets conforming to an industry standard. This is not a clear requirement. Each component defines the Infosets it consumes and produces, and it should not be up to the pipeline language to define what they are. However, XPL supports inline validation that can enforce such constraints. The interpretation is that XPL fulfills this requirement.
The language should be expressed in XML. XPL fulfills this requirement.
The pipeline language should be declarative, not based on APIs. XPL fulfills this requirement.
The model should be neutral with respect to implementation language. XPL fulfills this requirement.

It should be noted that XPL provides more than the minimal requirements above, and that in addition, the use cases of the Note can all be satisfied by XPL.
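
The encapsulation of non-XML data described above (a simple root element containing character data) can be illustrated with a minimal sketch. It is written in Python rather than XPL, and the function names and the root element name "document" are assumptions made for illustration; this specification does not define them.

```python
import xml.etree.ElementTree as ET


def wrap_text_as_infoset(text, root_name="document"):
    """Wrap character data in a single root element.

    The element name "document" is a hypothetical choice, not an XPL name.
    """
    root = ET.Element(root_name)
    root.text = text
    return root


def unwrap_text(root):
    """Recover the character data carried by the wrapper element."""
    return root.text or ""


# Non-XML input (CSV here) containing characters that are special in XML.
csv_data = "id,name\n1,<Ada & Co>\n"

wrapped = wrap_text_as_infoset(csv_data)

# Serialization escapes "<", ">" and "&", so the result is well-formed XML.
serialized = ET.tostring(wrapped, encoding="unicode")

# Parsing the serialized form and unwrapping yields the original text.
restored = unwrap_text(ET.fromstring(serialized))
```

This approach only covers data that can be represented as XML character data. Binary data would first need a text encoding such as Base64, which is outside the scope of this sketch.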

C Summary of Issues (Non-Normative)

C.1 Open Issues

This appendix identifies open issues with this specification:

While the execution model for processors with outputs has been well received, this document proposes an execution model for processors without outputs that is the subject of debate, because:
- It does not appear completely in line with the lazy evaluation model.
- It can appear confusing to users, especially beginners.
- Some use cases involving such processors are difficult to implement.
- Some use cases involving the controlled order of execution of such processors are difficult to implement.
A satisfactory solution needs to be found, whether the current execution model is deemed good enough, or whether a new solution is proposed.

Current thinking suggests, for example, mandating that all processors have at least one output, or introducing a separate syntax to express certain execution dependencies.
It is thought that exception handling is a very important optional language construct to define as soon as possible. A proposal needs to be made, experimented with, and then incorporated into a stable version of XPL.
It is thought that a "while" statement is an important optional language construct to define. A proposal needs to be made, experimented with, and then incorporated into a stable version of XPL.
The reuse of p:input and p:output in multiple places may appear confusing. Pipeline inputs and outputs could use p:with-input and p:with-output. Reuse between p:processor, p:choose and p:for-each is subject to discussion. A proliferation of names and increase in syntactic overhead are probably not desirable.
XPL must have a company-agnostic namespace used for the elements of its syntax and for the Standard XML Processor Library. This can be done once the specification has found a host organization.
XPL needs a tracing facility. Current implementations allow a debug attribute on inputs and outputs, whose semantics are to log the XML Infoset passing through the associated input or output. The term "debug" is considered inappropriate, and there may be a better way to construct the tracing facility. Optionally, this could be implementation-dependent and controlled by foreign attributes. The current thinking is that an attribute named trace could be standardized.
The use of # to refer to XML Infosets may not be in line with the accepted use of that character, because in XPL there is a notion of scoping. An alternative way of referring to XML Infoset identifiers may have to be proposed.
Should the XPL schema be open, i.e. accept elements and attributes in foreign namespaces?
[XSLT 2.0 Working Draft] imports schemas using attributes called namespace and schema-location. Should we follow this convention? Because XSLT 2.0 is based on XML Schema, namespace refers to a schema for that namespace. Right now in XPL, it is just a convenient indirection.
Another solution to the schema reference question could be to look at using XML catalogs.
Pipeline inputs could provide a default XML Infoset using the infosetref attribute. This would implicitly define such pipeline inputs as optional static inputs. If connected from outside, the XML Infoset referred to by the infosetref attribute is ignored. If not connected, the XML Infoset referred to internally is used.
Default pipeline inputs could support a "merge" feature to merge default documents with documents actually passed to the pipeline. Such a feature would allow merging data in more complex XML processor configurations to provide finer-grained defaults.
Should the Pipeline processor be renamed the XPL processor?
The XPL program definition mentions that an XPL program is an XML document. How does this relate to XInclude inclusions?
The XPL program definition mentions versions 1.0 of XML and namespaces. Should versions 1.1 be allowed?
The specification should provide a W3C XML Schema or a RELAX NG schema for XPL.
The specification should provide non-normative examples and use cases.
The specification should provide a non-normative comparison with WS-BPEL 2.0.
Should a test suite be provided? It would probably require defining more processors, like an XSLT processor, to expand the range of testable features. Testing could be provided through pipeline inputs and outputs, and/or through tracing.
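
The proposal above for default pipeline inputs (an infosetref attribute supplying a default XML Infoset that is ignored when the input is connected from outside) can be sketched as a small resolution rule. This is an illustrative Python sketch of an open proposal, not a defined XPL behavior; the function name, its parameters, and the error raised when neither Infoset is available are all assumptions.

```python
import xml.etree.ElementTree as ET


def resolve_pipeline_input(connected, default):
    """Pick the Infoset that a pipeline input exposes to the pipeline body.

    connected: Infoset supplied from outside, or None if not connected.
    default:   Infoset referred to by the proposed infosetref attribute,
               or None if the input declares no default.
    """
    # Compare against None explicitly: an Element without children is
    # falsy in ElementTree, so a plain truth test would be wrong here.
    if connected is not None:
        return connected  # connected from outside: the default is ignored
    if default is not None:
        return default    # not connected: the internal default is used
    # Assumption: an unconnected input with no default is an error.
    raise ValueError("pipeline input is not connected and has no default")


default_config = ET.fromstring("<config><mode>default</mode></config>")
custom_config = ET.fromstring("<config><mode>custom</mode></config>")

used_when_connected = resolve_pipeline_input(custom_config, default_config)
used_when_unconnected = resolve_pipeline_input(None, default_config)
```

Under this rule an input that declares a default is effectively optional, which matches the description of such inputs as optional static inputs.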

W3C Member Submission 11 April 2005

Abstract

Status of this Document

Table of Contents

Appendices

1 Introduction

1.1 What is XPL?

1.2 Motivation

2 Concepts

2.1 Terminology

2.2 Notation

2.3 XPL Implementation

2.4 Error Handling

2.5 Qualified Names

2.6 Namespace

3 XML Processors

3.1 Definition

3.2 Instances

3.3 Inputs and Outputs

3.4 Behavior

4 XPL Program

4.1 Structure

4.2 XML Infoset Identifiers

4.3 Statements

4.4 Sequence of Statements

5 Syntax

5.1 Introduction

5.2 Pipeline Module

5.2.1 The p:pipeline Element

5.2.2 The p:input Element

5.2.3 The p:output Element

5.3 Processor Module

5.3.1 The p:processor Element

5.3.2 The p:input Element

5.3.3 The p:output Element

5.4 Choose Module

5.4.1 The p:choose Element

5.4.2 The p:output Element

5.4.3 The p:when Element

5.4.4 The p:otherwise Element

5.4.5 Branches

5.5 Repeat Module

5.5.1 The p:for-each Element

5.5.2 The p:output Element

5.6 Schema References

6 Processing Model

6.1 Introduction

6.2 Input Invariance

6.3 Output Invariance

6.4 Connections

6.5 XML Processor Execution

6.6 Sequence of Statements Execution

6.7 XPL Program Execution

6.8 p:choose Execution

6.9 p:for-each Execution

6.10 Schema References

7 Infoset Extraction

8 Infoset Reference

8.1 Local Reference

8.2 URI Reference

8.3 Aggregation

8.4 Current XML Infoset

8.5 XPointer

9 Standard XML Processor Library

9.1 Introduction

9.2 Identity Processor

9.2.1 Inputs and Outputs

9.2.2 Behavior

9.3 Pipeline Processor

9.3.1 Rationale

9.3.2 Inputs and Outputs

9.3.3 Behavior

9.4 Null Serializer

9.4.1 Inputs and Outputs

9.4.2 Behavior

10 Inclusions

11 Conformance

11.1 Basic Profile

11.2 Full Profile

12.2 Repeat Module `p:while` Element