Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
The W3C Multimodal Interaction working group aims to develop specifications to enable access to the Web using multi-modal interaction. This document is part of a set of specifications for multi-modal systems, and provides details of an XML markup language for describing the interpretation of user input. Examples of interpretation of user input are a transcription into words of a raw signal, for instance derived from a speech or pen input, a set of attribute/value pairs describing their meaning, or a set of attribute/value pairs describing a gesture. The interpretation of the user's input is expected to be generated by signal interpretation processes, such as speech and ink recognition, semantic interpreters, and other types of processors for use by components that act on the user's inputs such as interaction managers.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is a W3C Working Draft for review by W3C members and other interested parties. Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than 'work in progress.'
This specification describes markup for representing interpretations of user input (speech, keystrokes, pen input etc.) together with annotations for confidence scores, timestamps, input medium etc., and forms part of the proposals for the W3C Multimodal Interaction Framework. This document has been produced as part of the W3C Multimodal Interaction Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Multimodal Interaction Working Group (W3C Members only). This is a Royalty Free Working Group, as described in W3C's Current Patent Practice NOTE. Working Group participants are required to provide patent disclosures.
This is the second version of this Working Draft, and your feedback is welcomed, especially on the open issues described within the specification. Please send comments about this document to the public mailing list: www-multimodal@w3.org (public archives). See W3C mailing list and archive usage guidelines.
This document presents an XML specification for EMMA, an Extensible MultiModal Annotation markup language, responding to the requirements documented in W3C Requirements for EMMA. This markup language is intended for use by systems that provide semantic interpretations for a variety of inputs, including but not necessarily limited to, speech, natural language text, GUI and ink input.
It is expected that this markup will be used primarily as a standard data interchange format between the components of a multimodal system; in particular, it will normally be automatically generated by interpretation components to represent the semantics of users' inputs, not directly authored by developers.
The language is focused on annotating the interpretation information of single and composed inputs, as opposed to (possibly identical) information that might have been collected over the course of a dialog.
The language provides a set of elements and attributes that are focused on accurately representing annotations on the input interpretations.
An EMMA document can be considered to hold three types of data:
instance data
Application-specific markup corresponding to input information which is meaningful to the consumer of an EMMA document. Instances are application-specific and built by input processors at runtime. Given that utterances may be ambiguous with respect to input values, an EMMA document may hold more than one instance.
data model
Constraints on structure and content of an instance. The data model is typically pre-established by an application, and may be implicit, that is, unspecified.
metadata
Annotations associated with the data contained in the instance. Annotation values are added by input processors at runtime.
Given the assumptions above about the nature of data represented in an EMMA document, the following general principles apply to the design of EMMA:
The annotations of EMMA should be considered 'normative' in the sense that if an EMMA component produces annotations as described in Section 3, these annotations must be represented using the EMMA syntax. The Multimodal Interaction Working Group may address in later drafts the issues of modularization and profiling, that is: which sets of annotations are to be supported by which classes of EMMA component.
The Multimodal Interaction Working Group is currently considering the role of RDF in EMMA syntax and processing. It appears useful for EMMA to adopt the spirit of the RDF conceptual triples model, and thereby enable RDF processing in RDF environments.
However, on one hand, there is concern that unnecessary processing overhead will be introduced by a requirement for all EMMA environments to support the RDF syntax and its related constructs. An inline syntax would remove this requirement, provide a more compact representation, and enable queries on annotations using XPath, just as for queries on instance data. This would not preclude the use of RDF processors to build an RDF representation of the EMMA document.
On the other hand, mixing data and metadata may have its own processing costs when it is necessary to separate the two. The RDF syntax makes it easy to annotate attributes in addition to elements, and to apply the same annotation to multiple nodes without the need for an inheritance mechanism. This is based upon an XPointer subset, which can in principle also be used for querying annotation data in combination with the name of the RDF property. The basic mechanism for supporting queries in the RDF syntax still needs to be determined. In addition, advanced query mechanisms for both syntax proposals may still need to be determined.
In view of this open issue, three syntax proposals are provided for public review:
The Multimodal Interaction Working Group is seeking feedback from the broader community on the relevance and role of RDF in EMMA, and encourages comments on this issue to be sent to the public mailing list at www-multimodal@w3.org.
The general purpose of EMMA is to represent information automatically extracted from a user's input by an interpretation component, where input is to be taken in the general sense of a meaningful user input in any modality supported by the platform. The reader should refer to the sample architecture in W3C Multimodal Interaction Framework, which shows EMMA conveying content between user input modality components and an interaction manager.
Components that generate EMMA markup:
Components that use EMMA include:
Although not a primary goal of EMMA, a platform may also choose to use this general format as the basis of a general semantic result that is carried along and filled out during each stage of processing. In addition, future systems may also potentially make use of this markup to convey abstract semantic content to be rendered into natural language by a natural language generation component.
As noted above, the main components of an interpreted user input in EMMA are the instance data, an optional data model, and the metadata annotations that may be applied to that input. The realization of these components in EMMA is as follows:
An EMMA interpretation is the primary unit for holding user input as interpreted by an EMMA processor. As will be seen below, multiple interpretations of a single input are possible.
EMMA provides a simple structural syntax for the organization of interpretations and instances, and an annotative syntax derived from RDF to apply the annotation to the input data at any level.
An outline of the structural syntax of EMMA documents is as follows. A fuller definition may be found in the description of individual features in section 3.
EMMA annotations may apply to interpretations and any node within the XML tree for the application-specific markup for a specific interpretation.
Here is an example of a complete EMMA document, illustrating the application of the RDF XML syntax for annotations at various levels.
Example:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:one-of emma:id="r1"> <emma:interpretation emma:id="int1"> <origin>Boston</origin> <destination>Denver</destination> <date>03112003</date> </emma:interpretation> <emma:interpretation emma:id="int2"> <origin>Austin</origin> <destination>Denver</destination> <date>03112003</date> </emma:interpretation> </emma:one-of> <rdf:RDF> <!-- time stamp for result --> <rdf:Description rdf:about="#r1" emma:start="2003-03-26T0:00:00.15" emma:end="2003-03-26T0:00:00.2"/> <!-- confidence score for first interpretation --> <rdf:Description rdf:about="#int1" emma:confidence="0.75"/> <!-- confidence score for second interpretation --> <rdf:Description rdf:about="#int2" emma:confidence="0.68"/> <!-- time stamps for date in first interpretation --> <rdf:Description rdf:about="#emma(id('int1')/date)"> <emma:absolute-timestamp emma:start="2003-03-26T0:00:00.15" emma:end="2003-03-26T0:00:00.2"/> </rdf:Description> </rdf:RDF> </emma:emma>
This example shows a recognition result (emma:id="r1") with two exclusive interpretations (emma:id="int1" and emma:id="int2"). There are four annotations. The first gives the start and end timestamps for the result. The second and third give confidence scores for the two interpretations. The fourth gives a timestamp for the date value in the first interpretation, making use of the EMMA scheme for a subset of the XPointer syntax, as defined in section 2.1.2.
Here is the same example using the inline annotation syntax:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:one-of emma:id="r1" emma:start="2003-03-26T0:00:00.15" emma:end="2003-03-26T0:00:00.2"> <emma:interpretation emma:id="int1" emma:confidence="0.75" > <origin>Boston</origin> <destination>Denver</destination> <date> <emma:absolute-timestamp emma:start="2003-03-26T0:00:00.15" emma:end="2003-03-26T0:00:00.2"/> 03112003 </date> </emma:interpretation> <emma:interpretation emma:id="int2" emma:confidence="0.68" > <origin>Austin</origin> <destination>Denver</destination> <date>03112003</date> </emma:interpretation> </emma:one-of> </emma:emma>
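As an informal illustration of how a consumer might read the inline syntax, the following Python sketch selects the interpretation with the highest emma:confidence value. It is not part of EMMA: the function name best_interpretation and the constant INLINE_DOC are hypothetical, the sample document is an abbreviated copy of the example above, and treating the highest score as the preferred reading is an assumption made only for this sketch.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma#"

# Abbreviated copy of the inline-syntax example (timestamps omitted).
INLINE_DOC = """
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#">
  <emma:one-of emma:id="r1">
    <emma:interpretation emma:id="int1" emma:confidence="0.75">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
    <emma:interpretation emma:id="int2" emma:confidence="0.68">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
"""


def best_interpretation(xml_text):
    """Return (id, confidence, slots) for the highest-confidence interpretation.

    Hypothetical helper: 'slots' maps each application-specific child
    element name to its text content. Returns None if no interpretation
    carries an emma:confidence attribute.
    """
    root = ET.fromstring(xml_text)
    emma_prefix = "{" + EMMA_NS + "}"
    candidates = []
    for interp in root.iter(emma_prefix + "interpretation"):
        confidence = interp.get(emma_prefix + "confidence")
        if confidence is None:
            continue
        # Skip EMMA's own child elements; keep only instance data.
        slots = {
            child.tag: (child.text or "").strip()
            for child in interp
            if not child.tag.startswith(emma_prefix)
        }
        candidates.append((interp.get(emma_prefix + "id"), float(confidence), slots))
    return max(candidates, key=lambda c: c[1]) if candidates else None


print(best_interpretation(INLINE_DOC))
```

Because the annotations sit directly on the emma:interpretation elements, no separate lookup step is needed to associate a score with its interpretation.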
The mixed representation syntax (Syntax 3 in the open issue in the Introduction) can be transformed into RDF syntax through an XSLT transform, and processed by the same EMMA processor for RDF syntax. On the other hand, a resource-limited EMMA processor that supports only the inline syntax can process the inline part of the mixed syntax unchanged, but may not process the RDF syntax section. The mixed syntax allows annotations to be added along the processing chain, from the low-level input device through the high-level understanding module to the interaction manager, and at the same time gives application developers the flexibility to control processing complexity, bandwidth requirements, etc. for their application.
An EMMA data model expresses the constraints on the structure and content of instance data, for the purposes of validation. As such, the data model may be considered as a particular kind of annotation (although, unlike other EMMA annotations, it is not a feature pertaining to a specific user input at a specific moment in time; it is rather a static and, by its very definition, application-specific structure). Its specification in EMMA is optional.
Since Web applications today use different formats to specify data models, e.g. XML Schema, XForms, Relax-NG, etc., EMMA itself is agnostic to the format of data model used.
Data model definition and reference is defined in section 3.1.
The emma() XPointer Scheme is intended to be used with the XPointer Framework [XPointer] to allow addressing within documents conforming to the EMMA specification. This scheme defines a subset of XPath [XPath] for use in addressing element and attribute nodes within EMMA documents. The subset was chosen to enable implementations in devices with tight resource limits. It omits other XPath features such as ranges, general predicates and variables. Support for the EMMA XPointer Scheme is required for processing documents using the RDF syntax.
As specified by the XPointer Framework, an EMMA XPointer processor takes as input an EMMA document and a string to be used as a pointer. This string is a fragment identifier with escaping reversed, and taken from the URI used to reference a node within the EMMA document. The processor attempts to evaluate the pointer with respect to the document and produces as output an identification of a node list or one or more errors.
The scheme name is "emma". The scheme data syntax is as follows; if scheme data in a pointer part with the emma() scheme does not conform to the syntax defined in this section the pointer part does not identify a subresource.
The formal grammar for the subset of XPointer is given using simple Extended Backus-Naur Form (EBNF) notation, as described in the XML Recommendation [XML].
[1] EmmaSchemeData ::= ElementPath Attribute?
[2] ElementPath ::= IdReference? ElementStep*
[3] IdReference ::= 'id(' QuotedName ')'
[4] ElementStep ::= ('//' | '/') Name Position?
[5] QuotedName ::= \' Name \' | \" Name \"
[6] Position ::= '[' [1-9] [0-9]* ']'
[7] Attribute ::= '/@' Name
The id() function has the same semantics as in XPath, i.e. it selects elements by their unique ID. However, unlike XPath, it is constrained to only accept strings as arguments. The Position rule has the same semantics as XPath's positional predicate.
The evaluation context of an emma() expression consists of the namespace declarations in scope at the location of the expression. For example, the namespace URI of the element geo:location in the third example below is determined by the binding of the geo prefix at that point in the document.
It is an error if the content of an emma() expression does not reference a part of the document, either by a violation of the syntactic rules or by pointing to a non-existent resource.
Examples:
#emma(//destination)    all elements called "destination"
#emma(id('s12')/quantity)    a child of element with id 's12'
#emma(//geo:location[2])    any second geo:location element
#emma(//rdf:RDF/rdf:Description[1])    first annotation in the document
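The grammar for the emma() scheme data is small enough to be checked with a single regular expression. The following Python sketch is one possible way to do so and is not a normative implementation: the function name is_valid_emma_pointer is hypothetical, and the NAME pattern is a deliberately simplified ASCII-only approximation of the XML Name production with at most one namespace prefix. The check is purely syntactic; it does not resolve the pointer against a document.

```python
import re

# Simplified, ASCII-only stand-in for the XML "Name" production
# (an assumption of this sketch), with an optional namespace prefix.
NAME = r"[A-Za-z_][A-Za-z0-9_.\-]*(?::[A-Za-z_][A-Za-z0-9_.\-]*)?"

QUOTED_NAME = "(?:'" + NAME + "'|\"" + NAME + "\")"   # rule [5]
ID_REFERENCE = r"id\(" + QUOTED_NAME + r"\)"          # rule [3]
POSITION = r"\[[1-9][0-9]*\]"                         # rule [6]
ELEMENT_STEP = "(?://|/)" + NAME + "(?:" + POSITION + ")?"  # rule [4]
ATTRIBUTE = "/@" + NAME                               # rule [7]

# Rules [1] and [2]: IdReference? ElementStep* Attribute?
SCHEME_DATA = re.compile(
    "(?:" + ID_REFERENCE + ")?(?:" + ELEMENT_STEP + ")*(?:" + ATTRIBUTE + ")?"
)


def is_valid_emma_pointer(fragment):
    """Return True if 'fragment' is syntactically valid emma() scheme usage.

    Accepts the fragment with or without a leading '#'.
    """
    if fragment.startswith("#"):
        fragment = fragment[1:]
    if not (fragment.startswith("emma(") and fragment.endswith(")")):
        return False
    scheme_data = fragment[len("emma("):-1]
    return SCHEME_DATA.fullmatch(scheme_data) is not None


for pointer in ("#emma(//destination)",
                "#emma(id('s12')/quantity)",
                "#emma(//geo:location[2])",
                "#emma(//date[@x='1'])"):
    print(pointer, is_valid_emma_pointer(pointer))
```

The last pointer in the loop uses a general predicate, which the subset omits, so the sketch rejects it.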
The EMMA Scheme data can be used from RDF to annotate arbitrary element or attribute nodes in an EMMA document. It can also be used to query an EMMA document for the annotations on any element or attribute. This makes it an easy task to obtain the value of a given annotation, such as emma:confidence, or to determine the set of RDF triples that annotate a specific node in an EMMA document. Implementations may choose to support this by preprocessing EMMA documents to decorate the DOM tree with nodes representing RDF triples.
W3C's Resource Description Framework (RDF) is a well-established, extensible framework for describing properties for things that can be named with URIs. RDF statements are made in terms of a predicate, and the subject and object that the predicate applies to.
For many annotations in this document, this conceptual model can be reflected in two possible ways:
The RDF model provides a powerful yet lightweight means to describe annotations on interpretations of input data whether the input is from speech, ink, key strokes or other modes of input. In most cases, the annotations will be generated automatically by the input processors.
For speech, the semantic interpretation specification [SI] provides a means for authors to indicate how recognition against a speech grammar can be used to generate XML interpretations of spoken input. The input processor can wrap this into an XML document conforming to the EMMA specification, in the process adding annotations for time stamps, confidence scores, input medium etc.
The Multimodal Interaction Framework [MMIF] shows how EMMA documents are produced by input components and consumed by the multimodal integration and interaction manager components under the control of rules provided by authors. Both the semantic interpretations and their annotations can be accessed via simple XPath [XPath] expressions.
As noted earlier, one possible syntax for annotations in EMMA is to use RDF XML syntax. The use of RDF in EMMA allows processors to compare and evaluate the equivalence of the meaning in EMMA annotation based on RDF syntax (triples: subject, predicate, object) and the associated labeled directed graphs. The metadata annotation in EMMA incorporates vocabularies (e.g. confidence, etc.) that are in the EMMA namespace, and may include additional properties in application specific vocabularies.
Example stating confidence scores for two mutually exclusive interpretations:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:one-of emma:id="r1"> <emma:interpretation emma:id="int1"> <origin>Boston</origin> <destination>Denver</destination> <date>03112003</date> </emma:interpretation> <emma:interpretation emma:id="int2"> <origin>Austin</origin> <destination>Denver</destination> <date>03112003</date> </emma:interpretation> </emma:one-of> <rdf:RDF> <!-- confidence score for first interpretation --> <rdf:Description rdf:about="#int1" emma:confidence="0.75" /> <!-- confidence score for second interpretation --> <rdf:Description rdf:about="#int2" emma:confidence="0.68" /> </rdf:RDF> </emma:emma>
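To make the RDF form of the annotations concrete, the following Python sketch gathers the properties written as attributes on each rdf:Description element and groups them by the value of rdf:about. It is an informal illustration only: the function name annotations_by_target and the constant RDF_DOC are hypothetical, the sample document is an abbreviated copy of the example above, and properties written as child elements are deliberately ignored to keep the sketch short.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Abbreviated copy of the RDF-syntax confidence example.
RDF_DOC = """
<emma:emma emma:version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <emma:one-of emma:id="r1">
    <emma:interpretation emma:id="int1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation emma:id="int2">
      <origin>Austin</origin>
    </emma:interpretation>
  </emma:one-of>
  <rdf:RDF>
    <rdf:Description rdf:about="#int1" emma:confidence="0.75"/>
    <rdf:Description rdf:about="#int2" emma:confidence="0.68"/>
  </rdf:RDF>
</emma:emma>
"""


def annotations_by_target(xml_text):
    """Map each rdf:about value to a dict of its attribute-form properties.

    Property names are returned in ElementTree's '{namespace}local' form.
    Attributes in the RDF namespace (rdf:about, rdf:ID, ...) are skipped.
    """
    root = ET.fromstring(xml_text)
    rdf_prefix = "{" + RDF_NS + "}"
    result = {}
    for description in root.iter(rdf_prefix + "Description"):
        target = description.get(rdf_prefix + "about")
        if target is None:
            continue  # anonymous descriptions are out of scope here
        properties = result.setdefault(target, {})
        for name, value in description.attrib.items():
            if not name.startswith(rdf_prefix):
                properties[name] = value
    return result


annotations = annotations_by_target(RDF_DOC)
print(annotations["#int1"]["{" + EMMA_NS + "}confidence"])
```

Each entry corresponds to one or more triples whose subject is the rdf:about target, whose predicate is the property name, and whose object is the attribute value.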
The explicit representation of RDF triples in EMMA as XML conforms to a subset of the RDF/XML Syntax Specification [RDF-Syntax]. This subset is restricted to the rdf:Description element and the associated attributes: rdf:about, rdf:ID, and rdf:resource. Properties may be represented either as attributes on rdf:Description elements, or as child elements of rdf:Description elements, where the value of the property is given as the content of the child elements. In some cases, the value may be another rdf:Description element.
Within EMMA documents that use the RDF syntax, RDF statements can use the EMMA XPointer scheme as the basis for naming nodes within the XML tree corresponding to the EMMA document. This provides the power to annotate any element or attribute within an EMMA document.
Example using the EMMA XPointer reference "#emma(id('int1')/date)" to address the date element in the first interpretation.
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:one-of emma:id="r1"> <emma:interpretation emma:id="int1"> <origin>Boston</origin> <destination>Denver</destination> <date>03112003</date> </emma:interpretation> <emma:interpretation emma:id="int2"> <origin>Austin</origin> <destination>Denver</destination> <date>03112003</date> </emma:interpretation> </emma:one-of> <rdf:RDF> <!-- time stamps for date in first interpretation --> <rdf:Description rdf:about="#emma(id('int1')/date)" emma:start="2003-03-26T0:00:00.15" emma:end="2003-03-26T0:00:00.2"/> </rdf:RDF> </emma:emma>
This section defines annotations in the EMMA namespace. The values are specified in terms of the data types defined by XML Schema Part 2: Datatypes [XSD].
The root element of an EMMA document is named emma. It holds one or more interpretation or grouping elements, and attributes for information pertaining to EMMA itself, along with any namespaces which are declared for the entire document, and any other EMMA annotative data.
Attributes:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> .... </emma:emma>
The emma:interpretation element holds a single interpretation represented in application specific markup.
Attributes:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="r1"> ... </emma:interpretation> </emma:emma>
or
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="r1"> ... </emma:interpretation> <emma:interpretation emma:id="r2"> ... </emma:interpretation> <rdf:RDF> ... </rdf:RDF> </emma:emma>
The emma:one-of element acts as a container for two or more emma:interpretation elements, and denotes that these are mutually exclusive interpretations.
Attributes:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:one-of emma:id="r1"> <emma:interpretation emma:id="int1"> <origin>Boston</origin> <destination>Denver</destination> <date>03112003</date> </emma:interpretation> <emma:interpretation emma:id="int2"> <origin>Austin</origin> <destination>Denver</destination> <date>03112003</date> </emma:interpretation> </emma:one-of> </emma:emma>
Note: another possibility would be to represent the relationships between mutually exclusive interpretations using RDF properties, e.g. a link from one interpretation to the next alternative interpretation. The emma:one-of element has the advantage of making it possible to make statements about the set of alternatives as a whole.
Annotation | emma:model |
---|---|
Inline Syntax | An element with the attribute ref of type xsd:anyURI referencing the data model. Alternatively, the data model can be provided inline as the content of the emma:model element. |
RDF XML Syntax | xsd:anyURI value referencing the data model |
Applies to | EMMA container elements (emma:interpretation, emma:group, emma:one-of, emma:sequence), and application instance data |
The data model that may be used to express constraints on the structure and content of instance data is specified as one of the annotations of the instance. Specifying the data model is optional, in which case the data model can be said to be implicit. Typically the data model is pre-established by the application.
The data model is specified with the emma:model annotation defined in the EMMA namespace. In the inline case, it is represented as an element, in the RDF syntax as an RDF property.
The data model is closely related to the interpretation data, and is typically specified as an annotation on the emma:interpretation or emma:one-of elements.
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="int1"> <city> London </city> <country> UK </country> </emma:interpretation> <rdf:RDF> <rdf:Description rdf:about="#int1" emma:model="http://myserver/models/city.xml"/> </rdf:RDF> </emma:emma>
The emma:model annotation can reference any element or attribute in the application instance data, as well as any EMMA container element (emma:one-of, emma:group, or emma:sequence).
The following is an example for the inline syntax:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" > <emma:interpretation emma:id="int1"> <emma:model ref="http://myserver/models/city.xml"/> <city> London </city> <country> UK </country> </emma:interpretation> </emma:emma>
Annotation | emma:derived-from |
---|---|
Inline Syntax | An empty element with the attribute resource of type xsd:anyURI that references the interpretation from which the current interpretation is derived. |
RDF XML Syntax | xsd:anyURI value referencing the interpretation from which the current interpretation is derived. |
Applies to | emma:interpretation |
Instances of interpretations are in general derived from other instances of interpretation in a process that goes from raw data to increasingly refined representations of the input. The derivation annotation is used to link any two interpretations that are related by representing the source and the outcome of an interpretation process. For instance, a speech recognition process can return the following result in the form of raw text:
<emma:interpretation emma:id="raw"> <answer>From Boston to Denver tomorrow</answer> </emma:interpretation>
A first interpretation process will produce:
<emma:interpretation emma:id="better"> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> </emma:interpretation>
A second interpretation process, aware of the current date, will be able to produce a more refined instance, such as:
<emma:interpretation emma:id="best"> <origin>Boston</origin> <destination>Denver</destination> <date>20030315</date> </emma:interpretation>
The interaction manager may need to have access to the three levels of interpretation. The emma:derived-from annotation can be used to establish a chain of derivation relationships as in the following example:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="raw"> <answer>From Boston to Denver tomorrow</answer> </emma:interpretation> <emma:interpretation emma:id="better"> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> </emma:interpretation> <emma:interpretation emma:id="best"> <origin>Boston</origin> <destination>Denver</destination> <date>20030315</date> </emma:interpretation> <rdf:RDF> <!-- derivation for second interpretation --> <rdf:Description rdf:about="#better"> <emma:derived-from rdf:resource="#raw" /> </rdf:Description> <!-- derivation for third interpretation --> <rdf:Description rdf:about="#best"> <emma:derived-from rdf:resource="#better" /> </rdf:Description> </rdf:RDF> </emma:emma>
The corresponding example for the inline syntax is:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:interpretation emma:id="raw"> <answer>From Boston to Denver tomorrow</answer> </emma:interpretation> <emma:interpretation emma:id="better"> <emma:derived-from resource="#raw" /> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> </emma:interpretation> <emma:interpretation emma:id="best"> <emma:derived-from resource="#better" /> <origin>Boston</origin> <destination>Denver</destination> <date>20030315</date> </emma:interpretation> </emma:emma>
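A consumer that needs all levels of interpretation can follow the emma:derived-from links from the most refined interpretation back to the raw input. The following Python sketch shows one way to do this for the inline syntax; it is illustrative only. The function name derivation_chain and the constant DERIVATION_DOC are hypothetical, the sample document is an abbreviated copy of the example above, and the sketch assumes that each interpretation has at most one emma:derived-from child whose resource attribute is a same-document reference of the form "#id".

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma#"
INTERPRETATION = "{" + EMMA_NS + "}interpretation"
DERIVED_FROM = "{" + EMMA_NS + "}derived-from"
ID_ATTR = "{" + EMMA_NS + "}id"

# Abbreviated copy of the inline-syntax derivation example.
DERIVATION_DOC = """
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#">
  <emma:interpretation emma:id="raw">
    <answer>From Boston to Denver tomorrow</answer>
  </emma:interpretation>
  <emma:interpretation emma:id="better">
    <emma:derived-from resource="#raw"/>
    <date>tomorrow</date>
  </emma:interpretation>
  <emma:interpretation emma:id="best">
    <emma:derived-from resource="#better"/>
    <date>20030315</date>
  </emma:interpretation>
</emma:emma>
"""


def derivation_chain(xml_text, start_id):
    """Return interpretation ids from 'start_id' back to its original source."""
    root = ET.fromstring(xml_text)
    by_id = {interp.get(ID_ATTR): interp for interp in root.iter(INTERPRETATION)}
    chain = []
    current = start_id
    # The membership test on 'chain' guards against cyclic references.
    while current in by_id and current not in chain:
        chain.append(current)
        link = by_id[current].find(DERIVED_FROM)
        resource = link.get("resource") if link is not None else None
        if resource is None or not resource.startswith("#"):
            break  # no further (local) derivation to follow
        current = resource[1:]
    return chain


print(derivation_chain(DERIVATION_DOC, "best"))
```

Starting from "raw" instead yields a chain of length one, since that interpretation has no emma:derived-from annotation.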
Section 4 provides further examples of the use of <emma:derived-from> to represent both sequential derivations like those above and composite derivations in which inputs from multiple different modalities are combined, and addresses the issue of the scope of EMMA annotations across derivations of user input.
Annotation | emma:group |
---|---|
Inline Syntax | An element with attribute emma:id of type xsd:anyURI. The element acts as a container for EMMA container elements (emma:interpretation, emma:group, emma:one-of, emma:sequence), and application instance data |
RDF XML Syntax | Not applicable |
Applies to | EMMA container elements (emma:interpretation, emma:group, emma:one-of, emma:sequence), and application instance data |
Introduced in section 2.1, the emma:group element is used to indicate that the contained interpretations are related in some manner. The following example shows three interpretations derived from the speech input "Move this ambulance here" and the haptic input related to two consecutive points on a map. The group is associated with time stamps defining the beginning and end of a time window used to group the events.
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:group emma:id="grp"> <emma:interpretation> <action>move</action> <object>ambulance</object> <destination>here</destination> </emma:interpretation> <emma:interpretation> <x>0.253</x> <y>0.124</y> </emma:interpretation> <emma:interpretation> <x>0.866</x> <y>0.724</y> </emma:interpretation> </emma:group> <rdf:RDF> <rdf:Description rdf:about="#grp" emma:start="2003-03-26T0:00:00.15" emma:end="2003-03-26T0:00:00.515"/> </rdf:RDF> </emma:emma>
The emma:one-of and emma:group containers can be nested arbitrarily.
An analogous example for the inline syntax is as follows:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:group emma:id="grp" emma:start="2003-03-26T0:00:00.15" emma:end="2003-03-26T0:00:00.515"> <emma:interpretation> <action>move</action> <object>ambulance</object> <destination>here</destination> </emma:interpretation> <emma:interpretation> <x>0.253</x> <y>0.124</y> </emma:interpretation> <emma:interpretation> <x>0.866</x> <y>0.724</y> </emma:interpretation> </emma:group> </emma:emma>
Annotation | emma:group-info |
---|---|
Inline Syntax | An element with the attribute ref of type xsd:anyURI referencing the grouping criteria. Alternatively, the criteria can be provided inline as the content of the emma:group-info element. |
RDF XML Syntax | xsd:anyURI value referencing an rdf:Description element. Alternatively, the rdf:Description element can be placed as the content of the emma:group-info element with the effect of an anonymous URI; see the example below. |
Applies to | emma:group |
Sometimes it may be convenient to indirectly associate a given group with information, such as grouping criteria. The emma:group-info annotation can be used to associate a group with information expressed as a set of RDF properties. In the following example, a group of two points is associated with a description of grouping criteria based upon a sliding temporal window of two seconds duration.
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:ex="http://www.example.com/ns/group#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:group emma:id="grp"> <emma:interpretation> <x>0.253</x> <y>0.124</y> </emma:interpretation> <emma:interpretation> <x>0.866</x> <y>0.724</y> </emma:interpretation> </emma:group> <rdf:RDF> <rdf:Description rdf:about="#grp"> <emma:group-info> <rdf:Description ex:mode="temporal" ex:duration="2s"/> </emma:group-info> </rdf:Description> </rdf:RDF> </emma:emma>
Here is the inline equivalent:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:ex="http://www.example.com/ns/group#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:group emma:id="grp"> <emma:group-info> <ex:mode>temporal</ex:mode> <ex:duration>2s</ex:duration> </emma:group-info> <emma:interpretation> <x>0.253</x> <y>0.124</y> </emma:interpretation> <emma:interpretation> <x>0.866</x> <y>0.724</y> </emma:interpretation> </emma:group> </emma:emma>
You can also use emma:group-info to refer to a named grouping criterion using external reference, for instance:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:ex="http://www.example.com/ns/group#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:group emma:id="grp"> <emma:interpretation> <x>0.253</x> <y>0.124</y> </emma:interpretation> <emma:interpretation> <x>0.866</x> <y>0.724</y> </emma:interpretation> </emma:group> <rdf:RDF> <rdf:Description rdf:about="#grp" emma:group-info="http://www.example.com/criterion42"/> </rdf:RDF> </emma:emma>
Here is the inline equivalent:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:ex="http://www.example.com/ns/group#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:group emma:id="grp"> <emma:group-info ref="http://www.example.com/criterion42"/> <emma:interpretation> <x>0.253</x> <y>0.124</y> </emma:interpretation> <emma:interpretation> <x>0.866</x> <y>0.724</y> </emma:interpretation> </emma:group> </emma:emma>
Annotation | emma:sequence |
---|---|
Inline Syntax | An element that can contain EMMA container elements (emma:interpretation, emma:group, emma:one-of, emma:sequence). It has an optional attribute emma:id of type xsd:anyURI |
RDF XML Syntax | Not applicable |
Applies to | EMMA container elements (emma:interpretation, emma:group, emma:one-of, emma:sequence), and application instance data |
Introduced in section 2.1, the emma:sequence element is used to indicate that the contained interpretations are sequential in time, as in the following example:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation> <action>move</action> <object>this-battleship</object> <destination>here</destination> </emma:interpretation> <emma:sequence> <emma:interpretation> <x>0.253</x> <y>0.124</y> </emma:interpretation> <emma:interpretation> <x>0.866</x> <y>0.724</y> </emma:interpretation> </emma:sequence> </emma:emma>
The emma:sequence container can be combined with emma:one-of and emma:group in arbitrary nesting structures. The order of the children in the content of the emma:sequence element corresponds to a sequence of interpretations. This ordering does not imply any particular definition of sequentiality. EMMA processors may therefore use the emma:sequence element to hold interpretations which are either strictly sequential in nature (e.g. the end-time of an interpretation precedes the start-time of its follower), or which overlap in some manner (e.g. the start-time of a follower interpretation precedes the end-time of its predecessor). Timestamps can be used to provide fine-grained annotation for interpretations that are sequential in time.
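When every member of an emma:sequence carries absolute emma:start and emma:end timestamps, a consumer can tell the two cases apart mechanically. The following Python sketch is illustrative only: the function name classify_sequence, the labels it returns, and the sample timestamps are assumptions made for this example and are not defined by EMMA.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

EMMA_NS = "http://www.w3.org/2003/04/emma#"


def q(name):
    """Return the namespace-qualified name that ElementTree uses for emma:name."""
    return f"{{{EMMA_NS}}}{name}"


# Hypothetical document: the timestamps are invented for illustration.
SAMPLE = """<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#">
  <emma:sequence>
    <emma:interpretation emma:start="2003-03-26T00:00:00" emma:end="2003-03-26T00:00:01">
      <x>0.253</x> <y>0.124</y>
    </emma:interpretation>
    <emma:interpretation emma:start="2003-03-26T00:00:02" emma:end="2003-03-26T00:00:03">
      <x>0.866</x> <y>0.724</y>
    </emma:interpretation>
  </emma:sequence>
</emma:emma>"""


def classify_sequence(sequence):
    """Label a sequence "strict" if no member starts before its predecessor ends."""
    spans = [
        (datetime.fromisoformat(member.get(q("start"))),
         datetime.fromisoformat(member.get(q("end"))))
        for member in sequence
    ]
    for (_, previous_end), (next_start, _) in zip(spans, spans[1:]):
        if next_start < previous_end:
            return "overlapping"
    return "strict"


sequence = ET.fromstring(SAMPLE).find(q("sequence"))
print(classify_sequence(sequence))
```

The sketch assumes that all members are timestamped; a real processor would also need a policy for members that lack timestamps.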
Annotation | emma:tokens |
---|---|
Inline Syntax | An attribute of type xsd:string holding a sequence of input tokens. |
RDF XML Syntax | xsd:string value holding a sequence of input tokens |
Applies to | EMMA container elements (emma:interpretation, emma:group, emma:one-of, emma:sequence), and application instance data |
The emma:tokens annotation holds a list of input tokens. In the following description, the term tokens is used in the computational and syntactic sense of units of input, and not in the sense of XML tokens.
The value held in emma:tokens is the list of the tokens of input as produced by the processor which generated the EMMA document. In the case where a grammar is used to constrain input, the value will correspond to tokens as defined by the grammar. So for an EMMA document produced by input to a W3C SRGS grammar [SRGS], the value of emma:tokens will be the list of words and/or phrases that are defined as tokens in SRGS (through white-spaced character data or the <token> element, see SRGS section 2.1 Tokens). Items in the emma:tokens list are delimited by white space and/or quotation marks for phrases containing white space. For example:
emma:tokens="arriving at 'Liverpool Street'"
where the three tokens of input are arriving, at and Liverpool Street.
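A consumer could recover the individual tokens from such a value along the following lines. This is a minimal Python sketch, assuming that shell-style quoting rules (as implemented by the standard shlex module) are an acceptable reading of "delimited by white space and/or quotation marks"; the draft does not define escaping rules, and the helper name split_tokens is hypothetical.

```python
import shlex


def split_tokens(value):
    """Split an emma:tokens value on white space, keeping quoted phrases whole.

    Assumption: POSIX shell quoting rules are close enough to the intent of
    the draft. An unbalanced quotation mark raises ValueError.
    """
    return shlex.split(value)


tokens = split_tokens("arriving at 'Liverpool Street'")
print(tokens)  # ['arriving', 'at', 'Liverpool Street']
```

Note that under this reading a token containing an apostrophe (for example a contracted word) would have to be quoted by the producer, which is one reason a production tokenizer may need rules of its own.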
The tokens annotation may be applied not just to the lexical words and phrases of language but to any level of input processing. Other examples of tokenization include phonemes, ink strokes, gestures and any other discrete units of input at any level.
Examples:
Inline:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:interpretation emma:tokens="From Cambridge to London tomorrow"> <origin emma:tokens="From Cambridge">Cambridge</origin> <destination emma:tokens="to London">London</destination> <date emma:tokens="tomorrow">20030315</date> </emma:interpretation> </emma:emma>
RDF:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="int1"> <origin>Cambridge</origin> <destination>London</destination> <date>20030315</date> </emma:interpretation> <rdf:RDF> <rdf:Description rdf:about="#int1" emma:tokens="From Cambridge to London tomorrow" /> <rdf:Description rdf:about="#emma(//origin)" emma:tokens="From Cambridge" /> <rdf:Description rdf:about="#emma(//destination)" emma:tokens="to London" /> <rdf:Description rdf:about="#emma(//date)" emma:tokens="tomorrow" /> </rdf:RDF> </emma:emma>
Annotation | emma:process |
---|---|
Inline Syntax | An attribute of type xsd:anyURI referencing the process used to generate the interpretation. |
RDF XML Syntax | xsd:anyURI value referencing the process used to generate the interpretation. |
Applies to | emma:interpretation |
A reference to the information concerning the processing that was used for generating an interpretation can be made as in the following example:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="raw"> <answer>From Boston to Denver tomorrow</answer> </emma:interpretation> <emma:interpretation emma:id="better"> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> </emma:interpretation> <emma:interpretation emma:id="best"> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation> <rdf:RDF> <rdf:Description rdf:about="#better" emma:derived-from="#raw" emma:process="http://example.com/mysemproc1.xml"/> <rdf:Description rdf:about="#best" emma:derived-from="#better" emma:process="http://example.com/mysemproc2.xml"/> </rdf:RDF> </emma:emma>
The process description document referenced by the emma:process annotation can include information on the process itself, such as the grammar, the type of parser, etc. EMMA is not normative about the format of the process description document.
For the inline syntax:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" > <emma:interpretation emma:id="raw"> <answer>From Boston to Denver tomorrow</answer> </emma:interpretation> <emma:interpretation emma:id="better" emma:process="http://example.com/mysemproc1.xml"> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> <emma:derived-from emma:resource="#raw"/> </emma:interpretation> <emma:interpretation emma:id="best" emma:process="http://example.com/mysemproc2.xml"> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> <emma:derived-from emma:resource="#better"/> </emma:interpretation> </emma:emma>
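With the inline syntax, a consumer can walk back from a final interpretation to the raw input and collect the process that produced each stage. The Python sketch below shows one way to do this with the standard library; the function name derivation_chain and the shortened sample document are assumptions for illustration, and the sketch handles only same-document references of the form #id.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma#"


def q(name):
    """Return the namespace-qualified name that ElementTree uses for emma:name."""
    return f"{{{EMMA_NS}}}{name}"


DOC = """<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#">
  <emma:interpretation emma:id="raw">
    <answer>From Boston to Denver tomorrow</answer>
  </emma:interpretation>
  <emma:interpretation emma:id="better" emma:process="http://example.com/mysemproc1.xml">
    <origin>Boston</origin>
    <emma:derived-from emma:resource="#raw"/>
  </emma:interpretation>
  <emma:interpretation emma:id="best" emma:process="http://example.com/mysemproc2.xml">
    <origin>Boston</origin>
    <emma:derived-from emma:resource="#better"/>
  </emma:interpretation>
</emma:emma>"""


def derivation_chain(root, start_id):
    """Return (id, process URI or None) pairs from start_id back to its source."""
    by_id = {el.get(q("id")): el for el in root.iter(q("interpretation"))}
    chain, seen = [], set()
    current = by_id.get(start_id)
    while current is not None and current.get(q("id")) not in seen:
        seen.add(current.get(q("id")))  # guards against circular references
        chain.append((current.get(q("id")), current.get(q("process"))))
        link = current.find(q("derived-from"))
        if link is None:
            break
        # Only same-document fragment references ("#id") are followed here.
        current = by_id.get(link.get(q("resource"), "").lstrip("#"))
    return chain


root = ET.fromstring(DOC)
for stage_id, process in derivation_chain(root, "best"):
    print(stage_id, process)
```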
Annotation | emma:no-input |
---|---|
Inline Syntax | Attribute holding xsd:boolean value that is true if there was no input. |
RDF XML Syntax | xsd:boolean value that is true if there was no input |
Applies to | emma:interpretation, application instance data |
The case of lack of input can be annotated as follows:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:interpretation emma:id="int1" emma:no-input="true" /> </emma:emma>
or alternatively:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="int1"/> <rdf:RDF> <rdf:Description rdf:about="#int1" emma:no-input="true"/> </rdf:RDF> </emma:emma>
Annotation | emma:uninterpreted |
---|---|
Inline Syntax | Attribute holding xsd:boolean value that is true if the input could not be interpreted |
RDF XML Syntax | xsd:boolean value that is true if the input could not be interpreted |
Applies to | emma:interpretation, application instance data |
Input that cannot be interpreted can be annotated as in the following example:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="raw"> <answer>one sixty two flight fourth yes</answer> </emma:interpretation> <emma:interpretation emma:id="better"> <emma:uninterpreted/> </emma:interpretation> <rdf:RDF> <rdf:Description rdf:about="#raw" emma:process="http://example.com/myasr.xml"/> <rdf:Description rdf:about="#better" emma:process="http://example.com/mysemproc1.xml" emma:derived-from="#raw"/> </rdf:RDF> </emma:emma>
where the input ("raw") did not lead to any possible interpretation.
Alternatively one can use the following syntax:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="raw"> <answer>one sixty two flight fourth yes</answer> </emma:interpretation> <emma:interpretation emma:id="better"/> <rdf:RDF> <rdf:Description rdf:about="#raw" emma:process="http://example.com/myasr.xml"/> <rdf:Description rdf:about="#better" emma:uninterpreted="true" emma:process="http://example.com/mysemproc1.xml" emma:derived-from="#raw"/> </rdf:RDF> </emma:emma>
The notation for uninterpretable input can refer to any possible stage of interpretation processing, including raw transcriptions. For instance, if input speech cannot be correctly recognized or the spoken input is not matched by a grammar (or language constraint given to the recognition), it can be tagged as emma:uninterpreted as in the following example:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="raw"/> <rdf:RDF> <rdf:Description rdf:about="#raw" emma:uninterpreted="true" emma:process="http://example.com/myasr.xml"/> </rdf:RDF> </emma:emma>
An example for the inline syntax is as follows:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:interpretation emma:id="raw" emma:process="http://example.com/myasr.xml" emma:uninterpreted="true"/> </emma:emma>
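A component that consumes EMMA, such as an interaction manager, will typically branch on these two annotations before looking at any instance data. The following Python sketch covers the inline attribute syntax only; the function name input_status and the three labels it returns are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma#"


def q(name):
    """Return the namespace-qualified name that ElementTree uses for emma:name."""
    return f"{{{EMMA_NS}}}{name}"


def is_true(value):
    """xsd:boolean accepts the lexical forms true, false, 1 and 0."""
    return value in ("true", "1")


def input_status(interpretation):
    """Classify an emma:interpretation by its inline status annotations."""
    if is_true(interpretation.get(q("no-input"))):
        return "no-input"
    if is_true(interpretation.get(q("uninterpreted"))):
        return "uninterpreted"
    return "interpreted"


DOC = """<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#">
  <emma:interpretation emma:id="raw" emma:process="http://example.com/myasr.xml"
                       emma:uninterpreted="true"/>
</emma:emma>"""

interpretation = ET.fromstring(DOC).find(q("interpretation"))
print(input_status(interpretation))
```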
Annotation | emma:lang |
---|---|
Inline Syntax | An attribute of type xsd:language indicating the language for the input. |
RDF XML Syntax | xsd:language value indicating the language for the input |
Applies to | EMMA container elements (emma:interpretation, emma:group, emma:one-of, emma:sequence), and application instance data |
The emma:lang annotation is used to indicate the human language of the input that it annotates. The values of the emma:lang attribute are language identifiers as defined by [IETF RFC 1766]. For example, emma:lang="fr" denotes French, and emma:lang="en-US" denotes US English. emma:lang can be applied to any emma:interpretation element. Its annotative scope follows the annotative scope of these elements. In contrast, the attribute xml:lang in XML 1.0 is used to specify the language used in the contents and attribute values of any element in an XML document. The attribute emma:lang must be used if xml:lang can no longer apply. This is the case, for example, when the contents and attribute values of an element in the EMMA document are in different languages, such as when the input language is French and the language of the annotated attributes is English.
The following example shows the use of emma:lang for annotating an input interpretation.
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="int1"> <answer>arretez</answer> </emma:interpretation> <rdf:RDF> <rdf:Description rdf:about="#int1" emma:lang="fr"/> </rdf:RDF> </emma:emma>
and for the inline syntax:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:interpretation emma:id="int1" emma:lang="fr"> <answer>arretez</answer> </emma:interpretation> </emma:emma>
The following example shows the annotation of different interpretations derived from the same input in a multilingual application:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="int1"> <rawtext>please stop arretez s'il vous plait</rawtext> </emma:interpretation> <emma:interpretation emma:id="int2"> <command> CANCEL </command> </emma:interpretation> <emma:interpretation emma:id="int3"> <command> CANCEL </command> </emma:interpretation> <rdf:RDF> <rdf:Description rdf:about="#int2" emma:lang="en" emma:derived-from="#int1" emma:process="http://example.com/EnglishInterpreter.xml"/> <rdf:Description rdf:about="#int3" emma:lang="fr" emma:derived-from="#int1" emma:process="http://example.com/FrenchInterpreter.xml"/> </rdf:RDF> </emma:emma>
and analogously for the inline syntax:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:interpretation emma:id="int1"> <rawtext>please stop arretez s'il vous plait</rawtext> </emma:interpretation> <emma:interpretation emma:id="int2" emma:lang="en" emma:process="http://example.com/EnglishInterpreter.xml"> <command> CANCEL </command> <emma:derived-from emma:resource="#int1"/> </emma:interpretation> <emma:interpretation emma:id="int3" emma:lang="fr" emma:process="http://example.com/FrenchInterpreter.xml"> <command> CANCEL </command> <emma:derived-from emma:resource="#int1"/> </emma:interpretation> </emma:emma>
Annotation | emma:signal |
---|---|
Inline Syntax | An attribute of type xsd:anyURI referencing the input signal. |
RDF XML Syntax | xsd:anyURI value referencing the input signal |
Applies to | emma:interpretation, application instance data. |
A URI reference to the signal that originated the input recognition process may be represented in EMMA using the emma:signal annotation.
Here is an example where the reference to the signal is applied to the emma:interpretation element:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="intp1"> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation> <rdf:RDF> <rdf:Description rdf:about="#intp1" emma:signal="http://example.com/signals/sg23.bin"/> </rdf:RDF> </emma:emma>
and for the inline syntax:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:interpretation emma:id="intp1" emma:signal="http://example.com/signals/sg23.bin"> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation> </emma:emma>
Annotation | emma:signal-codec |
---|---|
Inline Syntax | An attribute of type xsd:string holding the MIME type associated with the signal's encoding or file format. |
RDF XML Syntax | xsd:string value holding the MIME type associated with the signal's encoding or file format. |
Applies to | emma:interpretation, application instance data. |
The encoding or file format of the signal that originated the input may be represented in EMMA using the emma:signal-codec annotation. The value of emma:signal-codec is a MIME type. An initial set of MIME media types is defined by [RFC2046].
Here is an example where the signal codec is applied to the emma:interpretation element:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="intp1"> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation> <rdf:RDF> <rdf:Description rdf:about="#intp1" emma:signal-codec="audio/3gpp"/> </rdf:RDF> </emma:emma>
and for the inline syntax:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:interpretation emma:id="intp1" emma:signal-codec="audio/3gpp"> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation> </emma:emma>
Annotation | emma:confidence |
---|---|
Inline Syntax | An attribute of type xsd:decimal in range 0.0 to 1.0, indicating the recognition confidence. |
RDF XML Syntax | xsd:decimal value in range 0.0 to 1.0, indicating the recognition confidence |
Applies to | EMMA container elements (emma:interpretation, emma:group, emma:one-of, emma:sequence), and application instance data |
The confidence score in EMMA indicates the quality of the input; it is the value assigned to emma:confidence in the EMMA namespace. The confidence score is a number in the range from 0.0 to 1.0 inclusive. A value of 0.0 indicates minimum confidence, and a value of 1.0 indicates maximum confidence. The confidence score values do not have to be interpreted as probabilities. In fact, confidence score values are platform-dependent, since their computation is likely to differ between platforms and between EMMA processors. Confidence scores are annotated explicitly in EMMA in order to provide this information to the subsequent processes for multimodal interaction. The example below illustrates how confidence scores are annotated in EMMA.
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:one-of> <emma:interpretation emma:id="meaning1"> <location>Boston</location> </emma:interpretation> <emma:interpretation emma:id="meaning2"> <location> Austin </location> </emma:interpretation> </emma:one-of> <rdf:RDF> <rdf:Description rdf:about="#meaning1" emma:confidence="0.6"/> <rdf:Description rdf:about="#meaning2" emma:confidence="0.4"/> </rdf:RDF> </emma:emma>
and analogously, for the inline syntax:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:one-of> <emma:interpretation emma:id="meaning1" emma:confidence="0.6"> <location>Boston</location> </emma:interpretation> <emma:interpretation emma:id="meaning2" emma:confidence="0.4"> <location> Austin </location> </emma:interpretation> </emma:one-of> </emma:emma>
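A typical use of these scores is to rank the alternatives inside an emma:one-of. The Python sketch below selects the highest-scoring interpretation from the inline form. The function name best_interpretation is hypothetical, and treating a missing emma:confidence as 0.0 is an assumption of this example, not a rule stated in EMMA. Because scores are platform-dependent, comparing them is only meaningful among alternatives produced by the same processor.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma#"


def q(name):
    """Return the namespace-qualified name that ElementTree uses for emma:name."""
    return f"{{{EMMA_NS}}}{name}"


DOC = """<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#">
  <emma:one-of>
    <emma:interpretation emma:id="meaning1" emma:confidence="0.6">
      <location>Boston</location>
    </emma:interpretation>
    <emma:interpretation emma:id="meaning2" emma:confidence="0.4">
      <location> Austin </location>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>"""


def best_interpretation(one_of):
    """Return the child interpretation with the highest emma:confidence.

    Assumption: an interpretation without a score ranks as 0.0.
    """
    candidates = one_of.findall(q("interpretation"))
    return max(candidates, key=lambda el: float(el.get(q("confidence"), "0.0")))


one_of = ET.fromstring(DOC).find(q("one-of"))
best = best_interpretation(one_of)
print(best.get(q("id")), best.findtext("location").strip())
```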
The emma:confidence annotation may also be applied to attributes. Here is an example stating the confidence for the size attribute denoting pizza size:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="p1"> <pizza size="large" style="quattro stagioni"/> </emma:interpretation> <rdf:RDF> <rdf:Description rdf:about="#emma(//pizza/@size)" emma:confidence="0.6"/> </rdf:RDF> </emma:emma>
The current EMMA draft does not specify how the inline syntax can be used to apply annotations to attributes in the instance data. It may be a reasonable constraint to require that data which is intended to be annotated should be realized in element nodes rather than attribute nodes. If not, an inline syntax could be derived which enables the annotation of attribute data through an attribute-referencing mechanism, as in the following example:
<pizza size="large"> <emma:annotation emma:attribute="size" emma:confidence="0.6" /> </pizza>
This would allow the XPath querying model to be maintained for all data and annotation queries.
Annotation | emma:source |
---|---|
Inline Syntax | An attribute of type xsd:anyURI referencing the source of input. |
RDF XML Syntax | xsd:anyURI referencing the source of input |
Applies to | EMMA container elements (emma:interpretation, emma:group, emma:one-of, emma:sequence), and application instance data |
The source of an interpreted input may be represented in EMMA as a URI resource using the emma:source annotation.
Here is an example that shows different input sources for different input interpretations.
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:myapp="http://www.example.com/myapp"> <emma:one-of> <emma:interpretation emma:id="intp1"> <myapp:destination>Boston</myapp:destination> </emma:interpretation> <emma:interpretation emma:id="intp2"> <myapp:destination>Austin</myapp:destination> </emma:interpretation> </emma:one-of> <rdf:RDF> <rdf:Description rdf:about="#intp1" emma:source="http://example.com/microphone/NC-61"/> <rdf:Description rdf:about="#intp2" emma:source="http://example.com/microphone/NC-4024"/> </rdf:RDF> </emma:emma>
An analogous example for the inline syntax is as follows:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:myapp="http://www.example.com/myapp"> <emma:one-of> <emma:interpretation emma:id="intp1" emma:source="http://example.com/microphone/NC-61"> <myapp:destination>Boston</myapp:destination> </emma:interpretation> <emma:interpretation emma:id="intp2" emma:source="http://example.com/microphone/NC-4024"> <myapp:destination>Austin</myapp:destination> </emma:interpretation> </emma:one-of> </emma:emma>
Annotation | emma:start, emma:end, emma:from-start-of, emma:from-end-of, emma:start-offset, emma:end-offset |
---|---|
RDF and Inline Syntax | emma:start and emma:end, which are of type xsd:dateTime or xsd:time, indicate the absolute starting and ending times of an input. emma:from-start-of and emma:from-end-of are of type xsd:anyURI, and indicate a timestamp relative to a reference point, designated by the value of the attribute. emma:start-offset and emma:end-offset, which are of type xsd:duration, indicate the time offset with respect to the reference point. |
Applies to | EMMA container elements (emma:interpretation, emma:group, emma:one-of, emma:sequence) |
The start and end times of an input can be indicated either as absolute timestamps or as relative timestamps, expressed as time offsets from the start or end of a temporal reference point, such as another input.
The timestamp format uses the XML Schema datatypes [XSD]. The timestamp attributes for relative time offsets from the reference point use the xsd:duration datatype. The absolute time attributes, emma:start and emma:end, use the xsd:dateTime if both the date and time are included, or xsd:time if only the time is included.
Here is an example of a timestamp for an absolute time.
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:interpretation emma:id="int1" emma:start="2003-03-26T00:00:00" emma:end="2003-03-26T00:00:00.2"> <destination>Chicago</destination> </emma:interpretation> </emma:emma>
In order to indicate relative time positioning of different inputs, emma:from-start-of and emma:from-end-of are used to indicate the input the offset refers to. Both of these are of type xsd:anyURI. emma:from-start-of is used when the offset is from the start of the referenced input, whereas emma:from-end-of is used when the offset is from the end of the referenced input. emma:start-offset and emma:end-offset, of type xsd:duration, are used to represent the amount of elapsed time since the reference point. Note that the referenced input may be in a different EMMA document.
Here is an example where the referenced input is in the same document:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="int1"> <origin>Denver</origin> </emma:interpretation> <emma:interpretation emma:id="int2" emma:start-offset="PT5S" emma:from-start-of="#int1"> <destination>Chicago</destination> </emma:interpretation> </emma:emma>
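If the referenced input also carries an absolute emma:start, a consumer can convert the relative timestamp into an absolute one by adding the offset to the reference point. The Python sketch below illustrates this under several assumptions: the reference is in the same document, the referenced interpretation has an emma:start value (added here for illustration), and only the day, hour, minute and second fields of xsd:duration are supported. The names parse_duration and absolute_start are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

EMMA_NS = "http://www.w3.org/2003/04/emma#"

# Subset of the xsd:duration lexical form: PnDTnHnMnS (no years or months).
DURATION = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$"
)


def q(name):
    """Return the namespace-qualified name that ElementTree uses for emma:name."""
    return f"{{{EMMA_NS}}}{name}"


def parse_duration(text):
    """Convert a supported xsd:duration string such as PT5S into a timedelta."""
    match = DURATION.match(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported xsd:duration: {text!r}")
    days, hours, minutes, seconds = match.groups()
    return timedelta(days=int(days or 0), hours=int(hours or 0),
                     minutes=int(minutes or 0), seconds=float(seconds or 0))


def absolute_start(root, interpretation):
    """Resolve emma:start-offset against the emma:start of the referenced input."""
    ref_id = interpretation.get(q("from-start-of")).lstrip("#")
    for candidate in root.iter(q("interpretation")):
        if candidate.get(q("id")) == ref_id:
            ref_start = datetime.fromisoformat(candidate.get(q("start")))
            return ref_start + parse_duration(interpretation.get(q("start-offset")))
    raise LookupError(f"no interpretation with emma:id {ref_id!r}")


# The emma:start on int1 is an assumption added for this example.
DOC = """<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#">
  <emma:interpretation emma:id="int1" emma:start="2003-03-26T00:00:00">
    <origin>Denver</origin>
  </emma:interpretation>
  <emma:interpretation emma:id="int2" emma:start-offset="PT5S" emma:from-start-of="#int1">
    <destination>Chicago</destination>
  </emma:interpretation>
</emma:emma>"""

root = ET.fromstring(DOC)
int2 = root.findall(q("interpretation"))[1]
print(absolute_start(root, int2).isoformat())
```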
The emma:from-start-of and emma:from-end-of annotations for specifying the reference point of a relative time are added by the producer of the EMMA document. In order to make use of these annotations, the consumer of the EMMA document must have access to the resources referenced by the URI. This may not always be the case in a distributed architecture.
Note that the reference point refers to an input from a user. It does not necessarily refer to a complete input. For example, if a speech recognizer timestamps each word in an utterance, the reference point might refer to the timestamp for just one word.
The absolute and relative timestamps are not mutually exclusive; that is, it is possible to have both relative and absolute timestamp attributes on the same EMMA container element.
Timestamps of inputs collected by different devices will be subject to variation if the times maintained by the devices are not synchronized. This concern is outside of the scope of the EMMA working group.
The treatment of relative timestamps in EMMA is currently under review by the EMMA subgroup. One problem that needs to be resolved is that the current markup does not provide a means of specifying the end point of the user input with respect to the reference point. This issue will be resolved in the future.
Annotation | emma:medium |
---|---|
Inline Syntax | An attribute of type xsd:string constrained to values in the closed set {acoustic, tactile, visual}. |
RDF XML Syntax | xsd:string constrained to values in the closed set {acoustic, tactile, visual}. |
Applies to | EMMA container elements (emma:interpretation, emma:group, emma:one-of, emma:sequence), and application instance data |
Annotation | emma:mode |
Inline Syntax | An attribute of type xsd:string constrained to values in the open set {speech, dtmf_keypad, ink, gui, keys, video, photograph, ...}. |
RDF XML Syntax | xsd:string constrained to values in the open set {speech, dtmf_keypad, ink, gui, keys, video, photograph, ...}. |
Applies to | EMMA container elements (emma:interpretation, emma:group, emma:one-of, emma:sequence), and application instance data |
Annotation | emma:function |
Inline Syntax | An attribute of type xsd:string constrained to values in the open set {recording, transcription, dialog, verification, ...}. |
RDF XML Syntax | xsd:string constrained to values in the open set {recording, transcription, dialog, verification, ...}. |
Applies to | EMMA container elements (emma:interpretation, emma:group, emma:one-of, emma:sequence), and application instance data |
Annotation | emma:verbal |
Inline Syntax | An attribute of type xsd:boolean. |
RDF XML Syntax | xsd:boolean |
Applies to | EMMA container elements (emma:interpretation, emma:group, emma:one-of, emma:sequence), and application instance data |
EMMA provides two properties for the annotation of input modality: one indicating the broader medium or channel (medium), and another indicating the specific mode of communication used on that channel (mode). The input medium is defined from the user's perspective and indicates whether they use their voice (acoustic), touch (tactile), or visual appearance/motion (visual) as input. Tactile includes most hands-on input device types such as pen, mouse, keyboard, and touch screen. Visual is used for camera input.
emma:medium ::= [acoustic|tactile|visual]
The mode property provides the ability to distinguish between different modes of communication that may be within a particular medium. For example, in the tactile medium, modes include electronic ink (ink), and pointing and clicking on a graphical user interface.
emma:mode ::= [speech|dtmf_keypad|ink|gui|keys|video|photograph| ... ]
Orthogonal to the mode, user inputs can also be classified with respect to their communicative function. This enables a simpler mode classification.
emma:function ::= [recording|transcription|dialog|verification| ... ]
For example, speech can be used for recording (e.g. voicemail), transcription (e.g. dictation), dialog (e.g. interactive spoken dialog systems), and verification (e.g. identifying the user through their voiceprint).
EMMA also supports an additional property, verbal, which distinguishes verbal use of an input mode from non-verbal use. This can be used to distinguish the use of electronic ink to convey handwritten commands from the use of electronic ink for symbolic gestures such as circles and arrows. Handwritten commands, such as writing downtown in order to change a map display to show the downtown, are classified as verbal (verbal="true"). Pen gestures (arrows, lines, circles, etc.), such as circling a building, are classified as non-verbal dialog (function="dialog" verbal="false"). The use of handwritten words to transcribe an email message is classified as transcription (function="transcription").
emma:verbal ::= [true|false|0|1]
Handwritten words and ink gestures are typically recognized using different kinds of recognition components (handwriting recognizer vs. gesture recognizer) and the verbal annotation will be added by the recognition component which classifies the input. The original input source, a pen in this case, will not be aware of this difference. The input source identifier will tell you that the input was from a pen of some kind but will not tell you if the mode of input was handwriting (show downtown) or gesture (e.g. circling an object or area).
Here is an example of the EMMA annotation for a pen input where the user's ink is recognized as either a word or as an arrow:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:one-of> <emma:interpretation emma:id="interp1"> <location>Boston</location> </emma:interpretation> <emma:interpretation emma:id="interp2"> <direction>45</direction> </emma:interpretation> </emma:one-of> <rdf:RDF> <rdf:Description rdf:about="#interp1" emma:confidence="0.6" emma:medium="tactile" emma:mode="ink" emma:function="dialog" emma:verbal="true"/> <rdf:Description rdf:about="#interp2" emma:confidence="0.4" emma:medium="tactile" emma:mode="ink" emma:function="dialog" emma:verbal="false"/> </rdf:RDF> </emma:emma>
An analogous example for the inline syntax is as follows:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:one-of> <emma:interpretation emma:id="interp1" emma:confidence="0.6" emma:medium="tactile" emma:mode="ink" emma:function="dialog" emma:verbal="true"> <location>Boston</location> </emma:interpretation> <emma:interpretation emma:id="interp2" emma:confidence="0.4" emma:medium="tactile" emma:mode="ink" emma:function="dialog" emma:verbal="false"> <direction>45</direction> </emma:interpretation> </emma:one-of> </emma:emma>
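Because emma:medium is a closed set and emma:verbal is a boolean, a consumer can check both before relying on them, whereas emma:mode and emma:function are open sets and cannot be validated against a fixed list. The following Python sketch illustrates such a check for the inline syntax; the function name modality_problems and the wording of its messages are assumptions for this example.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma#"
MEDIA = {"acoustic", "tactile", "visual"}      # closed set for emma:medium
BOOLEAN_FORMS = {"true", "false", "0", "1"}    # lexical forms for emma:verbal


def q(name):
    """Return the namespace-qualified name that ElementTree uses for emma:name."""
    return f"{{{EMMA_NS}}}{name}"


def modality_problems(element):
    """Return a list of problems found in the modality annotations of element.

    emma:mode and emma:function are open sets, so they are not checked.
    """
    problems = []
    medium = element.get(q("medium"))
    if medium is not None and medium not in MEDIA:
        problems.append(f"unknown emma:medium {medium!r}")
    verbal = element.get(q("verbal"))
    if verbal is not None and verbal not in BOOLEAN_FORMS:
        problems.append(f"emma:verbal is not a boolean: {verbal!r}")
    return problems


good = ET.fromstring(
    '<emma:interpretation xmlns:emma="http://www.w3.org/2003/04/emma#" '
    'emma:medium="tactile" emma:mode="ink" emma:function="dialog" '
    'emma:verbal="false"/>'
)
# Hypothetical invalid input, used only to exercise the checks.
bad = ET.fromstring(
    '<emma:interpretation xmlns:emma="http://www.w3.org/2003/04/emma#" '
    'emma:medium="olfactory" emma:verbal="maybe"/>'
)
print(modality_problems(good))
print(modality_problems(bad))
```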
Here is an example of the EMMA annotation for a spoken command which is recognized as either Boston or Austin:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:one-of> <emma:interpretation emma:id="interp1"> <location>Boston</location> </emma:interpretation> <emma:interpretation emma:id="interp2"> <location>Austin</location> </emma:interpretation> </emma:one-of> <rdf:RDF> <rdf:Description rdf:about="#interp1" emma:confidence="0.6" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true"/> <rdf:Description rdf:about="#interp2" emma:confidence="0.4" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true"/> </rdf:RDF> </emma:emma>
The following table shows the relationship between the medium, mode, and function properties and serves as an aid for classifying inputs. For the dialog function it also shows some examples of the classification of inputs as verbal vs. non-verbal.
Medium | Device | Mode | Function: recording | Function: dialog | Function: transcription | Function: verification |
---|---|---|---|---|---|---|
acoustic | microphone | speech | audiofile (e.g. voicemail) | spoken command / query / response (verbal = true); singing a note (verbal = false) | dictation | speaker recognition |
tactile | keypad | dtmf | audiofile / character stream | typed command / query / response (verbal = true); command key "Press 9 for sales" (verbal = false) | text entry (T9-tegic, word completion, or word grammar) | password / pin entry |
tactile | keyboard | keys | character / key-code stream | typed command / query / response (verbal = true); command key "Press S for sales" (verbal = false) | typing | password / pin entry |
tactile | pen | ink | trace, sketch | handwritten command / query / response (verbal = true); gesture (e.g. circling building) (verbal = false) | handwritten text entry | signature, handwriter recognition |
tactile | pen | gui | N/A | tapping on named button (verbal = true); drag and drop, tapping on map (verbal = false) | soft keyboard | password / pin entry |
tactile | mouse | ink | trace, sketch | handwritten command / query / response (verbal = true); gesture (e.g. circling building) (verbal = false) | handwritten text entry | N/A |
tactile | mouse | gui | N/A | clicking named button (verbal = true); drag and drop, clicking on map (verbal = false) | soft keyboard | password / pin entry |
tactile | joystick | ink | trace, sketch | gesture (e.g. circling building) (verbal = false) | N/A | N/A |
tactile | joystick | gui | N/A | pointing, clicking button / menu (verbal = false) | soft keyboard | password / pin entry |
visual | page scanner | photograph | image | handwritten command / query / response (verbal = true); drawings and images (verbal = false) | optical character recognition, object/scene recognition (markup, e.g. SVG) | N/A |
visual | still camera | photograph | image | objects (verbal = false) | visual object/scene recognition | face id, retinal scan |
visual | video camera | video | movie | sign language (verbal = true); face / hand / arm / body gesture (e.g. pointing, facing) (verbal = false) | audio/visual recognition | face id, gait id, retinal scan |
This section concerns the scope of EMMA annotations across derivations of user input connected using the <emma:derived-from> element (Section 3.2), which can be used to capture both sequential and composite derivations. Sequential derivations involve processing steps that do not involve multimodal integration, such as applying natural language understanding and then reference resolution to a speech transcription.
Annotation scope in sequential derivations is addressed in Section 4.1. Composite derivations combine inputs from multiple input modes; these are addressed in Section 4.2 below. Note that an EMMA derivation may include both sequential and composite derivation steps. EMMA derivations describe only single turns of user input and are not intended to describe a sequence of dialogue turns.
In order to indicate whether an <emma:derived-from/> element describes a sequential derivation step or a composite derivation step, the <emma:derived-from/> has an attribute composite which has a boolean value. A composite <emma:derived-from/> needs to be marked as composite="true" while a sequential <emma:derived-from/> is marked as composite="false". If this attribute is not specified the value is "false" by default.
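As a rough illustration of the default described above, the following Python sketch reads the composite attribute from inline <emma:derived-from> elements using the standard library's xml.etree.ElementTree. The function name is_composite and the sample document are hypothetical, and treating only the literal string "true" as true is a simplifying assumption rather than a statement about the eventual schema.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma#"

def is_composite(derived_from):
    """Return True for a composite step; a missing attribute counts as "false"."""
    # The composite attribute is unprefixed in the examples, so it has no namespace.
    return derived_from.get("composite", "false") == "true"

# Hypothetical sample: one step omits the attribute, one sets it to "true".
SAMPLE = """
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#">
  <emma:interpretation emma:id="better">
    <emma:derived-from resource="#raw"/>
  </emma:interpretation>
  <emma:interpretation emma:id="multimodal1">
    <emma:derived-from resource="#speech1" composite="true"/>
  </emma:interpretation>
</emma:emma>
"""

root = ET.fromstring(SAMPLE)
# Map each referenced resource to whether its derivation step is composite.
steps = {
    d.get("resource"): is_composite(d)
    for d in root.iter("{%s}derived-from" % EMMA_NS)
}
print(steps)
```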
This section concerns the scope of EMMA annotations in sequential derivations. EMMA enables the annotation of whole derivations of user input. For example an EMMA document could contain <emma:interpretation> elements for the transcription, interpretation, and reference resolution of a speech input, utilizing the id values: raw, better, and best respectively:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="raw"> <answer>From Boston to Denver tomorrow</answer> </emma:interpretation> <emma:interpretation emma:id="better"> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> </emma:interpretation> <emma:interpretation emma:id="best"> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation> <rdf:RDF> <rdf:Description rdf:about="#raw" emma:process="http://example.com/myasr1.xml"/> <rdf:Description rdf:about="#better" emma:process="http://example.com/mynlu1.xml"> <emma:derived-from resource="#raw" composite="false"/> </rdf:Description> <rdf:Description rdf:about="#best" emma:process="http://example.com/myrefresolution1.xml"> <emma:derived-from resource="#better" composite="false"/> </rdf:Description> </rdf:RDF> </emma:emma>
The inline variant is as follows:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:interpretation emma:id="raw" emma:process="http://example.com/myasr1.xml"> <answer>From Boston to Denver tomorrow</answer> </emma:interpretation> <emma:interpretation emma:id="better" emma:process="http://example.com/mynlu1.xml"> <emma:derived-from resource="#raw" composite="false"/> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> </emma:interpretation> <emma:interpretation emma:id="best" emma:process="http://example.com/myrefresolution1.xml"> <emma:derived-from resource="#better" composite="false"/> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation> </emma:emma>
Each member of the derivation chain is linked to the previous one by an <emma:derived-from> element (Section 3.1.5), which has an attribute resource that provides a pointer to the <emma:interpretation> from which it is derived. The emma:process annotation (Section 3.2.2) provides a pointer to the process used for each stage of the derivation.
The scope of EMMA annotations becomes an issue in EMMA documents with a more fully specified set of EMMA annotations, as illustrated in the following example.
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="raw" emma:start="2003-03-26T0:00:00" emma:end="2003-03-26T0:00:00.2"> <transcript>from Boston to Denver tomorrow</transcript> </emma:interpretation> <emma:interpretation emma:id="better" emma:start="2003-03-26T0:00:00" emma:end="2003-03-26T0:00:00.2"> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> </emma:interpretation> <emma:interpretation emma:id="best" emma:start="2003-03-26T0:00:00" emma:end="2003-03-26T0:00:00.2"> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation> <rdf:RDF> <rdf:Description rdf:about="#raw" emma:process="http://example.com/myasr1.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.6" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true" emma:tokens="from boston to denver tomorrow" emma:lang="en-US"/> <rdf:Description rdf:about="#better" emma:process="http://example.com/mynlu1.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.8" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true" emma:tokens="from boston to denver tomorrow" emma:lang="en-US"> <emma:derived-from resource="#raw" composite="false"/> </rdf:Description> <rdf:Description rdf:about="#best" emma:process="http://example.com/myrefresolution1.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.8" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true" emma:tokens="from boston to denver tomorrow" emma:lang="en-US"> <emma:derived-from resource="#better" composite="false"/> </rdf:Description> </rdf:RDF> </emma:emma>
The inline variant is as follows:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:interpretation emma:id="raw" emma:process="http://example.com/myasr1.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.6" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true" emma:tokens="from boston to denver tomorrow" emma:lang="en-US"> <answer>From Boston to Denver tomorrow</answer> </emma:interpretation> <emma:interpretation emma:id="better" emma:process="http://example.com/mynlu1.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.8" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true" emma:tokens="from boston to denver tomorrow" emma:lang="en-US"> <emma:derived-from resource="#raw" composite="false"/> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> </emma:interpretation> <emma:interpretation emma:id="best" emma:process="http://example.com/myrefresolution1.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.8" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true" emma:tokens="from boston to denver tomorrow" emma:lang="en-US"> <emma:derived-from resource="#better" composite="false"/> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation> </emma:emma>
EMMA annotations on earlier stages of the derivation may still be true of later stages of the derivation. Although this can be captured in EMMA by repeating the annotations on each emma:interpretation within the derivation, as in the example above, there are two disadvantages of this approach to annotation. First, the repetition of annotations makes the resulting EMMA documents significantly more verbose. Second, EMMA processors used for intermediate tasks such as natural language understanding and reference resolution will need to read in all of the annotations and write them all out again.
EMMA overcomes these problems by assuming that annotations on earlier stages of a derivation automatically apply to later stages of the derivation unless a new value is specified. Later stages of the derivation essentially inherit annotations from earlier stages in the derivation. For example, if there was an emma:source annotation on the transcription (raw) it would also apply to the later stages of the derivation such as the result of natural language understanding (better) or reference resolution (best).
Because of the assumption in EMMA that annotations have scope over later stages of a sequential derivation, the example EMMA document above can be equivalently represented as follows:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="raw" emma:start="2003-03-26T0:00:00" emma:end="2003-03-26T0:00:00.2"> <transcript>from Boston to Denver tomorrow</transcript> </emma:interpretation> <emma:interpretation emma:id="better"> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> </emma:interpretation> <emma:interpretation emma:id="best"> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation> <rdf:RDF> <rdf:Description rdf:about="#raw" emma:process="http://example.com/myasr1.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.6" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true" emma:tokens="from boston to denver tomorrow" emma:lang="en-US"/> <rdf:Description rdf:about="#better" emma:process="http://example.com/mynlu1.xml" emma:confidence="0.8"> <emma:derived-from resource="#raw" composite="false"/> </rdf:Description> <rdf:Description rdf:about="#best" emma:process="http://example.com/myrefresolution1.xml"> <emma:derived-from resource="#better" composite="false"/> </rdf:Description> </rdf:RDF> </emma:emma>
The inline variant is as follows:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:interpretation emma:id="raw" emma:process="http://example.com/myasr1.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.6" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true" emma:tokens="from boston to denver tomorrow" emma:lang="en-US"> <answer>From Boston to Denver tomorrow</answer> </emma:interpretation> <emma:interpretation emma:id="better" emma:process="http://example.com/mynlu1.xml" emma:confidence="0.8"> <emma:derived-from resource="#raw" composite="false"/> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> </emma:interpretation> <emma:interpretation emma:id="best" emma:process="http://example.com/myrefresolution1.xml"> <emma:derived-from resource="#better" composite="false"/> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation> </emma:emma>
The fully specified derivation illustrated above is equivalent to the reduced form derivation following it, where only annotations with new values are specified at each stage. These two EMMA documents should yield the same result when processed by an EMMA processor.
The emma:confidence annotation is respecified on the better interpretation. This indicates the confidence score for natural language understanding, whereas emma:confidence on the raw interpretation indicates the speech recognition confidence score.
To determine the full set of annotations that apply to an <emma:interpretation> element, an EMMA processor or script needs to read the annotations specified directly on that element and, for any that are not specified, follow the reference in the resource attribute of the <emma:derived-from> element to add in annotations from earlier stages of the derivation.
The EMMA annotations break down into three groups with respect to their scope in sequential derivations. One group of annotations always holds true for all members of a sequential derivation. A second group is always respecified on each stage of the derivation. A third group may or may not be respecified.
Classification | Annotation |
---|---|
Applies to whole derivation | emma:signal, emma:source, emma:medium, emma:mode, emma:function, emma:verbal, emma/xml:lang, emma:tokens, emma:start, emma:end, emma:from-start-of, emma:from-end-of, emma:start-offset, emma:end-offset |
Specified at each stage of derivation | <emma:derived-from>, emma:process |
May be respecified | emma:confidence, emma:model, emma:no-input, emma:uninterpreted |
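The lookup described in this section can be sketched in Python as follows. This is a minimal illustration that assumes the inline variant, a single <emma:derived-from> child per sequential stage, and local references of the form "#id"; the function name effective_annotations, the PER_STAGE set, and the abbreviated sample document are hypothetical. Following the table above, emma:process is treated as specific to each stage and is not inherited.

```python
import xml.etree.ElementTree as ET

EMMA = "{http://www.w3.org/2003/04/emma#}"
PER_STAGE = {"process"}  # specified afresh at each stage, so never inherited

# Abbreviated, hypothetical version of the reduced-form inline example.
DOC = """
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#">
  <emma:interpretation emma:id="raw"
      emma:process="http://example.com/myasr1.xml"
      emma:source="http://example.com/microphone/NC-61"
      emma:confidence="0.6" emma:mode="speech">
    <answer>From Boston to Denver tomorrow</answer>
  </emma:interpretation>
  <emma:interpretation emma:id="better"
      emma:process="http://example.com/mynlu1.xml" emma:confidence="0.8">
    <emma:derived-from resource="#raw" composite="false"/>
    <origin>Boston</origin>
  </emma:interpretation>
</emma:emma>
"""

def effective_annotations(interp, by_id):
    """Collect emma:* attributes, filling gaps from earlier sequential stages."""
    result, seen, first = {}, set(), True
    while interp is not None and id(interp) not in seen:
        seen.add(id(interp))  # guard against cyclic references
        for name, value in interp.attrib.items():
            if not name.startswith(EMMA):
                continue
            local = name[len(EMMA):]
            if local == "id" or (not first and local in PER_STAGE):
                continue
            # A value found nearer the end of the chain wins over an inherited one.
            result.setdefault(local, value)
        first = False
        step = interp.find(EMMA + "derived-from")
        if step is None or step.get("composite", "false") == "true":
            break  # inheritance applies only across sequential steps
        interp = by_id.get(step.get("resource", "").lstrip("#"))
    return result

root = ET.fromstring(DOC)
by_id = {e.get(EMMA + "id"): e for e in root.iter(EMMA + "interpretation")}
better = effective_annotations(by_id["better"], by_id)
print(better)
```

Here the better stage keeps its own emma:confidence and emma:process while picking up emma:source and emma:mode from raw.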
One potential problem with this annotation scoping mechanism is that earlier annotations could be lost if earlier stages of a derivation were dropped in order to reduce message size. This problem can be overcome by considering annotation scope at the point where earlier derivation stages are discarded and populating the final interpretation in the derivation with all of the annotations which it could inherit. For example, if the raw and better stages were dropped the resulting EMMA document would be:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="best" emma:start="2003-03-26T0:00:00" emma:end="2003-03-26T0:00:00.2"> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation> <rdf:RDF> <rdf:Description rdf:about="#best" emma:process="http://example.com/myrefresolution1.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.8" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true" emma:tokens="from boston to denver tomorrow" emma:lang="en-US"> <emma:derived-from resource="#better" composite="false"/> </rdf:Description> </rdf:RDF> </emma:emma>
The inline variant is as follows:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:interpretation emma:id="best" emma:start="2003-03-26T0:00:00" emma:end="2003-03-26T0:00:00.2" emma:process="http://example.com/myrefresolution1.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.8" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true" emma:tokens="from boston to denver tomorrow" emma:lang="en-US"> <emma:derived-from resource="#better" composite="false"/> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation> </emma:emma>
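The step of populating the final interpretation before earlier stages are discarded can be sketched as a simple merge. The sketch below assumes each stage's annotations have already been read into a plain dictionary keyed by annotation name without the emma: prefix; the function name collapse and the abbreviated dictionaries are hypothetical.

```python
PER_STAGE = {"process"}  # respecified at every stage, so never carried forward

def collapse(stages):
    """Merge annotation dicts ordered from the earliest stage to the latest."""
    merged = {}
    for stage in stages:
        carried = {k: v for k, v in merged.items() if k not in PER_STAGE}
        merged = {**carried, **stage}  # values on later stages take precedence
    return merged

# Abbreviated annotations taken from the reduced-form example above.
raw = {
    "process": "http://example.com/myasr1.xml",
    "source": "http://example.com/microphone/NC-61",
    "signal": "http://example.com/signals/sg23.wav",
    "confidence": "0.6",
    "mode": "speech",
    "lang": "en-US",
}
better = {"process": "http://example.com/mynlu1.xml", "confidence": "0.8"}
best = {"process": "http://example.com/myrefresolution1.xml"}

final = collapse([raw, better, best])
print(final["process"], final["confidence"], final["signal"])
```

The resulting dictionary carries the annotations that best would otherwise have inherited, so the raw and better stages can be dropped without losing them.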
In addition to representing sequential derivations, the EMMA <emma:derived-from> element can also be used to capture composite derivations. Composite derivations involve combination of inputs from different modes. In the following composite derivation example the user said "destination" and circled Boston on a map:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="speech1" emma:start="2003-03-26T0:00:00.2" emma:end="2003-03-26T0:00:00.4"> <rawinput>destination</rawinput> </emma:interpretation> <emma:interpretation emma:id="pen1" emma:start="2003-03-26T0:00:00.1" emma:end="2003-03-26T0:00:00.3"> <rawinput>Boston</rawinput> </emma:interpretation> <emma:interpretation emma:id="multimodal1"> <destination>Boston</destination> </emma:interpretation> <rdf:RDF> <rdf:Description rdf:about="#speech1" emma:process="http://example.com/myasr.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.6" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true" emma:lang="en-US" emma:tokens="destination"/> <rdf:Description rdf:about="#pen1" emma:process="http://example.com/mygesturereco.xml" emma:source="http://example.com/pen/wacom123" emma:signal="http://example.com/signals/ink5.inkml" emma:confidence="0.5" emma:medium="tactile" emma:mode="ink" emma:function="dialog" emma:verbal="false"/> <rdf:Description rdf:about="#multimodal1" emma:process="http://example.com/myintegrator.xml"> <emma:derived-from resource="#speech1" composite="true"/> <emma:derived-from resource="#pen1" composite="true"/> </rdf:Description> </rdf:RDF> </emma:emma>
The inline variant is:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:interpretation emma:id="speech1" emma:start="2003-03-26T0:00:00.2" emma:end="2003-03-26T0:00:00.4" emma:process="http://example.com/myasr.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.6" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true" emma:lang="en-US" emma:tokens="destination"> <rawinput>destination</rawinput> </emma:interpretation> <emma:interpretation emma:id="pen1" emma:start="2003-03-26T0:00:00.1" emma:end="2003-03-26T0:00:00.3" emma:process="http://example.com/mygesturereco.xml" emma:source="http://example.com/pen/wacom123" emma:signal="http://example.com/signals/ink5.inkml" emma:confidence="0.5" emma:medium="tactile" emma:mode="ink" emma:function="dialog" emma:verbal="false"> <rawinput>Boston</rawinput> </emma:interpretation> <emma:interpretation emma:id="multimodal1" emma:process="http://example.com/myintegrator.xml"> <emma:derived-from resource="#speech1" composite="true"/> <emma:derived-from resource="#pen1" composite="true"/> <destination>Boston</destination> </emma:interpretation> </emma:emma>
In this example, annotations on the multimodal interpretation indicate the process used for the integration and there are two <emma:derived-from> elements, one pointing to the speech and one pointing to the pen gesture.
In EMMA, while annotations are assumed to have scope over later stages in a sequential derivation, they are not assumed to have scope over composite derivation steps. The reason is that the combining inputs often have different values for a given annotation, as with emma:signal, emma:source, emma:confidence, emma:start, and emma:end. For some of these annotations, such as emma:signal and emma:source, no single value can be determined for the multimodal interpretation. For others a single value may be computed for the multimodal interpretation, but doing so involves more than simple inheritance. For example, the value of emma:start for the multimodal interpretation should be the earlier of the two time values from the two combining inputs; in the above example, emma:start="2003-03-26T0:00:00.1". For emma:end it should be the later of the two values on the combining inputs: emma:end="2003-03-26T0:00:00.4". In the case of emma:confidence, the value for the composite is the result of a numerical function defined by the author of the multimodal integration component or script. In the case of other annotations such as emma:verbal, if either of the inputs has the value true then the multimodal interpretation is emma:verbal="true". In other words, the annotation for the composite input is the inclusive OR of the boolean values of the annotations on the inputs.
If an annotation is only specified in one of the combining inputs then it can be assumed to apply to the multimodal interpretation of the composite input. For example, emma:lang="en-US" is only specified for the speech input.
Given the complexity of annotation scope across composite derivation steps, EMMA does not require any annotations to have scope over composite derivation steps. However, guidance is provided here for authors of multimodal integration components as to how EMMA annotations should be handled in composite derivations. The following table breaks down EMMA annotations into categories depending on their behavior in composite derivations.
Classification | Annotation | Function for value |
---|---|---|
1. Always has different values | emma:signal, emma:source, emma:tokens | 'multiple' |
1. Always has different values | emma:process, <emma:derived-from> | New value(s) describing composite integration |
2. Sometimes has different values | emma:medium, emma:mode, emma/xml:lang, emma:model | Common value or 'multiple' if they conflict |
3. Function combines values | emma:start | The earlier of the two start timestamps (standard) |
3. Function combines values | emma:end | The later of the two end timestamps (standard) |
3. Function combines values | emma:from-start-of | TBD (see open issue below) |
3. Function combines values | emma:from-end-of | TBD (see open issue below) |
3. Function combines values | emma:start-offset | TBD (see open issue below) |
3. Function combines values | emma:end-offset | TBD (see open issue below) |
3. Function combines values | emma:confidence | combination of confidence scores (author-defined) |
3. Function combines values | emma:function | some functions are dominant (e.g. 'dialog') (standard) |
3. Function combines values | emma:verbal | inclusive OR of values (standard) |
4. Not integrated | emma:uninterpreted, emma:no-input | Not applicable |
When a multimodal integration component generates the EMMA document for a composite interpretation, each of these sets of EMMA annotations should be handled as indicated below.
1. Always has different values: The value of the annotation on the multimodal interpretation should be multiple indicating the presence of the conflict. In the case of emma:process and <emma:derived-from>, there will be new value(s) describing the integration process and references to the combined inputs.
2. Sometimes has different values: If the values of an annotation are the same for the combined inputs then that value should be used in the annotation on the composite. If they are not the same then the annotation value on the multimodal interpretation should be multiple indicating the presence of the conflict. If an annotation only appears on one of the inputs, then the value for the input that has the annotation should be used for the composite.
3. Function combines values: The values should be combined in accordance with the specific function required for that annotation. For some annotations the combination function is standard; e.g. earliest value for emma:start, latest value for emma:end, inclusive OR for emma:verbal. For others, such as emma:confidence, there is no standard function and the function used will be defined by the application developer.
4. Not integrated: Inputs with these annotations will not be part of composite inputs and so they will not need to be annotated in composite interpretations.
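The handling rules above can be sketched in Python as a single combining function. This is an illustration under stated assumptions, not a normative algorithm: annotations are held in plain dictionaries keyed by name without the emma: prefix, timestamps always include fractional seconds, the only dominance rule implemented for emma:function is that 'dialog' wins, and confidence scores are multiplied as one possible author-defined choice. The names combine, ALWAYS_DIFFERENT and SOMETIMES_DIFFERENT are hypothetical, and the relative timestamp annotations are left out because their treatment is still open.

```python
from datetime import datetime
from math import prod

CONFLICT = "multiple"
ALWAYS_DIFFERENT = ("signal", "source", "tokens")
SOMETIMES_DIFFERENT = ("medium", "mode", "lang", "model")
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"  # assumes fractional seconds are present

def _parse(timestamp):
    return datetime.strptime(timestamp, TIME_FORMAT)

def combine(inputs, process):
    """Build composite annotations from a list of per-input annotation dicts."""
    out = {"process": process}  # new value describing the integration step
    for name in ALWAYS_DIFFERENT + SOMETIMES_DIFFERENT:
        values = [a[name] for a in inputs if name in a]
        if len(values) == 1:
            out[name] = values[0]  # specified on one input only: carried over
        elif values:
            same = name in SOMETIMES_DIFFERENT and len(set(values)) == 1
            out[name] = values[0] if same else CONFLICT
    starts = [a["start"] for a in inputs if "start" in a]
    ends = [a["end"] for a in inputs if "end" in a]
    if starts:
        out["start"] = min(starts, key=_parse)  # earliest start
    if ends:
        out["end"] = max(ends, key=_parse)  # latest end
    verbal = [a["verbal"] for a in inputs if "verbal" in a]
    if verbal:
        out["verbal"] = "true" if "true" in verbal else "false"  # inclusive OR
    functions = {a["function"] for a in inputs if "function" in a}
    if "dialog" in functions:
        out["function"] = "dialog"  # assumption: 'dialog' is dominant
    elif len(functions) == 1:
        out["function"] = functions.pop()
    scores = [float(a["confidence"]) for a in inputs if "confidence" in a]
    if scores:
        out["confidence"] = prod(scores)  # author-defined; product used here
    return out

speech = {"start": "2003-03-26T0:00:00.2", "end": "2003-03-26T0:00:00.4",
          "source": "http://example.com/microphone/NC-61",
          "signal": "http://example.com/signals/sg23.wav",
          "confidence": "0.6", "medium": "acoustic", "mode": "speech",
          "function": "dialog", "verbal": "true", "lang": "en-US",
          "tokens": "destination"}
pen = {"start": "2003-03-26T0:00:00.1", "end": "2003-03-26T0:00:00.3",
       "source": "http://example.com/pen/wacom123",
       "signal": "http://example.com/signals/ink5.inkml",
       "confidence": "0.5", "medium": "tactile", "mode": "ink",
       "function": "dialog", "verbal": "false"}

composite = combine([speech, pen], "http://example.com/myintegrator.xml")
print(composite)
```

Inputs annotated with emma:uninterpreted or emma:no-input are assumed to have been filtered out before this function is called, in line with item 4.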
For 1. and 2. above, conflicts are indicated on the annotations on the composite using the value multiple. If the values of the annotations on the combining inputs are needed then they can be accessed through the pointers in the resource attributes in the <emma:derived-from> elements. However, if the early stages of the derivation have been dropped or are only remotely accessible, this may not be feasible. Unlike the sequential derivation case, since the values may clash, the problem cannot be avoided by fully instantiating the <emma:interpretation> at the end of the derivation chain.
In order to address this problem, values of conflicting annotations must be indicated directly on the <emma:derived-from> element. There will be one <emma:derived-from> element for each combining input, providing a placeholder for annotations with conflicting values.
The fully specified EMMA document for the composite input described above is as follows:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <emma:interpretation emma:id="speech1" emma:start="2003-03-26T0:00:00.2" emma:end="2003-03-26T0:00:00.4"> <rawinput>destination</rawinput> </emma:interpretation> <emma:interpretation emma:id="pen1" emma:start="2003-03-26T0:00:00.1" emma:end="2003-03-26T0:00:00.3"> <rawinput>Boston</rawinput> </emma:interpretation> <emma:interpretation emma:id="multimodal1" emma:start="2003-03-26T0:00:00.1" emma:end="2003-03-26T0:00:00.4"> <destination>Boston</destination> </emma:interpretation> <rdf:RDF> <rdf:Description rdf:about="#speech1" emma:process="http://example.com/myasr.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.6" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true" emma:lang="en-US" emma:tokens="destination"/> <rdf:Description rdf:about="#pen1" emma:process="http://example.com/mygesturereco.xml" emma:source="http://example.com/pen/wacom123" emma:signal="http://example.com/signals/ink5.inkml" emma:confidence="0.5" emma:medium="tactile" emma:mode="ink" emma:function="dialog" emma:verbal="false"> </rdf:Description> <rdf:Description rdf:about="#multimodal1" emma:process="http://example.com/myintegrator.xml" emma:source="multiple" emma:signal="multiple" emma:confidence="0.3" emma:medium="multiple" emma:mode="multiple" emma:function="dialog" emma:verbal="true" emma:lang="en-US" emma:tokens="destination"> <emma:derived-from resource="#speech1" composite="true" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:medium="acoustic" emma:mode="speech"/> <emma:derived-from resource="#pen1" composite="true" emma:source="http://example.com/pen/wacom123" emma:signal="http://example.com/signals/ink5.inkml" emma:medium="tactile" emma:mode="ink"/> </rdf:Description> </rdf:RDF> </emma:emma>
The inline variant is:
<emma:emma emma:version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma#"> <emma:interpretation emma:id="speech1" emma:start="2003-03-26T0:00:00.2" emma:end="2003-03-26T0:00:00.4" emma:process="http://example.com/myasr.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.6" emma:medium="acoustic" emma:mode="speech" emma:function="dialog" emma:verbal="true" emma:lang="en-US" emma:tokens="destination"> <rawinput>destination</rawinput> </emma:interpretation> <emma:interpretation emma:id="pen1" emma:start="2003-03-26T0:00:00.1" emma:end="2003-03-26T0:00:00.3" emma:process="http://example.com/mygesturereco.xml" emma:source="http://example.com/pen/wacom123" emma:signal="http://example.com/signals/ink5.inkml" emma:confidence="0.5" emma:medium="tactile" emma:mode="ink" emma:function="dialog" emma:verbal="false"> <rawinput>Boston</rawinput> </emma:interpretation> <emma:interpretation emma:id="multimodal1" emma:source="multiple" emma:signal="multiple" emma:confidence="0.3" emma:medium="multiple" emma:mode="multiple" emma:function="dialog" emma:verbal="true" emma:lang="en-US" emma:tokens="destination"> <emma:derived-from resource="#speech1" composite="true" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:medium="acoustic" emma:mode="speech"/> <emma:derived-from resource="#pen1" composite="true" emma:source="http://example.com/pen/wacom123" emma:signal="http://example.com/signals/ink5.inkml" emma:medium="tactile" emma:mode="ink"/> <destination>Boston</destination> </emma:interpretation> </emma:emma>
In this example, the annotations for emma:source, emma:signal, emma:medium, and emma:mode all have conflicting values on the inputs (#speech1 and #pen1) and are marked as "multiple" on the composite interpretation (#multimodal1). The emma:lang and emma:tokens annotations are only specified on the speech (#speech1) and therefore are inherited by the composite interpretation (#multimodal1). The emma:start and emma:end annotations are combined by standard functions yielding the earliest and latest time values respectively on #multimodal1. The emma:verbal and emma:function annotations are determined by standard combination functions. Since the emma:verbal annotation is "true" on the speech (#speech1) and "false" on the pen (#pen1), the annotation on the composite interpretation is "true". Since both the speech and pen have emma:function="dialog", the composite is annotated as emma:function="dialog". The emma:confidence annotation on the composite is determined by a non-standard function defined by the author of the integration component. In this case the function is multiplication and the resulting annotation is emma:confidence="0.3".
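One way an integration component might decide which annotations to copy onto each <emma:derived-from> element is sketched below. It assumes annotations are available as plain dictionaries keyed by name without the emma: prefix and that conflicts on the composite have already been marked with the value multiple; the function name derived_from_attributes and the abbreviated data are hypothetical.

```python
CONFLICT = "multiple"

def derived_from_attributes(inputs, composite):
    """Return one attribute dict per combining input, keyed by input id."""
    result = {}
    for input_id, annotations in inputs.items():
        attrs = {"resource": "#" + input_id, "composite": "true"}
        for name, value in composite.items():
            # Copy the input's own value only where the composite reports a conflict.
            if value == CONFLICT and name in annotations:
                attrs["emma:" + name] = annotations[name]
        result[input_id] = attrs
    return result

# Abbreviated annotations from the composite example above.
inputs = {
    "speech1": {"source": "http://example.com/microphone/NC-61",
                "mode": "speech", "lang": "en-US"},
    "pen1": {"source": "http://example.com/pen/wacom123", "mode": "ink"},
}
composite = {"source": "multiple", "mode": "multiple", "lang": "en-US"}

placeholders = derived_from_attributes(inputs, composite)
print(placeholders["speech1"])
```

Because emma:lang does not conflict, it stays on the composite only and is not repeated on either <emma:derived-from> element.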
In implementing an EMMA processor for composite input, the EMMA annotations for timestamps, emma:function and emma:verbal on the EMMA document representing the composite input should be handled as indicated in the table above. This is a constraint on documents representing composite derivation in EMMA.
The treatment of relative timestamp annotations, using emma:from-start-of, emma:from-end-of, emma:start-offset, and emma:end-offset, is still an open issue under discussion and will be resolved in the next draft. One of the issues that arises is whether two combining inputs with relative timestamps should be required to be anchored with respect to the same reference point. Another issue concerns how to determine the combined timestamp annotation when one of the combining inputs has an absolute timestamp and the other has a relative timestamp.
(TBD)
Conformance issues are deferred until a later revision of the specification.
This section defines the formal syntax for EMMA documents in terms of a normative XML Schema, an informative Document Type Definition (DTD) and a normative RDF Schema for the RDF properties defined by EMMA.
(TBD)
Leading and trailing spaces in utterances are not significant. This will be defined in the Schema by specifying "xml:space=default".
(TBD)
(This section is informative)
(TBD)
This is the RDF Schema for the RDF properties defined in the EMMA namespace. It provides a binding of these properties to human readable descriptions. Implementors are recommended to provide an RDF Schema for any application specific RDF properties used to extend EMMA. It is NOT permissible to include such extensions within the EMMA namespace.
Normative References
Informative References:
The editors would like to recognize the contributions of the following members of the W3C Multimodal Interaction Group (listed in alphabetical order):
Paolo Baggia, Loquendo
Daniel Burnett, Nuance Communications
Max Froumentin, W3C
Katriina Halonen, Nokia
Gerald McCobb, IBM
Stephen Potter, Microsoft
Yuan Shao, Canon