Multimodal Architecture and Interfaces

A Contributors

The following people contributed to the development of this specification.

Brad Porter
T.V. Raman

B Examples of Life-Cycle Events

In this specification we use elements from a fictional "dcont" namespace in some examples. The W3C Ubiquitous Web Application Working Group (UWA-WG) is developing such an ontology and expects to define a "dcont" namespace. The examples below are informative only and may, unintentionally, be incompatible with the work of the UWA-WG. For authoritative information on a (future) "dcont" namespace, please consult the Delivery Context Ontology specification.

1. newContextRequest (from MC to IM)

(The definition of "media" and the details of the media element will be discussed in the next draft.)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:newContextRequest source="someURI" requestID="request-1">
      <mmi:media id="mediaID1">media1</mmi:media>
      <mmi:media id="mediaID2">media2</mmi:media>
      <mmi:data xmlns:dcont="http://www.w3.org/2008/04/dcont">
         <dcont:DeliveryContext>
            ...
         </dcont:DeliveryContext>
      </mmi:data>
   </mmi:newContextRequest>
</mmi:mmi>

2. newContextResponse (from IM to MC)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:newContextResponse source="someURI" requestID="request-1" status="success" context="URI-1">
      <mmi:media>media1</mmi:media>
      <mmi:media>media2</mmi:media>
   </mmi:newContextResponse>
</mmi:mmi>

3. prepareRequest (from IM to MC, with external markup)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:prepareRequest source="someURI" context="URI-1" requestID="request-1">
      <mmi:contentURL href="someContentURI" max-age="" fetchtimeout="1s"/>
   </mmi:prepareRequest>
</mmi:mmi>

4. prepareRequest (from IM to MC, inline VoiceXML markup)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:prepareRequest source="someURI" context="URI-1" requestID="request-1">
      <mmi:content>
         <vxml:vxml version="2.0" xmlns:vxml="http://www.w3.org/2001/vxml">
            <vxml:form>
               <vxml:block>Hello World!</vxml:block>
            </vxml:form>
         </vxml:vxml>
      </mmi:content>
   </mmi:prepareRequest>
</mmi:mmi>

5. prepareResponse (from MC to IM, success)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:prepareResponse source="someURI" context="someURI" requestID="request-1" status="success"/>
</mmi:mmi>

6. prepareResponse (from MC to IM, failure)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:prepareResponse source="someURI" context="someURI" requestID="request-1" status="failure">
      <mmi:statusInfo>
         NotAuthorized
      </mmi:statusInfo>
   </mmi:prepareResponse>
</mmi:mmi>

7. startRequest (from IM to MC)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:startRequest source="someURI" context="URI-1" requestID="request-1">
      <mmi:contentURL href="someContentURI" max-age="" fetchtimeout="1s"/>
   </mmi:startRequest>
</mmi:mmi>

8. startResponse (from MC to IM)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:startResponse source="someURI" context="someURI" requestID="request-1" status="failure">
   	<mmi:statusInfo>
   		NotAuthorized
   	</mmi:statusInfo>
   </mmi:startResponse>
</mmi:mmi>

9. doneNotification (from MC to IM, with EMMA result)

The requestID of a doneNotification is the requestID of the "startRequest" event that started the processing being reported as done.

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:doneNotification source="someURI" context="someURI" status="success" requestID="request-1">
      <mmi:data>
         <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
            <emma:interpretation id="int1" emma:medium="acoustic" emma:confidence=".75" emma:mode="voice" emma:tokens="flights from boston to denver">
               <origin>Boston</origin>
               <destination>Denver</destination>
            </emma:interpretation>
         </emma:emma>
      </mmi:data>
   </mmi:doneNotification>
</mmi:mmi>

10. doneNotification (from MC to IM, with EMMA "no-input" result)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:doneNotification source="someURI" context="someURI" status="success" requestID="request-1">
      <mmi:data>
         <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
            <emma:interpretation id="int1" emma:no-input="true"/>
         </emma:emma>
      </mmi:data>
   </mmi:doneNotification>
</mmi:mmi>

11. cancelRequest (from IM to MC)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:cancelRequest context="someURI" source="someURI" immediate="true" requestID="request-1">
   </mmi:cancelRequest>
</mmi:mmi>

12. cancelResponse (from MC to IM)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:cancelResponse source="someURI" context="someURI" requestID="request-1" status="success"/>
</mmi:mmi>

13. pauseRequest (from IM to MC)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:pauseRequest context="someURI" source="someURI" immediate="true" requestID="request-1"/>
</mmi:mmi>

14. pauseResponse (from MC to IM)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:pauseResponse source="someURI" context="someURI" requestID="request-1" status="success"/>
</mmi:mmi>

15. resumeRequest (from IM to MC)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:resumeRequest context="someURI" source="someURI" requestID="request-1"/>
</mmi:mmi>

16. resumeResponse (from MC to IM)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:resumeResponse source="someURI" context="someURI" requestID="request-1" status="success"/>
</mmi:mmi>

17. extensionNotification (formerly the data event, sent in both directions)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:extensionNotification name="appEvent" source="someURI" context="someURI" requestID="request-1">
      <applicationdata/>
   </mmi:extensionNotification>
</mmi:mmi>

18. clearContextRequest (from the IM to the MC)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:clearContextRequest source="someURI" context="someURI" requestID="request-2"/>
</mmi:mmi>
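
A matching clearContextResponse is not among the numbered examples. The following is a minimal sketch, assuming the response carries the same source, context, requestID and status attributes as the other response events in this appendix. All attribute values are placeholders.

```xml
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <!-- Assumed shape: mirrors the other response events; requestID echoes the clearContextRequest -->
   <mmi:clearContextResponse source="someURI" context="someURI" requestID="request-2" status="success"/>
</mmi:mmi>
```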

19. statusRequest (from the IM to the MC)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:statusRequest requestAutomaticUpdate="true" source="someURI" requestID="request-3"/>
</mmi:mmi>

20. statusResponse (from the MC to the IM)

<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:statusResponse automaticUpdate="true" status="alive" source="someURI" requestID="request-3"/>
</mmi:mmi>

mmi.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 Top-level schema for MMI Life cycle events version 1.0
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="NewContextRequest.xsd"/>
	<xs:include schemaLocation="NewContextResponse.xsd"/>
	<xs:include schemaLocation="ClearContextRequest.xsd"/>
	<xs:include schemaLocation="ClearContextResponse.xsd"/>
	<xs:include schemaLocation="CancelRequest.xsd"/>
	<xs:include schemaLocation="CancelResponse.xsd"/>
	<xs:include schemaLocation="CreateRequest.xsd"/>
	<xs:include schemaLocation="CreateResponse.xsd"/>
	<xs:include schemaLocation="DoneNotification.xsd"/>
	<xs:include schemaLocation="ExtensionNotification.xsd"/>
	<xs:include schemaLocation="PauseRequest.xsd"/>
	<xs:include schemaLocation="PauseResponse.xsd"/>
	<xs:include schemaLocation="PrepareRequest.xsd"/>
	<xs:include schemaLocation="PrepareResponse.xsd"/>
	<xs:include schemaLocation="ResumeRequest.xsd"/>
	<xs:include schemaLocation="ResumeResponse.xsd"/>
	<xs:include schemaLocation="StartRequest.xsd"/>
	<xs:include schemaLocation="StartResponse.xsd"/>
	<xs:include schemaLocation="StatusRequest.xsd"/>
	<xs:include schemaLocation="StatusResponse.xsd"/>
	<xs:element name="mmi">
		<xs:complexType>
			<xs:choice>
				<xs:sequence>
					<xs:element ref="mmi:newContextRequest"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:newContextResponse"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:clearContextRequest"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:clearContextResponse"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:cancelRequest"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:cancelResponse"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:createRequest"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:createResponse"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:doneNotification"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:extensionNotification"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:pauseRequest"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:pauseResponse"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:prepareRequest"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:prepareResponse"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:resumeRequest"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:resumeResponse"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:startRequest"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:startResponse"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:statusRequest"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element ref="mmi:statusResponse"/>
				</xs:sequence>
			</xs:choice>
			<xs:attributeGroup ref="mmi:mmi.version.attrib"/>
		</xs:complexType>
	</xs:element>
</xs:schema>

mmi-datatypes.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" targetNamespace="http://www.w3.org/2008/04/mmi-arch">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 general Type definition schema for MMI Life cycle events version 1.0
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	<xs:simpleType name="versionType">
		<xs:restriction base="xs:decimal">
			<xs:enumeration value="1.0"/>
		</xs:restriction>
	</xs:simpleType>
	<xs:simpleType name="mediaContentTypes">
		<xs:restriction base="xs:string">
			<xs:enumeration value="media1"/>
			<xs:enumeration value="media2"/>
		</xs:restriction>
	</xs:simpleType>
	<xs:simpleType name="mediaAttributeTypes">
		<xs:restriction base="xs:string">
			<xs:enumeration value="mediaID1"/>
			<xs:enumeration value="mediaID2"/>
		</xs:restriction>
	</xs:simpleType>
	<xs:simpleType name="sourceType">
		<xs:restriction base="xs:string"/>
	</xs:simpleType>
	<xs:simpleType name="targetType">
		<xs:restriction base="xs:string"/>
	</xs:simpleType>
	<xs:simpleType name="requestIDType">
		<xs:restriction base="xs:string"/>
	</xs:simpleType>
	<xs:simpleType name="contextType">
		<xs:restriction base="xs:string"/>
	</xs:simpleType>
	<xs:simpleType name="statusType">
		<xs:restriction base="xs:string">
			<xs:enumeration value="success"/>
			<xs:enumeration value="failure"/>
		</xs:restriction>
	</xs:simpleType>
	<xs:simpleType name="statusResponseType">
		<xs:restriction base="xs:string">
			<xs:enumeration value="alive"/>
			<xs:enumeration value="dead"/>
		</xs:restriction>
	</xs:simpleType>
	<xs:simpleType name="immediateType">
		<xs:restriction base="xs:boolean"/>
	</xs:simpleType>
	<xs:complexType name="contentURLType">
		<xs:attribute name="href" type="xs:anyURI" use="required"/>
		<xs:attribute name="max-age" type="xs:string" use="optional"/>
		<xs:attribute name="fetchtimeout" type="xs:string" use="optional"/>
	</xs:complexType>
	<xs:complexType name="contentType">
		<xs:sequence>
			<xs:any namespace="http://www.w3.org/2001/vxml" processContents="skip" maxOccurs="unbounded"/>
		</xs:sequence>
	</xs:complexType>
	<xs:complexType name="emmaType">
		<xs:sequence>
			<xs:any namespace="http://www.w3.org/2003/04/emma" processContents="skip" maxOccurs="unbounded"/>
		</xs:sequence>
	</xs:complexType>
	<xs:complexType name="anyComplexType" mixed="true">
		<xs:complexContent mixed="true">
			<xs:restriction base="xs:anyType">
				<xs:sequence>
					<xs:any processContents="skip" minOccurs="0" maxOccurs="unbounded"/>
				</xs:sequence>
			</xs:restriction>
		</xs:complexContent>
	</xs:complexType>
	
</xs:schema>

mmi-attribs.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" targetNamespace="http://www.w3.org/2008/04/mmi-arch" 
				attributeFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 general attribute definition schema for MMI Life cycle events version 1.0
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:attributeGroup name="media.id.attrib">
		<xs:attribute name="id" type="mmi:mediaAttributeTypes" use="required"/>
	</xs:attributeGroup>
	<xs:attributeGroup name="mmi.version.attrib">
		<xs:attribute name="version" type="mmi:versionType" use="required"/>
	</xs:attributeGroup>
	<xs:attributeGroup name="source.attrib">
		<xs:attribute name="source" type="mmi:sourceType" use="required"/>
	</xs:attributeGroup>
	<xs:attributeGroup name="target.attrib">
		<xs:attribute name="target" type="mmi:targetType" use="optional"/>
	</xs:attributeGroup>
	<xs:attributeGroup name="requestID.attrib">
		<xs:attribute name="requestID" type="mmi:requestIDType" use="required"/>
	</xs:attributeGroup>
	<xs:attributeGroup name="context.attrib">
		<xs:attribute name="context" type="mmi:contextType" use="required"/>
	</xs:attributeGroup>
	<xs:attributeGroup name="immediate.attrib">
		<xs:attribute name="immediate" type="mmi:immediateType" use="required"/>
	</xs:attributeGroup>
	<xs:attributeGroup name="status.attrib">
		<xs:attribute name="status" type="mmi:statusType" use="required"/>
	</xs:attributeGroup>
	<xs:attributeGroup name="statusResponse.attrib">
		<xs:attribute name="status" type="mmi:statusResponseType" use="required"/>
	</xs:attributeGroup>
	<xs:attributeGroup name="extension.name.attrib">
		<xs:attribute name="name" type="xs:string" use="required"/>
	</xs:attributeGroup>
	<xs:attributeGroup name="requestAutomaticUpdate.attrib">
		<xs:attribute name="requestAutomaticUpdate" type="xs:boolean" use="required"/>
	</xs:attributeGroup>
	<xs:attributeGroup name="automaticUpdate.attrib">
		<xs:attribute name="automaticUpdate" type="xs:boolean" use="required"/>
	</xs:attributeGroup>
	<xs:attributeGroup name="group.allEvents.attrib">
		<xs:attributeGroup ref="mmi:source.attrib"/>
		<xs:attributeGroup ref="mmi:requestID.attrib"/>
		<xs:attributeGroup ref="mmi:context.attrib"/>
	</xs:attributeGroup>
	<xs:attributeGroup name="group.allResponseEvents.attrib">
		<xs:attributeGroup ref="mmi:group.allEvents.attrib"/>
		<xs:attributeGroup ref="mmi:status.attrib"/>
	</xs:attributeGroup>
	
</xs:schema>

mmi-elements.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" targetNamespace="http://www.w3.org/2008/04/mmi-arch" 
				attributeFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 general elements definition schema for MMI Life cycle events version 1.0
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	
	<!-- ELEMENTS -->
	<xs:element name="statusInfo" type="mmi:anyComplexType"/>
	<xs:element name="media">
		<xs:complexType>
			<xs:simpleContent>
				<xs:extension base="mmi:mediaContentTypes">
					<xs:attributeGroup ref="mmi:media.id.attrib"/>
				</xs:extension>
			</xs:simpleContent>
		</xs:complexType>
	</xs:element>
</xs:schema>

NewContextRequest.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" 
				targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 NewContextRequest schema for MMI Life cycle events version 1.0
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	<xs:include schemaLocation="mmi-elements.xsd"/>
	<xs:import namespace="http://www.w3.org/2008/04/dcont" schemaLocation="dcont.xsd"/>

	<xs:element name="newContextRequest">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="mmi:media" maxOccurs="unbounded"/>
				<xs:element name="data">
					<xs:complexType>
						<xs:sequence>
							<xs:element ref="dcont:DeliveryContext"/>
						</xs:sequence>
					</xs:complexType>
				</xs:element>
			</xs:sequence>
			<xs:attributeGroup ref="mmi:group.allEvents.attrib"/>
		</xs:complexType>
	</xs:element>
</xs:schema>

NewContextResponse.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" 
				targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 NewContextResponse schema for MMI Life cycle events version 1.0
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	<xs:include schemaLocation="mmi-elements.xsd"/>
	
	<xs:element name="newContextResponse">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="mmi:media" minOccurs="0" maxOccurs="unbounded"/>
				<xs:element ref="mmi:statusInfo" minOccurs="0"/>
			</xs:sequence>
			<xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
		</xs:complexType>
	</xs:element>
</xs:schema>

PrepareRequest.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" 
				targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 PrepareRequest schema for MMI Life cycle events version 1.0. 
			 The optional PrepareRequest event is an event that the Runtime Framework may send 
			 to allow the Modality Components to pre-load markup and prepare to run (e.g. in case of 
			 VXML VUI-MC). Modality Components are not required to take any particular action in 
			 response to this event, but they must return a PrepareResponse event.
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	
	<xs:element name="prepareRequest">
		<xs:complexType>
			<xs:choice>
				<xs:sequence>
					<xs:element name="contentURL" type="mmi:contentURLType"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element name="content" type="mmi:anyComplexType"/>
					<!-- only vxml permitted ?? -->
				</xs:sequence>
				<!-- data really needed ?? -->
				<xs:sequence>
					<xs:element name="data" type="mmi:anyComplexType"/>
				</xs:sequence>
			</xs:choice>
			<xs:attributeGroup ref="mmi:group.allEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
		</xs:complexType>
	</xs:element>
</xs:schema>

PrepareResponse.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" 
				targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 PrepareResponse schema for MMI Life cycle events version 1.0
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	<xs:include schemaLocation="mmi-elements.xsd"/>
	
	<xs:element name="prepareResponse">
		<xs:complexType>
			<xs:sequence>
				<xs:element name="data" minOccurs="0" type="mmi:anyComplexType"/>
				<xs:element ref="mmi:statusInfo" minOccurs="0"/>
			</xs:sequence>
			<xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
		</xs:complexType>
	</xs:element>
	
</xs:schema>

StartRequest.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" 
				targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 StartRequest schema for MMI Life cycle events version 1.0. 
			 The Runtime Framework sends the event StartRequest to invoke a Modality Component 
			 (to start loading a new GUI resource or to start the ASR or TTS). The Modality Component 
			 must return a StartResponse event in response. If the Runtime Framework has sent a previous
			 PrepareRequest event, it may leave the contentURL and content fields empty, and the Modality
			 Component will use the values from the PrepareRequest event. If the Runtime Framework includes 
			 new values for these fields, the values in the StartRequest event override those in the 
			 PrepareRequest event.
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	
	<xs:element name="startRequest">
		<xs:complexType>
			<xs:choice>
				<xs:sequence>
					<xs:element name="contentURL" type="mmi:contentURLType"/>
				</xs:sequence>
				<xs:sequence>
					<xs:element name="content" type="mmi:anyComplexType"/>
					<!-- only vxml permitted ?? -->
				</xs:sequence>
				<!-- data really needed ?? -->
				<xs:sequence>
					<xs:element name="data" type="mmi:anyComplexType"/>
				</xs:sequence>
			</xs:choice>
			<xs:attributeGroup ref="mmi:group.allEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
		</xs:complexType>
	</xs:element>
</xs:schema>
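
The override rule described in the StartRequest annotation above can be illustrated with a pair of events. This is an informative sketch only: the two content URIs and the requestID values are placeholders, and it assumes both events are sent for the same context.

```xml
<!-- 1. The IM asks the MC to pre-load content. -->
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:prepareRequest source="someURI" context="URI-1" requestID="request-1">
      <mmi:contentURL href="preparedContentURI"/>
   </mmi:prepareRequest>
</mmi:mmi>

<!-- 2. After the prepareResponse, the startRequest names a different URI.
        Per the annotation, this value overrides the one from the prepareRequest. -->
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:startRequest source="someURI" context="URI-1" requestID="request-2">
      <mmi:contentURL href="overridingContentURI"/>
   </mmi:startRequest>
</mmi:mmi>
```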

StartResponse.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" 
				targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 StartResponse schema for MMI Life cycle events version 1.0
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	<xs:include schemaLocation="mmi-elements.xsd"/>
	
	<xs:element name="startResponse">
		<xs:complexType>
			<xs:sequence>
				<xs:element name="data" minOccurs="0" type="mmi:anyComplexType"/>
				<xs:element ref="mmi:statusInfo" minOccurs="0"/>
			</xs:sequence>
			<xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
		</xs:complexType>
	</xs:element>
</xs:schema>

DoneNotification.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" 
				targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 DoneNotification schema for MMI Life cycle events version 1.0. 
			 The DoneNotification event is intended to be used by the Modality Component to indicate that
			 it has reached the end of its processing. For the VUI-MC it can be used to return the ASR
			 recognition result (or the status info: noinput/nomatch) and TTS/Player done notification. 
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	<xs:include schemaLocation="mmi-elements.xsd"/>
	
	<xs:element name="doneNotification">
		<xs:complexType>
			<xs:sequence>
				<xs:element name="data" type="mmi:anyComplexType"/>
				<xs:element ref="mmi:statusInfo" minOccurs="0"/>
			</xs:sequence>
			<xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
		</xs:complexType>
	</xs:element>
	
</xs:schema>
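
Example 10 in Appendix B shows the no-input case. The sketch below shows how the nomatch case mentioned in the annotation might be reported, assuming the EMMA 1.0 emma:uninterpreted annotation is used for it. The attribute values are placeholders.

```xml
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <mmi:doneNotification source="someURI" context="someURI" status="success" requestID="request-1">
      <mmi:data>
         <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
            <!-- Input was received but could not be interpreted (nomatch) -->
            <emma:interpretation id="int1" emma:medium="acoustic" emma:mode="voice" emma:uninterpreted="true"/>
         </emma:emma>
      </mmi:data>
   </mmi:doneNotification>
</mmi:mmi>
```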

CancelRequest.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" 
				targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 CancelRequest schema for MMI Life cycle events version 1.0. 
			 The CancelRequest event is sent by the Runtime Framework to stop processing in the Modality 
			 Component (e.g. to cancel ASR or TTS/Playing). The Modality Component must return with a 
			 CancelResponse message. 
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	
	<xs:element name="cancelRequest">
		<xs:complexType>
			<xs:attributeGroup ref="mmi:group.allEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
			<xs:attributeGroup ref="mmi:immediate.attrib"/>
			<!-- no elements -->
		</xs:complexType>
	</xs:element>
</xs:schema>

CancelResponse.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" 
				targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 CancelResponse schema for MMI Life cycle events version 1.0. 
			 The CancelRequest event is sent by the Runtime Framework to stop processing in the Modality 
			 Component (e.g. to cancel ASR or TTS/Playing). The Modality Component must return with a 
			 CancelResponse message. 
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	<xs:include schemaLocation="mmi-elements.xsd"/>
	
	<xs:element name="cancelResponse">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="mmi:statusInfo" minOccurs="0"/>
			</xs:sequence>
			<xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
		</xs:complexType>
	</xs:element>
</xs:schema>

PauseRequest.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" 
				targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 PauseRequest schema for MMI Life cycle events version 1.0. 
			 The PauseRequest event is sent by the Runtime Framework to pause processing of a Modality 
			 Component (e.g. to pause ASR or TTS/Playing). The Modality Component must return with a 
			 PauseResponse message. 
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	
	<xs:element name="pauseRequest">
		<xs:complexType>
			<xs:attributeGroup ref="mmi:group.allEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
			<xs:attributeGroup ref="mmi:immediate.attrib"/>
			<!-- no elements -->
		</xs:complexType>
	</xs:element>
</xs:schema>

PauseResponse.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema"
				targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 PauseResponse schema for MMI Life cycle events version 1.0. 
			 The PauseRequest event is sent by the Runtime Framework to pause the processing of
			 the Modality Component (e.g. to pause ASR or TTS/Playing). The Modality Component 
			 must return with a PauseResponse message. 
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	<xs:include schemaLocation="mmi-elements.xsd"/>
	
	<xs:element name="pauseResponse">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="mmi:statusInfo" minOccurs="0"/>
			</xs:sequence>
			<xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
		</xs:complexType>
	</xs:element>
</xs:schema>

ResumeRequest.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" 
				targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 ResumeRequest schema for MMI Life cycle events version 1.0. 
			 The ResumeRequest event is sent by the Runtime Framework to resume a previously suspended 
			 processing task of a Modality Component. The Modality Component must return with a 
			 ResumeResponse message. 
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	
	<xs:element name="resumeRequest">
		<xs:complexType>
			<xs:attributeGroup ref="mmi:group.allEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
			<xs:attributeGroup ref="mmi:immediate.attrib"/>
			<!-- no elements -->
		</xs:complexType>
	</xs:element>
</xs:schema>

ResumeResponse.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" 
				targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 ResumeResponse schema for MMI Life cycle events version 1.0. 
			 The ResumeRequest event is sent by the Runtime Framework to resume a previously suspended 
			 processing task of a Modality Component. The Modality Component must return with a 
			 ResumeResponse message. 
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	<xs:include schemaLocation="mmi-elements.xsd"/>
	
	<xs:element name="resumeResponse">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="mmi:statusInfo" minOccurs="0"/>
			</xs:sequence>
			<xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
		</xs:complexType>
	</xs:element>
</xs:schema>

ExtensionNotification.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" 
				targetNamespace="http://www.w3.org/2008/04/mmi-arch" attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 ExtensionNotification schema for MMI Life cycle events version 1.0. 
			 The extensionNotification event may be generated by either the Runtime Framework or the 
			 Modality Component and is used to communicate (presumably changed) data values to the 
			 other component. E.g. the VUI-MC has signaled a recognition result for any field displayed 
			 on the GUI, the event will be used by the Runtime Framework to send a command to the 
			 GUI-MC to update the GUI with the recognized value. 
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	
	<xs:element name="extensionNotification">
		<xs:complexType>
			<xs:sequence>
				<xs:element name="data" type="mmi:anyComplexType"/>
			</xs:sequence>
			<xs:attributeGroup ref="mmi:group.allEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
			<xs:attributeGroup ref="mmi:extension.name.attrib"/>
		</xs:complexType>
	</xs:element>
	
</xs:schema>
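
The scenario in the documentation above (a recognized value forwarded to the GUI) could be sketched as the following instance. This is an assumption-laden illustration, not normative markup: the "name" attribute is a guess at what mmi:extension.name.attrib declares, the other attributes follow the examples in Appendix B, and the payload inside mmi:data is purely hypothetical.

```xml
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <!-- Sent by the Runtime Framework to the GUI-MC after the VUI-MC has
        reported a recognition result. The name value is a placeholder. -->
   <mmi:extensionNotification source="someURI" target="someOtherURI"
                              context="URI-1" requestID="request-2"
                              name="updateField">
      <mmi:data>
         <!-- Hypothetical application-specific payload -->
         <field id="destination">Boston</field>
      </mmi:data>
   </mmi:extensionNotification>
</mmi:mmi>
```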

ClearContextRequest.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" 
				attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 ClearContextRequest schema for MMI Life cycle events version 1.0
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	<xs:include schemaLocation="mmi-elements.xsd"/>

	<xs:element name="clearContextRequest">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="mmi:media" minOccurs="0" maxOccurs="unbounded"/>
			</xs:sequence>
			<xs:attributeGroup ref="mmi:group.allEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
		</xs:complexType>
	</xs:element>
</xs:schema>

ClearContextResponse.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" 
				attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 ClearContextResponse schema for MMI Life cycle events version 1.0
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	<xs:include schemaLocation="mmi-elements.xsd"/>
	
	<xs:element name="clearContextResponse">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="mmi:media" minOccurs="0" maxOccurs="unbounded"/>
				<xs:element ref="mmi:statusInfo" minOccurs="0"/>
			</xs:sequence>
			<xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
		</xs:complexType>
	</xs:element>
</xs:schema>

StatusRequest.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" 
				attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 StatusRequest schema for MMI Life cycle events version 1.0
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	<xs:include schemaLocation="mmi-elements.xsd"/>

	<xs:element name="statusRequest">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="mmi:media" minOccurs="0" maxOccurs="unbounded"/>
			</xs:sequence>
			<xs:attributeGroup ref="mmi:group.allEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
		</xs:complexType>
	</xs:element>
</xs:schema>

StatusResponse.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dcont="http://www.w3.org/2008/04/dcont" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2008/04/mmi-arch" 
				attributeFormDefault="qualified" elementFormDefault="qualified">
	<xs:annotation>
		<xs:documentation xml:lang="en">
			 StatusResponse schema for MMI Life cycle events version 1.0
		</xs:documentation>
	</xs:annotation>
	<xs:include schemaLocation="mmi-datatypes.xsd"/>
	<xs:include schemaLocation="mmi-attribs.xsd"/>
	<xs:include schemaLocation="mmi-elements.xsd"/>
	
	<xs:element name="statusResponse">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="mmi:media" minOccurs="0" maxOccurs="unbounded"/>
				<xs:element ref="mmi:statusInfo" minOccurs="0"/>
			</xs:sequence>
			<xs:attributeGroup ref="mmi:group.allResponseEvents.attrib"/>
			<xs:attributeGroup ref="mmi:target.attrib"/>
		</xs:complexType>
	</xs:element>
</xs:schema>

The following ladder diagram shows a possible message sequence upon session creation. We assume that the Runtime Framework and an Interaction Manager session are already up and running. The user starts a multimodal session, for example by starting a web browser and fetching a given URL.

The initial document contains scripts which provide the modality component functionality (e.g. understanding XML-formatted life cycle events) and message transport capabilities (e.g. AJAX, although this depends on the exact system implementation).

After loading the initial documents (and scripts), the modality component implementation issues an mmi:newContextRequest message to the Runtime Framework. The Runtime Framework may load a corresponding markup document, if necessary (this could be SCXML), and initializes and starts the Interaction Manager.

In this scenario the Interaction Manager logic issues a number of mmi:startRequest messages to the various modality components. One message is sent to the graphical modality component (GUI) to instruct it to load an HTML document. Another message is sent to a voice modality component (VUI) to play a welcome message.

The voice modality component has (in this example) to create a VoiceXML session. As VoiceXML 2.1 does not provide an external event interface, a CCXML session will be used for external asynchronous communication. Therefore the voice modality component uses the session creation interface of CCXML 1.0 to create a session and start a corresponding script. This script will then make a call to a phone at the user's end (which could be a regular phone or a SIP soft phone on the user's device). This scenario illustrates the use of a SIP phone, which may reside on the user's mobile handset.

After successful setup of a CCXML session and the voice connection, the voice modality component instructs the CCXML browser to start a VoiceXML dialog and passes it a corresponding VoiceXML script. The VoiceXML interpreter will execute the script and play out the welcome message. After the execution of the VoiceXML script has finished, the voice modality component notifies the Interaction Manager using the mmi:done event.

D.2 Processing User Input

The next diagram gives an example of a possible message flow while processing user input. In the given scenario the user wants to enter information using the voice modality component. To start the voice input the user has to use the "push-to-talk" button. The "push-to-talk" button (which might be a hardware button or a soft button on the screen) generates a corresponding event when pushed. This event is issued as an mmi:extension event towards the Interaction Manager. The Interaction Manager logic sends an mmi:startRequest to the voice modality component. This mmi:startRequest message contains a URL which points to a corresponding VoiceXML script. The voice modality component again starts a VoiceXML interpreter using the given URL. The VoiceXML interpreter loads the document and executes it. Now the system is ready for the user input. To notify the user about the availability of the voice input functionality, the Interaction Manager might send an event to the GUI upon receiving the mmi:startResponse event (which indicates that the voice modality component has started to execute the document). Note, however, that this is not shown in the picture.
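
As a rough sketch, the mmi:startRequest that the Interaction Manager sends to the voice modality component in this flow might look as follows. The mmi:contentURL element and its attributes are copied from the prepareRequest example in Appendix B; the target attribute, the URIs and the request ID are placeholders assumed for illustration.

```xml
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <!-- From the IM to the voice modality component (VUI) -->
   <mmi:startRequest source="IM-URI" target="VUI-URI" context="URI-1" requestID="request-3">
      <!-- URL of the VoiceXML script to be executed (placeholder value) -->
      <mmi:contentURL href="someVoiceXMLScriptURI" max-age="" fetchtimeout="1s"/>
   </mmi:startRequest>
</mmi:mmi>
```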

The VoiceXML interpreter captures the user's voice input and uses a speech recognition engine to recognize the utterance. The speech recognition result will be represented as an EMMA document and sent to the Interaction Manager using the mmi:done message. The Interaction Manager logic sends an mmi:extension message to the GUI modality component to instruct it to display the recognition result.
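
The mmi:done message carrying the recognition result could be sketched as below. The element name mmi:done is taken from the message name used in the text, and wrapping the EMMA document in mmi:data follows the use of mmi:data in Appendix B; both are assumptions about the concrete syntax. The recognized value, confidence score and application element (destination) are invented for illustration.

```xml
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <!-- From the voice modality component (VUI) to the IM -->
   <mmi:done source="VUI-URI" target="IM-URI" context="URI-1"
             requestID="request-3" status="success">
      <mmi:data>
         <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
            <!-- One interpretation of the spoken input, annotated with
                 modality and confidence information -->
            <emma:interpretation id="int1"
                                 emma:medium="acoustic" emma:mode="voice"
                                 emma:confidence="0.75" emma:tokens="boston">
               <!-- Hypothetical application-specific semantics -->
               <destination>Boston</destination>
            </emma:interpretation>
         </emma:emma>
      </mmi:data>
   </mmi:done>
</mmi:mmi>
```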

D.3 Ending a Session

In the following scenario a modality component instance will be destroyed in reaction to user input, e.g. because the user chose to switch to GUI-only mode. In this case an mmi:clearContextRequest will be issued to the voice modality component. The voice modality component wrapper will then destroy the CCXML (and VoiceXML) session.

The application logic (i.e. the IM) may also decide to indicate the removed voice functionality and disable an icon on the screen which indicates the availability of the voice modality.
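
A minimal sketch of the mmi:clearContextRequest issued to the voice modality component is shown below. It follows the ClearContextRequest schema in that the optional media children are omitted; the attribute names are modeled on Appendix B, and the URIs and request ID are placeholders.

```xml
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
   <!-- From the IM to the voice modality component (VUI): the context is no
        longer needed, so the CCXML and VoiceXML sessions can be destroyed -->
   <mmi:clearContextRequest source="IM-URI" target="VUI-URI"
                            context="URI-1" requestID="request-4"/>
</mmi:mmi>
```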

E Glossary

CCXML: CCXML is designed to provide telephony call control support for dialog systems, such as VoiceXML.
Controller Document: A document that contains markup defining the interaction between the other documents. Such markup is called Interaction Manager markup.
Data Component: The Data Component is a sub-component of the Runtime Framework which is responsible for storing application-level data.
DCCI: Platform and language neutral programming interfaces that provide Web applications access to a hierarchy of dynamic properties representing device capabilities, configurations, user preferences and environmental conditions. (http://www.w3.org/TR/DPF/)
Interaction Manager: The Interaction Manager (IM) is the sub-component of the Runtime Framework that is responsible for handling all events that the other Components generate. It is responsible for synchronization of data and focus, etc., across different Modality Components as well as the higher-level application flow that is independent of Modality Components.
Life cycle events: The Multimodal Architecture defines basic life-cycle events which must be supported by all modality components. These events allow the Runtime Framework to invoke modality components and receive results from them. They form the basic interface between the Runtime Framework and the Modality components.
Modality Component: Modality Components are responsible for controlling the various input and output modalities on the device. Modality components may also be used to perform general processing functions not directly associated with any specific interface modality, for example, dialog flow control or natural language processing.
Nested components: A Runtime Framework and a set of Components can present themselves as a Component to a higher-level Framework. All that is required is that the Framework implement the Component API. The result is a "Russian Doll" model in which Components may be nested inside other Components to an arbitrary depth.
Runtime Framework: The Runtime Framework is responsible for starting the application and interpreting the Controller Document. It provides the basic infrastructure which the various Modality Components plug into and controls the communication among the other Constituents.
SCXML: "State Chart extensible Markup Language". SCXML provides a generic state-machine based execution environment based on CCXML and Harel State Tables.
Software Constituent: An architecturally significant entity in the architecture. Because we are using the term 'Component' to refer to a specific set of entities in our architecture, we will use the term 'Constituent' as a cover term for all the elements in our architecture which might normally be called 'software components'.
VoiceXML: VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations.

F Use Case Discussion

This section presents a detailed example of an implementation of this architecture. For the sake of concreteness, it specifies a number of details that are not included in this document. It is based on the MMI use case document [MMIUse], specifically the second use case, which presents a multimodal in-car application for giving driving directions. Three languages are involved in the design view:

The Controller/Interaction Manager markup language. We will not specify this language but will assume that it is capable of representing a reasonably powerful state machine.
The graphical language. We will assume that this is HTML.
The voice language. We will assume that this is VoiceXML. For concreteness, we will use VoiceXML 2.0 [VXML], but will also note differences in behavior that might occur with a future version of VoiceXML.

The remainder of the discussion involves the run-time view. The numbered items are taken from the "User Action/External Input" field of the event table. The appended comments are based on the working group's discussion of the use case.

User Presses Button on wheel to start application. Comment: The Runtime Framework submits to a pre-configured URL and receives a session cookie in return. This cookie will be included in all subsequent submissions. Now the Runtime Framework loads the DCCI framework, retrieves the default user and device profile and submits them to a (different) URL to get the Controller Document. UAPROF can be used for standard device characteristics (screen size, etc.), but it is not extensible and does not cover user preferences. The DCCI group is working on a profile definition that provides an extensible set of attributes and can be used here. Once the initial profile submission is made, only updates get sent in subsequent submissions. Once the Runtime Framework loads the Controller, it notes that it references both VoiceXML and HTML documents. Therefore it makes sure that the corresponding Modality Components are loaded, and then sends Prepare for each Component. These events contain the Context ID and the Component-specific markup (VoiceXML or HTML). If the markup was included in the root document, it is delivered in-line in the event. However, if the main document referenced the Component-specific markup via URL, only the URL is passed in the event. Once the Modality Components receive the Prepare event, they parse their markup, initialize their resources (ASR, TTS, etc.) and return PrepareResponse events. The IM responds with Start events and the application is ready to interact with the user.
The user interacts in an authentication dialog. Comment: The Runtime Framework sends the Start command to the VoiceXML Modality component, which executes a Form asking the user to identify himself. In VoiceXML 3.0, the Form might make use of speaker verification as well as speech recognition. Any database access or other back-end interaction is handled inside the Form. In VoiceXML 2.0, the recognition results (which include the user's identity) will be returned to the IM by the <exit> tag along with a namelist. This would mean that the specific logical Modality Component instance had exited, so that any further voice interactions would have to be handled by a separate logical Modality Component corresponding to a separate Presentation Document. In VoiceXML 3.0, however, it would be possible for the Modality Component instance to send a recognition result event to the IM without exiting. It would then be sitting there, waiting for the IM to send it another event to trigger further processing. Thus in VoiceXML 3.0, all the voice interactions in the application could be handled by a single Markup Component (section of VoiceXML markup) and a single logical Modality Component.
Recognition can be done locally, remotely (on the server) or distributed between the device and the server. By default, the location of event handling is determined by the markup. If there is a local handler for an event specified in the document, the event is handled locally. If not, the event is forwarded to the server. Thus if the markup specifies a speech-started event handler, that event will be consumed locally. Otherwise it will be forwarded to the server. However, remote ASR requires more than simply forwarding the speech-started event to the server because the audio channel must be established. This level of configuration is handled by the device profile, but can be overridden by the markup. Note that the remote server might contain a full VoiceXML interpreter as well as ASR capabilities. In that case, the relevant markup would be sent to the server along with the audio. The protocol used to control the remote recognizer and ship it audio is not part of the MMI specification (but may well be MRCP.)

Open Issue: The previous paragraph about local vs remote event handling is retained from an earlier draft. Since the Modality Component is a black box to the Runtime Framework, the local vs remote distinction should be internal to it. Therefore the event handlers would have to be specified in the VoiceXML markup. But no such possibility exists in VoiceXML 2.0. One option would be to make the local vs remote distinction vendor-specific, so that each Modality Component provider would decide whether to support remote operations and, if so, how to configure them. Alternatively, we could define the DCCI properties for remote recognition, but make it optional that vendors support them. In either case, it would be up to the VoiceXML Modality Component to communicate with the remote server, etc. Newer languages, such as VoiceXML 3.0, could be designed to allow explicit markup control of local vs remote operations. Note that in the most complex case, there could be multiple simultaneous recognitions, some of which were local and some remote. This level of control is most easily achieved via markup, by attaching properties to individual grammars. DCCI properties are more suitable for setting global defaults.

When the IM receives the recognition result event, it parses it and retrieves the user's preferences from the DCCI component, which it then dispatches to the Modality Components, which adjust their displays, output, default grammars, etc. accordingly. In VoiceXML 2.0, each of the multiple voice Modality Components will receive the corresponding event.
Initial GPS input. Comment: DCCI configuration determines how often GPS update events are raised. On the first event, the IM sends the HTML Modality Component a command to display the initial map. On subsequent events, a handler in the IM markup determines if the automobile's location has changed enough to require an update of the map display. Depending on device characteristics, the update may require redrawing the whole map or just part of it.
This particular step in the use case shows the usefulness of the Interaction Manager. One can imagine an architecture lacking an IM in which the Modality Components communicate with each other directly. In this case, all Modality Components would have to handle the location update events separately. This would mean considerable duplication of markup and calculation. Consider in particular the case of a VoiceXML 2.0 Form which is supposed to warn the driver when he goes off course. If there is an IM, this Form will simply contain the off-course dialog and will be triggered by an appropriate event from the IM. In the absence of the IM, however, the Form will have to be invoked on each location update event. The Form itself will have to calculate whether the user is off-course, exiting without saying anything if he is not. In parallel, the HTML Modality Component will be performing a similar calculation to determine whether to update its display. The overall application is simpler and more modular if the location calculation and other application logic are placed in the IM, which will then invoke the individual Modality Components only when it is time to interact with the user.

Note on the GPS. We assume that the GPS raises four types of events: On-Course Updates, Off-Course Alerts, Loss-of-Signal Alerts, and Recovery of Signal Notifications. The Off-Course Alert is covered below. The Loss-of-Signal Alert is important since the system must know if its position and course information is reliable. At the very least, we would assume that the graphical display would be modified when the signal was lost. An audio earcon would also be appropriate. Similarly, the Recovery of Signal Notification would cause a change in the display and possibly an audio notification. This event would also contain an indication of the number of satellites detected, since this determines the accuracy of the signal: three satellites are necessary to provide x and y coordinates, while a fourth satellite allows the determination of height as well. Finally, note that the GPS can assume that the car's location does not change while the engine is off. Thus when it starts up it will assume that it is at its last recorded location. This should make the initialization process quicker.
User selects option to change volume of on-board display using touch display. Comment: HTML Modality Component raises an event, which the IM catches. Depending on the IM language, it may be able to call the DCCI interface directly (e.g. as executable content in SCXML). If it cannot, the IM would generate an event to modify the relevant DCCI property and the Runtime Framework (Adapter) would be responsible for converting it into the appropriate function call, which has the effect of resetting the output volume.
User presses button on steering wheel (to start recognition) Comment: The interesting question here is whether the button-push event is visible at the application level. One possibility is that the button-push simply turns on the mike and is thus invisible to the application. In that case, the voice modality component must already be listening for input with no prespeech timeout set. On the other hand, if there is an explicit button-push event, the IM could catch it and then invoke the speech component, which would not need to have been active in the interim. The explicit event would also allow for an update of the graphical display.
User says destination address. (May improve recognition accuracy by sending grammar constraints to server based on a local dialog with the user instead of allowing any address from the start) Comment: Assuming V3 and explicit markup control of recognition, the device would first perform local recognition, then send the audio off for remote recognition if the confidence was not high enough. The local grammar would consist of 'favorites' or places that the driver was considered likely to visit. The remote grammar would be significantly larger, possibly including the whole continent.
When the IM is satisfied with the confidence levels, it ships the n-best list off to a remote server, which adds graphical information for at least the first choice. The server may also need to modify the n-best list, since items that are linguistically unambiguous may turn out to be ambiguous in the database (e.g., "Starbucks"). Now the IM instructs the HTML component to display the hypothesized destination (first item on n-best list) on the screen and instructs the speech component to start a confirmation dialog. Note that the submission to the remote server should be similar to the <data> tag in VoiceXML 2.1 in that it does not require a document transition. (That is, the remote server should not have to generate a new IM document/state machine just to add graphical information to the n-best list.)
User confirms destination. Comment: Local recognition of grammar built from n-best list. The original use case states that the device sends the destination information to the server, but that may not be necessary since the device already has a map of the hypothesized destination. However, if the confirmation dialog resulted in the user choosing a different destination (i.e., not the first item on the n-best list), it might be necessary to fetch graphical/map information for the selected destination. In any case, all this processing is under markup control.
GPS Input at regular intervals. Comment: On-Course Updates. Event handler in the IM decides if location has changed enough to require update of graphical display.
GPS Input at regular intervals (indicating driver is off course) Comment: This is probably an asynchronous Off-Course Alert, rather than a synchronous update. In either case, the GPS determines that the driver is off course and raises a corresponding event which is caught by the IM. Its event handler updates the display and plays a prompt warning the user. Note that both these updates are asynchronous. In particular, the warning prompt may need to pre-empt other audio (for example, the system might be reading the user's email back to him.)
N/A Comment: The IM sends a route request to server, requesting it to recalculate the route based on the new (unexpected) location. This is also part of the event handler for the off-course event. There might also be a speech interaction here, asking the user if he has changed his destination.
Alert received on device based on traffic conditions Comment: This is another asynchronous event, just like the off-course event. It will result in asynchronous graphical and verbal notifications to the user, possibly pre-empting other interactions. The difference between this event and the off-course event is that this one is generated by the remote server. To receive it, the IM must have registered for it (and possibly other event types) when the driver chose his destination. Note that the registration is specific to the given destination since the driver does not want to receive updates about routes he is not planning to take.
User requests recalculation of route based on current traffic conditions Comment: Here the recognition can probably be done locally, then the recalculation of the route is done by the server, which then sends updated route and graphical information to the device.
GPS Input at regular intervals Comment: On-Course updates as discussed above.
User presses button on steering wheel Comment: Recognition started. Whether this is local or remote recognition is determined by markup and/or DCCI defaults established at the start of application. The use case does not specify whether all recognition requires a button push. One option would be to require the button push only when the driver is initiating the interaction. This would simplify the application in that it would not have to be listening constantly to background noise or side chatter just in case the driver issued a command. In cases where the system had prompted the driver for input, the button push would not be necessary. Alternatively, a special hot-word could take the place of the button push. All of these options are compatible with the architecture described in this document.
User requests new destination by destination type while still depressing button on steering wheel (may improve recognition accuracy by sending grammar constraints to server based on a local dialog with the user) Comment: Local and remote recognition as before, with IM sending n-best list to server, which adds graphical information for at least the first choice.
User confirms destination via a multiple interaction dialog to determine exact destination Comment: Local disambiguation dialog, as above. At the end, user is asked if this is a new destination.
User indicates that this is a stop on the way to original destination Comment: Device sends request to server, which provides updated route and display info. The IM must keep track of the original destination so that it can request a new route to it after the driver reaches his intermediate destination.
GPS Input at regular intervals Comment: As above.
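
The GPS-related steps above place the location logic in the IM. Purely as an illustration, and assuming SCXML (in the syntax of the SCXML 1.0 Recommendation) as the IM language, which this document deliberately leaves unspecified, such a handler might be sketched as follows. All event names, data fields, the threshold value and the target URIs are hypothetical, and delivering events to a Modality Component would in practice require an implementation-specific event I/O processor for the send elements.

```xml
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0"
       datamodel="ecmascript" initial="onCourse">
  <datamodel>
    <!-- Position for which the map was last drawn; null until the first update -->
    <data id="shown" expr="null"/>
    <!-- Hypothetical threshold (in degrees) below which the map is not redrawn -->
    <data id="minDelta" expr="0.001"/>
  </datamodel>

  <state id="onCourse">
    <!-- On-Course Update: redraw the map only on the first event or when the
         position has changed by more than the threshold -->
    <transition event="gps.onCourseUpdate"
                cond="shown === null || Math.abs(_event.data.lat - shown.lat) &gt; minDelta || Math.abs(_event.data.lon - shown.lon) &gt; minDelta">
      <assign location="shown" expr="({lat: _event.data.lat, lon: _event.data.lon})"/>
      <send event="map.update" target="GUI-URI">
        <param name="lat" expr="shown.lat"/>
        <param name="lon" expr="shown.lon"/>
      </send>
    </transition>
    <!-- Off-Course Alert: handled by a separate state -->
    <transition event="gps.offCourseAlert" target="offCourse"/>
  </state>

  <state id="offCourse">
    <onentry>
      <!-- Update the display and trigger the warning dialog; the voice
           component only contains the dialog, not the location calculation -->
      <send event="map.showOffCourse" target="GUI-URI"/>
      <send event="prompt.offCourseWarning" target="VUI-URI"/>
    </onentry>
    <transition event="gps.onCourseUpdate" target="onCourse"/>
  </state>
</scxml>
```

In this sketch neither Modality Component performs any location calculation; each is invoked only when there is something to show or say, which is the modularity argument made in the discussion of the initial GPS input.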

G References

CDF: Compound Document by Reference Framework 1.0. Timur Mehrvarz, et al. editors. World Wide Web Consortium, 2006
CCXML: "Voice Browser Call Control: CCXML Version 1.0", R.J. Auburn, editor, World Wide Web Consortium, 2005.
DCCI: "Delivery Context Interfaces (DCCI) Accessing Static and Dynamic Properties", Keith Waters, Rafah Hosn, Dave Raggett, Sailesh Sathish, and Matt Womer, editors. World Wide Web Consortium, 2004.
EMMA: "Extensible multimodal Annotation markup language (EMMA)", Michael Johnston et al. editors. EMMA is an XML format for annotating application specific interpretations of user input with information such as confidence scores, time stamps, input modality and alternative recognition hypotheses, World Wide Web Consortium, 2005.
Galaxy: "Galaxy Communicator" Galaxy Communicator is an open source hub and spoke architecture for constructing dialogue systems that was developed with funding from Defense Advanced Research Projects Agency (DARPA) of the United States Government.
MMIF: "W3C Multimodal Interaction Framework", James A. Larson, T.V. Raman and Dave Raggett, editors, World Wide Web Consortium, 2003.
MMIUse: "W3C Multimodal Interaction Use Cases", Emily Candell and Dave Raggett, editors, World Wide Web Consortium, 2002.
SCXML: "State Chart XML (SCXML): State Machine Notation for Control Abstraction", Jim Barnett et al. editors. World Wide Web Consortium, 2006.
SMIL: "Synchronized Multimedia Integration Language (SMIL 2.1)", Dick Bulterman et al. editors. World Wide Web Consortium, 2005.
SVG: "Scalable Vector Graphics (SVG) 1.1 Specification", Jon Ferraiolo et al. editors. World Wide Web Consortium, 2003.
VoiceXML: "Voice Extensible Markup Language (VoiceXML) Version 2.0", Scott McGlashan et al. editors. World Wide Web Consortium, 2004.
XHTML: "XHTML 1.0 The Extensible HyperText Markup Language (Second Edition)", Steven Pemberton et al. editors. World Wide Web Consortium, 2004.
XMLSig: "XML-Signature Syntax and Processing" Eastlake et al., editors. World Wide Web Consortium, 2001.