W3C

Authoring Applications for the Multimodal Architecture

W3C Working Group Note 2 July 2008

This version:
http://www.w3.org/TR/2008/NOTE-mmi-auth-20080702/
Latest version:
http://www.w3.org/TR/mmi-auth/
Previous version:
This is the first version.
Editor:
Ingmar Kliche, Deutsche Telekom AG

Abstract

This document describes a multimodal system which implements the W3C Multimodal Architecture and gives an example of a simple multimodal application authored using various W3C markup languages, including SCXML, CCXML, VoiceXML 2.1 and HTML.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the 2 July 2008 W3C Working Group Note of "Authoring Applications for the Multimodal Architecture". The Multimodal Interaction Working Group published an updated Working Draft of the "Multimodal Architecture and Interfaces" (MMI Architecture) [MMI-ARCH] on 14 April 2008. However, the Working Draft currently does not take a position on several aspects of multimodal applications. These include the startup phase, how components find each other, and message transport, all of which must be addressed in actual applications. In order to provide a concrete illustration of a multimodal application based on the Multimodal Architecture using current W3C technologies, the Working Group has prepared this Note. The goal of this Note is to make it easier to author concrete multimodal Web applications. This document represents the views of the group at the time of publication. The group is planning to enhance the MMI Architecture specification based on this Working Group Note if the feedback is positive.

This document is one of a series produced by the Multimodal Interaction Working Group of the W3C Multimodal Interaction Activity.

For more information about the Multimodal Interaction Activity, please see the Multimodal Interaction Activity statement.

Comments for this specification are welcomed and should have a subject starting with the prefix '[AUTH]'. Please send them to www-multimodal@w3.org, the public email list for issues related to Multimodal. This list is archived and acceptance of this archiving policy is requested automatically upon first post. To subscribe to this list send an email to www-multimodal-request@w3.org with the word subscribe in the subject line.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
2 Overview
3 Implementation of the components
    3.1 The Runtime Framework
    3.2 GUI Modality Component
    3.3 Voice Modality Component
4 Initiating multimodal sessions
5 Authoring example
    5.1 T-Shirt.scxml
    5.2 captureColorSize.html
    5.3 dispatcher.ccxml
    5.4 captureColor.vxml

Appendices

A Acknowledgments
B References


1 Introduction

The W3C Multimodal Interaction (MMI) Working Group develops an architecture [MMI-ARCH] for the Multimodal Interaction framework [MMIF]. The Multimodal Architecture describes, in an abstract way, a general and flexible framework for interoperability of the various components of the multimodal framework (e.g. modality components and the interaction manager). Among other things, it defines interfaces and messages between the constituents of the framework, but in a distributed implementation it is up to the implementation to decide how these messages are transferred.

The intention of this document is to propose how a multimodal runtime environment, and an application based on the W3C Multimodal Architecture, can be implemented using existing W3C technologies. This proposal uses CCXML and VoiceXML to implement a voice modality component. Note that this is just one possible way of implementing it.

Note: The W3C Voice Browser Working Group is currently developing VoiceXML 3.0 which is the next major release of VoiceXML and will enable voice browsers to fit into the W3C Multimodal Architecture as a modality component. As VoiceXML 3.0 implementations are not yet available, this document relies on the existing VoiceXML 2.1 specification [VoiceXML].

The Multimodal Interaction Working Group itself wants to learn from this authoring example where improvements are possible and necessary. We also intend to present how we think that multimodal applications will be authored in the future.

2 Overview

The W3C Multimodal Architecture consists of the following main constituents (see also: MMI Runtime Architecture Diagram):

In this document we discuss a distributed implementation of the multimodal framework using the following components and technologies:

The following figure shows all these components mapped to the MMI Runtime Architecture Diagram:

Runtime Architecture Diagram

The dashed boxes correspond to (logical) components within the MMI architecture whereas solid lines correspond to actual software or hardware components used to implement the system.

The voice input/output device shown in the figure above may be a regular (mobile) phone or a Voice-over-IP (soft) phone. In any case a phone connection to a standard voice browser is used.

3 Implementation of the components

This section discusses one possible implementation of the Multimodal Architecture.

3.1 The Runtime Framework

The Runtime Framework provides the environment which hosts the SCXML interpreter. It has to provide an interface to receive events from external components (modality components) and must be able to inject these events into an existing SCXML session or to start new SCXML interpreter sessions. The Runtime Framework also needs to be able to send events to external components (i.e. it needs some implementation of the SCXML <send> tag). In the future this feature might be covered by the "external communications module" of the SCXML specification ([SCXML]).

An open source implementation of an SCXML interpreter written in Java is available from the Apache Software Foundation [Apache Commons SCXML]. One possibility for implementing a simple runtime framework is to combine the Apache Commons SCXML library [Apache Commons SCXML] with a J2EE servlet engine (e.g. [Apache Tomcat]). The servlet engine would be used to implement the HTTP I/O processor.

SCXML based Interaction Manager

Even though HTTP might not be the most efficient transport protocol, it has some advantages. It is widely used and available in nearly every programming language. In a distributed scenario, where the Interaction Manager (i.e. Runtime Framework) and the modality components are spread across the network, proxy and firewall problems are easy to solve. Also, our intended modality components (HTML browsers for the graphical modality and VoiceXML browsers for the voice modality) inherently support HTTP. Therefore we use HTTP for this proof-of-concept implementation proposal. More scalable solutions might make use of other protocols.

The Runtime Framework provides the I/O processor which receives HTTP requests from modality components (each containing an XML-based representation of a life-cycle event). Based on the event semantics, the Runtime Framework logic either has to start a new SCXML interpreter instance (when receiving an mmi:newContextRequest message) or has to inject an event into a running SCXML interpreter instance.
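
The dispatch rule described above can be sketched in a few lines of ECMAScript. This is only an illustration under stated assumptions, not part of the Multimodal Architecture: the names createRuntimeFramework, startInterpreter and inject, the "ctx-N" context identifiers and the shape of the returned objects are all hypothetical, and the incoming life-cycle event is assumed to have been parsed already into an object with name and context properties.

```javascript
// Hypothetical sketch of the Runtime Framework's dispatch logic.
// startInterpreter is a caller-supplied factory returning an object with an
// inject(event) method; it stands in for a real SCXML interpreter session.
function createRuntimeFramework(startInterpreter) {
  var sessions = {};   // context id -> interpreter session
  var nextId = 1;

  return {
    // event: { name: "mmi:...", context: "..." }, parsed from the HTTP request
    dispatch: function (event) {
      if (event.name === "mmi:newContextRequest") {
        // start a new interpreter instance and report its context id
        var context = "ctx-" + nextId++;
        sessions[context] = startInterpreter(context);
        return { name: "mmi:newContextResponse", context: context, status: "success" };
      }
      var session = sessions[event.context];
      if (!session) {
        // no running instance for this context (assumed error shape)
        return { status: "failure", reason: "unknown context" };
      }
      // inject the event into the running interpreter instance
      session.inject(event);
      return null;
    }
  };
}
```

A servlet would call dispatch() once per incoming HTTP request and serialize any returned object back into a life-cycle event.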

In terms of transport, the Runtime Framework in this scenario acts as an HTTP server, and the modality components act as HTTP clients that send requests to it. Sending events from modality components to the Interaction Manager is therefore relatively easy to implement using existing technologies.

The multimodal runtime architecture also requires that the Interaction Manager be able to send events to the modality components asynchronously. To leverage standard components such as HTML and VoiceXML browsers as modality components (or modality component containers), these events should still be transferred over HTTP, which both kinds of browser support natively. However, the browsers act as HTTP clients only, so the Interaction Manager retains the role of the HTTP server, and under the HTTP model the client has to initiate every request. To enable the Interaction Manager to send events to a modality component, the modality component therefore has to send HTTP requests to the Interaction Manager asking for events. This technique is usually known as polling. Simple implementations have obvious drawbacks (e.g. increased network traffic and additional delay), but polling can be optimized to some extent (e.g. by blocking the HTTP request on the server side and using timeouts). The technique certainly has limitations for large-scale implementations, but it is relatively easy to implement with existing technologies and is therefore a good choice for a proof of concept.
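
The optimization mentioned above, holding a request open on the server until an event is available or a timeout expires, might look roughly like the following sketch. The names EventChannel, poll and push are hypothetical, as is the convention of answering a timed-out request with null; the sketch also assumes at most one outstanding request per modality component.

```javascript
// Hypothetical per-client event channel for blocking ("long") polling.
function EventChannel() {
  this.pending = [];   // events produced while no request was waiting
  this.waiter = null;  // delivery function of the currently blocked request
}

// Called when a poll request arrives. callback receives the next event,
// or null if timeoutMs elapses first (the client is expected to poll again).
EventChannel.prototype.poll = function (callback, timeoutMs) {
  if (this.pending.length > 0) {
    callback(this.pending.shift());   // an event is already queued
    return;
  }
  var self = this;
  var waiter = function (event) {
    clearTimeout(timer);
    callback(event);
  };
  var timer = setTimeout(function () {
    if (self.waiter === waiter) {
      self.waiter = null;
    }
    callback(null);                   // timeout: release the blocked request
  }, timeoutMs);
  this.waiter = waiter;
};

// Called by the Interaction Manager to send an event to the client.
EventChannel.prototype.push = function (event) {
  if (this.waiter) {
    var deliver = this.waiter;
    this.waiter = null;
    deliver(event);                   // answer the blocked request at once
  } else {
    this.pending.push(event);         // keep it for the next poll
  }
};
```

With this arrangement an event reaches the client as soon as it is produced, provided a request is waiting, and idle clients cause only one request per timeout interval.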

Another promising approach could be [COMET], which uses long-lived HTTP connections to stream data to the client. Again, the client has to open the HTTP connection. The server streams an HTTP response to the client and leaves the connection open until the next event has to be sent. Many applications already use this server-push technology. Unfortunately, it is not well standardized yet and therefore requires browser-dependent implementations, but it is a potential solution for the required server-push channel.

3.2 GUI Modality Component

The GUI modality component may be implemented using HTML and JavaScript.

According to the rules defined for the Multimodal Architecture, the application logic resides within the Interaction Manager. Therefore the modality component has to send events (e.g. user-initiated events like click or change) to the Interaction Manager. The Interaction Manager decides how to react to these events and sends events to the modality component to instruct it to execute some action (e.g. displaying something).

The modality component API may be implemented using [XMLHttpRequest] (also known as AJAX). Event handlers for user-initiated events, such as change events for text input elements and click events for button elements, can easily convert these events into XML representations (MMI life-cycle event representations, e.g. containing the values of input fields) and send them to the Interaction Manager using XMLHttpRequests.

The following code snippet demonstrates the principle of how to send events to a server side Interaction Manager (assuming a servlet at someURL) using ECMAScript and XMLHttpRequests:

/* The sendMmiLifecycleEvent() function sends the MMI lifecycle
   event, potentially containing data values like color. The implementation of
   this function is vendor specific. The function is called to send a life cycle 
   event to the Runtime Framework using AJAX. The parameter "payload" contains a life 
   cycle event object.
*/ 
function sendMmiLifecycleEvent(source, context, payload) 
{
  var xmlHttpRequest = new XMLHttpRequest();
  
  // relative url, assuming that AJAX requests go to a url 
  // relative to the documents url
  var url ="./someURL";

  var XMLpayload = payload.toXML(source, context);
  
  xmlHttpRequest.open("POST", url, true);
  xmlHttpRequest.onreadystatechange = readystatehandler;
  // the request body is an XML document rather than form data
  xmlHttpRequest.setRequestHeader('Content-Type', 'text/xml');
  xmlHttpRequest.send(XMLpayload);
  
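  function readystatehandler()
  {
    // the status is only meaningful once the request has completed
    if (xmlHttpRequest.readyState != 4) {
      return;
    }
    if (xmlHttpRequest.status == 200 || xmlHttpRequest.status==304) {
      // be quiet in case of success
      // alert("success");
    } else {
      // alert error
      alert("send failure");
    }
  }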
}

// JavaScript Event (pseudo) object
function LifeCycleEvent(mmiEvType, eventType, fieldName, fieldValue)
{
  this.mmiEventType = mmiEvType;        // e.g. extension
  this.eventType = eventType;           // user initiated event, e.g. change
  this.fieldName = fieldName;           // e.g. HTML id of the field
  this.fieldValue = fieldValue;         // e.g. value of the field
}

// method of LifeCycleEvent object to generate XML string from its properties
LifeCycleEvent.prototype.toXML = function(source, context)
{
  var mmiLifeCycleEvent;
     
  mmiLifeCycleEvent  = '<mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">';
  mmiLifeCycleEvent += '  <mmi:' +  this.mmiEventType;
  mmiLifeCycleEvent += '  mmi:source="' +  source + '" mmi:context="' +  context + '">';
  mmiLifeCycleEvent += '  <mmi:data>';
  mmiLifeCycleEvent += '    <eventType>' + this.eventType + '</eventType>';
  mmiLifeCycleEvent += '    <fieldName>' + this.fieldName + '</fieldName>';
  mmiLifeCycleEvent += '    <fieldValue>' + this.fieldValue + '</fieldValue>';
  mmiLifeCycleEvent += '  </mmi:data>';
  mmiLifeCycleEvent += '  </mmi:' +  this.mmiEventType + '>';
  mmiLifeCycleEvent += '</mmi>';

  return mmiLifeCycleEvent;
}
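
The toXML() method above builds the message by string concatenation, so a field value containing characters such as < or & would yield a malformed document. One way to guard against this is to escape each value before it is inserted. The helper below is a minimal sketch; the name escapeXml is hypothetical and not part of the example application.

```javascript
// Hypothetical helper: escape the five characters that are special in XML
// so that arbitrary field values can be embedded in element or attribute content.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")    // must run first, before other entities are added
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}
```

It would be applied to each concatenated value, for example escapeXml(this.fieldValue).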

As described above, receiving events from the Interaction Manager requires sending an HTTP request to the server (i.e. the Runtime Framework). The response contains an XML-encoded MMI life-cycle event. An event indicating a change of the value of color would be represented as an MMI life-cycle event "mmi:extension" (see [MMI-ARCH]) and could look like this:

<?xml version="1.0" encoding="UTF-8"?>
<mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:extension mmi:source="" mmi:target="" mmi:context="">
    <mmi:data>
    </mmi:data>
  </mmi:extension>
</mmi>

Note that the content of the <mmi:data> element is application-specific.

This approach requires the modality component to send an asynchronous XMLHttpRequest to the Runtime Framework (expecting the request to be blocked at the server) and to interpret the response accordingly: it takes a local action based on the event semantics, sends another request to the Runtime Framework, or both.

/* This function handles all incoming MMI lifecycle events. They may be fetched 
   from the server side interaction manager using AJAX. The returned XML document
   is the MMI lifecycle event.
*/
function handleIncomingMmiEvents(xml)
{ 
  // check if incoming message is MMI lifecycle event
  // perform a very simple check:
  if(xml.match("<mmi:")) 
  {
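    // parse incoming xml string to DOM
    var parser = new DOMParser();
    var doc = parser.parseFromString(xml, "text/xml");

    // Note: the childNodes[0] lookups below assume that the message contains
    // no whitespace text nodes between elements (i.e. it is not pretty-printed).
    var element = doc.documentElement;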
    if(element.childNodes[0].nodeName=="mmi:newContextResponse")
    {
      _CONTEXT = element.childNodes[0].getAttribute("mmi:context");
    }
    else if(element.childNodes[0].nodeName=="mmi:extension")
    {
      if(element.childNodes[0].childNodes[0].nodeName=="mmi:data")
       {
        // Application specific extension
        // In this example we receive the name of a function and the params.
        // This has to be evaluated locally using eval().
        var functionname = element.childNodes[0].childNodes[0].childNodes[0].childNodes[0].nodeValue;
        var elementname = element.childNodes[0].childNodes[0].childNodes[1].childNodes[0].nodeValue;
        var elementvalue = element.childNodes[0].childNodes[0].childNodes[2].childNodes[0].nodeValue;
        eval(functionname + "(elementname ,elementvalue)");
      }
    }
    else if(element.childNodes[0].nodeName=="mmi:clearContextRequest")
    {
      // create new mmiLifeCycleEvent Object that signals the removal of the Context
      event = new LifeCycleEvent("clearContextResponse", "", "", "");
      
      // send clearContextResponse lifecycle event
      sendMmiLifecycleEvent(_SOURCE, _CONTEXT, event);
    }
    else
    {
      // unknown lifecycle event
      alert("MMI lifecycle event not handled.");
    }

    // send HTTP request to server to receive lifecycle event.
    readMmiLifecycleEvent();
  }
  else // check if contains "<mmi:"
  {
    // --> it is not a valid lifecycle event!
    alert("Error: wrong message!");
  }
}
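
The handler above addresses elements by position (childNodes[0] and so on), which only works if the message contains no whitespace between elements. If the server sends an indented document, the first child of each element is a text node. A small helper that skips non-element nodes avoids this dependency. The sketch below is an assumption for illustration: the name elementChildren is hypothetical, and it relies only on the standard DOM properties childNodes and nodeType.

```javascript
var ELEMENT_NODE = 1;  // DOM nodeType value for element nodes

// Hypothetical helper: return only the element children of a DOM node,
// ignoring whitespace text nodes and comments.
function elementChildren(node) {
  var result = [];
  for (var i = 0; i < node.childNodes.length; i++) {
    if (node.childNodes[i].nodeType === ELEMENT_NODE) {
      result.push(node.childNodes[i]);
    }
  }
  return result;
}
```

With it, elementChildren(doc.documentElement)[0] yields the life-cycle event element whether or not the message is indented.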

3.3 Voice Modality Component

The Voice Modality Component may be implemented using CCXML and VoiceXML 2.1.

VoiceXML 2.1 does not provide external eventing functionality. CCXML 1.0, however, defines an external event interface (the Basic HTTP Event I/O Processor) which makes it possible to inject external events into a running CCXML session or to start new CCXML sessions. CCXML will therefore be used as an event bridge between VoiceXML and the Interaction Manager. CCXML will receive events from the Interaction Manager and, depending on the event semantics, start a VoiceXML dialog.

VoiceXML will be used to implement the actual voice user interface (play prompt and control ASR). User input collected by VoiceXML will be returned to CCXML. CCXML has the ability to send HTTP requests to external components. This feature will be used to send events back to the multimodal runtime framework to inject events into the SCXML based Interaction Manager.

Because VoiceXML must return to CCXML (and hence exit) in order to return results (e.g. recognition results), the VoiceXML user interface has to be implemented as small independent scripts. Each script corresponds to a single action, such as playing a prompt or activating a grammar and listening for user input.

4 Initiating multimodal sessions

Now that we have described the basics of all constituents, we need to define the setup of a multimodal session.

A multimodal session may be initiated using a GUI modality component. The user starts a web browser and loads an HTML document from a given URL. Upon load, the HTML document registers the corresponding event handlers (e.g. for change events) and is then able to send messages to the Interaction Manager using AJAX (i.e. XMLHttpRequests).

The HTML document may contain a special text input field which is used to collect the user's phone number or SIP URL. Once the user has entered this information, it is sent to the Interaction Manager (e.g. when the user presses a corresponding button). The Interaction Manager generates a message to the CCXML event processor to create a new CCXML session and to initiate a phone call to the given telephone number (or SIP URL).

As soon as the telephone connection has been established successfully the multimodal session is initiated. Now the Interaction Manager is capable of controlling the two modalities by sending life-cycle events.

5 Authoring example

This section ties together the previously described components to implement a sample application. The multimodal T-Shirt example contains a combined graphical and voice user interface and allows the user to fill in a form containing two fields (color and size) either by voice or by pen/keyboard.

The following figure shows the corresponding state machine logic for this example together with the MMI life-cycle events.

State machine logic and MMI life-cycle events

5.1 T-Shirt.scxml

The state machine could be represented in SCXML source code (T-Shirt.scxml) as follows:

<?xml version="1.0" encoding="UTF-8"?>
<scxml version="1.0" profile="ecmascript" initial="getColor" >
  <!-- we assume there is a script library which constructs MMI lifecycle events etc. -->
  <script src="mmi.js"/>
  
  <!-- datamodel definition -->
  <datamodel>
    <data id="color" expr=""/>
    <data id="size" expr=""/>
    <data id="received" expr="0"/>
  </datamodel>
  
  <!-- state getColor -->
  <state id="getColor">
    <onentry>
      <script>
        mmiEvent = new mmiStartRequest();
        mmiEvent.setURL('captureColorSize.html');
      </script>
      <!-- issue startRequest to GUI -->
      <send event="mmi:startRequest" target="GUI" targetType="x-ajax" namelist="mmiEvent"/>
      <script>
        mmiEvent = new mmiStartRequest();
        mmiEvent.setURL('getColor.vxml');
      </script>
      <!-- issue startRequest to VUI -->
      <send event="mmi:startRequest" target="VUI" targetType="basichttp" namelist="mmiEvent"/>
    </onentry>
    
    <!-- handle voice input -->
    <transition event="mmi:done" cond="_event.data..@source.toString() == 'VUI' &amp;&amp; 
         _event.data..@status.toString() == 'success'" target="echoColor">
       <!-- save color to data model -->
       <assign location="_data.color" expr="_event.data..color.toString()"/>
       <!-- send event to GUI to display information -->
       <script>
         mmiEvent = new mmiExtension();
         // construct content of data element of extension event as XML string
         dataFieldValue = "&lt;eventType&gt;_check&lt;/eventType&gt;";
         dataFieldValue += "&lt;fieldName&gt;color&lt;/fieldName&gt;";
         dataFieldValue += "&lt;fieldValue&gt;" + color + "&lt;/fieldValue&gt;";
         mmiEvent.setDataField(dataFieldValue);
       </script>
      <send event="mmi:extension" target="GUI" targetType="x-ajax" namelist="mmiEvent"/>
    </transition>    
    
    <!-- handle GUI input -->
    <transition event="mmi:extension" cond="_event.data..@source.toString() == 'GUI' &amp;&amp; 
         _event.data..@status.toString() == 'success'" target="echoColor">
       <!-- save color to data model -->
       <assign location="_data.color" expr="_event.data..color.toString()"/>
    </transition>    
    
    <!-- error handling -->
    <transition event="mmi:startResponse" cond="_event.data..@status.toString() == 'error'" target="failure"/>
    <transition event="mmi:done" cond="_event.data..@status.toString() == 'error'" target="failure"/>
  </state>
  
  <!-- state echoColor -->
  <state id="echoColor">
    <onentry>
      <!-- play back color to user -->
      <script>
        mmiEvent = new mmiStartRequest();
        mmiEvent.setURL('echoColor.vxml');
        // construct content of data element of extension event as XML string
        dataFieldValue = "&lt;color&gt;" + color + "&lt;/color&gt;";
        mmiEvent.setDataField(dataFieldValue);
      </script>
      <send event="mmi:startRequest" target="VUI" targetType="basichttp" namelist="mmiEvent"/>
    </onentry>

    <!-- play prompt done -->
    <transition event="mmi:done" cond="_event.data..@source.toString() == 'VUI' &amp;&amp; 
         _event.data..@status.toString() == 'success'" target="getSize"/>
        
    <!-- error handling -->
    <transition event="mmi:startResponse" cond="_event.data..@status.toString() == 'error'" target="failure"/>
    <transition event="mmi:done" cond="_event.data..@status.toString() == 'error'" target="failure"/>
  </state>
  
  <!-- state getSize -->
  <state id="getSize">
    <onentry>
      <script>
        mmiEvent = new mmiStartRequest();
        mmiEvent.setURL('getSize.vxml');
      </script>
      <!-- issue startRequest to VUI -->
      <send event="mmi:startRequest" target="VUI" targetType="basichttp" namelist="mmiEvent"/>
    </onentry>
    
    <!-- handle voice input -->
    <transition event="mmi:done" cond="_event.data..@source.toString() == 'VUI' &amp;&amp; 
         _event.data..@status.toString() == 'success'" target="echoSize">
       <!-- save size to data model -->
       <assign location="_data.size" expr="_event.data..size.toString()"/>
       <!-- send event to GUI to display information -->
       <script>
         mmiEvent = new mmiExtension();
         // construct content of data element of extension event as XML string
         dataFieldValue = "&lt;eventType&gt;_check&lt;/eventType&gt;";
         dataFieldValue += "&lt;fieldName&gt;size&lt;/fieldName&gt;";
         dataFieldValue += "&lt;fieldValue&gt;" + size + "&lt;/fieldValue&gt;";
         mmiEvent.setDataField(dataFieldValue);
       </script>
      <send event="mmi:extension" target="GUI" targetType="x-ajax" namelist="mmiEvent"/>
    </transition>    
    
    <!-- handle GUI input -->
    <transition event="mmi:extension" cond="_event.data..@source.toString() == 'GUI' &amp;&amp; 
         _event.data..@status.toString() == 'success'" target="echoSize">
       <!-- save size to data model -->
       <assign location="_data.size" expr="_event.data..size.toString()"/>
    </transition>    
    
    <!-- error handling -->
    <transition event="mmi:startResponse" cond="_event.data..@status.toString() == 'error'" target="failure"/>
    <transition event="mmi:done" cond="_event.data..@status.toString() == 'error'" target="failure"/>  
  </state>
  
  <!-- state echoSize -->
  <state id="echoSize">
    <onentry>
      <!-- play back size to user -->
      <script>
        mmiEvent = new mmiStartRequest();
        mmiEvent.setURL('echoSize.vxml');
        // construct content of data element of extension event as XML string
        dataFieldValue = "&lt;size&gt;" + size + "&lt;/size&gt;";
        mmiEvent.setDataField(dataFieldValue);
      </script>
      <send event="mmi:startRequest" target="VUI" targetType="basichttp" namelist="mmiEvent"/>
    </onentry>

    <!-- play prompt done -->
    <transition event="mmi:done" cond="_event.data..@source.toString() == 'VUI' &amp;&amp; 
         _event.data..@status.toString() == 'success'" target="endOfInteraction"/>
        
    <!-- error handling -->
    <transition event="mmi:startResponse" cond="_event.data..@status.toString() == 'error'" target="failure"/>
    <transition event="mmi:done" cond="_event.data..@status.toString() == 'error'" target="failure"/>  
  </state>
  
  <!-- state  endOfInteraction-->
  <state id="endOfInteraction">
    <onentry>
      <!-- number of received clearContextResponse messages, we are waiting for two -->
      <assign location="received" expr="0"/>
      
      <!-- issue clearContextRequest messages -->
      <script>
        mmiEvent = new mmiClearContextRequest();
      </script>
      <!-- issue clearContextRequest to GUI -->
      <send event="mmi:clearContextRequest" target="GUI" targetType="x-ajax" namelist="mmiEvent"/>
      <script>
        mmiEvent = new mmiClearContextRequest();
      </script>
      <!-- issue clearContextRequest to VUI -->
      <send event="mmi:clearContextRequest" target="VUI" targetType="basichttp" namelist="mmiEvent"/>
    </onentry>
    <transition event="mmi:clearContextResponse" cond="received == 0">
      <!-- increase counter -->
      <assign location="received" expr="1"/>
    </transition>
    <transition event="mmi:clearContextResponse" cond="received > 0" target="end"/>
  </state>
  
  <!-- state failure -->
  <state id ="failure">
    <!-- simply stop interaction -->
    <transition target="endOfInteraction"/>
  </state>
  
  <!-- final state -->
  <state id="end" final="true"/>
</scxml>

In this example we assume that the Runtime Framework supports the values x-ajax and basichttp for the targetType attribute of the <send> tag. The GUI modality component uses AJAX to communicate with the Runtime Framework, so we use x-ajax as the targetType for events sent to it. The Voice modality component is implemented using CCXML/VoiceXML; because events are injected into the CCXML session through the external event interface of CCXML, we use the basichttp targetType for events sent to it.
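
How a Runtime Framework might route a <send> according to its targetType can be sketched as follows. This is an assumption for illustration only: createSender, queueForPolling and postOverHttp are hypothetical names, and the two transport functions are supplied by the caller rather than implemented here.

```javascript
// Hypothetical routing of <send> by targetType inside the Runtime Framework.
// queueForPolling(target, eventXml): stores the event until the GUI fetches it.
// postOverHttp(target, eventXml): sends the event as an HTTP request, e.g. to
// the external event interface of CCXML.
function createSender(queueForPolling, postOverHttp) {
  return function send(targetType, target, eventXml) {
    switch (targetType) {
      case "x-ajax":
        // browsers are HTTP clients only, so the event waits for the next poll
        queueForPolling(target, eventXml);
        return "queued";
      case "basichttp":
        // the voice side accepts incoming HTTP requests, so push directly
        postOverHttp(target, eventXml);
        return "posted";
      default:
        throw new Error("unsupported targetType: " + targetType);
    }
  };
}
```

The asymmetry reflects the transport discussion in section 3.1: events for the GUI wait for a request from the browser, whereas events for the voice component can be delivered immediately.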

5.2 captureColorSize.html

The following code fragment provides the basics of the HTML source code for the GUI modality component (i.e. captureColorSize.html):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta name="application" content="captureColorSize" />
<meta name="description" content="solicits value for size and color" />

<title>CaptureColorSize</title>

<script src="sendEvent.js" type="text/javascript"></script>
<script src="evaluateResponseXML.js" type="text/javascript"></script>

<!-- Event handlers that create and send external events -->

<script language="JavaScript" type="text/javascript">
   
  var _SOURCE="GUI";
  var _CONTEXT="";
     
  function onloadHandler()
  {
    // create new mmiLifeCycleEvent Object that requests a new Context
    event = new LifeCycleEvent("newContextRequest", "", "", "");
    
    // send newContextRequest lifecycle event
    sendMmiLifecycleEvent(_SOURCE, _CONTEXT, event);
    
    // send HTTP request to server to receive lifecycle event.
    readMmiLifecycleEvent();
  }
  
  /* HTML event handler. The function makes use of the browser event object,
    which identifies the HTML element that triggered the event and thereby its
    id and value (in this case the color or size value).
  */
  function eventHandler(event)
  {  
    var target = new Object();
    if(event.target)
    {    
      target = event.target;
    } 
    // Internet Explorer has no attribute target
    else if(event.srcElement)  
    {
      target = event.srcElement;
    }
    
    // use a separate variable instead of overwriting the browser event
    var lifeCycleEvent = new LifeCycleEvent("extension", event.type, target.id, target.value);
    
    sendMmiLifecycleEvent(_SOURCE, _CONTEXT, lifeCycleEvent);
  }
    
  /* initiate AJAX request to the interaction manager to read the next MMI lifecycle
     event. The returned event is handled asynchronously within handleIncomingMmiEvents().
     Finally the next MMI lifecycle event is fetched.
  */
  function readMmiLifecycleEvent()
  { 
    // Start asynchronous XMLHttpRequest to receive MMI lifecycle event. 
    // We assume that the IM always returns a lifecycle event, i.e.
    // we do not handle special timeout events. Once the request returns, 
    // handleIncomingMmiEvents() will be called to evaluate the xml encoded event.
  
    var xmlHttpRequest = new XMLHttpRequest();
  
    // relative url, assuming that AJAX requests go to a url 
    // relative to the documents url
    var url ="./getMMILifeCycleEvent";
  
    xmlHttpRequest.open("GET", url, true);
    xmlHttpRequest.onreadystatechange = readyhandler;
    xmlHttpRequest.send(null);
  
    function readyhandler()
    {
      if (xmlHttpRequest.readyState == 4) {
        if (xmlHttpRequest.status == 200) {
          //handle lifecycle event
          handleIncomingMmiEvents(xmlHttpRequest.responseText);
        } else {
          // alert error
          alert("readMmiLifecycleEvent failure");
        }
      }
    }     
  }
    
  // function to check elements with the given html id, e.g. radio buttons
  function _check(elementname, elementvalue)
  {
     document.getElementById(elementvalue).checked=true;
  }     
</script>
</head>

<body id="bodyId" onload="onloadHandler();">

<form action="" name="Color" id="Color">T-shirt color:
<table width="200">
  <tr>
    <td>
      <label> <input type="radio" id="red"
      name="radioGroup1" value="Red" onclick="eventHandler(event);" /> 
      Red</label>
    </td>
  </tr>
  <tr>
    <td>
      <label> <input type="radio" id="green"
      name="radioGroup1" value="Green" onclick="eventHandler(event);" />
      Green</label>
    </td>
  </tr>
  <tr>
    <td>
      <label> <input type="radio" id="blue"
      name="radioGroup1" value="Blue" onclick="eventHandler(event);" />
      Blue</label>
    </td>
  </tr>
</table>
</form>

<form action="" name="Size" id="Size">T-shirt size:
<table width="200">
  <tr>
    <td><label> <input type="radio" id="small"
      name="radioGroup2" value="Small" onclick="eventHandler(event);" />
    Small</label></td>
  </tr>
  <tr>
    <td><label> <input type="radio" id="medium"
      name="radioGroup2" value="Medium" onclick="eventHandler(event);" />
    Medium</label></td>
  </tr>
  <tr>
    <td><label> <input type="radio" id="large"
      name="radioGroup2" value="Large" onclick="eventHandler(event);" />
    Large</label></td>
  </tr>
</table>
</form>

</body>
</html>

sendEvent.js:

/* The sendMmiLifecycleEvent() function sends the MMI lifecycle
   event, potentially containing data values like color. The implementation of
   this function is vendor specific. The function is called to send a life cycle 
   event to the Runtime Framework using AJAX. The parameter "payload" contains a life 
   cycle event object.
*/ 
function sendMmiLifecycleEvent(source, context, payload) 
{
  var xmlHttpRequest = new XMLHttpRequest();
  
  // relative URL, assuming that AJAX requests go to a URL 
  // relative to the document's URL
  var url = "./someURL";

  var XMLpayload = payload.toXML(source, context);
  
  xmlHttpRequest.open("POST", url, true);
  xmlHttpRequest.onreadystatechange = readystatehandler;
  // the request body is an XML document, so declare it as such
  xmlHttpRequest.setRequestHeader('Content-Type', 'text/xml');
  xmlHttpRequest.send(XMLpayload);
  
  function readystatehandler()
  {
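    // only evaluate the status once the request has completed
    if (xmlHttpRequest.readyState != 4) {
      return;
    }
    if (xmlHttpRequest.status == 200 || xmlHttpRequest.status == 304) {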
      // be quiet in case of success
      // alert("success");
    } else {
      // alert error
      alert("send failure");
    }
  }
}

// JavaScript Event (pseudo) object
function LifeCycleEvent(mmiEvType, eventType, fieldName, fieldValue)
{
  this.mmiEventType = mmiEvType;        // e.g. extension
  this.eventType = eventType;           // user initiated event, e.g. change
  this.fieldName = fieldName;           // e.g. HTML id of the field
  this.fieldValue = fieldValue;         // e.g. value of the field
}

// method of LifeCycleEvent object to generate XML string from its properties
LifeCycleEvent.prototype.toXML = function(source, context)
{
  var mmiLifeCycleEvent;
     
  mmiLifeCycleEvent  = '<mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">';
  mmiLifeCycleEvent += '  <mmi:' +  this.mmiEventType;
  mmiLifeCycleEvent += ' mmi:source="' +  source + '" mmi:context="' +  context + '">';
  mmiLifeCycleEvent += '  <mmi:data>';
  mmiLifeCycleEvent += '    <eventType>' + this.eventType + '</eventType>';
  mmiLifeCycleEvent += '    <fieldName>' + this.fieldName + '</fieldName>';
  mmiLifeCycleEvent += '    <fieldValue>' + this.fieldValue + '</fieldValue>';
  mmiLifeCycleEvent += '  </mmi:data>';
  mmiLifeCycleEvent += '  </mmi:' +  this.mmiEventType + '>';
  mmiLifeCycleEvent += '</mmi>';

  return mmiLifeCycleEvent;
}
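
The toXML() method above concatenates the field values into the markup unchanged. If a value could contain characters such as < or &, the resulting string would no longer be well-formed XML. The following is a minimal sketch of an escaping helper that could be applied to each value before concatenation. The name escapeXml is hypothetical and not part of the original sample.

```javascript
// Hypothetical helper (not part of the original sample): escapes the five
// characters that are special in XML so that a value can be placed safely
// inside element content or a quoted attribute value.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')   // must run first so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Example: building one child of <mmi:data> from an arbitrary value.
var fieldValue = 'Red & "Blue"';
var fragment = '<fieldValue>' + escapeXml(fieldValue) + '</fieldValue>';
```

The same helper could be applied to the source and context arguments, which end up inside attribute values.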

evaluateResponseXML.js:

/* This function handles all incoming MMI lifecycle events. They may be fetched 
   from the server side interaction manager using AJAX. The returned XML document
   is the MMI lifecycle event.
*/
function handleIncomingMmiEvents(xml)
{ 
  // check if incoming message is MMI lifecycle event
  // perform a very simple check:
  if(xml.match("<mmi:")) 
  {
    // parse incoming xml string to DOM
    var parser = new DOMParser();
    var doc = parser.parseFromString(xml, "text/xml");
    
    var element = doc.documentElement;
    if(element.childNodes[0].nodeName=="mmi:newContextResponse")
    {
      _CONTEXT = element.childNodes[0].getAttribute("mmi:context");
    }
    else if(element.childNodes[0].nodeName=="mmi:extension")
    {
      if(element.childNodes[0].childNodes[0].nodeName=="mmi:data")
       {
        // Application specific extension
        // In this example we receive the name of a function and the params.
        // This has to be evaluated locally using eval().
        var functionname = element.childNodes[0].childNodes[0].childNodes[0].childNodes[0].nodeValue;
        var elementname = element.childNodes[0].childNodes[0].childNodes[1].childNodes[0].nodeValue;
        var elementvalue = element.childNodes[0].childNodes[0].childNodes[2].childNodes[0].nodeValue;
        eval(functionname + "(elementname ,elementvalue)");
      }
    }
    else if(element.childNodes[0].nodeName=="mmi:clearContextRequest")
    {
      // create new mmiLifeCycleEvent Object that signals the removal of the Context
      var event = new LifeCycleEvent("clearContextResponse", "", "", "");
      
      // send clearContextResponse lifecycle event
      sendMmiLifecycleEvent(_SOURCE, _CONTEXT, event);
    }
    else
    {
      // unknown lifecycle event
      alert("MMI lifecycle event not handled.");
    }

    // send HTTP request to server to receive lifecycle event.
    readMmiLifecycleEvent();
  }
  else // check if contains "<mmi:"
  {
    // --> it is not a valid lifecycle event!
    alert("Error: wrong message!");
  }
}
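
handleIncomingMmiEvents() addresses elements by position (childNodes[0]). A DOM parser usually keeps the whitespace between elements as text nodes, so in an indented life-cycle event the node at index 0 may be a text node rather than the expected element. The following is a minimal sketch of a helper that skips such nodes. The name firstElementOf is hypothetical, and plain objects stand in for DOM nodes so that the sketch runs without a browser.

```javascript
// Hypothetical helper (not part of the original sample): returns the first
// child of `node` that is an element, skipping whitespace text nodes and
// comments, or null if there is none.
function firstElementOf(node) {
  var ELEMENT_NODE = 1; // numeric value of Node.ELEMENT_NODE in the DOM
  for (var i = 0; i < node.childNodes.length; i++) {
    if (node.childNodes[i].nodeType === ELEMENT_NODE) {
      return node.childNodes[i];
    }
  }
  return null;
}

// Plain objects standing in for DOM nodes.
var root = {
  childNodes: [
    { nodeType: 3, nodeName: '#text' },          // indentation whitespace
    { nodeType: 1, nodeName: 'mmi:extension' }   // the life-cycle event element
  ]
};
var eventElement = firstElementOf(root);
```

In the handler, firstElementOf(doc.documentElement) would then take the place of element.childNodes[0].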

The ECMAScript function _check(elementname, elementvalue) within captureColorSize.html is provided to check a radio button. To achieve this, the Interaction Manager sends an mmi:extension life-cycle event in which the (application specific) eventType element within the <mmi:data> element is set to _check. The fieldValue element contains the HTML id of the corresponding object. The _check(...) function therefore simply uses the DOM API to activate the radio button. The following example shows the MMI life-cycle event that activates the radio button for the color green.

<?xml version="1.0" encoding="UTF-8"?>
<mmi version="1.0" xmlns:mmi="http://www.w3.org/2008/04/mmi-arch">
  <mmi:extension mmi:source="captureColorSize.html" mmi:context="">
    <mmi:data>
      <eventType>_check</eventType>
      <fieldName>color</fieldName>
      <fieldValue>green</fieldValue>
    </mmi:data>
  </mmi:extension>
</mmi>

This event is created within the SCXML script. See the getColor state of the SCXML sample code (5.1 T-Shirt.scxml).
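
The sample handler passes the received function name to eval(), which lets the sender trigger any function that is reachable in the page. A more restrictive alternative is to look the name up in a table of permitted handlers. The following is a minimal sketch of that idea. The names extensionHandlers and dispatchExtension are hypothetical, and the _check handler returns a string instead of touching the DOM so that the sketch runs without a browser.

```javascript
// Hypothetical dispatch table (not part of the original sample): only the
// functions listed here can be triggered by an incoming mmi:extension event.
var extensionHandlers = {
  _check: function (elementname, elementvalue) {
    // In the browser this would call
    // document.getElementById(elementvalue).checked = true;
    return 'checked ' + elementvalue;
  }
};

// Looks up the handler named by eventType and calls it with the two field
// parameters. Returns the handler's result, or null for an unknown name.
function dispatchExtension(handlers, eventType, fieldName, fieldValue) {
  if (Object.prototype.hasOwnProperty.call(handlers, eventType)) {
    return handlers[eventType](fieldName, fieldValue);
  }
  return null;
}

// The event shown above: eventType "_check", fieldName "color", fieldValue "green".
var result = dispatchExtension(extensionHandlers, '_check', 'color', 'green');
// A name that is not in the table is ignored.
var ignored = dispatchExtension(extensionHandlers, 'alert', 'x', 'y');
```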

5.3 dispatcher.ccxml

CCXML is used as a dispatcher of events between SCXML and VoiceXML. The script ccxml_events.js contains a collection of support functions to create MMI life-cycle events or to start VoiceXML dialogs.

Note that this script is written to be application independent.

<?xml version="1.0" encoding="UTF-8" ?>
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">

  <!-- we assume there is a library of functions to send data to 
   the Interaction Manager -->
  <script src="ccxml_events.js" />

  <!-- CCXML session ID -->
  <var name="connectionId" expr="''" />

  <!-- SCXML session ID -->
  <var name="interactionId" expr="''" />
  <!-- request ID of lifecycle event -->
  <var name="requestID" expr="'123456'" />
  <!-- target type -->
  <var name="SCXML" expr="'basichttp'" />

  <!-- VXML dialog ID for termination -->
  <var name="vxml_dialogid" expr="0" />
  <!-- whether a VXML dialog is running or not -->
  <var name="vxml_running" expr="false" />
  <!-- whether a VXML dialog is currently being terminated -->
  <var name="vxml_terminating" expr="false" />

  <var name="prompt" expr="''" />
  <var name="audio" expr="''" />
  <var name="grammarUri" expr="''" />
  <var name="fields" expr="''" />
        
        
  <!-- Note: All events which are tagged with "(INTERNAL)" are standard events! -->
  <eventprocessor>
        
    <!-- ===================================================== -->
    <!-- SCXML events -->
    <!-- ===================================================== -->
                
    <!-- CCXML (INTERNAL): when CCXML is started, it throws this internal event -->
    <transition event="ccxml.loaded">
      <script>
        _ccxml.setSCXML_URI(session.values.scxml_serverip,
            session.values.scxml_serverport,
            session.values.scxml_serverpage);
      </script>
      <assign name="connectionId" expr="event$.connectionid" />
      <assign name="interactionId" expr="session.values.interactionid" />
                        
      <!-- call SIP phone -->
      <var name="sipip" expr="session.values.sip_phoneprefix + '@' + session.values.sip_phoneip + 
           ':' + session.values.sip_phoneport" />
      <createcall dest="sipip" connectionid="connectionId" />
    </transition>
                
    <!-- CCXML: terminate -->
    <transition event="ccxml.terminate">
      <send target="_ccxml.clearContextResponse(interactionId, requestID)"
          targettype="SCXML" name="'ccxml.external'" />
      <send target="session.id" targettype="'ccxml'" name="'this.exit'" />
    </transition>
    <transition event="this.exit">
      <log expr="'CCXML.exit'" />
      <exit />
    </transition>
                
    <!-- SIP: disconnect SIP phone -->
    <transition event="sip.disconnect">
      <disconnect connectionid="connectionId" />
    </transition>
                
    <!-- VXML: start -->
    <transition event="vxml.start">
      <assign name="prompt" expr="event$.prompt" />
      <assign name="audio" expr="event$.audio" />
      <assign name="grammarUri" expr="event$.grammarUri" />
      <assign name="fields" expr="event$.fields" />
      
      <!-- If no VXML dialog is running, start one. Otherwise terminate the running
        dialog first; the new dialog is then started when dialog.exit arrives -->
      <if cond="vxml_running == false">
        <assign name="vxml_running" expr="true" />
        <var name="sessionid" expr="event$.sessionid" />
        <dialogstart src="_vxml.start(grammarUri, prompt, audio, fields)"
            dialogid="vxml_dialogid" connectionid="connectionId" namelist="sessionid" />
      <else />
        <assign name="vxml_terminating" expr="true" />
        <dialogterminate dialogid="vxml_dialogid" immediate="true" />
      </if>
    </transition>
    
    <!-- VXML: terminate -->
    <transition event="vxml.terminate">
      <var name="immediate" expr="event$.immediate" />
      <dialogterminate dialogid="vxml_dialogid" immediate="immediate" />
    </transition>
    
    
    <!-- ===================================================== -->
    <!-- SIP events -->
    <!-- ===================================================== -->
    
    <!-- SIP (INTERNAL): connection to phone completed -->
    <transition event="connection.connected">
      <send target="_ccxml.createResponse(interactionId, requestID, session.id)"
          targettype="SCXML" name="'ccxml.external'" />
    </transition>
    
    <!-- SIP (INTERNAL): disconnected -->
    <transition event="connection.disconnected">
      <send target="_ccxml.clearContextRequest(interactionId)"
          targettype="SCXML" name="'ccxml.external'" />
      <send target="session.id" targettype="'ccxml'" name="'this.exit'" />
    </transition>
    
    <!-- SIP (INTERNAL): reject call from SIP phone -->
    <transition event="connection.alerting">
      <reject />
    </transition>
    
    
    <!-- ===================================================== -->
    <!-- VXML events -->
    <!-- ===================================================== -->
    
    <!-- VXML (INTERNAL): when VXML is started, it throws this internal event -->
    <transition event="dialog.started">
      <send target="_vxml.startResponse(interactionId, requestID, vxml_dialogid)"
          targettype="SCXML" name="'ccxml.external'" />
    </transition>
    
    <!-- VXML (INTERNAL): an exit in VXML throws this internal event
      if it was not just a prompt, get EMMA from VXML and send response to SCXML -->
    <transition event="dialog.exit">
      <assign name="vxml_running" expr="false" />
      <!-- if the dialog was terminated because a new dialog was requested while it was still running,
        start the new dialog now (the _vxml.start() parameters were saved in global variables) -->
      <if cond="vxml_terminating == false">
        <send target="_vxml.doneNotification(interactionId, event$.values.emma, vxml_dialogid)"
            targettype="SCXML" name="'ccxml.external'" />
      <else />
        <send target="_vxml.doneNotification(interactionId, '', vxml_dialogid)"
            targettype="SCXML" name="'ccxml.external'" />
        <assign name="vxml_running" expr="true" />
        <var name="sessionid" expr="event$.sessionid" />
        <dialogstart src="_vxml.start(grammarUri, prompt, audio, fields)"
            dialogid="vxml_dialogid" connectionid="connectionId" namelist="sessionid" />
        <assign name="vxml_terminating" expr="false" />
      </if>
    </transition>
    
    
    <!-- ===================================================== -->
    <!-- error events -->
    <!-- ===================================================== -->
    
    <!-- all errors (INTERNAL) -->
    <transition event="error.*">
      <log expr="'CCXML error'" />
      <!--
      <send target="_vxml.sendEvent(interactionId, _vxml.ERROR)"
          targettype="SCXML" name="'ccxml.external'" />
      -->
    </transition>
    
  </eventprocessor>
  
</ccxml>

5.4 captureColor.vxml

As mentioned in 3.3 Voice Modality Component, VoiceXML must return to CCXML (and hence exit the VoiceXML dialog) to return results (e.g. recognition results). The VoiceXML user interface therefore has to be implemented as small independent scripts. Each script corresponds to a single action, such as playing a prompt or activating a grammar and listening for user input.

The following code sample shows what the captureColor.vxml document could look like. The script vxml_emma.js, which is referenced in the VoiceXML document, contains a collection of auxiliary ECMAScript functions to create an [EMMA] representation of the user input. See Appendix C of [EMMA] for more information on how to map a recognition result into an EMMA representation.

Note that the other VoiceXML documents are very similar and therefore not shown here.

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <!-- We assume that there is a script library to convert the 
       recognition result into an EMMA string 
       (see http://www.w3.org/TR/emma) -->
  <script src="vxml_emma.js" />
  <form>
    <field name="color">
       <prompt>Which color?</prompt>
       <option>red</option>
       <option>blue</option>
       <option>green</option>
       <filled>
         <!-- generate EMMA string from recognition result -->
         <var name="emma" expr="createEmma(application.lastresult$)"/>
         <!-- exit back to CCXML and return the recognized result -->
         <exit namelist="emma"/>
       </filled>
    </field>
    <catch event="help nomatch noinput">
       Your options are <enumerate/>
    </catch>
  </form>
</vxml>
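
The contents of vxml_emma.js are not shown in this document. The following is a minimal sketch of what a createEmma() function might look like, assuming that the first recognition result is sufficient and that the interpretation is a simple string. It reads the utterance, confidence, inputmode and interpretation properties that VoiceXML defines for application.lastresult$ and writes them into a single emma:interpretation element. The mapping chosen here is an assumption for illustration; Appendix C of [EMMA] describes the recommended mapping.

```javascript
// Hypothetical sketch of createEmma() (the real vxml_emma.js is not shown in
// this document). `lastresult` is expected to expose the VoiceXML
// application.lastresult$ properties: utterance, confidence, inputmode and
// interpretation.
function createEmma(lastresult) {
  // Escape characters that are special in XML content and attribute values.
  function esc(value) {
    return String(value)
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;');
  }

  return '<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">' +
         '<emma:interpretation id="int1"' +
         ' emma:confidence="' + esc(lastresult.confidence) + '"' +
         ' emma:mode="' + esc(lastresult.inputmode) + '"' +
         ' emma:tokens="' + esc(lastresult.utterance) + '">' +
         '<emma:literal>' + esc(lastresult.interpretation) + '</emma:literal>' +
         '</emma:interpretation>' +
         '</emma:emma>';
}

// Stand-in for application.lastresult$ after the caller said "green".
var emma = createEmma({
  utterance: 'green',
  confidence: 0.8,
  inputmode: 'voice',
  interpretation: 'green'
});
```

The resulting string is what the VoiceXML document above would hand back to CCXML through the namelist of the exit element.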

A Acknowledgments

The editor wishes to thank Jim Larson (Intervoice) from the Voice Browser Working Group for his contributions to the writing of this document and for many helpful comments.

B References

Apache Tomcat
Apache Tomcat servlet container, The Apache Software Foundation, 2008.
Apache Commons SCXML
A Java SCXML engine, The Apache Software Foundation, 2008.
CCXML
"Voice Browser Call Control: CCXML Version 1.0 (Working Draft)", R.J. Auburn, editor. CCXML, or the Call Control eXtensible Markup Language, is designed to provide telephony call control support for dialog systems, such as VoiceXML. World Wide Web Consortium, 2007.
COMET
Comet Ajax server-push, Wikipedia.
EMMA
"Extensible MultiModal Annotation markup language (EMMA)", Michael Johnston et al. editors. EMMA is an XML format for annotating application specific interpretations of user input with information such as confidence scores, time stamps, input modality and alternative recognition hypotheses. World Wide Web Consortium, 2007.
Galaxy
"Galaxy Communicator". Galaxy Communicator is an open source hub and spoke architecture for constructing dialogue systems that was developed with funding from the Defense Advanced Research Projects Agency (DARPA) of the United States Government.
MMI-ARCH
"Multimodal Architecture and Interfaces (Working Draft)", Jim Barnett et al. editors. This specification describes a loosely coupled architecture for multimodal user interfaces, which allows for co-resident and distributed implementations, and focuses on the role of markup and scripting, and the use of well defined interfaces between its constituents. World Wide Web Consortium, 2008.
MMIF
"W3C Multimodal Interaction Framework", James A. Larson, T.V. Raman and Dave Raggett, editors. World Wide Web Consortium, 2003.
MMIUse
"W3C Multimodal Interaction Use Cases", Emily Candell and Dave Raggett, editors, World Wide Web Consortium, 2002.
SCXML
"State Chart XML (SCXML): State Machine Notation for Control Abstraction (Working Draft)", Jim Barnett et al. editors. SCXML, or the "State Chart extensible Markup Language", provides a generic state-machine based execution environment based on CCXML and Harel State Tables. World Wide Web Consortium, 2008.
VoiceXML
"Voice Extensible Markup Language (VoiceXML) Version 2.1", Matt Oshry et al. editors. VoiceXML, the Voice Extensible Markup Language, is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations. Its major goal is to bring the advantages of Web-based development and content delivery to interactive voice response applications. World Wide Web Consortium, 2007.
XHTML
"XHTML 1.0 The Extensible HyperText Markup Language (Second Edition)", Steven Pemberton et al. editors. World Wide Web Consortium, 2004.
XMLHttpRequest
"The XMLHttpRequest Object (Working Draft)", Anne van Kesteren, editor. World Wide Web Consortium, 2008.