Multimodal Interaction Activity

Max Froumentin

This talk explains what multimodal interaction is, how the W3C is standardising it, and what has happened since last year. Quite a lot has happened, since the work had only just started then.

Intro: what is it?
The W3C MMI Framework
Ongoing Work

Intro::What's MMI?

Multimedia for output
??? for input
Multimodal for output and input
PDAs, mobile phones, car nav systems
Web access is expected to be the main application, so it makes sense for the W3C to standardise it.

Pictures: T68i phone, P800 phone, Nokia phone, Sony PDA

Intro::Scenarios

Described in Use Cases document.

Examples:

alternate modalities: driving directions
simultaneous modalities: travel reservation

a very complex problem...

... that's why there is quite a big working group...

Intro::The MMI Working Group

43 Companies
79 Participants
4 Subgroups
7 documents and counting: Requirements, Use Cases, Framework, Ink, EMMA, etc.
Many dependencies on other groups

The MMI Framework

MMI Framework

Framework doesn't necessarily map to hardware or devices
Reuse of existing markup: XHTML, CSS, SVG for output, XForms for input

Framework::Input

input framework

Framework::Output

output framework

Ongoing Work

Last year, the WG had just started and concentrated on the framework. Now more specific areas are being worked on:

Object Model
I/O components
New markup: EMMA, InkXML
Interaction manager
...

Ongoing Work::The Object Model

One level down, the interfaces between components need to be specified. The whole framework can be seen either as a distributed DOM or as a set of components that pass messages to each other in the form of markup.

Distributed DOM

Ongoing Work::The Object Model (cont'd)

Or a message-passing infrastructure

Message Passing
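
The message-passing view can be sketched in a few lines of Python. This is only an illustration of the idea, not anything defined by the WG: the `MessageBus` class, the `input` element and its `mode` attribute are all hypothetical names chosen for the example. Components register interest in a kind of message, and each message is a small piece of markup.

```python
import xml.etree.ElementTree as ET


class MessageBus:
    """Hypothetical dispatcher: routes markup messages by root element name."""

    def __init__(self):
        self._handlers = {}

    def subscribe(self, tag, handler):
        # Register a component callback for messages whose root element is `tag`.
        self._handlers.setdefault(tag, []).append(handler)

    def publish(self, markup):
        # Parse the markup and hand the element to every subscribed component.
        element = ET.fromstring(markup)
        handlers = self._handlers.get(element.tag, [])
        for handler in handlers:
            handler(element)
        return len(handlers)


# A stand-in "interaction manager" that records what it receives.
received = []
bus = MessageBus()
bus.subscribe(
    "input",
    lambda el: received.append((el.get("mode"), el.findtext("destination"))),
)

# A stand-in "voice component" sends an (invented) input message.
delivered = bus.publish(
    '<input mode="voice"><destination>Denver</destination></input>'
)
```

In the distributed-DOM view the same information would instead be exposed as nodes and events on a shared document tree; the sketch above only covers the message-passing alternative.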

Ongoing Work::I/O Components

VIO - Requirements

Defines an interface for voice input and output
Input: speech recognition, DTMF
Output: speech synthesis

Ink object, GUI object, etc.
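
As a rough illustration of what "an interface for voice input and output" could look like, here is a minimal Python sketch. The class and method names (`VoiceIO`, `recognize_speech`, `read_dtmf`, `synthesize`) are assumptions made for this example and do not come from the VIO requirements; the `FakeVoiceIO` class is a stand-in with no real speech engine behind it.

```python
from abc import ABC, abstractmethod


class VoiceIO(ABC):
    """Hypothetical voice component: two input paths, one output path."""

    @abstractmethod
    def recognize_speech(self, audio: bytes) -> str:
        """Input: turn captured audio into text."""

    @abstractmethod
    def read_dtmf(self, tones: str) -> str:
        """Input: return the keypad symbols found in a tone sequence."""

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Output: turn text into audio."""


class FakeVoiceIO(VoiceIO):
    """Toy implementation that only shows the shape of the interface."""

    def recognize_speech(self, audio: bytes) -> str:
        # Pretend the audio is already text encoded as UTF-8.
        return audio.decode("utf-8")

    def read_dtmf(self, tones: str) -> str:
        # Keep only characters that are valid DTMF symbols.
        return "".join(t for t in tones if t in "0123456789*#ABCD")

    def synthesize(self, text: str) -> bytes:
        # Pretend UTF-8 bytes are the synthesized audio.
        return text.encode("utf-8")
```

An Ink object or GUI object would presumably follow the same pattern, with its own input and output methods.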

Ongoing Work::Markup

Output: reuse existing markup
Input: new markup needed...

Ongoing Work::Markup::Ink

Markup for pen-based input devices

<traceFormat>
  <regularChannels>
     <channel name="X" type="decimal"/>
     <channel name="Y" type="decimal"/>
  </regularChannels>
  <intermittentChannels>
     <channel name="S1" type="boolean" default="F"/>
     <channel name="S2" type="boolean" default="F"/>
  </intermittentChannels>
</traceFormat>

<trace id="4525BCD">
1125 18432'23'43"7"-8 3-5+7  -3+6+2+6 8+3+6:T;+2+4:*T;+3+6+3-6:FF;
</trace>
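
To show how a consumer might read the channel declarations, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. It assumes the `traceFormat` fragment from this slide, with each `channel` written as an empty element so that the fragment is well-formed; the `list_channels` helper is a name invented for the example. It does not attempt to decode the trace data itself.

```python
import xml.etree.ElementTree as ET

# The traceFormat fragment from the slide (channels written as empty elements).
TRACE_FORMAT = """<traceFormat>
  <regularChannels>
     <channel name="X" type="decimal"/>
     <channel name="Y" type="decimal"/>
  </regularChannels>
  <intermittentChannels>
     <channel name="S1" type="boolean" default="F"/>
     <channel name="S2" type="boolean" default="F"/>
  </intermittentChannels>
</traceFormat>"""


def list_channels(xml_text):
    """Return one dict per declared channel, in document order."""
    root = ET.fromstring(xml_text)
    channels = []
    for group in root:  # regularChannels, intermittentChannels
        for ch in group.findall("channel"):
            channels.append(
                {
                    "group": group.tag,
                    "name": ch.get("name"),
                    "type": ch.get("type"),
                    "default": ch.get("default"),  # None when not declared
                }
            )
    return channels


channels = list_channels(TRACE_FORMAT)
```

Each trace would then be interpreted against this list: regular channels are reported for every sample, intermittent ones only when their value changes.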

Ongoing Work::Markup::EMMA

Markup for input annotations

An input device produces one or more candidate results, each filling in a template (XForms)
Intermediaries then annotate these results (device info, confidence scores)

<emma:emma emma:version="1.0"
 xmlns:emma="http://www.w3.org/2003/04/emma#"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> 
  <emma:one-of rdf:ID="r1">
    <emma:interpretation rdf:ID="int1">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
    <emma:interpretation rdf:ID="int2">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>

  <rdf:RDF>
    <!-- time stamps for date in first interpretation -->
    <rdf:Description rdf:about="#xpointer(id('int1')/date)" 
      emma:start="2003-03-26T0:00:00.15"
      emma:end="2003-03-26T0:00:00.2"/>
  </rdf:RDF>
</emma:emma>
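
As an illustration of how a downstream component might consume such a document, the following Python sketch pulls the alternative interpretations out of the `emma:one-of` container with the standard library's `xml.etree.ElementTree`. The `read_interpretations` helper is a name made up for this example, and the RDF annotation block is left out of the embedded document for brevity.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The interpretations from the slide, without the rdf:RDF annotations.
EMMA_DOC = """<emma:emma emma:version="1.0"
 xmlns:emma="http://www.w3.org/2003/04/emma#"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <emma:one-of rdf:ID="r1">
    <emma:interpretation rdf:ID="int1">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
    <emma:interpretation rdf:ID="int2">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>"""


def read_interpretations(xml_text):
    """Return (id, fields) pairs, one per emma:interpretation element."""
    root = ET.fromstring(xml_text)
    results = []
    for interp in root.iter(f"{{{EMMA_NS}}}interpretation"):
        # Application data (origin, destination, date) is not namespaced here.
        fields = {child.tag: child.text for child in interp}
        results.append((interp.get(f"{{{RDF_NS}}}ID"), fields))
    return results


interpretations = read_interpretations(EMMA_DOC)
```

Here the recogniser could not decide between "Boston" and "Austin", so both candidates are passed on and a later component, such as the interaction manager, can choose between them or ask the user.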

Conclusion

The MMI framework is well advanced.
Parts are beginning to be defined formally in specs.
Many "holes" left: dynamic configuration, multi-user, multi-device.
Dependencies: HTML/SVG/XForms/CSS, DI (Device Independence), RDF

Conclusion::More Info

WG Page: http://www.w3.org/2002/mmi
Staff Contact: Max Froumentin (mailto:mf@w3.org)
Activity Lead: Dave Raggett (mailto:dave@w3.org)