Multimodal Interaction Activity
Max Froumentin
This talk explains what multimodal interaction is, how the W3C is
standardising it, and what has happened since last year. Quite a lot
has happened, since the Working Group had only just started at that
point.
Contents
- Intro: what is it?
- The W3C MMI Framework
- Ongoing Work
Intro::What's MMI?
- Multimedia for output
- ??? for input
- Multimodal for output and input
- PDAs, mobile phones, car nav systems
- Web access is expected to be the main application, so it makes sense for the W3C to standardise it.
Intro::Scenarios
Described in Use Cases document.
Examples:
- alternate modalities: driving directions
- simultaneous modalities: travel reservation
a very complex problem...
... that's why there is quite a big working group...
Intro::The MMI Working Group
- 43 Companies
- 79 Participants
- 4 Subgroups
- 7 documents and counting: Requirements, Use Cases, Framework, Ink, EMMA, etc.
- Many dependencies on other groups
The MMI Framework
- Framework doesn't necessarily map to hardware or devices
- Reuse of existing markup: XHTML, CSS, SVG for output, XForms for input
Framework::Input
Framework::Output
Ongoing Work
Last year, the WG had just started and concentrated on the framework.
Work now focuses on more specific areas:
- Object Model
- I/O components
- New markup: EMMA, InkXML
- Interaction manager
- ...
Ongoing Work::The Object Model
Going down one level, it is necessary to specify the
interfaces between components. The whole framework can be seen as a
distributed DOM, or as a set of components passing messages
to one another in the form of markup.
Distributed DOM
Ongoing Work::The Object Model (cont'd)
Or message passing infrastructure
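The message-passing view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `InteractionManager` class, its `register` and `dispatch` methods, and the `<message>` element with its `target` attribute are hypothetical names chosen for the example, not part of any MMI specification.

```python
import xml.etree.ElementTree as ET


class InteractionManager:
    """Hypothetical hub that routes markup messages between components."""

    def __init__(self):
        self.components = {}  # component name -> handler function

    def register(self, name, handler):
        self.components[name] = handler

    def dispatch(self, markup):
        # Messages are plain XML strings; the "target" attribute is an
        # assumption made for this sketch.
        message = ET.fromstring(markup)
        handler = self.components[message.get("target")]
        return handler(message)


def voice_output(message):
    # Stand-in for a speech synthesis component.
    return "speaking: " + message.findtext("text")


manager = InteractionManager()
manager.register("voice-output", voice_output)
reply = manager.dispatch(
    '<message target="voice-output"><text>Turn left</text></message>'
)
print(reply)
```

Because the components only exchange markup, a handler could just as well live in another process or on another device. That is the property the distributed view relies on.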
Ongoing Work::I/O Components
VIO - Requirements
- Defines an interface for voice input and output
- Input: speech recognition, DTMF
- Output: speech synthesis
Ink object, GUI object, etc.
Ongoing Work::Markup
- Output: reuse existing markup
- Input: new markup needed...
Ongoing Work::Markup::Ink
Markup for pen-based input devices
<traceFormat>
<regularChannels>
<channel name="X" type="decimal"/>
<channel name="Y" type="decimal"/>
</regularChannels>
<intermittentChannels>
<channel name="S1" type="boolean" default="F"/>
<channel name="S2" type="boolean" default="F"/>
</intermittentChannels>
</traceFormat>
<trace id = "4525BCD">
1125 18432'23'43"7"-8 3-5+7 -3+6+2+6 8+3+6:T;+2+4:*T;+3+6+3-6:FF;
</trace>
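As an illustration of how a consumer might read the trace format above, the following Python sketch lists the declared channels using the standard library XML parser. It assumes the `<channel>` elements are written as empty elements (`/>`) so that the fragment is well-formed. The function name `list_channels` is made up for the example, and decoding the trace data itself is deliberately left out.

```python
import xml.etree.ElementTree as ET

TRACE_FORMAT = """
<traceFormat>
  <regularChannels>
    <channel name="X" type="decimal"/>
    <channel name="Y" type="decimal"/>
  </regularChannels>
  <intermittentChannels>
    <channel name="S1" type="boolean" default="F"/>
    <channel name="S2" type="boolean" default="F"/>
  </intermittentChannels>
</traceFormat>
"""


def list_channels(markup):
    """Return (group, name, type, default) for each declared channel."""
    root = ET.fromstring(markup)
    channels = []
    for group in root:  # regularChannels, then intermittentChannels
        for channel in group.findall("channel"):
            channels.append(
                (group.tag, channel.get("name"),
                 channel.get("type"), channel.get("default"))
            )
    return channels


for entry in list_channels(TRACE_FORMAT):
    print(entry)
```

Channels without a `default` attribute are reported with `None` in the last position.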
Ongoing Work::Markup::EMMA
Markup for input annotations
- Input device produces one or more results of a template (XForms)
- These results are annotated by intermediaries (device info,
confidence scores)
<emma:emma emma:version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<emma:one-of rdf:ID="r1">
<emma:interpretation rdf:ID="int1">
<origin>Boston</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
<emma:interpretation rdf:ID="int2">
<origin>Austin</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
</emma:one-of>
<rdf:RDF>
<!-- time stamps for date in first interpretation -->
<rdf:Description rdf:about="#xpointer(id('int1')/date)"
emma:start="2003-03-26T0:00:00.15"
emma:end="2003-03-26T0:00:00.2"/>
</rdf:RDF>
</emma:emma>
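A consumer of such a document, for example an interaction manager, would typically walk the alternatives inside `emma:one-of`. The Python sketch below does this with the standard library. The namespace URIs are copied from the example above, while the helper name `interpretations` and the returned dictionary layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOCUMENT = f"""
<emma:emma emma:version="1.0" xmlns:emma="{EMMA_NS}" xmlns:rdf="{RDF_NS}">
  <emma:one-of rdf:ID="r1">
    <emma:interpretation rdf:ID="int1">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
    <emma:interpretation rdf:ID="int2">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
"""


def interpretations(markup):
    """Map each interpretation's rdf:ID to its application fields."""
    root = ET.fromstring(markup)
    result = {}
    for interp in root.iter(f"{{{EMMA_NS}}}interpretation"):
        # Child elements (origin, destination, date) are un-namespaced.
        fields = {child.tag: child.text for child in interp}
        result[interp.get(f"{{{RDF_NS}}}ID")] = fields
    return result


for rdf_id, fields in interpretations(DOCUMENT).items():
    print(rdf_id, fields["origin"], "->", fields["destination"])
```

The annotations in the `rdf:RDF` block are omitted here to keep the sketch short. They could be looked up afterwards by the same `rdf:ID` values.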
Conclusion
- The MMI framework is well advanced.
- Parts are beginning to be defined formally in specs.
- Many "holes" left: dynamic configuration, multi-user, multi-device.
- Dependencies: HTML/SVG/XForms/CSS, DI, RDF
Conclusion::More Info