The mission of the Multimodal Interaction Working Group, part of the Multimodal Interaction Activity, is to develop open standards that enable the following vision:
End date | 31 December 2016 |
---|---|
Confidentiality | Proceedings are Public. |
Initial Chairs | Deborah Dahl |
Initial Team Contacts (FTE %: 20) | Kazuyuki Ashimura |
Usual Meeting Schedule | Teleconferences: weekly (one main group call and one task force call); Face-to-face: as required, up to three per year |
The primary goal of this group is to develop W3C Recommendations that enable multimodal interaction with a wide range of devices, including desktop PCs, mobile phones, and less traditional platforms such as cars and intelligent home environments with digital/connected TVs. The standards should be scalable to enable richer capabilities for subsequent generations of multimodal devices.
Users will be able to provide input via speech, handwriting, motion or keystrokes, with output presented via displays, pre-recorded and synthetic speech, audio, and tactile mechanisms such as mobile phone vibrators and Braille strips. Application developers will be able to provide an effective user interface for whichever modes the user selects. To encourage rapid adoption, the same content should be usable on both older and newer devices. Multimodal access capabilities depend on the devices used. For example, users of multimodal devices that include not only keypads but also a touch panel, microphone and motion sensor can use all of the available modalities, while users of devices with restricted capabilities may prefer simpler and lighter modalities such as keypads and voice. Extensibility to new modes of interaction is important, especially in the context of the current proliferation of sensor inputs, for example in biometric identification and medical applications.
As technology evolves, the ability to process modalities grows. The office computer is an example: from punch cards to keyboards to mice and touch screens, the machine has seen a variety of modalities come and go. The mediums that enable interaction with a piece of technology continue to expand as increasingly sophisticated hardware is introduced into computing devices. The medium-mode relationship is many-to-many: as mediums are introduced into a system, they may introduce new modes or leverage existing ones. For example, a system may have a tactile medium (touch) and a visual medium (eye tracking), and these two mediums may share the same mode, such as ink. Devices increasingly expand the ability to weave mediums and modes together. Modality components do not necessarily have to be devices. Given the increasing pervasiveness of computing, the need to handle diverse sets of interactions calls for a scalable multimodal architecture that allows for rich human interaction. Systems currently leveraging the multimodal architecture include TV, healthcare, automotive technology, heads-up displays and personal assistants. MMI enables interactive, interoperable, smart systems.
The diagram below shows an example of an ecosystem of interacting Multimodal Architecture components. In this example, components are located in the home, the car, a smartphone, and the Web. User interaction with these components is mediated by devices such as smartphones or wearables. The MMI Architecture, along with EMMA, provides a layer that virtualizes user interaction on top of generic Internet protocols such as WebSockets and HTTP, or on top of more specific protocols such as ECHONET.
As an example, a user in this ecosystem might be driving home in the car and want to start their rice cooker. The user could say something like "start the rice cooker" or "I want to turn on the rice cooker". MMI Life-Cycle events flowing among the smartphone, the cloud and the home network result in a command being sent to the rice cooker to turn on. To confirm that the rice cooker has turned on, voice output would be selected as the best feedback mechanism for a user who is driving. Similar scenarios could be constructed for interaction with a wide variety of other devices in the home, the car, or other environments.
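To make this concrete, the sketch below shows roughly what one MMI Life-Cycle event in this scenario could look like: a StartRequest sent by the Interaction Manager to a rice-cooker Modality Component. It is illustrative only; the Context, Source and Target URIs, the RequestID, and the command payload are hypothetical placeholders, not values defined by the MMI Architecture specification.

```xml
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <!-- Illustrative StartRequest asking the rice-cooker Modality Component to start;
       all URIs, the RequestID and the payload are made-up placeholders. -->
  <mmi:StartRequest mmi:Context="someURI#context-1"
                    mmi:Source="someURI#interaction-manager"
                    mmi:Target="someURI#rice-cooker"
                    mmi:RequestID="request-1">
    <mmi:Data>
      <command xmlns="">start</command>
    </mmi:Data>
  </mmi:StartRequest>
</mmi:mmi>
```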
Note that precise synchronization of all the related entities is becoming increasingly important, because network connections are becoming very fast and it is now possible to monitor sensors on the other side of the world, or even on an artificial satellite.
The diagram below shows how the MMI-based user interface relates to other Web technologies, including the following layers, using various communication mechanisms, e.g., MMI Events, IndieUI Events, Touch Events and Web APIs:
Note that MMI is a generic mechanism for handling event exchange in advanced Web applications with multiple distributed entities.
The target audience of the Multimodal Interaction Working Group is vendors and service providers of multimodal applications, and includes a range of organizations in different industry sectors such as:
Speech is a very practical way for those industry sectors to interact with smaller devices, allowing one-handed and hands-free operation. Users benefit from being able to choose which modalities they find convenient in any situation. General-purpose personal assistant applications such as Apple Siri, Google Now, Google Glass, Anboto Sherpa, XOWi, Speaktoit for Android, Indigo for Android and Cluzee for Android are becoming widespread. These applications feature interfaces that include speech input and output, graphical input and output, natural language understanding, and back-end integration with other software. More recently, enterprise personal assistant platforms such as Nuance Nina, Angel Lexee and Openstream Cue-me have also been introduced. In addition to multimodal interfaces, enterprise personal assistant applications frequently include deep integration with enterprise functionality. Personal assistant functionality is merging with earlier multimodal voice search applications, which have been implemented by a number of companies, including Google, Microsoft, Yahoo, Novauris, AT&T, Vocalia, and Sound Hound. The Working Group should be of interest to companies that develop smartphones and personal digital assistants or that are interested in providing tools and technology to support the delivery of multimodal services to such devices.
To solicit industrial expertise for the expected multimodal interaction ecosystem, the group held the following two Webinars and a Workshop:
Those Webinars and the Workshop were aimed at Web developers who may find it daunting to incorporate innovative input and output methods such as speech, touch, gesture and swipe into their applications, given the diversity of devices and programming techniques available today. Those events generated substantial discussion of rich multimodal Web applications and clarified many new use cases and requirements for enhanced multimodal user experiences; for example, distributed/dynamic applications depend on the ability of devices and environments to find each other and learn what modalities they support. Target industry areas include health care, financial services, broadcasting, automotive, gaming and consumer devices. Example use cases include:
The main target of "Multimodal Interaction" used to be the interaction between computer systems and humans, so the group has been working on human-machine interfaces, e.g., GUI, speech and touch. However, after holding the above events, it became clear that the MMI Architecture and EMMA are useful not only for human-machine interfaces but also for integration with machine input/output. Therefore there is a need to consider how to integrate existing non-Web-based embedded systems with Web-based systems, and the group intends to address this issue through the following standardization work:
During the previous charter period the group brought the Multimodal Architecture and Interfaces (MMI Architecture) specification to Recommendation. The group will continue maintaining the specification.
The group may expand the Multimodal Architecture and Interfaces specification by incorporating the description from the Modality Components Registration & Discovery Working Group Note so that the specification better fits the needs of industry sectors.
The group has been investigating how the various Modality Components in the MMI Architecture, which are responsible for controlling the various input and output modalities on various devices, can register their capabilities and existence with the Interaction Manager. Possible examples of Modality Component registration information include capabilities such as languages supported, ink capture and playback, media capture and playback (audio, video, images, sensor data), speech recognition, text to speech, SVG, geolocation and emotion.
The group's proposal for the description of multimodal properties focuses on the use of semantic Web services technologies over well-known network layers such as UPnP, DLNA, Zeroconf or HTTP. Intended as an extension of the current Multimodal Architecture and Interfaces Recommendation, its goal is to enhance the discovery of services provided by Modality Components on devices, using well-defined descriptions from a multimodal perspective. We believe that this approach will enhance the autonomous behavior of multimodal applications, providing a robust perception of user-system needs and exchanges, and helping control the semantic integration of the human-computer interaction from a context and operating-environment perspective. As a result, the challenge of discovery is addressed using description tools from the field of Web services, such as WSDL 2.0 documents annotated with semantic data (SAWSDL) in the OWL or OWL-S languages using the modelReference attribute. This extension adds functionality for the discovery and registration of Modality Components by using semantic WSDL descriptions of the contextual preconditions (QoS, price and other non-functional information), processes and results provided by these components. These documents, following the WSDL 2.0 structure, can be expressed either in XML or in the lighter JSON or YAML formats.
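As a purely illustrative sketch of this approach (not taken from the Working Group Note), the fragment below annotates a WSDL 2.0 interface for a hypothetical speech-recognition Modality Component with SAWSDL modelReference attributes pointing at concepts in an example ontology; all names, the example.org namespaces, and the element references are assumptions made for illustration.

```xml
<wsdl:description xmlns:wsdl="http://www.w3.org/ns/wsdl"
                  xmlns:sawsdl="http://www.w3.org/ns/sawsdl"
                  xmlns:tns="http://example.org/mmi/asr"
                  targetNamespace="http://example.org/mmi/asr">
  <!-- Hypothetical WSDL 2.0 interface describing a speech-recognition
       Modality Component; sawsdl:modelReference links each construct
       to a concept in an example ontology. -->
  <wsdl:interface name="SpeechRecognizer"
                  sawsdl:modelReference="http://example.org/ontology/mmi#SpeechRecognition">
    <wsdl:operation name="recognize"
                    pattern="http://www.w3.org/ns/wsdl/in-out"
                    sawsdl:modelReference="http://example.org/ontology/mmi#RecognizeUtterance">
      <wsdl:input element="tns:audioInput"/>
      <wsdl:output element="tns:emmaResult"/>
    </wsdl:operation>
  </wsdl:interface>
</wsdl:description>
```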
The group generated a Working Group Note on Registration & Discovery, and based on that document the group will generate a document that describes the detailed definition of the format and process for registration of Modality Components, as well as the process by which the Interaction Manager discovers the existence and current capabilities of Modality Components. This document is expected to become a Recommendation-track document.
The group will investigate and recommend how various W3C languages can be extended for use in a multimodal environment using the multimodal Life-Cycle Events. The group may prepare W3C Notes on how the following languages can participate in multimodal specifications by incorporating the life cycle events from the multimodal architecture: XHTML, VoiceXML, MathML, SMIL, SVG, InkML and other languages that can be used in a multimodal environment. The working group is also interested in investigating how CSS and Delivery Context: Client Interfaces (DCCI) can be used to support Multimodal Interaction applications, and if appropriate, may write a W3C Note.
EMMA is a data exchange format for the interface between different levels of input processing and interaction management in multimodal and voice-enabled systems. Whereas the MMI Architecture defines communication among components of multimodal systems, including distributed and dynamic systems, EMMA provides information used in communications with users. For example, EMMA provides the means for input processing components, such as speech recognizers, to annotate application-specific data with information such as confidence scores, time stamps, and input mode classification (e.g. keystrokes, touch, speech, or pen). It also provides mechanisms for representing alternative recognition hypotheses, including lattices, and groups and sequences of inputs. EMMA supersedes earlier work on the Natural Language Semantics Markup Language (NLSML) in the Voice Browser Activity.
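For illustration, a speech recognizer in the rice-cooker scenario above might return an annotated EMMA 1.0 result along the following lines; the confidence, timestamp and token values, and the application-specific command payload, are made-up examples rather than anything mandated by the specification.

```xml
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <!-- Illustrative result for the spoken input "start the rice cooker";
       confidence, timestamps, tokens and the payload are example values only. -->
  <emma:interpretation id="int1"
                       emma:medium="acoustic"
                       emma:mode="voice"
                       emma:confidence="0.87"
                       emma:start="1434988800000"
                       emma:end="1434988801500"
                       emma:tokens="start the rice cooker">
    <command>
      <action>start</action>
      <device>rice-cooker</device>
    </command>
  </emma:interpretation>
</emma:emma>
```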
During the previous charter periods, the group brought EMMA Version 1.0 to Recommendation. The group also generated Working Drafts of the EMMA Version 1.1 specification based on feedback from the MMI Webinars and the MMI Workshop in 2013.
In the period defined by this charter, the group will publish a new EMMA 2.0 version of the specification which incorporates new features that address issues brought up during the development of EMMA 1.1. The group will not do further work on EMMA Version 1.1; the work done on EMMA 1.1 will be folded into EMMA Version 2.0.
Note that the original EMMA 1.0 specification focused on representing multimodal user inputs, and EMMA 2.0 will add representation of multimodal outputs directed to users.
New features for EMMA 2.0 include, but are not limited to, the following:
During the MMI Webinars and the MMI Workshop, there was discussion of a possible mechanism for handling semantic interpretation for multimodal interaction (e.g., gesture recognition), analogous to the mechanism for speech recognition, namely Semantic Interpretation for Speech Recognition (SISR).
The group will begin investigating how to apply SISR to semantic interpretation for non-speech Modality Components, and will discuss concrete use cases and requirements. However, the group will not generate a new version of the SISR specification.
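For context, SISR attaches ECMAScript semantic tags to SRGS speech grammar rules; the minimal sketch below shows the existing speech case (the grammar, the tag contents and the resulting properties are illustrative only), and the open question for the group is how a comparable mechanism could be applied to non-speech input such as gestures.

```xml
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" root="appliance" tag-format="semantics/1.0">
  <!-- Illustrative SRGS grammar: the SISR tag builds an application-specific
       semantic result (action and device) for one spoken command. -->
  <rule id="appliance" scope="public">
    <item>start the rice cooker</item>
    <tag>out.action = "start"; out.device = "rice-cooker";</tag>
  </rule>
</grammar>
```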
The group will be maintaining the following specifications:
The group will also consider publishing additional versions of the specifications depending on feedback from the user community and industry sectors.
For each document to advance to Proposed Recommendation, the group will produce an implementation report demonstrating at least two independent and interoperable implementations of each required feature. The Working Group anticipates two interoperable implementations for each optional feature, but may reduce the criteria for specific optional features.
The following documents are expected to become W3C Recommendations:
The following documents are Working Group Notes and are not expected to advance toward Recommendation:
This Working Group is chartered to last until 31 December 2016.
Here is a list of milestones identified at the time of re-chartering. Others may be added later at the discretion of the Working Group. The dates are for guidance only and subject to change.
Note: The group will document significant changes from this initial schedule on the group home page.
Specification | FPWD | LC | CR | PR | Rec |
---|---|---|---|---|---|
EMMA 2.0 | 2Q 2015 | 3Q 2015 | 1Q 2016 | 3Q 2016 | 3Q 2016 |
Modality Components Registration & Discovery | 2Q 2015 | 4Q 2015 | 1Q 2016 | 4Q 2016 | 4Q 2016 |
These are W3C activities that may be asked to review documents produced by the Multimodal Interaction Working Group, or which may be involved in closer collaboration as appropriate to achieving the goals of the Charter.
Furthermore, the Multimodal Interaction Working Group expects to follow these W3C Recommendations:
This is an indication of external groups with goals complementary to those of the Multimodal Interaction Activity. W3C has formal liaison agreements with some of them, e.g. the VoiceXML Forum. The group will also coordinate with other related organizations as needed.
To be successful, the Multimodal Interaction Working Group is expected to have 10 or more active participants for its duration. Effective participation in the Multimodal Interaction Working Group is expected to consume one work day per week for each participant; two days per week for editors. The Multimodal Interaction Working Group will also allocate the necessary resources for building Test Suites for each specification.
In order to make rapid progress, the Multimodal Interaction Working Group consists of several subgroups, each working on a separate document. The Multimodal Interaction Working Group members may participate in one or more subgroups.
Participants are reminded of the Good Standing requirements of the W3C Process.
Experts from appropriate communities may also be invited to join the working group, following the provisions for this in the W3C Process.
Working Group participants are not obligated to participate in every work item; however, the Working Group as a whole is responsible for reviewing and accepting all work items.
For budgeting purposes, we may hold up to three full-group face-to-face meetings per year if we believe them to be beneficial. The Working Group anticipates holding a face-to-face meeting in association with the Technical Plenary. The Chair will make Working Group meeting dates and locations available to the group in a timely manner according to the W3C Process. The Chair is also responsible for providing publicly accessible summaries of Working Group face-to-face meetings, which will be announced on www-multimodal@w3.org.
This group primarily conducts its work on the Member-only mailing list w3c-mmi-wg@w3.org (archive). Certain topics need coordination with external groups. The Chair and the Working Group can agree to discuss these topics on a public mailing list. The archived mailing list www-multimodal@w3.org is used for public discussion of W3C proposals for Multimodal Interaction Working Group and for public feedback on the group's deliverables.
Information about the group (deliverables, participants, face-to-face meetings, teleconferences, etc.) is available from the Multimodal Interaction Working Group home page.
All proceedings of the Working Group (mail archives, teleconference minutes, face-to-face minutes) will be available to W3C Members. Summaries of face-to-face meetings will be sent to the public list.
As explained in the Process Document (section 3.3), this group will seek to make decisions when there is consensus among working group participants. When the Chair puts a question and observes dissent, after due consideration of different opinions, the Chair should record a decision (possibly after a formal vote) and any objections, and move on.
This charter is written in accordance with Section 3.4, Votes of the W3C Process Document and includes no voting procedures beyond what the Process Document requires.
This Working Group operates under the W3C Patent Policy (5 February 2004 Version). To promote the widest adoption of Web standards, W3C seeks to issue Recommendations that can be implemented, according to this policy, on a Royalty-Free basis.
For more information about disclosure obligations for this group, please see the W3C Patent Policy Implementation.
This charter for the Multimodal Interaction Working Group has been created according to section 6.2 of the Process Document. In the event of a conflict between this document or the provisions of any charter and the W3C Process, the W3C Process shall take precedence.
The most important changes from the previous charter are:
Copyright © 2015 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved.
$Date: 2015/06/17 16:28:56 $