W3C

Discovery & Registration of Multimodal Modality Components

W3C Working Group Note 2 February 2017

This version:
https://www.w3.org/TR/2017/NOTE-mmi-mc-discovery-20170202/
Latest published version:
https://www.w3.org/TR/mmi-mc-discovery/
Previous version:
https://www.w3.org/TR/2016/WD-mmi-mc-discovery-20160411/
Editor:
B. Helena Rodríguez, W3C Invited Expert
Authors:
Jim Barnett, Genesys Telecommunications Laboratories
Deborah Dahl, W3C Invited Expert
Raj Tumuluri, Openstream, Inc.
Nagesh Kharidi, Openstream, Inc.
Kazuyuki Ashimura, W3C

Abstract

This document is addressed to people who want to develop Modality Components for Multimodal Applications distributed over a local network or "in the cloud". In a multimodal system implemented according to the Multimodal Architecture Specification and distributed over a network, the system must discover and register its Modality Components in order to configure the technical conditions needed for the interaction and to monitor and preserve the overall state of the distributed elements. Modality Components can then be composed with automation mechanisms in order to adapt the Application to the state of the surrounding environment.

Status of This Document

Beware. This specification is no longer in active maintenance and the Multimodal Interaction Working Group does not intend to maintain it further.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document has been published as a Working Group Note to reflect the fact that the Multimodal Interaction Working Group is no longer progressing it along the W3C Recommendation Track. A record of discussion relating to this specification can be found in the Multimodal Interaction Working Group's email archive. The email list was www-multimodal@w3.org.

The changes from the previous Working Draft are (1) removal of "State Handling" from the title since the document now describes not only state handling but also annotation vocabulary, (2) addition of the description on a vocabulary for the annotation of Modality Components and (3) clarifications and modifications based on public comments. A diff-marked version of this document is also available for comparison purposes.

The Multimodal Interaction Working Group was chartered to develop open standards that enable the following vision:

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 September 2015 W3C Process Document.

1. Introduction

To the best of our knowledge, there is no standardized way to build a web Application that can dynamically combine and control discovered components by querying a registry built from the multimodal types of the modalities and their states. This document covers three needs for Discovery & Registration in this kind of web Application implemented following the Multimodal Architecture Specification.

First, we define a new component responsible for the management of the state of a Multimodal System, extending the control layer already defined in the Multimodal Architecture Specification (Table 1, col. 1). This component will be responsible for handling the messages exchanged in order to declare the presence (or absence) of the Modality Components of the system.

Second, this document presents an adaptive push/pull mechanism, needed to inform the system about changes in the state of the Modality Components (Table 1, col. 2). These changes are not necessarily related to the interaction's functional context itself, but they can affect it, for example, in the case of the unavailability of a given Modality Component.

And finally, to allow the advertisement of the state of the Modality Components by using the adaptive mechanism, two new events are needed (Table 1, col. 3). The semantics of these new events is not directly related to the interaction context but to the system's configuration; for this reason, a new component responsible for the management of the state of the Multimodal System is needed.

 

Resources Handling: The state management through events and the pull mechanism must be supported by a dedicated component, responsible for the management of the state of the Modality Components in the Multimodal System.

A new direction in the messages flow: An adaptive pull mechanism is needed to inform the system periodically of the availability or any other evolution in the state of the Modality Components.

Events for System's updates: A new event and a new notification to support the pull mechanism and the advertisement, registration, search and update of Modality Components' availability.

Table 1: Requirements for discovery in a Multimodal Architecture based on the MMI specification

2. Domain Vocabulary

Interaction Context
According to the MMI Architecture, "a context represents a single extended interaction with zero or more users across one or more modality components. (...) In general, the 'context' should cover the longest period of interaction over which it would make sense for components to store information."

Following the definition above, and for the purposes of this document, an Interaction Context represents a single exchange between a system and one or multiple users across one or multiple interaction modes. It can be as simple as a single period of displaying audiovisual content (e.g. a program), a phone call or a web session.
 
The Interaction Context can also be a richer interaction combining, for example, voice, gesture and direct interaction with a light pointer, or a shared whiteboard with an associated VoIP call that evolves during the interaction into a text chat. In these cases, a single context persists across various modality configurations. [See: the multimodal context in the MMI architecture]

Multimodal System
For the purposes of this document, a Multimodal System is any system communicating with one or multiple users through different modalities such as voice, gesture, or handwriting in one or multiple interaction cycles, each one identified by a unique context. In a Multimodal System the Application or the end user can dynamically switch modalities in the same context of information exchange. This is a bi-directional system with combined inputs and outputs in multiple sensorial modes (e.g. visual, acoustic, haptic, olfactive, gustative) and modalities (e.g. voice, gesture, handwriting, biometrics capture, temperature sensing).

Modality Component
For the purpose of this document, a 'modality' is the way an idea is communicated or the manner in which an action is performed through a medium. In some mobile multimodal systems, for example, the primary modality is speech, and an additional modality can typically be gesture, gaze, sketch, or a combination thereof. These are forms of representing information in a known and recognizable logical structure.
 
For example, visual data can be expressed as a gesture modality (e.g. a pointing gesture) or as a signing modality (e.g. a person communicating in sign language). Following this idea, in this document a Modality Component is a logical entity that handles the input or output of different hardware Devices (e.g. camera, microphone, graphic tablet, keyboard, sensors) or software Services (e.g. motion detection, image recognition) associated with the Multimodal System. Modality Components are responsible for specific tasks, including handling inputs and outputs in various ways, such as speech, writing, video, etc. Modality Components are also loosely coupled software modules that may be either co-resident on a device or distributed across a network.

Interaction Manager
For the purposes of this document, the Interaction Manager is also a logical component handling the multimodal integration and composition. It is responsible for all message exchanges between the components of the Multimodal System and the hosting runtime framework [See: Architecture Components, the Description of an Interaction Manager in the Multimodal Interaction Framework].

Data Component
For the purposes of this document, a Data Component is a logical entity that stores the public and private data of any module in a Multimodal System. The data component's primary role is to save the public data that may be required by one or several Modality Components or by other modules (e.g., a session component in the hosting framework). [See: the Multimodal Interaction Framework Description of a Session Component]

Application
For the purposes of this document, the term Application refers to a collection of events, components and resources which use server-side or client-side processing and the Multimodal Architecture Specification to provide sensory, cognitive and emotional information [See: Emotion Use Cases] through a rich multimodal user experience. The Application is a collection of interaction cycles designed to achieve one or multiple tasks. For example, a multimodal Application can be implemented for use, in an integrated way, on mobile devices and cell phones, home appliances, Internet of Things objects, robots, television and home networks, enterprise applications, web applications, "smart" cars or medical devices.

Service
For the purposes of this document, a Service is a set of functionalities associated with a process or system that performs a task and is wrapped in a Modality Component abstraction. A multimodal Service is any functionality wrapped in a Multimodal Component (e.g. a Modality Component or the Interaction Manager), publishing information about its behavior and using one or multiple devices. We will use the term Service Description for a set of attributes (metadata) describing a particular service. The term Service Advertisement refers to the publication of metadata about the Service by indexing this metadata in some registry and making the Service available to client requests.

Push notifications
In this document we assume that push techniques allow the server to send new events or data to the client through progressive download or long-polling requests from the client. They enable client applications to subscribe to particular events, notifications or data streams, providing the server a callback address or a client-side service to which they are delivered. Consequently, to implement a push technique, the server must know the client's address in order to deliver the data, and some registry of subscribers must be created. The server determines when things change and simply sends down the new data. In this way, new connections do not have to be opened all the time.
Pull notifications
In this document we assume that pull techniques allow the client to retrieve new events or data from the server through periodic or long-polling requests initiated by the client.
Multimodal Session
For the purposes of this document, a Multimodal Session is a system state in which user interaction is allowed or being prepared. It covers the loading of components and resources, their registration and availability, and the interaction cycle itself.

 


3. Scope

In the current state of the Multimodal Architecture Specification, the events that are responsible for handling the control of the user-system interaction, like Prepare or Start, must be triggered only by the Interaction Manager and sent to the Modality Components. As a result, a Modality Component cannot send a StartRequest or a PrepareRequest to the Interaction Manager. In both cases the Modality Component depends on the Interaction Manager to begin the interaction cycle by raising an event, originated by an internal command or in reaction to a previous notification sent by a Modality Component (Figure 1).

A Modality Component may send a NewContextRequest to the Interaction Manager to request the creation of a new context of interaction. The interaction can be started by different Modality Components independently. Nevertheless, to start an interaction the Modality Component needs to be already part of the system and to be registered, given that a context represents a single extended interaction with one or more Modality Components.

This means that the Multimodal System has two complementary phases: the runtime phase (defined by the execution of one or multiple interaction cycles), and the system configuration phase (defined by the loading of components and their monitoring and adaptation in real-time).

The semantics of the NewContextRequest event is different and mostly oriented to the interaction phase, while the registration process is part of a previous phase, in which even the presence of the user is not mandatory. This phase is designed for a system that will handle one or more interaction processes at the same time.

 


Figure 1: Current MMI communication mechanism: IM to MC (input and output direction)

In addition, in the current state of the MMI Recommendation, the Interaction Manager is supposed to know ahead of time the address and port of all the Modality Components available in the system. Consequently, the preparation of the media or the start of the interaction cycle also currently implies the setting up of a "multimodal session" that is not completely defined at the current stage of the specification.

The Extension and Status notifications are dedicated to the exchange of interaction data, while the data exchanged in a discovery process mostly precedes any interaction between the user and the system. During the configuration phase (or reconfiguration following a change in the overall state), the system prepares and registers the information about the Modality Components (availability, technical characteristics, cost). In this way, all this information can be used later, when the user-system interaction actually takes place.

In other terms, the semantics of the two existing notifications differs from the features needed for discovery. The communication protocol paradigm (the flow of messages always initiated by the Interaction Manager) is not sufficient if the Recommendation is used to address use cases evolving in dynamic environments, as described in use cases like [UC 2.1] Personal Externalized Interfaces: Smart Cars, [UC 3.1] Public Spaces: Interactive Spaces or [UC 3.2] Public Spaces: In-Office Events Assistance, MMI Use Cases, or some of the use cases described in our current charter.

In all these cases, the Modality Components enter and quit the multimodal system dynamically, and they must declare their existence, availability and capabilities to the system in some way:

In the first case, [UC 2.1] Personal Externalized Interfaces: Smart Cars, the Modality Components provided by a smartphone must be detected by the multimodal system to relate these features to the features provided by the Modality Components in the car.

In the second case, [UC 3.1] Public Spaces: Interactive Spaces, the discovery of the Modality Components installed on the client's smartphone can affect the behavior of the multimodal application in the public space.

In the third case, [UC 3.2] Public Spaces: In-Office Events Assistance, the announcement and discovery of the Modality Component capabilities in a smart conference room can allow the attendees to access some of the multimodal services provided by the conference room, providing a fine-grained adaptation of the application features to the state of the multimodal interaction environment.

For all these reasons, the current document addresses the need for discovery and registration support in highly dynamic environments like the ones described above, by proposing a resources manager, a new flow of messages and two events specifically designed to carry discovery and registration data.

4. A manager to handle the state of the resources of the multimodal system

A Modality Component discovery protocol needs a mechanism for tracing the relevant session data to be handled in the control layer. This is the first responsibility of a Resources Manager. This manager is responsible for handling the evolution of the "multimodal session" [See: Functions of Session Component in W3C Multimodal Interaction Framework] and the modifications in any of the participants of the system that could affect its global state. This component is also aware of the system's capabilities, like the addresses of modalities, their availability or their processing state.

The inclusion of the Resources Manager responds to the functional requirement concerning the management of interaction cycles locally and globally, the requirement of appropriate real-time sensing for dynamic uses and, partially, the requirement concerning support for processing of dynamic and incomplete data. [See: MMI Framework requirements]

The Resources Manager is nested in the control layer of the multimodal system (turquoise in Figure 2) which is slightly different from the proposal of a Session Component described in the W3C Multimodal Interaction Framework.


Figure 2: The Multimodal System's state handling: a Resources Manager (RM)


The Resources Manager keeps the control of the multimodal session state (the state of the system to allow a series of user interactions) and the resources of the system in a unified layer with the control of the user interaction (Interaction Manager). In other words, the control layer (dark gray in Figure 2) encompasses the handling of the multimodal interaction and the management of the resources of the multimodal system. In this way, the architecture preserves its compliance with the MVC design pattern. As Figure 2 shows, the Resources Manager can also be nested in a Complex Modality Component, following the Russian Doll Pattern.

4.1 The Resources Manager and the MVC design pattern of the MMI Architecture

In the MVC model (Figure 3), the Controller translates the user's actions into method calls on the Model. The Model broadcasts a notification to the View and to the Controller to inform them that its state has changed. The View queries the Model to determine the exact change. Upon reception of the response, the View updates the display according to the information received. Thus, in the MVC pattern, the View is directly linked with its Controller, but it can also query and communicate with the Model.

In this pattern, the Model offers a registration mechanism so that multiple Views and Controllers can express their interest in the Model through anonymous callbacks. This allows an easy implementation of multiple renderings of the same domain concepts either on one local device or across multiple distributed devices.


Figure 3: The MVC architecture design pattern



The Resources Manager described in the current document allows the management of the states of the Modality Components (which represent the MVC View in the MMI Architecture), putting this function in the control layer (dark gray in Figure 4).

The Resources Manager translates the user's actions into method calls on the Data Component, as the MVC pattern proposes. While the Interaction Manager handles the user interaction, the Resources Manager takes care of the state of the system, the type and availability of the Modality Components and the state of the multimodal session.

The Modality Component's communication and requests for state information are restricted to exchanges with the control layer, as the MMI Recommendation defines. The Model broadcasts a notification to the Resources Manager (Figure 4), and then the Resources Manager informs the Modality Component that the state has changed, using a flow of messages through an UpdateNotification or a CheckUpdateResponse. Upon reception of the UpdateNotification or the CheckUpdateResponse, the Modality Component updates the user interface according to the information received.


Figure 4: The Resources Manager included in the MMI structure



4.2 The Resources Manager responsibilities

Thus, the Resources Manager delivers information about the state and the resources of the multimodal system during and outside the interaction cycle. Some of its responsibilities can be:



4.3 Data structures handled by the Resources Manager

The Resources Manager can also process the traces of external and internal phenomena and serialize them in data structures. Depending on the complexity of the implementation, the application can store in the Data Component:

4.4 Requirements Addressed by the Resources Manager

Requirements

Distribution: The Resources Manager supports the coordination between distributed components and their communication through the control layer. This enables it to synchronize the input constraints across modalities [MMI-I16] and also enhances the resolution of input conflicts from distributed modalities [MMI-I17].

Advertisement: The Resources Manager is the starting point to declare and process the advertised announcements and to keep them up to date.

Discovery: The Resources Manager is also the core support for mediated and passive discovery, and it can also be used to trigger active discovery using the push mechanism or to execute some of the tasks in fixed discovery.

Registration: The Resources Manager is also the interface that can be requested to register the Modality Component's information. It handles all the communication between the Modality Components and the registry handled by the Data Component, and it manages the multiple renderings of private and public data related to the state of the multimodal system or the state of the interaction cycle.

Querying: The flow of queries transits through the Resources Manager, which dispatches the requests to the Data Component and notifies the Interaction Manager if needed. Some of these queries must be produced using the state handling events proposed in this document.

5. A bidirectional flow of messages

According to the current MMI life-cycle events protocol, the command of Modality Components is initiated by the Interaction Manager, which means that an HTTP client-server implementation can be designed following a push notification technique.

In the communication protocol designed for the MMI life-cycle events, the direction of the message flow (mostly from the Interaction Manager to the Modality Components) is suggested by the specification through the description of the control events, even if the specific communication mechanism is not currently described in detail in the normative section and is, for the moment, implementation dependent.

This document describes the flow of messages in both directions, which is needed for the Discovery & Registration of Modality Components. With this proposal, the MMI architecture will respond more accurately to architectural requirements like completeness, extensibility, integrability and interoperability concerning the relations allowed between requesters and providers of messages.

Our intention is to allow multimodal developers to use a communication flow initiated by Modality Components arriving dynamically in the system: an extension that authorizes the Modality Component client to request new data from, or provide new data to, the server, using, for example, form submissions or AJAX-based technologies with the XMLHttpRequest object.

With this mechanism, the change in the state of the multimodal session (e.g. the dynamic inclusion of new distributed modalities) is instigated by the Modality Component itself.

After a certain period, the Modality Component's client sends a request to the Resources Manager (e.g. on a server), which notifies the Modality Component about changes in the user interface displayed with other distant components or in the data related to the overall state of the system, eventually causing the Modality Component's state to evolve, for example, by putting it on stand-by. The connection is closed after each transfer, and the Modality Component is told when to open a new connection and what data to fetch when it does so.
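For illustration, one such cycle could look like the following sketch, which borrows the CheckUpdate events defined later in section 6 (all URIs, request identifiers and timeout values are hypothetical):

<!-- Sketch of one pull cycle; URIs and values are hypothetical. -->
<!-- 1. The Modality Component asks the Resources Manager for changes: -->
<mmi:mmi xmlns:mmi="https://www.w3.org/2008/04/mmi-arch" version="1.0">
    <mmi:CheckUpdateRequest
        mmi:Source="URIForMC"
        mmi:Target="URIForRM"
        mmi:RequestID="request-7"
        mmi:State="IDLE"
        mmi:UpdateType="MONITORING">
       <mmi:Timeout sleep="0" validity="360000" interval="1000"/>
    </mmi:CheckUpdateRequest>
</mmi:mmi>
<!-- 2. The Resources Manager answers and the connection is closed; the
     Timeout element tells the client when to reconnect (sleep) and how
     often it may then poll (interval): -->
<mmi:mmi xmlns:mmi="https://www.w3.org/2008/04/mmi-arch" version="1.0">
    <mmi:CheckUpdateResponse
        mmi:Source="URIForRM"
        mmi:Target="URIForMC"
        mmi:RequestID="request-7"
        mmi:State="IDLE"
        mmi:UpdateType="MONITORING"
        mmi:AutomaticUpdate="false">
       <mmi:Timeout sleep="2000" validity="360000" interval="1000"/>
    </mmi:CheckUpdateResponse>
</mmi:mmi>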

The inclusion of this new direction in the flow of messages is the best option for tightly coupled clients to which the Resources Manager has reliable access.


Figure 5: Discovery of MC components: pull request to announce (and confirm) availability and capabilities

Nevertheless, adding a new direction in the message flow can raise issues related to the risk of high network traffic reducing the overall performance.

In a distributed multimodal system, Modality Components can be idle for a long time if no interaction happens or if the situation is not optimal for a specific type of interaction. Given that the data rate is very low during this period, it is not necessary to keep the client requesting all the time.

For fine-tuning the Modality Component's requests we propose a new attribute: the timeout attribute. The sleep value of this attribute can reduce the requesting time by putting the client (e.g. a Modality Component using recognition services) into a periodic sleep state, as sketched below. This allows control of the frequency of the requests that update the state data in the Modality Component.
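For instance (a sketch using the Timeout element defined in section 6.2.5; the pacing values are hypothetical), the Resources Manager could put a recognition Modality Component to sleep for one minute between requests:

<!-- Hypothetical pacing: sleep one minute between polls, registration
     valid for one hour, at most one request every five seconds once awake. -->
<mmi:Timeout sleep="60000" validity="3600000" interval="5000"/>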

5.1 Requirements Addressed by the new flow of messages

Requirements

Distribution: Modality Components can be distributed in a centralized way, a hybrid way or a fully decentralized way in order to support distributed processing [MMI-A14], [MMI-A15] and distributed input/output synchronization [MMI-A13].

Given the number of devices that could be used, a more flexible way to recognize and include the device in the multimodal system's registry requires adding a new direction in the flow of messages to allow an announcement of modalities coming from the device every time an important change occurs. This reduces the number of permanent connections and allows more pertinent monitoring of the availability of and changes in Modality Components at the session level [MMI-A6].

Advertisement: With a pull mechanism, the unique identifier of the Modality Component, its name, address, port number, its embedded services, constructor, version and lifetime can be announced when important changes affecting this information occur. This proactive updating of information facilitates the management of scalable multimodal systems across wide ranges of devices, and supports the application's adaptability [MMI-G2] and the coordination capabilities of the multimodal session [MMI-I8]. It also supports the announcement of evolution in the user profile or user preferences [MMI-G13] - [MMI-G14].

A new direction in the flow of messages also supports the extensibility of the system, through the active announcement of new modalities or new devices and capabilities to be dynamically added [MMI-I12] - [MMI-O8]. This implies the management of external input events during the announcement process [MMI-A16].

Discovery: This new direction in the flow of messages facilitates the mediated and passive discovery of Modality Components. Functions can be partitioned and distributed across several servers or devices that periodically notify their availability and general state [MMI-C1] and [MMI-C2].

It also facilitates deployments using mobile networks, preventing bandwidth limitations and delays, because the embedded Modality Component itself can announce and update its current state [MMI-R1] and [MMI-R2].

Registration: Using a new direction in the flow of messages, the updates to the register are triggered by changes dynamically declared by the Modality Component itself, without the need for a persistent connection to update data that is not very frequently modified.

This also helps in the registration of high-level information used to specify the preconditions and effects produced by the addition of this new Modality Component to the system or its unavailability [MMI-G15].

It also supports the registration of other information that does not change very often, like the semantics of some kinds of inputs, or any specification of the meaning of the embedded modalities implemented in the Modality Component to be registered [MMI-I13].

Querying: To enable information gathering in a multimodal system, the simplest strategy is to have all Modality Components provide a continuous stream of all the data that they gather to the Interaction Manager. However, for many types of applications where only a small subset of the collected information is likely to be useful, updated or pertinent, this simple approach can become very inefficient. For this reason, a tunable communication strategy offers significant advantages for optimizing querying.

6. Two events for system updates

With these communication mechanisms, a Modality Component can register its services for a specific period of time. This is the basis for the handling of the Modality Component's state. Every Modality Component can have a life-time that begins at discovery and ends at a date provided at registration. If the Modality Component does not re-register the service before its lifetime expires, the Modality Component's index entry is purged. This depends on the parameters given by the Application logic, the distribution of the Modality Components or the context of interaction.

When the lifetime has no end, the Modality Component is part of the multimodal system indefinitely. In contrast, in more dynamic environments, a limited life-time can be associated with the Modality Component, and if it is not renewed before expiration, the Modality Component will be assumed to no longer be part of the multimodal system. Thus, by the use of this kind of registration, the multimodal system can implement a procedure to confirm its global state and update the "inventory" of the components that could eventually participate in the interaction cycle. Therefore, registration involves some Modality Component timeout information, which can always be exchanged between components and, in the case of a dynamic environment, can be updated from time to time.

For this reason, a registration renewal mechanism is needed. We define a renewal mechanism based on the use of the timeout attribute and two new events, the CheckUpdate Event and the UpdateNotification, used in conjunction with an automatic process that ensures periodic requests.

The CheckUpdate Event provides a mechanism:

The UpdateNotification provides a mechanism:



6.1 The timeout attribute for state handling and registration renewal

A dedicated data structure is defined for registration: the timeout attribute. A timeout is an ordered list of three elements:


Definition 1: The timeout tuple

Each Modality Component can sleep for some time, and then wake up and check whether changes are planned on the system's side (by requesting the component responsible for the management of the system states). While sleeping, the client turns off CheckUpdate requests and sets a timer to awaken itself later.

The sleep value is calculated by the Resources Manager (on the server side, for example) based on the context-awareness level of the multimodal system. It can be static, defined with a set of basic rules, or more dynamic, linked to the semantic analysis of the environment.

The second element of the timeout tuple is the communication life-time. A Modality Component leaves the multimodal system when its life-time is exceeded and needs to restart its registration mechanism to obtain a new Modality Component ID and timeout pace. This supports periodic updates of the availability of the Component (e.g. authorization) or the renewal of its metadata (see Figure 6).


Figure 6: Using the timeout tuple to manage availability
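As a hedged sketch of this renewal (the URIs, request identifier and timeout values are hypothetical), an expired Modality Component could restart its registration with a new handshake, mirroring the example of section 6.2.6:

<!-- Sketch: re-registration after the validity period has expired;
     URIs and values are hypothetical. -->
<mmi:mmi xmlns:mmi="https://www.w3.org/2008/04/mmi-arch" version="1.0">
    <mmi:CheckUpdateRequest
        mmi:Source="URIForMC"
        mmi:Target="URIForRM"
        mmi:RequestID="request-42"
        mmi:State="ALIVE"
        mmi:UpdateType="HANDSHAKE"
        mmi:AutomaticUpdate="true">
       <mmi:Timeout sleep="0" validity="500" interval="500"/>
    </mmi:CheckUpdateRequest>
</mmi:mmi>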

The third element is the communication interval, which is modulated according to the multimodal system's needs by a set of static rules or by a prediction mechanism used in the state handling Component. This element informs the Modality Component beforehand about the frequency of requests that the recipient component (a Resources Manager on a server, for example) can allow in the current conditions. This value is exchanged on each request, which means that it can be changed at any moment in the multimodal session.

The communication intervals will be synchronized, because the Modality Component knows the exact publish interval beforehand, according to a time pattern. In this way, data coherence is ensured and network performance maintained. Since the Resources Manager has access to all the state data, it can, for example, use a prediction algorithm implemented in the Data Component to foresee the time when the data is going to change. The Resources Manager then attaches this time value in the timeout triplet to the outgoing data, allowing data synchronization.

Finally, if the Resources Manager's prediction is wrong and a change still occurs in the data, the Resources Manager, if it knows the address of the Modality Component, can push the change to it, using the original push technique. In this case the push command is handled as an interruption of the default pull update mechanism. In this way, the system maintains its reliability.
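For example (a sketch with hypothetical URIs and values), such an interruption could be an UpdateNotification pushed from the Resources Manager to the Modality Component, carrying the corrected data and timeout pace:

<!-- Sketch: push interruption of the default pull mechanism when the
     prediction fails; URIs and values are hypothetical. -->
<mmi:mmi xmlns:mmi="https://www.w3.org/2008/04/mmi-arch" version="1.0">
    <mmi:UpdateNotification
        mmi:Source="URIForRM"
        mmi:Target="URIForMC"
        mmi:RequestID="request-7"
        mmi:UpdateType="DATAUPDATE"
        mmi:State="AVAILABLE">
       <mmi:Timeout sleep="0" validity="360000" interval="500"/>
    </mmi:UpdateNotification>
</mmi:mmi>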


6.2 CheckUpdateRequest and CheckUpdateResponse

This section is Normative.

The CheckUpdate events are the request/response pair, CheckUpdateRequest and CheckUpdateResponse. CheckUpdateRequest and CheckUpdateResponse are used to check if there are any changes in the system. They share the Context, Source, Target, RequestID and Data fields with MMI Life Cycle Events. A CheckUpdate event MUST include Source, Target and RequestID. It MAY include a Data field. It MAY also include a Context field, if the event pertains to a specific context.

In addition, both CheckUpdate events MUST include the additional fields UpdateType, State, and Timeout. The CheckUpdateResponse MUST also include the field "AutomaticUpdate". The CheckUpdate events can be sent from either the Modality Component to the Resources Manager or from the Resources Manager to the Modality Components.

6.2.1 UpdateType

An attribute that MUST indicate the type of check to be performed. Some values can be: Handshake, Monitoring, Reporting, DataCheck, Resuming, Leaving. These values are application-specific.

6.2.2 State

An attribute that MUST indicate the state of the requesting component. Its value MUST be one of: Alive, Loading, Registering, Available, Idle, Busy Waiting, Processing, Unavailable, Unregistered. (See Figure 7)


Figure 7: Discovery and Registration states for a Modality Component

6.2.2.1 ALIVE

A Modality Component MUST be in Alive state when it is already started and ready to be identified and registered on the multimodal system.

6.2.2.2 LOADING

A Modality Component MAY be in Loading state if it is currently loading resources that it will need to function.

6.2.2.3 REGISTERING

A Modality Component MAY be in Registering state when it has already requested a registration id in the multimodal system through the Resources Manager.

6.2.2.4 AVAILABLE

A Modality Component MUST be in Available state when it is already registered, ready to function and not busy.

6.2.2.5 IDLE

A Modality Component MAY be in Idle state when it is already registered, functioning and waiting for a user input.

6.2.2.6 BUSY WAITING

A Modality Component MAY be in Busy Waiting state when it is already registered, functioning and waiting for a system's event or a system's response.

6.2.2.7 PROCESSING

A Modality Component MUST be in Processing state when it is already registered and processing some task. The process could be any multimodal or unimodal task like transferring, searching, recognizing or any other kind of process. The processing state is related to a given multimodal session (the same Modality Component can handle multiple tasks in parallel from different users and sessions).

6.2.2.8 UNREGISTERED

A Modality Component MUST be in Unregistered state if the system's rules command the unregistration and the Modality Component is no longer authorized to interact with the system (for example if it has to update its access credentials).

6.2.2.9 UNAVAILABLE

A Modality Component MUST be in Unavailable state when it has a failure, when it is unregistered and does not update its registration, or when it lacks resources or must reload them. In short, when the Modality Component is no longer able to correctly perform its task.

 

 

The following list shows the flow between these nine states:

The component MUST pass from the ALIVE state to Loading, Registering or Available state.

The component MUST pass from the LOADING state to Registering or Available state.

The component MUST pass from the REGISTERING state only to Available state.

The component MUST pass from the AVAILABLE state to Idle, Busy Waiting or Unavailable state.

The component MUST pass from the IDLE state to Busy Waiting, Processing or Unavailable state.

The component MUST pass from the BUSY WAITING state to Processing, Idle, Unregistered or Unavailable state.

The component MUST pass from the PROCESSING state to Processing, Idle or Unavailable state.

The component MUST pass from the UNREGISTERED state only to Unavailable state.

Some examples of this flow between the states are:

- Unauthorized Component

ALIVE → AVAILABLE → BUSY WAITING → UNREGISTERED → UNAVAILABLE

First the Modality Component announces that it is ALIVE and declares its AVAILABLE state to the system.
After sending this announcement, the Modality Component enters the BUSY WAITING state, waiting for a response from the system.
The system does not allow the Modality Component to continue joining the system (it is no longer authorized to join it); the Component then passes to the UNREGISTERED state and becomes UNAVAILABLE.

- Failure of a Registered Component

ALIVE → REGISTERING → AVAILABLE → IDLE → PROCESSING → UNAVAILABLE

The Modality Component is ALIVE and announces its address and port to the Resources Manager, which registers this data, allowing the component to pass to the REGISTERING state.
The Modality Component passes to the AVAILABLE state.
When the system is ready to interact with a user, the Modality Component passes to the IDLE state, waiting for a user action.
If the user interacts with the Modality Component, it passes to a PROCESSING state, but then the current process fails and the component becomes UNAVAILABLE.

- Unavailability of a Registered Component

ALIVE → REGISTERING → AVAILABLE → IDLE → PROCESSING → BUSY WAITING → UNAVAILABLE

The Modality Component is ALIVE and announces its address and port. It is allowed to pass to the REGISTERING state.
The Modality Component passes to the AVAILABLE state. When the system is ready to interact with a user, the Modality Component passes to the IDLE state, waiting for a user action.
The user interacts with the Modality Component, and it passes to a PROCESSING state. The process needs an exchange with another component of the system, so it waits for a response.
After a certain time with no response (or after a response making it impossible to continue the process), the current process fails and the component becomes UNAVAILABLE.

- Registration of a Component needing multimodal resources

ALIVE → LOADING → REGISTERING → AVAILABLE → IDLE → PROCESSING → BUSY WAITING → PROCESSING → IDLE

The Modality Component is ALIVE. It needs to load some resources, passing to the LOADING state.
Then the Modality Component announces its address, port and resources, and is allowed to pass to the REGISTERING state.
The Modality Component passes to the AVAILABLE state. When the system is ready to interact with a user, the Modality Component passes to the IDLE state, waiting for a user action.
The user interacts with the Modality Component, and it passes to a PROCESSING state. The process communicates with another component of the system and receives a response.
The process ends and the Modality Component returns to its IDLE state to wait for another user interaction.

6.2.3 AutomaticUpdate

A boolean-valued attribute indicating whether the state of the Modality Component will be automatically updated by UpdateNotification events, i.e. whether the Modality Component will keep sending UpdateNotification events in the future without waiting for another CheckUpdateRequest event. If the Resources Manager is temporarily unavailable, the Modality Component will continue to send messages according to the interval defined by the last timeout information received.
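As a sketch (hypothetical URIs and values), once a CheckUpdateResponse has set AutomaticUpdate="true", the Modality Component can keep reporting at the agreed pace without any new CheckUpdateRequest:

<!-- Sketch: unsolicited periodic notification while automatic updates
     are enabled; URIs and values are hypothetical. -->
<mmi:mmi xmlns:mmi="https://www.w3.org/2008/04/mmi-arch" version="1.0">
    <mmi:UpdateNotification
        mmi:Source="URIForMC"
        mmi:Target="URIForRM"
        mmi:RequestID="request-1"
        mmi:UpdateType="REPORTING"
        mmi:State="PROCESSING">
       <mmi:Timeout sleep="1000" validity="360000" interval="200"/>
    </mmi:UpdateNotification>
</mmi:mmi>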

6.2.4 Metadata

An element with a src attribute to link to external complementary metadata and an info attribute for inline data. The metadata is non-functional information, complementary to the data, which is functional information.
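For example (a sketch mirroring the syntax of the example in section 6.2.6; the URI and the info value are hypothetical):

<mmi:metadata src="URIForMetadata" info="{medium:{visual}, modality:{visual:3D_MESH}}"/>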

6.2.5 Timeout

An element used to temporize the exchanges between components. The values of this element are defined by the Resources Manager. These values can be changed by a Modality Component if it arrives in a state that makes it impossible to preserve the pace of communication (e.g. error, failure, unavailability). This element MUST include three attributes. It MUST include a sleep attribute to define the "communication sleep period", a validity attribute to represent the "communication validity period" in milliseconds and an interval attribute to express the "communication interval" in milliseconds. Example:

<mmi:Timeout sleep="1000" validity="5000" interval="500"/>

6.2.6 Example

CheckUpdateRequest (from MC to RM)
<mmi:mmi xmlns:mmi="https://www.w3.org/2008/04/mmi-arch" version="1.0">
    <mmi:CheckUpdateRequest 
        mmi:Source="URIForMC"
        mmi:Target="URIForRM" 
        mmi:RequestID="request-1"
        mmi:State="LOADING"
        mmi:UpdateType="HANDSHAKE"
        mmi:AutomaticUpdate="true"> 
       <mmi:metadata src="URIForMetadata" info="{medium:{acoustic}, modality:{acoustic:SPEECH}}" />
       <mmi:Timeout sleep="0" validity="500" interval="500"/>
    </mmi:CheckUpdateRequest>
</mmi:mmi>
CheckUpdateResponse (from RM to MC)
<mmi:mmi xmlns:mmi="https://www.w3.org/2008/04/mmi-arch" version="1.0">
    <mmi:CheckUpdateResponse 
        mmi:Source="URIForRM"
        mmi:Target="URIForMC" 
        mmi:RequestID="request-1"
        mmi:State="REGISTERED"
        mmi:UpdateType="HANDSHAKE"
        mmi:AutomaticUpdate="true"
        mmi:data="MCRegistrationID Data">
       <mmi:Timeout sleep="1000" validity="360000" interval="500"/>
    </mmi:CheckUpdateResponse>
</mmi:mmi>
 

6.3 UpdateNotification

This section is Normative.

The UpdateNotification event informs other system components (periodically or not) about changes in the state of a Component. If automatic updates are enabled, the Component may send multiple UpdateNotification messages after a single CheckUpdateRequest message. It shares the Context, Source, Target, RequestID and Data fields with MMI Life Cycle Events. An UpdateNotification event MUST include Source, Target, and RequestID. It MAY include a Data field. It MAY also include a Context field, if the notification pertains to a specific context.

In addition, an UpdateNotification MUST include the additional fields UpdateType, State, and Timeout. The UpdateNotification event can be sent from either the Modality Component to the Resources Manager or from the Resources Manager to the Modality Components.

6.3.1 UpdateType

An attribute that MUST indicate the type of update being notified. Some values can be: Reporting, in the case of an important change to the Modality Component that needs to be reported to the Resources Manager, like a noisy situation in some audio capture, for example. An update notification can also be triggered when the Modality Component uses or produces new data; in this case the UpdateType can be DataUpdate. Finally, a Modality Component may need to inform other components about some user interface changes, for example when the loading of some data is finished and this affects the user interface display. In this case the UpdateType will be InterfaceUpdate.

6.3.2 State

An attribute that MUST indicate the state of the requesting component. These values correspond to the values supported by the CheckUpdate event: Alive, Loading, Registering, Available, Idle, Busy Waiting, Processing, Unavailable, Unregistered.

6.3.3 Timeout

An element used to indicate the pace of the notification process when automatic updates are enabled. This element MUST include three attributes. It MUST include a sleep attribute to define the "communication sleep period", a validity attribute to represent the "communication validity period" in milliseconds and an interval attribute to express the "communication interval" in milliseconds.

6.3.4 Example

UpdateNotification (from MC to RM)
<mmi:mmi xmlns:mmi="https://www.w3.org/2008/04/mmi-arch" version="1.0">
    <mmi:UpdateNotification 
        mmi:Source="URIForMC"
        mmi:Target="URIForRM" 
        mmi:RequestID="request-1"
        mmi:UpdateType="REPORTING"
        mmi:State="BUSY WAITING">
        <mmi:Timeout sleep="1000" validity="360000" interval="200"/>
    </mmi:UpdateNotification>
</mmi:mmi>


6.4 Requirements Addressed by the discovery events

Requirements

Distribution: For notification of failures, progress or delays in distributed processing [MMI-A14], the UpdateNotification ensures periodic requests informing other components if any change occurs in the Modality Component's state. This can support, for example, grammar updates or image recognition updates for a subset of differential data (the general recognized image is the same but one little part of the image has changed, e.g. the face is the same but there is a smile).

On the other hand, if a Modality Component is waiting for some processing provided by another distributed component, the CheckUpdate Event allows the recovery of progress information and the fine-tuning of requests by changing the timeout attribute. This enhances input/output synchronization in distributed environments [MMI-A13].

Advertisement: The use of the timeout attribute helps in the management of the validity of the advertised data. If a Modality Component's communication is out-of-date, the system can infer that the data is at risk of being inaccurate or invalid.

Discovery: The UpdateNotification and the CheckUpdate Event support mediated and passive discovery of Modality Components, by allowing servers or devices to announce their capabilities at bootstrapping and to notify or check availability and session state changes periodically [MMI-C1].

Registration: The UpdateNotification and the CheckUpdate Event, tuned by a timeout mechanism for pull requests, allow the dynamic registration and update of the information about the capabilities of the Modality Component [MMI-G2] or the user preferences [MMI-G13] and profile [MMI-G14] collected on the device.

Querying: The CheckUpdate Event allows the recovery of a small subset of the information provided by the Interaction Manager or the Data Component, to keep the data up to date in the Modality Components as well as in the Data Component.

7. A vocabulary for the annotation of Modality Components

This proposal is designed to support the annotation of Modality Components, to allow their discovery and registration in a multimodal system. The focus is the dynamic discovery of Modality Components as services, using generic information about the underlying properties and types of processes. This information is provided by an announcement and a description (a capabilities manifest, for example) advertised on some network. In this document we illustrate this point with an example of a multimodal greeting service in a smart environment.

The Modality Components can be described with a document whose complexity evolves depending on the application needs. This description can be limited to indications about the Input and Output interfaces, or it can be more detailed, describing functional and non-functional properties inspired by some of the Extensible Multimodal Annotation Markup Language (EMMA) properties [W3C-EMMA 2009] like emma:function, emma:media-type, emma:medium and emma:mode.
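For reference (a hedged sketch, not part of this proposal; the attribute values are illustrative), EMMA expresses such properties as annotations on an interpretation:

<!-- EMMA annotation sketch; values are illustrative. -->
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1"
      emma:medium="acoustic"
      emma:mode="voice"
      emma:function="dialog"
      emma:media-type="audio/amr; rate=8000">
    <!-- application-specific semantics go here -->
  </emma:interpretation>
</emma:emma>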

The meaning of the terms for a controlled vocabulary, in the form of a Glossary for the annotation of Modality Components, is divided in two parts (Figure 8): subsumption terms and behavior terms.


Figure 8: Basic Vocabulary for MC annotation

7.1 Subsumption Description Attributes

Subsumption concerns the attributes classifying the Modality Components. It is structured with metadata classifying the Modality Component according to its membership or association with a Multimodal class in conformance to the modes handled by the System. This first description allows discovery filtering for a precise target mode. There are four properties:

The functions are the technical entities supporting a limited number of modalities according to the semantics of the message and the capabilities of the support itself. A Modality Component acts as a complex set of functions. Each function uses one or more modalities that realize some mode. For example, in Figure 9 the Avatar uses a 3D mesh modality through a visual mode. The functions term defines a list of functions used in the service, ordered by importance and by mode. For example, a gesture recognizer service uses the sign language function, using the single hand gesture modality that is executed in the haptic mode and is perceived in the visual mode.

7.2 Behavior Description Attributes

Finally, the operations term is the IOPE list of the Modality Component capabilities. IOPE means Inputs, Outputs, Preconditions and Effects of a service [YU-2007] [OWL-S].

7.3 Multimodal description example

In Figure 2 the "Face Synthesizer Service" acts in some mode that is perceived by a final user through a modality that is part of some functions, i.e. a face synthesis service acts in the visual mode that is perceived through a 3D mesh modality that is part of an avatar function.


Figure 9: Mode, Modalities, Functions

Thus, for the "Face Syntesizer" service illustrated in Figure 2 the Modality Component's description (description.js document) shows an operation description. It could be a list of other expressions but we propose the smile operation as an example:

{
    "name": "VRML_FACE_SYNTHESIZER",
    "affiliation": "ANIMATED_3D_RENDERER",
    "version": "1.0",
    "endpoints": {
        "1.0": {
            "description": "http://localhost:5000/vrml_face_synthesizer/1-0/description.js",
            "uri": "http://localhost:5000/vrml_face_synthesizer/1-0/"
        }
    },
    "modalities": {
        "visual": ["REALTIME_SYNTHESIZER"]
    },
    "functions": {
        "visual": ["VR_GRAPHICS"]
    },
    "operations": {
        "smile": {
            "method": "POST",
            "endpoint": "http://localhost:5000/vrml_face_synthesizer/1-0",
            "documentation": "Operation to change the expression to a smiling face.",
            "metadata": {"emotion": "emotionML_uri", "behavior": "behaviorML_uri"},
            "input": {
                "key": {
                    "position": 1,
                    "metadata": {"Content-Type": {"cognitive": ["text/plain"]}},
                    "documentation": "The user key to access this API."
                },
                "event": {
                    "position": 0,
                    "metadata": {"Content-Type": {"cognitive": ["ExtensionNotification", "StartRequest"]}},
                    "documentation": "If the event type is extension, the service returns just true or fail (for a steady smile, for example). If the event type is start request (for a time-controlled smile), the service can receive the starting time and returns the acceleration info.",
                    "data": {
                        "metadata": {"Content-Type": {"cognitive": ["data/integer", "data/time"]}},
                        "documentation": "If the event's data is a notification, the event will include the easing integer value for the acceleration. If the event is a StartRequest, the event can also include the start time in milliseconds for the smile process."
                    }
                }
            },
            "output": {
                "event": {
                    "position": 0,
                    "metadata": {"Content-Type": {"cognitive": ["StartResponse"]}},
                    "documentation": "The type of response event.",
                    "data": {
                        "metadata": {"Content-Type": {"cognitive": ["data/integer"]}},
                        "documentation": "In the case of a startRequest, a confirmation of the starting time of the animation."
                    }
                }
            },
            "preconditions": {"documentation": "No precondition is needed other than the loading of the face visual data."},
            "effects": {"documentation": "Asynchronous modality. It will not block the rest of the application rendering."}
        }
    }
}

Code 1: MC Annotation example

 

7.4 Multimodal service query examples

This description can be parsed before the execution of the service, in a discovery process. To call the service and execute a smile operation, the service query with a POST method must be structured as follows:

POST /vrml_face_synthesizer/1-0 HTTP/1.1
Host: localhost:5000
Content-Type: text/xml
<?xml version="1.0"?>
<smile>
  <input>
    <event>
      <mmi:mmi xmlns:mmi="https://www.w3.org/2008/04/mmi-arch" version="1.0">
        <mmi:startRequest source="IM_1" target="smile" context="c_1" requestID="r_1">
          <mmi:data>
            <ease value="0.5"/>
            <starting_time value="300"/>
          </mmi:data>
        </mmi:startRequest>
      </mmi:mmi>
    </event>
  </input>
</smile>

Code 2: POST request to the multimodal service

The smile tag represents the operation that has been requested, the input tag expresses that this is a request, and the event tag contains the MMI Lifecycle event used to control the operation. There can be multiple MMI events inside the input and output elements to support concurrent or parallel commands to the interface, as shown in the sketch below. The MMI Lifecycle event sent to the operation provided by the Modality Component can be any of the events defined to handle inputs in the MMI specification.
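For instance (a hedged sketch with hypothetical request identifiers and event names), two parallel commands to the same operation could be carried as sibling event elements:

<smile>
  <input>
    <event>
      <mmi:mmi xmlns:mmi="https://www.w3.org/2008/04/mmi-arch" version="1.0">
        <mmi:startRequest source="IM_1" target="smile" context="c_1" requestID="r_2"/>
      </mmi:mmi>
    </event>
    <event>
      <mmi:mmi xmlns:mmi="https://www.w3.org/2008/04/mmi-arch" version="1.0">
        <mmi:extensionNotification source="IM_1" target="smile" context="c_1"
            requestID="r_3" name="steadySmile"/>
      </mmi:mmi>
    </event>
  </input>
</smile>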

The POST response of the service will be:

<?xml version="1.0"?>
<smile>
  <output>
    <event>
      <mmi:mmi xmlns:mmi="https://www.w3.org/2008/04/mmi-arch" version="1.0">
        <mmi:startResponse source="smile" target="IM_1" context="c_1" requestID="r_1" status="success"/>
      </mmi:mmi>
    </event>
  </output>
</smile>

Code 3: POST response from the multimodal service

The possible GET request to the REST endpoint for the same service could be:

GET /vrml_face_synthesizer/1-0/IM_1/c_1/event/startRequest/r_1/smile?data[ease]=0.5&data[starting_time]=300 HTTP/1.1
Host: localhost:5000

Code 4: Possible REST request for the multimodal service following the MMI architecture semantics

The possible JSON response to the REST request:

{ "output": {
"event": [{
"mmi": "startResponse",
"context": "c_1", "source": "smile", "target": "IM_1",
"requestID": "r_1",
"status": "success",
"data": {}
}]
}
}

Code 5: Possible JSON response from the multimodal service following the MMI architecture semantics

8. Open Issues

Security techniques are kept separate from the current communication protocol, in the architecture as in this document: we assume that this is a private network. Security issues for this protocol in public networks will be addressed later.

Also, this document focuses on the flow of messages and the building blocks needed to support this flow. The details of the communication between the Interaction Manager and the Resources Manager, as well as the interfaces between the Data Component and the Resources Manager, will be described later.

Another open issue is the management of multiple instances of the Interaction Manager and the flow of messages between them, the Resources Manager and multiple Modality Components.

Finally, a common vocabulary for the description of the Modality Component's attributes, in order to register and compose them, is an important subject to be treated in order to allow better interoperability between multimodal systems. Vocabulary and capabilities will be addressed in a subsequent document.

9. Acknowledgments

The authors wish to acknowledge the contributions by all the members of the Multimodal Interaction Working Group.

Finally, the authors would also like to acknowledge the people outside of the MMI Working Group who helped with the process of developing this document, especially Jean-Claude Moissinac and Isabelle Demeure.

A. References

A.1 Key References

[MMI-ARCH]
Jim Barnett (Ed). Multimodal Architecture and Interfaces. 25 October 2012 W3C Recommendation. URL: https://www.w3.org/TR/mmi-arch/
[MMI-REQ]
Stéphane H. Maes and Vijay Saraswat (Eds). Multimodal Interaction Requirements. 8 January 2003. W3C Working Group Note. URL: https://www.w3.org/TR/mmi-reqs/
[DIS-USE]
B. Helena Rodríguez (Ed). Registration & Discovery of Multimodal Modality Components in Multimodal Systems: Use Cases and Requirements. 15 July 2012. W3C Working Group Note. URL: https://www.w3.org/TR/mmi-discovery/

A.2 Other References

[WEB-INTENTS]
Paul Kinlan. Web Intents. Available at URL: http://webintents.org
[FOLEY-1984]
Foley, J.D., Wallace, V.L. and Chan, P. The Human Factors of Computer Graphics Interaction Techniques. In: IEEE Computer Graphics and Applications, 1984, Vol. 4, No. 11, pp. 13-48. Available at URL: http://ieeexplore.ieee.org
[FOSTER-2002]
Foster, I. et al. The Open Grid Services Architecture. Available at URL: http://www.ogf.org
[BURBECK-1987]
Steve Burbeck. Applications Programming in Smalltalk-80(TM): How to Use Model-View-Controller (MVC). Available at URL: http://www.dgp.toronto.edu
[BOZDAG-2007]
Engin Bozdag, Ali Mesbah and Arie van Deursen. A Comparison of Push and Pull Techniques for AJAX. In: 2007 9th IEEE International Workshop on Web Site Evolution (WSE), 2007, pp. 15-22. Available at URL: http://dx.doi.org/
[OWL-S]
David Martin et al. OWL-S: Semantic Markup for Web Services. W3C Member Submission, 22 November 2004. URL: https://www.w3.org/Submission/OWL-S/
[ADJIE-1999]
Adjie-Winoto, W., Schwartz, E., Balakrishnan, H. and Lilley, J. The Design and Implementation of an Intentional Naming System. Proc. 17th ACM SOSP, Kiawah Island, SC, Dec. 1999.
[YU-2007]
Yu, Liyang. Introduction to the Semantic Web and Semantic Web Services. Chapman & Hall/CRC, 2007.