This document details the responses made by the Multimodal Interaction Working
Group to issues raised during the first
Last Call
(beginning 29 September 2005 and ending 28 October 2005)
and the second Last Call (beginning 9 April, 2007 and ending 30 April, 2007).
Comments were provided by
other W3C Working Groups and the public via the
www-multimodal-request@w3.org
(archive)
mailing list.
This document of the W3C's Multimodal Interaction Working Group describes the
disposition of comments as of 27 September, 2007 on
the first and second Last Call Working Drafts of Extensible Multimodal Annotation (EMMA) Version 1.0.
It may be updated, replaced or rendered obsolete by other W3C documents at any time.
This document describes the disposition of comments in relation to
Extensible Multimodal Annotation (EMMA) Version 1.0
(http://www.w3.org/TR/emma/).
The goal is to allow readers to understand the background
behind the modifications made to the specification. At the same time it provides a useful checkpoint for the people who submitted
comments to evaluate the resolutions applied by the W3C's Multimodal Interaction Working Group.
In this document each issue is described
by the name of the commentator, a description of the
issue, and either the resolution or the reason that the issue was not
resolved. For some of the issues the status is "Waiting Response", because the Working Group did not receive a formal
acceptance/denial or because acceptance was pending on the applied resolution.
This document provides the analysis of the issues that were
submitted and resolved as part of the Last Call Review periods.
Item | Commentator | Nature | Disposition |
---|---|---|---|
WAI-PF-1 | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted |
WAI-PF-2.1 | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted |
WAI-PF-2.2 | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted |
WAI-PF-3 | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted |
WAI-PF-4 | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted |
WAI-PF-5 | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted |
WAI-PF-6 | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted |
WAI-PF-7a | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted |
WAI-PF-7b | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted |
WAI-PF-7c | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted |
WAI-PF-7d | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted |
WAI-PF-7e | Al Gilman (2005-12-14) | Clarification / Typo / Editorial | Accepted |
i18N-1 | Felix Sasaki (2005-10-26) | Technical Error | Accepted |
i18N-2 | Felix Sasaki (2005-10-26) | Technical Error | Accepted |
i18N-3 | Felix Sasaki (2005-10-26) | Technical Error | Accepted |
i18N-4 | Felix Sasaki (2005-10-26) | Technical Error | Accepted |
i18N-5 | Felix Sasaki (2005-10-26) | Clarification / Typo / Editorial | Accepted |
i18N-6 | Felix Sasaki (2005-10-26) | Clarification / Typo / Editorial | Withdrawn |
i18N-7 | Felix Sasaki (2005-10-26) | Technical Error | Accepted |
i18N-8 | Felix Sasaki (2005-10-26) | Technical Error | Accepted |
i18N-9 | Felix Sasaki (2005-10-26) | Feature Request | Accepted |
i18N-10 | Felix Sasaki (2005-10-26) | Technical Error | Accepted |
i18N-11 | Felix Sasaki (2005-10-26) | Technical Error | Accepted |
i18N-12 | Felix Sasaki (2005-10-26) | Technical Error | Accepted |
i18N-13 | Felix Sasaki (2005-10-26) | Feature Request | Accepted |
i18N-2-1 | Richard Ishida (2007-05-02) | Feature Request | Accepted |
i18N-2-2 | Richard Ishida (2007-05-02) | Change to Existing Feature | Accepted |
i18N-2-3 | Richard Ishida (2007-05-02) | Change to Existing Feature | Accepted |
i18N-2-4 | Richard Ishida (2007-05-02) | Feature Request | Accepted |
i18N-2-5 | Richard Ishida (2007-05-02) | Change to Existing Feature | Accepted |
i18N-2-6 | Richard Ishida (2007-05-02) | Technical Error | Accepted |
i18N-2-7 | Richard Ishida (2007-05-02) | Technical Error | Accepted |
SW-1 | Jin Liu (2006-10-2) | Feature Request | Accepted |
SW-2 | Jin Liu (2006-10-2) | Feature Request | Accepted |
SW-3 | Jin Liu (2006-10-2) | Feature Request | Accepted |
SW-4 | Jin Liu (2006-10-2) | Feature Request | Accepted |
VB-A1 | Paolo Baggia (2006-04-03) | Clarification / Typo / Editorial | Accepted |
VB-A1.1 | Paolo Baggia (2006-04-03) | Change to Existing Feature | Accepted |
VB-A1.2 | Paolo Baggia (2006-04-03) | Clarification / Typo / Editorial | Accepted |
VB-A2 | Paolo Baggia (2006-04-03) | Clarification / Typo / Editorial | Accepted |
VB-A3 | Paolo Baggia (2006-04-03) | Clarification / Typo / Editorial | Accepted |
VB-A4 | Paolo Baggia (2006-04-03) | Clarification / Typo / Editorial | Accepted |
VB-A5 | Paolo Baggia (2006-04-03) | Change to Existing Feature | Accepted |
VB-B | Paolo Baggia (2006-04-03) | Clarification / Typo / Editorial | Accepted |
Public-01 | Paolo Martini (2007-04-25) | Change to Existing Feature | Accepted |
ITS-01 | Christian Lieske (2007-05-03) | Feature Request | Accepted |
ITS-02 | Christian Lieske (2007-05-03) | Feature Request | Accepted |
Issue WAI-PF-1
From Al Gilman (2005-12-14):
1. We are concerned that in an approach that focuses on input and output
modalities that are "widely used today" Assistive Technology devices might
be left out in practice. Although theoretically it seems to be possible to
apply EMMA to all types of input and output devices (modalities), including
Assistive Technology, the important question is "Who is going to write the
device-specific code for Assistive Technology devices?"
If this is outside the scope of EMMA, please let us know who we should
address with this question.
Resolution: Rejected
We share the concern of the WAI group as to whether the
introduction of new protocols such as EMMA could adversely
impact assistive technology, and the EMMA subgroup have
discussed this in some detail in response to your feedback.
EMMA is a markup for the representation and annotation of
user inputs and is intended to enable support for modalities
beyond keyboard and mouse such as speech and pen. As such
EMMA can play an important role in
enabling the representation of user inputs from
assistive technology devices. The EMMA group would greatly
welcome your feedback on classifications of different kinds of
assistive devices that could be used as values of emma:mode.
The broader issue concerns providing support for
assistive technologies while minimizing the burden on
application developers building multimodal applications.
We see three ways in which assistive devices
may operate with multimodal applications:
1. The application developer building the interaction
manager (IM) for the multimodal application builds it
specifically with support for particular assistive devices.
The IM might for example use different timeouts or
break up the dialog differently depending on the kind of
assistive device in use. In this case the assistive technology
will produce an EMMA representation of the user input,
annotated to indicate the kind of device it came from, and
the IM will have specific dialog/interaction logic for that device.
2. The application developer does not directly provide support
for the assistive devices but the developer of the
assistive technology provides EMMA as a representation of
the input on the assistive device. For example, for an
application with speech input, the assistive technology would
generate EMMA for the assistive device that looks like a
sequence of words from speech recognition.
3. The third case is more like what we believe is
prevalent today and likely (unfortunately) to remain the case for
most devices where the assistive technology, generally
at an operating system level, serves as an emulator of the
keyboard and/or mouse. In this case, the only way to ensure that
multimodal applications also support assistive devices
is to establish best practices for multimodal application
design. One principle would be that in any case
where the interaction manager expects a verbal input, be it
from speech or handwriting recognition, it will also
accept input from the keyboard. Another would be that if
commands can be issued in one mode, e.g. gui, they can
also be issued in the other, e.g. speech (symmetry
among the modes).
Since EMMA does not provide an authoring language for
interaction management or authoring of applications, this
lies outside the scope of the EMMA specification itself.
Within the MMI group this relates most closely to the
multimodal architecture work and work on interaction management.
The EMMA subgroup are starting to compile a list of
best practices for authoring applications that consume
EMMA but see this as better suited to a separate best
practices Note rather than as part of the EMMA specification.
Issue WAI-PF-2.1
From Al Gilman (2005-12-14):
System and Environment
Composite input should provide environmental information. Since input
is used to define a response, the system response should take into
account environmental conditions that should be captured at input
time. Here are some examples:
Signal to Noise Ratio (SNR)
Lighting conditions
Power changes (may throw out input or prompt user to re-enter information)
In the case of a low SNR you might want to change the volume, pitch,
or if the system provides it - captioning. Sustained SNR issues may
result in noise cancellation to improve voice recognition. This
should be included with EMMA structural elements. Some of these
issues could be reflected in confidence but the confidence factor
provides no information as to why the confidence level is low and how
to adapt the system.
Resolution: Rejected
System and environment issues were initially addressed within the
MMI working group and include the kinds of information described
above along with other factors such as the location of the device.
That work is now called DCCI (Delivery Context Interfaces) and
has moved to the Ubiquitous Web Applications working group:
http://www.w3.org/TR/2005/WD-DPF-20051111/
In the Multimodal architecture work within the MMI group,
DCI (previously DPF) is accessed directly from the interaction
manager, rather than through the annotation of EMMA inputs.
http://www.w3.org/TR/mmi-arch/
We believe it is important for system and environment
information to be accessed directly through DCI from the IM
because the interaction should be able to adapt whether or not the
user provides an input (EMMA only reaches the IM
when the user makes an input).
For example, the interaction manager might adapt and use
visual prompts rather than audio when the SNR is beneath
a threshold. This adaptation should occur regardless of whether
the user has produced a spoken input or not.
One possible reason for attachment of DCCI information
to EMMA documents would be for logging what the conditions
were when a particular input was received. For this case, the
emma:info element can be used as a container for an XML
serialization of system and environment information accessed
through the DCCI.
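For illustration, a minimal sketch of this logging use; the placement of emma:info
and the child elements (env:snr, env:lighting) are hypothetical application/vendor
markup, not part of EMMA or DCCI:
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:env="http://www.example.com/environment">
  <emma:interpretation id="int1" emma:mode="voice" emma:medium="acoustic">
    ...
    <emma:info>
      <env:snr>12.5</env:snr>
      <env:lighting>low</env:lighting>
    </emma:info>
  </emma:interpretation>
</emma:emma>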
Issue WAI-PF-2.2
From Al Gilman (2005-12-14):
User factors
How does the Emma group plan to address user capabilities. ... At
the Emma input level or somewhere else in the system? Example: I may
have a hearing impairment changing the situation for me over another
person. If multiple people are accessing a system it may be important
to address the user and their specific capabilities for adaptive
response.
Resolution: Rejected
Along with system and environment factors, and device factors,
user preferences, e.g. choice of mode, volume level, etc.,
are intended to be accessed using the DCI:
http://www.w3.org/TR/2005/WD-DPF-20051111/
The preferences for a specific user should be queried based
on the user's id from the DCCI and then those preferences used
by the interaction manager to adapt the interaction.
The EMMA group discussed the possibility of having an
explicit user-id annotation in EMMA and concluded that this
information is frequently provided explicitly by the user as
an input and therefore is application data and so should
not be standardized in EMMA. Typically user ids will come from entering
a value in a form and this will be submitted as a user input.
This will either be done directly from XHTML or perhaps in
some cases enclosed in an EMMA message
(e.g. if the user id is specified by voice).
The id may also come from a cookie, or be determined
based on the user's phone number or other more detailed
info from a mobile provider. In all of these cases,
the user id (and other information such as authentication) is
not an annotation of a user input.
A user id may be transmitted as the payload of a piece of
EMMA markup, as application data inside emma:interpretation,
but will not be encoded as an EMMA annotation.
Again, for logging purposes, the user id or information
describing the user could be stored within emma:info.
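For illustration, a minimal sketch in which a spoken user id is carried as
application data (the <userid> element and its value are hypothetical
application markup, not EMMA annotations):
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:mode="voice" emma:medium="acoustic">
    <userid>4711</userid>
  </emma:interpretation>
</emma:emma>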
Issue WAI-PF-3
From Al Gilman (2005-12-14):
Settling time
How does this technology address settling time and multiple keys being hit.
People with mobility impairments may push more than one key,
inadvertently hit specific keys, or experience tremors whereby it
needs to be smoothed. This may or may not effect confidence factors
but again the "why" question comes up. This information may need to
be processed in the drivers.
Resolution: Rejected
The issue appears to be at a different level from EMMA. In many
cases this will be a matter of the driver used for the keyboard
input device. In the case where keyboard input
is used to fill a field in a form and is sent when the
user hits return or a SEND/GO button, any editing or
correction takes place before the input is sent, and the
interaction manager would only see the final string.
If there is a more complex direct interface from the
keystrokes to the interaction manager (each keystroke
being sent individually) then details regarding the
nature of the keyboard input could be encoded in the
application semantics.
Issue WAI-PF-4
From Al Gilman (2005-12-14):
Directional information
Should we have an emma:directional information? Examples are right,
left, up, down, end, top, north, south, east, west, next, previous.
These could be used to navigate a menu with arrow keys, voice reco,
etc. They could be used to navigate a map also. This addresses device
independence. This helps with intent-based events.
We should include into and out of to address navigation up and down
the hierarchy of a document as in DAISY. The device used to generate
this information should be irrelevant. Start, Stop, reduce speed, may
also be an addition. These higher levels of navigation may be used to
control a media player independent of the device.
Resolution: Rejected
Specific intents such as up, down, left, right, etc. are part of the
application semantics and so are not standardized as part of EMMA.
EMMA provides containers for the representation of intents and a way to
specify various kinds of annotations on those intents, but it is
outside the scope of EMMA to standardize the semantic representation of
user intents.
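For illustration only, a sketch of how a directional command might be carried as
application semantics inside an EMMA container (the <command> element and its
value are hypothetical application markup, not EMMA annotations):
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:mode="voice" emma:medium="acoustic"
      emma:tokens="pan left">
    <command>pan-left</command>
  </emma:interpretation>
</emma:emma>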
Issue WAI-PF-5
From Al Gilman (2005-12-14):
Zoom: What about Zoom out?
Resolution: Accepted
In order to clarify the example we will change the
speech from 'zoom' to 'zoom in'. Zoom out is of course
another possible command but this is intended here as
an example rather than an exhaustive presentation of
map manipulation commands.
Issue WAI-PF-6
From Al Gilman (2005-12-14):
Device independence and keyboard equivalents
For the laptop/desktop class of client devices, there has been a "safe
haven" input channel provided by the keyboard interface. Users who
cannot control other input methods have assistive technologies that
at least emulate the keyboard, and so full command of applications is
required from the keyboard. Compare with Checkpoints 1.1 and 1.2 of
the User Agent Accessibility Guidelines 1.0 [UAAG10].
[UAAG10]
http://www.w3.org/TR/UAAG10-TECHS/guidelines.html#gl-device-independence
How does this MMI Framework support having the User Agent supply the
user with alternate input bindings for un-supported modalities
expected by the application?
How will applications developed in this MMI Framework (EMMA
applications) meet the "full functionality from keyboard"
requirement, or what equivalent facilitation is supported?
Resolution: Rejected
The general principle of allowing people to interact
more flexibly depending on needs and device capabilities,
is part of the broader work in the MMI group on multimodal
architecture and interfaces. EMMA is at a different level.
EMMA provides a standardized markup for containing and
annotating interpretations of particular user inputs.
It does not standardize the authoring of the logic of the
application. At the architecture level this is likely to
be a matter of specifying best practices for multimodal
application authoring. There is a need for best practices
at different levels. On one level there should be best practices
for the design of multimodal applications so that they can
support a broad range of modalities and tailor the
interaction (timeouts etc) on the basis of annotations
(e.g. medium, mode) and information from the DCI.
At another, more pragmatic, level of best practices,
multimodal applications should be designed so that, in
addition to supporting new modalities such as speech, they
also support keyboard and mouse, so that assistive devices
which emulate keyboard and/or mouse input can be used to
interact with these applications. One principle
would be that verbal inputs such as speech and handwriting have
'alternate bindings' to keyboard input fields.
Another would be that if an application supports pointing
using a device such as a pen or touchscreen,
it should also support the same pointing interactions
with a mouse (or other pointing mechanisms such as
a trackball).
Issue WAI-PF-7a
From Al Gilman (2005-12-14):
Use cases
To make things more concrete, we have compiled the following use cases
to be investigated by the MMI group as Assistive Technology use cases which
might bear requirements beyond the typical mainstream use cases. We are
willing to discuss these with you in more detail with the goal of coming to
a joint conclusion about their feasibility in EMMA.
(a) Input by switch. The user is using an on-screen keyboard and inputs
each character by scanning over the rows and columns of the keys and hitting
the switch for row and column selection. This takes significantly more time
than the average user would take to type in the characters. Would this
switch-based input be treated like any keyboard input (keyboard emulation)?
If yes, could the author impose time constraints that would be a barrier to
the switch user? Or, alternatively, would this use case require
device-specific (switch-specific) code?
Resolution: Rejected
Imposing time constraints is not something that is done by EMMA;
rather it is a matter of interaction management. In this particular
case we think such constraints are unlikely, since general fields for
keyboard input do not 'time out'. If a switch were being used to
generate substitute speech input then there could be a problem with
timeouts (in fact probably a problem for almost any keyboard input).
Again this may be a matter of best practices, and the best practice
should be that when speech input is supported, keyboard input should
also be supported, and for the keyboard input there should be no timeout.
Issue WAI-PF-7b
From Al Gilman (2005-12-14):
Word prediction. Is there a way for word prediction programs to
communicate with the interaction manager (or other pertinent components of
the framework) in order to find out about what input is expected from the
user? For example, could a grammar that is used for parsing be passed on to
a word prediction program in the front end?
Resolution: Rejected
Again this certainly lies outside the scope of EMMA, since EMMA
does not define grammar formats or interaction management. The W3C
SRGS grammar specification, from the Voice Browser working group
could potentially be used by a word prediction system.
Issue WAI-PF-7c
From Al Gilman (2005-12-14):
User overwrites default output parameters. For example, voice output
could be described in an application with EMMA and SSML. Can the user
overwrite (slow down or speed up) the speech rate of the speech output?
Resolution: Rejected
EMMA is solely used for the representation of user inputs and so
does not address voice output. Within the MMI framework the way to achieve
this would be to specify the user preference for speech output rate
in the DCI and have the interaction manager query the DCI in order to
determine the speech rate. The voice modality component is then responsible
for honoring users' preferences regarding speech including dynamic changes.
The working group responsible for this component is the Voice Browser
working
group and requirements for this mechanism should be raised there.
Issue WAI-PF-7d
From Al Gilman (2005-12-14):
WordAloud (http://www.wordaloud.co.uk/). This is a program that
displays text a word at a time, in big letters on the screen, additionally
with speech output. How could this special output modality be accommodated
with EMMA?
Resolution: Rejected
EMMA is solely used for the representation and annotation of user inputs
and does not address output. At a later stage the EMMA group may address
output, but at this time the language is solely for input.
Issue WAI-PF-7e
From Al Gilman (2005-12-14):
Aspire Reader (http://www.aequustechnologies.com/), This is a
daisy reader and browser that also supports speech output, word
highlighting, enhanced navigations, extra text and auditory
descriptions that explain the page outline and content as you go,
alterative renderings such as following through key points of content
and game control type navigation. Alternative texts are for the
struggling student (for example a new immigrant)
Resolution: Rejected
EMMA is solely used for the representation and annotation of user inputs
and does not address output. At a later stage the EMMA group may address
output, but at this time the language is solely for input.
Issue i18N-5
From Felix Sasaki (2005-10-26):
On terminology: Please reference standards like XForms RELAX-NG, SIP, TCP,
SOAP, HTTP, SMTP, MRCP etc. if you mention them.
Resolution: Accepted
Agreed, although in most cases these would be informative
references rather than normative ones.
Issue i18N-6
From Felix Sasaki (2005-10-26):
Section 2.2 Your list of data models is a little bit confusing. A proposal: List the
DOM, the infoset and the XPath 2.0 data model.
Resolution: Rejected
Section 2.2 is about the use of constraints on the structure and
content of EMMA documents. Your comment seems to be more related
to the data model exposed to EMMA processors.
Issue VB-A1
From Paolo Baggia (2006-04-03):
EMMA profile
Describe in EMMA spec a VoiceXML 2.0/2.1 profile, either in an Appendix
or in a Section of the specification.
This profile should describe the mandatory annotations to allow a
complete integration in a VoiceXML 2.0/2.1 compliant browser.
The VoiceXML 2.0/2.1 requires four annotations related to an input. They
are described normatively in http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml2.3.1 as shadow variables related to a form
input item.
The same values are also accessible from the application.lastresult$
variable, see http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml2.3.1
The annotations are the following:
- name$.utterance
which might be conveyed by "emma:token" attribute (http://www.w3.org/TR/emma/#s4.2.1)
- name$.confidence
which might be conveyed by "emma:confidence" attribute (http://www.w3.org/TR/emma/#s4.2.8)
The range of values seem to be fine: 0.0 - 1.0, but
some checks could be made in the schema of both the specs.
- name$.inputmode
which might be conveyed by "emma:mode" attribute (http://www.w3.org/TR/emma/#s4.2.11)
Proposal 1.1 for a discussion of its values
- name$.interpretation, is an ECMA script value containing
the semantic result which has to be derived by the
content of "emma:interpretation"
As regards the N-best results, see http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml5.1.5 for details, the one-of element
should be suitable to convey them to the voice Browser.
Resolution: Rejected
The Multimodal working group sees significant benefit in the creation
of an EMMA profile for VoiceXML 2.0/2.1. However, the group
rejects the request to have this work within the EMMA specification
itself. The request might best be resolved by a W3C Note on these
issues, or perhaps more broadly on the whole chain that connects a VoiceXML page
to SRGS+SISR grammars and then EMMA to return speech/dtmf results
to VoiceXML. We suggest that this document should be edited by VBWG
with some support from MMIWG.
Issue VB-A1.2
From Paolo Baggia (2006-04-03):
Optional/mandatory
The profile should clarify which is mandatory and which is optional. For
instance N-best are an optional feature for VoiceXML 2.0/2.1, while the
other annotations are mandatory.
Resolution: Accepted (w/modifications)
With respect to the specification of what is
optional and mandatory for the profile, that information
should be part of the EMMA VoiceXML profile, which we propose
should be edited within the VBWG (see VB-A1).
As regards the optional/mandatory status of EMMA
features separate from any specific profile, we
have reviewed them in detail for the whole EMMA
specification and this will be reflected in the
next draft.
Issue VB-A2
From Paolo Baggia (2006-04-03):
Consider 'noinput' and 'nomatch'
Besides user input from a successful recognition, there are several
other types of results that VoiceXML applications deal with that should
be part of a VoiceXML profile for EMMA as well as the ones suggested in
Proposal 1.
'noinput' and 'nomatch' situations are mandatory for VoiceXML 2.0/2.1.
Since EMMA can also represent these, the EMMA annotations for 'noinput'
and 'nomatch' should be part of the VoiceXML EMMA profile.
Note that in VoiceXML 'nomatch' may carry recognition results as
described in Proposal 1 to be inserted in the application.lastresult$
variable only.
Resolution: Deferred
These comments are extremely useful for future versions of EMMA but
go beyond the goal and requirements of the current specification.
Issue VB-A3
From Paolo Baggia (2006-04-03):
DTMF/speech
It is very important that EMMA will be usable for either speech or DTMF
input results, because VoiceXML2.0/2.1 allows both these inputmode
values. We expect that the VoiceXML profile in EMMA will make this clear
to enforce a complete usage of EMMA for Voice Browser applications.
Resolution: Deferred
These comments are extremely useful for future versions of EMMA but
go beyond the goal and requirements of the current specification.
Issue VB-A4
From Paolo Baggia (2006-04-03):
Record results
EMMA can represent the results of a record operation, see the
description of the record element of VoiceXML http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml2.3.6, so the EMMA
annotations for recordings should also be part of a VoiceXML profile.
This feature is optional in VoiceXML 2.0/2.1.
Resolution: Deferred
These comments are extremely useful for future versions of EMMA but
go beyond the goal and requirements of the current specification.
Issue VB-B
From Paolo Baggia (2006-04-03):
EMMA and the evolution of VoiceXML
For the evolution of VoiceXML the current time is too premature to give
precise feedback, but the clear intention is to take care of an extended
usage of EMMA inside a future VoiceXML application.
This includes, but it is not limited to:
- leave access to the whole EMMA document inside the
application.lastresult variable (both a raw EMMA document
and a processed one, i.e. in ECMA-262 format)
- include proper media-types to allow a clear indication
if the raw results are expressed in EMMA or other
formats (e.g. NLSML). The same for the processed results.
Other possible evolutions will be to have a simple way to pass EMMA
results from VoiceXML to other modules to allow further processing.
A last point is that EMMA should be used to return results of Speaker
Identification Verification (SIV) too. Voice Browser SIV subgroup is
working to create a few examples to circulate them to you to get
feedbacks.
We will highly appreciate your comments on these ideas to better address
this subject in the context of the evolution of Voice Browser standards.
Resolution: Deferred
These comments are extremely useful for future versions of EMMA but
go beyond the goal and requirements of the current specification.
Issue i18N-1
From Felix Sasaki (2005-10-26):
Reference to RFC 1738
-------------------------------------------------------------------------
RFC 1738 is obsoleted by RFC 3986 (URI Generic Syntax). It would be good if
you could refer to RFC 3986 instead of 1738. The best thing would be if
you could add a normative reference to RFC 3987
(Internationalized Resource Identifiers (IRIs).
Resolution: Accepted
Agreed, document has been revised as suggested.
Issue i18N-2
From Felix Sasaki (2005-10-26):
General: Reference to RFC1766
--------------------------------------------------
RFC 1766 is obsoleted by 3066 (Tags for the Identification of Languages).
What is essential here is the reference to a BCP (best common practice),
which is for language identification BCP 47. Currently bcp 47 is represented by RFC
3066, so could you change the reference to "IETF BCP 47, currently represented by
RFC 3066"?
Resolution: Accepted
Agreed, document has been revised as suggested.
Issue i18N-3
From Felix Sasaki (2005-10-26):
General and sec. 2.4.1: References to XML and XMLNS
-------------------------------------------------------------------------------------
As for XML, you reference version 1.0. As for XMLNS, you reference version
1.1. Is there a reason for the mismatch of the versions?
Resolution: Accepted
Thank you for pointing this out. We have updated the specification
to reference XML 1.1 and XMLNS 1.1.
Issue i18N-4
From Felix Sasaki (2005-10-26):
Sec. 1.2, definition of "URI: Uniform Resource Identifier"
------------------------------------------------------------------------------------
Here you refer to XML Schema for URIs. It would be good if you could also
refer to the underlying RFCs (see comment 1).
Resolution: Accepted
Agreed, document has been revised as suggested.
Issue i18N-7
From Felix Sasaki (2005-10-26):
On terminology: "An EMMA attribute is prefixed ..." should be "An EMMA
attribute is prefixed (qualified) ...". Also: "An EMMA attribute is not
prefixed ..."
should be "An EMMA attribute is not prefixed (unqualified) ..."
Resolution: Accepted
Thanks for pointing this out. We have investigated the use of these
terms in recent specifications and revised section 2.3 of the EMMA
specification as follows
to clarify the terminology:
"An EMMA attribute is qualified with the EMMA namespace prefix if the attribute
can also be used as an in-line annotation on elements in the application's
namespace.
Most of the EMMA annotation attributes in Section 4.2 are in this category.
An EMMA attribute is not qualified with the EMMA namespace prefix if the
attribute
only appears on an EMMA element. This rule ensures consistent usage of the
attributes across all examples."
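A minimal sketch of this distinction (the <dest> element is hypothetical
application markup): the annotation attribute emma:confidence is qualified
because it can also appear in-line on application elements, while the
structural attribute id, which appears only on EMMA elements, is unqualified:
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="int1" emma:confidence="0.8"
      emma:mode="voice" emma:medium="acoustic">
    <dest emma:confidence="0.6">Boston</dest>
  </emma:interpretation>
</emma:emma>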
Issue i18N-8
From Felix Sasaki (2005-10-26):
Have you thought of using RFC 2119 to indicate requirements levels (e.g.
with "must", "should", "must not" etc.)?
Resolution: Accepted
Agreed, in response we have conducted an extensive
review of the document, revising language as needed and adding
capitalization in accordance with RFC 2119. We have also added a small
paragraph near the beginning of the document indicating this.
Issue i18N-10
From Felix Sasaki (2005-10-26):
Reference to RFC 3023 (MIME media types), e.g. in appendix B.1
-----------------------------------------------------------------------------------------------------
Work is undertaken for a successor of RFC 3023. To be able to take its
changes into account, it would be good if you could change the reference to
RFC 3023 to
"RFC 3023 or its successor." Please have a look at How to Register an
Internet Media
Type for a W3C Specification.
Resolution: Accepted
Agreed, document has been revised as suggested.
Issue i18N-11
From Felix Sasaki (2005-10-26):
Reference to RFC 3023 in appendix B.1, on security considerations
-------------------------------------------------------------------------------------------------------
Please refer to the security considerations mentioned in RFC 3987.
Resolution: Accepted
Agreed, document has been revised as suggested.
Issue i18N-12
From Felix Sasaki (2005-10-26):
It would be good if you could make a clearer difference between normative
and non-normative parts of the specification.
Resolution: Accepted
Agreed, we have reviewed and reorganized the document so that
normative vs informative sections are clearly marked.
Issue i18N-2-6
From Richard Ishida (2007-05-02):
Definition of URI not normative. A definition of URI is given in the Terminology section that defines it in terms of RFC 3986 and XML Schema Part 2:Datatypes, but that section is not normative. We think the definition of URI should be normative
Resolution: Accepted
We will reference RFC 3986 and RFC 3987 where the document first
uses the term "URI" in normative text (section 3.2). We will use the
following text, following the example in XQuery.
"Within this specification, the term URI refers to a Universal Resource
Identifier as defined in [RFC3986] and extended in [RFC3987] with the
new name IRI. The term URI has been retained in preference to IRI to avoid
introducing new names for concepts such as "Base URI" that are defined
or referenced across the whole family of XML specifications."
Issue i18N-2-7
From Richard Ishida (2007-05-02):
IRIs and URIs. [[A URI is a unifying syntax for the expression of names and addresses of objects on the network as used in the World Wide Web (RFC3986). A URI is defined as any legal anyURI primitive as defined in XML Schema Part 2: Datatypes Second Edition Section 3.2.17[SCHEMA2].]]
We are concerned that you are disallowing IRIs here. (Btw, we did propose that you reference RFC 3987 as part of the first comment in a previous review [http://www.w3.org/International/2005/10/emma-review.html], and you agreed to implement that comment, but you seem to have overlooked this aspect.) The XML Schema 1.0 definition of anyURI does not encompass IRIs either (though this will be changed for XMLSchema 1.1).
We suggest that you adopt a definition like that of XQuery. The XQuery definition reads:
"Within this specification, the term URI refers to a Universal Resource Identifier as defined in [RFC3986] and extended in [RFC3987] with the new name IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as "Base URI" that are defined or referenced across the whole family of XML specifications."
Resolution: Accepted
There was no intention to disallow IRIs. We will add the proposed language from the XQuery definition to section 1.2.
Issue i18N-2-2
From Richard Ishida (2007-05-02):
Use of emma:lang. It's not at all clear to us what the difference is between emma:lang and xml:lang, the relationship between them, or when we should use which. (It might help to create examples that show the use of xml:lang as well as emma:lang.) [[In order handle inputs involving multiple languages, such as through code switching, the emma:lang tag MAY contain several language identifiers separated by spaces.]]
This is definitely something you cannot do with xml:lang, but we are wondering what is the value of doing it anyway. We are not sure what benefit it would provide.
Resolution: Accepted
We address each of these two points in turn:
Point 1: ACCEPT Clarification of emma:lang vs xml:lang function
The W3C multimodal working group accept that it is important to make clear the
differences between the xml:lang and emma:lang
attributes and plan to add clarificatory text into the emma:lang
section in the next draft of the EMMA specification. The
xml:lang and emma:lang attributes serve distinct and
equally important purposes. The role of xml:lang is to
indicate the language used for content in an XML element or
document. In contrast, the emma:lang attribute is used to
indicate the language employed by a user when entering an
input into a spoken or multimodal dialog system. Critically,
emma:lang annotates the language of the signal originating
from the user rather than the specific tokens used at a
particular stage of processing. This is most clearly
illustrated through consideration of an example involving
multiple stages of processing of a user input -- the primary
use of EMMA markup. Consider the following scenario:
EMMA is being used to represent three stages in the
processing of a spoken input to a system for ordering
products. The user input is in Italian; after speech
recognition, the user input is first translated into
English, and then a natural language understanding system converts
the English translation into a product ID (which is not in any
particular language). Since the input signal is a user
speaking Italian, the emma:lang will be emma:lang="it" at all of
these stages of processing. The xml:lang attribute, in contrast,
will initially be "it"; after translation the xml:lang will
be "en-US", and after language understanding "zxx", assuming the
use of "zxx" to indicate non-linguistic content.
The following table illustrates the relation between the
content in the EMMA document, the emma:lang and the xml:lang:
------------------------------------------------------------------------
CONTENT: emma:lang xml:lang processing stage
------------------------------------------------------------------------
condizionatore emma:lang="it" xml:lang="it" result from speech recognition
air conditioner emma:lang="it" xml:lang="en" result from machine translation
id1456 emma:lang="it" xml:lang="zxx" result from natural language understanding
The following are examples of EMMA documents corresponding to these
three processing stages, abbreviated to show the critical attributes
under discussion here.
Note that <transcription>, <translation>, and <understanding> are
elements in the application
namespace, not part of the EMMA markup.
<emma:emma>
<emma:interpretation emma:lang="it" emma:mode="voice"
emma:medium="acoustic">
<transcription
xml:lang="it">condizionatore</transcription>
</emma:interpretation>
</emma:emma>
<emma:emma>
<emma:interpretation emma:lang="it" emma:mode="voice"
emma:medium="acoustic">
<translation xml:lang="en">air
conditioner</translation>
</emma:interpretation>
</emma:emma>
<emma:emma>
<emma:interpretation emma:lang="it" emma:mode="voice"
emma:medium="acoustic">
<understanding
xml:lang="zxx">id1456</understanding>
</emma:interpretation>
</emma:emma>
In order to make clear these differences we will add clarifying text and
examples to the specification.
Point 2: Clarification, multiple values in emma:lang:
-----------------------------------------------------
In call center and other applications, multilingual users provide
inputs in which they switch input language in mid utterance. The
emma:lang in these cases needs to indicate that the input
involved more than one language, e.g.
"quisiera hacer una collect call"
The emma:lang in this case would have value "sp en"
<emma:emma>
<emma:interpretation emma:lang="sp en" emma:mode="voice"
emma:medium="acoustic">
<transcription>quisiera hacer una collect
call</transcription>
</emma:interpretation>
</emma:emma>
In order to use xml:lang in this example perhaps an additional element
could be used, e.g. <span>. Would this work?
<emma:emma>
<emma:interpretation emma:lang="sp en" emma:mode="voice"
emma:medium="acoustic">
<transcription xml:lang="sp">quisiera hacer una
<span xml:lang="en">collect call </span></transcription>
</emma:interpretation>
</emma:emma>
Issue i18N-2-3
From Richard Ishida (2007-05-02):
HTTP and HTML meta elements also allow for multiple language tags, but use commas to separate tags, rather than just spaces. It may reduce confusion to follow
the same approach.
Resolution: Rejected
We agree that ',' is used in element content for the
separation of multiple values, but in
this case we are addressing the separation of multiple
values within attribute values. It is common practice in
attributes for values to be space separated.
Furthermore, the value of emma:lang and of other attributes in EMMA
which can hold multiple values, such as emma:medium
and emma:mode, is of type xsd:NMTOKENS, which is a whitespace-separated
list of xsd:NMTOKEN values.
Issue i18N-2-5
From Richard Ishida (2007-05-02):
typo. [[in order handle]]
-> 'in order to handle' ?
Resolution: Accepted
We will correct this typo in the next draft of the
specification.
Issue VB-A1.1
From Paolo Baggia (2006-04-03):
Values of emma:mode
Some clarification should be needed to explain how to map the values of
"emma:mode" (http://www.w3.org/TR/emma/#s4.2.11) to the expected values of the "inputmode" variable
(Table 10 in http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml2.3.1). The voiceXML 2.0/2.1 prescribes two values: "speech"
and "dtmf".
Another option is to adopt in EMMA the exact values expected by VoiceXML
2.0/2.1 to simplify the mapping. Other related fine grained EMMA
annotation are not possible in VoiceXML 2.0/2.1.
Resolution: Accepted
The MMIWG agree that the values of emma:mode of
specific relevance to VXML should be revised in EMMA.
For the current editors' draft, and for the candidate
recommendation, we will change the emma:mode values
in Section 4.2.11 and throughout the
document as follows:
- from "dtmf_keypad" to "dtmf"
- from "speech" to "voice"
Issue VB-A5
From Paolo Baggia (2006-04-03):
Add informative examples
We think that some informative examples will improve the description of
the profile. This might include a SISR grammar for DTMF/speech and the
expected EMMA results to be compliant to the profile.
The examples should include both single result returned and N-best
results.
We think that also an alternative example of lattices would be very
interesting, even if in the VoiceXML 2.0/2.1 it will not be
representable, but nonetheless it will be useful for the evolution of
VoiceXML, see point B below.
Resolution: Deferred
These comments are extremely useful for future versions of EMMA but
go beyond the goal and requirements of the current specification.
Issue Public-01
From Paolo Martini (2007-04-25):
Node anchoring on signal time axis. I approached only recently EMMA and I have some problems understanding
the temporal anchoring of an emma:node.
I would instinctively expect a node to correspond to what ISO 8601
calls an "instant", a "point on the time axis".
With reference to paragraph 3.4, if I read correctly the document:
1. An emma:node can be anchored with absolute or relative timestamps.
In the absolute mode, the optional emma:start and emma:end attributes
seem to allow a duration, while in the relative mode, the optional
emma:offset-to-start (with emma:duration not allowed) seems to force an
instant status.
If, conceptually, a node is allowed to correspond to a segment of the
signal, I would welcome a comment on the rationale for that. If not, I
would suggest to replace emma:start and emma:end with a single "time
point"-like attribute or, at least, to forbid emma:end, implicitly
adding ambiguity in the semantics of emma:start.
2. An emma:arc implicitly asserts the existence of two nodes, but I
would say that the temporal attributes of the arcs, if present, define
those nodes. A node could be therefore defined more than once. I
simplify the example in 3.4.2:
<emma:arc from="1" to="2"
emma:start="1087995961542" emma:end="1087995962042">flights </emma:arc>
<emma:arc from="2" to="3"
emma:start="1087995962042" emma:end="1087995962542">to</emma:arc>
Being node 2 the same, what if emma:end in the "flights" arc and
emma:start in the "to" arc do not have the very same value?
Again, if this is conceptually allowed, I would welcome an explanation
of the rationale. Otherwise, I would prefer enforcing a coherent
description directly in the language instead of relying on validity
checks. For example, restricting the "definition" of nodes inside
node:element, i.e. forbidding timestamps in arcs.
I went through the document and the list archive and I wasn't able to
find answers to these doubts. Nevertheless, I apologize if these points
have already been addressed.
Thanks for your help and your work,
Paolo Martini
Resolution: Accepted (w/modifications)
You are correct that emma:node elements are intended to correspond to instants.
Regarding 1., we agree that as it stands the ability to place both
emma:start and emma:end on emma:node appears to allow a duration. This
is an error in the current draft as we did not intend for emma:start and emma:end
to be used on emma:node. In the next draft of the EMMA specification and
the corresponding schema we will remove the emma:start and emma:end
attributes from emma:node.
The primary motivation for the addition of
emma:node was to provide a place for annotations
which apply specifically to nodes rather than to
arcs. For example, in some representations of speech recognition lattices,
confidences or weights are placed on nodes in addition to arcs.
For this reason we define both nodes and arcs.
It is critical that arcs can carry both timestamps and the 'from'/'to' node
annotations, as they serve different purposes. The role of
the 'from' and 'to' annotations on arcs is to define the topology of the
graph. The timestamps emma:start and emma:end, on the other hand, are annotations
which describe temporal properties associated not necessarily with the arc but
with the label on the arc. There is in fact no guarantee that the emma:end on
'flights' in your example will be equivalent to the emma:start on 'to'.
If they were required to be the same, the transition point from one arc to the next
would have to be assigned to an arbitrary point in the silence between the two
words. Similarly, if there is no silence between two words in sequence, and
they in fact share a geminate consonant (for example "well lit" or "gas station"),
word timings from the recognizer may in fact overlap; that is, the end of
the arc for the word "well" may be later than the beginning of the arc for "lit".
Perhaps an even stronger case for having both time and the 'from'/'to'
annotations is that, in the lattice representation, being at a particular
time point does not guarantee that you are at the same node in the lattice.
For example, imagine a lattice representing two possible strings:
'to boston'
'two blouses'
The lattice representation:
<emma:lattice initial="1" final="4">
<emma:arc from="1" to="2" start="1000" end ="2000">to</emma:arc>
<emma:arc from="1" to="3" start="1000" end ="2000">two</emma:arc>
<emma:arc from="2" to="4" start="2000" end ="4000">boston</emma:arc>
<emma:arc from="3" to="4" start="2000" end ="4000">blouses</emma:arc>
</emma:lattice>
Note that even though the first two arcs end at the same time point
those arcs lead to different states 2 vs. 3, encoding which path
has been taken in the graph.
The critical factor here is that the lattice representation does not
necessarily have to correspond to a time sequence. The lattice representation
is used to encode a range of possible interpretations of a signal. It is
often the case that the left to right sequence of symbols in the lattice corresponds to
time but there is no guarantee. For example, the lattice may represent
interpretations of a typed text string rather than speech. It is also possible that
a semantic representation encoded as a lattice could have time annotations
on the first arc which are later than time annotations on the final arc.
Since lattices represent abstractions over the signal we cannot assume
that time annotations define their topology.
In order to clarify this we will add text to the
specification making clear that lattices represent abstractions of the
signal, and that time annotations may describe labels rather than arcs.
We would greatly appreciate it if you would review this response and
respond within three weeks indicating whether this resolves
your concern. If we do not receive a response within three weeks we
will assume that this response resolves your concern.
Issue i18N-9
From Felix Sasaki (2005-10-26):
Sec. 4.2.15 on references to a grammar
--------------------------------------------------------------
You identify a grammar by an URI. It might also be useful to be able to say
"just a french grammar", without specifying which one. That is, to have a
mechanism
to specify the relations like general vs specific between grammars.
Resolution: Rejected
We do not see any important use cases addressed by this
potential feature. Specifically, we don't believe that specifying 'just a
french grammar' would provide sufficient additional information over and
above the information provided by the 'emma:lang' attribute to make it worth
adding. This is due to the fact that it is only through successful
processing using a language-specific grammar that the processor can identify
the language used by the speaker in the first place.
Issue i18N-13
From Felix Sasaki (2005-10-26):
Is it possible to apply the emma:lang annotation also to tokens?
Resolution: Rejected
There is no language associated with the contents of emma:tokens. In many
cases, this attribute value will not be meaningful to the casual
reader. For instance,
it may describe the phonemes or phonetic units for speech
recognition. Proper nouns or
shared words such as 'no' in English and Spanish may appear in the grammars
for several
languages, though the meaning may be identical and the system may not care
which language applied.
It is proper to say that emma:tokens and emma:lang provide information
about the user's
input but not that emma:lang describes the language of the contents of
emma:tokens.
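To sketch the point (the <answer> element and the annotation values shown are
purely illustrative): emma:lang describes the user's input, not the language of
the contents of emma:tokens; the same token ('no' here) might equally have come
from a Spanish grammar:
<emma:emma>
  <emma:interpretation emma:lang="en" emma:mode="voice" emma:medium="acoustic"
      emma:tokens="no">
    <answer>no</answer>
  </emma:interpretation>
</emma:emma>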
Issue i18N-2-1
From Richard Ishida (2007-05-02):
There is no language attribute available for use on the emma:literal element. Please add one.
Resolution: Accepted (w/modifications)
Every emma:literal element appears within an emma:interpretation
element, and the emma:lang attribute is permitted on emma:interpretation. Therefore, there is no need for another emma:lang attribute on the emma:literal element. For consistency we prefer that emma:lang appear on emma:interpretation rather than having emma:lang potentially appear on both elements. With respect to xml:lang, we will clarify in the specification that the xml:lang attribute
can appear on any EMMA element (including emma:literal).
Issue i18N-2-4
From Richard Ishida (2007-05-02):
Use of xml:lang="". In XML 1.0 you can indicate the lack of language information using xml:lang="". How does EMMA allow for that with xml:lang and emma:lang? We feel it ought to. See http://www.w3.org/International/questions/qa-no-language
Resolution: Accepted (w/modifications)
Thank you for raising this important issue. In addressing this issue and reading related documents such as
(http://www.w3.org/International/questions/qa-no-language),
we determined that in addition to the use of emma:lang="" we should also
address the use of emma:lang="zxx". Below we address each in turn:
1. Non-linguistic input (emma:lang="zxx"):
------------------------------------------
Given the use of EMMA for capturing multimodal input, including input
using pen/ink, sensors, computer vision, etc., there are many EMMA results
that capture non-linguistic input. Examples include drawing areas, arrows,
etc.
on maps, and music input for tune recognition. This raises the question
of how non-linguistic inputs should be annotated for emma:lang. Following
on from
the use in xml:lang, we propose that non-linguistic input should be
marked using the value "zxx". Since we already refer to BCP 47 and use the
values from the IANA subtag registry for emma:lang values this does not require revision
of the EMMA markup. We will, however, add an example and clarifying text to the
EMMA specification indicating the use of emma:lang="zxx" for non-linguistic
inputs.
To illustrate the difference between emma:lang and xml:lang for this
kind of case: hummed input to a tune recognition application would be
emma:lang="zxx" since the input is not in a human language, but if the result was a
song title in English, that would be marked as xml:lang="en":
<emma:emma>
<emma:interpretation emma:lang="zxx" emma:mode="tune"
emma:medium="acoustic">
<songtitle xml:lang="en">another one bites the dust</songtitle>
</emma:interpretation>
</emma:emma>
2. Non-specification (emma:lang="")
-----------------------------------
Parallel to your suggested usage for xml:lang
(http://www.w3.org/International/questions/qa-no-language),
cases in which there is no information about
whether the source input is in a particular human language, and if so
which language, are annotated as emma:lang="".
Furthermore, in cases where there is no explicit emma:lang
annotation, and none is inherited from a higher element in the
document, the default value for emma:lang is "", meaning
that there is no information about whether the source
input is in a language and if so which language.
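Following the pattern of the example above, a minimal sketch of an input for
which no language information is available:
<emma:emma>
  <emma:interpretation emma:lang="" emma:mode="voice" emma:medium="acoustic">
    ...
  </emma:interpretation>
</emma:emma>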
Issue SW-1
From Jin Liu (2006-10-2):
Suggest use of EMMA to represent output using emma:result element.
Resolution: Deferred
The current scope of the EMMA specification is to provide
a framework for representing and annotating user inputs.
There are considerably more issues to address and work
needed to give an adequate representation of user output
and so for the current specification document the multimodal
working group have chosen to defer work on output. For
example, how would graphical output be handled, if the
system is going to draw ink, display a table, or zoom a map?
There has been interest in output representation both inside
and outside the working group. In a future version of EMMA we
may consider this topic, and would at that time return to
your contribution and others we have received.
Issue SW-2
From Jin Liu (2006-10-2):
USING EMMA FOR STATUS COMMUNICATION AMONG COMPONENTS
PROPOSAL TO ADD EMMA ANNOTATIONS FOR STATUS COMMUNICATION
AMONG COMPONENTS:
emma:status
emma:actual-answer-time
emma:expected-answer-time
emma:query-running
Resolution: Rejected
The scope of EMMA is to provide a representation and annotation
mechanism for user inputs to spoken and multimodal systems. As
such, status communication messages among processing components
fall outside the scope of EMMA and are better addressed as part of the
MMI architecture outside of EMMA. We are forwarding this feedback to
the architecture and authoring subgroups within the W3C Multimodal
working group. This contribution is of particular interest to the
authoring effort.
Issue SW-3
From Jin Liu (2006-10-2):
OOV
=======================================================================
PROPOSAL TO ADD EMMA:OOV MARKUP FOR INDICATING PROPERTIES OF
OUT OF VOCABULARY ITEMS:
emma:oov
<emma:arc emma:from="6" emma:to="7"
emma:start="1113501463034"
emma:end="1113501463934"
emma:confidence="0.72">
<emma:one-of id="MMR-1-1-OOV"
emma:start="1113501463034" emma:end="1113501463934">
<emma:oov emma:class="OOV-Celestial-Body"
emma:phoneme="stez"
emma:grapheme="sters"
emma:confidence="0.74"/>
<emma:oov emma:class="OOV-Celestial-Body"
emma:phoneme="stO:z"
emma:grapheme="staurs"
emma:confidence="0.77"/>
<emma:oov emma:class="OOV-Celestial-Body"
emma:phoneme="stA:z"
emma:grapheme="stars"
emma:confidence="0.81"/>
</emma:one-of>
</emma:arc>
Resolution: Rejected
While the ability to recognize and annotate the
presence of out-of-vocabulary items appears extremely
valuable, the EMMA group are concerned as to how many
recognizers will in fact provide this capability. Furthermore,
to develop this proposal fully, significant time would have to
be assigned. Therefore we believe that the proposed
annotation for oov is best handled as a vendor-specific
annotation. EMMA provides an extensibility mechanism for
such annotations through the emma:info element. The
current markup from your feedback above does not conform to the
EMMA XML schema, as it contains emma:one-of within
a lattice emma:arc. Also, the timestamps on the emma:one-of
may not be necessary since they match those on emma:arc.
The oov information could alternatively be encoded as a vendor
or application specific extension
using emma:info as follows:
<emma:arc emma:from="6" emma:to="7"
emma:start="1113501463034"
emma:end="1113501463934"
emma:confidence="0.72">
<emma:info>
<example:oov class="OOV-Celestial-Body"
phoneme="stez"
grapheme="sters"
confidence="0.74"/>
<example:oov class="OOV-Celestial-Body"
phoneme="stO:z"
grapheme="staurs"
confidence="0.77"/>
<example:oov class="OOV-Celestial-Body"
phoneme="stA:z"
grapheme="stars"
confidence="0.81"/>
</emma:info>
</emma:arc>
Issue SW-4
From Jin Liu (2006-10-2):
In dialog applications it is important to distinguish between
each distinct turn. The xs:nonNegativeInteger annotation specifies
the turn ID associated with an element.
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/emma/emma10.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation turn-id="42">
...
</emma:interpretation>
</emma:emma>
Resolution: Accepted
We agree that it is important to have an annotation indicating the turn id, and we adopt your suggestion.
We have added a new section to the specification:
4.2.17 Dialog turns: emma:dialog-turn attribute
The emma:dialog-turn annotation associates the EMMA result in the container
element with a dialog turn. The syntax and semantics of dialog turns is
left open to suit the needs of individual applications. For example, some applications
may use an integer value, where successive turns are represented by successive integers. Other
applications may combine a name of a dialog participant with an integer value
representing the turn number for that participant. Ordering semantics for comparison of
emma:dialog-turn is deliberately unspecified and left for applications to define.
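A minimal sketch of the new annotation (the value shown, combining a
hypothetical participant name with a turn number, is purely illustrative):
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:dialog-turn="user-3"
      emma:mode="voice" emma:medium="acoustic">
    ...
  </emma:interpretation>
</emma:emma>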
Issue ITS-01
From Christian Lieske (2007-05-03):
i. Allowing ITS markup in EMMA.
With this provision in place, EMMA could for example easily carry for
example information on directionality, or ruby. Your example
[emma:tokens="arriving at 'Liverpool Street'"] could for example be enhanced
by local ITS markup (see
http://www.w3.org/TR/its/#basic-concepts-selection-local) as follows in
order to explicitly encode directionality information: [its:dir="ltr"
emma:tokens="arriving at 'Liverpool Street'"]. Please note, that the EMMA
design decision to encode tokens in an attribute prevents a decoration of
individual tokens. With an elements-based encoding of tokens, the example
[<tokens> arriving at 'Liverpool Street'</tokens>] furthermore could be
enhanced by local ITS markup as follows in order to explicitly encode the
fact that 'Liverpool Street' is a specific type of linguistic unit ('span'
by the way is an element which ITS recommends): [<tokens>arriving at <span
its:term="yes">Liverpool Street</span></tokens>"].
Aside: We have considered your response on tokens in
http://lists.w3.org/Archives/Public/public-i18n-core/2006JulSep/0074.html
while crafting this suggestion. We felt, that ITS-annotations to tokens
despite of your response would be valuable.
Resolution: Rejected
EMMA provides different mechanisms for representing captured input and the various stages of semantic analysis that follow. We agree that there are situations where ITS markup is appropriate within an EMMA document and that the 'emma:tokens' attribute does not permit embedded ITS annotations. The restricted content model of emma:tokens has been intentionally chosen to make common use cases simple. There are other approaches with greater expressive power where ITS annotations may be specified.
EMMA anticipates a rich diversity of user inputs (e.g. keyboard entry, speech, handwriting input) and provides multiple mechanisms for representing that input. The 'emma:tokens' attribute is the most limited of these. Other mechanisms such as 'emma:signal' and the emma:derivation element offer far more freedom. To better explain these different mechanisms, we offer some background and walk through two illustrative examples showing how user input may be represented and/or summarized at various levels within the semantic analysis. We expect this review will better explain where 'emma:tokens' is appropriate.
Issue ITS-02
From Christian Lieske (2007-05-03):
ii. Creating an ITS Rule file (see
http://www.w3.org/TR/its/#link-external-rules) along with the EMMA
specification (e.g. as a non-normative appendix).
With this in place, localization/translation would become easier in case
EMMA instances or parts of EMMA instances (eg. an "interpretation") would
need to be transferred from one natural language to another one.
Several EMMA elements and attributes contain text. Most, if not all
localization tools (as well as ITS) assume element content is translatable
and attribute content is not translatable. However in EMMA, this assumption
does not seem to be valid. The EMMA element "interpretation" for example
does not seem to contain immediate translatable content, and the EMMA
attribute "tokens" in some circumstances might have to be translated.
While this is fine because tools have ways to specify an element should not
be translated, it is very often quite difficult to know *which elements* or
*which attributes* should behave like that. Having a list of elements that
are non-translatable (or conversely if there are more non-translatable than
translatable elements) would help a lot. This list could be expressed using
ITS rules (see http://www.w3.org/TR/its/#basic-concepts-selection-global)
relating to "its:translate" (see "its:translate" see
http://www.w3.org/TR/its/#trans-datacat). This way all user of translation
tools (or other language-related applications such as machine-translation
engines, etc.) could look up that set of rules and process accordingly.
For the examples given above, an ITS rules file could be as simple as:
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
<its:translateRule selector="//interpretation" translate="no"/>
<its:translateRule selector="//@tokens" translate="yes"/>
</its:rules>
Resolution: Deferred