Voice Browser Working Group Charter
The mission of the Voice Browser Working Group,
part of the Voice Browser
Activity, is to enable users to speak and
listen to Web applications by creating standard languages for
developing Web-based speech applications. The Voice Browser Working
Group concentrates on languages for capturing and producing speech
and managing the dialog between user and computer, while a related
Group, the Multimodal Interaction Working Group,
concentrates on additional input modes including keyboard and
mouse, ink and pen, etc.
Summary Table
| End date | 31 January 2009 |
| Confidentiality | Proceedings are Member-only, but the group sends regular summaries of ongoing work to the public mailing list. |
| Initial Chairs | Jim Larson, Scott McGlashan |
| Initial Team Contacts (FTE %: 100) | Kazuyuki Ashimura, new hire |
| Usual Meeting Schedule | Teleconferences: Weekly; Face-to-face meetings: 3 to 4 per year |
Background
The telephone was invented in the 1870s and remains a very
important means of communication. The Web by
comparison is very recent, but has rapidly become a competing
communications channel. The convergence of telecommunications and the
Web is now bringing the benefits of Web technology to the telephone,
enabling Web developers to create applications that can be accessed
via any telephone, and allowing people to interact with these
applications via speech and telephone keypads. The W3C Speech
Interface Framework is a suite of markup specifications aimed at
realizing this goal. It covers voice dialogs, speech synthesis, speech
recognition, telephony call control for voice browsers and other
requirements for interactive voice response applications, including
use by people with hearing or speaking impairments.
Some possible applications include:
- Accessing business information, including the corporate "front
desk" asking callers who or what they want, automated telephone
ordering services, support desks, order tracking, airline arrival
and departure information, cinema and theater booking services, and
home banking services.
- Accessing public information, including community information
such as weather, traffic conditions, school closures, directions
and events; local, national and international news; national and
international stock market information; and business and e-commerce
transactions.
- Accessing personal information, including calendars, address
and telephone lists, to-do lists, shopping lists, and calorie
counters.
- Helping users communicate with other people by sending
and receiving voice-mail and email messages.
Under previous charters, dating back to 2000, the Voice Browser
Working Group has created the W3C Speech Interface
Framework suite of specifications, which includes:
- VoiceXML 2.0
Recommendation, 16 March 2004, and VoiceXML 2.1 Last Call Working
Draft, 28 July 2004, specify the flow control and exchange of
information between users and computers.
- Speech
Recognition Grammar Specification 1.0 Recommendation, 16 March
2004, specifies the words and phrases that a speech recognition
system can convert from speech to text.
- Semantic Interpretation
for Speech Recognition Last Call Working Draft, 8 November
2004, specifies how semantic results are extracted from the text returned
by a speech recognition system and formatted for use by an application.
- SSML 1.0
Recommendation, 7 September 2004,
specifies how to render text as human-like speech by a speech
synthesis system.
- Pronunciation Lexicon 1.0 Last Call Working Draft, 26 October
2006, specifies how words are pronounced. This information is used by
the speech synthesis system to render words as human-like speech,
and by the speech recognition system to convert human
speech to text.
- CCXML Working
Draft, 22 November 2006, specifies how to manage the telephone
system (answer incoming calls, initiate outgoing calls, create
conference calls, etc.).
- State Chart XML (SCXML): State Machine Notation for Control
Abstraction Working Draft, 24 January 2006, specifies the
dialog flow of a speech or multimodal application. The dialog flow
is separate from the capture and rendering of information.
In addition to the above, a list of other documents produced
by the Voice Browser Activity is available.
Scope
All work items carried out under this Charter must fall
within the scope defined by this section.
- VoiceXML 2.1
- VoiceXML 2.1 extends VoiceXML 2.0 with eight
new features. The Group plans to take VoiceXML 2.1 through
to Recommendation status.
- VoiceXML 3.0
- VoiceXML 3.0 is the next major release of VoiceXML. VoiceXML
3.0 will provide powerful dialog capabilities for building
advanced speech applications, in a form that can be easily and cleanly integrated
with other W3C languages. VoiceXML 3.0 will provide enhancements to
existing dialog and media control, as well as major new features
(e.g., multimedia prompts, VCR controls, speaker identification and verification, modularization, a
cleaner separation between data/flow/dialog, and asynchronous
external eventing) to facilitate interoperation with external
applications and media components. The Group will create multiple profiles
of VoiceXML 3.0 that enable subsets of VoiceXML 3.0 to target
specific use cases (e.g., handheld computers and cell phones
with too few resources for full VoiceXML). The Group plans to continue work
on VoiceXML 3.0 and to publish several iterations of the
document.
- State Chart XML
- SCXML 1.0 is a generic XML control language based on Harel
State Charts. Although SCXML was designed as a control language for
VoiceXML 3.0 and for Multimodal Interaction dialog management,
SCXML may also be used to control other types of applications. The Group
plans to take SCXML 1.0 through to Recommendation status.
- Speech synthesis
- SSML 1.1 enhances SSML 1.0 to better support widely spoken East Asian,
Indian and Middle Eastern languages in a manner that improves its
usefulness in other languages as well. It also updates SSML 1.0 to
be more consistent with PLS, SISR and expected VoiceXML 3.0
functionality. The Group plans to take SSML 1.1 through to Recommendation
status. The Group may begin work on SSML 2.0, which would restructure SSML
1.1, enhance the <say-as> element and the
role attribute, and possibly provide additional
enhancements (for example, emotion
elements).
- Speech recognition grammars
- This covers context-free grammars and statistical models of
speech, together with DTMF input. SRGS 1.0 for
context-free grammars is already a full Recommendation. The Group may
resume work on N-gram (statistical) models of speech.
- Pronunciation Lexicon
- Pronunciation Lexicon Specification (PLS 1.0) provides the
basis for describing pronunciation information for use in speech
recognition and synthesis, and for tuning applications, e.g. for
proper names that have irregular pronunciations. The Group plans to take
PLS 1.0 to full Recommendation. The Group may enhance
the role attribute, possibly with a registry.
- Semantic interpretation for speech recognition
- SISR 1.0 describes annotations to grammar rules for extracting
the semantic results from recognition, either as XML or as a value
that can be held in an ECMAScript variable. The target for the XML
output is EMMA (Extensible Multimodal
Annotation Markup Language) which is being developed in the W3C
Multimodal Interaction Activity.
- Telephony call control for voice browsers (CCXML 1.0)
- CCXML 1.0 is an XML language for controlling connections,
conferences, and dialogs in a Voice Browser context. The Group plans to
take CCXML 1.0 through to Recommendation status. The Group may also consider enhancing CCXML 1.0.
- Maintenance work
- The Working Group will maintain its existing (or
soon-to-be) Recommendations: VoiceXML 2.0, VoiceXML 2.1, SRGS 1.0,
SSML 1.1, SISR 1.0, PLS 1.0, SCXML 1.0, and CCXML 1.0. Maintenance
consists of responding to questions and requests on the
public mailing list, issuing errata as needed, and possibly
publishing minor updates to the specifications.
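To make the SCXML entry's phrase "based on Harel State Charts" more concrete, here is a minimal Python sketch of the idea that distinguishes state charts from flat state machines: states nest, and an event the active state does not handle is offered to its ancestors. This is not SCXML syntax and not the SCXML processing algorithm. The class names, states (`call`, `dialog`, `prompt`, ...) and events (`prompt.done`, `caller.hangup`, ...) are hypothetical, and parallel states, history, entry/exit actions, guards and the data model are all omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class State:
    """One state in a hypothetical, minimal Harel-style chart."""
    name: str
    parent: Optional["State"] = None
    initial: Optional[str] = None  # initial child of a compound state
    transitions: Dict[str, str] = field(default_factory=dict)  # event -> target


class StateChart:
    def __init__(self) -> None:
        self.states: Dict[str, State] = {}
        self.current: Optional[State] = None

    def add(self, name: str, parent: Optional[str] = None,
            initial: Optional[str] = None,
            transitions: Optional[Dict[str, str]] = None) -> State:
        # Parents must be added before their children.
        parent_state = self.states[parent] if parent is not None else None
        state = State(name, parent_state, initial, dict(transitions or {}))
        self.states[name] = state
        return state

    def _settle(self, state: State) -> State:
        # Enter initial children until an atomic (leaf) state is reached.
        while state.initial is not None:
            state = self.states[state.initial]
        return state

    def start(self, name: str) -> None:
        self.current = self._settle(self.states[name])

    def send(self, event: str) -> bool:
        # Harel-style lookup: try the active leaf, then each ancestor in turn.
        state = self.current
        while state is not None:
            target = state.transitions.get(event)
            if target is not None:
                self.current = self._settle(self.states[target])
                return True
            state = state.parent
        return False  # no state handles the event, so it is discarded


chart = StateChart()
chart.add("call", initial="dialog", transitions={"caller.hangup": "ended"})
chart.add("dialog", parent="call", initial="prompt")
chart.add("prompt", parent="dialog", transitions={"prompt.done": "listen"})
chart.add("listen", parent="dialog",
          transitions={"recognized": "confirm", "nomatch": "prompt"})
chart.add("confirm", parent="dialog")
chart.add("ended")

chart.start("call")
print(chart.current.name)    # prompt
chart.send("prompt.done")    # prompt -> listen
chart.send("nomatch")        # listen -> prompt
chart.send("caller.hangup")  # handled by the ancestor "call"
print(chart.current.name)    # ended
```

Because `caller.hangup` is declared once on the outer `call` state, it ends the call from any nested dialog state; reusing ancestor transitions this way is what keeps a hierarchical chart smaller than the equivalent flat machine.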
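As an illustration of the Pronunciation Lexicon entry, where a lexicon supplies pronunciations for proper names that default rules would get wrong, the Python sketch below parses a tiny PLS-style document with the standard library's `xml.etree.ElementTree` and looks words up by spelling. It assumes the `lexeme`/`grapheme`/`phoneme` element names and the namespace used by the PLS 1.0 drafts; the Last Call draft cited above may differ in detail, and `alias`, `example` and the `role` attribute are ignored. The lexicon entries and the helper names `build_index` and `lookup` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# A tiny, hypothetical lexicon of irregularly pronounced place names.
LEXICON_XML = """\
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Worcester</grapheme>
    <phoneme>ˈwʊstər</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Leicester</grapheme>
    <phoneme>ˈlɛstər</phoneme>
  </lexeme>
</lexicon>
"""


def build_index(xml_text: str) -> Dict[str, List[str]]:
    """Map each grapheme (spelling) to the phonemes of its lexeme."""
    root = ET.fromstring(xml_text)
    index: Dict[str, List[str]] = {}
    for lexeme in root.findall(f"{{{PLS_NS}}}lexeme"):
        phonemes = [p.text for p in lexeme.findall(f"{{{PLS_NS}}}phoneme")]
        for grapheme in lexeme.findall(f"{{{PLS_NS}}}grapheme"):
            index.setdefault(grapheme.text, []).extend(phonemes)
    return index


def lookup(index: Dict[str, List[str]], word: str) -> List[str]:
    """Return lexicon pronunciations; an empty list means the engine
    would fall back to its own letter-to-sound rules."""
    return index.get(word, [])


index = build_index(LEXICON_XML)
print(lookup(index, "Worcester"))  # ['ˈwʊstər']
print(lookup(index, "Boston"))     # []
```

The same index could serve both directions described in the charter: a synthesizer reads the phonemes aloud, and a recognizer adds them as alternative pronunciations for the spelling.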
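The central idea in the semantic interpretation entry is that different spoken phrases can yield the same machine-readable result, delivered either as a script value or as XML. The Python sketch below illustrates only that idea: a hypothetical rule maps each accepted phrase to a semantic value, returned as a dictionary (standing in for an ECMAScript object) or serialized as simple XML. Real SISR attaches ECMAScript tags to SRGS grammar rules, and the XML produced here is not EMMA; the rule contents and the function names `interpret` and `to_xml` are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical rule: several spoken phrases share one semantic value,
# loosely mirroring SRGS alternatives that each carry a SISR-style tag
# such as out.city = "NYC".
CITY_RULE: Dict[str, Dict[str, str]] = {
    "boston": {"city": "BOS"},
    "new york": {"city": "NYC"},
    "the big apple": {"city": "NYC"},
}


def interpret(utterance: str) -> Optional[Dict[str, str]]:
    """Return the semantic result for a recognized utterance, or None on no match."""
    result = CITY_RULE.get(utterance.strip().lower())
    return dict(result) if result is not None else None  # copy, so callers can't mutate the rule


def to_xml(result: Dict[str, str], root_tag: str = "result") -> str:
    """Serialize a flat semantic result as simple XML (not EMMA markup)."""
    root = ET.Element(root_tag)
    for key, value in result.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


value = interpret("The Big Apple")
print(value)          # {'city': 'NYC'}
print(to_xml(value))  # <result><city>NYC</city></result>
```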
Success Criteria
- For each document to advance to Proposed Recommendation, the group
will typically need to show that each feature of the technical report
has two independent and interoperable implementations.
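This exit criterion amounts to a per-feature count. Here is a minimal Python sketch, assuming hypothetical test-suite results recorded as `(feature, implementer, passed)` tuples. It counts the distinct implementers that pass each feature's tests and lists the features that fall below the threshold. It does not test interoperability between implementations itself, and the feature and vendor names are made up.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (feature, implementer, passed): a hypothetical test-result record.
Result = Tuple[str, str, bool]


def features_at_risk(results: Iterable[Result], required: int = 2) -> List[str]:
    """Return, sorted, the features with fewer than `required` distinct passing implementers."""
    passing: Dict[str, Set[str]] = {}
    for feature, implementer, passed in results:
        impls = passing.setdefault(feature, set())  # record every feature seen
        if passed:
            impls.add(implementer)  # a set, so repeat reports count once
    return sorted(f for f, impls in passing.items() if len(impls) < required)


results = [
    ("say-as", "VendorA", True),
    ("say-as", "VendorB", True),
    ("prosody", "VendorA", True),
    ("prosody", "VendorB", False),
    ("audio", "VendorA", True),
    ("audio", "VendorA", True),  # same implementer twice still counts once
]
print(features_at_risk(results))  # ['audio', 'prosody']
```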
Deliverables
The following documents are expected to become W3C
Recommendations:
The following documents are either notes or are not expected to
advance toward Recommendation:
The following documents may be revised depending upon the
interest of working group members:
Milestones
This Working Group is chartered to last until 31 January 2009.
The first face-to-face meeting after re-chartering will be held
in May or June 2007.
Here is a list of milestones identified at the time of
re-chartering. Others may be added later at the discretion of the
Working Group. The dates are for guidance only and subject to
change.
Note: The group will document significant changes from this initial schedule on the group home page.
| Document | Requirements | First Public Working Draft | Last Call Working Draft | Candidate Recommendation | Proposed Recommendation | Recommendation |
| CCXML 1.0 | Completed | Completed | 1Q2007 | 2Q2007 | 3Q2007 | 3Q2007 |
| PLS 1.0 | Completed | Completed | Completed | 2Q2007 | 3Q2007 | 4Q2007 |
| SISR 1.0 | Completed | Completed | Completed | 2Q2007 | 3Q2007 | 4Q2007 |
| SSML 1.1 | 1Q2007 | 1Q2007 | 2Q2007 | 3Q2007 | 4Q2007 | 1Q2008 |
| VoiceXML 2.1 | Completed | Completed | Completed | 11/2006 | 12/2006 | 1Q2007 |
| VoiceXML 3.0 | 1Q2007 | 3Q2007 | 3Q2008 | TBD | TBD | TBD |
| SCXML 1.0 | 1Q2007 | Completed | 3Q2007 | 1Q2008 | 3Q2008 | 3Q2008 |
Dependencies
The Group may need to interact with the following related activities
in ways yet to be determined, for example by asking them to review this Group's
draft specifications, or by drawing on their work to meet this Group's own
needs. Collaboration across working groups will be
essential to realizing the mission of the Voice Browser
Activity.
W3C-related activities
The following groups are identified as being related to the work
of this group.
- Internationalization — The specifications of the VBWG
are expected to be usable worldwide and be adapted to a wide
variety of languages. An ongoing strong relationship with the I18N
groups is essential to achieve this goal.
- Multimodal Interaction WG — The MMIWG has a strong
link to the VBWG as it is chartered to develop specifications that
allow the Web to be used with any modality, not just
voice.
- Synchronized Multimedia — VoiceXML 3.0 will introduce
advanced media controls, involving timing and synchronization
features borrowed from SMIL.
- WAI Protocols and Formats — The VBWG expects that its
work will be reviewed by the WAI-PF group, in order to ensure
universal accessibility of the produced specifications.
- Hypertext Coordination Group — The "backplane"
framework that is being developed by the groups belonging to the
HCG: HTML, Web Applications, XForms, Compound Document Formats,
etc. needs to be compatible with the VBWG's Data-Presentation-Flow
framework, introduced in the design of VoiceXML 3.0.
- XML and Semantic Web Activities — Because the
specifications developed in the VBWG are all based on XML, the
group will follow the work of the XML Activity in order to keep
them compatible with the ongoing evolution of XML. Similarly, many
specifications in the VBWG express metadata using RDF. Therefore,
cooperation with the Semantic Web Best Practices group is expected
if questions arise on the use of RDF.
- Security — The Speaker Verification and Identification
features of VoiceXML 3.0 will benefit from review by the Web
Security Activity.
- Emotion Incubator Group — the Group may consider
making some extensions to support the recognition or presentation of emotions
in speech.
External groups
Here is a list of external groups with complementary goals to
the Voice Browser Activity:
- ECMA TC32-TG11 — computer supported telecommunications
applications (CSTA)
- ETSI — work on DSR codecs, call control, human factors
and command vocabularies
- IETF SpeechSC working group or its successor — protocols for accessing
speech engines
- ISO/IEC JTC 1/SC 37 Biometrics — user
authentication
- ITU — telecommunication standards
- SALT Forum — tags for adding speech to HTML and other
markup languages
- VoiceXML Forum — an industry association for VoiceXML,
see memorandum of understanding
Participation
To be successful, the Voice Browser Working Group is expected to
have 15 or more active participants
for its duration. Effective participation in the Voice Browser Working
Group is expected to consume one work day per
week for each participant and two days
per week for editors. The Voice Browser Working Group will
also allocate the necessary resources for building Test Suites for
each specification. In order to make rapid progress, the Voice
Browser Working Group consists of several subgroups, each working
on a separate document. Voice Browser Working Group members may
participate in one or more subgroups.
Participants are reminded of the Good Standing requirements of the W3C Process.
To become a participant in the Working Group, a representative
of a W3C Member organization must be nominated by their Advisory
Committee Representative as
described in the W3C Process. The associated IPR disclosure
must further satisfy the
requirements specified in the W3C Patent Policy (5 February
2004 Version).
Experts from appropriate communities may also be invited to join
the working group, following the
provisions for this in the W3C Process.
Working Group participants are not obligated to participate in
every work item; however, the Working Group as a whole is
responsible for reviewing and accepting all work items.
Face-to-face meetings will be arranged 3 to 4 times a
year. The Chair will make Working Group meeting dates and locations
available to the group in a timely manner according to the W3C Process. The Chair
is also responsible for providing publicly accessible summaries of
Working Group face-to-face meetings, which will be announced on
www-voice@w3.org.
Communication
This group primarily conducts its work on the Member-only mailing
list w3c-voice-wg@w3.org (archive).
Certain topics need coordination with external groups. The Chair and
the Working Group can agree to discuss these topics on a public
mailing list. The archived mailing
list www-voice@w3.org
is used for public discussion of W3C proposals for Voice Browsers and
for public feedback on the group's deliverables.
Information about
the group (deliverables, participants, face-to-face meetings,
teleconferences, etc.) is available from the
Voice Browser Working
Group home page.
All proceedings of the Working Group (mail archives, telecon
minutes, face-to-face minutes) will be available to W3C
Members. Summaries of face-to-face meetings will be sent to the public list.
Decision Policy
As explained in the Process Document (section 3.3), this group will seek to make decisions
when there is consensus. When the Chair puts a question and
observes dissent, after due consideration of different opinions,
the Chair should record a decision (possibly after a formal vote)
and any objections, and move on.
This charter is written in accordance with Section 3.4, Votes of the W3C Process Document and
includes no voting procedures beyond what the Process Document
requires.
Patent Policy
This Working Group operates under the W3C Patent Policy (5 February 2004 Version). To promote
the widest adoption of Web standards, W3C seeks to issue
Recommendations that can be implemented, according to this policy,
on a Royalty-Free basis.
For more information about disclosure obligations for this
group, please see the W3C Patent Policy Implementation.
About this Charter
This charter for the Voice Browser Working Group has been
created according to section 6.2 of the Process
Document. In the event of a conflict between this document or
the provisions of any charter and the W3C Process, the W3C Process
shall take precedence.
Please also see the previous charter
for this group.
Note: This charter was modified on 26 November 2007 to include the
informative note in section 4.1 referring readers to the home page of
the group for updated milestone information.
James A. Larson, Co-chair, Voice Browser Working Group
Max Froumentin, Voice Browser Activity Lead
Kazuyuki Ashimura, Voice Browser Working Group staff contact
Copyright © 2007 W3C® (MIT, ERCIM, Keio), All Rights Reserved.
$Date: 2007/11/26 23:18:30 $