Google, Mountain View
1st February 2006
Outline of today's talk
- The Ubiquitous Web
- Speech Enabling Web Pages
- Web Presentations
- Web Meetings
- Business Opportunities
- Concluding Remarks
The Ubiquitous Web
Ubiquitous Computing
Ubiquitous Web
W3C Ubiquitous Web Workshop
Tokyo, March 2006, driven by widespread interest
in Web Applications and the potential to go further.
- To explore the vision of the Web as a distributed
applications platform that works across a wide range of
devices in areas such as offices, home networks, mobile,
automotive, aviation, etc. with the potential for
increasing the range and reducing the cost of developing
and deploying such applications.
- To explain how current W3C work fits into this vision,
e.g. work on Web Application APIs, Delivery Context
Interfaces, Device Descriptions, Multimodal Architecture,
etc.
- To identify and prioritize additional areas which
would benefit from standardization, in particular, the
integration of sessions and device coordination into web
applications, as a means to enable the benefits described
in the Call for Participation.
Current Work
- Web APIs Working Group aims to standardize established
Web scripting interfaces, e.g. the Window object and XMLHttpRequest
- Delivery Context Interfaces (DCI) model user preferences,
device capabilities and environmental conditions as a hierarchy
of DOM nodes
- This is intended to enable applications to dynamically adapt
to the context, and to provide access to a wide range of
services
- We have the framework and now the challenge is to work
together to build this out
- Multimodal Architecture and Interfaces describes a way to
loosely couple user interface components to interaction managers
via DOM events
- The IETF Widex Working Group is developing protocols for
exchanging DOM events and DOM updates between applications
and remote user interfaces
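The Delivery Context Interfaces bullet above describes context properties held as a hierarchy of nodes that applications can read and react to. The following is only a rough sketch of that idea: the node shape, the helper names (`createNode`, `findNode`, `setValue`) and the property names are hypothetical, and are not taken from the DCI specification.

```javascript
// Hypothetical property tree; names are illustrative, not the DCI API.
function createNode(name, value) {
  return { name: name, value: value, children: [], listeners: [] };
}

function appendChild(parent, child) {
  parent.children.push(child);
  return child;
}

// Resolve a path such as "device/battery/level" starting from a root node.
function findNode(root, path) {
  return path.split("/").reduce(function (node, part) {
    if (!node) return null;
    return node.children.find(function (c) { return c.name === part; }) || null;
  }, root);
}

// Update a value and notify listeners, loosely in the spirit of DOM events.
function setValue(node, value) {
  const old = node.value;
  node.value = value;
  node.listeners.forEach(function (fn) { fn(node, old, value); });
}

const root = createNode("context");
const device = appendChild(root, createNode("device"));
const battery = appendChild(device, createNode("battery"));
const level = appendChild(battery, createNode("level", 80));

// An application adapts by listening for changes to a property.
const seen = [];
level.listeners.push(function (node, oldValue, newValue) {
  seen.push(node.name + ": " + oldValue + " -> " + newValue);
});
setValue(findNode(root, "device/battery/level"), 15);
```

An application could use such a listener to switch to a low-power presentation when the battery level drops.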
IETF Widex Working Group
Developing a protocol for remote user interfaces based
upon the Model-View-Controller paradigm, where the UI is
expressed in terms of an XML DOM and the protocol is
independent of the markup language.
+-----------------------------+          +---------------+
|         Widex Server        |          | Widex Renderer|
| +-------+  ..............   |          | +-----------+ |
| |       |  .            .--------------->|           | |
| |       |  .    View    .   |  Updates | |           | |
| |       |  . (Virtual)  .<---------------|           | |
| |       |  ..............   |          | |   View    | |
| | Model |                   |          | |           | |
| |       |  +------------+   |          | |           | |
| |       |  |            |<---------------| (XML DOM) | |
| |       |  | Controller |   |  Events  | |           | |
| |       |  |            |--------------->|           | |
| +-------+  +------------+   |          | +-----------+ |
+-----------------------------+          +---------------+
See draft-ietf-widex-requirements-00.txt,
V. Stirbu (Nokia) and D. Raggett (W3C/Canon),
January 12th, 2006
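The split between server and renderer described above can be mimicked in a few lines. This is a toy sketch only: the message shapes (`{ type, target }` for events, `{ op, target, text }` for updates) are hypothetical and say nothing about the actual Widex wire format.

```javascript
// Hypothetical message shapes; the actual Widex wire format is not shown here.
// Server side: the controller turns a UI event into model changes and
// returns the updates needed to bring the remote view in line.
function createServer() {
  const model = { count: 0 };
  return {
    handleEvent: function (event) {
      if (event.type === "click" && event.target === "increment") {
        model.count += 1;
      }
      return [{ op: "setText", target: "counter", text: String(model.count) }];
    }
  };
}

// Renderer side: the view is a map from element id to text content,
// standing in for an XML DOM.
function createRenderer() {
  const view = { counter: "0" };
  return {
    view: view,
    applyUpdates: function (updates) {
      updates.forEach(function (u) {
        if (u.op === "setText") view[u.target] = u.text;
      });
    }
  };
}

const server = createServer();
const renderer = createRenderer();
// Events flow renderer -> server, updates flow server -> renderer.
renderer.applyUpdates(server.handleEvent({ type: "click", target: "increment" }));
renderer.applyUpdates(server.handleEvent({ type: "click", target: "increment" }));
```

The renderer never touches the model; it only forwards events and applies whatever updates come back.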
Note: at the request of OMA and several W3C
members, W3C has started work on a solution for streaming updates
for SVG documents that will also work with other XML languages.
Streaming introduces timing related requirements, and the W3C
and IETF groups will coordinate their work on this.
What's missing or can be improved upon?
- Managing resources within temporary or persistent
sessions
- workarounds exist using cookies and embedding session
information within URIs, but a more flexible framework is
needed especially for resources and bindings that last
beyond individual Web pages
- Extending device capabilities via network resources
- e.g. printers, projectors, speech synthesis and recognition,
natural language translation, geographic location, etc.
- need a way to discover such resources and bind them into
the current session
- Support for applications involving multiple devices
- with the means to pass events between devices
- URIs for naming devices, services and sessions
- enabling the use of rich metadata (the Semantic
Web) for resource discovery, acting across different kinds of
networks, and leveraging the distributed nature of the Web
Exposing Device Coordination to the Web
- Registering what services a device provides
- Discovering what services are available
- Could be local or remote
- May be physically nearby, but on different networks
- Binding to a service
- Using a service
- Relinquishing a service
- How to expose existing device coordination frameworks
to Web applications?
- UPnP, WSD, Jini, Salutation, ...
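The register / discover / bind / use / relinquish steps above might be exposed to a page along the following lines. This is a hedged sketch with a hypothetical in-memory registry; the method names, service types and session names are assumptions, and a real implementation would sit on top of one of the discovery frameworks just listed.

```javascript
// Hypothetical in-memory registry; all names here are illustrative.
function createRegistry() {
  const services = [];   // registered service descriptions
  const bindings = {};   // service id -> name of the session holding it
  return {
    register: function (id, type, location) {
      services.push({ id: id, type: type, location: location });
    },
    discover: function (type) {
      return services.filter(function (s) { return s.type === type; });
    },
    bind: function (id, session) {
      if (bindings[id]) return false;      // already in use
      bindings[id] = session;
      return true;
    },
    relinquish: function (id, session) {
      if (bindings[id] !== session) return false;
      delete bindings[id];
      return true;
    }
  };
}

const registry = createRegistry();
registry.register("printer-1", "printer", "local");
registry.register("tts-1", "speech-synthesis", "remote");

const found = registry.discover("printer");
const bound = registry.bind(found[0].id, "session-a");
const contested = registry.bind("printer-1", "session-b");   // refused while held
const released = registry.relinquish("printer-1", "session-a");
const reBound = registry.bind("printer-1", "session-b");     // now succeeds
```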
The DOM and Distributed Services
- Web application identifies need for a service
- e.g. speech synthesis and recognition
- It discovers and binds the service
- This exposes the service to the local DOM but hides
the details of how it is implemented
- Local interface can be described in IDL
and exploited via markup or scripting
- For a remote speech engine, the local interface acts
as a proxy for the speech engine
- The implementation could make use of Web Services,
or other protocols
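One way to picture the local interface acting as a proxy is shown below. It is a sketch under stated assumptions: `createSpeechProxy`, `speak` and the two stand-in backends are hypothetical names, and neither backend talks to a real speech engine.

```javascript
// Hypothetical speech interface: the page calls speak() and never sees
// whether the engine is embedded or reached over the network.
function createSpeechProxy(backend) {
  return {
    speak: function (text, onDone) {
      backend.synthesize(text, onDone);
    }
  };
}

// Two stand-in backends. Neither talks to a real engine; each just reports
// what it would have done.
const embeddedBackend = {
  synthesize: function (text, onDone) { onDone("embedded:" + text); }
};
const networkBackend = {
  synthesize: function (text, onDone) { onDone("network:" + text); }
};

const results = [];
function record(result) { results.push(result); }

// The calling code is identical for both implementations.
createSpeechProxy(embeddedBackend).speak("good afternoon", record);
createSpeechProxy(networkBackend).speak("good afternoon", record);
```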
Options for adding speech capabilities
- Handling speech modality in the network
- Loose coupling of modality interfaces
- e.g. XHTML locally with VoiceXML in the network,
with CCXML for high level flow control
- Handling speech modality in the browser
- Embedded vs networked speech
- latency, quality, vocabulary, network, battery, etc.
- Plugin vs local speech proxy
- Standard scripting interface?
Latency
- Simple commands with visual actions
- up, down, select, . . .
- Feels slow if delay is much greater than 100 ms
- Dialogue turn hand over
- When user stops talking (or pauses)
- When application stops talking (or pauses)
- Seizing the turn (aka barge-in)
- User or application talks over the other party
- Network delays are not as bad as they seem
Using AJAX to add speech
- AJAX = JavaScript access to HTTP
- XMLHttpRequest object
- Supported by most modern Web browsers
- Local HTTP server handles device audio
- ALSA on Linux, and winmm.dll on Windows
- Open source speech codec for compression
- Remote HTTP server provides speech services
- ASR with audio in HTTP request,
and EMMA in HTTP response
- TTS with text or SSML in HTTP request,
and audio in HTTP response
HTTP for Speech Services
- Speech Synthesis
- http://localhost:8888/say?text="good afternoon"
- http://localhost:8888/say?uri=<ssml file>
- Speech recognition
- http://localhost:8888/hear?uri=<srgs file>
- Additional parameters for
- Listening on multiple grammars
- Single result vs sequence of results
- Time out parameters
- Additional command for pre-loading grammars
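The request URLs above can be assembled with a couple of small helpers. This is a sketch only: the helper names are hypothetical, the grammar address is a placeholder, and it assumes the server accepts percent-encoded values without the surrounding quotes shown in the first example. Only the `text` and `uri` parameters from the slide are used, since the additional parameters are not named.

```javascript
// Base address taken from the examples above; helper names are hypothetical.
const SPEECH_BASE = "http://localhost:8888";

// Build a synthesis request from either literal text or the URI of an SSML file.
function sayUrl(options) {
  if (options.text !== undefined) {
    return SPEECH_BASE + "/say?text=" + encodeURIComponent(options.text);
  }
  return SPEECH_BASE + "/say?uri=" + encodeURIComponent(options.uri);
}

// Build a recognition request from the URI of an SRGS grammar file.
function hearUrl(grammarUri) {
  return SPEECH_BASE + "/hear?uri=" + encodeURIComponent(grammarUri);
}

const greeting = sayUrl({ text: "good afternoon" });
const listen = hearUrl("http://example.com/pizza.grxml");
```

In a page, the resulting strings would be passed to `XMLHttpRequest.open` with the response handled in an asynchronous callback.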
SRGS + SISR → EMMA
- Use W3C Recommendations for speech
grammars and semantic interpretation
<rule id="order">
<tag>var index=0; out.pizza = new Array();</tag>
<item repeat="0-1"><ruleref uri="#start"/></item>
<item>
<ruleref uri="#pizza"/>
<tag>out.pizza[index]=$pizza; index+=1;</tag>
</item>
<item repeat="0-">
<item><token>and</token></item>
<item>
<ruleref uri="#pizza"/>
<tag>out.pizza[index]=$pizza; index+=1;</tag>
</item>
</item>
<item repeat="0-1"><ruleref uri="#stop"/></item>
</rule>
Pizza Grammar
I would like four small cheese pizzas with olives and peppers
[<start>] [<number>] [<size>] <type> (pizza | pizzas) [with <extras>] [<stop>]
<start> ::= I want | I would like | I'll have | I'd like | I'd love | Give me
<stop> ::= thanks | please | if you please
<number> ::= a | one | two | ... | nine
<size> ::= small | medium | large
<type> ::= cheese | pepperoni | sausage
<extras> ::= <topping> [[and] <topping>]*
<topping> ::= mushroom | olives | onions | peppers | tomatoes
<emma:interpretation>
<pizza>
<size>small</size>
<number>4</number>
<type>cheese</type>
<topping>olives</topping>
<topping>peppers</topping>
</pizza>
</emma:interpretation>
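To make the mapping from words to interpretation concrete, here is a small hand-written matcher for the single-pizza grammar above. It is a sketch, not an SRGS/SISR processor: the function name and the shape of the returned object are assumptions, and the elided numbers ("...") are assumed to be three through eight.

```javascript
// Hand-written matcher for the grammar above; not an SRGS/SISR processor.
const NUMBERS = { a: 1, one: 1, two: 2, three: 3, four: 4, five: 5,
                  six: 6, seven: 7, eight: 8, nine: 9 };
const SIZES = ["small", "medium", "large"];
const TYPES = ["cheese", "pepperoni", "sausage"];
const TOPPINGS = ["mushroom", "olives", "onions", "peppers", "tomatoes"];
const STARTS = ["i want", "i would like", "i'll have", "i'd like", "i'd love", "give me"];
const STOPS = ["if you please", "thanks", "please"];   // longest phrase first

function parsePizza(utterance) {
  let text = utterance.toLowerCase().replace(/[.,!?]/g, "").trim();
  // Strip at most one optional <start> and one optional <stop> phrase.
  const start = STARTS.find(function (s) { return text.startsWith(s + " "); });
  if (start) text = text.slice(start.length + 1);
  const stop = STOPS.find(function (s) { return text.endsWith(" " + s); });
  if (stop) text = text.slice(0, -(stop.length + 1));

  const words = text.split(/\s+/);
  const pizza = { number: null, size: null, type: null, toppings: [] };
  let i = 0;
  if (Object.prototype.hasOwnProperty.call(NUMBERS, words[i])) {
    pizza.number = NUMBERS[words[i]]; i += 1;
  }
  if (SIZES.includes(words[i])) { pizza.size = words[i]; i += 1; }
  if (!TYPES.includes(words[i])) return null;          // <type> is required
  pizza.type = words[i]; i += 1;
  if (words[i] !== "pizza" && words[i] !== "pizzas") return null;
  i += 1;
  if (words[i] === "with") {
    i += 1;
    // <extras> ::= <topping> [[and] <topping>]*
    if (!TOPPINGS.includes(words[i])) return null;
    pizza.toppings.push(words[i]); i += 1;
    while (i < words.length) {
      if (words[i] === "and") i += 1;
      if (!TOPPINGS.includes(words[i])) return null;
      pizza.toppings.push(words[i]); i += 1;
    }
  }
  return i === words.length ? pizza : null;            // reject trailing words
}

const order = parsePizza("I would like four small cheese pizzas with olives and peppers");
```

Utterances outside the grammar yield `null`, which is where a flexible dialogue would step in and ask the user to rephrase.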
Pizza Grammar
A slightly more complex grammar allows for
several kinds of pizza to be requested at once
Give me a medium pepperoni pizza
and a large cheese pizza with peppers and onions.
<emma:interpretation>
<pizza>
<number>1</number>
<size>medium</size>
<type>pepperoni</type>
</pizza>
<pizza>
<number>1</number>
<size>large</size>
<type>cheese</type>
<topping>peppers</topping>
<topping>onions</topping>
</pizza>
</emma:interpretation>
Application to Ordering Pizza
- Implemented in XHTML+CSS+JavaScript
- Supports compound utterances
- Faster than filling out forms via GUI
- But requires flexible dialogue to work around
inevitable misunderstandings
- DIY solution for describing behavior
- Combination of scripting and markup
- Markup interpreted via JavaScript
- Can be made to work across browsers
- Experimentation before standardization
Modeling Behavior
- Scripted handlers for XHTML events, e.g.
onload, onmouseover, onfocus, onchange
- Asynchronous callbacks for HTTP responses
- Used to handle results of speech recognition
- Initiated via calls to XMLHttpRequest
- Asynchronous timers (setTimeout)
- Use of custom markup
- Application state, dialogue goals and history
- Event driven state transition rules
- Behavior can be modelled at a higher level server-side
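Event driven state transition rules of the kind mentioned above could be written as a small table plus a dispatcher. The states, event names and rule format below are hypothetical and chosen only for illustration; they are not the custom markup used in the demo.

```javascript
// Hypothetical dialogue states and events for a pizza order.
const RULES = [
  { from: "idle",       on: "heard-order", to: "confirming" },
  { from: "confirming", on: "heard-yes",   to: "done" },
  { from: "confirming", on: "heard-no",    to: "idle" },
  { from: "confirming", on: "timeout",     to: "confirming" }
];

function createDialogue() {
  const dialogue = { state: "idle", history: [] };
  // Called from event handlers, HTTP callbacks, or timers.
  dialogue.dispatch = function (eventName) {
    const rule = RULES.find(function (r) {
      return r.from === dialogue.state && r.on === eventName;
    });
    if (!rule) return false;          // no rule matches: ignore the event
    dialogue.history.push(dialogue.state + " --" + eventName + "--> " + rule.to);
    dialogue.state = rule.to;
    return true;
  };
  return dialogue;
}

const dialogue = createDialogue();
dialogue.dispatch("heard-order");
dialogue.dispatch("heard-no");
dialogue.dispatch("heard-yes");    // ignored: not valid in the idle state
dialogue.dispatch("heard-order");
dialogue.dispatch("heard-yes");
```

Keeping the history alongside the state gives the application something to consult when it has to recover from a misunderstanding.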
Logging
- Usability is based upon real world experience
- That means you need to collect lots of data
- Log dialogues and audio for later analysis
- Speech server logs ASR and TTS requests
- AJAX used for logging dialogue state
- Including changes via visual modality
- Application assigned session identifier
- Used to associate log entries for same session
- Must be sent as part of all server requests
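The session identifier requirement above amounts to stamping every log entry before it leaves the page. A minimal sketch follows, assuming a hypothetical entry format (`session`, `seq`, `kind`, `detail`) and a `send` callback that stands in for the HTTP request to the logging server.

```javascript
// Hypothetical logger; the entry fields and id format are illustrative.
function createSessionLogger(sessionId, send) {
  let sequence = 0;
  return function log(kind, detail) {
    sequence += 1;
    // Every entry carries the session id so the server can group entries.
    send({ session: sessionId, seq: sequence, kind: kind, detail: detail });
  };
}

// Stand-in for an HTTP request to the logging server.
const sent = [];
const log = createSessionLogger("session-42", function (entry) {
  sent.push(JSON.stringify(entry));
});

log("asr", { heard: "two large cheese pizzas" });
log("gui", { field: "size", value: "medium" });   // change made via the visual modality
```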
Remarks
- Complex utterances are more natural but require
a more flexible approach for effective dialogues
- Exposing speech to Web pages via JavaScript offers
flexibility for rolling your own solutions whilst
remaining inter-operable across browsers
- Open questions include
- Whether to access speech via a plugin, or via
AJAX and a locally installed HTTP server?
- Whether to pass audio within HTTP or to use a
concurrent RTP-based stream?
- There is an opportunity for a standard speech object
that abstracts away from embedded vs networked speech
Web Presentations
- Web-based alternative to PowerPoint
- No more need for large email attachments
- Just include the link to your slides
- Create and update your slides in your web browser
- HTML Slidy uses XHTML, CSS and JavaScript
- Each slide marked up in a div element with class="slide"
- Font size automatically adapts to window size
- Incremental revealing of slide contents
- Different backgrounds for different slides
- Outline lists for extra details
- Automatically created table of contents
- Slidy style sheets and script available as Open Source
Incremental display of slide contents
For incremental display, use class="incremental", for
instance:
- First bullet point
- Second bullet point
- Third bullet point
which is marked up as follows:
<ul class="incremental">
<li>First bullet point</li>
<li>Second bullet point</li>
<li>Third bullet point</li>
</ul>
You can also set class="incremental" or
"non-incremental" on individual elements (except for
<br />)
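The behaviour behind incremental display can be approximated as "hide everything, then reveal one item per key press". The sketch below is not Slidy's own code: `hideAll` and `revealNext` are hypothetical helpers, and the items only need a `style.visibility` property, so plain objects are used here in place of DOM elements.

```javascript
// Hypothetical helpers, not Slidy's own code. Each item only needs a
// style.visibility property, so an array of DOM elements or of plain
// objects both work.
function hideAll(items) {
  items.forEach(function (item) { item.style.visibility = "hidden"; });
}

// Reveal the first hidden item; return false when nothing is left to reveal.
function revealNext(items) {
  const next = items.find(function (item) {
    return item.style.visibility === "hidden";
  });
  if (!next) return false;
  next.style.visibility = "visible";
  return true;
}

const bullets = [{ style: {} }, { style: {} }, { style: {} }];
hideAll(bullets);
const steps = [revealNext(bullets), revealNext(bullets),
               revealNext(bullets), revealNext(bullets)];
```

A `false` result is the natural point at which to advance to the next slide.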
Incremental display of layered images
These can be marked up using CSS relative positioning, e.g.
<div class="incremental"
style="margin-left: 10em; position: relative">
<img src="face1.gif" alt="face"
style="position: static; vertical-align: bottom"/>
<img src="face2.gif" alt="eyes"
style="position: absolute; left: 0; top: 0" />
<img src="face3.gif" alt="nose"
style="position: absolute; left: 0; top: 0" />
<img src="face4.gif" alt="mouth"
style="position: absolute; left: 0; top: 0" />
</div>
Create outline lists with hidden content
You can make your bullet points or numbered list items
into outlines that you can expand or collapse
- Just add class="outline" to the ul or ol
element. Click on this list item for more details.
- The Slidy script will then treat the list
as an outline list.
- Clicking on outline list items will expand/collapse
block-level elements within that list item.
- Click on the above to make this list item
collapse again.
- Users will then see expand/collapse icons as appropriate
and may click anywhere on the list item to change its state.
This particular list item can't be expanded or collapsed.
- Add class="expand" to any li elements that
you want to start in an expanded state.
- By default Slidy hides all the block level elements within the
outline list items unless you have specified class="expand".
- Such pre-expanded items can be collapsed by clicking on them.
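The expand/collapse rule described above can be sketched as a class toggle that drives the display of the nested blocks. Again this is a hypothetical illustration rather than Slidy's implementation: an item is modelled as `{ className, blocks }`, with `blocks` standing in for the block-level elements inside the list item.

```javascript
// Hypothetical helpers, not Slidy's own code.
function isExpanded(item) {
  return item.className.split(/\s+/).indexOf("expand") !== -1;
}

// Show the nested blocks only when the item carries class "expand".
function render(item) {
  const display = isExpanded(item) ? "block" : "none";
  item.blocks.forEach(function (b) { b.style.display = display; });
}

// Flip between expanded and collapsed, as a click on the item would.
function toggle(item) {
  const classes = item.className.split(/\s+/).filter(Boolean);
  const at = classes.indexOf("expand");
  if (at === -1) classes.push("expand"); else classes.splice(at, 1);
  item.className = classes.join(" ");
  render(item);
}

const item = { className: "", blocks: [{ style: {} }] };
render(item);                                  // collapsed by default
const before = item.blocks[0].style.display;
toggle(item);
const after = item.blocks[0].style.display;
```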
Future Plans
Recent additions have included a table of contents, and a way to
hide and reveal content in the spirit of outline lists. Further
work is anticipated on the following:
- Collecting a gallery of good looking slide themes
- Opportunities for graphics designers!
- Getting SVG Tiny to work on IE without need for SVG plugin
- Using scripts to dynamically convert SVG Tiny to VML
- Or via conversion to Macromedia Flash
- Tweaks for working with IE7 when that becomes available
- Richer styling for incrementally revealed content
Future Plans
- Alpha version of WYSIWYG slide editor (see screenshot
and demo on IE)
- Using contentEditable when available, otherwise
falling back to textarea and plain text conventions
- Using XMLHttpRequest to dynamically reflect changes to server
- Mechanism for remotely driving Slidy as part of distributed meetings
- Using XMLHttpRequest to listen for navigation commands
- Using VoIP for accompanying audio and teleconferencing
- controlled via HTTP requests
- Synchronizing recorded spoken presentation with currently viewed slide
- Filters from PowerPoint and Open Office
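For the remote driving mechanism mentioned above, the client side reduces to applying each navigation command as it arrives. The sketch below assumes a hypothetical command vocabulary (`next`, `previous`, `goto`), since none is defined here, and leaves out the XMLHttpRequest polling that would deliver the commands.

```javascript
// Hypothetical command vocabulary ("next", "previous", "goto").
function applyCommand(current, slideCount, command) {
  let target = current;
  if (command.action === "next") target = current + 1;
  else if (command.action === "previous") target = current - 1;
  else if (command.action === "goto") target = command.slide;
  // Clamp so that a stray command cannot move past either end of the deck.
  return Math.max(0, Math.min(slideCount - 1, target));
}

// Commands as they might arrive from successive polls of the meeting server.
const commands = [
  { action: "next" }, { action: "next" }, { action: "previous" },
  { action: "goto", slide: 99 }, { action: "unknown" }
];
let slide = 0;
commands.forEach(function (c) { slide = applyCommand(slide, 10, c); });
```

Unknown actions leave the current slide unchanged, which keeps older clients usable if the vocabulary grows.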
Web Meetings
- Presenter driven slide presentations
- VoIP for delivering presentations, posing questions,
and general teleconferencing
- Speech provided by browser extension/plugin or locally
installed proxy controlled via XMLHttpRequest with
RTP audio stream and iLBC codec
- HTTP used to control server-side VoIP resources
- Highly scalable to support many simultaneous meetings
- Shared minute taking
- Anyone can take minutes, and everyone can see them
as they are being typed, enabling instant corrections
- Based upon AJAX and contentEditable/designMode
- Text-based meeting related functions
- Precedent: W3C's Zakim IRC Teleconference Agent
- Tracks who's present, who wants to speak, and on what
subject, keeping people to time, agenda topics, actions,
resolutions, etc.
Web Chat and Presence
Despite its current momentum, Jabber may not be the
long term solution, and we are likely to see solutions
that are more closely integrated with the Web. An example
of this approach is provided by meebo
- AJAX makes it practical to support live chat sessions
and presence information within web pages
- Users can be identified via a cookie, or via a user name
and password obtained using a secure connection (https)
- The AJAX-based protocol can, in principle, use the same XML
schemas as defined for Jabber (XMPP RFCs)
- The tricky bits are the security policies and mechanisms
- Who gets to see when I am online?
- Can bad guys spam my chat room?
- AJAX itself introduces some security considerations
- You are restricted to same domain as the page that
loaded the script
- Referrer spoofing and a lot more (Amit Klein, September 2005)
- hacks using tabs in place of spaces in HTTP request
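As an illustration of reusing the XMPP schemas over AJAX, a page could build a standard chat message stanza and post it to its own server. The helper names and addresses below are hypothetical; only the `message` / `body` structure with `type="chat"` follows XMPP. Escaping user text matters here, given the security concerns above.

```javascript
// Escape the characters that are special in XML text and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a chat message stanza that could be posted with XMLHttpRequest.
function chatMessage(from, to, body) {
  return '<message from="' + escapeXml(from) + '" to="' + escapeXml(to) +
         '" type="chat"><body>' + escapeXml(body) + "</body></message>";
}

const stanza = chatMessage("alice@example.com", "bob@example.com",
                           "Slides 1 & 2 <done>");
```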
Business Opportunities
- Consumer facing meeting services
- Free and supported by ads
- Ads chosen to match context
- from website and slide presentations
- word spotting for text and voice chat
- Enterprise facing meeting services
- These are charged as appropriate
- Hosted services for least effort
- Software licensing for local installation
- Third party consultancy support
- Fostering an ecosystem for customization and support
- Mashups with other Web-based services
Business Opportunities
- Integration with related on-demand services
- Remote storage, archival and search services
- Documents and Spreadsheets
- Information-based business processes
- Business to Business services
- Product support and training materials
- Inter-company meetings
- Business to Consumer services
- Sales and support materials
- Education and online learning services
- Teaching people remotely, e.g. for continuing education
- Browser-based model enables richer interactivity
Concluding Remarks
- Ubiquitous Web
- Speech Enabling Web Pages
- Web Presentations
- Web Meetings
- Business Opportunities
- Concluding Remarks
n.b. the handwriting font used in this presentation
(TSCu_Comic.ttf) is available free under
the GNU GPL and was created by Thukaram Gopalrao.