Ubiquitous Computing

Ubiquitous computing represents a powerful shift in computation, where people live, work, and play in a seamlessly interweaving computing environment. Ubiquitous computing postulates a world where people are surrounded by computing devices and a computing infrastructure that supports us in everything we do.

Mark Weiser, The Computer of the 21st Century, Scientific American, Sept 1991.

Ubiquitous Web

The Ubiquitous Web seeks to broaden the capabilities of browsers to enable new kinds of web applications, particularly those involving coordination with other devices. Some examples include connecting a camera phone to a nearby printer, using a cell phone to give a business presentation with a wireless projector, and viewing your mailbox while listening to your messages.

Ubiquitous Web

These applications involve identifying resources and managing them within the context of an application session. The resources can be remote, as with a network printer or projector, or local, as with the estimated battery life, network signal strength, and audio volume level. The Ubiquitous Web will provide a framework for exposing device coordination capabilities to Web applications.

W3C Ubiquitous Web Workshop

Tokyo, March 2006, driven by widespread interest in Web Applications and the potential to go further.

To explore the vision of the Web as a distributed applications platform that works across a wide range of devices in areas such as offices, home networks, mobile, automotive, aviation, etc. with the potential for increasing the range and reducing the cost of developing and deploying such applications.
To explain how current W3C work fits into this vision, e.g. work on Web Application APIs, Delivery Context Interfaces, Device Descriptions, Multimodal Architecture, etc.
To identify and prioritize additional areas which would benefit from standardization, in particular, the integration of sessions and device coordination into web applications, as a means to enable the benefits described in the Call for Participation.

Current Work

Web APIs Working Group aims to standardize established Web scripting interfaces, e.g. the Window object and XMLHttpRequest
Delivery Context Interfaces (DCI) model user preferences, device capabilities and environmental conditions as a hierarchy of DOM nodes
- This is intended to enable applications to dynamically adapt to the context, and to provide access to a wide range of services
- We have the framework and now the challenge is to work together to build this out
Multimodal Architecture and Interfaces describes a way to loosely couple user interface components to interaction managers via DOM events
The IETF Widex Working Group is developing protocols for exchanging DOM events and DOM updates between applications and remote user interfaces
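
As a rough illustration of the DCI idea listed above, the sketch below models context properties as a small tree of nodes that notify listeners when a value changes. The `PropertyNode` class, its methods, and the property names are hypothetical and are not the DCI interfaces themselves, which build on DOM nodes and DOM events.

```javascript
// Hypothetical sketch: a property tree with change notification.
// This is NOT the DCI API; all names are invented for illustration.
class PropertyNode {
  constructor(name, value = null) {
    this.name = name;
    this.value = value;
    this.parent = null;
    this.children = [];
    this.listeners = [];
  }
  appendChild(child) {
    child.parent = this;
    this.children.push(child);
    return child;
  }
  // Look up a descendant by a slash-separated path, e.g. "device/battery".
  find(path) {
    let node = this;
    for (const part of path.split("/")) {
      node = node.children.find((c) => c.name === part);
      if (!node) return null;
    }
    return node;
  }
  addListener(fn) {
    this.listeners.push(fn);
  }
  // Update the value and notify listeners on this node and its ancestors,
  // loosely mimicking DOM event bubbling.
  setValue(value) {
    const oldValue = this.value;
    this.value = value;
    for (let n = this; n; n = n.parent) {
      for (const fn of n.listeners) fn(this, oldValue, value);
    }
  }
}

const root = new PropertyNode("context");
const device = root.appendChild(new PropertyNode("device"));
device.appendChild(new PropertyNode("battery", 80));

const seen = [];
root.addListener((node, oldV, newV) => seen.push(`${node.name}: ${oldV} -> ${newV}`));
root.find("device/battery").setValue(75);
console.log(seen);
```

An application could register one listener near the root and adapt to any change beneath it, which is the kind of dynamic adaptation the DCI bullet describes.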

IETF Widex Working Group

Developing a protocol for remote user interfaces based upon the Model-View-Controller paradigm, where the UI is expressed in terms of an XML DOM and the protocol is independent of the markup language.

+-----------------------------+            +---------------+
|       Widex Server          |            | Widex Renderer|
| +-------+    .............. |            | +-----------+ |
| |       |    .            .--------------->|           | |
| |       |    .    View    . |  Updates   | |           | |
| |       |    .  (Virtual) .<---------------|           | |
| |       |    .............. |            | |   View    | |
| | Model |                   |            | |           | |
| |       |    +------------+ |            | |           | |
| |       |    |            |<---------------| (XML DOM) | |
| |       |    | Controller | |   Events   | |           | |
| |       |    |            |--------------->|           | |
| +-------+    +------------+ |            | +-----------+ |
+-----------------------------+            +---------------+

See draft-ietf-widex-requirements-00.txt, V. Stirbu (Nokia) and D. Raggett (W3C/Canon), January 12th, 2006

Note: at the request of OMA and several W3C members, W3C has started work on a solution for streaming updates for SVG documents that will also work with other XML languages. Streaming introduces timing related requirements, and the W3C and IETF groups will coordinate their work on this.

What's missing or can be improved upon?

Managing resources within temporary or persistent sessions
- workarounds exist using cookies and embedding session information within URIs, but a more flexible framework is needed, especially for resources and bindings that last beyond individual Web pages
Extending device capabilities via network resources
- e.g. printers, projectors, speech synthesis and recognition, natural language translation, geographic location, etc.
- need a way to discover such resources and bind them into the current session
Support for applications involving multiple devices
- with the means to pass events between devices
URIs for naming devices, services and sessions
- enabling the use of rich metadata (the Semantic Web) for resource discovery, acting across different kinds of networks, and leveraging the distributed nature of the Web

Exposing Device Coordination to the Web

Registering what services a device provides
- How to describe services
Discovering what services are available
- Could be local or remote
- May be physically nearby, but on different networks
Binding to a service
Using a service
Relinquishing a service
How to expose existing device coordination frameworks to Web applications?
- UPnP, WSD, Jini, Salutation, ...
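
The lifecycle listed above (register, discover, bind, use, relinquish) can be sketched with a small in-memory registry. This is only an illustration under stated assumptions: the `ServiceRegistry` class, the service identifiers, and the rule that a service is bound to one session at a time are all hypothetical, and a real framework such as UPnP or Jini works quite differently.

```javascript
// Hypothetical in-memory registry illustrating the lifecycle:
// register -> discover -> bind -> use -> relinquish.
class ServiceRegistry {
  constructor() {
    this.services = new Map(); // id -> { id, type, description, boundTo }
  }
  register(id, type, description) {
    this.services.set(id, { id, type, description, boundTo: null });
  }
  // Discover services of a given type that are not currently bound.
  discover(type) {
    return [...this.services.values()].filter(
      (s) => s.type === type && s.boundTo === null
    );
  }
  // Bind a service to a session; fails if it is unknown or already bound.
  // Exclusive binding is an assumption made for this sketch.
  bind(id, sessionId) {
    const s = this.services.get(id);
    if (!s || s.boundTo !== null) return false;
    s.boundTo = sessionId;
    return true;
  }
  // Only the session that holds the binding may relinquish it.
  relinquish(id, sessionId) {
    const s = this.services.get(id);
    if (!s || s.boundTo !== sessionId) return false;
    s.boundTo = null;
    return true;
  }
}

const registry = new ServiceRegistry();
registry.register("printer-1", "printer", "Colour printer, 2nd floor");
registry.register("projector-1", "projector", "Wireless projector");

const printers = registry.discover("printer");
const bound = registry.bind(printers[0].id, "session-42");
const afterBind = registry.discover("printer").length;
const released = registry.relinquish("printer-1", "session-42");
console.log(bound, afterBind, released);
```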

The DOM and Distributed Services

Web application identifies need for a service
- e.g. speech synthesis and recognition
It discovers and binds the service
This exposes the service to the local DOM but hides the details of how it is implemented
- Local interface can be described in IDL and exploited via markup or scripting
For a remote speech engine, the local interface acts as a proxy for the speech engine
The implementation could make use of Web Services, or other protocols
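
A minimal sketch of the proxy idea above: the application calls one local interface, and whether the speech engine is embedded or remote stays hidden behind it. The factory functions, the `say` method, and the message shape passed to the transport are hypothetical names chosen for illustration, not an existing API.

```javascript
// Hypothetical sketch: two implementations behind one local interface.
// The application calls synth.say(text) and does not see how it is served.
function createEmbeddedSynth(log) {
  return {
    say(text) {
      log.push({ via: "embedded", text });
      return true;
    },
  };
}

// The proxy forwards the call through a caller-supplied transport function,
// which could wrap Web Services or another protocol.
function createRemoteSynthProxy(transport) {
  return {
    say(text) {
      return transport({ operation: "say", text });
    },
  };
}

const calls = [];
// Stand-in for a network transport so that the sketch runs anywhere.
const fakeTransport = (message) => {
  calls.push({ via: "remote", text: message.text });
  return true;
};

// Application code is identical for both implementations.
function greet(synth) {
  return synth.say("good afternoon");
}

greet(createEmbeddedSynth(calls));
greet(createRemoteSynthProxy(fakeTransport));
console.log(calls.map((c) => c.via));
```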

Options for adding speech capabilities

Handling speech modality in the network
- Loose coupling of modality interfaces
  - e.g. XHTML locally with VoiceXML in the network, with CCXML for high level flow control
Handling speech modality in the browser
- Embedded vs networked speech
  - latency, quality, vocabulary, network, battery, etc.
- Plugin vs local speech proxy
- Standard scripting interface?

Latency

Simple commands with visual actions
- up, down, select, . . .
  - Feels slow if the delay is much greater than 100 ms
Dialogue turn hand over
- When user stops talking (or pauses)
- When application stops talking (or pauses)
Seizing the turn (aka barge-in)
- User or application talks over the other party
Network delays are not as bad as they seem

Using AJAX to add speech

AJAX = JavaScript access to HTTP
- XMLHttpRequest object
- Supported by most modern Web browsers
Local HTTP server handles device audio
- ALSA on Linux, and winmm.dll on Windows
- Open source speech codec for compression
Remote HTTP server provides speech services
- ASR with audio in HTTP request, and EMMA in HTTP response
- TTS with text or SSML in HTTP request, and audio in HTTP response

HTTP for Speech Services

Speech Synthesis
- http://localhost:8888/say?text="good afternoon"
- http://localhost:8888/say?uri=<ssml file>
Speech recognition
- http://localhost:8888/hear?uri=<srgs file>
- Additional parameters for
  - Listening on multiple grammars
  - Single result vs sequence of results
  - Time out parameters
- Additional command for pre-loading grammars
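
The request URLs above can be built with a couple of helper functions, sketched here. The `say` and `hear` paths and the `text` and `uri` parameters come from the slide; the helper names, the `timeout` parameter name, and the grammar file name `pizza.grxml` are assumptions made for illustration. Values are percent-encoded rather than quoted.

```javascript
// Build request URLs for the local speech server described above.
// The timeout parameter name is a hypothetical example of the
// "time out parameters" mentioned in the slide.
const SPEECH_SERVER = "http://localhost:8888";

// Speech synthesis: pass either literal text or the URI of an SSML file.
function sayUrl({ text, uri }) {
  if (text !== undefined) {
    return `${SPEECH_SERVER}/say?text=${encodeURIComponent(text)}`;
  }
  return `${SPEECH_SERVER}/say?uri=${encodeURIComponent(uri)}`;
}

// Speech recognition: pass the URI of an SRGS grammar file.
function hearUrl(grammarUri, options = {}) {
  let url = `${SPEECH_SERVER}/hear?uri=${encodeURIComponent(grammarUri)}`;
  if (options.timeout !== undefined) {
    url += `&timeout=${encodeURIComponent(options.timeout)}`;
  }
  return url;
}

console.log(sayUrl({ text: "good afternoon" }));
console.log(hearUrl("pizza.grxml", { timeout: 5000 }));
```

In a browser these URLs would be passed to an XMLHttpRequest object, with the recognition result handled in its asynchronous callback.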

Application to ordering Pizza

You get to choose number, type, size and extra toppings!

[Images: pepperoni pizza, founders pizza, chicken pizza, all the works pizza]

SRGS + SISR → EMMA

Use W3C Recommendations for speech grammars and semantic interpretation

   <rule id="order">
      <tag>var index=0; out.pizza = new Array();</tag>
      <item repeat="0-1"><ruleref uri="#start"/></item>
      <item>
        <ruleref uri="#pizza"/>
        <tag>out.pizza[index]=$pizza; index+=1;</tag>
      </item>
      <item repeat="0-">
         <item><token>and</token></item>
         <item>
           <ruleref uri="#pizza"/>
           <tag>out.pizza[index]=$pizza; index+=1;</tag>
         </item>
      </item>
      <item repeat="0-1"><ruleref uri="#stop"/></item>
   </rule>

Pizza Grammar

I would like four small cheese pizzas with olives and peppers

[<start>] [<number>] [<size>] <type> (pizza | pizzas) [with <extras>] [<stop>]

<start> ::= I want | I would like | I'll have | I'd like | I'd love | Give me
<stop> ::= thanks | please | if you please
<number> ::= a | one | two | ... | nine
<size> ::= small | medium | large
<type> ::= cheese | pepperoni | sausage
<extras> ::= <topping> [[and] <topping>]*
<topping> ::= mushroom | olives | onions | peppers | tomatoes

<emma:interpretation>
  <pizza>
     <size>small</size>
     <number>4</number>
     <type>cheese</type>
     <topping>olives</topping>
     <topping>peppers</topping>
  </pizza>
</emma:interpretation>
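
To make the mapping from utterance to interpretation concrete, here is a hand-written recogniser for the single-pizza grammar above. It is a sketch only: the talk uses an SRGS grammar with SISR tags, not this code. The `parseOrder` function is hypothetical, and defaulting the number to 1 when it is omitted is an assumption.

```javascript
// Hand-written recogniser for the single-pizza grammar above.
const STARTS = ["i want", "i would like", "i'll have", "i'd like", "i'd love", "give me"];
// Longest phrase first, because "please" is a suffix of "if you please".
const STOPS = ["if you please", "thanks", "please"];
const NUMBERS = new Map([
  ["a", 1], ["one", 1], ["two", 2], ["three", 3], ["four", 4],
  ["five", 5], ["six", 6], ["seven", 7], ["eight", 8], ["nine", 9],
]);
const SIZES = ["small", "medium", "large"];
const TYPES = ["cheese", "pepperoni", "sausage"];
const TOPPINGS = ["mushroom", "olives", "onions", "peppers", "tomatoes"];

// Returns { number, size, type, toppings } or null if the utterance
// is not covered by the grammar.
function parseOrder(utterance) {
  let text = utterance.toLowerCase().replace(/[.,!?]/g, "").trim();

  // Optional <start> phrase at the beginning.
  for (const s of STARTS) {
    if (text.startsWith(s + " ")) { text = text.slice(s.length + 1); break; }
  }
  // Optional <stop> phrase at the end.
  for (const s of STOPS) {
    if (text.endsWith(" " + s)) { text = text.slice(0, -(s.length + 1)); break; }
  }

  const words = text.split(/\s+/);
  let i = 0;
  const pizza = { number: 1, size: null, type: null, toppings: [] };

  if (NUMBERS.has(words[i])) pizza.number = NUMBERS.get(words[i++]);
  if (SIZES.includes(words[i])) pizza.size = words[i++];
  if (!TYPES.includes(words[i])) return null; // <type> is required
  pizza.type = words[i++];
  if (words[i] !== "pizza" && words[i] !== "pizzas") return null;
  i++;

  // Optional "with <topping> [[and] <topping>]*".
  if (words[i] === "with") {
    i++;
    if (!TOPPINGS.includes(words[i])) return null;
    pizza.toppings.push(words[i++]);
    while (i < words.length) {
      if (words[i] === "and") i++;
      if (!TOPPINGS.includes(words[i])) return null;
      pizza.toppings.push(words[i++]);
    }
  }
  return i === words.length ? pizza : null;
}

console.log(parseOrder("I would like four small cheese pizzas with olives and peppers"));
```

The returned object carries the same fields as the EMMA interpretation shown above: number 4, size small, type cheese, and toppings olives and peppers.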

Pizza Grammar

A slightly more complex grammar allows for several kinds of pizza to be requested at once.

Give me a medium pepperoni pizza and a large cheese pizza with peppers and onions.

      <emma:interpretation>
         <pizza>
            <number>1</number>
            <size>medium</size>
            <type>pepperoni</type>
         </pizza>
         <pizza>
            <number>1</number>
            <size>large</size>
            <type>cheese</type>
            <topping>peppers</topping>
            <topping>onions</topping>
         </pizza>
      </emma:interpretation>

Application to Ordering Pizza

Implemented in XHTML+CSS+JavaScript
Supports compound utterances
- Faster than filling out forms via GUI
- But requires flexible dialogue to work around inevitable misunderstandings
DIY solution for describing behavior
- Combination of scripting and markup
- Markup interpreted via JavaScript
- Can be made to work across browsers
- Experimentation before standardization

Modeling Behavior

Scripted handlers for XHTML events, e.g. onload, onmouseover, onfocus, onchange
Asynchronous callbacks for HTTP responses
- Used to handle results of speech recognition
- Initiated via calls to XMLHttpRequest
Asynchronous timers (setTimeout)
Use of custom markup
- Application state, dialogue goals and history
- Event driven state transition rules
Behavior can be modelled at a higher level server-side
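
The event-driven state transition rules mentioned above can be sketched as a small rule table plus a dispatcher. In the talk the rules are expressed in custom markup interpreted via JavaScript; here plain JavaScript data stands in for that markup, and the state names, event names, and `createDialogue` function are hypothetical.

```javascript
// Hypothetical dialogue states and events for a pizza order.
// Each rule maps (state, event) to the next state.
const RULES = [
  { from: "greeting", on: "order-heard", to: "confirming" },
  { from: "greeting", on: "no-match", to: "reprompt" },
  { from: "reprompt", on: "order-heard", to: "confirming" },
  { from: "confirming", on: "yes", to: "done" },
  { from: "confirming", on: "no", to: "reprompt" },
];

function createDialogue(initial) {
  let state = initial;
  const history = [initial];
  return {
    get state() { return state; },
    history,
    // Apply the first matching rule; events with no rule are ignored.
    handle(event) {
      const rule = RULES.find((r) => r.from === state && r.on === event);
      if (rule) {
        state = rule.to;
        history.push(state);
      }
      return state;
    },
  };
}

const dialogue = createDialogue("greeting");
["no-match", "order-heard", "no", "order-heard", "yes"].forEach((e) => dialogue.handle(e));
console.log(dialogue.history.join(" > "));
```

In a page, `handle` would be called from the scripted event handlers, speech recognition callbacks, and timers listed above, and the history gives the dialogue record that later slides suggest logging.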

Logging

Usability is based upon real world experience
- That means you need to collect lots of data
Log dialogues and audio for later analysis
- Speech server logs ASR and TTS requests
- AJAX used for logging dialogue state
  - Including changes via visual modality
- Application assigned session identifier
  - Used to associate log entries for same session
  - Must be sent as part of all server requests
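
A minimal sketch of the session identifier scheme above: every request URL gets the same application-assigned identifier, so the server can group log entries by session. The `createSession` helper, the `session`, `seq`, `state` and `modality` parameter names, and the `/log` path are assumptions made for illustration.

```javascript
// Sketch: attach an application-assigned session identifier to every
// request so that log entries from the same session can be grouped.
function createSession(sessionId, baseUrl) {
  let sequence = 0;
  return {
    // Append the session id to any server request URL.
    withSession(url) {
      const separator = url.includes("?") ? "&" : "?";
      return `${url}${separator}session=${encodeURIComponent(sessionId)}`;
    },
    // Build a log request describing a change of dialogue state,
    // including changes made via the visual modality.
    logUrl(state, modality) {
      sequence += 1;
      const query = `seq=${sequence}&state=${encodeURIComponent(state)}` +
        `&modality=${encodeURIComponent(modality)}`;
      return this.withSession(`${baseUrl}/log?${query}`);
    },
  };
}

const session = createSession("abc123", "http://localhost:8888");
console.log(session.withSession("http://localhost:8888/hear?uri=pizza.grxml"));
console.log(session.logUrl("confirming", "visual"));
```

The sequence number is an extra assumption that lets entries be ordered even if requests arrive out of order.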

Remarks

Complex utterances are more natural but require a more flexible approach for effective dialogues
Exposing speech to Web pages via JavaScript offers flexibility for rolling your own solutions whilst remaining interoperable across browsers
Open questions include
- Whether to access speech via a plugin, or via AJAX and a locally installed HTTP server?
- Whether to pass audio within HTTP or to use a concurrent RTP-based stream?
There is an opportunity for a standard speech object that abstracts away from embedded vs networked speech

Web Presentations

Web-based alternative to PowerPoint
- No more need for large email attachments
  - Just include the link to your slides
- Create and update your slides in your web browser
HTML Slidy uses XHTML, CSS and JavaScript
- Each slide marked up in a div element with class="slide"
- Font size automatically adapts to window size
- Incremental revealing of slide contents
- Different backgrounds for different slides
- Outline lists for extra details
- Automatically created table of contents
- Slidy style sheets and script available as Open Source

Incremental display of slide contents

For incremental display, use class="incremental", for instance:

First bullet point
Second bullet point
Third bullet point

which is marked up as follows:

<ul class="incremental"> 
  <li>First bullet point</li> 
  <li>Second bullet point</li> 
  <li>Third bullet point</li> 
</ul>

You can also set class="incremental" or "non-incremental" on individual elements (except for <br />)

Incremental display of layered images

These can be marked up using CSS relative positioning, e.g.

<div class="incremental" 
 style="margin-left: 10em; position: relative"> 
  <img src="face1.gif" alt="face" 
   style="position: static; vertical-align: bottom"/> 
  <img src="face2.gif" alt="eyes" 
    style="position: absolute; left: 0; top: 0" /> 
  <img src="face3.gif" alt="nose" 
    style="position: absolute; left: 0; top: 0" /> 
  <img src="face4.gif" alt="mouth" 
    style="position: absolute; left: 0; top: 0" /> 
</div>

Create outline lists with hidden content

You can make your bullet points or numbered list items into outlines that you can expand or collapse

Just add class="outline" to the ul or ol element. Click on this list item for more details.
- The Slidy script will then treat the list as an outline list.
- Clicking on outline list items will expand/collapse block-level elements within that list item.
- Click on the above to make this list item collapse again.
Users will then see expand/collapse icons as appropriate and may click anywhere on the list item to change its state. This particular list item can't be expanded or collapsed.
Add class="expand" to any li elements that you want to start in an expanded state.
- By default Slidy hides all the block level elements within the outline list items unless you have specified class="expand".
- Such pre-expanded items can be collapsed by clicking on them.

Future Plans

Recent additions have included a table of contents, and a way to hide and reveal content in the spirit of outline lists. Further work is anticipated on the following:

Collecting a gallery of good looking slide themes
- Opportunities for graphics designers!
Getting SVG Tiny to work on IE without need for SVG plugin
- Using scripts to dynamically convert SVG Tiny to VML
- Or via conversion to Macromedia Flash
Tweaks for working with IE7 when that becomes available
Richer styling for incrementally revealed content

Future Plans

Alpha version of WYSIWYG slide editor (see screenshot and demo on IE)
- Using contentEditable when available, otherwise falling back to textarea and plain text conventions
- Using XMLHttpRequest to dynamically reflect changes to server
Mechanism for remotely driving Slidy as part of distributed meetings
- Using XMLHttpRequest to listen for navigation commands
- Using VoIP for accompanying audio and teleconferencing
  - controlled via HTTP requests
- Synchronizing recorded spoken presentation with currently viewed slide
Filters from PowerPoint and Open Office
- and export to PDF via PrinceXML
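
The mechanism for remotely driving Slidy listed above could reduce to applying a stream of navigation commands to the current slide index. The sketch below shows only that step. The command names and the `applyCommand` and `replay` functions are hypothetical, since the talk does not describe the protocol, and the slide count of 34 is an arbitrary example value.

```javascript
// Hypothetical navigation commands applied to a zero-based slide index.
function applyCommand(current, slideCount, command) {
  switch (command.type) {
    case "next":
      return Math.min(current + 1, slideCount - 1);
    case "previous":
      return Math.max(current - 1, 0);
    case "goto":
      // Ignore out-of-range targets rather than failing.
      return command.slide >= 0 && command.slide < slideCount
        ? command.slide
        : current;
    default:
      return current; // unknown commands leave the slide unchanged
  }
}

// In a browser, commands would arrive in XMLHttpRequest responses; here
// they are supplied as an array so that the sketch runs anywhere.
function replay(commands, slideCount) {
  return commands.reduce((slide, c) => applyCommand(slide, slideCount, c), 0);
}

const finalSlide = replay(
  [
    { type: "next" },
    { type: "next" },
    { type: "goto", slide: 10 },
    { type: "previous" },
    { type: "goto", slide: 99 },
  ],
  34
);
console.log(finalSlide);
```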

Web Meetings

Presenter driven slide presentations
VoIP for delivering presentations, posing questions, and general teleconferencing
- Speech provided by browser extension/plugin or locally installed proxy controlled via XMLHttpRequest with RTP audio stream and iLBC codec
- HTTP used to control server-side VoIP resources
- Highly scalable to support many simultaneous meetings
Shared minute taking
- Anyone can take minutes, and everyone can see them as they are being typed, enabling instant corrections
  - Based upon AJAX and contentEditable/designMode
Text-based meeting related functions
- Precedent: W3C's Zakim IRC Teleconference Agent
- Tracks who's present, who wants to speak, and on what subject, keeping people to time, agenda topics, actions, resolutions, etc.

Web Chat and Presence

Despite its current momentum, Jabber may not be the long-term solution, and we are likely to see solutions that are more closely integrated with the Web. An example of this approach is provided by meebo.

AJAX makes it practical to support live chat sessions and presence information within web pages
- Users can be identified via a cookie, or via a user name and password obtained using a secure connection (https)
- The AJAX-based protocol can, in principle, use the same XML schemas as defined for Jabber (XMPP RFCs)
- The tricky bits are the security policies and mechanisms
  - Who gets to see when I am online?
  - Can bad guys spam my chat room?
AJAX itself introduces some security considerations
- You are restricted to the same domain as the page that loaded the script
- Referrer spoofing and a lot more (Amit Klein, September 2005)
  - hacks using tabs in place of spaces in HTTP requests
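
The same-domain restriction noted above is commonly called the same-origin policy: a script may only make XMLHttpRequest calls to the scheme, host and port its page was loaded from. The sketch below shows that comparison using the standard `URL` class. The `sameOrigin` function name and the example.org addresses are illustrative assumptions.

```javascript
// Compare scheme, host and port, as in the same-origin check.
// Uses the standard URL class (available in browsers and Node.js).
function sameOrigin(pageUrl, requestUrl) {
  const page = new URL(pageUrl);
  // Resolve relative request URLs against the page.
  const target = new URL(requestUrl, pageUrl);
  return page.protocol === target.protocol &&
    page.hostname === target.hostname &&
    page.port === target.port;
}

const page = "http://example.org/chat.html";
console.log(sameOrigin(page, "/presence"));
console.log(sameOrigin(page, "https://example.org/presence"));
console.log(sameOrigin(page, "http://im.example.org/presence"));
```

Only the first request stays within the page's origin; the second differs in scheme and the third in host, so a chat service on a separate host would need a same-origin relay.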

Business Opportunities

Consumer facing meeting services
- Free and supported by ads
- Ads chosen to match context
  - from website and slide presentations
  - word spotting for text and voice chat
Enterprise facing meeting services
- These are charged as appropriate
- Hosted services for least effort
- Software licensing for local installation
- Third party consultancy support
  - Fostering an ecosystem for customization and support
  - Mashups with other Web-based services

Business Opportunities

Integration with related on-demand services
- Remote storage, archival and search services
- Documents and Spreadsheets
  - Precedents: Writely and NumSum
- Information-based business processes
Business to Business services
- Product support and training materials
- Inter-company meetings
Business to Consumer services
- Sales and support materials
Education and online learning services
- Teaching people remotely, e.g. for continuing education
- Browser-based model enables richer interactivity

Concluding Remarks

Ubiquitous Web
Speech Enabling Web Pages
Web Presentations
Web Meetings
Business Opportunities
Concluding Remarks

n.b. the handwriting font used in this presentation (TSCu_Comic.ttf) is available free under the GNU GPL and was created by Thukaram Gopalrao.

Web of Applications

Google, Mountain View
1st February 2006

