Project Review: Voice on the Web -- 25 Feb 2010

Objective of this project review:

There is a perception that the Voice Browser Working Group (VBWG) only cares about call centers, and that its specifications are unrelated to and disconnected from others such as HTML. To address this, the group is making substantial efforts at communicating its mission (of which this is one such effort) and at enhancing its VoiceXML specification to simplify its use by those familiar with HTML. With phones becoming browsers, it is more crucial than ever that the expertise and products of the VBWG be leveraged to support the rich and varied applications we see increasingly on our personal mobile devices.

If you miss this presentation, you risk a) re-inventing the wheel in HTML, b) wasting the opportunity to tap the world's largest repository of expertise in phone-based apps, and c) losing the group that produced the most commercially successful specifications since HTML.

Don't miss out -- come and find out what we're doing!

Voice on the Web Presentation

Slides for the project review: http://www.w3.org/2010/Talks/0225-vow-project-review/VBWG_project_review.pdf

kaz: This is a project review for Voice on the Web.
... We want to encourage people to learn more about voice technology.
... And to promote the Voice activity.
... The presenter is Dan Burnett. He is from Voxeo, is an active participant in VBWG and MMI, and is Co-Editor-in-Chief for the VoiceXML 3 draft.

burn: Slide 2
... Why are we even having this project review? The VBWG believes that voice technologies are heavily under-utilized today, specifically in Web applications. We're not seeing voice used as much as we'd like.
... We think that's because Web developers aren't aware of its potential. We think W3C is the right place to make sure that Web developers have and are aware of the appropriate technologies for the Web.
... Slide 3
... Why do we care?
... HTML provides the primary visual UI for the Web. VoiceXML was designed to provide an aural UI for the Web.
... In many cases a visual output is the best, but in a number of cases voice is the best input modality.
... Some examples:
... On commercial sites on the Web today, when purchasing something, you go through the catalog, find what you want, then go to a billing screen, where you can select your form of payment.
... I find it irritating that I can't pay the way I want to. For example, Discover Card is not usable everywhere, and many sites don't take it.
... Voice tends to be good at allowing people to cut through menu trees.

<timbl_> Slide 3 omits to mention populations that are literate and in possession of a $10 phone but cannot afford a smart phone.

<timbl_> which is a huge population.

burn: Someone just beginning to look through a catalog might just be able to say "I'd like to use a Discover Card" right up front. If properly programmed you can cut right to it.
... Voice search is another example.

<Ralph> [I see no reason why a form couldn't have a non-verbal way to enter 'I want to use my MasterCard'. The mode of entry can be independent of the content of the entry.]

burn: Either the information that needs to be typed is just plain long, and you don't want to type it, or you want to refine a search in a particular way and it's just convenient to say it.

[[Ralph, because that takes up 'space' in the UI. There's no concept of that in voice UIs]]

burn: @@ sweater example @@
... Over the last few years, the VBWG has learned a lot about how voice is useful where your eyes or hands are busy.

<steph> +1 to tim

burn: e.g. driving a car, where your eyes and hands are both busy.
... Many states are prohibiting use of hands, if not eyes as well.
... By addressing that use case you are, if done properly, also addressing the needs of a community of people who cannot see or use their hands.

<shepazu> [note that places are also looking at banning voice calls as well... my town, for example]

burn: Many of the use cases are the same as the accessibility use cases.
... Another use case is illiterate populations. The visual Web demands a high literacy rate.
... I would not be surprised if we see changes in Africa, where populations who cannot read, and who previously weren't online, are coming online.

<SteveB> People with low literacy are very important to the Web Foundation's work

burn: In the Coliseum example it'd be much easier to say that than type it.

timbl_: There's also a huge population of people who are literate but cannot afford a phone with a large screen.

<SteveB> +1 to Tim, inexpensive phones are another target

burn: Yes, I'm going to talk in a bit about our historic motivators for voice. Our new motivators include exactly what you say.
... Voice has been a primary medium for human communications for a very long time.
... Slide 4

<timbl_> (eg Rwanda has 23% cellphone penetration but only 1% internet)

[[ Clarification questions please ask now, otherwise let's bring them up at the end ]]

burn: With respect to the goals of the VBWG: we are particularly interested in making sure that voice technologies are available on the Web.
... That may seem obvious, but when you hear our history, and how we are perceived, it's not necessarily obvious.
... One of the things that's important for this group in particular to understand is that there is a group filled with voice experts.
... So, when I talk about this, remember that we have a lot more experience than just call centers.
... Slide 5
... For more than ten years now we've been building voice specifications.
... The group was initially driven very heavily by the call center industry.
... For those not aware, there are massive rooms dedicated to fulfilling telephone support requests, etc. There was a need for automation there.
... The most common case is just call routing, figuring out where the call belongs.
... When we started, there was a dramatic lack of open standards.
... The call center space was populated with traditional telephony companies and companies that had proprietary telephony equipment.
... You had to buy a large piece of hardware that would take these calls, forward on the rest, etc.
... All protocols were proprietary, lots of lock-in, very expensive.
... This was all hampering the adoption of voice technology.
... We've seen a lot of changes, an opening up of the telephony world.
... There were Java APIs, MS's APIs, but they were definitely not industry standards.
... In our community we needed standards that could operate on a large scale.

burn: It was extremely important that the implementation of the technology could be efficient.
... The voice technologies themselves were expensive in terms of time, and in terms of money. That's changing.
... Very expensive in time and in computing resources.
... There was a need for this stuff to happen in the network.

<timbl_> "in the network"

burn: Voice technologies needed computation to occur in the network in order for accuracy to be good enough to be usable, for response times to be acceptable.

<timbl_> In the internet model, nothing happens in the network.

burn: At the time, no one was thinking about Web architecture as the model for voice.
... It was a great selling point at the time to say that your Web presence infrastructure could be reused for voice. We take it for granted now, but people who are thinking of adding voice to specifications today, may not realize how you build voice technology into the Web infrastructure.
... Slide 6
... We've largely solved the problems we set out to solve. We've got trillions of voice Web pages used every year.
... This is driving the need for the group today to move to mobile devices.
... Mobile devices are now just mini computers. They're quite rich feature-wise, they're multimodal in the general sense.
... We're interested in growing the language in ways that simplify developing voice technologies on the Web, while also giving them greater flexibility at the same time.
... Slide 7
... This slide is important because people need to understand that we're not just driven by call centers.
... As you have questions about Voice on the Web, or about standards (e.g. MRCP at the IETF), and network constraints, we know this, we're familiar with this.
... Anyone who looks at voice technologies or any standards out there today, looks and says "We can do this without the standard, it's really simple".
... We can tell you from experience that sure, you can simplify, but once you get beyond demos, it takes a lot of work.
... We're ready to share that experience.

burn: Another thing we understand is imprecision in technology. Think about geolocation, whether it's cell-tower or GPS, there is an error factor here.
... Voice technologies are similar, they're not exact. The results typically have confidence scores associated with them. We, as a group, have a lot of experience dealing with that. Commercial productive experience.
... The medium itself is interesting. In speech, <um> you can <ah> say many things, without saying things.
... The medium of speech can be imprecise and ambiguous. We have a lot of experience here.
... We have a lot of experience in how to direct users when necessary, and how to take what we are given and make the best of it.
... Slide 8
... What are we after here?
... We want to get the message out. Voice is not just about call centers.
... It's not just important to the Web today, but a crucial part of the future Web.
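
The remark above about recognition results carrying confidence scores can be illustrated with a minimal sketch. It is not taken from the presentation or from any VBWG specification: the function name, the n-best list shape, and both threshold values are assumptions chosen for illustration, and real applications tune such thresholds per deployment.

```python
from typing import List, Tuple

# Hypothetical thresholds (assumptions, not from any specification).
ACCEPT_THRESHOLD = 0.80   # at or above: trust the result outright
REJECT_THRESHOLD = 0.40   # below: treat the result as unusable


def choose_action(nbest: List[Tuple[str, float]]) -> Tuple[str, str]:
    """Pick a dialog action from an n-best list of (utterance, confidence).

    Returns (action, utterance), where action is one of
    "accept", "confirm", or "reprompt".
    """
    if not nbest:
        # Nothing was recognized at all, so ask again.
        return ("reprompt", "")

    # Take the hypothesis the recognizer is most confident about.
    utterance, confidence = max(nbest, key=lambda hyp: hyp[1])

    if confidence >= ACCEPT_THRESHOLD:
        return ("accept", utterance)
    if confidence >= REJECT_THRESHOLD:
        # Middle ground: direct the user with a yes/no confirmation.
        return ("confirm", utterance)
    return ("reprompt", "")
```

The middle "confirm" band is one way to "take what we are given and make the best of it": a shaky result still drives the dialog forward with a yes/no check instead of being discarded.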

<ddahl> even mouse clicks can be imprecise, especially if the user doesn't have good control of their hands. some of the considerations that you have to take into account in voice are actually more generally useful.

<Ralph> +1 to Debbie's observation!

burn: The technology is not simple, but it's also not too complex.
... "We can do something simple" or "It's too complex and imprecise and we can't use it" -- the success of the VBWG, and the companies within it is a sign that there is a middle ground.
... We're interested in making sure that various specifications work well together.
... There are a variety of specifications at W3C that we want to make sure play well together.
... That doesn't necessarily mean VoiceXML, SCXML, CCXML, etc -- though we are working on that -- but we are also willing to discuss other alternatives, other APIs that are more appropriate, e.g. a JavaScript API for use in HTML if that's appropriate.

<kaz> [ I think we could use voice input as a supplemental modality for touchpad interface without mouse over event, e.g., by saying "this!" ]

burn: Ultimately we want to make sure that voice technology is able to be integrated with other technologies at W3C.
... We're also trying to improve the flexibility of VoiceXML. Make it easier to customize the user interface.
... We're generalizing right now the way in which the UI is built.
... e.g. creating Voice UI templates.
... Slide 9
... There are some things that we need from you, each of you individually and as a group together.
... We want to see voice technology used by W3C itself.
... I'd be interested in brainstorming on how voice technologies can be used today.
... What if there were a phone number that someone could call for w3c? What information would be useful today?

<steph> +1 to dan

overheard: [[ read me the HTML5 spec ]]

burn: I'd like people on this call to use the technologies themselves.

<dom> [would be interesting to explore voice usage for some of our internal tools]

burn: There are three companies listed in this presentation. All three have free web portals with tutorials.
... I think if you start trying it out, you would quickly come up with new ideas.
... We also need your help getting other groups to work with us.
... Particularly HTML 5.

<kaz> [ I'd suggest we include these tutorials sites on the VB public page as well. ]

burn: We'd like to get some advice on how to get together with these people and work together.

<marie> [+1 to the LT suggestion]

<steph> great talk !

burn: @@ last point about mobile devices @@

<Ralph> Thanks, Dan

Q&A

timbl_: We discussed this when wandering around Africa. VoiceXML and HTML should be brought together. But HTML and VoiceXML work entirely differently.
... The visual browser right now in HTML is on the client. The VoiceXML is now at a call center, typically no internet involved at all. No one sees the links.
... If you imagine a situation where there is public VoiceXML on the Web.
... Say someone at CDC has put up a voice dialog about AIDS for example. Then someone in MA puts up a voice dialog for say someone who is sick.
... The system may want to redirect them to the CDC center for AIDS research.
... That's the HTML model.
... That hasn't happened before, because simply you have to have a client be a voice browser.
... That's becoming possible as phones are becoming powerful.

<Ralph> [Tim is referring to the verbal analog of a hyperlink; i.e. when the phone operator says "for that question, let me transfer you to ..."]

timbl_: You can't talk about VoiceXML being used in a Web like way unless you explain how it works.
... Will I go to a Web site and it'll have a voice browser enabled? Will it work like a mashup? Do I call the browser or is it on my computer?
... So what is the model? Has anyone figured it out?

burn: The voice browser as we called it, has traditionally not lived on the device. The browser is living in the network, the device is just an access point.
... The cost associated with this is the cost of speech synthesis, ASR, or even DTMF detection.

<kaz> [ speech mashup example: http://www.youtube.com/watch?v=OURZpqh-35A&eurl=&feature=player_embedded ]

burn: I don't know of anyone who has tried to create a public Web.
... I don't have a great answer.

<marie> ['voice links']

burn: Some of the future of voice could easily live on the device.
... Others on the call who have things to say about that, please feel free to jump in.

ddahl: There are two models; one is calling a phone number and browsing around in the voice-only world.

burn: When you have an actual mashup it is possible because of the IP transport of voice to go ahead and click to begin a conversation right there.
... I don't know if there is a standard right now for a click on a link to start you talking. There isn't a particular standard for how that transition occurs. Right now it occurs in parallel and then you maybe hang up the phone.

timbl_: If you write a VoiceXML document, are there entry points you can give?

<ddahl> the other model is more multimodal where you talk to a visual browser, but you need the voice-only model for the use cases where people don't have a display on their phone

timbl_: So if you can do that, you can link out of a document.

burn: Right, you can make links.

timbl_: And if it linked to a picture? There's no extra architecture added.
... The picture could link back to another piece of voice.

burn: Most companies that implement VoiceXML, which has an <audio> tag, also allow video. They can play that video and still be taking speech input.
... Yes, if the browser can do both, the browser could recognize an image, etc.
... But that's not part of the VoiceXML standard, the browser would have to know what to do.
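
The exchange above about entry points and links can be sketched as follows. This is an illustration under stated assumptions, not something shown in the meeting: it uses Python's standard library to emit a small VoiceXML 2.0 `<menu>` whose `<choice next="...">` points at a dialog in another document via a URI fragment. The helper names and the `health.example.org` URL are hypothetical, and the output has not been validated against the VoiceXML schema.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def deep_link(document_url: str, dialog_id: str) -> str:
    """Build a URI whose fragment names one dialog inside a VoiceXML document."""
    return f"{document_url}#{dialog_id}"


def build_menu(prompt_text: str, choices: dict) -> str:
    """Return a VoiceXML menu; each spoken phrase transitions to its target URI."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    menu = ET.SubElement(vxml, "menu", {"id": "topics"})
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = prompt_text
    for phrase, target in choices.items():
        # <choice next="URI"> is the voice analog of a hyperlink:
        # saying the phrase follows the link.
        choice = ET.SubElement(menu, "choice", {"next": target})
        choice.text = phrase
    return ET.tostring(vxml, encoding="unicode")


# Hypothetical target: a dialog with id "aids" on another organization's site.
page = build_menu(
    "Which topic would you like to hear about?",
    {"AIDS information": deep_link("http://health.example.org/info.vxml", "aids")},
)
```

Because the target is an ordinary URI, nothing in the markup requires the linked dialog to live on the same server as the menu.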

<Zakim> steph, you wanted to discuss the need for public voice browsing services, and advocacy on that and to ask how to make the message out, where there is almost no example and to

steph: This is the case today. I have a portal, and I have a set of bookmarks that I put, and then I can select the one I want.

<SteveB> Steph wrote a portal

<SteveB> can access it by browser or voice

steph: You can get a number, have bookmarks to whatever sites you have, etc.

<SteveB> both

timbl_: These are VoiceXML bookmarks.

steph: I think we are mixing both voice and visual things.
... I was unable to find people putting VoiceXML on the Web.
... There was no way except for a geeky guy to browse the voice Web.
... In my world, I imagine you can find a specific number that you'd call, your own private VoiceXML home page in which you can search to other pages that you bookmark.

timbl_: Imagine Dom making his CSS helper mobile app into VoiceXML.
... What would it take to make a VoiceXML interface for it?

<dom> [TimBL is referring to the cheat sheet http://www.w3.org/2009/cheatsheet/]

burn: About voice: the words you might use for someone who is reading are going to be different from those you'd use for someone who is speaking.

<Ralph> [ah, perfect example; can I put a link in my HTML document to message #4 in my mobile phone provider's voicemail system? Does the VBWG believe I should be able to record a voicemail message on Tim's mobile carrier's voicemail with a link to a message in my mobile carrier's voicemail?]

timbl_: Dom's cheatsheet has a list of things like CSS properties, etc. If you called 1800w3ccss, you could find references to CSS.

burn: Even say the team contact list, or the list of specifications.

<dom> [I'm sure having a phone number that would answer the questions about W3C spamming people would be useful :) ]

<SteveB> that is still a pretty standard call center application

burn: "Thanks for calling W3C, what are you interested in learning about? Specification work? Our people? Or our mission?"

steph: I don't understand why it should be a phone number.

<ddahl> I wonder if you could build a voice-enabled document retrieval system for specs based on the text of all the specs

<SteveB> would like to see something more open, linking to other sites,

steph: For me, I don't think W3C should have to run its own voice browser; it can just make its VoiceXML application.

<marie> [yes, why a phone # only?]

steph: I think it's a major flaw if everyone who wants to use a voice application needs their own voice browser.

burn: You could run your own voice browser. Before voice was being commonly sent over VoIP, it was common if you wanted to build something like a voice browser, whether it was VoiceXML or something else.
... In those days you'd buy analog cards, put them in a PC, and hook them to phone lines.
... Now with VoIP we can start to provide a software only voice browser.

burn: You can download our software and you can set up a SIP connection to it.
... So the user on the other end then needs either a SIP client or a SIP/POTS gateway.
... Or you can have a gateway that runs the voice browser. I'm hoping that becomes so common that it becomes ??

<Zakim> Ralph, you wanted to respond to Steph's "W3C shouldn't have to run its own voice system"

Ralph: I wanted to comment on Steph's "Why should W3C have to run its own service?" -- this is the dichotomy of running a Web server vs. Web hosting.

burn: Not everyone needs to do that, which is one of the reasons that my company provides the software in different ways.
... You own the visual browser, it may cost you, it may be free.
... My company has something free up to a point.
... A difference is that it's not the client that does it, it's not the person viewing the information that has to get the browser, it's the one who is providing the information who needs the browser.
... SIP may be changing that.

timbl_: What stops me from taking your software and connecting to it via a microphone?

burn: Nothing, you can do that today.

timbl_: How much of my machine will it take today?

burn: The most compute intensive part is the speech recognition. That's the biggest reason why most people will host it somewhere.

<Ralph> [I think I agree that sip: is the connector for One Web that contains voice as an equal media type]

timbl_: There's a vast amount of computing in people's laptops.
... My macbook pro today, it'll take one of the CPUs? That means in 3 years time it might only take one of the 16 CPUs.
... The CPU argument seems shortsighted.

burn: I agree strongly. Compute power needs are not something that are going to matter in the future.

timbl_: You also said it'd be too expensive.
... That sounds like a bug in the pricing model.
... Suppose I want to equip my Firefox to be VoiceXML enabled. I put a plugin in, and it lets me browse through VoiceXML that I find on the Web. That's the sort of thing that I imagine that Google would fold in to Chrome very quickly.
... Don't you have to compete with a 99 cent iPhone app?

burn: The costs are definitely coming down. It doesn't take much of your machine to run the software directly.

<kaz> [ Opera (until 8?) used to support voice... ]

burn: My company provides a low cost ASR that is just included in the software we provide.
... The reason I talked about costs is just about how visual Web browsers do more today than they did ten years ago. Ten years from now voice browsers will also be more sophisticated.
... For basic needs, I think you'll have basic recognition running on the phone right now, but there will likely be something more advanced coming that won't run.
... One of the things Voxeo has been doing, we've been taking our hosted infrastructure and made it available via JavaScript.
... Those applications today run on the hosted network because they need access to telephony and speech resources, but shortly they'll just be remote calls, and they'll only access our remote network in the cloud as needed to do these other technologies.
... We see the industry as a whole going that direction.

timbl_: Is there OSS ASR at all?

burn: Yes, I haven't tried it.

<steph> look at http://www.w3.org/2008/MW4D/wiki/Tools

<steph> voice section

<Zakim> ddahl, you wanted to say that the problem isn't so much how to get the voice browser per se, but how to get the speech recognized, which is part of the browser, but they can be

ddahl: There's a difference between the ASR itself and the service.
... Something like Sphinx from CMU.

<Ralph> links to open source recognizers

ddahl: And there are starting to be more open speech services available, e.g. the AT&T Speech Mashup.

<kaz> [ speech mashup example (speak4it): http://www.youtube.com/watch?v=OURZpqh-35A&eurl=&feature=player_embedded ]

<Ralph> CMU Sphinx

ddahl: There's the experimental WAMI from MIT.

burn: We don't offer a direct speech service. Could do that in the future. The primary platform for that right now is MRCP.

burn: Just like you use HTTP for Web pages, we have MRCP for controlling speech resources.
... In our product right now, if you had an open source recognizer running on another machine that supports MRCP ??
... Instead of pointing at the MRCP server that's installed for our software now, you could just change it to use your own MRCP server.

<kaz> open source ASR developed by Japanese researchers

burn: You can install open source software on any machine you want and do the same pointing. A lot of the Voice browsers work that way today. That's what MRCP support means today.

timbl_: Voice is one of those things, like CSS is one of those things we pushed out there because we thought it was good.
... Multimodal too.
... What standards, what glue do we need to produce?

<kaz> [ I think MMI is one of the possible glues ]

timbl_: If you imagine local voice capabilities on your machine, that will get out there sometime. The first person to fold it in to the visual browser would launch a de facto standard.
... Getting out there ahead of it with standardization is a good idea.
... We should get people on the W3C staff using it. Playing with it.
... The sort of thing that DanC did with tel:.

[[ at one time there was a proposed application type for 'universal voice browsing' in SIP at the IETF ]]

timbl_: Someone should put some VoiceXML on the Web right now. Stick it in the public Web space. Put together an advanced adoption group who are playing with this stuff with the assumption everyone has this technology on their machines.

ddahl: One thing we have to be careful about are the use cases of small devices, that require remote speech recognition.
... It seems like we don't want to rely 100% on the native platform. We want the flexibility to use remote or local resources.

timbl_: The beginning of the Web started with people telnetting to www.cern.ch and getting a Web browser.
... That's now totally history. I'm glad we didn't design it for that.
... If we'd stuck with that then... well, there would be lots of dialup systems that wouldn't be open Web. Cern let you follow links anywhere.
... We need to push this. Link together voice --

<SteveB> Stephane and Steveb need to leave a couple minutes before bottom of hour

timbl_: In my mind, if I write a VoiceXML file that has a bunch of information about me, my SSN, etc and then I follow a link to elsewhere. Does VoiceXML define how state is transferred?

<SteveB> would be great to share what the Web Foundation is thinking about doing in the field

burn: In general, no. That information is generally stored server side.

timbl_: It'd make up a huge horrible URL typically.

<kaz> [ and SCXML as well ]

<Ralph> Matt: VoiceXML will let you build deep links into an application

<Ralph> ... e.g. you could link directly to AIDS information in a voice site

burn: There are two different programming models that we see today.
... Pages generated by the server, using JSP to generate VoiceXML.
... So your context could be maintained on the server, so you just generate the user interface, and then the information is maintained by the server.
... Then the other model of doing everything locally on the client. A data tag that you can use to send information to a server somewhere.
... Typically not maintained in the UI piece. There's not a general mechanism for transferring from one part of a UI to another.
... There's not a way to share that state without going through the server.
... People are used to that via HTML. The model is very similar.
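
The first programming model described above (pages generated by the server, with context held server-side) can be sketched as follows. This is an illustration only: it swaps Python in for the JSP mentioned in the discussion, and the in-memory `SESSIONS` store, the `/order` URL, the function names, and the item name are all hypothetical. The generated markup uses VoiceXML 2.0 `<field>`, `<filled>`, and `<submit>` elements and has not been validated against the schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical server-side store: the caller's context lives here,
# not in the VoiceXML page that the voice browser interprets.
SESSIONS = {}


def render_quantity_page(session_id: str) -> str:
    """Generate a VoiceXML page whose prompt depends on server-side state."""
    state = SESSIONS.setdefault(session_id, {"item": "items"})
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "order"})
    # type="number" asks the platform for its built-in number grammar.
    field = ET.SubElement(form, "field", {"name": "quantity", "type": "number"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = f"How many {state['item']} would you like?"
    filled = ET.SubElement(field, "filled")
    # Only the new answer travels back; the session id lets the
    # server find the rest of the context again.
    ET.SubElement(filled, "submit", {
        "next": f"/order?session={quote(session_id)}",
        "namelist": "quantity",
    })
    return ET.tostring(vxml, encoding="unicode")


def handle_submit(session_id: str, quantity: int) -> None:
    """Record the submitted answer in the server-side session."""
    SESSIONS.setdefault(session_id, {})["quantity"] = quantity


SESSIONS["abc"] = {"item": "blue mugs"}
page = render_quantity_page("abc")
handle_submit("abc", 2)
```

As in the HTML case, the page itself carries only a session identifier, so following a link to another site would not bring any of this state along unless the servers arrange to share it.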

<marie> [/me would welcome best practices]

<Ralph> [what's the REST analog for linking between voice apps? Does it even make sense to believe that RESTful transfer applies to the voice use cases?]

steph: It's good to look at the future. One of the major use cases we see at the W3F for VoiceXML is to deliver content and information in developing countries.
... How can content providers provide information in VoiceXML?
... How can they be a host of the Voice browser? Each content provider cannot afford managing twenty or a hundred lines on their own.

<kaz> [ Ralph, MMI WG think RESTful connection should be also useful for voice connection ]

steph: That's a level of investment that is ??
... How to build a kind of framework where a content provider just provides information in VoiceXML.

<Ralph> [Kaz, that's good to know; thanks. any pointers to discussion?]

steph: So that operators can provide a connection between the POTS and the voice application.
... I also want to ask about African languages in TTS and ASR.

steph: How to develop low cost ASR/TTS?
... And what about the usability of DTMF vs directed dialog vs natural language when you talk about people who do not have Web or ?? experience and are illiterate?
... I think the voice group may be interested in all of those topics.

burn: There are commercial drivers for low cost ASR and TTS. A lot of data needs to be collected, and getting that data costs a lot.
... Any time there's a substantial effort or time to produce something, you'll occasionally find people to do it for free, but more often get it for a fee.
... I'd like to see low cost ASR and TTS show up for the long tail languages. Not the most spoken or most commercially viable languages.
... VBWG would be very interested in talking about it. You can join us on a call sometime and we can talk about it.

[[ particularly where SSML and SRGS isn't sufficient for the African languages ]]

steph: [[ and on illiteracy? ]]

<kaz> [ that's one of our motivations for the Conv Apps workshop :) ]

burn: We don't particularly talk about this in our group. Typically we have knowledge of the interfaces we need to build and bring that to the group. But there are Voice UI experts who are very familiar with the best way to create UIs.
... Happy to talk to you about it, but it hasn't been part of our core for standardization.

ddahl: One of the things the voice community has found is that there's a big difference in voice user interfaces in more first world type countries. That would be of interest to you to look at.

<Ralph> [just ran into http://www.voice-push.com/voice-push/voice-push.php -- seems to have some useful resources there. Are the author(s) of that site connected with the WG? ]

steph: There are now quite a few different teams who are focusing on usability for the illiterate.
... Federating those groups for usability guidelines, would that be of interest to this group or not?

<SteveB> I'll need to drop off ... see my agenda item re: multi-modal and linked apps, and also was going to make point that we plan to train people in developing countries to help them build voice/html apps -- hope it is easy to do this :)

<SteveB> thanks a lot to everyone... great project review!

<burn> thanks steveb

steph: There are research groups around the world coming up with usability guidelines. Would this be relevant to the VBWG?

ddahl: I think it's relevant, but the Voice UI would have to be looked at as a distinct thing as well.
... The voice UI community experience has been that visual UI experience does not translate well to voice.

<Ralph> Matt: usability guidelines could inform the spec as well

<steph> if people are interested http://www.w3.org/2008/MW4D/wiki/Stories

<Ralph> ... may find some things, e.g. new UI paradigms, that we don't currently support

<steph> voice section has some papers on this topic

burn: If there's a place for us to participate in usability guidelines, even in visual Web, happening within W3C.

[[ Mobile Web Best Practices -- not particular UI, but influences design ]]

<ddahl> actually, i meant that there are even cultural differences in Voice UI in the developed world, so that suggests you would find other differences if you looked at a broader range of cultures. it would be worth looking at

burn: Happy to help participate in that if it exists.

Ralph: steph, can you share your voice portal?

steph: I can write that up in a mail.

<steph> ACTION: steph to send info on his portal [recorded in http://www.w3.org/2010/02/25-vow-minutes.html#action01]

kaz: Are the participants here interested in more?

burn: I think the next step is important, do what Tim said, go out and try it, play with it. Give it a try.
... I think it would be good to have some kind of follow on.
... More brainstorming after people have tried it.

[[ I like Tim's idea of an advanced group looking at some of the issues ]]

burn: I'm not sure what the right way forward is, get some team people together to brainstorm? what would be most effective?

<kaz> e.g.

<kaz> * http://studio.tellme.com

<kaz> * http://cafe.bevocal.com

<kaz> * http://evolution.voxeo.com

Ralph: I put an agenda item in for more tutorial on how to write a voice app.
... That would give us ideas about other forums in which to start talking.

<plh> I'll be interested in a hands-on approach

<kaz> [ actually, matt is generating some simple voice app now ]

matt: I'd be happy to help, set up a forum for us to try it out.

timbl_: I'd be happy to try to install a browser for 15 minutes. If it takes longer than that, it needs to be faster :)

burn: You can download a voice browser from Voxeo. Or you can look at one of the other three sites for hosting a browser.

burn: That's the place to try it without installing anything.

Ralph: This was very useful, thank you Dan!

<Ralph> Here's what I wanted to say under agendum 2:

<Ralph> What's the technology layering envisioned by the VBWG?

<Ralph> high modularity allows reuse of:

<Ralph> . natural language interfaces

<Ralph> . techniques for dealing with imprecise input (Debbie's mouse-motor-skill observation)

<Ralph> . location context

burn: I'd still like to hear the next steps.

<Ralph> not just in voice mode, though voice may be driving several of these layers

burn: and please, contact our support, they're very good.

timbl_: Medium term, for future of VoiceXML, I'd like to push the future of multimedia. Think beyond call center and integrating VoiceXML from HTML.
... What happens when Firefox includes a voice browser?

<ddahl> yes, natural language can also include natural language processing of typed input. EMMA is designed so that it can accommodate voice, typed, or even handwritten input

timbl_: You could write an add on that signs up for VoiceXML and starts browsing when it finds it.

<Ralph> CSAIL Spoken Language Systems research group

timbl_: Then browser UI things, like watching a movie and saying 'turn down the volume'.

kaz: Do we want to have another call on voice applications? Next month sometime?

<Ralph> [[one of the things that has been on Ralph.Someday for years is to integrate [-> http://groups.csail.mit.edu/sls/technologies/galaxy.shtml Galaxy ] into Zakim ]]

<Ralph> [kaz, re: another talk on applications -- yes, I'd be interested]

<kaz> [adjourned]
