Next Directions for Voice and the Web Breakout - Day 2

Meeting minutes

Presentation

kaz: agenda, review existing standards from AC meeting, interoperability issues, and possible workshop
… there are many existing standards

<kaz> slide 2

kaz: VoiceXML, SSML, CSS Speech, Web Speech API and Specification for Spoken Presentation in HTML

<kaz> slide 3

kaz: voice agents are getting popular -- accurate pronunciation, flexible speech styles, etc.
… need for improved voice agents

<kaz> slide 4

dd: concerns of the Voice Interaction CG
… generic agents like Siri, Google, Alexa
… other systems as well
… not based on web standards primarily
… would like to get them to interoperate
… for example, for banking, retail, ...
… CG meeting next week at exactly the same time

slide 5

dd: a lot of parallels
… Web page vs IPA
… web pages have to deal with user interaction
… primarily using GUI
… IPAs use interaction mainly based on voice and natural language
… arbitrary differences as well
… browser vs proprietary platforms

(IPA stands for Intelligent Personal Assistant)

dd: ecosystem of skills, actions or whatever
… have to find them through the platform

slide 6

dd: this is the IPA architecture generated by the CG
… not going into the details
… green box on the left is device
… the input device could be a microphone
… in the middle, the red box includes "dialogs"
… and the blue box on the right includes "provider selection service"
… we have some components which perform the functions
… analogous with the browsers

slide 7

kaz: potential voice workshop
… try to solve potential pain points
… what is the best mechanism for discussion?
… feedback from the first breakout

<bkardell_> are the slides available so I can zoom in to see some of these things better than I could here?

kaz: existing standards, other related technologies, pain points, emotion, common sense database related to people's perception
… several participants said that emotion would be very interesting

https://github.com/w3c/strategy/issues/221

kaz: opinions?

Discussion

brian: zoom in on IPA architecture
… currently it's very underspecified how this is implemented
… in current implementations
… can you send SSML? Yes in some cases, but sometimes it doesn't work

https://w3c.github.io/voiceinteraction/voice%20interaction%20drafts/paArchitecture-1-2.htm

<kaz> dd: welcome to the CG meeting as well

<bkardell_> ... could we please share a link to the APA doc just referenced, for review/minutes as well?

<kaz> Natural Language Interface Accessibility User Requirements

<IrfanA> https://www.w3.org/TR/spoken-html/

jason: points out two accessibility publications
… that are relevant to this discussion

<IrfanA> https://github.com/w3c/pronunciation/issues

<jasonjgw> https://www.w3.org/TR/naur/

irfan: please add issues to github

chris: as a content publisher, we've had to overcome a lot of proprietary content
… is there interest from device manufacturers?

kaz: would like to invite these vendors to the workshop

jason: would be interested

avneesh: this is very important work in the community group -- what would big players see as business benefits?

dd: would be really interesting to look into
… have not done yet
… so far we have focused on our own short-term interest in interoperability gaps
… but should look into it

<ddahl> philArcher: this is the 3rd TPAC in a row, have we reached a critical mass yet?

<ddahl> kaz: let's hold the workshop in the next 6 months or so

<ddahl> ...probably will be held remotely

<ddahl> lisa_seeman: how does this interact with people with cognitive disabilities?

<ddahl> ...put some ideas in the Content Usable note

<ddahl> ...how can this specification support people with voice disabilities?

<stevelee> https://www.w3.org/TR/coga-usable/#voice-menus-user-story

<ddahl> ...this is full of potential and helps businesses with getting users who are struggling

<ddahl> ...that could be part of the business case

<ddahl> ... we also made a request to APA that audio descriptions have easier and more literal descriptions

dd: really interesting
… might be difficult to have simple audio description
… but using external services could be considered
… would be possible to use EMMA messages
… useful technology
… natural language technology is a difficult technology
… but getting better and better

lisa: you could dialog with your users

dd: someone may need some additional treatment
… e.g., airline reservations, need many parameters

lisa: wondering if it would be possible to put in a facility

lisa: more directed dialog for people who need simplified dialog

lisa: how could you make a note for yourself?

<Zakim> kaz, you wanted to react to lisa_seeman

lisa: that would be good for people who have memory issues

kaz: maybe that could be integrated in architecture
… that could be discussed during the workshop

<mhakkinen_> https://w3c.github.io/pronunciation/gap-analysis_and_use-case/

mark: we've been raising the issue of getting better pronunciation for several years

<LisaSeemanKest_> thank you mark

mark: the education use case is that many users rely on computer read-aloud. It might be good to bring in vendors from this community
… for example, Text Help
… we would be interested in the workshop

brian: wants to highlight that this and a lot of conversations are framed in terms of voice assistants like Siri
… the use cases for TTS and STT are way broader than that
… my company makes products for embedded devices. There are many use cases that aren't browsers or voice agents
… we should not limit this to conversational interfaces
… many devices can't support a full conversational interface

<cpn> it's not an either/or question

brian: will the workshop cover these things?
… the SSML has to make it all the way down to what's actually speaking
… not questioning the value of conversational interfaces, but would like to broaden discussion

kaz: we should talk about what's to be included

<Zakim> dwalka, you wanted to react to bkardell_

dirk: in the Voice Interaction group we meant to include other modalities, like chatbots

<Zakim> mark, you wanted to react to dwalka

mark: let's consider emergency alerts, synthesized alerts also have problems with pronunciation

<bkardell_> mark: do you have any link to the OASIS stuff you mentioned?

<mhakkinen_> http://docs.oasis-open.org/emergency/cap/v1.2/CAP-v1.2-os.html

mark: how can we improve this? did some earlier work

<Zakim> lisa, you wanted to react to mark

lisa: that would be a great use case. emergency communications have to be available to every single subgroup
… other use cases will be able to join that ecosystem at a lower cost

kaz: we might want to talk about not just voice but have a "Smart Agent" workshop

<Zakim> kaz, you wanted to react to mark

<kirkwood> +1 to ‘smart agent’, it's clearer I think

<bkardell_> can we get a link to the ISO standards mentioned?

tobias: working on DIN and OASIS standards. voice is very powerful, but the fastest way forward is to agree on minimal requirements
… and implement them

<kaz> OVON Open Voice Network

<kaz> Open Oasis RECITE Initiative

<kaz> Amazon Voice Interoperability Initiative

<phila_> I'm OK with Smart Agent workshop. My focus, unsurprisingly, is eCommerce and what's necessary for brand owners to help Smart Agents disambiguate products and retailers.

<kirkwood> ‘smart voice agent’

Wrap-up

kaz: would like to update the proposal with this feedback. Would like everyone to join the Program Committee
… please contact me

<kaz> slides

<kaz> github issue

<kaz> [adjourned]

20 October 2021