Meeting minutes
Presentation
kaz: agenda, review existing standards from AC meeting, interoperability issues, and possible workshop
… there are many existing standards
<kaz> slide 2
kaz: VoiceXML, SSML, CSS Speech, Web Speech API and Specification for Spoken Presentation
<kaz> slide 3
kaz: voice agents are getting popular -- accurate pronunciation, flexible speech styles, etc.
… need for improved voice agents
<kaz> slide 4
dd: concerns of the Voice Interaction CG
… generic agents like Siri, Google, Alexa
… other systems as well
… not based on web standards primarily
… would like to get them to interoperate
… for example, for banking, retail, ...
… CG meeting is next week at exactly the same time
slide 5
dd: a lot of parallels
… Web page vs IPA
… web pages have to deal with user interaction
… primarily using GUI
… IPAs use interaction mainly based on voice and natural language
… arbitrary differences as well
… browser vs proprietary platforms
(IPA stands for Intelligent Personal Assistant)
dd: ecosystem of skills, actions or whatever
… have to find them through the platform
slide 6
dd: this is the IPA architecture produced by the CG
… not going into the details
… green box on the left is device
… the input device could be a microphone
… in the middle, the red box includes "dialogs"
… and blue box on the right includes "provider selection service"
… we have some component which performs the functions
… analogous to browsers
slide 7
kaz: potential voice workshop
… try to solve potential pain points
… what is the best mechanism for discussion?
… feedback from the first breakout
<bkardell_> are the slides available so I can zoom in to see some of these things better than I could here?
… existing standards, other related technologies, pain points, emotion, common sense database related to people's perception
… several participants said that emotion would be very interesting
https://
kaz: opinions?
Discussion
brian: zoom in on IPA architecture
… currently it's very underspecified how this is implemented
… in current implementations
… can you send SSML? Yes in some cases, but sometimes it doesn't work
https://
<kaz> dd: welcome to the CG meeting as well
<bkardell_> ... if we could please share a link to the APA doc just referenced for review/minutes as well?
<kaz> Natural Language Interface Accessibility User Requirements
<IrfanA> https://
jason: points out two accessibility publications
… that are relevant to this discussion
<IrfanA> https://
<jasonjgw> https://
irfan: please add issues to github
chris: as a content publisher, we've had to overcome a lot of proprietary content
… is there interest from device manufacturers?
kaz: would like to invite these vendors to the workshop
Jason: would be interested
avneesh: this is very important work in the community group -- what would big players see as business benefits?
dd: would be really interesting to look into
… have not done that yet
… so far we have focused on our own short-term interest in interoperability gaps
… but should look into it
<ddahl> philArcher: this is the 3rd TPAC in a row, have we reached a critical mass yet?
<ddahl> kaz: let's hold the workshop in the next 6 months or so
<ddahl> ...probably will be held remotely
<ddahl> lisa_seeman: how does this interact with people with cognitive disabilities?
<ddahl> ...put some ideas in Content Usable note
<ddahl> ...how can this specification support people with voice disabilities?
<stevelee> https://
<ddahl> ...this is full of potential and helps businesses with getting users who are struggling
<ddahl> ...that could be part of the business case
<ddahl> ... we also made a request to APA that audio descriptions have easier and more literal descriptions
dd: really interesting
… might be difficult to have simple audio description
… but it would be worth considering external services
… it would be possible to use EMMA messages
… useful technology
… natural language technology is a difficult technology
… but getting better and better
lisa: you could dialog with your users
dd: someone may need some additional treatment
… e.g., airline reservations, need many parameters
lisa: wondering if it would be possible to put in a facility
lisa: more directed dialog for people who need simplified dialog
lisa: how could you make a note for yourself?
<Zakim> kaz, you wanted to react to lisa_seeman
lisa: that would be good for people who have memory issues
kaz: maybe that could be integrated in architecture
… that could be discussed during the workshop
<mhakkinen_> https://
mark: we've been raising the issue of getting better pronunciation for several years
<LisaSeemanKest_> thank you mark
mark: the education use case is that many users use computer read aloud. It might be good to bring in vendors from this community
… for example, Text Help
… we would be interested in the workshop
brian: wants to highlight that this and a lot of conversations are in terms of voice assistants like Siri
… the use cases for TTS and STT are way broader than that
… my company makes products for embedded devices. There are many use cases that aren't browsers or voice agents
… we should not limit this to conversational interfaces
… many devices can't support a full conversational interface
<cpn> it's not an either/or question
brian: will the workshop cover these things?
… the SSML has to make it all the way down to what's actually speaking
… not questioning the value of conversational interfaces, but would like to broaden discussion
kaz: we should talk about what's to be included
<Zakim> dwalka, you wanted to react to bkardell_
dirk: in the Voice Interaction group we meant to include other modalities, like chatbots
<Zakim> mark, you wanted to react to dwalka
mark: let's consider emergency alerts; synthesized alerts also have problems with pronunciation
<bkardell_> mark: do you have any link to the oasis stuff you mentioned
<mhakkinen_> http://
mark: how can we improve this? did some earlier work
<Zakim> lisa, you wanted to react to mark
lisa: that would be a great use case. emergency communications have to be available to every single subgroup
… other use cases will be able to join that ecosystem at a lower cost
kaz: we might want to talk about not just voice but have a "Smart Agent" workshop
<Zakim> kaz, you wanted to react to mark
<kirkwood> +1 to ‘smart agent’, it's clearer I think
<bkardell_> can we get a link to the iso standards mentioned?
tobias: working on DIN and OASIS standards. voice is very powerful, but the fastest way forward is to agree on minimal requirements
… and implement them
<kaz> OVON Open Voice Network
<kaz> Open Oasis RECITE Initiative
<kaz> Amazon Voice Interoperability Initiative
<phila_> I'm OK with Smart Agent workshop. My focus, unsurprisingly, is eCommerce and what's necessary for brand owners to help Smart Agents disambiguate products and retailers.
<kirkwood> ‘smart voice agent’
Wrap-up
kaz: would like to update the proposal with this feedback. Would like everyone to join the Program Committee
… please contact me
<kaz> slides
<kaz> github issue
<kaz> <ashimura@w3.org>
<kaz> [adjourned]