IRC log of voice on 2019-09-18
Timestamps are in UTC.
- 01:47:11 [RRSAgent]
- RRSAgent has joined #voice
- 01:47:11 [RRSAgent]
- logging to https://www.w3.org/2019/09/18-voice-irc
- 01:47:14 [wseltzer]
- rrsagent, make logs public
- 04:42:49 [RRSAgent]
- RRSAgent has joined #voice
- 04:42:49 [RRSAgent]
- logging to https://www.w3.org/2019/09/18-voice-irc
- 05:01:59 [stevelee]
- stevelee has joined #voice
- 05:19:55 [phila]
- phila has joined #voice
- 05:23:45 [phila]
- phila has changed the topic to: Intro slide deck for TPAC Voice session https://docs.google.com/presentation/d/1HWaE_u9084sDdHJShcKANQcPn6PUV7I5ss8fJpLz4_Y/edit#
- 05:27:10 [takeru]
- takeru has joined #voice
- 05:31:54 [cpn]
- cpn has joined #voice
- 05:32:24 [hyojin]
- hyojin has joined #voice
- 05:32:47 [mhakkinen]
- mhakkinen has joined #voice
- 05:33:00 [cpn]
- meeting: Voice assistants: opportunities for standardisation
- 05:33:10 [Irfan]
- Irfan has joined #voice
- 05:33:13 [Irfan]
- present+
- 05:33:26 [tink]
- tink has joined #voice
- 05:33:39 [tink]
- present+ Léonie (tink)
- 05:33:46 [cpn]
- present+ Chris_Needham
- 05:34:13 [Zakim]
- Zakim has joined #voice
- 05:34:26 [carlosil]
- carlosil has joined #voice
- 05:34:54 [Irfan]
- rrsagent, make minutes
- 05:34:54 [RRSAgent]
- I have made the request to generate https://www.w3.org/2019/09/18-voice-minutes.html Irfan
- 05:35:02 [cpn]
- scribenick: cpn
- 05:35:06 [minobu]
- minobu has joined #voice
- 05:35:10 [scheib]
- scheib has joined #voice
- 05:35:17 [cpn]
- [introductions from Phil, Leonie, Marco]
- 05:35:20 [mhakkinen]
- present+ mhakkinen
- 05:35:34 [scheib]
- present+ scheib
- 05:35:40 [cpn]
- Phil: A11y is a use case, other applications in healthcare, driving, etc
- 05:35:53 [dsr]
- dsr has joined #voice
- 05:36:02 [dsr]
- present+
- 05:36:05 [cpn]
- ... I know this is an important area, want to find out what we could do
- 05:36:06 [meredith]
- meredith has joined #voice
- 05:36:23 [mori]
- mori has joined #voice
- 05:36:24 [cpn]
- ... There are 5 different CGs on voice
- 05:36:36 [cpn]
- ... some addressing the same thing, mostly inactive
- 05:36:42 [cpn]
- ... voice interaction with the web isn't new
- 05:36:51 [cpn]
- ... also voice output is important, eg, for BBC
- 05:37:09 [cpn]
- ... none of this gives a clear direction on where we want to go
- 05:37:28 [cpn]
- ... [block diagram]
- 05:37:51 [cpn]
- ... [demo video from MIT]
- 05:38:10 [cpn]
- ... Open Voice Network
- 05:40:49 [cpn]
- ... it's a rare example of a voice assistant with a male voice
- 05:41:23 [cpn]
- ... add to shopping list important for retailers
- 05:41:52 [cpn]
- ... Intel and Capgemini also involved in this
- 05:42:49 [cpn]
- ... APIs are needed, for intents and slots, training data (privacy implications), history of conversation context
- 05:43:13 [cpn]
- ... SSML, avoid writing code for each individual platform
- 05:43:20 [cpn]
- ... where is the common interest?
- 05:43:34 [cpn]
- ... what level of interest is there, and where to continue the conversation?
- 05:43:54 [dsr]
- 1998 W3C workshop on voice browsers
- 05:43:56 [cpn]
- ... what are your motivations, pain points, etc?
- 05:44:06 [dsr]
- https://www.w3.org/Voice/1998/Workshop/
- 05:44:12 [cpn]
- Topic: Previous W3C work
- 05:44:37 [cpn]
- Dave: Workshop in 1998 led to specs such as speech synthesis, speech recognition, SSML
- 05:44:49 [cpn]
- ... Describing the dialog you have with a voice assistant is complex
- 05:45:00 [cpn]
- ... Wanted to separate that from the synthesis and recognition
- 05:45:03 [cpn]
- ... Work done on APIs
- 05:45:05 [dsr]
- https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API#targetText=Check%20the%20Browser%20compatibility%20table,SpeechRecognition%20(Asynchronous%20Speech%20Recognition.)
- 05:45:16 [cpn]
- Dave: This is the MDN page for the Web Speech API
- 05:45:41 [cpn]
- ... Browsers support synthesis, but few support recognition
- 05:45:53 [cpn]
- ... Then there's the relationship between voice interaction and chatbots
- 05:46:13 [cpn]
- ... Voice recognition has improved, so we now have good quality speech rec, so the problem is now text interaction
- 05:46:17 [dsr]
- https://developer.amazon.com/docs/custom-skills/create-the-interaction-model-for-your-skill.html
- 05:46:31 [cpn]
- Dave: This is the Amazon developer page for creating Alexa skills
- 05:46:49 [cpn]
- ... There's a declarative way to define intents and slots
- 05:46:53 [dsr]
- https://github.com/w3c/strategy/issues/134
- 05:47:22 [cpn]
- Dave: There's a range of conversation markup languages available, AIML, BOTML
- 05:47:31 [cpn]
- ... What are the strengths and weaknesses of these?
- 05:47:37 [cpn]
- ... What's the business value?
- 05:47:47 [cpn]
- ... Improve customer service using chatbots
- 05:48:01 [cpn]
- ... Includes not being annoying, where the agent on websites often gets in the way
- 05:48:12 [igarashi]
- igarashi has joined #voice
- 05:48:18 [cpn]
- ... We could have a CG, organise a W3C workshop
- 05:48:25 [cpn]
- ... Can we get the commercial companies interested?
- 05:49:14 [scheib]
- https://github.com/slightlyoff/declarative_web_actions mentioned by Aaron G.
- 05:49:27 [cpn]
- Aaron_Gustafson: Declarative web, Web Actions, a generalised approach to interactions, also
- 05:49:51 [cpn]
- ... Declarative Web Actions is a way in the Web App Manifest to declare interactions with assistants such as Cortana, Siri, etc
- 05:50:06 [cpn]
- ... A way to tie into the operating system
- 05:50:22 [cpn]
- ... With Cortana, had a similar thing
- 05:50:34 [cpn]
- ... Placeholders for keywords with alternate phrasing
- 05:50:55 [cpn]
- ... It uses slots, similar architecture, intents were used and key phrasings for triggering
- 05:51:02 [cpn]
- ... Talk to Alex Russell
- 05:51:23 [cpn]
- Dave: A company could create an agent itself, or allow third parties to plug in, which is more scalable
- 05:52:01 [cpn]
- Vincent: Working on Chrome and Google Assistant
- 05:52:21 [cpn]
- ... The market is changing rapidly, so it's a challenging time to do standardisation work
- 05:52:22 [aarongu]
- aarongu has joined #voice
- 05:52:30 [cpn]
- ... What we'll have in a few years might be quite different
- 05:52:59 [cpn]
- ... Architecture challenging because of changing technology, and businesses in this space are moving fast and differentiating themselves
- 05:53:12 [aarongu]
- Declarative Web Actions: https://github.com/slightlyoff/declarative_web_actions
- 05:53:12 [cpn]
- ... SSML has been adopted and extended by Amazon and Google
- 05:53:36 [cpn]
- Vincent: I was advocating use of the standardised parts of SSML, enables ingest of content from third parties
- 05:53:52 [cpn]
- ... Things can move faster by not using standards
- 05:54:15 [cpn]
- ... With the appropriate parties engaged, we'll find people receptive to add enhancements to SSML
- 05:54:17 [aarongu]
- Cortana’s Voice Command Definition (for reference) https://docs.microsoft.com/en-us/uwp/schemas/voicecommands/voice-command-elements-and-attributes-1-2
- 05:54:37 [cpn]
- Vincent: Other foundational technologies are speech recognition and speech generation
- 05:54:39 [JohnRiv]
- JohnRiv has joined #voice
- 05:54:50 [cpn]
- ... Companies don't need standardisation, they're moving fast
- 05:55:17 [cpn]
- ... How are users using agents? Many ways. Embedded agents in web pages, I don't see large usage
- 05:55:32 [cpn]
- ... Instead, appliance scenarios as input modality to the computer as a whole
- 05:55:45 [cpn]
- ... Using the assistant at the mobile OS level
- 05:56:00 [cpn]
- ... Smart speakers
- 05:56:10 [cpn]
- ... On laptops and desktops, there's less usage
- 05:56:51 [cpn]
- ... The best thing we can do at W3C is make web content as navigable and actionable as possible by OS level agents
- 05:56:58 [cpn]
- ... And build aspects of those agents into the browser
- 05:57:06 [kaz]
- kaz has joined #voice
- 05:57:13 [cpn]
- ... Alexa and Siri attempt to get structured data from the web, e.g., schema.org
- 05:57:22 [kaz]
- present+ Kaz_Ashimura
- 05:57:30 [cpn]
- ... These queries work the best: fact-based or structure-based queries give good responses
- 05:57:41 [cpn]
- ... Navigating a website with unique offerings isn't handled very well
- 05:58:09 [cpn]
- ... Having a page that responds to certain actions such as Ctrl+S for save, and having an associated voice action, has value
- 05:58:15 [kaz]
- q+
- 05:59:04 [cpn]
- Leonie: Is there room for new features in SSML? Such as effects, like "whisper", a quick way to produce specific patterns
- 06:00:20 [cpn]
- Vincent: Yes. Prosody and adjacent attributes. It's complex, there's motivation to improve speech generation, this is so new it's hard to standardise
- 06:00:39 [cpn]
- Leonie: Google are restarting work on Web Speech API, is there interest in formalising that more?
- 06:00:43 [dsr]
- Amazon’s extensions to SSML: https://developer.amazon.com/blogs/alexa/post/5c631c3c-0d35-483f-b226-83dd98def117/new-ssml-features-give-alexa-a-wider-range-of-natural-expression
- 06:00:51 [cpn]
- Vincent: Don't know
- 06:01:20 [cpn]
- Dave: Interest from Amazon in extending SSML at W3C
- 06:01:37 [cpn]
- Vincent: Google would also be interested, but other things we're doing are out of scope
- 06:01:55 [cpn]
- Brian: Not everything there is currently supported in browsers
- 06:02:12 [cpn]
- Vincent: Would be good to have an artifact that describes state of SSML support
- 06:02:39 [cpn]
- Marko: Pronunciation TF from APA WG. Coming from education, consuming text to speech content
- 06:02:57 [cpn]
- ... Specific requirements for word pronunciation
- 06:03:10 [cpn]
- ... A barrier is that the HTML content can't host SSML
- 06:03:50 [cpn]
- ... Presentation cues in HTML could also be consumed by voice assistants, please participate in the TF
- 06:04:44 [Irfan]
- Pronunciation Task Force: https://www.w3.org/WAI/APA/task-forces/pronunciation/
- 06:05:00 [dsr]
- Chris: the broadcast industry got together under the EBU to discuss some of these issues, e.g. loudness of voice relative to other content, to present our content using our voice talents
- 06:05:30 [dsr]
- Concerns about difficulties of achieving write once run everywhere
- 06:05:54 [Irfan]
- s/Marko/Markku
- 06:06:25 [dsr]
- Need to involve implementers
- 06:06:49 [dsr]
- The EBU is expecting to provide a collection of requirements
- 06:07:11 [dsr]
- BBC would support work on extending SSML
- 06:07:26 [cpn]
- scribenick: cpn
- 06:08:03 [cpn]
- Kaz: There's also PLS, as well as SSML
- 06:08:11 [tink]
- Lyrebird is an API that can recreate the voices of real people. Demos on this page https://www.youtube.com/watch?v=YfU_sWHT8mo
- 06:08:23 [cpn]
- ... Also multi-modal architecture, EMMA data model
- 06:09:05 [tink]
- Lyrebird API here https://www.descript.com/lyrebird-ai
- 06:09:05 [kaz]
- s/EMMA data model/SCXML as the mechanism for that purpose, and EMMA data model/
- 06:09:07 [cpn]
- Phil: It sounds like SSML updates are potentially of interest
- 06:09:23 [cpn]
- ... Not keen to look at intents?
- 06:09:26 [meredith]
- https://www.irccloud.com/pastebin/cBgx1SEk/
- 06:09:26 [kaz]
- -> https://www.w3.org/TR/speech-synthesis11/ SSML 1.1
- 06:09:41 [cpn]
- Vincent: I see that as more challenging, companies are differentiating
- 06:09:53 [kaz]
- -> https://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/ PLS
- 06:10:12 [dsr]
- Opportunities for operating system integrated voice agents being able to make use of semantic descriptions (e.g. schema.org) of services exposed by web sites.
- 06:10:18 [cpn]
- Aaron: Thinking about context providers, e.g., weather services, advertising specific apps to hook into a voice interface, is interesting
- 06:10:45 [kaz]
- ack kaz
- 06:11:02 [cpn]
- Leonie: As someone producing skills, a way to avoid having to write everything twice is desirable
- 06:11:30 [cpn]
- ... There's similarity with conversational models. I suspect the hooks are similar
- 06:12:02 [cpn]
- Vincent: I think there's huge potential, more with SSML than intents though
- 06:12:35 [cpn]
- ... Hasn't started with a standards-first approach
- 06:13:09 [cpn]
- Dave: schema.org has allowed smart search, but also hooks for the OS voice assistant. How could we extend schema.org to provide the kinds of voice experiences people are looking for?
- 06:13:19 [cpn]
- ... Then the voice vendors have something common to work with?
- 06:13:24 [cpn]
- s/with?/with/
- 06:14:06 [cpn]
- Aaron: I'd like to be able to ask a website to search for things, and have it know what to do
- 06:15:00 [cpn]
- Omar: I'm working on chatbots, I notice there's a ubiquity, it's on the webpage, then FB messenger etc
- 06:15:16 [cpn]
- ... We're thinking about intents, whether to do them in the front-end or back-end
- 06:15:42 [cpn]
- ... Would a web standard help with intents? Same for speech synthesis, where email or SMS are valid channels for the chatbot
- 06:16:24 [cpn]
- ... I'd like to see improvement in interoperability between Alexa and Siri
- 06:17:05 [cpn]
- ... For speech recognition, we do nothing, as mobile devices have it built in
- 06:17:40 [cpn]
- Phil: Does the browser have a speech synthesis API?
- 06:18:03 [cpn]
- Brian: Yes, it's not a great API, it lacks ability to give richer input than just text
- 06:18:35 [cpn]
- Leonie: This is being worked on in a CG, could bring support, move to WG?
- 06:18:50 [cpn]
- Brian: TAG has given input on Web Speech
- 06:19:08 [cpn]
- Kaz: Multi-application handling was included in the multi-modal architecture
- 06:19:18 [cpn]
- ... WoT is working on smart speakers and speech synthesis
- 06:19:34 [cpn]
- ... Not suggesting using WoT for this, but we should collaborate
- 06:20:02 [cpn]
- Phil: If there were a W3C workshop on this, would you come?
- 06:20:06 [meredith]
- reposting as link instead of snippet: https://github.com/w3c/strategy/issues/71#issuecomment-391105060
- 06:20:23 [cpn]
- Phil: We'd need implementers in the room
- 06:20:34 [cpn]
- Vincent: There's a good chance we could get people there
- 06:21:09 [cpn]
- Dave: In preparing the workshop we'd reach out to stakeholders, so we'd first want to make the right contacts, to make it relevant
- 06:21:15 [cpn]
- Vincent: I can help make contacts in Google
- 06:21:26 [cpn]
- Aaron: I can help at Microsoft
- 06:21:32 [cpn]
- Dan: I can also help at Google
- 06:21:48 [kaz]
- -> https://www.w3.org/TR/2012/NOTE-mmi-interop-20120124/ MMI interoperability test report (as the starting point of what MMI Architecture is like to synchronize multiple agents like messenger and speech)
- 06:21:52 [cpn]
- Dan: We're always happy to try things out at schema.org
- 06:22:09 [cpn]
- present+ Dan_Brickley
- 06:22:51 [cpn]
- Dan: There's speakable, which reads things from news articles. There's work on intents and filling in forms
- 06:23:11 [cpn]
- ... We pull in feeds from Netflix, etc, schema.org works well for that
- 06:23:25 [kaz]
- i|If there were|-> https://www.w3.org/2019/09/18-wot-pf-minutes.html WoT PlugFest breakout minutes|
- 06:23:53 [cpn]
- Marko: It seems there's a perfect storm of people in the room to move things forward
- 06:24:01 [cpn]
- ... I have issues with SSML
- 06:24:13 [cpn]
- Leonie: Latest update was in 2010
- 06:24:53 [cpn]
- Brian: The Web Speech APIs are also from that time, but work on them has stopped since then
- 06:25:06 [cpn]
- Leonie: The standards pre-empted the current situation, things have now moved on
- 06:25:10 [kaz]
- i|If there were|-> https://github.com/w3c/wot/blob/master/PRESENTATIONS/2019-09_WoT-Plugfest.pdf WoT PlugFest summary slides|
- 06:25:13 [cpn]
- Phil: Which group should we join?
- 06:25:31 [cpn]
- Leonie: Voice Assistant Standardisation CG could be restarted
- 06:25:45 [cpn]
- Phil: Thank you everyone
- 06:25:49 [cpn]
- [adjourned]
- 06:25:52 [cpn]
- rrsagent, draft minutes
- 06:25:52 [RRSAgent]
- I have made the request to generate https://www.w3.org/2019/09/18-voice-minutes.html cpn
- 06:25:57 [cpn]
- rrsagent, make log public
- 06:27:05 [mhakkinen]
- https://www.w3.org/WAI/APA/task-forces/pronunciation/
- 06:30:51 [phila]
- phila has joined #voice
- 06:31:19 [phila]
- RRSAgent, draft minutes
- 06:31:19 [RRSAgent]
- I have made the request to generate https://www.w3.org/2019/09/18-voice-minutes.html phila
- 06:31:30 [phila]
- RRSAgent, make logs public
- 06:31:52 [phila]
- Meeting: Voice assistants - what needs standardization?
- 06:32:01 [phila]
- chair: PhilA
- 06:32:34 [phila]
- RRSAgent, draft minutes
- 06:32:34 [RRSAgent]
- I have made the request to generate https://www.w3.org/2019/09/18-voice-minutes.html phila
- 06:50:31 [dsr]
- dsr has joined #voice
- 06:58:05 [dsr]
- dsr has joined #voice
- 07:00:52 [stevelee]
- stevelee has joined #voice
- 07:12:01 [dsr]
- dsr has joined #voice
- 07:34:25 [dsr]
- dsr has joined #voice
- 07:42:46 [phila]
- phila has joined #voice
- 08:03:26 [stevelee_]
- stevelee_ has joined #voice
- 08:12:34 [stevelee]
- stevelee has joined #voice
- 08:27:24 [Zakim]
- Zakim has left #voice
- 08:30:03 [dsr]
- dsr has joined #voice
- 11:28:03 [dsr]
- dsr has joined #voice
- 13:10:27 [stevelee]
- stevelee has joined #voice
- 15:24:42 [dsr]
- dsr has joined #voice