01:47:11 RRSAgent has joined #voice
01:47:11 logging to https://www.w3.org/2019/09/18-voice-irc
01:47:14 rrsagent, make logs public
04:42:49 RRSAgent has joined #voice
04:42:49 logging to https://www.w3.org/2019/09/18-voice-irc
05:01:59 stevelee has joined #voice
05:19:55 phila has joined #voice
05:23:45 phila has changed the topic to: Intro slide deck for TPAC Voice session https://docs.google.com/presentation/d/1HWaE_u9084sDdHJShcKANQcPn6PUV7I5ss8fJpLz4_Y/edit#
05:27:10 takeru has joined #voice
05:31:54 cpn has joined #voice
05:32:24 hyojin has joined #voice
05:32:47 mhakkinen has joined #voice
05:33:00 meeting: Voice assistants: opportunities for standardisation
05:33:10 Irfan has joined #voice
05:33:13 present+
05:33:26 tink has joined #voice
05:33:39 present+ Léonie (tink)
05:33:46 present+ Chris_Needham
05:34:13 Zakim has joined #voice
05:34:26 carlosil has joined #voice
05:34:54 rrsagent, make minutes
05:34:54 I have made the request to generate https://www.w3.org/2019/09/18-voice-minutes.html Irfan
05:35:02 scribenick: cpn
05:35:06 minobu has joined #voice
05:35:10 scheib has joined #voice
05:35:17 [introductions from Phil, Leonie, Marco]
05:35:20 present+ mhakkinen
05:35:34 present+ scheib
05:35:40 Phil: A11y is a use case, other applications in healthcare, driving, etc
05:35:53 dsr has joined #voice
05:36:02 present+
05:36:05 ... I know this is an important area, want to find out what we could do
05:36:06 meredith has joined #voice
05:36:23 mori has joined #voice
05:36:24 ... There are 5 different CGs on voice
05:36:36 ... some addressing the same thing, mostly inactive
05:36:42 ... voice interaction with the web isn't new
05:36:51 ... also voice output is important, e.g., for the BBC
05:37:09 ... none of this gives a clear direction on where we want to go
05:37:28 ... [block diagram]
05:37:51 ... [demo video from MIT]
05:38:10 ... Open Voice Network
05:40:49 ... it's a rare example of a voice assistant with a male voice
05:41:23 ... add to shopping list important for retailers
05:41:52 ... Intel and Capgemini also involved in this
05:42:49 ... APIs are needed, for intents and slots, training data (privacy implications), history of conversation context
05:43:13 ... SSML, avoid writing code for each individual platform
05:43:20 ... where is the common interest?
05:43:34 ... what level of interest is there, and where to continue the conversation?
05:43:54 1998 W3C workshop on voice browsers
05:43:56 ... what are your motivations, pain points, etc?
05:44:06 https://www.w3.org/Voice/1998/Workshop/
05:44:12 Topic: Previous W3C work
05:44:37 Dave: Workshop in 1998 led to specs such as speech synthesis, speech recognition, SSML
05:44:49 ... Describing the dialog you have with a voice assistant is complex
05:45:00 ... Wanted to separate that from the synthesis and recognition
05:45:03 ... Work done on APIs
05:45:05 https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API#targetText=Check%20the%20Browser%20compatibility%20table,SpeechRecognition%20(Asynchronous%20Speech%20Recognition.)
05:45:16 Dave: This is the MDN page for the Web Speech API
05:45:41 ... Browsers support synthesis, but few support recognition
05:45:53 ... Then there's the relationship between voice interaction and chatbots
05:46:13 ... Voice recognition has improved, so we now have good quality speech rec, so the problem is now text interaction
05:46:17 https://developer.amazon.com/docs/custom-skills/create-the-interaction-model-for-your-skill.html
05:46:31 Dave: This is the Amazon developer page for creating Alexa skills
05:46:49 ... There's a declarative way to define intents and slots
05:46:53 https://github.com/w3c/strategy/issues/134
05:47:22 Dave: There's a range of conversation markup languages available, AIML, BOTML
05:47:31 ... What are the strengths and weaknesses of these?
05:47:37 ... What's the business value?
05:47:47 ... Improve customer service using chatbots
05:48:01 ... Includes not being annoying, where the agent on websites often gets in the way
05:48:12 igarashi has joined #voice
05:48:18 ... We could have a CG, organise a W3C workshop
05:48:25 ... Can we get the commercial companies interested?
05:49:14 https://github.com/slightlyoff/declarative_web_actions mentioned by Aaron G.
05:49:27 Aaron_Gustafson: Declarative web, Web Actions, a generalised approach to interactions, also
05:49:51 ... Declarative Web Actions is a way in the Web App Manifest to declare interactions with assistants such as Cortana, Siri, etc
05:50:06 ... A way to tie into the operating system
05:50:22 ... With Cortana, had a similar thing
05:50:34 ... Placeholders for keywords with alternate phrasing
05:50:55 ... It uses slots, similar architecture, intents were used and key phrasings for triggering
05:51:02 ... Talk to Alex Russell
05:51:23 Dave: A company could create an agent, or allow third parties to plug in, which is more scalable
05:52:01 Vincent: Working on Chrome and Google Assistant
05:52:21 ... The market is changing rapidly, so it's a challenging time to do standardisation work
05:52:22 aarongu has joined #voice
05:52:30 ... What we'll have in a few years might be quite different
05:52:59 ... Architecture challenging because of changing technology, and businesses in this space are moving fast and differentiating themselves
05:53:12 Declarative Web Actions: https://github.com/slightlyoff/declarative_web_actions
05:53:12 ... SSML has been adopted and extended by Amazon and Google
05:53:36 Vincent: I was advocating use of the standardised parts of SSML, enables ingest of content from third parties
05:53:52 ... Things can move faster by not using standards
05:54:15 ... With the appropriate parties engaged, we'll find people receptive to adding enhancements to SSML
05:54:17 Cortana’s Voice Command Definition (for reference) https://docs.microsoft.com/en-us/uwp/schemas/voicecommands/voice-command-elements-and-attributes-1-2
05:54:37 Vincent: Other foundational technologies are speech recognition and speech generation
05:54:39 JohnRiv has joined #voice
05:54:50 ... Companies don't need standardisation, they're moving fast
05:55:17 ... How are users using agents? Many ways. Embedded agents in web pages, I don't see large usage
05:55:32 ... Instead, appliance scenarios as input modality to the computer as a whole
05:55:45 ... Using the assistant at the mobile OS level
05:56:00 ... Smart speakers
05:56:10 ... On laptops and desktops, there's less usage
05:56:51 ... The best thing we can do at W3C is make web content as navigable and actionable as possible by OS level agents
05:56:58 ... And build aspects of those agents into the browser
05:57:06 kaz has joined #voice
05:57:13 ... Alexa and Siri attempt to get structured data from the web, e.g., schema.org
05:57:22 present+ Kaz_Ashimura
05:57:30 ... These queries work the best: fact- or structure-based queries give good responses
05:57:41 ... Navigating a website with unique offerings isn't handled very well
05:58:09 ... Having a page that responds to certain actions such as Ctrl+S for save, and having an associated voice action, has value
05:58:15 q+
05:59:04 Leonie: Is there room for new features in SSML, such as effects like "whisper", a quick way to produce specific patterns?
06:00:20 Vincent: Yes. Prosody and adjacent attributes. It's complex, there's motivation to improve speech generation, this is so new it's hard to standardise
06:00:39 Leonie: Google are restarting work on Web Speech API, is there interest in formalising that more?
06:00:43 Amazon’s extensions to SSML: https://developer.amazon.com/blogs/alexa/post/5c631c3c-0d35-483f-b226-83dd98def117/new-ssml-features-give-alexa-a-wider-range-of-natural-expression
06:00:51 Vincent: Don't know
06:01:20 Dave: Interest from Amazon in extending SSML at W3C
06:01:37 Vincent: Google would also be interested, but other things we're doing are out of scope
06:01:55 Brian: Not everything there is currently supported in browsers
06:02:12 Vincent: Would be good to have an artifact that describes state of SSML support
06:02:39 Markku: Pronunciation TF from APA WG. Coming from education, consuming text to speech content
06:02:57 ... Specific requirements for word pronunciation
06:03:10 ... A barrier is that the HTML content can't host SSML
06:03:50 ... Presentation cues in HTML could also be consumed by voice assistants, please participate in the TF
06:04:44 Pronunciation Task Force: https://www.w3.org/WAI/APA/task-forces/pronunciation/
06:05:00 Chris: The broadcast industry got together under the EBU to discuss some of these issues, e.g. loudness of voice relative to other content, to present our content using our voice talents
06:05:30 Concerns about difficulties of achieving write once run everywhere
06:06:25 Need to involve implementers
06:06:49 The EBU is expecting to provide a collection of requirements
06:07:11 BBC would support work on extending SSML
06:07:26 scribenick: cpn
06:08:03 Kaz: There's also PLS, as well as SSML
06:08:11 Lyrebird is an API that can recreate the voices of real people. Demos on this page https://www.youtube.com/watch?v=YfU_sWHT8mo
06:08:23 ... Also multi-modal architecture, SCXML as the mechanism for that purpose, and EMMA data model
06:09:05 Lyrebird API here https://www.descript.com/lyrebird-ai
06:09:07 Phil: It sounds like SSML updates are potentially of interest
06:09:23 ... Not keen to look at intents?
06:09:26 https://www.irccloud.com/pastebin/cBgx1SEk/
06:09:26 -> https://www.w3.org/TR/speech-synthesis11/ SSML 1.1
06:09:41 Vincent: I see that as more challenging, companies are differentiating
06:09:53 -> https://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/ PLS
06:10:12 Opportunities for operating system integrated voice agents being able to make use of semantic descriptions (e.g. schema.org) of services exposed by web sites.
06:10:18 Aaron: Thinking about context providers, e.g., weather services, advertising specific apps to hook into a voice interface, is interesting
06:10:45 ack kaz
06:11:02 Leonie: As someone producing skills, a way to avoid having to write everything twice is desirable
06:11:30 ... There's similarity with conversational models. I suspect the hooks are similar
06:12:02 Vincent: I think there's huge potential, more with SSML than intents though
06:12:35 ... Hasn't started with a standards-first approach
06:13:09 Dave: schema.org has allowed smart search, but also hooks for the OS voice assistant. How could we extend schema.org to provide the kinds of voice experiences people are looking for?
06:13:19 ... Then the voice vendors have something common to work with
06:14:06 Aaron: I'd like to be able to ask a website to search for things, and have it know what to do
06:15:00 Omar: I'm working on chatbots, I notice there's a ubiquity, it's on the webpage, then FB messenger etc
06:15:16 ... We're thinking about intents, whether to do in frontend or back-end
06:15:42 ... Would a web standard help with intents? Same for speech synthesis, where email or SMS are valid channels for the chatbot
06:16:24 ... I'd like to see improvement in interoperability between Alexa and Siri
06:17:05 ... For speech recognition, we do nothing, as mobile devices have it built in
06:17:40 Phil: Does the browser have a speech synthesis API?
06:18:03 Brian: Yes, it's not a great API, it lacks ability to give richer input than just text
06:18:35 Leonie: This is being worked on in a CG, could bring support, move to WG?
06:18:50 Brian: TAG has given input on Web Speech
06:19:08 Kaz: Multi-application handling was included in the multi-modal architecture
06:19:18 ... WoT is working on smart speakers and speech synthesis
06:19:34 ... Not suggesting using WoT for this, but we should collaborate
06:20:02 Phil: If there were a W3C workshop on this, would you come?
06:20:06 reposting as link instead of snippet: https://github.com/w3c/strategy/issues/71#issuecomment-391105060
06:20:23 Phil: We'd need implementers in the room
06:20:34 Vincent: There's a good chance we could get people there
06:21:09 Dave: In preparing the workshop we'd reach out to stakeholders, so we'd first want to make the right contacts, to make it relevant
06:21:15 Vincent: I can help make contacts in Google
06:21:26 Aaron: I can help at Microsoft
06:21:32 Dan: I can also help at Google
06:21:48 -> https://www.w3.org/TR/2012/NOTE-mmi-interop-20120124/ MMI interoperability test report (as the starting point of what MMI Architecture is like to synchronize multiple agents like messenger and speech)
06:21:52 Dan: We're always happy to try things out at schema.org
06:22:09 present+ Dan_Brickley
06:22:51 Dan: There's speakable, which reads things from news articles. There's work on intents and filling in forms
06:23:11 ... We pull in feeds from Netflix, etc, schema.org works well for that
06:23:25 i|If there were|-> https://www.w3.org/2019/09/18-wot-pf-minutes.html WoT PlugFest breakout minutes|
06:23:53 Markku: It seems there's a perfect storm of people in the room to move things forward
06:24:01 ... I have issues with SSML
06:24:13 Leonie: Latest update was in 2010
06:24:53 Brian: The Web Speech APIs are also from that time, but work on them has stopped since then
06:25:06 Leonie: The standards pre-empted the current situation, things have now moved on
06:25:10 i|If there were|-> https://github.com/w3c/wot/blob/master/PRESENTATIONS/2019-09_WoT-Plugfest.pdf WoT PlugFest summary slides|
06:25:13 Phil: Which group should we join?
06:25:31 Leonie: Voice Assistant Standardisation CG could be restarted
06:25:45 Phil: Thank you everyone
06:25:49 [adjourned]
06:25:52 rrsagent, draft minutes
06:25:52 I have made the request to generate https://www.w3.org/2019/09/18-voice-minutes.html cpn
06:25:57 rrsagent, make log public
06:27:05 https://www.w3.org/WAI/APA/task-forces/pronunciation/
06:30:51 phila has joined #voice
06:31:19 RRSAgent, draft minutes
06:31:19 I have made the request to generate https://www.w3.org/2019/09/18-voice-minutes.html phila
06:31:30 RRSAgent, make logs public
06:31:52 Meeting: Voice assistants - what needs standardization?
06:32:01 chair: PhilA
06:32:34 RRSAgent, draft minutes
06:32:34 I have made the request to generate https://www.w3.org/2019/09/18-voice-minutes.html phila
06:50:31 dsr has joined #voice
06:58:05 dsr has joined #voice
07:00:52 stevelee has joined #voice
07:12:01 dsr has joined #voice
07:34:25 dsr has joined #voice
07:42:46 phila has joined #voice
08:03:26 stevelee_ has joined #voice
08:12:34 stevelee has joined #voice
08:27:24 Zakim has left #voice
08:30:03 dsr has joined #voice
11:28:03 dsr has joined #voice
13:10:27 stevelee has joined #voice
15:24:42 dsr has joined #voice
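The "intents and slots" idea raised in the session (a declarative list of intents, each with alternate phrasings and placeholders for keywords) can be illustrated with a small TypeScript sketch. Everything here is an assumption made for illustration: the `IntentDefinition` shape, the `{item}` placeholder syntax, the `AddToShoppingList` name and the `matchIntent` helper are hypothetical and are not taken from Alexa, Cortana, Declarative Web Actions or any W3C specification. Real assistants use trained language models rather than exact pattern matching.

```typescript
// Hypothetical, platform-neutral description of one intent (not a real API).
interface IntentDefinition {
  name: string;
  // Alternate phrasings; "{slotName}" marks a placeholder (slot) to capture.
  phrasings: string[];
}

interface IntentMatch {
  intent: string;
  slots: Record<string, string>;
}

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn "add {item} to my shopping list" into an anchored, case-insensitive
// pattern plus the ordered list of slot names it captures.
function compilePhrasing(phrasing: string): { pattern: RegExp; slotNames: string[] } {
  const parts = phrasing.split(/\{(\w+)\}/); // odd indexes hold slot names
  const slotNames: string[] = [];
  let source = "";
  parts.forEach((part, index) => {
    if (index % 2 === 1) {
      slotNames.push(part);
      source += "(.+?)";
    } else {
      source += escapeRegExp(part);
    }
  });
  return { pattern: new RegExp("^" + source + "$", "i"), slotNames };
}

// Return the first intent whose phrasing matches the utterance, or null.
function matchIntent(utterance: string, intents: IntentDefinition[]): IntentMatch | null {
  const text = utterance.trim();
  for (const intent of intents) {
    for (const phrasing of intent.phrasings) {
      const { pattern, slotNames } = compilePhrasing(phrasing);
      const result = pattern.exec(text);
      if (result) {
        const slots: Record<string, string> = {};
        slotNames.forEach((name, index) => {
          slots[name] = result[index + 1];
        });
        return { intent: intent.name, slots };
      }
    }
  }
  return null;
}

// Illustrative data only: the "add to shopping list" case mentioned for retailers.
const shoppingIntents: IntentDefinition[] = [
  {
    name: "AddToShoppingList",
    phrasings: ["add {item} to my shopping list", "put {item} on the shopping list"],
  },
];

console.log(matchIntent("Put oat milk on the shopping list", shoppingIntents));
```

Keeping the definition as plain data is the point of the sketch: the same list could in principle be handed to several assistants, which is the "avoid writing everything twice" concern voiced in the discussion.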