01:47:11 <RRSAgent> RRSAgent has joined #voice
01:47:11 <RRSAgent> logging to https://www.w3.org/2019/09/18-voice-irc
01:47:14 <wseltzer> rrsagent, make logs public
04:42:49 <RRSAgent> RRSAgent has joined #voice
04:42:49 <RRSAgent> logging to https://www.w3.org/2019/09/18-voice-irc
05:01:59 <stevelee> stevelee has joined #voice
05:19:55 <phila> phila has joined #voice
05:23:45 <phila> phila has changed the topic to: Intro slide deck for TPAC Voice session https://docs.google.com/presentation/d/1HWaE_u9084sDdHJShcKANQcPn6PUV7I5ss8fJpLz4_Y/edit#
05:27:10 <takeru> takeru has joined #voice
05:31:54 <cpn> cpn has joined #voice
05:32:24 <hyojin> hyojin has joined #voice
05:32:47 <mhakkinen> mhakkinen has joined #voice
05:33:00 <cpn> meeting: Voice assistants: opportunities for standardisation
05:33:10 <Irfan> Irfan has joined #voice
05:33:13 <Irfan> present+
05:33:26 <tink> tink has joined #voice
05:33:39 <tink> present+ Léonie (tink)
05:33:46 <cpn> present+ Chris_Needham
05:34:13 <Zakim> Zakim has joined #voice
05:34:26 <carlosil> carlosil has joined #voice
05:34:54 <Irfan> rrsagent, make minutes
05:34:54 <RRSAgent> I have made the request to generate https://www.w3.org/2019/09/18-voice-minutes.html Irfan
05:35:02 <cpn> scribenick: cpn
05:35:06 <minobu> minobu has joined #voice
05:35:10 <scheib> scheib has joined #voice
05:35:17 <cpn> [introductions from Phil, Leonie, Markku]
05:35:20 <mhakkinen> present+ mhakkinen
05:35:34 <scheib> present+ scheib
05:35:40 <cpn> Phil: A11y is a use case, other applications in healthcare, driving, etc
05:35:53 <dsr> dsr has joined #voice
05:36:02 <dsr> present+
05:36:05 <cpn> ... I know this is an important area, want to find out what we could do
05:36:06 <meredith> meredith has joined #voice
05:36:23 <mori> mori has joined #voice
05:36:24 <cpn> ... There are 5 different CGs on voice
05:36:36 <cpn> ... some addressing the same thing, mostly inactive
05:36:42 <cpn> ... voice interaction with the web isn't new
05:36:51 <cpn> ... also voice output is important, eg, for BBC
05:37:09 <cpn> ... none of this gives a clear direction on where we want to go
05:37:28 <cpn> ... [block diagram]
05:37:51 <cpn> ... [demo video from MIT]
05:38:10 <cpn> ... Open Voice Network
05:40:49 <cpn> ... it's a rare example of a voice assistant with a male voice
05:41:23 <cpn> ... add to shopping list important for retailers
05:41:52 <cpn> ... Intel and Capgemini also involved in this
05:42:49 <cpn> ... APIs are needed, for intents and slots, training data (privacy implications), history of conversation context
05:43:13 <cpn> ... SSML, avoid writing code for each individual platform
05:43:20 <cpn> ... where is the common interest?
05:43:34 <cpn> ... what level of interest is there, and where to continue the conversation?
05:43:54 <dsr> 1998 W3C workshop on voice browsers
05:43:56 <cpn> ... what are your motivations, pain points, etc?
05:44:06 <dsr> https://www.w3.org/Voice/1998/Workshop/
05:44:12 <cpn> Topic: Previous W3C work
05:44:37 <cpn> Dave: Workshop in 1998 led to specs such as speech synthesis, speech recognition, SSML
05:44:49 <cpn> ... Describing the dialog you have with a voice assistant is complex
05:45:00 <cpn> ... Wanted to separate that from the synthesis and recognition
05:45:03 <cpn> ... Work done on APIs
05:45:05 <dsr> https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API#targetText=Check%20the%20Browser%20compatibility%20table,SpeechRecognition%20(Asynchronous%20Speech%20Recognition.)
05:45:16 <cpn> Dave: This is the MDN page for the Web Speech API
05:45:41 <cpn> ... Browsers support synthesis, but few support recognition
05:45:53 <cpn> ... Then there's the relationship between voice interaction and chatbots
05:46:13 <cpn> ... Voice recognition has improved, so we now have good quality speech rec, so the problem is now text interaction
05:46:17 <dsr> https://developer.amazon.com/docs/custom-skills/create-the-interaction-model-for-your-skill.html
05:46:31 <cpn> Dave: This is the Amazon developer page for creating Alexa skills
05:46:49 <cpn> ... There's a declarative way to define intents and slots
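To illustrate the declarative intents-and-slots idea Dave refers to, here is a minimal TypeScript sketch. It assumes a simplified, hypothetical model (the `Intent` type, the `{slot}` placeholder syntax and the `matchUtterance` helper are illustrative only, not the Alexa schema or any vendor API) and uses plain pattern matching where a real assistant would use a trained language model.

```typescript
// Hypothetical, simplified interaction model: an intent is a name plus
// sample utterances in which {slot} marks a value to be extracted.
interface Intent {
  name: string;
  samples: string[];
}

interface Match {
  intent: string;
  slots: Record<string, string>;
}

// Escape regex metacharacters in the literal parts of a sample utterance.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Compile one sample into a regex with one capture group per {slot}.
function compileSample(sample: string): { regex: RegExp; slotNames: string[] } {
  const slotNames: string[] = [];
  const pattern = sample
    .split(/(\{\w+\})/)
    .map((part) => {
      const slot = /^\{(\w+)\}$/.exec(part);
      if (slot) {
        slotNames.push(slot[1]);
        return "(.+?)";
      }
      return escapeRegExp(part);
    })
    .join("");
  return { regex: new RegExp("^" + pattern + "$", "i"), slotNames };
}

// Return the first intent whose sample matches, with its filled slots.
function matchUtterance(utterance: string, intents: Intent[]): Match | null {
  for (const intent of intents) {
    for (const sample of intent.samples) {
      const { regex, slotNames } = compileSample(sample);
      const result = regex.exec(utterance.trim());
      if (result) {
        const slots: Record<string, string> = {};
        slotNames.forEach((name, i) => {
          slots[name] = result[i + 1];
        });
        return { intent: intent.name, slots };
      }
    }
  }
  return null;
}

// Usage: the "add to shopping list" case mentioned earlier in the session.
const shoppingIntents: Intent[] = [
  {
    name: "AddToListIntent",
    samples: ["add {item} to my shopping list", "put {item} on my list"],
  },
];
console.log(matchUtterance("Add oat milk to my shopping list", shoppingIntents));
```

The point of the declarative form is that the same intent definitions could, in principle, be handed to more than one assistant platform; only the matcher behind them would differ.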
05:46:53 <dsr> https://github.com/w3c/strategy/issues/134
05:47:22 <cpn> Dave: There's a range of conversation markup languages available, AIML, BOTML
05:47:31 <cpn> ... What are the strengths and weaknesses of these?
05:47:37 <cpn> ... What's the business value?
05:47:47 <cpn> ... Improve customer service using chatbots
05:48:01 <cpn> ... Includes not being annoying, where the agent on websites often gets in the way
05:48:12 <igarashi> igarashi has joined #voice
05:48:18 <cpn> ... We could have a CG, organise a W3C workshop
05:48:25 <cpn> ... Can we get the commercial companies interested?
05:49:14 <scheib> https://github.com/slightlyoff/declarative_web_actions mentioned by Aaron G.
05:49:27 <cpn> Aaron_Gustafson: Declarative web, Web Actions, a generalised approach to interactions, also
05:49:51 <cpn> ... Declarative Web Actions is a way in the Web App Manifest to declare interactions with assistants such as Cortana, Siri, etc
05:50:06 <cpn> ... A way to tie into the operating system
05:50:22 <cpn> ... With Cortana, had a similar thing
05:50:34 <cpn> ... Placeholders for keywords with alternate phrasing
05:50:55 <cpn> ... It uses slots, similar architecture, intents were used and key phrasings for triggering
05:51:02 <cpn> ... Talk to Alex Russell
05:51:23 <cpn> Dave: A company could create an agent, or allow third parties to plug in, which is more scalable
05:52:01 <cpn> Vincent: Working on Chrome and Google Assistant
05:52:21 <cpn> ... The market is changing rapidly, so it's a challenging time to do standardisation work
05:52:22 <aarongu> aarongu has joined #voice
05:52:30 <cpn> ... What we'll have in a few years might be quite different
05:52:59 <cpn> ... Architecture challenging because of changing technology, and businesses in this space are moving fast and differentiating themselves
05:53:12 <aarongu> Declarative Web Actions: https://github.com/slightlyoff/declarative_web_actions
05:53:12 <cpn> ... SSML has been adopted and extended by Amazon and Google
05:53:36 <cpn> Vincent: I was advocating use of the standardised parts of SSML, enables ingest of content from third parties
05:53:52 <cpn> ... Things can move faster by not using standards
05:54:15 <cpn> ... With the appropriate parties engaged, we'll find people receptive to add enhancements to SSML
05:54:17 <aarongu> Cortana’s Voice Command Definition (for reference) https://docs.microsoft.com/en-us/uwp/schemas/voicecommands/voice-command-elements-and-attributes-1-2
05:54:37 <cpn> Vincent: Other foundational technologies are speech recognition and speech generation
05:54:39 <JohnRiv> JohnRiv has joined #voice
05:54:50 <cpn> ... Companies don't need standardisation, they're moving fast
05:55:17 <cpn> ... How are users using agents? Many ways. Embedded agents in web pages, I don't see large usage
05:55:32 <cpn> ... Instead, appliance scenarios as input modality to the computer as a whole
05:55:45 <cpn> ... Using the assistant at the mobile OS level
05:56:00 <cpn> ... Smart speakers
05:56:10 <cpn> ... On laptops and desktops, there's less usage
05:56:51 <cpn> ... The best thing we can do at W3C is make web content as navigable and actionable as possible by OS level agents
05:56:58 <cpn> ... And build aspects of those agents into the browser
05:57:06 <kaz> kaz has joined #voice
05:57:13 <cpn> ... Alexa and Siri attempt to get structured data from the web, e.g., schema.org
05:57:22 <kaz> present+ Kaz_Ashimura
05:57:30 <cpn> ... These queries work the best: fact or structural based queries give good responses
05:57:41 <cpn> ... Navigating a website with unique offerings isn't handled very well
05:58:09 <cpn> ... Having a page that responds to certain actions such as Ctrl+S for save, and having an associated voice action, has value
05:58:15 <kaz> q+
05:59:04 <cpn> Leonie: Is there room for new features in SSML, such as effects like "whisper", a quick way to produce specific patterns?
06:00:20 <cpn> Vincent: Yes. Prosody and adjacent attributes. It's complex, there's motivation to improve speech generation, this is so new it's hard to standardise
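For context on the "whisper" question, a small TypeScript sketch of what can be expressed with standard SSML 1.1 today. It only approximates a whisper using the `<prosody>` element's `volume` and `rate` attributes; dedicated whisper effects are vendor extensions and are not shown. The helper names (`escapeXml`, `prosody`, `speak`) are hypothetical, and `en-GB` is an arbitrary default.

```typescript
// Escape characters that are significant in XML text and attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

interface ProsodyAttrs {
  rate?: string;   // e.g. "x-slow", "slow", "medium", "fast", "x-fast"
  pitch?: string;  // e.g. "x-low", "low", "medium", "high", "x-high"
  volume?: string; // e.g. "silent", "x-soft", "soft", "medium", "loud", "x-loud"
}

// Wrap text in the standard SSML <prosody> element.
function prosody(text: string, attrs: ProsodyAttrs): string {
  const parts: string[] = [];
  if (attrs.rate) parts.push(`rate="${escapeXml(attrs.rate)}"`);
  if (attrs.pitch) parts.push(`pitch="${escapeXml(attrs.pitch)}"`);
  if (attrs.volume) parts.push(`volume="${escapeXml(attrs.volume)}"`);
  const rendered = parts.length > 0 ? " " + parts.join(" ") : "";
  return `<prosody${rendered}>${escapeXml(text)}</prosody>`;
}

// Wrap already-built SSML markup in an SSML 1.1 root element.
function speak(body: string, lang: string = "en-GB"): string {
  return (
    `<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" ` +
    `xml:lang="${escapeXml(lang)}">${body}</speak>`
  );
}

// Usage: a quiet, slow rendering as a rough stand-in for a whisper.
const quietSsml = speak(prosody("Don't tell anyone", { volume: "x-soft", rate: "slow" }));
console.log(quietSsml);
```

A named effect would replace this kind of attribute tuning with a single, portable instruction, which is the gap Leonie's question points at.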
06:00:39 <cpn> Leonie: Google are restarting work on Web Speech API, is there interest in formalising that more?
06:00:43 <dsr> Amazon’s extensions to SSML: https://developer.amazon.com/blogs/alexa/post/5c631c3c-0d35-483f-b226-83dd98def117/new-ssml-features-give-alexa-a-wider-range-of-natural-expression
06:00:51 <cpn> Vincent: Don't know
06:01:20 <cpn> Dave: Interest from Amazon in extending SSML at W3C
06:01:37 <cpn> Vincent: Google would also be interested, but other things we're doing are out of scope
06:01:55 <cpn> Brian: Not everything there is currently supported in browsers
06:02:12 <cpn> Vincent: Would be good to have an artifact that describes state of SSML support
06:02:39 <cpn> Marko: Pronunciation TF from APA WG. Coming from education, consuming text to speech content
06:02:57 <cpn> ... Specific requirements for word pronunciation
06:03:10 <cpn> ... A barrier is that the HTML content can't host SSML
06:03:50 <cpn> ... Presentation cues in HTML could also be consumed by voice assistants, please participate in the TF
06:04:44 <Irfan> Pronunciation Task Force: https://www.w3.org/WAI/APA/task-forces/pronunciation/
06:05:00 <dsr> Chris: the broadcast industry got together under the EBU to discuss some of these issues, e.g. loudness of voice relative to other content, to present our content using our voice talents
06:05:30 <dsr> Concerns about difficulties of achieving write once run everywhere
06:05:54 <Irfan> s/Marko/Markku
06:06:25 <dsr> Need to involve implementers
06:06:49 <dsr> The EBU is expecting to provide a collection of requirements
06:07:11 <dsr> BBC would support work on extending SSML
06:07:26 <cpn> scribenick: cpn
06:08:03 <cpn> Kaz: There's also PLS, as well as SSML
06:08:11 <tink> Lyrebird is an API that can recreate the voices of real people. Demos on this page https://www.youtube.com/watch?v=YfU_sWHT8mo
06:08:23 <cpn> ... Also multi-modal architecture, EMMA data model
06:09:05 <tink> Lyrebird API here https://www.descript.com/lyrebird-ai
06:09:05 <kaz> s/EMMA data model/SCXML as the mechanism for that purpose, and EMMA data model/
06:09:07 <cpn> Phil: It sounds like SSML updates are potentially of interest
06:09:23 <cpn> ... Not keen to look at intents?
06:09:26 <meredith> https://www.irccloud.com/pastebin/cBgx1SEk/
06:09:26 <kaz> -> https://www.w3.org/TR/speech-synthesis11/ SSML 1.1
06:09:41 <cpn> Vincent: I see that as more challenging, companies are differentiating
06:09:53 <kaz> -> https://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/ PLS
06:10:12 <dsr> Opportunities for operating system integrated voice agents being able to make use of semantic descriptions (e.g. schema.org) of services exposed by web sites.
06:10:18 <cpn> Aaron: Thinking about context providers, e.g., weather services, advertising specific apps to hook into a voice interface, is interesting
06:10:45 <kaz> ack kaz
06:11:02 <cpn> Leonie: As someone producing skills, a way to avoid having to write everything twice is desirable
06:11:30 <cpn> ... There's similarity with conversational models. I suspect the hooks are similar
06:12:02 <cpn> Vincent: I think there's huge potential, more with SSML than intents though
06:12:35 <cpn> ... Hasn't started with a standards-first approach
06:13:09 <cpn> Dave: schema.org has allowed smart search, but also hooks for the OS voice assistant. How could we extend schema.org to provide the kinds of voice experiences people are looking for?
06:13:19 <cpn> ... Then the voice vendors have something common to work with?
06:13:24 <cpn> s/with?/with/
06:14:06 <cpn> Aaron: I'd like to be able to ask a website to search for things, and it know what to do
06:15:00 <cpn> Omar: I'm working on chatbots, I notice there's a ubiquity, it's on the webpage, then FB messenger etc
06:15:16 <cpn> ... We're thinking about intents, whether to do in frontend or back-end
06:15:42 <cpn> ... Would a web standard help with intents? Same for speech synthesis, where email or SMS are valid channels for the chatbot
06:16:24 <cpn> ... I'd like to see improvement in interoperability between Alexa and Siri
06:17:05 <cpn> ... For speech recognition, we do nothing, as mobile devices have it built in
06:17:40 <cpn> Phil: Does the browser have a speech synthesis API?
06:18:03 <cpn> Brian: Yes, it's not a great API, it lacks ability to give richer input than just text
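To make Brian's point concrete, a hedged TypeScript sketch of the Web Speech API synthesis surface: an utterance is a text string plus a few scalar settings. The `UtteranceSettings` type and the `normalise` and `speakText` helpers are hypothetical wrappers; `speechSynthesis` and `SpeechSynthesisUtterance` are the browser globals, looked up dynamically so the sketch also loads outside a browser. The default language `en-GB` is an arbitrary assumption.

```typescript
// Hypothetical settings object mirroring SpeechSynthesisUtterance properties.
interface UtteranceSettings {
  text: string;
  lang?: string;
  rate?: number;   // Web Speech API range 0.1 to 10, default 1
  pitch?: number;  // range 0 to 2, default 1
  volume?: number; // range 0 to 1, default 1
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Fill in defaults and clamp each setting to the range the API accepts.
function normalise(settings: UtteranceSettings): Required<UtteranceSettings> {
  return {
    text: settings.text,
    lang: settings.lang !== undefined ? settings.lang : "en-GB",
    rate: clamp(settings.rate !== undefined ? settings.rate : 1, 0.1, 10),
    pitch: clamp(settings.pitch !== undefined ? settings.pitch : 1, 0, 2),
    volume: clamp(settings.volume !== undefined ? settings.volume : 1, 0, 1),
  };
}

// Speak the text if the browser exposes speech synthesis; otherwise do nothing.
// Returns true when the utterance was queued.
function speakText(settings: UtteranceSettings): boolean {
  const env = globalThis as any;
  if (!env.speechSynthesis || !env.SpeechSynthesisUtterance) {
    return false;
  }
  const s = normalise(settings);
  const utterance = new env.SpeechSynthesisUtterance(s.text);
  utterance.lang = s.lang;
  utterance.rate = s.rate;
  utterance.pitch = s.pitch;
  utterance.volume = s.volume;
  env.speechSynthesis.speak(utterance);
  return true;
}

// Usage: settings apply to the whole utterance, not to parts of it.
speakText({ text: "Add oat milk to my shopping list", rate: 0.9 });
```

Because the settings apply to the utterance as a whole, there is no way in this sketch to change emphasis or pronunciation for a single word, which is where richer input such as SSML would come in.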
06:18:35 <cpn> Leonie: This is being worked on in a CG, could bring support, move to WG?
06:18:50 <cpn> Brian: TAG has given input on Web Speech
06:19:08 <cpn> Kaz: Multi-application handling was included in the multi-modal architecture
06:19:18 <cpn> ... WoT is working on smart speakers and speech synthesis
06:19:34 <cpn> ... Not suggesting using WoT for this, but we should collaborate
06:20:02 <cpn> Phil: If there were a W3C workshop on this, would you come?
06:20:06 <meredith> reposting as link instead of snippet: https://github.com/w3c/strategy/issues/71#issuecomment-391105060
06:20:23 <cpn> Phil: We'd need implementers in the room
06:20:34 <cpn> Vincent: There's a good chance we could get people there
06:21:09 <cpn> Dave: In preparing the workshop we'd reach out to stakeholders, so we'd first want to make the right contacts, to make it relevant
06:21:15 <cpn> Vincent: I can help make contacts in Google
06:21:26 <cpn> Aaron: I can help at Microsoft
06:21:32 <cpn> Dan: I can also help at Google
06:21:48 <kaz> -> https://www.w3.org/TR/2012/NOTE-mmi-interop-20120124/ MMI interoperability test report (as the starting point of what MMI Architecture is like to synchronize multiple agents like messenger and speech)
06:21:52 <cpn> Dan: We're always happy to try things out at schema.org
06:22:09 <cpn> present+ Dan_Brickley
06:22:51 <cpn> Dan: There's speakable, which reads things from news articles. There's work on intents and filling in forms
06:23:11 <cpn> ... We pull in feeds from Netflix, etc, schema.org works well for that
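As a sketch of the "speakable" markup Dan mentions, the TypeScript below builds a schema.org `NewsArticle` with a `speakable` property of type `SpeakableSpecification`, pointing at page sections via `cssSelector`. The URL, headline and selectors are placeholders, and the helper names (`buildSpeakableArticle`, `toJsonLdScript`) are hypothetical.

```typescript
interface SpeakableSpecification {
  "@type": "SpeakableSpecification";
  cssSelector: string[];
}

interface SpeakableArticle {
  "@context": "https://schema.org";
  "@type": "NewsArticle";
  url: string;
  headline: string;
  speakable: SpeakableSpecification;
}

// Build JSON-LD marking which parts of an article suit text-to-speech.
function buildSpeakableArticle(
  url: string,
  headline: string,
  cssSelectors: string[]
): SpeakableArticle {
  return {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    url,
    headline,
    speakable: {
      "@type": "SpeakableSpecification",
      cssSelector: cssSelectors,
    },
  };
}

// Serialise for embedding in a page; "<" is escaped so the JSON cannot
// close the surrounding script element early.
function toJsonLdScript(article: SpeakableArticle): string {
  const json = JSON.stringify(article).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

// Usage with placeholder values.
const exampleArticle = buildSpeakableArticle(
  "https://example.org/news/voice-session",
  "Voice assistants: opportunities for standardisation",
  [".headline", ".summary"]
);
console.log(toJsonLdScript(exampleArticle));
```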
06:23:25 <kaz> i|If there were|-> https://www.w3.org/2019/09/18-wot-pf-minutes.html WoT PlugFest breakout minutes|
06:23:53 <cpn> Markku: It seems there's a perfect storm of people in the room to move things forward
06:24:01 <cpn> ... I have issues with SSML
06:24:13 <cpn> Leonie: Latest update was in 2010
06:24:53 <cpn> Brian: The Web Speech APIs are also from that time, but work has stopped since then
06:25:06 <cpn> Leonie: The standards pre-empted the current situation, things have now moved on
06:25:10 <kaz> i|If there were|-> https://github.com/w3c/wot/blob/master/PRESENTATIONS/2019-09_WoT-Plugfest.pdf WoT PlugFest summary slides|
06:25:13 <cpn> Phil: Which group should we join?
06:25:31 <cpn> Leonie: Voice Assistant Standardisation CG could be restarted
06:25:45 <cpn> Phil: Thank you everyone
06:25:49 <cpn> [adjourned]
06:25:52 <cpn> rrsagent, draft minutes
06:25:52 <RRSAgent> I have made the request to generate https://www.w3.org/2019/09/18-voice-minutes.html cpn
06:25:57 <cpn> rrsagent, make log public
06:27:05 <mhakkinen> https://www.w3.org/WAI/APA/task-forces/pronunciation/
06:30:51 <phila> phila has joined #voice
06:31:19 <phila> RRSAgent, draft minutes
06:31:19 <RRSAgent> I have made the request to generate https://www.w3.org/2019/09/18-voice-minutes.html phila
06:31:30 <phila> RRSAgent, make logs public
06:31:52 <phila> Meeting: Voice assistants - what needs standardization?
06:32:01 <phila> chair: PhilA
06:32:34 <phila> RRSAgent, draft minutes
06:32:34 <RRSAgent> I have made the request to generate https://www.w3.org/2019/09/18-voice-minutes.html phila