IRC log of voice on 2021-10-18
Timestamps are in UTC.
- 06:45:28 [RRSAgent]
- RRSAgent has joined #voice
- 06:45:28 [RRSAgent]
- logging to https://www.w3.org/2021/10/18-voice-irc
- 06:45:31 [dom]
- RRSAgent, stay
- 06:45:34 [dom]
- RRSAgent, make log public
- 06:45:50 [dom]
- RRSAgent, this meeting spans midnight
- 14:28:54 [kaz]
- kaz has joined #voice
- 23:45:09 [kaz]
- kaz has joined #voice
- 23:45:45 [Zakim]
- Zakim has joined #voice
- 23:45:51 [kaz]
- rrsagent, bye
- 23:45:51 [RRSAgent]
- I see no action items
- 23:45:56 [RRSAgent]
- RRSAgent has joined #voice
- 23:45:56 [RRSAgent]
- logging to https://www.w3.org/2021/10/18-voice-irc
- 23:49:11 [kaz]
- meeting: Next Directions for Voice and the Web Breakout
- 23:55:08 [takio]
- takio has joined #voice
- 23:56:08 [Ben]
- Ben has joined #voice
- 23:59:16 [kaz]
- present+ Kaz_Ashimura__W3C, Bev_Corwin, Francis_Storr, Jennie_Delisi, Masakazu_Kitahara, Takio_Yamaoka__Yahoo_Japan
- 23:59:30 [Jennie]
- Jennie has joined #voice
- 23:59:30 [kaz]
- present+ Makoto_Murata__DAISY
- 23:59:54 [kaz]
- present+ Muhammad, Sam_Kanta, Tomoaki_Mizushima__IRI
- 00:00:01 [MURATA_]
- MURATA_ has joined #voice
- 00:00:01 [kaz]
- zakim, who is on the call?
- 00:00:02 [Zakim]
- Present: Kaz_Ashimura__W3C, Bev_Corwin, Francis_Storr, Jennie_Delisi, Masakakazu_Kitahara, Takio_Yamaoka__Yahoo_Japan, Makoto_Murata__DAISY, Muhammad, Sam_Kanta,
- 00:00:04 [Zakim]
- ... Tomoaki_Mizushima__IRI
- 00:00:44 [BC]
- BC has joined #voice
- 00:00:49 [BC]
- Hello
- 00:01:12 [kirkwood]
- kirkwood has joined #voice
- 00:01:13 [MasakazuKitahara]
- MasakazuKitahara has joined #voice
- 00:01:17 [MURATA_]
- present+
- 00:01:22 [MasakazuKitahara]
- present+
- 00:01:29 [fantasai]
- fantasai has joined #voice
- 00:01:44 [Jennie]
- present+
- 00:02:13 [Ben]
- present+
- 00:03:09 [fantasai]
- scribenick: fantasai
- 00:03:14 [fantasai]
- kaz: Thanks for joining this breakout session
- 00:03:31 [fantasai]
- kaz: This is a breakout session on new directions for Voice and Web
- 00:03:39 [fantasai]
- kaz: There was a breakout panel during AC meeting
- 00:03:51 [fantasai]
- kaz: discussion about how to improve web speech capabilities in general
- 00:04:19 [fantasai]
- kaz: There were several breakout sessions previously (previous TPAC??)
- 00:04:27 [fantasai]
- kaz: We want to summarize situation and figure out how to improve
- 00:04:37 [kaz]
- -> https://www.w3.org/2021/Talks/1018-voice-dd-ka/20211018-voice-breakout-dd-ka.pdf slides
- 00:04:52 [fantasai]
- kaz: First, reviewing existing standards and requirements for voice and web
- 00:05:05 [fantasai]
- kaz: Then would like to look into the issue of interop among voice agents
- 00:05:13 [fantasai]
- kaz: Then think about potential voice workshop
- 00:05:29 [fantasai]
- kaz: If you have any questions please raise your hand on Zoom chat, or type q+ on IRC
- 00:05:58 [fantasai]
- [slide 2]
- 00:06:07 [fantasai]
- kaz: Existing mechanisms for speech interfaces
- 00:06:16 [fantasai]
- kaz: We used to have markup languages like VoiceXML and SSML
- 00:06:28 [fantasai]
- kaz: There was also CSS speech modules
- 00:06:38 [fantasai]
- kaz: And Web Speech API
- 00:06:48 [fantasai]
- kaz: Lastly there's specification for spoken presentation in HTML WD
- 00:07:01 [fantasai]
- kaz: Most popular one is Web Speech API, but this is not a W3C REC but a CG report
- 00:07:04 [fantasai]
- kaz: so that's a question
- 00:07:06 [fantasai]
- [slide 3]
- 00:07:18 [fantasai]
- kaz: As voice agents are getting more and more popular, and very useful
- 00:08:13 [ddahl]
- ddahl has joined #voice
- 00:09:05 [fantasai]
- kaz: Need improved voice agents
- 00:09:15 [Tomoaki_Mizushima]
- Tomoaki_Mizushima has joined #voice
- 00:09:30 [fantasai]
- [slide 4]
- 00:09:35 [fantasai]
- kaz: Interoperability of voice agents
- 00:09:40 [fantasai]
- kaz: local voice agent or on the cloud side
- 00:09:47 [fantasai]
- kaz: most are proprietary, and not based on actual standards
- 00:09:55 [fantasai]
- kaz: speech API is very convenient but not a standard yet
- 00:10:01 [fantasai]
- kaz: Desktop and mobile apps, various implementations
- 00:10:08 [fantasai]
- kaz: how can we get them to interoperate with each other?
- 00:10:15 [fantasai]
- kaz: Do we need some standards-based infrastructure?
- 00:10:26 [fantasai]
- kaz: Voice Interaction CG chaired by David has been working on interop issues
- 00:10:31 [fantasai]
- kaz: will meet next week during TPAC
- 00:10:55 [fantasai]
- [slide 5]
- 00:11:44 [fantasai]
- s/David/ddahl/
- 00:12:00 [fantasai]
- ddahl: Our CG has been working on voice and web, focusing on interop among intelligent personal assistants right now
- 00:12:14 [fantasai]
- ddahl: We've noticed that these assistants (like Siri, Cortana, Alexa, etc.)
- 00:12:22 [fantasai]
- ddahl: they really have a lot in common in terms of what they are useful for
- 00:12:40 [kaz]
- i|slide 5|-> https://www.w3.org/community/voiceinteraction/ Voice Interaction CG|
- 00:12:41 [fantasai]
- ddahl: Like a web page, their goal is to help users find info, learn things, be entertained, and also intelligent personal assistance
- 00:12:56 [fantasai]
- ddahl: They communicate with servers on the internet, which contribute functionality in service of their goals
- 00:13:18 [fantasai]
- ddahl: Two types of interacting are different because web page is primarily graphical UI and PA is primarily voice interaction
- 00:13:27 [fantasai]
- s/PA/IPA/
- 00:13:34 [fantasai]
- ddahl: But there are some arbitrary differences also
- 00:13:46 [fantasai]
- ddahl: web page rendered in browser; IPA in a proprietary platform
- 00:13:57 [fantasai]
- ddahl: but that's an arbitrary architectural difference that devs of IPAs have chosen to use
- 00:14:02 [fantasai]
- ddahl: web pages run in any browser
- 00:14:08 [fantasai]
- ddahl: but IPAs only run on their own platform
- 00:14:17 [fantasai]
- ddahl: If you have Amazon function it can't run on the Web, it can't run on your phone
- 00:14:25 [fantasai]
- ddahl: it runs only on its own proprietary smart speaker
- 00:14:38 [fantasai]
- ddahl: similarly, web pages are found via the familiar URL mechanism or a search engine
- 00:14:51 [fantasai]
- ddahl: IPA is found through its proprietary platform, however that platform chooses to make it available
- 00:15:01 [fantasai]
- ddahl: So finding functionality is purely proprietary
- 00:15:13 [fantasai]
- [next slide]
- 00:15:25 [fantasai]
- slide depicts diagram of IPA architecture
- 00:15:34 [fantasai]
- ddahl: Focus on the three major boxes
- 00:15:42 [fantasai]
- ddahl: First box is data capture parts of functionality
- 00:15:49 [fantasai]
- ddahl: In the case of an IPA, we most typically want to capture speech
- 00:15:55 [fantasai]
- ddahl: compared to web page, we're capturing user input
- 00:16:00 [kaz]
- s/next slide/slide 6/
- 00:16:17 [fantasai]
- ddahl: the function in the middle basically does the intelligent part of the processing
- 00:16:24 [fantasai]
- ddahl: This is analogous to a browser
- 00:16:32 [fantasai]
- ddahl: On the right we have connection to other functionalities
- 00:16:39 [fantasai]
- ddahl: other IPAs or other web sites
- 00:16:44 [fantasai]
- ddahl: Found through search engine, DNS, combination
- 00:16:58 [fantasai]
- ddahl: Rightmost part of this box we find other functionalities
- 00:17:09 [fantasai]
- ddahl: e.g. the websites themselves, in the case of an IPA some other IPA
- 00:17:14 [fantasai]
- ddahl: For example looking for shopping site
- 00:17:21 [fantasai]
- ddahl: want to find interoperably from UI
- 00:17:25 [fantasai]
- ddahl: That's architecture that we're looking at
- 00:17:28 [fantasai]
- ddahl: seems parallel to Web
- 00:17:34 [fantasai]
- ddahl: we'd like to be able to make those alignments possible
- 00:17:45 [fantasai]
- ddahl: and use as much of the existing Web infrastructure as possible for IPAs to be interoperable
- 00:17:54 [fantasai]
- [next slide]
- 00:18:05 [fantasai]
- kaz: There are many issues emerging these days
- 00:18:29 [fantasai]
- kaz: So we'd like to organize a dedicated W3C workshop to summarize the current situation, the pain points, and discuss how we could solve and improve the situation
- 00:18:37 [fantasai]
- kaz: by providing e.g. a forum for joint discussion by related stakeholders
- 00:18:46 [fantasai]
- kaz: I've created a dedicated GH issue in the strategy repo
- 00:18:56 [fantasai]
- -> https://github.com/w3c/strategy/issues/221
- 00:19:07 [fantasai]
- kaz: Please join the workshop and give your thoughts, pain points, solitions
- 00:19:12 [fantasai]
- s/solition/solutions/
- 00:19:20 [fantasai]
- kaz: Any questions, comments?
- 00:19:21 [kaz]
- s/next slide/slide 7
- 00:19:35 [Sam]
- Sam has joined #voice
- 00:19:53 [fantasai]
- kaz: Murata-san, you were very interested in a11y in general and also interaction of ruby and speech
- 00:19:59 [fantasai]
- kaz: interested in this workshop?
- 00:20:14 [fantasai]
- MURATA_: Yes, interested, and wondering what are the existing obstacles to existing specifications?
- 00:20:19 [fantasai]
- MURATA_: Why are they not widely used?
- 00:20:26 [fantasai]
- kaz: There are various approaches to this
- 00:20:37 [fantasai]
- kaz: e.g. markup-based approach like VoiceXML/SSML
- 00:20:41 [fantasai]
- kaz: and CSS-based approach
- 00:20:44 [fantasai]
- kaz: and JS-based approach
- 00:20:57 [fantasai]
- kaz: So we should think about how to integrate all these mechanisms into common speech platform
- 00:21:12 [fantasai]
- kaz: and have content authors and applications able to use various features for controlling speech freely and nicely
- 00:21:21 [fantasai]
- kaz: that kind of integration should be one discussion point for the workshop as well
- 00:21:38 [fantasai]
- kaz: You have been working on text information. Part of this, pronunciation specification, should also be included
- 00:21:41 [fantasai]
- MURATA_: yes
- 00:21:55 [fantasai]
- kaz: any other questions/comments/opinions/ideas?
- 00:22:01 [fantasai]
- MURATA_: Let me report one thing about EPUB
- 00:22:03 [kaz]
- q?
- 00:22:08 [fantasai]
- MURATA_: EPUB3 has included SSML and PLS
- 00:22:15 [fantasai]
- MURATA_: But now EPUB3 is heading for Recommendation
- 00:22:28 [fantasai]
- MURATA_: and some in WG don't want to include features that are not widely implemented
- 00:22:41 [fantasai]
- MURATA_: so WG decided to move SSML and PLS to a separate note, which is maintained by the EPUB WG
- 00:22:49 [fantasai]
- MURATA_: But that spec is detached from mainstream EPUB
- 00:22:55 [fantasai]
- MURATA_: Not intended to be a Recommendation in the near future
- 00:23:03 [fantasai]
- MURATA_: On the other hand, I know some Japanese companies use SSML and PLS
- 00:23:09 [kaz]
- q?
- 00:23:11 [fantasai]
- MURATA_: One company uses PLS and a few use SSML
- 00:23:22 [fantasai]
- MURATA_: In particular, the biggest textbook publisher in Japan uses SSML
- 00:23:42 [fantasai]
- MURATA_: And I hear the cost of an ebook is 3-4 times more if you try to really incorporate SSML and make everything natural
- 00:23:59 [fantasai]
- MURATA_: For textbooks, wrong pronunciation is very problematic, especially for new language learners
- 00:24:06 [fantasai]
- MURATA_: It is therefore worth the cost for these cases
- 00:24:15 [fantasai]
- MURATA_: But it is not cost-effective for broader materials
- 00:24:26 [fantasai]
- MURATA_: So SSML-based approach can't scale
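For readers unfamiliar with the format: the inline markup Murata-san describes looks roughly like this SSML 1.1 sketch (the phoneme and prosody values are illustrative). Every phrase that needs tuning must be tagged by hand, which is what drives the authoring cost he mentions.

```xml
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  Say <phoneme alphabet="ipa" ph="təˈmɑːtoʊ">tomato</phoneme>,
  <break time="300ms"/>
  then <prosody rate="slow" volume="soft">finish slowly and softly</prosody>.
</speak>
```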
- 00:24:31 [fantasai]
- MURATA_: But more optimistic about PLS
- 00:24:39 [fantasai]
- MURATA_: Japanese manga and novels, character names are unreadable
- 00:24:46 [fantasai]
- MURATA_: If you use PLS you have to describe each name only once
- 00:24:59 [fantasai]
- MURATA_: Dragon Slayer is very common, but doesn't read well using text to speech
- 00:25:03 [fantasai]
- MURATA_: I'm hoping that PLS would make things better
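By contrast, a PLS 1.0 lexicon declares each pronunciation once and the synthesis engine applies it everywhere the grapheme occurs, which is why PLS scales better for recurring character names. A minimal sketch (the entry is illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <!-- One entry per hard-to-read name; declared once, used everywhere. -->
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>
```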
- 00:25:20 [fantasai]
- kaz: As former Team contact for Voice group, I love SSML 1.1 and PLS 1.0
- 00:25:31 [fantasai]
- kaz: I would like to see the potential for improving those specifications further
- 00:25:45 [fantasai]
- kaz: Also, there's possibility that we might want an even newer mechanism to achieve the requirements
- 00:26:00 [fantasai]
- kaz: For example, Léonie mentioned it is maybe good time to re-start speech work in W3C, during AC meeting
- 00:26:06 [fantasai]
- kaz: Personally I would like to say Yes!
- 00:26:15 [fantasai]
- kaz: So I think a workshop would be a good starting point for that direction
- 00:26:25 [fantasai]
- kaz: Any other viewpoints?
- 00:26:28 [kaz]
- q?
- 00:26:45 [kaz]
- q+ ddahl
- 00:26:46 [fantasai]
- ddahl: Want to say something about why things not implemented in browsers
- 00:26:47 [kaz]
- ack d
- 00:26:56 [fantasai]
- ddahl: Since those early specifications, technology has gotten much stronger
- 00:27:05 [fantasai]
- ddahl: previously, speech recognition did not work well
- 00:27:11 [fantasai]
- ddahl: now text to speech works much better also
- 00:27:25 [fantasai]
- ddahl: So I think much of this was marginalized; it didn't work well, so people wouldn't use it
- 00:27:31 [fantasai]
- ddahl: it was considered to have nothing to do with the Web
- 00:27:37 [fantasai]
- ddahl: but now the tech is far better than it was at the time
- 00:27:43 [fantasai]
- ddahl: It really does make sense to look at how it is used in the browser
- 00:27:51 [kaz]
- q?
- 00:27:56 [kaz]
- ack f
- 00:28:12 [kaz]
- fantasai: CSS and PLS seem to be very different
- 00:28:17 [kaz]
- ... CSS is about styling
- 00:28:25 [kaz]
- ... not closely tied with each other
- 00:28:51 [kaz]
- ... you definitely can't have only the CSS speech module, but could use it to extend what exists
- 00:29:02 [kaz]
- ... cue sound, etc.
- 00:29:06 [kaz]
- ... shifting volume, etc.
- 00:29:26 [fantasai]
- s/sound/sound, pauses/
- 00:29:29 [kaz]
- ... can't change spoken pronunciation itself
- 00:29:33 [kaz]
- q?
- 00:29:47 [kaz]
- ... maybe we need new technology
- 00:29:55 [kaz]
- ... what is missing for that
- 00:30:00 [fantasai]
- s/maybe we need/you said maybe we need/
- 00:30:11 [fantasai]
- s/for that/that we need to create technology for?
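The distinction fantasai draws shows up directly in the CSS Speech module's properties, which style how existing content is delivered (cues, pauses, rate, volume) without touching pronunciation. A sketch (selector and cue file are illustrative):

```css
/* CSS Speech styles delivery, not pronunciation. */
h1 {
  voice-family: female;
  voice-rate: slow;           /* slower speaking rate */
  voice-volume: soft;         /* adjust volume */
  cue-before: url(chime.ogg); /* illustrative audio cue file */
  pause-after: 500ms;         /* pause when the heading ends */
}
```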
- 00:30:31 [fantasai]
- kaz: I was thinking about how to integrate various modalities
- 00:30:35 [fantasai]
- kaz: that are not interoperable currently
- 00:30:48 [fantasai]
- kaz: also how to implement dialog processing for interactive services
- 00:30:54 [fantasai]
- kaz: and possible integration with IoT services
- 00:31:20 [fantasai]
- kaz: so 2001: A Space Odyssey, asking for voice as a key for opening the door
- 00:31:29 [takio]
- q+
- 00:31:30 [fantasai]
- kaz: maybe because I'm working for WoT and Smart Cities as well
- 00:31:43 [fantasai]
- kaz: my dream is to apply voice technology as part of user interfaces for IoT and smart cities
- 00:31:52 [kaz]
- q?
- 00:32:36 [fantasai]
- ????: I have a lot of opinions on what's needed. Used voice interface for 20+ years
- 00:32:44 [fantasai]
- ????: I had to use totally hands free for 3 yrs
- 00:32:45 [kaz]
- s/????:/kim:/
- 00:32:47 [kaz]
- s/????:/kim:/
- 00:32:56 [fantasai]
- kim: Now also use wacom tablet
- 00:33:04 [fantasai]
- kim: Speech is not really well integrated with other forms of input
- 00:33:18 [fantasai]
- kim: If speech was well implemented, many people would use a little bit. A few people would use for everything.
- 00:33:22 [fantasai]
- kim: There's so much that is not there
- 00:33:31 [fantasai]
- kim: You were talking about it being siloed, and that's one of the problems
- 00:33:39 [fantasai]
- kim: for example, when you have keyboard shortcuts
- 00:33:45 [fantasai]
- kim: Sometimes you can change it, and that's great
- 00:34:02 [fantasai]
- kim: But can only link to letters now. Would be great to integrate with speech
- 00:34:06 [kaz]
- q?
- 00:34:07 [ddahl]
- q+ to talk about chatbots on websites
- 00:34:12 [kaz]
- q?
- 00:34:14 [fantasai]
- kim: Instead of thinking as another input method, how do you put alongside
- 00:34:25 [fantasai]
- kim: It should be something with good defaults and works alongside everything else
- 00:34:32 [fantasai]
- kim: Getting there more with Siri etc.
- 00:34:41 [fantasai]
- kim: If you say "search the web for green apples" it's faster than typing
- 00:34:50 [Jennie]
- +1 to Kim Patch - would also see a need for sounds/vocal melodies. Some cannot articulate clear words but can make a melody.
- 00:34:52 [fantasai]
- kim: but big gaps, I think because of the underlying technology
- 00:35:00 [fantasai]
- kim: But I think speech has a ton of potential
- 00:35:05 [fantasai]
- kim: I can show some of it using custom stuff
- 00:35:10 [fantasai]
- kim: that really has not been realized
- 00:35:16 [fantasai]
- kim: But it's also used some places where it shouldn't be used
- 00:35:26 [fantasai]
- kim: Send is a really bad one-word speech command!
- 00:35:34 [fantasai]
- kim: I see a lot of stuff being implemented that is not well thought through
- 00:35:43 [fantasai]
- kim: It's too bad that more of us don't use a little bit of speech
- 00:35:55 [Jennie]
- * kaz sure
- 00:35:57 [fantasai]
- kim: Also some problems like e.g. need to have a good microphone
- 00:36:08 [fantasai]
- kim: Engines are getting better, but have to make sure didn't record something totally off the wall
- 00:36:16 [kaz]
- q?
- 00:36:24 [kaz]
- ack t
- 00:36:30 [fantasai]
- takio: Thanks for presentation today
- 00:36:30 [kaz]
- q+ Jennie
- 00:36:40 [fantasai]
- takio: I'm new around here, not sure about this specification
- 00:36:50 [fantasai]
- takio: but I'm concerned about emotional things (?)
- 00:36:59 [fantasai]
- takio: e.g. if ...
- 00:37:09 [fantasai]
- takio: If laughing or angry, this may be dropped
- 00:37:24 [fantasai]
- takio: So I'm concerned about these specifications, if they take care of emotional expression
- 00:37:30 [fantasai]
- takio: Also asking about intermediate formats
- 00:37:33 [fantasai]
- takio: e.g. ...
- 00:37:40 [fantasai]
- takio: e.g. emotional info is important for that person
- 00:38:08 [fantasai]
- kaz: For example, some telecom companies or research companies have been working on extracting emotion info from speech
- 00:38:18 [fantasai]
- kaz: and trying to deal with that information once we've extracted some of it
- 00:38:18 [kaz]
- -> https://www.w3.org/TR/emotionml/ EmotionML
- 00:38:30 [fantasai]
- kaz: There is a dedicated specification to describe emotional information, named EmotionML
- 00:38:41 [fantasai]
- kaz: As debbie also mentioned, speech tech has improved a lot the last 10 years
- 00:38:49 [kaz]
- q?
- 00:38:54 [kaz]
- ack d
- 00:38:54 [Zakim]
- ddahl, you wanted to talk about chatbots on websites
- 00:38:55 [fantasai]
- kaz: We might want to also rethink EmotionML
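For context on the EmotionML spec kaz links above: it annotates content with emotion categories drawn from a declared vocabulary. A minimal fragment, using the spec's "big6" vocabulary (the category and value are illustrative):

```xml
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <emotion>
    <category name="happiness" value="0.8"/>
  </emotion>
</emotionml>
```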
- 00:39:04 [fantasai]
- ddahl: I've been noticing about websites recently
- 00:39:13 [fantasai]
- ddahl: complex websites especially tend to have a chatbot
- 00:39:23 [fantasai]
- ddahl: Seems like a failure of the website, that users can't find the information they're looking for
- 00:39:31 [fantasai]
- ddahl: so they add a chatbot to help find information quickly
- 00:39:40 [fantasai]
- ddahl: A very interesting characteristic of voice is that it is semantic
- 00:39:49 [fantasai]
- ddahl: It doesn't require the same kind of navigation that you need in a complex website
- 00:39:55 [fantasai]
- ddahl: theoretically you ask for what you want and you go there
- 00:40:06 [fantasai]
- ddahl: chatbots are normally not voice-enabled, but they are natural-language enabled
- 00:40:17 [fantasai]
- ddahl: and that's an area where we can have some synergy between traditional websites and voice interaction
- 00:40:22 [fantasai]
- kaz: That's a good use case
- 00:40:32 [fantasai]
- kaz: Reminds me of my recent TV
- 00:40:39 [fantasai]
- kaz: It has great capabilities, but there are so many menus
- 00:40:51 [fantasai]
- kaz: I'm not really sure how to use all these given the complicated menus
- 00:40:59 [fantasai]
- kaz: but it has speech recognition, so I can simply talk to that TV
- 00:41:04 [fantasai]
- kaz: "I'd like to watch Dragon Slayer"
- 00:41:20 [fantasai]
- ddahl: That's an amazing use case, because traditionally TV and DVRs were held up as examples of poor user interfaces
- 00:41:31 [fantasai]
- ddahl: Too difficult to even set the time, without lots of struggle
- 00:41:44 [fantasai]
- ddahl: So need to think about how to cut through layers of menus and navigation with voice and natural language
- 00:41:56 [fantasai]
- kaz: These days even TV devices use web interface for their UI
- 00:42:03 [fantasai]
- kaz: TV menu is a kind of web application
- 00:42:12 [fantasai]
- kaz: that implies speech interface is good solution
- 00:42:18 [kaz]
- q?
- 00:42:26 [kaz]
- ack j
- 00:42:38 [kim_patch]
- kim_patch has joined #voice
- 00:42:41 [fantasai]
- Jennie: I thought Kim's point about keyboard shortcut types redirecting is excellent
- 00:42:51 [fantasai]
- Jennie: Can see use case for ppl who use speech but have limited ability to vocalize
- 00:43:02 [fantasai]
- Jennie: If there was a way to program instead of using a keyboard shortcut, using a melodic phrase
- 00:43:10 [fantasai]
- Jennie: similar to physical gesture on mobile device
- 00:43:26 [fantasai]
- Jennie: Would be helpful for ppl who are limited, to control devices
- 00:43:34 [fantasai]
- Jennie: Using a shortcut or shorthand of melodic phrase
- 00:43:42 [fantasai]
- Jennie: for ppl who are hospitalized or have limited mobility
- 00:43:50 [kaz]
- q+ kim
- 00:43:53 [kaz]
- ack kim
- 00:44:09 [fantasai]
- Kim: In early days ...
- 00:44:20 [fantasai]
- Kim: But one thing that worked really well was blowing to close the window
- 00:44:31 [fantasai]
- Kim: 5-6 years ago someone was experimenting with that in an engine
- 00:44:48 [fantasai]
- Kim: I think it would work well both for folks who have difficulty vocalizing, and would be neat for other people as well
- 00:44:52 [fantasai]
- Kim: but would have to be easy to do
- 00:45:25 [fantasai]
- ddahl: Needs to be easy to do, but would be interesting to adapt
- 00:45:36 [kaz]
- s/ddahl:/Jennie:/
- 00:45:42 [fantasai]
- Kim: 10yrs ago I was working with ppl who are gesture specialists, and trying to get a grant for combined speech + gesture
- 00:45:57 [Jennie]
- +1 to Kim P!
- 00:45:59 [fantasai]
- Kim: A couple of gestures, a couple of sounds, would add a lot to many use cases
- 00:46:11 [fantasai]
- Kim: True mixed input
- 00:46:20 [ddahl]
- q+
- 00:46:22 [kaz]
- q?
- 00:46:25 [kaz]
- ack d
- 00:46:45 [fantasai]
- ddahl: That was an interesting point about gestures, reminded me of the recent requirements for natural language interfaces just published
- 00:47:04 [fantasai]
- ddahl: They mentioned sign language interpretation in natural language interfaces
- 00:47:08 [fantasai]
- ddahl: that is obviously gesture based
- 00:47:17 [fantasai]
- ddahl: research world
- 00:47:23 [fantasai]
- ddahl: but thinking about gesture-based input
- 00:47:27 [fantasai]
- ddahl: could be personal gestures
- 00:47:33 [fantasai]
- ddahl: or formal language gestures, like sign language
- 00:47:38 [fantasai]
- ddahl: but that would help a lot of people
- 00:47:56 [Jennie]
- q+
- 00:47:56 [fantasai]
- Kim: With mixed input, can do multiple input at the same time that doesn't have to be aware of each other
- 00:48:06 [fantasai]
- Kim: When pointing, computer knows where you're pointing
- 00:48:09 [fantasai]
- Kim: Hard for computer
- 00:48:23 [fantasai]
- Kim: Computer doesn't have to be aware of this
- 00:48:28 [fantasai]
- s/Hard for computer/.../
- 00:48:29 [kaz]
- -> https://www.w3.org/TR/2021/WD-naur-20211012/ Natural Language Interface Accessibility User Requirements
- 00:48:32 [kaz]
- q?
- 00:48:35 [kaz]
- ack j
- 00:48:45 [fantasai]
- Jennie: One of the other questions I had, since I'm not as familiar with the specs
- 00:48:52 [fantasai]
- Jennie: for touchscreen devices and computers
- 00:49:09 [fantasai]
- Jennie: we have ways to control for tremors or repeated actions to choose the right one to respond to
- 00:49:32 [fantasai]
- Jennie: Do we have any consideration for that in voice, e.g. stuttering, to control which sounds the voice assistant would listen to?
- 00:50:00 [Ben]
- Afraid I don't, sorry!
- 00:50:28 [fantasai]
- ddahl: I don't know of anything like that. Would be very useful
- 00:50:38 [fantasai]
- ddahl: Probably some research, especially for stuttering, because it's a very common problem
- 00:50:43 [fantasai]
- ddahl: but still in the research world right now
- 00:50:57 [fantasai]
- Kim: In days of Dragon Dictate, had to pause between words
- 00:51:07 [fantasai]
- Kim: People who had serious speech problems, this worked well for them
- 00:51:24 [fantasai]
- Kim: and so they stuck with it even as speech input became more natural and looked for phrases
- 00:51:41 [fantasai]
- Kim: Speech seems remarkably good at understanding people with a lot of halting, almost better than accents
- 00:51:51 [fantasai]
- Kim: I've been surprised how well it deals with stutters
- 00:52:12 [fantasai]
- kaz: So probably during workshop we should cover those cases as well, what are actual pain points
- 00:52:18 [kaz]
- q?
- 00:52:49 [fantasai]
- Kim: Something else to think about
- 00:52:54 [fantasai]
- Kim: There's a time for natural language
- 00:53:15 [fantasai]
- Kim: And there's a time where it's a lot more useful to have good default set of commands, one way to say something (maybe a few) and let the user change anything they want
- 00:53:30 [fantasai]
- Kim: Dragon made a mistake, I think, giving 24 different ways to say "go to end of the line"
- 00:53:40 [Ben]
- This is a link to a research paper titled "A DATASET FOR STUTTERING EVENT DETECTION FROM PODCASTS WITH PEOPLE WHO STUTTER". It might be useful reading material on the subject -> https://arxiv.org/pdf/2102.12394.pdf
- 00:53:41 [fantasai]
- Kim: If you have good defaults, it's much easier to teach someone
- 00:54:03 [fantasai]
- Kim: I think it's really important to think when natural language is better UX and when good default set of commands that can be learned easily and have structure is good
- 00:54:18 [kaz]
- q?
- 00:54:21 [fantasai]
- Kim: The type of interaction, and what fits, has to be considered
- 00:54:33 [fantasai]
- Jennie: Should we try to list topics for the workshop?
- 00:54:36 [Jennie]
- *Thanks for sharing that study Ben
- 00:54:37 [fantasai]
- kaz: yes that's a good idea
- 00:54:52 [fantasai]
- kaz: Starting with existing standards within W3C first
- 00:55:12 [fantasai]
- kaz: Specifications including natural language interface requirements, recent work as well
- 00:55:18 [fantasai]
- ddahl: Some technologies haven't found their way to any specs
- 00:55:25 [fantasai]
- ddahl: Like speaker recognition
- 00:55:28 [kaz]
- s/Jennie:/ddahl:/
- 00:55:38 [fantasai]
- ddahl: Any value to including that in a standard?
- 00:56:07 [fantasai]
- ddahl: What are pain points in a11y? What would be valuable to do in voice?
- 00:56:29 [fantasai]
- ddahl: maybe think about some disabilities that involve voices, either in speaking or hearing
- 00:56:44 [fantasai]
- ddahl: what can we do with text to speech that would cover some of the issues around pronunciation spec
- 00:56:48 [fantasai]
- ddahl: and SSML
- 00:57:04 [jamesn]
- jamesn has joined #voice
- 00:57:25 [fantasai]
- ddahl: I guess EmotionML would be an interesting presentation
- 00:57:46 [fantasai]
- ddahl: Looking at emotions being expressed in text or speech would add a lot to the users' perception of what the web page is trying to say
- 00:58:34 [fantasai]
- Kim: Some research at MIT using common sense database
- 00:58:51 [fantasai]
- Kim: They found it increased recognition a certain percent, but people's perception was that it was more than twice as good
- 00:59:00 [fantasai]
- Kim: I guess because it took out the most stupid mistakes
- 00:59:05 [fantasai]
- Kim: So the user experience was a lot better
- 00:59:29 [fantasai]
- kaz: So will revise workshop proposal based on discussion today
- 00:59:45 [fantasai]
- kaz: Kim, please give us further comments in the workshop committee
- 00:59:53 [kaz]
- -> https://github.com/w3c/strategy/issues/221 workshop proposal
- 00:59:56 [fantasai]
- kaz: would be great if more participants in this session can join the committee
- 01:00:08 [kaz]
- ashimura@w3.org
- 01:00:09 [fantasai]
- kaz: you can directly give your input in GH or contact me at my W3C email address
- 01:00:24 [Jennie]
- *Thank you - very interesting!
- 01:00:31 [fantasai]
- kaz: OK, time to adjourn
- 01:00:35 [fantasai]
- kaz: Thank you everyone!
- 01:00:37 [kaz]
- [adjourned]
- 01:00:38 [BC]
- Thank you
- 01:00:46 [kaz]
- rrsagent, make log public
- 01:00:49 [RRSAgent]
- I have made the request to generate https://www.w3.org/2021/10/18-voice-minutes.html fantasai
- 05:12:26 [Zakim]
- Zakim has left #voice