IRC log of voice on 2021-10-18

Timestamps are in UTC.

06:45:28 [RRSAgent]
RRSAgent has joined #voice
06:45:28 [RRSAgent]
logging to https://www.w3.org/2021/10/18-voice-irc
06:45:31 [dom]
RRSAgent, stay
06:45:34 [dom]
RRSAgent, make log public
06:45:50 [dom]
RRSAgent, this meeting spans midnight
14:28:54 [kaz]
kaz has joined #voice
23:45:09 [kaz]
kaz has joined #voice
23:45:45 [Zakim]
Zakim has joined #voice
23:45:51 [kaz]
rrsagent, bye
23:45:51 [RRSAgent]
I see no action items
23:45:56 [RRSAgent]
RRSAgent has joined #voice
23:45:56 [RRSAgent]
logging to https://www.w3.org/2021/10/18-voice-irc
23:49:11 [kaz]
meeting: Next Directions for Voice and the Web Breakout
23:55:08 [takio]
takio has joined #voice
23:56:08 [Ben]
Ben has joined #voice
23:59:16 [kaz]
present+ Kaz_Ashimura__W3C, Bev_Corwin, Francis_Storr, Jennie_Delisi, Masakazu_Kitahara, Takio_Yamaoka__Yahoo_Japan
23:59:30 [Jennie]
Jennie has joined #voice
23:59:30 [kaz]
present+ Makoto_Murata__DAISY
23:59:54 [kaz]
present+ Muhammad, Sam_Kanta, Tomoaki_Mizushima__IRI
00:00:01 [MURATA_]
MURATA_ has joined #voice
00:00:01 [kaz]
zakim, who is on the call?
00:00:02 [Zakim]
Present: Kaz_Ashimura__W3C, Bev_Corwin, Francis_Storr, Jennie_Delisi, Masakazu_Kitahara, Takio_Yamaoka__Yahoo_Japan, Makoto_Murata__DAISY, Muhammad, Sam_Kanta,
00:00:04 [Zakim]
... Tomoaki_Mizushima__IRI
00:00:44 [BC]
BC has joined #voice
00:00:49 [BC]
Hello
00:01:12 [kirkwood]
kirkwood has joined #voice
00:01:13 [MasakazuKitahara]
MasakazuKitahara has joined #voice
00:01:17 [MURATA_]
present+
00:01:22 [MasakazuKitahara]
present+
00:01:29 [fantasai]
fantasai has joined #voice
00:01:44 [Jennie]
present+
00:02:13 [Ben]
present+
00:03:09 [fantasai]
scribenick: fantasai
00:03:14 [fantasai]
kaz: Thanks for joining this breakout session
00:03:31 [fantasai]
kaz: This is a breakout session on new directions for Voice and Web
00:03:39 [fantasai]
kaz: There was a breakout panel during AC meeting
00:03:51 [fantasai]
kaz: discussion about how to improve web speech capabilities in general
00:04:19 [fantasai]
kaz: There were several breakout sessions previously (previous TPAC??)
00:04:27 [fantasai]
kaz: We want to summarize situation and figure out how to improve
00:04:37 [kaz]
-> https://www.w3.org/2021/Talks/1018-voice-dd-ka/20211018-voice-breakout-dd-ka.pdf slides
00:04:52 [fantasai]
kaz: First, reviewing existing standards and requirements for voice and web
00:05:05 [fantasai]
kaz: Then would like to look into the issue of interop among voice agents
00:05:13 [fantasai]
kaz: Then think about potential voice workshop
00:05:29 [fantasai]
kaz: If you have any questions please raise your hand on Zoom chat, or type q+ on IRC
00:05:58 [fantasai]
[slide 2]
00:06:07 [fantasai]
kaz: Existing mechanisms for speech interfaces
00:06:16 [fantasai]
kaz: We used to have markup languages like VoiceXML and SSML
00:06:28 [fantasai]
kaz: There was also CSS speech modules
00:06:38 [fantasai]
kaz: And Web Speech API
00:06:48 [fantasai]
kaz: Lastly there's specification for spoken presentation in HTML WD
00:07:01 [fantasai]
kaz: Most popular one is Web Speech API, but this is not a W3C REC but a CG report
00:07:04 [fantasai]
kaz: so that's a question
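[As a concrete sketch of the Web Speech API mentioned above (a CG report, not a W3C Recommendation): a minimal browser-only example of speech synthesis. The function name and defaults are illustrative; outside a browser there is no speechSynthesis object, so the sketch returns null instead of speaking.]

```javascript
// Minimal sketch of speech synthesis via the Web Speech API.
// speechSynthesis and SpeechSynthesisUtterance exist only in browsers;
// outside a browser this returns null instead of speaking.
function speak(text, lang = "en-US") {
  if (typeof speechSynthesis === "undefined") {
    return null; // no speech engine available (e.g. Node.js)
  }
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = lang; // BCP 47 language tag
  utterance.rate = 1.0;  // normal speaking rate
  speechSynthesis.speak(utterance);
  return utterance;
}
```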
00:07:06 [fantasai]
[slide 3]
00:07:18 [fantasai]
kaz: Voice agents are getting more and more popular, and very useful
00:08:13 [ddahl]
ddahl has joined #voice
00:09:05 [fantasai]
kaz: Need improved voice agents
00:09:15 [Tomoaki_Mizushima]
Tomoaki_Mizushima has joined #voice
00:09:30 [fantasai]
[slide 4]
00:09:35 [fantasai]
kaz: Interoperability of voice agents
00:09:40 [fantasai]
kaz: local voice agent or on the cloud side
00:09:47 [fantasai]
kaz: most are proprietary, and not based on actual standards
00:09:55 [fantasai]
kaz: speech API is very convenient but not a standard yet
00:10:01 [fantasai]
kaz: Desktop and mobile apps, various implementations
00:10:08 [fantasai]
kaz: how can we get them to interoperate with each other?
00:10:15 [fantasai]
kaz: Do we need some standards-based infrastructure?
00:10:26 [fantasai]
kaz: Voice Interaction CG chaired by David has been working on interop issues
00:10:31 [fantasai]
kaz: will meet next week during TPAC
00:10:55 [fantasai]
[slide 5]
00:11:44 [fantasai]
s/David/ddahl/
00:12:00 [fantasai]
ddahl: Our CG has been working on voice and web, focusing on interop among intelligent personal assistants right now
00:12:14 [fantasai]
ddahl: We've noticed that these assistants (like Siri, Cortana, Alexa, etc.)
00:12:22 [fantasai]
ddahl: they really have a lot in common in terms of what they are useful for
00:12:40 [kaz]
i|slide 5|-> https://www.w3.org/community/voiceinteraction/ Voice Interaction CG|
00:12:41 [fantasai]
ddahl: Like a web page, their goal is to help users find info, learn things, be entertained, and also intelligent personal assistance
00:12:56 [fantasai]
ddahl: They communicate with servers on the internet, which contribute functionality in service of their goals
00:13:18 [fantasai]
ddahl: The two types of interaction are different because a web page is primarily a graphical UI and a PA is primarily voice interaction
00:13:27 [fantasai]
s/PA/IPA/
00:13:34 [fantasai]
ddahl: But there are some arbitrary differences also
00:13:46 [fantasai]
ddahl: web page rendered in browser; IPA in a proprietary platform
00:13:57 [fantasai]
ddahl: but that's an arbitrary architectural difference that devs of IPAs have chosen to use
00:14:02 [fantasai]
ddahl: web pages run in any browser
00:14:08 [fantasai]
ddahl: but IPAs only run on their own platform
00:14:17 [fantasai]
ddahl: If you have Amazon function it can't run on the Web, it can't run on your phone
00:14:25 [fantasai]
ddahl: it runs only on its own proprietary smart speaker
00:14:38 [fantasai]
ddahl: similarly, web pages are found via the familiar URL mechanism or a search engine
00:14:51 [fantasai]
ddahl: IPA is found through its proprietary platform, however that platform chooses to make it available
00:15:01 [fantasai]
ddahl: So finding functionality is purely proprietary
00:15:13 [fantasai]
[next slide]
00:15:25 [fantasai]
slide depicts diagram of IPA architecture
00:15:34 [fantasai]
ddahl: Focus on the three major boxes
00:15:42 [fantasai]
ddahl: First box is data capture parts of functionality
00:15:49 [fantasai]
ddahl: In the case of an IPA, we most typically want to capture speech
00:15:55 [fantasai]
ddahl: compared to web page, we're capturing user input
00:16:00 [kaz]
s/next slide/slide 6/
00:16:17 [fantasai]
ddahl: the function in the middle basically does the intelligent parts of the processing
00:16:24 [fantasai]
ddahl: This is analogous to a browser
00:16:32 [fantasai]
ddahl: On the right we have connection to other functionalities
00:16:39 [fantasai]
ddahl: other IPAs or other web sites
00:16:44 [fantasai]
ddahl: Found through search engine, DNS, combination
00:16:58 [fantasai]
ddahl: Rightmost part of this box we find other functionalities
00:17:09 [fantasai]
ddahl: e.g. the websites themselves, in the case of an IPA some other IPA
00:17:14 [fantasai]
ddahl: For example looking for shopping site
00:17:21 [fantasai]
ddahl: want to find interoperably from UI
00:17:25 [fantasai]
ddahl: That's architecture that we're looking at
00:17:28 [fantasai]
ddahl: seems parallel to Web
00:17:34 [fantasai]
ddahl: we'd like to be able to make those alignments possible
00:17:45 [fantasai]
ddahl: and use as much of the existing Web infrastructure as possible for IPAs to be interoperable
00:17:54 [fantasai]
[next slide]
00:18:05 [fantasai]
kaz: There are many issues emerging these days
00:18:29 [fantasai]
kaz: So we'd like to organize a dedicated W3C workshop to summarize the current situation, the pain points, and discuss how we could solve and improve the situation
00:18:37 [fantasai]
kaz: by providing e.g. a forum for joint discussion by related stakeholders
00:18:46 [fantasai]
kaz: I've created a dedicated GH issue in the strategy repo
00:18:56 [fantasai]
-> https://github.com/w3c/strategy/issues/221
00:19:07 [fantasai]
kaz: Please join the workshop and give your thoughts, pain points, solitions
00:19:12 [fantasai]
s/solition/solutions/
00:19:20 [fantasai]
kaz: Any questions, comments?
00:19:21 [kaz]
s/next slide/slide 7
00:19:35 [Sam]
Sam has joined #voice
00:19:53 [fantasai]
kaz: Murata-san, you were very interested in a11y in general and also interaction of ruby and speech
00:19:59 [fantasai]
kaz: interested in this workshop?
00:20:14 [fantasai]
MURATA_: Yes, interested, and wondering: what are the obstacles to the existing specifications?
00:20:19 [fantasai]
MURATA_: Why are they not widely used?
00:20:26 [fantasai]
kaz: There are various approaches to this
00:20:37 [fantasai]
kaz: e.g. markup-based approach like VoiceXML/SSML
00:20:41 [fantasai]
kaz: and CSS-based approach
00:20:44 [fantasai]
kaz: and JS-based approach
00:20:57 [fantasai]
kaz: So we should think about how to integrate all these mechanisms into common speech platform
00:21:12 [fantasai]
kaz: and have content authors and applications able to use various features for controlling speech freely and nicely
00:21:21 [fantasai]
kaz: that kind of integration should be one discussion point for the workshop as well
00:21:38 [fantasai]
kaz: You have been working on text information. Part of this, pronunciation specification, should also be included
00:21:41 [fantasai]
MURATA_: yes
00:21:55 [fantasai]
kaz: any other questions/comments/opinions/ideas?
00:22:01 [fantasai]
MURATA_: Let me report one thing about EPUB
00:22:03 [kaz]
q?
00:22:08 [fantasai]
MURATA_: EPUB3 has included SSML and PLS
00:22:15 [fantasai]
MURATA_: But now EPUB3 is heading for Recommendation
00:22:28 [fantasai]
MURATA_: and some in WG don't want to include features that are not widely implemented
00:22:41 [fantasai]
MURATA_: so WG decided to move SSML and PLS to a separate note, which is maintained by the EPUB WG
00:22:49 [fantasai]
MURATA_: But that spec is detached from mainstream EPUB
00:22:55 [fantasai]
MURATA_: Not intended to be a Recommendation in the near future
00:23:03 [fantasai]
MURATA_: On the other hand, I know some Japanese companies use SSML and PLS
00:23:09 [kaz]
q?
00:23:11 [fantasai]
MURATA_: One company uses PLS and a few use SSML
00:23:22 [fantasai]
MURATA_: In particular, the biggest textbook publisher in Japan uses SSML
00:23:42 [fantasai]
MURATA_: And I hear the cost of an ebook is 3-4 times higher if you try to really incorporate SSML and make everything natural
00:23:59 [fantasai]
MURATA_: For textbooks, wrong pronunciation is very problematic, especially for new language learners
00:24:06 [fantasai]
MURATA_: It is therefore worth the cost for these cases
00:24:15 [fantasai]
MURATA_: But it is not cost-effective for broader materials
00:24:26 [fantasai]
MURATA_: So SSML-based approach can't scale
00:24:31 [fantasai]
MURATA_: But more optimistic about PLS
00:24:39 [fantasai]
MURATA_: In Japanese manga and novels, character names are unreadable
00:24:46 [fantasai]
MURATA_: If you use PLS you have to describe each name only once
00:24:59 [fantasai]
MURATA_: Dragon Slayer is very common, but doesn't read well using text to speech
00:25:03 [fantasai]
MURATA_: I'm hoping that PLS would make things better
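[Illustrative sketches of the two approaches discussed above; the title, markup, and IPA pronunciation are invented for the example. SSML fixes a reading inline, so it must be repeated at every occurrence; a PLS lexicon describes a name once and a synthesizer reuses it everywhere.]

```xml
<!-- SSML 1.1 sketch: fix one reading inline (repeated per occurrence) -->
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  The hero of <sub alias="Dragon Slayer">DragonSlayer</sub> returns.
</speak>

<!-- PLS 1.0 sketch: describe the name once for reuse everywhere -->
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>DragonSlayer</grapheme>
    <phoneme>ˈdɹæɡən ˈsleɪɚ</phoneme>
  </lexeme>
</lexicon>
```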
00:25:20 [fantasai]
kaz: As former Team contact for Voice group, I love SSML 1.1 and PLS 1.0
00:25:31 [fantasai]
kaz: I would like to see the potential for improving those specifications further
00:25:45 [fantasai]
kaz: Also, there's possibility that we might want an even newer mechanism to achieve the requirements
00:26:00 [fantasai]
kaz: For example, Léonie mentioned it is maybe good time to re-start speech work in W3C, during AC meeting
00:26:06 [fantasai]
kaz: Personally I would like to say Yes!
00:26:15 [fantasai]
kaz: So I think a workshop would be a good starting point for that direction
00:26:25 [fantasai]
kaz: Any other viewpoints?
00:26:28 [kaz]
q?
00:26:45 [kaz]
q+ ddahl
00:26:46 [fantasai]
ddahl: Want to say something about why things not implemented in browsers
00:26:47 [kaz]
ack d
00:26:56 [fantasai]
ddahl: Since those early specifications, technology has gotten much stronger
00:27:05 [fantasai]
ddahl: previously, speech recognition did not work well
00:27:11 [fantasai]
ddahl: now text to speech works much better also
00:27:25 [fantasai]
ddahl: So I think much of this was marginalized; it didn't work well, so people wouldn't use it
00:27:31 [fantasai]
ddahl: it was considered to have nothing to do with the Web
00:27:37 [fantasai]
ddahl: but now the tech is far better than it was at the time
00:27:43 [fantasai]
ddahl: It really does make sense to look at how it is used in the browser
00:27:51 [kaz]
q?
00:27:56 [kaz]
ack f
00:28:12 [kaz]
fantasai: CSS and PLS seem to be very different
00:28:17 [kaz]
... CSS is about styling
00:28:25 [kaz]
... not closely tied with each other
00:28:51 [kaz]
... you definitely can't have only CSS speech module but could use it to extend what is existing
00:29:02 [kaz]
... cue sound, etc.
00:29:06 [kaz]
... shifting volume, etc.
00:29:26 [fantasai]
s/sound/sound, pauses/
00:29:29 [kaz]
... can't change spoken pronunciation itself
00:29:33 [kaz]
q?
00:29:47 [kaz]
... maybe we need new technology
00:29:55 [kaz]
... what is missing for that
00:30:00 [fantasai]
s/maybe we need/you said maybe we need/
00:30:11 [fantasai]
s/for that/that we need to create technology for?
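[A sketch of the points above, per the CSS Speech Module; the selectors and cue file are hypothetical. It styles how text is rendered aurally (pauses, cues, volume) but cannot change the pronunciation of the words themselves.]

```css
/* CSS Speech Module sketch (not widely implemented in browsers) */
h1 {
  voice-family: female;
  voice-rate: slow;
  pause-before: strong;         /* insert a pause before headings */
  cue-before: url(chapter.wav); /* hypothetical audio cue file */
}
em {
  voice-stress: strong;
  voice-volume: loud;           /* shift volume, but not pronunciation */
}
```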
00:30:31 [fantasai]
kaz: I was thinking about how to integrate various modalities
00:30:35 [fantasai]
kaz: that are not interoperable currently
00:30:48 [fantasai]
kaz: also how to implement dialog processing for interactive services
00:30:54 [fantasai]
kaz: and possible integration with IoT services
00:31:20 [fantasai]
kaz: like 2001: A Space Odyssey, asking for voice as a key for opening the door
00:31:29 [takio]
q+
00:31:30 [fantasai]
kaz: maybe because I'm working for WoT and Smart Cities as well
00:31:43 [fantasai]
kaz: my dream is to apply voice technology as part of user interfaces for IoT and smart cities
00:31:52 [kaz]
q?
00:32:36 [fantasai]
????: I have a lot of opinions on what's needed. Used voice interface for 20+ years
00:32:44 [fantasai]
????: I had to use totally hands free for 3 yrs
00:32:45 [kaz]
s/????:/kim:/
00:32:47 [kaz]
s/????:/kim:/
00:32:56 [fantasai]
kim: Now also use wacom tablet
00:33:04 [fantasai]
kim: Speech is not really well integrated with other forms of input
00:33:18 [fantasai]
kim: If speech was well implemented, many people would use a little bit. A few people would use for everything.
00:33:22 [fantasai]
kim: There's so much that is not there
00:33:31 [fantasai]
kim: You were talking about it being siloed, and that's one of the problems
00:33:39 [fantasai]
kim: for example, when you have keyboard shortcuts
00:33:45 [fantasai]
kim: Sometimes you can change it, and that's great
00:34:02 [fantasai]
kim: But can only link to letters now. Would be great to integrate with speech
00:34:06 [kaz]
q?
00:34:07 [ddahl]
q+ to talk about chatbots on websites
00:34:12 [kaz]
q?
00:34:14 [fantasai]
kim: Instead of thinking of it as just another input method, how do you put it alongside the others?
00:34:25 [fantasai]
kim: It should be something with good defaults and works alongside everything else
00:34:32 [fantasai]
kim: Getting there more with Siri etc.
00:34:41 [fantasai]
kim: If you say "search the web for green apples" it's faster than typing
00:34:50 [Jennie]
+1 to Kim Patch - would also see a need for sounds/vocal melodies. Some cannot articulate clear words but can make a melody.
00:34:52 [fantasai]
kim: but big gaps, I think because of the underlying technology
00:35:00 [fantasai]
kim: But I think speech has a ton of potential
00:35:05 [fantasai]
kim: I can show some of it using custom stuff
00:35:10 [fantasai]
kim: that really has not been realized
00:35:16 [fantasai]
kim: But it's also used some places where it shouldn't be used
00:35:26 [fantasai]
kim: Send is a really bad one-word speech command!
00:35:34 [fantasai]
kim: I see a lot of stuff being implemented that is not well thought through
00:35:43 [fantasai]
kim: It's too bad that more of us don't use a little bit of speech
00:35:55 [Jennie]
* kaz sure
00:35:57 [fantasai]
kim: Also some problems like e.g. need to have a good microphone
00:36:08 [fantasai]
kim: Engines are getting better, but you have to make sure it didn't record something totally off the wall
00:36:16 [kaz]
q?
00:36:24 [kaz]
ack t
00:36:30 [fantasai]
takio: Thanks for presentation today
00:36:30 [kaz]
q+ Jennie
00:36:40 [fantasai]
takio: I'm new around here, not sure about this specification
00:36:50 [fantasai]
takio: but I'm concerned about emotional things (?)
00:36:59 [fantasai]
takio: e.g. if ...
00:37:09 [fantasai]
takio: If laughing or angry, this may be dropped
00:37:24 [fantasai]
takio: So I'm concerned about these specifications, if they take care of emotional expression
00:37:30 [fantasai]
takio: Also asking about intermediate formats
00:37:33 [fantasai]
takio: e.g. ...
00:37:40 [fantasai]
takio: e.g. emotional info is important for that person
00:38:08 [fantasai]
kaz: For example, some telecom companies or research companies have been working on extracting emotion info from speech
00:38:18 [fantasai]
kaz: and trying to deal with that information once we've extracted some of it
00:38:18 [kaz]
-> https://www.w3.org/TR/emotionml/ EmotionML
00:38:30 [fantasai]
kaz: There is a dedicated specification to describe emotional information, named EmotionML
00:38:41 [fantasai]
kaz: As debbie also mentioned, speech tech has improved a lot the last 10 years
00:38:49 [kaz]
q?
00:38:54 [kaz]
ack d
00:38:54 [Zakim]
ddahl, you wanted to talk about chatbots on websites
00:38:55 [fantasai]
kaz: We might want to also rethink EmotionML
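[For reference, an EmotionML annotation looks like this sketch; the category and confidence values are invented for the example, using the "big6" vocabulary from the spec's emotion vocabularies.]

```xml
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <emotion>
    <!-- hypothetical annotation: detected emotion with confidence -->
    <category name="happiness" confidence="0.8"/>
  </emotion>
</emotionml>
```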
00:39:04 [fantasai]
ddahl: I've been noticing about websites recently
00:39:13 [fantasai]
ddahl: complex websites especially tend to have a chatbot
00:39:23 [fantasai]
ddahl: Seems like a failure of the website, that users can't find the information they're looking for
00:39:31 [fantasai]
ddahl: so they add a chatbot to help find information quickly
00:39:40 [fantasai]
ddahl: A very interesting characteristic of voice is that it is semantic
00:39:49 [fantasai]
ddahl: It doesn't require the same kind of navigation that you need in a complex website
00:39:55 [fantasai]
ddahl: theoretically you ask for what you want and you go there
00:40:06 [fantasai]
ddahl: chatbots are normally not voice-enabled, but they are natural-language enabled
00:40:17 [fantasai]
ddahl: and that's an area where we can have some synergy between traditional websites and voice interaction
00:40:22 [fantasai]
kaz: That's a good use case
00:40:32 [fantasai]
kaz: Reminds me of my recent TV
00:40:39 [fantasai]
kaz: It has great capabilities, but there are so many menus
00:40:51 [fantasai]
kaz: I'm not really sure how to use all these given the complicated menus
00:40:59 [fantasai]
kaz: but it has speech recognition, so I can simply talk to that TV
00:41:04 [fantasai]
kaz: "I'd like to watch Dragon Slayer"
00:41:20 [fantasai]
ddahl: That's an amazing use case, because traditionally TV and DVRs were held up as examples of poor user interfaces
00:41:31 [fantasai]
ddahl: Too difficult to even set the time, without lots of struggle
00:41:44 [fantasai]
ddahl: So need to think about how to cut through layers of menus and navigation with voice and natural language
00:41:56 [fantasai]
kaz: These days even TV devices use web interface for their UI
00:42:03 [fantasai]
kaz: TV menu is a kind of web application
00:42:12 [fantasai]
kaz: that implies speech interface is good solution
00:42:18 [kaz]
q?
00:42:26 [kaz]
ack j
00:42:38 [kim_patch]
kim_patch has joined #voice
00:42:41 [fantasai]
Jennie: I thought Kim's point about keyboard shortcut types redirecting is excellent
00:42:51 [fantasai]
Jennie: Can see use case for ppl who use speech but have limited use of vocalization
00:43:02 [fantasai]
Jennie: If there was a way to program instead of using a keyboard shortcut, using a melodic phrase
00:43:10 [fantasai]
Jennie: similar to physical gesture on mobile device
00:43:26 [fantasai]
Jennie: Would be helpful for ppl who are limited, to control devices
00:43:34 [fantasai]
Jennie: Using a shortcut or shorthand of melodic phrase
00:43:42 [fantasai]
Jennie: for ppl who are hospitalized or have limited mobility
00:43:50 [kaz]
q+ kim
00:43:53 [kaz]
ack kim
00:44:09 [fantasai]
Kim: In early days ...
00:44:20 [fantasai]
Kim: But one thing that worked really well was blowing to close the window
00:44:31 [fantasai]
Kim: 5-6 years ago someone was experimenting with that in an engine
00:44:48 [fantasai]
Kim: I think it would work well both for folks who have difficulty vocalizing, and would be neat for other people as well
00:44:52 [fantasai]
Kim: but would have to be easy to do
00:45:25 [fantasai]
ddahl: Needs to be easy to do, but would be interesting to adapt
00:45:36 [kaz]
s/ddahl:/Jennie:/
00:45:42 [fantasai]
Kim: 10yrs ago I was working with ppl who are gesture specialists, and trying to get a grant for combined speech + gesture
00:45:57 [Jennie]
+1 to Kim P!
00:45:59 [fantasai]
Kim: A couple of gestures, a couple of sounds, would add a lot to many use cases
00:46:11 [fantasai]
Kim: True mixed input
00:46:20 [ddahl]
q+
00:46:22 [kaz]
q?
00:46:25 [kaz]
ack d
00:46:45 [fantasai]
ddahl: That was an interesting point about gestures, reminded me of the recent requirements for natural language interfaces just published
00:47:04 [fantasai]
ddahl: They mentioned sign language interpretation in natural language interfaces
00:47:08 [fantasai]
ddahl: that is obviously gesture based
00:47:17 [fantasai]
ddahl: research world
00:47:23 [fantasai]
ddahl: but thinking about gesture-based input
00:47:27 [fantasai]
ddahl: could be personal gestures
00:47:33 [fantasai]
ddahl: or formal language gestures, like sign language
00:47:38 [fantasai]
ddahl: but that would help a lot of people
00:47:56 [Jennie]
q+
00:47:56 [fantasai]
Kim: With mixed input, can do multiple inputs at the same time that don't have to be aware of each other
00:48:06 [fantasai]
Kim: When pointing, computer knows where you're pointing
00:48:09 [fantasai]
Kim: Hard for computer
00:48:23 [fantasai]
Kim: Computer doesn't have to be aware of this
00:48:28 [fantasai]
s/Hard for computer/.../
00:48:29 [kaz]
-> https://www.w3.org/TR/2021/WD-naur-20211012/ Natural Language Interface Accessibility User Requirements
00:48:32 [kaz]
q?
00:48:35 [kaz]
ack j
00:48:45 [fantasai]
Jennie: One of the other questions I had, since I'm not as familiar with the specs
00:48:52 [fantasai]
Jennie: for touchscreen devices and computers
00:49:09 [fantasai]
Jennie: we have ways to control for tremors or repeated actions to choose the right one to respond to
00:49:32 [fantasai]
Jennie: Do we have any consideration for that in voice, e.g. stuttering, to control which sounds the voice assistant would listen to?
00:50:00 [Ben]
Afraid I don't, sorry!
00:50:28 [fantasai]
ddahl: I don't know of anything like that. Would be very useful
00:50:38 [fantasai]
ddahl: Probably some research, especially for stuttering, because it's a very common problem
00:50:43 [fantasai]
ddahl: but still in the research world right now
00:50:57 [fantasai]
Kim: In days of Dragon Dictate, had to pause between words
00:51:07 [fantasai]
Kim: People who had serious speech problems, this worked well for them
00:51:24 [fantasai]
Kim: and so they stuck with it even as speech input became more natural and looked for phrases
00:51:41 [fantasai]
Kim: Speech seems remarkably good at understanding people with a lot of halting, almost better than accents
00:51:51 [fantasai]
Kim: I've been surprised how well it deals with stutters
00:52:12 [fantasai]
kaz: So probably during workshop we should cover those cases as well, what are actual pain points
00:52:18 [kaz]
q?
00:52:49 [fantasai]
Kim: Something else to think about
00:52:54 [fantasai]
Kim: There's a time for natural language
00:53:15 [fantasai]
Kim: And there's a time where it's a lot more useful to have good default set of commands, one way to say something (maybe a few) and let the user change anything they want
00:53:30 [fantasai]
Kim: Dragon made a mistake, I think, giving 24 different ways to say "go to end of the line"
00:53:40 [Ben]
This is a link to a research paper titled "A DATASET FOR STUTTERING EVENT DETECTION FROM PODCASTS WITH PEOPLE WHO STUTTER". It might be useful reading material on the subject -> https://arxiv.org/pdf/2102.12394.pdf
00:53:41 [fantasai]
Kim: If you have good defaults, it's much easier to teach someone
00:54:03 [fantasai]
Kim: I think it's really important to think when natural language is better UX and when good default set of commands that can be learned easily and have structure is good
00:54:18 [kaz]
q?
00:54:21 [fantasai]
Kim: The type of interaction, and what fits, has to be considered
00:54:33 [fantasai]
Jennie: Should we try to list topics for the workshop?
00:54:36 [Jennie]
*Thanks for sharing that study Ben
00:54:37 [fantasai]
kaz: yes that's a good idea
00:54:52 [fantasai]
kaz: Starting with existing standards within W3C first
00:55:12 [fantasai]
kaz: Specifications including natural language interface requirements, recent work as well
00:55:18 [fantasai]
ddahl: Some technologies haven't found their way to any specs
00:55:25 [fantasai]
ddahl: Like speaker recognition
00:55:28 [kaz]
s/Jennie:/ddahl:/
00:55:38 [fantasai]
ddahl: Any value to including that in a standard?
00:56:07 [fantasai]
ddahl: What are pain points in a11y? What would be valuable to do in voice?
00:56:29 [fantasai]
ddahl: maybe think about some disabilities that involve voices, either in speaking or hearing
00:56:44 [fantasai]
ddahl: what can we do with text to speech that would cover some of the issues around pronunciation spec
00:56:48 [fantasai]
ddahl: and SSML
00:57:04 [jamesn]
jamesn has joined #voice
00:57:25 [fantasai]
ddahl: I guess EmotionML would be an interesting presentation
00:57:46 [fantasai]
ddahl: Looking at emotions being expressed in text or speech would add a lot to the users' perception of what the web page is trying to say
00:58:34 [fantasai]
Kim: Some research at MIT using common sense database
00:58:51 [fantasai]
Kim: They found it increased recognition a certain percent, but people's perception was that it was more than twice as good
00:59:00 [fantasai]
Kim: I guess because it took out the most stupid mistakes
00:59:05 [fantasai]
Kim: So the user experience was a lot better
00:59:29 [fantasai]
kaz: So will revise workshop proposal based on discussion today
00:59:45 [fantasai]
kaz: Kim, please give us further comments in the workshop committee
00:59:53 [kaz]
-> https://github.com/w3c/strategy/issues/221 workshop proposal
00:59:56 [fantasai]
kaz: would be great if more participants in this session can join the committee
01:00:08 [kaz]
ashimura@w3.org
01:00:09 [fantasai]
kaz: you can directly give your input in GH or contact me at my W3C email address
01:00:24 [Jennie]
*Thank you - very interesting!
01:00:31 [fantasai]
kaz: OK, time to adjourn
01:00:35 [fantasai]
kaz: Thank you everyone!
01:00:37 [kaz]
[adjourned]
01:00:38 [BC]
Thank you
01:00:46 [kaz]
rrsagent, make log public
01:00:49 [RRSAgent]
I have made the request to generate https://www.w3.org/2021/10/18-voice-minutes.html fantasai
05:12:26 [Zakim]
Zakim has left #voice