Voice Interaction -- 09 Sep 2020

<scribe> scribe: ddahl

debbie: hoping to go through the architecture and the use case

<scribe> agenda: connect use case to architecture

dirk: dialog strategy, branding, replace TTS by NLG

debbie: there could also be a voice interaction with audio files or hybrid audio and TTS

jon: what about voice registry and W3C architecture

dirk: we can do that in a follow-up call
... dialog strategy added to document version 1.1

https://w3c.github.io/voiceinteraction/voice%20interaction%20drafts/paArchitecture-1-1.htm

dirk: in section 3.2.2
... actually 3.2.2.1
... dialog registry may have to be omitted for some dialog strategies
... should not make any assumptions about the exact strategy

debbie: we could give an example of VoiceXML as a frame-based strategy

dirk: SCXML could be an example of state-based
... TrindiKit would be an example of information state update; it could be starting to be commercialized

debbie: Amazon could be an example of frame-based because of slot-filling

dirk: dialog manager would be a black box
... could there be a standard for dialog representation

debbie: a standard might reduce interest among people who already have a powerful dialog manager
... could separate dialog definition from execution
... in VoiceXML, the Form Interpretation Algorithm (FIA) is separate from the form
... maybe sometime we would want to have a dialog representation language
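
As an illustration of separating dialog definition from execution, a minimal Python sketch follows. It is not VoiceXML and not the actual FIA, only a simplified frame-based loop in the same spirit. The names Slot, Frame, run_frame, the ask callback, and the max_turns limit are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Slot:
    """One field of the frame: a name plus the prompt used to ask for it."""
    name: str
    prompt: str


@dataclass
class Frame:
    """Declarative dialog definition: only the slots, no control flow."""
    slots: List[Slot]


def run_frame(
    frame: Frame,
    ask: Callable[[str], Optional[str]],
    max_turns: int = 10,
) -> Dict[str, str]:
    """Generic interpreter: keep visiting the first unfilled slot.

    `ask` is a hypothetical callback that presents a prompt and returns the
    user's answer, or None when nothing usable was recognized. The loop stops
    when every slot is filled or after `max_turns` prompts.
    """
    filled: Dict[str, str] = {}
    for _ in range(max_turns):
        pending = next((s for s in frame.slots if s.name not in filled), None)
        if pending is None:
            break
        answer = ask(pending.prompt)
        if answer is not None:
            filled[pending.name] = answer
    return filled


# Example: the second prompt gets no usable answer, so it is asked again.
answers = iter(["Boston", None, "tomorrow"])
frame = Frame([Slot("city", "Which city?"), Slot("date", "Which day?")])
result = run_frame(frame, lambda prompt: next(answers))
```

The frame here carries no control flow of its own, so the same definition could in principle be handed to a different interpreter, which is the separation mentioned above.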

dirk: Conversational Interfaces Community Group https://www.w3.org/community/conv/
... was defining a dialog scripting language
... programming relying on states

debbie: where do we put the dialog strategy

dirk: not visible, because it defines dynamic behavior
... executed by the dialog manager

<scribe> ACTION: dirk to add examples of dialog strategies

branding

dirk: in this architecture, there's no way to have platform branding
... this could be solved by having a dedicated dialog that does routing forward and backward
... there would be a dedicated core dialog with its own TTS, or it could forward the audio directly to the IPA
... replace TTS by NLG
... TTS could be part of NLG; calling it "output" would make it more general

debbie: how about "user input" and "system output" and explain in text
... there is a huge use case for visual output
... gestures and gaze are probably not going to be used very soon
... biometrics are common now

dirk: we should give some hints about multimodality but most use cases are voice

use case walkthrough

debbie: how do we represent that

dirk: numbers in diagram referring to text
... sequence diagram would be hard to understand

debbie: make steps in use case more explicit and number them

<scribe> ACTION: debbie to update use case

debbie: two separate diagrams
... first step is to expand on use case, then draw a second diagram
... should be able to do a bulleted list by tomorrow

dirk: can update the diagram tomorrow

<scribe> ACTION: dirk to update diagram with numbers referring to use case

Present: debbie jon dirk
