HTML Speech Incubator Group Teleconference -- 16 Jun 2011

<burn> trackbot, start telcon

<trackbot> Date: 16 June 2011

<burn> Scribe: Patrick_Ehlen

<burn> satish, will you be joining us today? or anyone from Google?

<burn> ScribeNick: ehlen

<satish> burn: I can't join the conf call as I'm on a train, hence only in IRC

<satish> burn: Bjorn is still on paternity leave, not sure about Glen Shires

<burn> satish, thanks. Hopefully Glen will join. We will be making decisions today about other discussion topics

New design decisions?

robert: should audio recording without recognition be supported?

are there important scenarios for supporting recording without recognition?

<burn> satish, any update on markup binding?

markup binding

<satish> burn: None, Bjorn was collecting input from the chrome team and since he has gone on leave I have no contact on what the status was.

<burn> satish, can you please check? we are not waiting on the answer, but it would be nice to have the input

robert: google issue on whether there should be a button to press

<satish> burn: yes, I can take an action to get a definitive answer in the next few days.

burn: satish will take this on w/ the chrome team

discussion time

do we need to support audio recording with recognition?

burn: an advantage could be endpointing.
... is that an important criterion in this case as well?

charles: another question is how real-time is the reco response?
... a recording may result in reco later
... an identifier might later associate the recording with a reco transcription

burn: brings up question of whether we support reco on recorded audio

robert: garbage models could be used to make recording in edge cases
... "overloading" recognition
... or will recording be a more common task?
... Do we think recording with endpointing is important?

milan: channel adaptation, sharing headers in same structure, parameters could be reused; sharing the same network paths -- convenient to use same

Charles: Also, the on-line vs. off-line cases

milan: would most recording be associated with an attempt to understand the text in the recording?

burn: Most significant feature is the endpointing

milan: in that case, why not just use dict model, do reco, and save the waveform as backup?
... and how common would that be? If not so common, could use a garbage model (even a "first-class" one)

burn: seems strange to call recording a weird special case of reco
... in favor of using the recording resource as described in mrcp

robert: though endpointing may be valuable, would we support a "record" object in the API? how would this go all the way to the developer?

burn: does not seem to be in our scope

olli: there are other proposals that would handle recording

charles: channel adaptation

burn: channel normalization is not a valid reason for recording support

charles: should probably also include built-in record grammar

(milan above)

milan: use case: may want to do dictation in parallel with c&c
... e.g., provide a c&c followed immediately by dictation

burn: but does that really belong as a built-in type in a grammar?
... sounds like there is not real consensus today vis-a-vis supporting a recording capability

robert: have not heard a compelling reason to support recording

burn: consensus not to do it now

milan: would like a standard way to do it, should the need arise

burn: we could state that we reserve this for the future

milan: there should be some consistent and portable way to do this across engines

robert: could be done as a proprietary extension

milan: at least provide a consistent hack, like builtin:record

robert: that's what the garbage model recording would be

milan: that's fine, as long as all engines support this type of garbage model

burn: to summarize, can't agree on specific recording scenarios

(robert above)

scribe: should agree on supporting garbage-recording scenario

burn: as a group, agree not to define an explicit recording capability at this time.
... can be supported using a garbage model, or capabilities defined outside this group

what are the built-ins, and what does that mean?

milan: existing builtins: dictation, search, address, numbers

robert: already agreed there should be a certain set of predefined grammars
... so how do we refer to those?

burn: 2 things make builtins interesting: (1) parameterization; (2) no language is required

milan: markup already has certain defined types, parameters, etc, as native to HTML5. Would make sense to pay attention to that here

burn: an unconstrained text box should naturally bind to a dictation model

milan: should we remap the names of the builtins?

burn: argue strongly for using html as a starting point

robert: These should be builtins, not re-used vxml grammars

<smaug> could someone paste a link to voicexml's builtin grammars ?

charles: they've become a de facto standard; not supporting them is awkward

<Robert> these are the HTML input types: http://www.w3.org/TR/html5/the-input-element.html#attr-input-type

burn: if someone wants to support legacy builtins in a way that doesn't break existing builtins, that's not a problem

<Robert> perhaps have builtins that match these
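
One way to picture the suggestion of builtins that match the HTML input types is a simple lookup from input type to builtin name. The sketch below is only an illustration: the builtin names are taken from the list milan gave earlier (dictation, search, address, numbers), but the pairing of each input type with a builtin and the function name `builtinForInputType` are assumptions, not anything the group agreed on.

```typescript
// Hypothetical builtin names, taken from the list mentioned in the discussion.
type BuiltinGrammar = "dictation" | "search" | "address" | "numbers";

// Assumed pairing of HTML input types with builtins (illustrative only).
const BUILTIN_FOR_INPUT_TYPE: Record<string, BuiltinGrammar> = {
  text: "dictation", // an unconstrained text box binds to a dictation model
  search: "search",
  number: "numbers",
};

// Returns the assumed builtin for an input type, or undefined when the
// input type has no obvious speech grammar (e.g. "color" or "file").
function builtinForInputType(inputType: string): BuiltinGrammar | undefined {
  const key = inputType.trim().toLowerCase();
  return Object.prototype.hasOwnProperty.call(BUILTIN_FOR_INPUT_TYPE, key)
    ? BUILTIN_FOR_INPUT_TYPE[key]
    : undefined;
}
```

Input types without an entry return `undefined`, which leaves room for the point raised in the discussion that similar input types may not always call for the same grammar.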

charles: there needs to be some way to include these

(milan above)

scribe: is there something about this that can't be represented by a query string?
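
To make the query-string question concrete, here is a minimal sketch of how a parameterized builtin reference could be split into a name and parameters. It assumes a `builtin:` prefix (as in the `builtin:record` example mentioned earlier) followed by an optional query string whose pairs are separated by `;` or `&`. The parameter names `minlength` and `maxlength` and the function `parseBuiltinRef` are hypothetical; the group did not define a syntax.

```typescript
// Parsed form of a hypothetical builtin reference.
interface BuiltinRef {
  name: string;
  params: Record<string, string>;
}

// Splits e.g. "builtin:numbers?minlength=3;maxlength=5" into name and params.
// Returns null when the string is not a builtin reference. Values are kept
// verbatim (no percent-decoding) to keep the sketch small.
function parseBuiltinRef(uri: string): BuiltinRef | null {
  const prefix = "builtin:";
  if (uri.indexOf(prefix) !== 0) return null;

  const rest = uri.slice(prefix.length);
  const q = rest.indexOf("?");
  const name = q === -1 ? rest : rest.slice(0, q);
  if (name.length === 0) return null;

  const params: Record<string, string> = {};
  if (q !== -1) {
    const pairs = rest.slice(q + 1).split(/[;&]/);
    for (let i = 0; i < pairs.length; i++) {
      const pair = pairs[i];
      if (pair.length === 0) continue; // skip empty segments such as ";;"
      const eq = pair.indexOf("=");
      const key = eq === -1 ? pair : pair.slice(0, eq);
      params[key] = eq === -1 ? "" : pair.slice(eq + 1);
    }
  }
  return { name, params };
}
```

Under these assumptions, `parseBuiltinRef("builtin:numbers?minlength=3;maxlength=5")` yields the name `numbers` with two parameters, which suggests that simple parameterization fits in a query string; whether all builtin parameters would fit is the open question.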

michael: do you want to reference, for example, an html number type, or some arbitrary number?

milan: easier to use old builtins & augment them

charles: need to look at greater good of using html vs vxml

<mbodell> Widely implemented? See http://en.wikipedia.org/wiki/URI_scheme

burn: michael, how would you reference grammars that are assoc. with html input types?

michael: an html ruleref, with various attributes; or don't specify URI and ref them by markup AP...
... most important is associating grammars with individual input elements
... not a strong use case to have URIs for these things, or ability for user to write their own that reference these

burn: when people want to hack something up quickly, common input types should lend themselves to being included as part of a larger utterance

michael: may be other ways to specify input for that type of scenario

burn: maybe reference not the grammar but the input type itself

charles: similar input types do not always require the same grammar

burn: but the app author may want a way to link these different types of builtin grammars together

milan: perhaps just do the proposal

burn: who on the call is interested in builtin models?

charles: interested in it; this group seems focused on web search and dictation, as opposed to broader html cases

michael: there will probably be a standard set of grammar libraries, though perhaps the market will provide those

johnston: can't see us requiring something like a "zip code" lib, for internationalization reasons

michael: HTML has already handled a lot of these issues

(milan above)

(michael, above, actually)

milan: should there be an html binding?

michael: would be better if you could speech enable certain input types with little work

robert: if no builtins were specified, what are the consequences?

burn: if you want broad adoptability and usage, it needs to be as easy to create simple apps as vxml

robert: we need it to do the html binding.
... so how much do we need the html binding part?

milan: definitely need the capability to specify search, dictation, etc.

robert: that's different from looking at html input types, etc. that's a complex problem

milan: would like to have a notion of how to solve binding problem before we do dictation

robert: does anyone have a proposal to volunteer?

milan: perhaps can do it after I get the dictation stuff out

michael: there is a topic in the API about markup bindings.

burn: true that it's a binding issue
... without a proposal, it doesn't happen.
... so it will be up to someone to write a proposal

milan: perhaps sending a message to google on this

robert: or to satish

burn: action item for milan to talk with satish and ask for help on structuring a proposal
... reminder: no call next week

robert: but there will be a protocol meeting
