W3C

- DRAFT -

HTML Speech Incubator Group Teleconference

30 Jun 2011

Agenda

See also: IRC log

Attendees

Present
Dan_Burnett, Patrick_Ehlen, Michael_Johnston, Olli_Pettay, Michael_Bodell, Dan_Druta, Debbie_Dahl, Charles_Hemphill, Glen_Shires, Bjorn_Bringert, Satish_Sampath
Regrets
Raj_Tumuluri
Chair
Dan_Burnett
Scribe
ddahl

Contents


<trackbot> Date: 30 June 2011

<scribe> scribe: ddahl

review updated final report draft

dan: email me if you have problems

http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110629.html

approve proposed changes to report draft

dan: marc suggested wording changes to requirements, we should approve
... i don't agree with all of them, redundancy isn't a problem
... propose making changes based on our current understanding. let me know if you have concerns.

status report from the WebAPI subgroup

dan: we'll start with the status and bring up anything that should be discussed in the larger group
... fyi, will leave for half an hour, starting half an hour in

michael: will start discussing drafts

dan: any general discussion?

michael: not yet.
... Raj is doing summary of requirements and design decisions, we don't know if there will be directional changes.

dan: is there any discussion from the rest of the group?

WebAPI subgroup

danD: the idea was that I can create an object that isn't necessarily the ASR or TTS object, and then I can bind to the service.
... the protocol will drive some of the parameters
... will send an update based on bjorn's comments

bjorn: i'm fine with the functionality, but maybe we do need two objects

danD: will try to blend proposal with bjorn's comments

michael: do we agree or not on two vs. one interface?

danD: I don't know at the time when i do the query what services will be provided, TTS, ASR, or both

bjorn: does it make sense to have a service that can provide both?

michael: we do have a discussion point on this

danD: having an interface bridge won't hurt

bjorn: my objection to having a single one is that it makes the interface more complicated
... i want to be able to handle the case where i have one or the other or both

michael: other comments on Dan's interface?

danD: this won't be a full-fledged API or module in itself, it's just initialization
... we should start building a table saying "these are the things I want to identify"

bjorn: if i want to have support for ASR or TTS it's hard to see what the API is. what if they are two different services. you have to do a bunch of checking flags.

olli: it depends on whether the parameters are the same for both cases.

bjorn: you also do totally different things with different services. there would need to be some kind of generic interface

michael: it would succeed or fail depending on what you asked it to do.

bjorn: it's better to specify two objects than having one giant object

<satish> (I got disconnected and will try calling in again)

bjorn: it's a syntactic issue

michael: it also depends on whether there are a lot of services that are one or another

bjorn: what parameters do you need to specify? URI, language, non-standard things like non-standard grammar format.

michael: other parameters?

michaelJ: grammar?

bjorn: this is querying for capabilities of the recognizer
... it would make sense for the grammar to be a parameter, for example if you had some specific grammars, like "support for a specific grammar like 'date'".

michael: that could be for the moral equivalent of the builtins

dan: we're touching on some issues that we've already decided on, so we shouldn't revisit decisions that we already made

bjorn: standard queries would be grammar, language, and vendor-specific, so it doesn't matter too much if we have one API or two

michael: you may want to give them to the recognizer, not get them back from the recognizer

danD: we talked about not wanting to disclose what the application wanted to do.

bjorn: should get a list of what grammars and languages the recognizer supports

michael: it should accept a list of grammars and languages as its criteria and you get an engine back
... should return failure if the service can't support all the languages, but in the case of languages you might want to know if the service supports a subset

bjorn: someone could pass in a list of all the languages in the world

olli: the user agent should be able to ask the user

danD: if i just ask what languages you support, how is that a privacy issue?

olli: if the service supports only Finnish and English, you could guess that i'm Finnish

<bringert> I got disconnected

michael: you could also use the API for the local device that always has the user's language on it.
... services don't have to necessarily be honest about their answers

glenn: this seems like a major limitation that we're putting on developers for privacy reasons.

bjorn: regardless, we should say "give me a service that supports XYZ", and it's ok for the service to say "no comment"

michael: we want to allow the user to customize the service

charles: web servers already get the locale

olli: getting the supported languages is just another piece of data about the user

bjorn: most common use case is ASR and TTS for locale, so how about if we just get the locale language

olli: that might work

danD: so far, we should be able to provide the filter criteria for the grammar and the language, it should be optional, will get another version, we can discuss further

bjorn: we could say that the default locale language is supported, it's the additional languages that are supported that we have to think about

danD: will start a table of other attributes that should be available at initialization
... and will get an update
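The service-query idea danD and bjorn discussed above might look something like the following sketch. All names here (`queryServices`, the criteria fields, the registry entries) are hypothetical illustrations of the discussion, not any agreed API; the filter criteria are optional, matching the point that the query should be optional.

```javascript
// Hypothetical sketch of querying for speech services by optional
// filter criteria (type, languages, grammar formats). Names are
// illustrative only, not agreed API.

// Check a single service against the (all optional) criteria fields.
function matchesCriteria(service, criteria) {
  if (criteria.languages &&
      !criteria.languages.every(l => service.languages.includes(l))) {
    return false;
  }
  if (criteria.grammars &&
      !criteria.grammars.every(g => service.grammarFormats.includes(g))) {
    return false;
  }
  if (criteria.type && !service.types.includes(criteria.type)) {
    return false;
  }
  return true;
}

// Return every service that satisfies all supplied criteria.
function queryServices(services, criteria) {
  return services.filter(s => matchesCriteria(s, criteria));
}

// Example registry: one ASR-only service and one combined ASR+TTS
// service, illustrating the one-vs-two-interfaces question.
const services = [
  { uri: "ws://example.org/asr", types: ["asr"],
    languages: ["en-US", "fi-FI"], grammarFormats: ["srgs"] },
  { uri: "ws://example.org/both", types: ["asr", "tts"],
    languages: ["en-US"], grammarFormats: ["srgs", "date"] },
];
```

A service could also answer "no comment" to a capability query, per bjorn's privacy point; that case is not modeled in this sketch.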

michael: now look at HTML bindings

bjorn: would like there to be an element that can be standalone or enclosed in other elements
... not sure about control element
... the important things for me on the recognition element, it should be possible for the web app author to put it on a form

olli: how do you actually bind the value?

bjorn: the definition of a value for a form control is that it's always a string without formatting
... not so obvious for checkbox, it has to be defined for each type
... it's the kind of thing you put in the "value" attribute for non-text elements
... for textarea or content editable it's the text

olli: automatic binding in X+V was annoying

michael: the difference is the optionality, you don't have to do it. as for the microphone, the reco image is platform-specific, microphone, button, etc.

olli: the graphical presentation could be problematic

bjorn: each browser will have to decide what security model it wants to implement

michael: not sure about usefulness of the form, but the "for" does seem useful

bjorn: form is just a convenience

<burn> hey, sounds like bjorn wants voicexml :)

bjorn: should we look at label?
... the HTML label does what we want
... we want to do the same things that label does

olli: when will user give permission?

michael: each browser will be different
... some people want the button to appear on the screen without asking permission

bjorn: Google Voice search, for example, you don't want to have to prompt the user every time

olli: worried about when user will give permission

bjorn: easier in the CaptureAPI case if there's no markup

michael: you need to check for permission when you do the reco, not just to have a reco object

olli: if the user never wants speech, maybe the browser doesn't even render the microphone

bjorn: olli, are you still concerned about consistency of permission policy?

olli: my concerns are that the user agent needs permission before using the reco object

bjorn: is the CaptureAPI similar to the Javascript recognition API?

olli: you get similar data in CaptureAPI and reco

<smaug> http://www.whatwg.org/specs/web-apps/current-work/multipage/dnd.html#video-conferencing-and-peer-to-peer-communication

bjorn: you can get a "permission denied" error code, that's very similar to our API

michael: what doesn't work is that the permission check happens before the binding

danD: there are two steps, one the rendering of the object, and then the user decides to use that UI element, and that's a privacy and consent issue
... it makes more sense if it doesn't even prompt the user until it knows something is there

olli: a query to find out what kind of recognizer object is available is ok

bjorn: do you see a problem with the HTML API having a different method?
... i think browsers should implement permission after the user clicks the button

olli: what if user has already started speaking

bjorn: if permission is denied, it could either cancel or simply not start recognition

michael: user should be able to revoke permission

bjorn: these things are up to the user agent, having the Javascript API and the button should make it possible to implement appropriate privacy and security

michael: move on, because other topics
... do we agree that we don't need HTML bindings for TTS?

bjorn: don't have anything against it, but maybe a waste of time.

michael: we can leave it as it is for now.

let's start on bjorn's speech recognition events, similar to what i sent before the f2f

scribe: added timestamps, there are also a number of error codes that we need to agree on
... what about nomatch and noinput, are they errors or kinds of input?

michael: i think they're different types of result
... nomatch seems like a result, but noinput seems like a different kind of event

dan: we look at rejections

michael: if rejection was just below confidence you may want to look at that.

charles: noinput could be like a volume issue

michael: nospeech would not generate an nbest on our platform

dan: for us it would be the same way

glenn: why have multiple events instead of a single event that returns different parameters?

michael: i don't think you're typically doing the same thing with noinput vs. nomatch

charles: it's nice to have the engine decide if it's a nomatch

dan: sometimes the engine ends up with no answer, the vast majority of nomatch is confidence-based

glenn: should make sure that results returned are in as similar a format as possible

bjorn: what about nospeech?

dan: error to me means that something broke, not like a normal expected user situation

bjorn: the distinction between error and normal is not always clear

dan: true user interface behavior is not an error, "abort" would only be an error if you grouped together user-initiated abort and engine abort

bjorn: are permission problems or network problems errors?

michael: would not consider abort or noinput errors

glenn: I would tie them all into the same event, that would be simpler for the developer

michael: in the continuous case you don't care about noinput

dan: we won't resolve this in the remaining time.

michael: we can continue discussion on the list
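The two positions in the event-taxonomy thread above could be sketched as follows. Both dispatch functions and all names are hypothetical illustrations of the discussion, not agreed design: option A treats nomatch as a kind of result (it may carry a below-confidence n-best list) and noinput as its own non-error event, while option B is glenn's single event carrying a status parameter.

```javascript
// Option A: distinct event types. nomatch is a result; noinput is its
// own event; only genuine failures (network, permission) are errors.
// All names are illustrative only.
function dispatchDistinct(outcome, handlers) {
  if (outcome.kind === "match" || outcome.kind === "nomatch") {
    handlers.onresult({ kind: outcome.kind, nbest: outcome.nbest || [] });
  } else if (outcome.kind === "noinput") {
    handlers.onnoinput();
  } else {
    handlers.onerror({ code: outcome.kind }); // e.g. "network", "denied"
  }
}

// Option B: one event whose status parameter distinguishes the cases,
// keeping the result format as uniform as possible.
function dispatchSingle(outcome, handler) {
  handler({ status: outcome.kind, nbest: outcome.nbest || [] });
}
```

Option A lets an app ignore cases it does not care about (e.g. noinput in the continuous case) simply by not registering a handler; option B is simpler for developers who treat all outcomes alike.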

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.136 (CVS log)
$Date: 2011/06/30 17:43:28 $
