See also: IRC log
<trackbot> Date: 30 June 2011
<scribe> scribe: ddahl
dan: email me if you have problems
http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110629.html
dan: marc suggested wording changes to requirements, we should approve
... i don't agree with all of them, redundancy isn't a problem
... propose making changes based on our current understanding. let me know if you have concerns.
dan: we'll start with the status and bring up anything that should be discussed in the larger group
... fyi, i will leave for half an hour, starting half an hour in
michael: will start discussing drafts
dan: any general discussion?
michael: not yet.
... Raj is doing a summary of requirements and design decisions, we don't know if there will be directional changes.
dan: is there any discussion from the rest of the group?
danD: the idea was that I can create an object that isn't necessarily the ASR or TTS object, and then I can bind to the service.
... the protocol will drive some of the parameters
... will send an update based on bjorn's comments
bjorn: i'm fine with the functionality, but maybe we do need two objects
danD: will try to blend proposal with bjorn's comments
michael: do we agree or not on two vs. one interface?
danD: I don't know at the time when i do the query what services will be provided, TTS, ASR, or both
bjorn: does it make sense to have a service that can provide both?
michael: we do have a discussion point on this
danD: having an interface bridge won't hurt
bjorn: my objection to having a single one is that it makes the interface more complicated
... i want to be able to handle the case where i have one or the other or both
michael: other comments on Dan's interface?
danD: this won't be a full-fledged API or module in itself, it's just initialization
... we should start building a table saying "these are the things I want to identify"
bjorn: if i want to have support for ASR or TTS it's hard to see what the API is. what if they are two different services. you have to do a bunch of checking flags.
olli: it depends on whether the parameters are the same for both cases.
bjorn: you also do totally different things with different services. there would need to be some kind of generic interface
michael: it would succeed or fail depending on what you asked it to do.
bjorn: it's better to specify two objects than having one giant object
<satish> (I got disconnected and will try calling in again)
bjorn: it's a syntactic issue
michael: it also depends on whether there are a lot of services that are one or another
bjorn: what parameters do you need to specify? URI, language, non-standard things like non-standard grammar format.
michael: other parameters?
michaelJ: grammar?
bjorn: this is querying for capabilities of the recognizer
... it would make sense for the grammar to be a parameter, for example if you had some specific grammars, like "support for a specific grammar like 'date'".
michael: that could be for the moral equivalent of the builtins
dan: we're touching on some issues that we've already decided on, so we shouldn't revisit decisions that we already made
bjorn: standard queries would be grammar, language, and vendor-specific, so it doesn't matter too much if we have one API or two
michael: you may want to give them to the recognizer, not get them back from the recognizer
danD: we talked about not wanting to disclose what the application wanted to do.
bjorn: should get a list of what grammars and languages the recognizer supports
michael: it should accept a list of grammars and languages as its criteria and you get an engine back
... should return failure if the service can't support all the languages, but in the case of languages you might want to know if the service supports a subset
bjorn: someone could pass in a list of all the languages in the world
olli: the user agent should be able to ask the user
danD: if i just ask what languages you support, how is that a privacy issue?
olli: if the service supports only Finnish and English, you could guess that i'm Finnish
<bringert> I got disconnected
michael: you could also use the API for the local device that always has the user's language on it.
... services don't necessarily have to be honest about their answers
glenn: this seems like a major limitation that we're putting on developers for privacy reasons.
bjorn: regardless, we should say "give me a service that supports XYZ", and it's ok for the service to say "no comment"
michael: we want to allow the user to customize the service
charles: web servers already get the locale
olli: getting supported languages is just another piece of data about the user
bjorn: most common use case is ASR and TTS for locale, so how about if we just get the locale language
olli: that might work
danD: so far, we should be able to provide the filter criteria for the grammar and the language, it should be optional, will get another version, we can discuss further
bjorn: we could say that the default locale language is supported, it's the additional languages that are supported that we have to think about
danD: will start a table of other attributes that should be available at initialization
... and will get an update
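
The query behavior discussed above (optional grammar and language filter criteria, failure when a service cannot support what was asked for, and a service being allowed to answer "no comment") could be sketched roughly as follows. This is only an illustration under stated assumptions: `ServiceDescription`, `QueryCriteria`, `supportsAll`, and `findService` are hypothetical names, not taken from any proposal in the group, and treating an undisclosed capability as "unknown" rather than as a failure is an assumption as well.

```typescript
// Hypothetical sketch: none of these names come from the group's drafts.
type ServiceKind = "asr" | "tts";

interface ServiceDescription {
  uri: string;
  kinds: ServiceKind[];
  // An undefined list models a service that declines to answer ("no comment").
  languages?: string[];
  grammars?: string[];
}

interface QueryCriteria {
  kind: ServiceKind;
  languages?: string[]; // optional filter
  grammars?: string[];  // optional filter, e.g. a builtin such as "date"
}

type Match = "yes" | "no" | "unknown";

// Compare what the caller wants with what the service discloses.
function supportsAll(offered: string[] | undefined, wanted: string[] | undefined): Match {
  if (wanted === undefined || wanted.length === 0) return "yes"; // no filter given
  if (offered === undefined) return "unknown";                   // service said "no comment"
  return wanted.every((w) => offered.indexOf(w) !== -1) ? "yes" : "no";
}

// Return the first service of the requested kind that does not fail a filter,
// or null when every candidate fails.
function findService(
  services: ServiceDescription[],
  criteria: QueryCriteria
): ServiceDescription | null {
  for (const s of services) {
    if (s.kinds.indexOf(criteria.kind) === -1) continue;
    if (supportsAll(s.languages, criteria.languages) === "no") continue;
    if (supportsAll(s.grammars, criteria.grammars) === "no") continue;
    return s;
  }
  return null;
}
```

In this sketch the caller passes its criteria in and never receives the full list of supported languages back, which is one way to read the privacy concern raised by olli.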
michael: now look at HTML bindings
bjorn: would like there to be an element that can be standalone or enclosed in other elements
... not sure about control element
... the important thing for me on the recognition element, it should be possible for the web app author to put it on a form
olli: how do you actually bind the value?
bjorn: the definition of a value for a form control is that it's always a string without formatting
... not so obvious for checkbox, it has to be defined for each type
... it's the kind of thing you put in the "value" attribute for non-text elements
... for textarea or content editable it's the text
olli: automatic binding in X+V was annoying
michael: the difference is the optionality, you don't have to do it. as for the microphone, the reco image is platform-specific, microphone, button, etc.
olli: the graphical presentation could be problematic
bjorn: each browser will have to decide what security model it wants to implement
michael: not sure about usefulness of the form, but the "for" does seem useful
bjorn: form is just a convenience
<burn> hey, sounds like bjorn wants voicexml :)
bjorn: should we look at label?
... the HTML label does what we want
... we want to do the same things that label does
olli: when will user give permission?
michael: each browser will be different
... some people want the button to appear on the screen without asking permission
bjorn: Google Voice search, for example, you don't want to have to prompt the user every time
olli: worried about when user will give permission
bjorn: easier in the CaptureAPI case if there's no markup
michael: you need to check for permission when you do the reco, not just to have a reco object
olli: if the user never wants speech, maybe the browser doesn't even render the microphone
bjorn: olli, are you still concerned about consistency of permission policy?
olli: my concerns are that the user agent needs permission before using the reco object
bjorn: is the CaptureAPI similar to the Javascript recognition API?
olli: you get similar data in CaptureAPI and reco
bjorn: you can get a "permission denied" error code, that's very similar to our API
michael: what doesn't work is that the permission check happens before the binding
danD: there are two steps, one the rendering of the object, and then the user decides to use that UI element, and that's a privacy and consent issue
... it makes more sense if it doesn't even prompt the user until it knows something is there
olli: a query to find out what kind of recognizer object is available is ok
bjorn: do you see a problem with the HTML API having a different method?
... i think browsers should implement permission after the user clicks the button
olli: what if user has already started speaking
bjorn: no permission could either cancel or not start recognition
michael: user should be able to revoke permission
bjorn: these things are up to the user agent, having the Javascript API and the button should make it possible to implement appropriate privacy and security
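
One way to picture the point made above, that permission is checked when recognition is started and not when the reco object is created, and that a refusal shows up as a "permission denied" error code, is the minimal sketch below. It assumes a hypothetical `SketchRecognizer` class, a hypothetical `askUser` callback standing in for whatever the user agent does, and a hypothetical `"permission-denied"` code string; none of these are from the group's API.

```typescript
// Hypothetical sketch: class, callback, and error code names are assumptions.
type PermissionDecision = "granted" | "denied";

class SketchRecognizer {
  started: boolean = false;
  private askUser: () => PermissionDecision;
  private onError: (code: string) => void;

  // Creating the object does not prompt the user.
  constructor(askUser: () => PermissionDecision, onError: (code: string) => void) {
    this.askUser = askUser;
    this.onError = onError;
  }

  // Permission is checked only when recognition is actually requested.
  start(): void {
    if (this.askUser() !== "granted") {
      this.onError("permission-denied"); // recognition is not started
      return;
    }
    this.started = true;
  }

  // The user can revoke permission later, which stops recognition.
  revoke(): void {
    if (this.started) {
      this.started = false;
      this.onError("permission-denied");
    }
  }
}
```

When and how `askUser` prompts is left to the user agent in this sketch, in line with bjorn's comment that each browser decides its own security model.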
michael: let's move on, because there are other topics
... do we agree that we don't need HTML bindings for TTS?
bjorn: don't have anything against it, but maybe a waste of time.
michael: we can leave it as it is for now.
let's start on bjorn's speech recognition events, similar to what i sent before the f2f
scribe: added timestamps, there are also a number of error codes that we need to agree on
... what about nomatch and noinput, are they errors or kinds of input?
michael: i think they're different types of result
... nomatch seems like a result, but noinput seems like a different kind of event
dan: we look at rejections
michael: if rejection was just below confidence you may want to look at that.
charles: noinput could be like a volume issue
michael: nospeech would not generate an nbest on our platform
dan: for us it would be the same way
glenn: why have multiple events instead of a single event that returns different parameters?
michael: i don't think you're typically doing the same thing with noinput vs. nomatch
charles: it's nice to have the engine decide if it's a nomatch
dan: sometimes the engine ends up with no answer, the vast majority of nomatch is confidence-based
glenn: should make sure that results returned are in as similar a format as possible
bjorn: what about nospeech?
dan: error to me means that something broke, not like a normal expected user situation
bjorn: the distinction between error and normal is not always clear
dan: true user interface behavior is not an error, "abort" would only be an error if you grouped together user-initiated abort and engine abort
bjorn: are permission problems or network problems errors?
michael: would not consider abort or noinput errors
glenn: I would tie them all into the same event, that would be simpler for the developer
michael: in the continuous case you don't care about noinput
dan: we won't resolve this in the remaining time.
michael: we can continue discussion on the list
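
The question was left open, but one of the options raised above (nomatch as a kind of result that still carries an n-best list, noinput as its own outcome with no n-best, and errors reserved for things that broke) could look roughly like the sketch below. All names (`Hypothesis`, `RecoOutcome`, `classify`, `handle`), the confidence threshold, and the rule that an empty n-best means noinput are assumptions made for illustration, not decisions of the group.

```typescript
// Hypothetical sketch: type names, the threshold rule, and codes are assumptions.
interface Hypothesis {
  text: string;
  confidence: number; // assumed to be in the range 0..1
}

type RecoOutcome =
  | { type: "result"; nbest: Hypothesis[] }
  | { type: "nomatch"; nbest: Hypothesis[] } // rejected, but hypotheses can still be inspected
  | { type: "noinput" }                      // nothing heard, so no n-best
  | { type: "error"; code: string };         // something broke (network, permission, ...)

// Confidence-based rejection; nbest is assumed to be sorted best first.
function classify(nbest: Hypothesis[], threshold: number): RecoOutcome {
  if (nbest.length === 0) return { type: "noinput" };
  if (nbest[0].confidence < threshold) return { type: "nomatch", nbest: nbest };
  return { type: "result", nbest: nbest };
}

// A single handler that branches on the outcome type, close to glenn's
// suggestion of one event carrying different parameters.
function handle(outcome: RecoOutcome): string {
  switch (outcome.type) {
    case "result":
      return "accept: " + outcome.nbest[0].text;
    case "nomatch":
      return "reprompt, best guess was: " + outcome.nbest[0].text;
    case "noinput":
      return "reprompt, nothing heard";
    case "error":
      return "report failure: " + outcome.code;
  }
}
```

Whether these outcomes arrive as one event or several separate events is exactly the point the group deferred to the list; the sketch only shows that the data carried by each case differs.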
Present: Dan_Burnett, Patrick_Ehlen, Michael_Johnston, Olli_Pettay, Michael_Bodell, Dan_Druta, Debbie_Dahl, Charles_Hemphill, Glen_Shires, Bjorn_Bringert, Satish_Sampath
Regrets: Raj_Tumuluri
Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0078.html
Date: 30 Jun 2011
Minutes: http://www.w3.org/2011/06/30-htmlspeech-minutes.html