See also: IRC log
<burn_> trackbot, start telcon
<trackbot> Date: 02 June 2011
<Robert> can you hear me?
<burn_> Scribe: Michael_Johnston
<burn_> ScribeNick: Michael
<burn_> Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0006.html
burn: start with review of face to face minutes, will review again next week
... comments on minutes
burn: comments on draft at this point?
all: silence
michael: does the move to have the emma document in the dom remove the impetus for a json variant of emma?
bjorn: have simple javascript api for accessing most common elements, don't need json variant of emma, for details can access emma object
milan: need to do xml parsing?
bodell: will be much the same as other http requests that return xml, don't need to parse
milan: are mobile devices a problem, given verbosity of xml?
bodell: no
dan: is there pressure from this group to build a json version of emma?
all: agreement: no push for json version of emma
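As a rough illustration of the point about a simple API sitting on top of the full EMMA document, the sketch below pulls the most common fields out of an EMMA result and leaves the rest in the XML. The sample document, the function name common_fields and the returned keys are assumptions made for this example, not anything the group defined.

```python
import xml.etree.ElementTree as ET

# EMMA 1.0 namespace, used for both elements and annotation attributes.
EMMA_NS = "http://www.w3.org/2003/04/emma"

# Hypothetical recognition result, invented for illustration only.
SAMPLE = """<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:confidence="0.82"
                       emma:tokens="flights to boston">
    <destination>Boston</destination>
  </emma:interpretation>
</emma:emma>"""


def common_fields(emma_xml):
    """Return the utterance and confidence of the first interpretation.

    Anything beyond these two fields stays in the EMMA document, which the
    application can still inspect when it needs the details.
    """
    root = ET.fromstring(emma_xml)
    interp = root.find(f".//{{{EMMA_NS}}}interpretation")
    if interp is None:
        return None
    return {
        "utterance": interp.get(f"{{{EMMA_NS}}}tokens"),
        "confidence": float(interp.get(f"{{{EMMA_NS}}}confidence", "0")),
    }
```

Calling common_fields(SAMPLE) gives the two convenience values; the application-specific destination element is only reachable through the XML itself.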
burn: any other issues to add to list for discussion
bjorn: no feedback from chrome team yet
bodell: keep html binding lightweight, js constructor, simple "for" mechanism, small work to define, if don't want then remove the element
... should not mess up js api
olli: problem with for attribute is what it can point to, what elements can be used as target, doesn't quite work with content editable, important use case
... clarifies issue, need to make clear which elements can be targets and what the semantics are
... also content editable areas
michael: have to define semantics when target is e.g. a drop down or radio button
olli: may be new kinds of elements also
bodell: assumption would be to bind to any element, but they would not all have to work
... some browsers would want to handle more input types
olli: reco would be element in the dom, what is the benefit of the reco
... if for is not used
bodell: google desire to have element with microphone click api
bjorn: have proposed several things along the way
... most important aspect is to have an element you can click to start speaking without the pop up or info bar
olli: not clear what the element gives
robert: follow up with chrome folks
bjorn: still waiting on that
... do agree that html element discussion does not block the js api discussion
olli: issue may get solved along the way
burn: need to see concrete proposal to make decision
<marc> http://www.w3.org/2005/Incubator/htmlspeech/2011/05/f2fminutes201105.html
burn: will go through each ...
bjorn: audio capture topic is dealt with, should be default way, if there is an audio capture api will deal with then
burn: audio codecs mandatory
robert: even IP status around speex is unclear also
... are only reasonable answers pcm and mulaw, despite their flaws
bjorn: flac, high bandwidth
<bringert> FLAC
<bringert> http://en.wikipedia.org/wiki/Free_Lossless_Audio_Codec
milan: speex is in ietf draft on how to package in rtp
<Milan> http://tools.ietf.org/html/draft-ietf-avt-rtp-speex-07
burn: rtp way to send it does not mean there are no ip issues
bjorn: need to require some codecs or can't be interoperable
milan: problem sounds similar
bodell: rfc does not have a patent policy
burn: if something is necessary to implement the spec, and it is encumbered with IP, need to make that clear
bjorn: need protocol for interoperability
milan: protocol for RTC
burn: opus, codecs from two organizations, trying to blend, not clear if IP issues are being resolved, making container
... can use either one if you have permission
... don't have an answer yet, really need one, industry wide problem, may not be ours to solve, return to this
<mbodell> See http://lists.xiph.org/pipermail/speex-dev/2003-November/000753.html for some similar discussion on patent of speex
robert: will follow up re: speex again
milan: impact on protocol team if need to negotiate codec
marc: speex is not good enough for tts
<marc> ogg vorbis
<marc> and FLAC
burn: few names as candidates flac, ogg vorbis, speex, pcm
bjorn: already use flac in launched clients
olli: ogg vorbis is core html audio
<smaug> not core HTML audio. Some browsers just happen to support it
burn: candidates to consider flac, ogg vorbis, speex, pcm
burn: think we expect streaming, less clarity on how
milan: sending audio on regular time intervals as it is collected or generated
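To make "sending audio on regular time intervals" concrete, here is a minimal sketch that slices raw PCM into fixed-duration frames as it becomes available. The function name frames, the 20 ms frame length and the 16 kHz, 16-bit mono format are assumptions chosen for the example; the group did not fix any of these values.

```python
def frames(pcm, sample_rate=16000, sample_width=2, frame_ms=20):
    """Yield consecutive chunks of raw PCM, each covering frame_ms of audio.

    The final chunk may be shorter if the audio does not fill a whole frame.
    """
    frame_bytes = sample_rate * sample_width * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]
```

With these defaults one second of audio becomes 50 frames of 640 bytes, so a sender that transmits each frame as soon as it is captured gives the receiver something to act on every 20 ms.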
bjorn: discussed how to get events while capturing
... how it is done is a protocol question
burn: asr may begin before the user is finished speaking, result before engine comes
milan: without regular timed packets, won't get events on regular interval
bjorn: latency is what is app observable
bodell: having multiple events is not a big problem
... data in events can deal with timing
milan: if app is realtime, five seconds ago got this event
bjorn: agree, what we need is low latency, not sure what we can require, part of being a good implementation
burn: market takes care of product requirements
robert: fair to say that standard should not have inherent limitations
... 50 ms or so is the threshold
bjorn: protocol design should not make it impossible to achieve low latency event delivery
marc: audio streaming in the tts case?
... send audio while still rendering rest of a long utterance
bodell: tts is generally fast enough that this is not a problem
marc: if tts has to process all text before returning audio, could be a problem
... wants to make sure that what we create here does not prevent an implementation doing this
bjorn: up to engine whether it starts to synthesize
marc: wav format, header has filesize, makes proper streaming difficult
bjorn: protocol should make it possible for the tts to be streamed and start playing before synthesis is complete
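The concern about the wav header can be seen from the layout of a canonical 44-byte PCM header, which stores the payload length in two places before any audio is sent. The sketch below builds such a header; the function name wav_header and the default audio format are assumptions for illustration. Writing a placeholder size when the length is unknown is a workaround some tools use, and whether a given player accepts it is not guaranteed.

```python
import struct

# Placeholder used when the payload length is not known in advance (assumption:
# the receiver tolerates it; this is a convention, not part of any agreed protocol).
UNKNOWN_SIZE = 0xFFFFFFFF


def wav_header(data_size, sample_rate=16000, channels=1, bits=16):
    """Build a canonical 44-byte RIFF/WAVE header for uncompressed PCM."""
    block_align = channels * bits // 8
    byte_rate = sample_rate * block_align
    # The RIFF chunk size covers everything after the first 8 bytes.
    riff_size = min(36 + data_size, UNKNOWN_SIZE)
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", riff_size, b"WAVE",
        b"fmt ", 16, 1, channels, sample_rate, byte_rate, block_align, bits,
        b"data", data_size,
    )
```

A synthesizer that wants to start sending audio before the utterance is fully rendered cannot fill in data_size truthfully, which is why a headerless or chunk-oriented format is easier to stream.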
burn: issue of supporting format coming back in video and playing the audio
bjorn: should not require playing audio from video
robert: api should not prevent this
burn: video with three audio tracks, how does the api select?
robert: our proposal separated capture api from reco, could support different kinds of capture
burn: protocol design should not preclude streaming of video codecs
raj: why specify video?
robert: if codec can be packetized in real time should be ok
burn: the protocol should not inhibit the transmission of codecs that have similar requirements to audio?
bodell: issue of latency impacting times
bjorn: agreed UA being basis for the clock
burn: don't have requirements for timing info from server
bodell: tts case?
bjorn: seems reasonable for server to include timing info
robert: could do offset from start
burn: something that UA can convert into UA local timestamp
... different ways to achieve that
... doesn't say what is made available in the api
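One of the "different ways" mentioned here is the offset-from-start idea: the server reports how far into the audio stream an event occurred, and the UA adds that to its own clock reading for the start of the stream. The sketch below assumes millisecond units and uses hypothetical names (ServerEvent, to_ua_timestamp); none of this was agreed as the actual design.

```python
from dataclasses import dataclass


@dataclass
class ServerEvent:
    """Event as reported by a speech service (hypothetical shape)."""
    name: str
    offset_ms: int  # time since the first audio sample, on the stream's own timeline


def to_ua_timestamp(ua_stream_start_ms, event):
    """Convert a server-side offset into a timestamp on the UA's local clock.

    ua_stream_start_ms is the UA clock reading taken when the first audio
    sample was captured, so no clock synchronisation with the server is needed.
    """
    return ua_stream_start_ms + event.offset_ms
```

Because the offset is relative to the audio itself, network latency changes when the event arrives but not the timestamp the UA computes for it.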
bodell: many different times, when the utterance starts, when received, etc.
marc: impact on order that events are received
milan: will UA generate these events when using remote service
bodell: may assume energy detector gives you end of speech, before reco gives end of speech, hard to guarantee order
milan: start of energy is different than start of speech
... hard to write web app if get two start of speech events
bodell: different events
... was fixed order for the non continuous case
charles: could arrange fixed order delivery, even if times inside do not reflect this
bodell: not practical to hold events and put them in the desired order
burn: energy detector gets end of sound, then will get actual end of speech with better timing info, either get two or throw away better info
marc: don't want to override better info from remote service
burn: front is for optimization so don't have to send all the audio
bodell: events could be in different orders
... not convinced in having standard order
milan: UA only have sound start, sound end
... avoid duplication
bodell: already have different event names
robert: in name need to make clear some events are from energy detector, others are from speech reco
milan: source of events
bodell: unmake statement about specific ordering
milan: new statement that the events the user agent can insert are energy related events
marc: and probably capture start and end
charles: seems strong since speech service might or might not be remote
burn: removed ordering
... energy detector can only generate sound start stop
burn: speech service can only deliver the speech start stop
charles: if not order can be guarantee delivery
burn: how to guarantee it?
milan: as long as have single source for events
michael: (need a blackboard for this)
bodell: solved by removing required ordering
... allows all the use cases
... also works with continuous case
... thought had solved the issue
burn: but start before end?
... can get end without having seen a start
milan: reluctant to give up the ordering, if have single source for each type of event
burn: agreed speech service can only generate one, can't guarantee that they won't cross in time
milan: use remote speech service as the canonical
bodell: easiest to understand cross for end, UA would raise both events in the order they occurred
milan: it is possible to impose an ordering
... pros and cons, flexibility, or predictability for the web app developer
bjorn: events from the same source should be in the same order
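The weaker guarantee discussed at the end (order preserved per source, no fixed order across sources) can be stated as a small check. In the sketch below each delivered event carries its source and a per-source sequence number; the tuple layout, the source labels and the function name are assumptions made for this example only.

```python
from collections import defaultdict


def per_source_order_preserved(delivered):
    """Return True if events from each source arrive in increasing sequence order.

    delivered is a list of (source, sequence_number, event_name) tuples in the
    order the web app received them. Events from different sources may
    interleave freely.
    """
    last_seq = defaultdict(lambda: -1)
    for source, seq, _name in delivered:
        if seq <= last_seq[source]:
            return False
        last_seq[source] = seq
    return True


# The UA's energy detector and the remote service "cross in time":
# sound end is delivered after speech end, yet each source stays in order.
crossing = [
    ("ua", 0, "soundstart"),
    ("service", 0, "speechstart"),
    ("service", 1, "speechend"),
    ("ua", 1, "soundend"),
]
```

Under this rule the crossing sequence is acceptable, whereas a sequence in which the service's speech end preceded its own speech start would not be.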
Present: Dan_Burnett, Milan_Young, Marc_Schroeder, Robert_Brown, Patrick_Ehlen, Charles_Hemphill, Satish_Sampath, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell, Bjorn_Bringert, Dan_Druta, Debbie_Dahl, Raj_Tumuluri