15:56:51 RRSAgent has joined #htmlspeech
15:56:51 logging to http://www.w3.org/2011/06/02-htmlspeech-irc
15:56:53 +Michael_Bodell
15:56:57 -Robert_Brown
15:57:09 trackbot, start telcon
15:57:11 Zakim, nick smaug is Olli_Pettay
15:57:11 ok, smaug, I now associate you with Olli_Pettay
15:57:11 RRSAgent, make logs public
15:57:13 Zakim, this will be
15:57:13 I don't understand 'this will be', trackbot
15:57:14 Meeting: HTML Speech Incubator Group Teleconference
15:57:14 Date: 02 June 2011
15:57:25 Chair: Dan Burnett
15:57:34 Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0006.html
15:57:37 mbodell has joined #htmlspeech
15:58:00 zakim, nick mbodell is Michael_Bodell
15:58:00 ok, burn_, I now associate mbodell with Michael_Bodell
15:58:29 zakim, who's here?
15:58:29 On the phone I see Dan_Burnett, Marc_Schroeder, Milan_Young, Charles_Hemphill, Satish_Sampath, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell
15:58:32 On IRC I see mbodell, RRSAgent, bringert, smaug, rvid, Michael, Robert, Charles, Milan, marc, Zakim, burn_, trackbot
15:58:58 +Ronald
15:59:26 zakim, Ronald is Bjorn_Bringert
15:59:26 +Bjorn_Bringert; got it
15:59:44 +Dan_Druta
16:00:05 DanD has joined #htmlspeech
16:00:12 +[Microsoft]
16:00:18 +??P56
16:00:19 zakim, nick DanD is Dan_Druta
16:00:19 ok, burn_, I now associate DanD with Dan_Druta
16:00:42 Raj has joined #htmlspeech
16:00:52 can you hear me?
16:01:10 -[Microsoft]
16:01:16 +Debbie_Dahl
16:01:24 zakim, ??P56 is Raj_Tumuluri
16:01:24 +Raj_Tumuluri; got it
16:01:38 ddahl has joined #htmlspeech
16:01:44 -Debbie_Dahl
16:01:52 glen has joined #htmlspeech
16:02:00 +Patrick_Ehlen
16:02:17 +Debbie_Dahl
16:02:18 zakim, nick glen is Glen_Shires
16:02:18 ok, burn_, I now associate glen with Glen_Shires
16:02:22 +[Microsoft]
16:02:41 zakim, [Microsoft] is Robert_Brown
16:02:42 +Robert_Brown; got it
16:05:05 Scribe: Michael_Johnston
16:05:12 ScribeNick: Michael
16:06:34 burn: start with review of face to face minutes, will review again next week
16:06:41 burn: comments on minutes
16:07:05 topic: updated final report document
16:08:11 burn: comments on draft at this point?
16:08:18 all: silence
16:08:36 topic: agreed upon design decisions
16:08:49 topic: additional issues to add to list of issues
16:09:55 -Satish_Sampath
16:10:53 michael: does the move to have the emma document in the dom remove the impetus for a json variant of emma?
16:12:14 bjorn: have simple javascript api for accessing most common elements, don't need json variant of emma, for details can access emma object
16:12:24 milan: need to do xml parsing?
16:13:04 bodell: will be much the same as other http requests that return xml, don't need to parse
16:13:33 milan: are mobile devices a problem, verbosity of xml?
16:13:43 bodell: no
16:14:24 dan: is there pressure from this group to build a json version of emma?
16:14:57 all: agreement: no push for json version of emma
16:15:13 burn: any other issues to add to list for discussion
16:15:37 topic: markup binding
16:16:01 bjorn: no feedback from chrome team yet
16:16:59 bodell: keep html binding lightweight, js constructor, simple "for" mechanism, small work to define, if don't want then remove the element
16:17:18 bodell: should not mess up js api
16:17:58 olli: problem with "for" attribute is what it can point to, what elements can be used as target, doesn't quite work with content editable, important use case
16:18:39 olli: clarifies issue, need to make clear which elements can be targets and what the semantics is
16:18:58 olli: also content editable areas
16:19:58 michael: have to define semantics when target is e.g. a drop down or radio button
16:20:09 olli: may be new kinds of elements also
16:20:38 bodell: assumption would be to bind to any element, but they would not all have to work,
16:20:55 bodell: some browsers would want to handle more input types
16:21:21 olli: reco would be element in the dom, what is the benefit of the reco
16:21:32 olli: if for is not used
16:21:49 bodell: google desire to have element with microphone click api
16:22:22 bjorn: have proposed several things along the way,
16:22:48 bjorn: most important aspect is to have an element you can click to start speaking without the pop up or info bar
16:23:20 olli: not clear what the element gives
16:23:47 robert: follow up with chrome folks
16:23:55 bjorn: still waiting on that
16:24:21 bjorn: do agree that html element discussion does not block the js api discussion
16:24:32 olli: issue may get solved along the way
16:25:06 burn: need to see concrete proposal to make decision
16:25:43 topic: crucial decisions partially discussed
16:26:16 http://www.w3.org/2005/Incubator/htmlspeech/2011/05/f2fminutes201105.html
16:27:17 burn: will go through each ...
16:28:21 bjorn: audio capture topic is dealt with, should be default way, if there is an audio capture api will deal with then
16:28:35 burn: audio codecs mandatory
16:28:50 robert: even IP status around speex is unclear also
16:29:15 robert: are only reasonable answers pcm and mulaw, despite their flaws
16:29:24 bjorn: flac, high bandwidth
16:29:27 FLAC
16:29:38 http://en.wikipedia.org/wiki/Free_Lossless_Audio_Codec
16:29:50 milan: speex is in ietf draft on how to package in rtp
16:29:57 http://tools.ietf.org/html/draft-ietf-avt-rtp-speex-07
16:30:25 burn: rtp way to send it does not mean there are ip issues
16:30:43 bjorn: need to require some codecs or can't be interoperable
16:30:57 milan: problem sounds similar
16:31:11 bodell: rfc does not have a patent policy
16:32:51 burn: if something is necessary to implement the spec, and it is encumbered with IP, need to make that clear
16:33:19 bjorn: need protocol for interoperability
16:33:27 milan: protocol for RTC
16:33:53 burn: opus, codecs from two organizations, trying to blend, not clear if IP issues are being resolved, making container
16:34:05 burn: can use either one if you have permission
16:35:03 burn: don't have an answer yet, really need one, industry wide problem, may not be ours to solve, return to this
16:35:40 See http://lists.xiph.org/pipermail/speex-dev/2003-November/000753.html for some similar discussion on patent of speex
16:35:44 robert: will follow up re: speex again
16:37:45 milan: impact on protocol team if need to negotiate codec
16:37:57 marc: speex is not good enough for tts
16:37:59 ogg vorbis
16:38:12 and FLAC
16:38:43 burn: few names as candidates flac, ogg vorbis, speex, pcm
16:38:59 bjorn: already use flac in launched clients
16:39:15 olli: ogg vorbis is core html audio
16:39:49 not core HTML audio. Some browsers just happen to support it
16:39:58 burn: candidates to consider flac, ogg vorbis, speex, pcm
16:40:54 topic: do we support audio streaming and how?
16:41:27 burn: think we expect streaming, less clarity on how
16:41:46 milan: sending audio on regular time intervals as it is collected or generated
16:42:48 bjorn: discussed how to get events while capturing
16:42:59 bjorn: how it is done is a protocol question
16:43:16 burn: asr may begin before the user is
16:43:55 burn: finished speaking, result before engine comes
16:44:43 milan: without regular timed packets, won't get events on regular interval
16:44:54 bjorn: latency is what is app observable
16:45:06 bodell: having multiple events is not a big problem
16:45:13 bodell: data in events can deal with timing
16:45:36 milan: if app is realtime, five seconds ago got this event
16:46:13 bjorn: agree, what we need is low latency, not sure what we can require, part of being a good implementation
16:46:24 burn: market takes care of product requirements
16:46:47 robert: fair to say that standard should not have inherent limitations
16:46:57 robert: 50 ms or so is the threshold
16:47:23 bjorn: protocol design should not make it impossible to achieve low latency event delivery
16:47:47 marc: audio streaming in the tts case?
16:48:05 marc: send audio while still rendering rest of a long utterance
16:48:32 bodell: tts is generally fast enough that this is not a problem
16:49:54 marc: if tts has to process all text before returning audio, could be a problem,
16:50:30 marc: wants to make sure that what we create here does not prevent an implementation doing this
16:51:03 bjorn: up to engine whether it starts to synthesize
16:51:20 marc: wav format, header has filesize, makes proper streaming difficult
16:51:48 bjorn: protocol should make it possible for the tts to be streamed and start playing before
16:52:03 bjorn: synthesis is complete
16:53:22 burn: issue of supporting format coming back in video and
16:53:37 burn: playing the audio
16:54:03 bjorn: should not require playing audio from video
16:54:15 robert: api should not prevent this
16:54:50 burn: video with three audio tracks, how does the api select
16:55:15 robert: our proposal separated capture api from reco, could support different kinds of capture
16:55:35 burn: protocol design should not preclude streaming of video codecs
16:56:22 raj: why specify video?
16:56:55 robert: if codec can be packetized in real time should be ok
16:58:51 burn: the protocol should not inhibit the transmission of codecs that have similar requirements to audio?
16:59:55 topic: What is meant by "start of speech", "end of speech", and endpointing in general? How do transmission delays affect the definitions and what we want in terms of APIs?
17:01:13 bodell: issue of latency impacting times
17:01:25 bjorn: agreed UA being basis for the clock
17:02:11 burn: don't have requirements for timing info from server
17:02:23 bodell: tts case?
17:02:45 bjorn: seems reasonable for server to include timing info
17:02:57 robert: could do offset from start
17:03:16 burn: something that UA can convert into UA local timestamp
17:03:25 burn: different ways to achieve that
17:03:50 burn: doesn't say what is made available in the api
17:05:39 bodell: many different times, when the utterance starts etc
17:05:49 bodell: when received,
17:06:12 marc: impact on order that events are received
17:06:30 milan: will UA generate these events when using remote service
17:06:58 bodell: may assume energy detector gives you end of speech, before reco gives end of speech, hard to guarantee order
17:07:10 milan: start of energy is different than start of speech
17:07:37 milan: hard to write web app if get two start of speech events
17:08:26 bodell: different events,
17:08:38 bodell: was fixed order for the non continuous case
17:09:41 charles: could arrange fixed order delivery, even if times inside do not reflect this
17:09:59 bodell: not practical to hold events and put them in the desired order
17:11:25 burn: energy detector gets end of sound, then will get actual end of speech with better timing info, either get two or throw away better info
17:11:39 marc: don't want to override better info from remote service
17:12:11 burn: front is for optimization so don't have to send all the audio
17:13:06 bodell: events could be in different orders
17:13:18 bodell: not convinced in having standard order
17:13:35 milan: UA only have sound start, sound end
17:14:25 milan: avoid duplication,
17:14:39 bodell: already have different event names
17:15:17 robert: in name need to make clear some events are from energy
17:15:34 robert: detector others are from speech reco
17:15:44 milan: source of events
17:16:58 bodell: unmake statement about specific ordering
17:17:15 milan: new statement that what the user agent can insert are energy-related events
17:17:31 marc: and probably capture start and end
17:17:50 charles: seems strong since speech service might or might not be remote
17:19:00 burn: removed ordering
17:19:18 burn: energy detector can only generate sound start stop
17:19:44 burn: speech service can only deliver the speech start stop
17:20:21 charles: if not order, can delivery be guaranteed?
17:20:27 burn: how to guarantee it
17:21:58 milan: as long as have single source for events
17:22:08 michael: (need a blackboard for this)
17:22:34 bodell: solved by removing required ordering
17:22:50 bodell: allows all the use cases
17:23:29 bodell: also works with continuous case
17:23:44 bodell: thought had solved the issue
17:23:53 burn: but start before end?
17:24:07 burn: can get end without having seen a start
17:24:41 milan: reluctant to give up the ordering, if have single source for each type of event
17:25:33 burn: agreed speech service can only generate one, can't guarantee that they won't cross in time
17:25:49 milan: use remote speech service as the canonical
17:26:47 bodell: easiest to understand cross for end, UA would raise both events in the order they occurred
17:27:08 milan: it is possible to impose an ordering
17:27:46 milan: pros and cons, flexibility, or predictability for the web app developer
17:29:38 bjorn: events from the same source should be in the same order
17:31:19 -Michael_Bodell
17:31:21 -Raj_Tumuluri
17:31:21 -Dan_Druta
17:31:22 -Milan_Young
17:31:22 -Marc_Schroeder
17:31:23 -Bjorn_Bringert
17:31:23 -Patrick_Ehlen
17:31:24 -Debbie_Dahl
17:31:26 -Olli_Pettay
17:31:31 -Robert_Brown
17:31:36 rrsagent, make log public
17:31:38 -Michael_Johnston
17:31:39 -Charles_Hemphill
17:31:40 rrsagent, draft minutes
17:31:40 I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
17:32:52 -Dan_Burnett
17:32:59 zakim, bye
17:32:59 leaving. As of this point the attendees were Dan_Burnett, Milan_Young, Marc_Schroeder, Robert_Brown, Patrick_Ehlen, Charles_Hemphill, Satish_Sampath, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell, Bjorn_Bringert, Dan_Druta, Debbie_Dahl, Raj_Tumuluri
17:32:59 Zakim has left #htmlspeech
17:39:14 rrsagent, draft minutes
17:39:14 I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
17:58:30 ddahl has left #htmlspeech
19:26:04 smaug has joined #htmlspeech
21:08:37 smaug has joined #htmlspeech