15:56:51 RRSAgent has joined #htmlspeech
15:56:51 logging to http://www.w3.org/2011/06/02-htmlspeech-irc
15:56:53 +Michael_Bodell
15:56:57 -Robert_Brown
15:57:09 trackbot, start telcon
15:57:11 Zakim, nick smaug is Olli_Pettay
15:57:11 ok, smaug, I now associate you with Olli_Pettay
15:57:11 RRSAgent, make logs public
15:57:13 Zakim, this will be
15:57:13 I don't understand 'this will be', trackbot
15:57:14 Meeting: HTML Speech Incubator Group Teleconference
15:57:14 Date: 02 June 2011
15:57:25 Chair: Dan Burnett
15:57:34 Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0006.html
15:57:37 mbodell has joined #htmlspeech
15:58:00 zakim, nick mbodell is Michael_Bodell
15:58:00 ok, burn_, I now associate mbodell with Michael_Bodell
15:58:29 zakim, who's here?
15:58:29 On the phone I see Dan_Burnett, Marc_Schroeder, Milan_Young, Charles_Hemphill, Satish_Sampath, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell
15:58:32 On IRC I see mbodell, RRSAgent, bringert, smaug, rvid, Michael, Robert, Charles, Milan, marc, Zakim, burn_, trackbot
15:58:58 +Ronald
15:59:26 zakim, Ronald is Bjorn_Bringert
15:59:26 +Bjorn_Bringert; got it
15:59:44 +Dan_Druta
16:00:05 DanD has joined #htmlspeech
16:00:12 +[Microsoft]
16:00:18 +??P56
16:00:19 zakim, nick DanD is Dan_Druta
16:00:19 ok, burn_, I now associate DanD with Dan_Druta
16:00:42 Raj has joined #htmlspeech
16:00:52 can you hear me?
16:01:10 -[Microsoft]
16:01:16 +Debbie_Dahl
16:01:24 zakim, ??P56 is Raj_Tumuluri
16:01:24 +Raj_Tumuluri; got it
16:01:38 ddahl has joined #htmlspeech
16:01:44 -Debbie_Dahl
16:01:52 glen has joined #htmlspeech
16:02:00 +Patrick_Ehlen
16:02:17 +Debbie_Dahl
16:02:18 zakim, nick glen is Glen_Shires
16:02:18 ok, burn_, I now associate glen with Glen_Shires
16:02:22 +[Microsoft]
16:02:41 zakim, [Microsoft] is Robert_Brown
16:02:42 +Robert_Brown; got it
16:05:05 Scribe: Michael_Johnston
16:05:12 ScribeNick: Michael
16:05:35 Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0006.html
16:06:34 burn: start with review of face to face minutes, will review again next week
16:06:41 burn: comments on minutes
16:07:05 topic: updated final report document
16:08:11 burn: comments on draft at this point?
16:08:18 all: silence
16:08:36 topic: agreed upon design decisions
16:08:49 topic: additional issues to add to list of issues
16:09:55 -Satish_Sampath
16:10:53 michael: does move to have emma document in dom, remove impetus for json variant of emma
16:12:14 bjorn: have simple javascript api for accessing most common elements, dont need json variant of emma, for details can access emma object
16:12:24 milan: need to do xml parsing?
16:13:04 bodell: will be much the same as other http requests that return xml, dont need to parse
16:13:33 milan: are mobile devices a problem, verbosity of xml
16:13:43 bodell: know
16:13:51 s/know/no
16:14:24 dan: is there pressure from this group to build a json version of emma?
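The argument that a DOM-accessible EMMA document removes the need for a JSON variant of EMMA can be illustrated with a short sketch. It uses Python's standard `xml.etree.ElementTree` as a stand-in for a browser DOM; the sample document, its `destination` payload and the `top_results` helper are hypothetical, and only the EMMA namespace and the `emma:confidence`, `emma:tokens`, `emma:one-of` and `emma:interpretation` names come from the EMMA 1.0 specification.

```python
import xml.etree.ElementTree as ET

# EMMA 1.0 namespace URI.
EMMA_NS = "http://www.w3.org/2003/04/emma"

# Hypothetical recognition result for illustration; real results come from the speech service.
SAMPLE_EMMA = """<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="r1">
    <emma:interpretation id="int1" emma:confidence="0.75" emma:tokens="flights to boston">
      <destination>Boston</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.20" emma:tokens="flights to austin">
      <destination>Austin</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>"""


def top_results(emma_xml: str) -> list[tuple[str, float, str]]:
    """Return (tokens, confidence, destination) for each emma:interpretation, best first.

    `destination` is the application-specific semantic payload of this hypothetical sample.
    """
    root = ET.fromstring(emma_xml)
    results = []
    for interp in root.iter(f"{{{EMMA_NS}}}interpretation"):
        tokens = interp.get(f"{{{EMMA_NS}}}tokens", "")
        # Treating a missing confidence as 0.0 is a choice made for this sketch only.
        confidence = float(interp.get(f"{{{EMMA_NS}}}confidence", "0.0"))
        destination = interp.findtext("destination", default="")
        results.append((tokens, confidence, destination))
    results.sort(key=lambda r: r[1], reverse=True)
    return results
```

In a browser the UA would hand the page an already-parsed document, as with other HTTP requests that return XML, so the explicit `ET.fromstring` call stands in for work the UA does; a simple convenience API could then expose just the first tuple.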
16:14:57 all: agreement: no push for json version of emma
16:15:13 burn: any other issues to add to list for discussion
16:15:37 topic: markup binding
16:16:01 bjorn: no feedback from chrome team yet
16:16:59 bodell: keep html binding lightweight, js constructor, simple for mechanism, small work to define, if dont want then remove the element
16:17:18 bodell: should not mess up js api
16:17:43 s/simple for /simple "for" /
16:17:58 olli: problem with for attribute is what it can point to, what elements can be used as target, doesnt quite work with content editable, important use case
16:18:39 olli: clarifies issue, need to make clear which elements can be targets and what the semantics is
16:18:58 olli: also content editable areas
16:19:58 michael: have to define semantics when target is e.g. a drop down or radio button
16:20:09 olli: may be new kinds of elements also
16:20:38 bodell: assumption would be to bind to any element, but they would not all have to work,
16:20:55 bodell; some browsers would want to handle more input types
16:21:21 olli: reco would be element in the dom, what is the benefit of the reco
16:21:32 olli: if for is not used
16:21:49 bodell: google desire to have element with microphone click api
16:22:22 bjorn: have proposed several things along the way,
16:22:48 bjorn; most important aspect is to have an element you can click to start speaking without the pop up or info bar
16:22:57 s/bjorn;/bjorn:
16:23:20 olli: not clear what the element gives
16:23:47 robert: follow up with chrome folks
16:23:55 bjorn: still waiting on that
16:24:21 bjorn: do agree that html element discussion does not block the js api discussion
16:24:32 olli: issue may get solved along the way
16:25:06 burn: need to see concrete proposal to make decision
16:25:43 topic: crucial decisions partially discussed
16:26:16 http://www.w3.org/2005/Incubator/htmlspeech/2011/05/f2fminutes201105.html
16:27:17 burn: will go through each ...
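The open question about the `for` attribute, namely which elements may be targets and what filling each one means (text fields, drop-downs, radio buttons, content editable areas), can be made concrete with a hypothetical dispatch function. The `Element` model, the element kinds and the per-kind rules below are illustrative assumptions, not semantics the group agreed on; they reflect the suggestion that binding could be allowed to any element while only some kinds actually work.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical, minimal stand-in for a DOM element that a "for" attribute might point at."""
    kind: str                                         # e.g. "input-text", "select", "contenteditable"
    value: str = ""
    options: list[str] = field(default_factory=list)  # labels, for choice-style elements


def apply_result(target: Element, text: str) -> bool:
    """Fill `target` with recognized `text`; return False if nothing was filled.

    The per-kind rules are illustrative assumptions, not agreed semantics.
    """
    if target.kind in ("input-text", "textarea"):
        target.value = text                # replace the field's current value
        return True
    if target.kind == "contenteditable":
        target.value += text               # append, as if the caret were at the end
        return True
    if target.kind in ("select", "radio-group"):
        # Choose the option whose label matches the utterance, ignoring case.
        for option in target.options:
            if option.lower() == text.strip().lower():
                target.value = option
                return True
        return False                       # no matching option: leave the element unchanged
    return False                           # binding allowed, but this kind is not handled
```

Returning `False` for unhandled kinds is one way to express "bind to any element, but not all have to work"; a browser that wanted to support more input types would simply add branches.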
16:28:21 bjorn: audio capture topic is dealt with, should be default way, if there is an audio capture api will deal with then
16:28:35 burn: audio codecs mandatory
16:28:50 robert: even IP status around speex is unclear also
16:29:15 robert: are only reasonable answers pcm and mulaw, despite their flaws
16:29:24 bjorn: flac, high bandwidth
16:29:27 FLAC
16:29:38 http://en.wikipedia.org/wiki/Free_Lossless_Audio_Codec
16:29:50 milan: speex is in ietf draft on how to package in rtp
16:29:57 http://tools.ietf.org/html/draft-ietf-avt-rtp-speex-07
16:30:25 burn: rtp way to send it does not mean there are ip issues
16:30:43 bjorn: need to require some codecs or cant be interoperable
16:30:57 milan: problems sounds similar
16:31:11 bodell; rfc does not have a patent policy
16:31:24 s/bodell;/bodell:/
16:32:51 burn: if something is necessary to implement the spec, and it is encumbered with IP, need to make that clear
16:33:19 bjorn: need protocol for interoperability
16:33:27 milan: protocol for RTC
16:33:53 burn: opus, codecs from two organizations, trying to blend, not clear if IP issues are being resolved, making container
16:34:05 burn: can use either one if you have permission
16:35:03 burn: dont have an answer yet, really need one, industry wide problem, may not be ours to solve, return to this
16:35:40 See http://lists.xiph.org/pipermail/speex-dev/2003-November/000753.html for some similar discussion on patent of speex
16:35:44 robert: will follow up re: speex again
16:37:45 milan: impact on protocol team if need to negotiate codec
16:37:57 mark: speex is not good enough for tts
16:37:59 ogg vorbis
16:38:08 s/mark/marc/
16:38:12 and FLAC
16:38:43 burn: few names as candidates flac, ogg vorbis, speex, pcm
16:38:59 bjorn: already use flac in launched clients
16:39:15 olli: ogg vorbis is core html audio
16:39:49 not core HTML audio. Some browsers just happen to support it
16:39:58 burn: candidates to consider flac, ogg vorbis, speex, pcm
16:40:54 topic: do we support audio streaming and how?
16:41:27 burn: think we expect streaming, less clarity on how
16:41:46 milan: sending audio on regular time intervals as it is collected or generated
16:42:48 bjorn: discussed how to get events while capturing
16:42:59 bjorn: how it is done is a protocol question
16:43:16 burn: asr may begin before the user is
16:43:55 burn: finished speaking, result before engine comes
16:44:43 milan: without regular timed packets, wont get events on regular interval
16:44:54 bjorn: latency is what is app observable
16:45:06 bodell: having multiple events is not a big problem
16:45:13 bodell: data in events can deal with timing
16:45:36 milan: if app is realtime, five seconds ago got this event
16:46:13 bjorn; agree, what we need is low latency, not sure what we can require, part of being a good implementation
16:46:24 burn: market takes care of product requirements
16:46:47 robert: fair to say that standard should not have inherent limitations
16:46:57 robert: 50 ms or so is the threshold
16:47:23 bjorn: protocol design should not make it impossible to achieve low latency event delivery
16:47:47 marc: audio streaming in the tts case?
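The interoperability argument above (some codec must be required, and the protocol may need to negotiate one) can be sketched as a simple preference-order match. The codec names are taken from the candidate list (FLAC, Ogg Vorbis, Speex, PCM); the preference order, the `negotiate` function and the choice of PCM as the mandatory fallback are assumptions for illustration only, since no codec was decided in this call.

```python
from typing import Optional

# Candidate codecs named in the call; this preference order is an assumption.
CANDIDATES = ["flac", "vorbis", "speex", "pcm"]

# Hypothetical: the call did NOT decide which codec, if any, is mandatory to implement.
MANDATORY = "pcm"


def negotiate(client_offer: list[str], server_supported: set[str]) -> Optional[str]:
    """Pick the first codec in the client's preference order that the server supports."""
    for codec in client_offer:
        if codec in server_supported:
            return codec
    # If every client were required to implement MANDATORY, the server could fall back
    # to it even when the client did not list it; without such a rule negotiation can fail.
    if MANDATORY in server_supported:
        return MANDATORY
    return None
```

The fallback branch is what a mandatory codec buys: without it, two conforming implementations with disjoint codec sets simply return `None` and cannot interoperate.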
16:48:05 marc: send audio while still rendering rest of a long utterance
16:48:32 bodell: tts is generally fast enough that this is not a problem
16:49:54 marc: if tts has to process all text before returning audio, could be a problem,
16:50:30 marc: wants to make sure that what we create here does not prevent an implementation doing this
16:51:03 bjorn: up to engine whether it starts to synthesize
16:51:20 marc: wav format, header has filesize, makes proper streaming
16:51:48 bjorn: protocol should make it possible for the tts to be streamed and start playing before
16:52:03 bjorn: synthesis is complete
16:53:22 burn: issue of supporting format coming back in video and
16:53:37 burn: and playing the audio
16:54:03 bjorn: should not require playing audio from video
16:54:15 robert: api should not prevent this
16:54:50 burn: video with three audio tracks, how does apis select
16:55:15 robert: our proposal separated capture api from reco, could support different kinds of capture
16:55:35 burn: protocol design should not preclude streaming of video codecs
16:56:22 raj: why specify video?
16:56:55 robert: if codec can be packetized in real time should be ok
16:58:51 burn: the protocol should not inhibit the transmission of codecs that have similar requirements to audio?
16:59:55 topic: What is meant by "start of speech", "end of speech", and endpointing in general? How do transmission delays affect the definitions and what we want in terms of APIs?
17:01:13 bodell: issue of latency impacting times
17:01:25 bjorn; agreed UA being basis for the clock
17:01:37 s/bjorn;/bjorn:/
17:02:11 burn: dont have requirements for timing info from server
17:02:23 bodell: tts case?
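The streaming requirements discussed between 16:41 and 16:52 (audio sent at regular intervals as it is captured, TTS audio that can start playing before synthesis finishes, and a protocol that does not prevent low latency) can be sketched as re-slicing audio into fixed-duration frames that are forwarded as soon as they fill. The 50 ms frame length follows Robert's rough latency threshold; the 16 kHz, 16-bit mono format and the function names are assumptions. Sending headerless PCM frames this way also sidesteps the WAV issue Marc raised, since a RIFF/WAV header records chunk sizes that a streaming synthesizer does not yet know.

```python
from typing import Iterable, Iterator


def frame_bytes(sample_rate_hz: int, bytes_per_sample: int, channels: int, frame_ms: int) -> int:
    """Number of bytes of raw PCM in one frame lasting `frame_ms` milliseconds."""
    return sample_rate_hz * bytes_per_sample * channels * frame_ms // 1000


def stream_frames(chunks: Iterable[bytes], frame_size: int) -> Iterator[bytes]:
    """Re-slice arbitrarily sized audio chunks into fixed-size frames as they arrive.

    Each full frame is yielded immediately, so a consumer (recognizer or audio
    player) can start work before the producer has finished. A final partial
    frame is flushed when the input ends.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        while len(buffer) >= frame_size:
            yield bytes(buffer[:frame_size])
            del buffer[:frame_size]
    if buffer:
        yield bytes(buffer)
```

At 16 kHz, 16-bit mono, `frame_bytes(16000, 2, 1, 50)` gives 1600 bytes per 50 ms frame. Because `stream_frames` is a generator, it works the same whether the chunks come from a microphone or from a synthesizer that is still rendering a long utterance.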
17:02:45 bjorn: seems reasonable for server to include timing info
17:02:57 robert: could do offset from start
17:03:16 burn: something that UA can convert into UA local timestamp
17:03:25 burn: different ways to achieve that
17:03:50 burn: doesnt say what is made available in the api
17:05:39 bodell; many different times, when the utterance start etc
17:05:49 bodell: when received,
17:06:12 marc: impact on order that events are received
17:06:30 milan: will UA generate these events when using remote service
17:06:58 bodell: may assume energy detector gives you end of speech, before reco gives end of speech, hard to guarantee order
17:07:10 milan: start of energy is different than start of speech
17:07:37 milan: hard to write web app if get two start of speech events
17:08:26 bodell: different events,
17:08:38 bodell; was fixed order for the non continuous case
17:09:41 charles: could arrange fixed order delivery, even if times inside do not reflect this
17:09:59 bodell; not practical to hold events and put them in the desired order
17:11:25 burn: energy detector gets end of sound, then will get actual end of speech with better timing info, either get two or throw away better info
17:11:39 marc: dont want to override better info from remote service
17:12:11 burn: front is for optimization so dont have to send all the audio
17:13:06 bodell: events could be in different orders
17:13:18 bodell: not convinced in having standard order
17:13:35 milan: UA only have sound start, sound end
17:14:25 milan: avoid duplication,
17:14:39 bodell; already have different event names
17:15:17 robert: in name need to make clear some events are from energy
17:15:34 robert: detector others are from speech reco
17:15:44 milan: source of events
17:16:58 bodell: unmake statement about specific ordering
17:17:15 milan: new statement that user agent can insert are energy related events
17:17:31 marc: and probably capture start and end
17:17:50 charles: seems strong since speech service might or might not be remote
17:19:00 burn: removed ordering
17:19:18 burn: energy detector can only generate sound start stop
17:19:44 burn; speech service can only deliver the speech start stop
17:20:21 charles; if not order can be guarantee delivery
17:20:27 burn; how to guarantee it '
17:21:23 s/burn;/burn:/
17:21:58 milan: as long as have single source for events
17:22:08 michael: (need a blackboard for this)
17:22:34 bodell: solved by removing required ordering
17:22:50 bodell: allows all the use cases
17:23:29 bodell: also works with continuous case
17:23:44 bodell: thought had solved the issue
17:23:53 burn: but start before end?
17:24:07 burn: can get end without having seen a start
17:24:41 milan: reluctant to give up the ordering, if have single source for each type of event
17:25:33 burn: agreed speech service can only generate one, can't guarantee that they wont cross in time
17:25:49 milan: use remote speech service as the canonical
17:26:47 bodell: easiest to understand cross for end, UA would raise both events in the order they occurred
17:27:08 milan: it is possible to impose an ordering
17:27:46 milan: pros and cons, flexibility, or predictability for the web app developer
17:29:38 bjorn: events from the same source should be in the same order
17:31:19 -Michael_Bodell
17:31:21 -Raj_Tumuluri
17:31:21 -Dan_Druta
17:31:22 -Milan_Young
17:31:22 -Marc_Schroeder
17:31:23 -Bjorn_Bringert
17:31:23 -Patrick_Ehlen
17:31:24 -Debbie_Dahl
17:31:26 -Olli_Pettay
17:31:31 -Robert_Brown
17:31:36 rrsagent, make log public
17:31:38 -Michael_Johnston
17:31:39 -Charles_Hemphill
17:31:40 rrsagent, draft minutes
17:31:40 I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
17:32:52 -Dan_Burnett
17:32:59 zakim, bye
17:32:59 leaving. As of this point the attendees were Dan_Burnett, Milan_Young, Marc_Schroeder, +1.425.828.aaaa, Robert_Brown, Patrick_Ehlen, Charles_Hemphill, +44.208.785.aabb,
17:32:59 Zakim has left #htmlspeech
17:33:03 ... Satish_Sampath, +1.925.302.aacc, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell, Bjorn_Bringert, Dan_Druta, Debbie_Dahl, Raj_Tumuluri
17:33:53 s/+1.425.828.aaaa, //
17:34:08 s/, +44.208.785.aabb//
17:34:21 s/, +1.925.302.aacc//
17:34:26 rrsagent, draft minutes
17:34:26 I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
17:36:56 s/17:32:59 [Zakim] Zakim has left #htmlspeech//
17:37:02 rrsagent, draft minutes
17:37:02 I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
17:39:10 s/, Charles_Hemphill,/, Charles_Hemphill, Satish_Sampath, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell, Bjorn_Bringert, Dan_Druta, Debbie_Dahl, Raj_Tumuluri/
17:39:14 rrsagent, draft minutes
17:39:14 I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
17:58:30 ddahl has left #htmlspeech
19:26:04 smaug has joined #htmlspeech
21:08:37 smaug has joined #htmlspeech
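Two decisions reached near the end of the call (17:01 to 17:29), that the UA clock is the reference and that ordering is guaranteed only among events from the same source, can be combined in one sketch. It assumes the speech service reports each event as an offset from the start of the audio stream (Robert's suggestion), which the UA converts to its own clock; the event names, source labels and helper functions are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical event names; per the discussion, the UA's energy detector emits only
# sound start/end and the speech service emits only speech start/end.
ALLOWED = {
    "energy-detector": {"soundstart", "soundend"},
    "speech-service": {"speechstart", "speechend"},
}


@dataclass
class Event:
    source: str        # "energy-detector" (in the UA) or "speech-service" (possibly remote)
    name: str
    ua_time_ms: float  # timestamp on the UA's clock


def from_service(name: str, offset_ms: float, capture_start_ua_ms: float) -> Event:
    """Convert a service-reported offset from the start of the audio stream to UA time."""
    return Event("speech-service", name, capture_start_ua_ms + offset_ms)


def check_stream(events: list[Event]) -> None:
    """Raise ValueError unless each source emits only its own event types, in time order.

    Events from different sources may interleave in any order.
    """
    last_time: dict[str, float] = {}
    for ev in events:
        if ev.name not in ALLOWED.get(ev.source, set()):
            raise ValueError(f"{ev.source} may not emit {ev.name}")
        if ev.ua_time_ms < last_time.get(ev.source, float("-inf")):
            raise ValueError(f"{ev.source} events out of order at {ev.name}")
        last_time[ev.source] = ev.ua_time_ms
```

For example, with capture starting at UA time 1000 ms, a service offset of 1350 ms for `speechend` maps to 2350 ms; that event may still be delivered after an energy-detector `soundend` stamped 2400 ms, and `check_stream` accepts it, because only per-source order is required.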