IRC log of htmlspeech on 2011-06-02

Timestamps are in UTC.

15:56:51 [RRSAgent]: RRSAgent has joined #htmlspeech
15:56:51 [RRSAgent]: logging to http://www.w3.org/2011/06/02-htmlspeech-irc
15:56:53 [Zakim]: +Michael_Bodell
15:56:57 [Zakim]: -Robert_Brown
15:57:09 [burn_]: trackbot, start telcon
15:57:11 [smaug]: Zakim, nick smaug is Olli_Pettay
15:57:11 [Zakim]: ok, smaug, I now associate you with Olli_Pettay
15:57:11 [trackbot]: RRSAgent, make logs public
15:57:13 [trackbot]: Zakim, this will be
15:57:13 [Zakim]: I don't understand 'this will be', trackbot
15:57:14 [trackbot]: Meeting: HTML Speech Incubator Group Teleconference
15:57:14 [trackbot]: Date: 02 June 2011
15:57:25 [burn_]: Chair: Dan Burnett
15:57:34 [burn_]: Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0006.html
15:57:37 [mbodell]: mbodell has joined #htmlspeech
15:58:00 [burn_]: zakim, nick mbodell is Michael_Bodell
15:58:00 [Zakim]: ok, burn_, I now associate mbodell with Michael_Bodell
15:58:29 [burn_]: zakim, who's here?
15:58:29 [Zakim]: On the phone I see Dan_Burnett, Marc_Schroeder, Milan_Young, Charles_Hemphill, Satish_Sampath, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell
15:58:32 [Zakim]: On IRC I see mbodell, RRSAgent, bringert, smaug, rvid, Michael, Robert, Charles, Milan, marc, Zakim, burn_, trackbot
15:58:58 [Zakim]: +Ronald
15:59:26 [burn_]: zakim, Ronald is Bjorn_Bringert
15:59:26 [Zakim]: +Bjorn_Bringert; got it
15:59:44 [Zakim]: +Dan_Druta
16:00:05 [DanD]: DanD has joined #htmlspeech
16:00:12 [Zakim]: +[Microsoft]
16:00:18 [Zakim]: +??P56
16:00:19 [burn_]: zakim, nick DanD is Dan_Druta
16:00:19 [Zakim]: ok, burn_, I now associate DanD with Dan_Druta
16:00:42 [Raj]: Raj has joined #htmlspeech
16:00:52 [Robert]: can you hear me?
16:01:10 [Zakim]: -[Microsoft]
16:01:16 [Zakim]: +Debbie_Dahl
16:01:24 [burn_]: zakim, ??P56 is Raj_Tumuluri
16:01:24 [Zakim]: +Raj_Tumuluri; got it
16:01:38 [ddahl]: ddahl has joined #htmlspeech
16:01:44 [Zakim]: -Debbie_Dahl
16:01:52 [glen]: glen has joined #htmlspeech
16:02:00 [Zakim]: +Patrick_Ehlen
16:02:17 [Zakim]: +Debbie_Dahl
16:02:18 [burn_]: zakim, nick glen is Glen_Shires
16:02:18 [Zakim]: ok, burn_, I now associate glen with Glen_Shires
16:02:22 [Zakim]: +[Microsoft]
16:02:41 [burn_]: zakim, [Microsoft] is Robert_Brown
16:02:42 [Zakim]: +Robert_Brown; got it
16:05:05 [burn_]: Scribe: Michael_Johnston
16:05:12 [burn_]: ScribeNick: Michael
16:05:35 [burn_]: Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0006.html
16:06:34 [Michael]: burn: start with review of face to face minutes, will review again next week
16:06:41 [Michael]: burn: comments on minutes
16:07:05 [Michael]: topic: updated final report document
16:08:11 [Michael]: burn: comments on draft at this point?
16:08:18 [Michael]: all: silence
16:08:36 [Michael]: topic: agreed upon design decisions
16:08:49 [Michael]: topic: additional issues to add to list of issues
16:09:55 [Zakim]: -Satish_Sampath
16:10:53 [Michael]: michael: does the move to have the emma document in the dom remove the impetus for a json variant of emma?
16:12:14 [Michael]: bjorn: have simple javascript api for accessing most common elements, dont need json variant of emma, for details can access emma object
16:12:24 [Michael]: milan: need to do xml parsing?
16:13:04 [Michael]: bodell: will be much the same as other http requests that return xml, dont need to parse
16:13:33 [Michael]: milan: are mobile devices a problem, verbosity of xml
16:13:43 [Michael]: bodell: know
16:13:51 [Michael]: s/know/no
16:14:24 [Michael]: dan: is there pressure from this group to build a json version of emma?
16:14:57 [Michael]: all: agreement: no push for json version of emma
16:15:13 [Michael]: burn: any other issues to add to list for discussion
16:15:37 [Michael]: topic: markup binding
16:16:01 [Michael]: bjorn: no feedback from chrome team yet
16:16:59 [Michael]: bodell: keep html binding lightweight, js constructor, simple for mechanism, small work to define, if dont want then remove the element
16:17:18 [Michael]: bodell: should not mess up js api
16:17:43 [burn_]: s/simple for /simple "for" /
16:17:58 [Michael]: olli: problem with for attribute is what it can point to, what elements can be used as target, doesnt quite work with content editable, important use case
16:18:39 [Michael]: olli: clarifies issue, need to make clear which elements can be targets and what the semantics is
16:18:58 [Michael]: olli: also content editable areas
16:19:58 [Michael]: michael: have to define semantics when target is e.g. a drop down or radio button
16:20:09 [Michael]: olli: may be new kinds of elements also
16:20:38 [Michael]: bodell: assumption would be to bind to any element, but they would not all have to work,
16:20:55 [Michael]: bodell: some browsers would want to handle more input types
16:21:21 [Michael]: olli: reco would be element in the dom, what is the benefit of the reco
16:21:32 [Michael]: olli: if for is not used
16:21:49 [Michael]: bodell: google desire to have element with microphone click api
16:22:22 [Michael]: bjorn: have proposed several things along the way,
16:22:48 [Michael]: bjorn; most important aspect is to have an element you can click to start speaking without the pop up or info bar
16:22:57 [Michael]: s/bjorn;/bjorn:
16:23:20 [Michael]: olli: not clear what the element gives
16:23:47 [Michael]: robert: follow up with chrome folks
16:23:55 [Michael]: bjorn: still waiting on that
16:24:21 [Michael]: bjorn: do agree that html element discussion does not block the js api discussion
16:24:32 [Michael]: olli: issue may get solved along the way
16:25:06 [Michael]: burn: need to see concrete proposal to make decision
16:25:43 [Michael]: topic: crucial decisions partially discussed
16:26:16 [marc]: http://www.w3.org/2005/Incubator/htmlspeech/2011/05/f2fminutes201105.html
16:27:17 [Michael]: burn: will go through each ...
16:28:21 [Michael]: bjorn: audio capture topic is dealt with, should be default way, if there is an audio capture api will deal with then
16:28:35 [Michael]: burn: audio codecs mandatory
16:28:50 [Michael]: robert: even IP status around speex is unclear also
16:29:15 [Michael]: robert: are only reasonable answers pcm and mulaw, despite their flaws
16:29:24 [Michael]: bjorn: flac, high bandwidth
16:29:27 [bringert]: FLAC
16:29:38 [bringert]: http://en.wikipedia.org/wiki/Free_Lossless_Audio_Codec
16:29:50 [Michael]: milan: speex is in ietf draft on how to package in rtp
16:29:57 [Milan]: http://tools.ietf.org/html/draft-ietf-avt-rtp-speex-07
16:30:25 [Michael]: burn: rtp way to send it does not mean there are ip issues
16:30:43 [Michael]: bjorn: need to require some codecs or cant be interoperable
16:30:57 [Michael]: milan: problem sounds similar
16:31:11 [Michael]: bodell; rfc does not have a patent policy
16:31:24 [Michael]: s/bodell;/bodell:/
16:32:51 [Michael]: burn: if something is necessary to implement the spec, and it is encumbered with IP, need to make that clear
16:33:19 [Michael]: bjorn: need protocol for interoperability
16:33:27 [Michael]: milan: protocol for RTC
16:33:53 [Michael]: burn: opus, codecs from two organizations, trying to blend, not clear if IP issues are being resolved, making container
16:34:05 [Michael]: burn: can use either one if you have permission
16:35:03 [Michael]: burn: dont have an answer yet, really need one, industry wide problem, may not be ours to solve, return to this
16:35:40 [mbodell]: See http://lists.xiph.org/pipermail/speex-dev/2003-November/000753.html for some similar discussion on patent of speex
16:35:44 [Michael]: robert: will follow up re: speex again
16:37:45 [Michael]: milan: impact on protocol team if need to negotiate codec
16:37:57 [Michael]: mark: speex is not good enough for tts
16:37:59 [marc]: ogg vorbis
16:38:08 [Michael]: s/mark/marc/
16:38:12 [marc]: and FLAC
16:38:43 [Michael]: burn: few names as candidates flac, ogg vorbis, speex, pcm
16:38:59 [Michael]: bjorn: already use flac in launched clients
16:39:15 [Michael]: olli: ogg vorbis is core html audio
16:39:49 [smaug]: not core HTML audio. Some browsers just happen to support it
16:39:58 [Michael]: burn: candidates to consider flac, ogg vorbis, speex, pcm
16:40:54 [Michael]: topic: do we support audio streaming and how?
16:41:27 [Michael]: burn: think we expect streaming, less clarity on how
16:41:46 [Michael]: milan: sending audio on regular time intervals as it is collected or generated
16:42:48 [Michael]: bjorn: discussed how to get events while capturing
16:42:59 [Michael]: bjorn: how it is done is a protocol question
16:43:16 [Michael]: burn: asr may begin before the user is
16:43:55 [Michael]: burn: finished speaking, result before engine comes
16:44:43 [Michael]: milan: without regular timed packets, wont get events on regular interval
16:44:54 [Michael]: bjorn: latency is what is app observable
16:45:06 [Michael]: bodell: having multiple events is not a big problem
16:45:13 [Michael]: bodell: data in events can deal with timing
16:45:36 [Michael]: milan: if app is realtime, five seconds ago got this event
16:46:13 [Michael]: bjorn: agree, what we need is low latency, not sure what we can require, part of being a good implementation
16:46:24 [Michael]: burn: market takes care of product requirements
16:46:47 [Michael]: robert: fair to say that standard should not have inherent limitations
16:46:57 [Michael]: robert: 50 ms or so is the threshold
16:47:23 [Michael]: bjorn: protocol design should not make it impossible to achieve low latency event delivery
16:47:47 [Michael]: marc: audio streaming in the tts case?
16:48:05 [Michael]: marc: send audio while still rendering rest of a long utterance
16:48:32 [Michael]: bodell: tts is generally fast enough that this is not a problem
16:49:54 [Michael]: marc: if tts has to process all text before returning audio, could be a problem,
16:50:30 [Michael]: marc: wants to make sure that what we create here does not prevent an implementation doing this
16:51:03 [Michael]: bjorn: up to engine whether it starts to synthesize
16:51:20 [Michael]: marc: wav format, header has filesize, makes proper streaming
16:51:48 [Michael]: bjorn: protocol should make it possible for the tts to be streamed and start playing before
16:52:03 [Michael]: bjorn: synthesis is complete
16:53:22 [Michael]: burn: issue of supporting format coming back in video
16:53:37 [Michael]: burn: and playing the audio
16:54:03 [Michael]: bjorn: should not require playing audio from video
16:54:15 [Michael]: robert: api should not prevent this
16:54:50 [Michael]: burn: video with three audio tracks, how does the api select
16:55:15 [Michael]: robert: our proposal separated capture api from reco, could support different kinds of capture
16:55:35 [Michael]: burn: protocol design should not preclude streaming of video codecs
16:56:22 [Michael]: raj: why specify video?
16:56:55 [Michael]: robert: if codec can be packetized in real time should be ok
16:58:51 [Michael]: burn: the protocol should not inhibit the transmission of codecs that have similar requirements to audio?
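marc's remark at 16:51:20 about the WAV header carrying a file size can be illustrated with a minimal sketch. It assumes 16-bit style PCM in a canonical 44-byte RIFF/WAVE header, and it uses a placeholder value for the two size fields because the total length is not known when streaming starts. The function name and the placeholder value are assumptions for illustration, not something the group agreed on, and not every player accepts a placeholder size.

```typescript
// Hypothetical helper: build a 44-byte PCM WAV header for audio whose
// total length is unknown when the first bytes are sent.
function streamingWavHeader(
  sampleRate: number,
  channels: number,
  bitsPerSample: number
): Uint8Array {
  // Assumed placeholder for "length unknown"; the real size is only
  // known once synthesis or capture has finished.
  const UNKNOWN_SIZE = 0xffffffff;
  const buffer = new ArrayBuffer(44);
  const view = new DataView(buffer);
  const writeTag = (offset: number, tag: string): void => {
    for (let i = 0; i < tag.length; i++) {
      view.setUint8(offset + i, tag.charCodeAt(i));
    }
  };
  const blockAlign = (channels * bitsPerSample) / 8;

  writeTag(0, "RIFF");
  view.setUint32(4, UNKNOWN_SIZE, true); // RIFF chunk size (file size - 8)
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true); // size of the PCM fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // bytes per second
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeTag(36, "data");
  view.setUint32(40, UNKNOWN_SIZE, true); // data chunk size
  return new Uint8Array(buffer);
}
```

The two size fields at byte offsets 4 and 40 are the ones that cannot be filled in correctly up front, which is why a header-first format fits poorly with playback that starts before synthesis is complete.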
16:59:55 [Michael]: topic: What is meant by "start of speech", "end of speech", and endpointing in general? How do transmission delays affect the definitions and what we want in terms of APIs?
17:01:13 [Michael]: bodell: issue of latency impacting times
17:01:25 [Michael]: bjorn; agreed UA being basis for the clock
17:01:37 [Michael]: s/bjorn;/bjorn:/
17:02:11 [Michael]: burn: dont have requirements for timing info from server
17:02:23 [Michael]: bodell: tts case?
17:02:45 [Michael]: bjorn: seems reasonable for server to include timing info
17:02:57 [Michael]: robert: could do offset from start
17:03:16 [Michael]: burn: something that UA can convert into UA local timestamp
17:03:25 [Michael]: burn: different ways to achieve that
17:03:50 [Michael]: burn: doesnt say what is made available in the api
17:05:39 [Michael]: bodell: many different times, when the utterance starts etc
17:05:49 [Michael]: bodell: when received,
17:06:12 [Michael]: marc: impact on order that events are received
17:06:30 [Michael]: milan: will UA generate these events when using remote service
17:06:58 [Michael]: bodell: may assume energy detector gives you end of speech, before reco gives end of speech, hard to guarantee order
17:07:10 [Michael]: milan: start of energy is different than start of speech
17:07:37 [Michael]: milan: hard to write web app if get two start of speech events
17:08:26 [Michael]: bodell: different events,
17:08:38 [Michael]: bodell: was fixed order for the non continuous case
17:09:41 [Michael]: charles: could arrange fixed order delivery, even if times inside do not reflect this
17:09:59 [Michael]: bodell: not practical to hold events and put them in the desired order
17:11:25 [Michael]: burn: energy detector gets end of sound, then will get actual end of speech with better timing info, either get two or throw away better info
17:11:39 [Michael]: marc: dont want to override better info from remote service
17:12:11 [Michael]: burn: front is for optimization so dont have to send all the audio
17:13:06 [Michael]: bodell: events could be in different orders
17:13:18 [Michael]: bodell: not convinced in having standard order
17:13:35 [Michael]: milan: UA only have sound start, sound end
17:14:25 [Michael]: milan: avoid duplication,
17:14:39 [Michael]: bodell: already have different event names
17:15:17 [Michael]: robert: in name need to make clear some events are from energy
17:15:34 [Michael]: robert: detector others are from speech reco
17:15:44 [Michael]: milan: source of events
17:16:58 [Michael]: bodell: unmake statement about specific ordering
17:17:15 [Michael]: milan: new statement that the only events the user agent can insert are energy related events
17:17:31 [Michael]: marc: and probably capture start and end
17:17:50 [Michael]: charles: seems strong since speech service might or might not be remote
17:19:00 [Michael]: burn: removed ordering
17:19:18 [Michael]: burn: energy detector can only generate sound start stop
17:19:44 [Michael]: burn: speech service can only deliver the speech start stop
17:20:21 [Michael]: charles: if not order, can delivery be guaranteed
17:20:27 [Michael]: burn; how to guarantee it '
17:21:23 [Michael]: s/burn;/burn:/
17:21:58 [Michael]: milan: as long as have single source for events
17:22:08 [Michael]: michael: (need a blackboard for this)
17:22:34 [Michael]: bodell: solved by removing required ordering
17:22:50 [Michael]: bodell: allows all the use cases
17:23:29 [Michael]: bodell: also works with continuous case
17:23:44 [Michael]: bodell: thought had solved the issue
17:23:53 [Michael]: burn: but start before end?
17:24:07 [Michael]: burn: can get end without having seen a start
17:24:41 [Michael]: milan: reluctant to give up the ordering, if have single source for each type of event
17:25:33 [Michael]: burn: agreed speech service can only generate one, can't guarantee that they wont cross in time
17:25:49 [Michael]: milan: use remote speech service as the canonical
17:26:47 [Michael]: bodell: easiest to understand cross for end, UA would raise both events in the order they occurred
17:27:08 [Michael]: milan: it is possible to impose an ordering
17:27:46 [Michael]: milan: pros and cons, flexibility, or predictability for the web app developer
17:29:38 [Michael]: bjorn: events from the same source should be in the same order
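The position reached in the endpointing discussion (UA clock as the time base, server timing as an offset from the start, no required ordering across sources, ordering preserved within one source) can be sketched as follows. All type, field, and event names are hypothetical and chosen only for illustration; the group did not define an API here.

```typescript
// Hypothetical origin labels: energy detector in the UA vs. the speech service.
type EventOrigin = "energy-detector" | "speech-service";

interface TimedEvent {
  name: "soundstart" | "soundend" | "speechstart" | "speechend";
  origin: EventOrigin;
  uaTimeMs: number; // timestamp expressed on the UA's clock
}

// Convert a server-reported offset from the start of the audio into a
// UA-local timestamp, assuming the UA recorded when capture started.
function toUaTime(captureStartUaMs: number, offsetFromStartMs: number): number {
  return captureStartUaMs + offsetFromStartMs;
}

// Check only the weaker guarantee: events from the same origin are delivered
// in non-decreasing time order. Events from different origins may cross.
function sameOriginOrderHolds(delivered: TimedEvent[]): boolean {
  const lastSeen: Partial<Record<EventOrigin, number>> = {};
  for (const ev of delivered) {
    const prev = lastSeen[ev.origin];
    if (prev !== undefined && ev.uaTimeMs < prev) {
      return false;
    }
    lastSeen[ev.origin] = ev.uaTimeMs;
  }
  return true;
}

// Example delivery order: the local energy detector reports first, the
// remote service's better-timed events arrive later over the network.
const captureStart = 1000;
const deliveredInArrivalOrder: TimedEvent[] = [
  { name: "soundstart", origin: "energy-detector", uaTimeMs: 1050 },
  { name: "soundend", origin: "energy-detector", uaTimeMs: 1900 },
  { name: "speechstart", origin: "speech-service", uaTimeMs: toUaTime(captureStart, 120) },
  { name: "speechend", origin: "speech-service", uaTimeMs: toUaTime(captureStart, 850) },
];
```

In this example the speech-service events carry earlier timestamps than the soundend event delivered before them, so a global time ordering does not hold, yet each source's own events remain in order. A web app written against this model would key on event name and origin rather than on arrival order.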
17:31:19 [Zakim]: -Michael_Bodell
17:31:21 [Zakim]: -Raj_Tumuluri
17:31:21 [Zakim]: -Dan_Druta
17:31:22 [Zakim]: -Milan_Young
17:31:22 [Zakim]: -Marc_Schroeder
17:31:23 [Zakim]: -Bjorn_Bringert
17:31:23 [Zakim]: -Patrick_Ehlen
17:31:24 [Zakim]: -Debbie_Dahl
17:31:26 [Zakim]: -Olli_Pettay
17:31:31 [Zakim]: -Robert_Brown
17:31:36 [burn_]: rrsagent, make log public
17:31:38 [Zakim]: -Michael_Johnston
17:31:39 [Zakim]: -Charles_Hemphill
17:31:40 [burn_]: rrsagent, draft minutes
17:31:40 [RRSAgent]: I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
17:32:52 [Zakim]: -Dan_Burnett
17:32:59 [burn_]: zakim, bye
17:32:59 [Zakim]: leaving. As of this point the attendees were Dan_Burnett, Milan_Young, Marc_Schroeder, +1.425.828.aaaa, Robert_Brown, Patrick_Ehlen, Charles_Hemphill, +44.208.785.aabb,
17:32:59 [Zakim]: Zakim has left #htmlspeech
17:33:03 [Zakim]: ... Satish_Sampath, +1.925.302.aacc, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell, Bjorn_Bringert, Dan_Druta, Debbie_Dahl, Raj_Tumuluri
17:33:53 [burn_]: s/+1.425.828.aaaa, //
17:34:08 [burn_]: s/, +44.208.785.aabb//
17:34:21 [burn_]: s/, +1.925.302.aacc//
17:34:26 [burn_]: rrsagent, draft minutes
17:34:26 [RRSAgent]: I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
17:36:56 [burn_]: s/17:32:59 [Zakim] Zakim has left #htmlspeech//
17:37:02 [burn_]: rrsagent, draft minutes
17:37:02 [RRSAgent]: I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
17:39:10 [burn_]: s/, Charles_Hemphill,/, Charles_Hemphill, Satish_Sampath, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell, Bjorn_Bringert, Dan_Druta, Debbie_Dahl, Raj_Tumuluri/
17:39:14 [burn_]: rrsagent, draft minutes
17:39:14 [RRSAgent]: I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
17:58:30 [ddahl]: ddahl has left #htmlspeech
19:26:04 [smaug]: smaug has joined #htmlspeech
21:08:37 [smaug]: smaug has joined #htmlspeech