IRC log of htmlspeech on 2011-06-02
Timestamps are in UTC.
- 15:56:51 [RRSAgent]
- RRSAgent has joined #htmlspeech
- 15:56:51 [RRSAgent]
- logging to http://www.w3.org/2011/06/02-htmlspeech-irc
- 15:56:53 [Zakim]
- +Michael_Bodell
- 15:56:57 [Zakim]
- -Robert_Brown
- 15:57:09 [burn_]
- trackbot, start telcon
- 15:57:11 [smaug]
- Zakim, nick smaug is Olli_Pettay
- 15:57:11 [Zakim]
- ok, smaug, I now associate you with Olli_Pettay
- 15:57:11 [trackbot]
- RRSAgent, make logs public
- 15:57:13 [trackbot]
- Zakim, this will be
- 15:57:13 [Zakim]
- I don't understand 'this will be', trackbot
- 15:57:14 [trackbot]
- Meeting: HTML Speech Incubator Group Teleconference
- 15:57:14 [trackbot]
- Date: 02 June 2011
- 15:57:25 [burn_]
- Chair: Dan Burnett
- 15:57:34 [burn_]
- Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0006.html
- 15:57:37 [mbodell]
- mbodell has joined #htmlspeech
- 15:58:00 [burn_]
- zakim, nick mbodell is Michael_Bodell
- 15:58:00 [Zakim]
- ok, burn_, I now associate mbodell with Michael_Bodell
- 15:58:29 [burn_]
- zakim, who's here?
- 15:58:29 [Zakim]
- On the phone I see Dan_Burnett, Marc_Schroeder, Milan_Young, Charles_Hemphill, Satish_Sampath, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell
- 15:58:32 [Zakim]
- On IRC I see mbodell, RRSAgent, bringert, smaug, rvid, Michael, Robert, Charles, Milan, marc, Zakim, burn_, trackbot
- 15:58:58 [Zakim]
- +Ronald
- 15:59:26 [burn_]
- zakim, Ronald is Bjorn_Bringert
- 15:59:26 [Zakim]
- +Bjorn_Bringert; got it
- 15:59:44 [Zakim]
- +Dan_Druta
- 16:00:05 [DanD]
- DanD has joined #htmlspeech
- 16:00:12 [Zakim]
- +[Microsoft]
- 16:00:18 [Zakim]
- +??P56
- 16:00:19 [burn_]
- zakim, nick DanD is Dan_Druta
- 16:00:19 [Zakim]
- ok, burn_, I now associate DanD with Dan_Druta
- 16:00:42 [Raj]
- Raj has joined #htmlspeech
- 16:00:52 [Robert]
- can you hear me?
- 16:01:10 [Zakim]
- -[Microsoft]
- 16:01:16 [Zakim]
- +Debbie_Dahl
- 16:01:24 [burn_]
- zakim, ??P56 is Raj_Tumuluri
- 16:01:24 [Zakim]
- +Raj_Tumuluri; got it
- 16:01:38 [ddahl]
- ddahl has joined #htmlspeech
- 16:01:44 [Zakim]
- -Debbie_Dahl
- 16:01:52 [glen]
- glen has joined #htmlspeech
- 16:02:00 [Zakim]
- +Patrick_Ehlen
- 16:02:17 [Zakim]
- +Debbie_Dahl
- 16:02:18 [burn_]
- zakim, nick glen is Glen_Shires
- 16:02:18 [Zakim]
- ok, burn_, I now associate glen with Glen_Shires
- 16:02:22 [Zakim]
- +[Microsoft]
- 16:02:41 [burn_]
- zakim, [Microsoft] is Robert_Brown
- 16:02:42 [Zakim]
- +Robert_Brown; got it
- 16:05:05 [burn_]
- Scribe: Michael_Johnston
- 16:05:12 [burn_]
- ScribeNick: Michael
- 16:05:35 [burn_]
- Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0006.html
- 16:06:34 [Michael]
- burn: start with review of face to face minutes, will review again next week
- 16:06:41 [Michael]
- burn: comments on minutes
- 16:07:05 [Michael]
- topic: updated final report document
- 16:08:11 [Michael]
- burn: comments on draft at this point?
- 16:08:18 [Michael]
- all: silence
- 16:08:36 [Michael]
- topic: agreed upon design decisions
- 16:08:49 [Michael]
- topic: additional issues to add to list of issues
- 16:09:55 [Zakim]
- -Satish_Sampath
- 16:10:53 [Michael]
- michael: does the move to have the emma document in the dom remove the impetus for a json variant of emma?
- 16:12:14 [Michael]
- bjorn: have simple javascript api for accessing most common elements, dont need json variant of emma, for details can access emma object
- 16:12:24 [Michael]
- milan: need to do xml parsing?
- 16:13:04 [Michael]
- bodell: will be much the same as other http requests that return xml, dont need to parse
- 16:13:33 [Michael]
- milan: are mobile devices a problem, verbosity of xml
- 16:13:43 [Michael]
- bodell: know
- 16:13:51 [Michael]
- s/know/no
- 16:14:24 [Michael]
- dan: is there pressure from this group to build a json version of emma?
- 16:14:57 [Michael]
- all: agreement: no push for json version of emma
- 16:15:13 [Michael]
- burn: any other issues to add to list for discussion
- 16:15:37 [Michael]
- topic: markup binding
- 16:16:01 [Michael]
- bjorn: no feedback from chrome team yet
- 16:16:59 [Michael]
- bodell: keep html binding lightweight, js constructor, simple for mechanism, small work to define, if dont want then remove the element
- 16:17:18 [Michael]
- bodell: should not mess up js api
- 16:17:43 [burn_]
- s/simple for /simple "for" /
- 16:17:58 [Michael]
- olli: problem with for attribute is what it can point to, what elements can be used as target, doesnt quite work with content editable, important use case
- 16:18:39 [Michael]
- olli: clarifies issue, need to make clear which elements can be targets and what the semantics is
- 16:18:58 [Michael]
- olli: also content editable areas
- 16:19:58 [Michael]
- michael: have to define semantics when target is e.g. a drop down or radio button
- 16:20:09 [Michael]
- olli: may be new kinds of elements also
- 16:20:38 [Michael]
- bodell: assumption would be to bind to any element, but they would not all have to work,
- 16:20:55 [Michael]
- bodell: some browsers would want to handle more input types
- 16:21:21 [Michael]
- olli: reco would be element in the dom, what is the benefit of the reco
- 16:21:32 [Michael]
- olli: if for is not used
- 16:21:49 [Michael]
- bodell: google desire to have element with microphone click api
- 16:22:22 [Michael]
- bjorn: have proposed several things along the way,
- 16:22:48 [Michael]
- bjorn; most important aspect is to have an element you can click to start speaking without the pop up or info bar
- 16:22:57 [Michael]
- s/bjorn;/bjorn:
- 16:23:20 [Michael]
- olli: not clear what the element gives
- 16:23:47 [Michael]
- robert: follow up with chrome folks
- 16:23:55 [Michael]
- bjorn: still waiting on that
- 16:24:21 [Michael]
- bjorn: do agree that html element discussion does not block the js api discussion
- 16:24:32 [Michael]
- olli: issue may get solved along the way
- 16:25:06 [Michael]
- burn: need to see concrete proposal to make decision
- 16:25:43 [Michael]
- topic: crucial decisions partially discussed
- 16:26:16 [marc]
- http://www.w3.org/2005/Incubator/htmlspeech/2011/05/f2fminutes201105.html
- 16:27:17 [Michael]
- burn: will go through each ...
- 16:28:21 [Michael]
- bjorn: audio capture topic is dealt with, should be default way, if there is an audio capture api will deal with then
- 16:28:35 [Michael]
- burn: audio codecs mandatory
- 16:28:50 [Michael]
- robert: even IP status around speex is unclear also
- 16:29:15 [Michael]
- robert: are only reasonable answers pcm and mulaw, despite their flaws
- 16:29:24 [Michael]
- bjorn: flac, high bandwidth
- 16:29:27 [bringert]
- FLAC
- 16:29:38 [bringert]
- http://en.wikipedia.org/wiki/Free_Lossless_Audio_Codec
- 16:29:50 [Michael]
- milan: speex is in ietf draft on how to package in rtp
- 16:29:57 [Milan]
- http://tools.ietf.org/html/draft-ietf-avt-rtp-speex-07
- 16:30:25 [Michael]
- burn: rtp way to send it does not mean there are ip issues
- 16:30:43 [Michael]
- bjorn: need to require some codecs or cant be interoperable
- 16:30:57 [Michael]
- milan: problem sounds similar
- 16:31:11 [Michael]
- bodell; rfc does not have a patent policy
- 16:31:24 [Michael]
- s/bodell;/bodell:/
- 16:32:51 [Michael]
- burn: if something is necessary to implement the spec, and it is encumbered with IP, need to make that clear
- 16:33:19 [Michael]
- bjorn: need protocol for interoperability
- 16:33:27 [Michael]
- milan: protocol for RTC
- 16:33:53 [Michael]
- burn: opus, codecs from two organizations, trying to blend, not clear if IP issues are being resolved, making container
- 16:34:05 [Michael]
- burn: can use either one if you have permission
- 16:35:03 [Michael]
- burn: dont have an answer yet, really need one, industry wide problem, may not be ours to solve, return to this
- 16:35:40 [mbodell]
- See http://lists.xiph.org/pipermail/speex-dev/2003-November/000753.html for some similar discussion on patent of speex
- 16:35:44 [Michael]
- robert: will follow up re: speex again
- 16:37:45 [Michael]
- milan: impact on protocol team if need to negotiate codec
- 16:37:57 [Michael]
- mark: speex is not good enough for tts
- 16:37:59 [marc]
- ogg vorbis
- 16:38:08 [Michael]
- s/mark/marc/
- 16:38:12 [marc]
- and FLAC
- 16:38:43 [Michael]
- burn: few names as candidates flac, ogg vorbis, speex, pcm
- 16:38:59 [Michael]
- bjorn: already use flac in launched clients
- 16:39:15 [Michael]
- olli: ogg vorbis is core html audio
- 16:39:49 [smaug]
- not core HTML audio. Some browsers just happen to support it
- 16:39:58 [Michael]
- burn: candidates to consider flac, ogg vorbis, speex, pcm
- 16:40:54 [Michael]
- topic: do we support audio streaming and how?
- 16:41:27 [Michael]
- burn: think we expect streaming, less clarity on how
- 16:41:46 [Michael]
- milan: sending audio on regular time intervals as it is collected or generated
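A minimal sketch of what milan describes, assuming 16-bit PCM samples and a fixed packet duration. The function name `packetize` and the 20 ms interval are illustrative assumptions, not anything the group agreed on:

```typescript
// Split captured PCM samples into fixed-duration packets so they can be
// sent as they are collected rather than after the utterance ends.
// `packetize` and its parameters are hypothetical names for illustration.
function packetize(samples: Int16Array, sampleRate: number, packetMs: number): Int16Array[] {
  const samplesPerPacket = Math.floor((sampleRate * packetMs) / 1000);
  const packets: Int16Array[] = [];
  for (let start = 0; start < samples.length; start += samplesPerPacket) {
    // subarray creates a view, so no audio data is copied here
    packets.push(samples.subarray(start, start + samplesPerPacket));
  }
  return packets;
}

// One second of 16 kHz audio cut into 20 ms packets (assumed values).
const packets = packetize(new Int16Array(16000), 16000, 20);
```

With regular packets like these, the latency of any event derived from the audio is bounded by roughly one packet interval plus transport delay.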
- 16:42:48 [Michael]
- bjorn: discussed how to get events while capturing
- 16:42:59 [Michael]
- bjorn: how it is done is a protocol question
- 16:43:16 [Michael]
- burn: asr may begin before the user is
- 16:43:55 [Michael]
- burn: finished speaking, result before engine comes
- 16:44:43 [Michael]
- milan: without regular timed packets, wont get events on regular interval
- 16:44:54 [Michael]
- bjorn: latency is what is app observable
- 16:45:06 [Michael]
- bodell: having multiple events is not a big problem
- 16:45:13 [Michael]
- bodell: data in events can deal with timing
- 16:45:36 [Michael]
- milan: if app is realtime, five seconds ago got this event
- 16:46:13 [Michael]
- bjorn: agree, what we need is low latency, not sure what we can require, part of being a good implementation
- 16:46:24 [Michael]
- burn: market takes care of product requirements
- 16:46:47 [Michael]
- robert: fair to say that standard should not have inherent limitations
- 16:46:57 [Michael]
- robert: 50 ms or so is the threshold
- 16:47:23 [Michael]
- bjorn: protocol design should not make it impossible to achieve low latency event delivery
- 16:47:47 [Michael]
- marc: audio streaming in the tts case?
- 16:48:05 [Michael]
- marc: send audio while still rendering rest of a long utterance
- 16:48:32 [Michael]
- bodell: tts is generally fast enough that this is not a problem
- 16:49:54 [Michael]
- marc: if tts has to process all text before returning audio, could be a problem,
- 16:50:30 [Michael]
- marc: wants to make sure that what we create here does not prevent an implementation doing this
- 16:51:03 [Michael]
- bjorn: up to engine whether it starts to synthesize
- 16:51:20 [Michael]
- marc: wav format, header has filesize, makes proper streaming difficult
- 16:51:48 [Michael]
- bjorn: protocol should make it possible for the tts to be streamed and start playing before
- 16:52:03 [Michael]
- bjorn: synthesis is complete
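To illustrate marc's point about the wav header: in the canonical 44-byte PCM WAV header, two length fields depend on the total amount of audio, so a synthesizer that has not finished rendering cannot fill them in exactly. The sketch below builds such a header; `buildWavHeader` is a hypothetical helper name and the sample parameters are assumptions:

```typescript
// Build a canonical 44-byte header for uncompressed PCM WAV data.
// Note that the fields at byte offsets 4 and 40 both require the total
// number of audio bytes, which is unknown while synthesis is still running.
function buildWavHeader(sampleRate: number, channels: number, bitsPerSample: number, dataBytes: number): Uint8Array {
  const buffer = new ArrayBuffer(44);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  const blockAlign = (channels * bitsPerSample) / 8;
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true);           // RIFF chunk size: needs total length
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                      // size of the fmt chunk
  view.setUint16(20, 1, true);                       // format tag 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // bytes per second
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true);               // data chunk size: needs total length
  return new Uint8Array(buffer);
}

// Header for one second of 16 kHz, mono, 16-bit audio (32000 data bytes).
const oneSecondHeader = buildWavHeader(16000, 1, 16, 32000);
```

A streaming-friendly protocol would either use a format without up-front length fields or carry raw audio frames and leave any container to the receiver.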
- 16:53:22 [Michael]
- burn: issue of supporting format coming back in video and
- 16:53:37 [Michael]
- burn: playing the audio
- 16:54:03 [Michael]
- bjorn: should not require playing audio from video
- 16:54:15 [Michael]
- robert: api should not prevent this
- 16:54:50 [Michael]
- burn: video with three audio tracks, how does apis select
- 16:55:15 [Michael]
- robert: our proposal separated capture api from reco, could support different kinds of capture
- 16:55:35 [Michael]
- burn: protocol design should not preclude streaming of video codecs
- 16:56:22 [Michael]
- raj: why specify video?
- 16:56:55 [Michael]
- robert: if codec can be packetized in real time should be ok
- 16:58:51 [Michael]
- burn: the protocol should not inhibit the transmission of codecs that have similar requirements to audio?
- 16:59:55 [Michael]
- topic: What is meant by "start of speech", "end of speech", and endpointing in general? How do transmission delays affect the definitions and what we want in terms of APIs?
- 17:01:13 [Michael]
- bodell: issue of latency impacting times
- 17:01:25 [Michael]
- bjorn; agreed UA being basis for the clock
- 17:01:37 [Michael]
- s/bjorn;/bjorn:/
- 17:02:11 [Michael]
- burn: dont have requirements for timing info from server
- 17:02:23 [Michael]
- bodell: tts case?
- 17:02:45 [Michael]
- bjorn: seems reasonable for server to include timing info
- 17:02:57 [Michael]
- robert: could do offset from start
- 17:03:16 [Michael]
- burn: something that UA can convert into UA local timestamp
- 17:03:25 [Michael]
- burn: different ways to achieve that
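One way to read robert's "offset from start" together with burn's requirement, sketched under assumptions: the server reports each event as milliseconds since the first audio sample, and the UA adds that to its own recorded capture start time. The `ServerEventTiming` shape and `toUaTimestamp` are hypothetical names, not part of any proposal:

```typescript
// Hypothetical timing annotation from a speech service: milliseconds
// elapsed since the first audio sample of the stream.
interface ServerEventTiming {
  name: string;
  offsetMs: number;
}

// Convert a server-side offset into a timestamp on the UA's own clock.
// The UA records captureStartUaMs itself, so the two clocks never need
// to be synchronised.
function toUaTimestamp(captureStartUaMs: number, timing: ServerEventTiming): number {
  return captureStartUaMs + timing.offsetMs;
}

const captureStartUaMs = 1307034000000; // example UA clock reading, ms since epoch
const speechStart: ServerEventTiming = { name: "speech-start", offsetMs: 420 };
const speechStartUaMs = toUaTimestamp(captureStartUaMs, speechStart);
```

Because the offset is relative to the audio stream, transmission delay changes when the event arrives but not the timestamp it carries.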
- 17:03:50 [Michael]
- burn: doesnt say what is made available in the api
- 17:05:39 [Michael]
- bodell: many different times, when the utterance starts etc
- 17:05:49 [Michael]
- bodell: when received,
- 17:06:12 [Michael]
- marc: impact on order that events are received
- 17:06:30 [Michael]
- milan: will UA generate these events when using remote service
- 17:06:58 [Michael]
- bodell: may assume energy detector gives you end of speech, before reco gives end of speech, hard to guarantee order
- 17:07:10 [Michael]
- milan: start of energy is different than start of speech
- 17:07:37 [Michael]
- milan: hard to write web app if get two start of speech events
- 17:08:26 [Michael]
- bodell: different events,
- 17:08:38 [Michael]
- bodell: was fixed order for the non continuous case
- 17:09:41 [Michael]
- charles: could arrange fixed order delivery, even if times inside do not reflect this
- 17:09:59 [Michael]
- bodell: not practical to hold events and put them in the desired order
- 17:11:25 [Michael]
- burn: energy detector gets end of sound, then will get actual end of speech with better timing info, either get two or throw away better info
- 17:11:39 [Michael]
- marc: dont want to override better info from remote service
- 17:12:11 [Michael]
- burn: front is for optimization so dont have to send all the audio
- 17:13:06 [Michael]
- bodell: events could be in different orders
- 17:13:18 [Michael]
- bodell: not convinced in having standard order
- 17:13:35 [Michael]
- milan: UA only have sound start, sound end
- 17:14:25 [Michael]
- milan: avoid duplication,
- 17:14:39 [Michael]
- bodell: already have different event names
- 17:15:17 [Michael]
- robert: in name need to make clear some events are from energy
- 17:15:34 [Michael]
- robert: detector others are from speech reco
- 17:15:44 [Michael]
- milan: source of events
- 17:16:58 [Michael]
- bodell: unmake statement about specific ordering
- 17:17:15 [Michael]
- milan: new statement that the only events the user agent can insert are energy related events
- 17:17:31 [Michael]
- marc: and probably capture start and end
- 17:17:50 [Michael]
- charles: seems strong since speech service might or might not be remote
- 17:19:00 [Michael]
- burn: removed ordering
- 17:19:18 [Michael]
- burn: energy detector can only generate sound start stop
- 17:19:44 [Michael]
- burn: speech service can only deliver the speech start stop
- 17:20:21 [Michael]
- charles: if not order, can we guarantee delivery
- 17:20:27 [Michael]
- burn; how to guarantee it '
- 17:21:23 [Michael]
- s/burn;/burn:/
- 17:21:58 [Michael]
- milan: as long as have single source for events
- 17:22:08 [Michael]
- michael: (need a blackboard for this)
- 17:22:34 [Michael]
- bodell: solved by removing required ordering
- 17:22:50 [Michael]
- bodell: allows all the use cases
- 17:23:29 [Michael]
- bodell: also works with continuous case
- 17:23:44 [Michael]
- bodell: thought had solved the issue
- 17:23:53 [Michael]
- burn: but start before end?
- 17:24:07 [Michael]
- burn: can get end without having seen a start
- 17:24:41 [Michael]
- milan: reluctant to give up the ordering, if have single source for each type of event
- 17:25:33 [Michael]
- burn: agreed speech service can only generate one, can't guarantee that they wont cross in time
- 17:25:49 [Michael]
- milan: use remote speech service as the canonical
- 17:26:47 [Michael]
- bodell: easiest to understand cross for end, UA would raise both events in the order they occurred
- 17:27:08 [Michael]
- milan: it is possible to impose an ordering
- 17:27:46 [Michael]
- milan: pros and cons, flexibility, or predictability for the web app developer
- 17:29:38 [Michael]
- bjorn: events from the same source should be in the same order
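A sketch of the weaker guarantee bjorn states: events from different sources may interleave freely, but events from one source keep their order. The source names, event names, and the per-source sequence number `seq` are all assumptions made for illustration:

```typescript
// Two assumed event sources: the UA's energy detector (sound start/end)
// and the speech service (speech start/end).
type EventSource = "energy-detector" | "speech-service";

interface SpeechEvent {
  source: EventSource;
  name: string;
  seq: number; // hypothetical counter assigned by the source itself
}

// True when, for every source, its events appear in increasing seq order.
// Ordering across different sources is deliberately not checked.
function isOrderedPerSource(events: SpeechEvent[]): boolean {
  const lastSeq = new Map<EventSource, number>();
  for (const event of events) {
    const previous = lastSeq.get(event.source);
    if (previous !== undefined && event.seq <= previous) return false;
    lastSeq.set(event.source, event.seq);
  }
  return true;
}

const interleaved: SpeechEvent[] = [
  { source: "energy-detector", name: "soundstart", seq: 0 },
  { source: "speech-service", name: "speechstart", seq: 0 },
  { source: "energy-detector", name: "soundend", seq: 1 },
  { source: "speech-service", name: "speechend", seq: 1 },
];
// Same events, but the energy detector's end now precedes its start.
const swapped: SpeechEvent[] = [interleaved[1], interleaved[3], interleaved[2], interleaved[0]];

const interleavedOk = isOrderedPerSource(interleaved);
const swappedOk = isOrderedPerSource(swapped);
```

Under this rule a web app can rely on start preceding end for each event family, while the relative order of sound and speech events stays unspecified.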
- 17:31:19 [Zakim]
- -Michael_Bodell
- 17:31:21 [Zakim]
- -Raj_Tumuluri
- 17:31:21 [Zakim]
- -Dan_Druta
- 17:31:22 [Zakim]
- -Milan_Young
- 17:31:22 [Zakim]
- -Marc_Schroeder
- 17:31:23 [Zakim]
- -Bjorn_Bringert
- 17:31:23 [Zakim]
- -Patrick_Ehlen
- 17:31:24 [Zakim]
- -Debbie_Dahl
- 17:31:26 [Zakim]
- -Olli_Pettay
- 17:31:31 [Zakim]
- -Robert_Brown
- 17:31:36 [burn_]
- rrsagent, make log public
- 17:31:38 [Zakim]
- -Michael_Johnston
- 17:31:39 [Zakim]
- -Charles_Hemphill
- 17:31:40 [burn_]
- rrsagent, draft minutes
- 17:31:40 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
- 17:32:52 [Zakim]
- -Dan_Burnett
- 17:32:59 [burn_]
- zakim, bye
- 17:32:59 [Zakim]
- leaving. As of this point the attendees were Dan_Burnett, Milan_Young, Marc_Schroeder, +1.425.828.aaaa, Robert_Brown, Patrick_Ehlen, Charles_Hemphill, +44.208.785.aabb,
- 17:32:59 [Zakim]
- Zakim has left #htmlspeech
- 17:33:03 [Zakim]
- ... Satish_Sampath, +1.925.302.aacc, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell, Bjorn_Bringert, Dan_Druta, Debbie_Dahl, Raj_Tumuluri
- 17:33:53 [burn_]
- s/+1.425.828.aaaa, //
- 17:34:08 [burn_]
- s/, +44.208.785.aabb//
- 17:34:21 [burn_]
- s/, +1.925.302.aacc//
- 17:34:26 [burn_]
- rrsagent, draft minutes
- 17:34:26 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
- 17:36:56 [burn_]
- s/17:32:59 [Zakim] Zakim has left #htmlspeech//
- 17:37:02 [burn_]
- rrsagent, draft minutes
- 17:37:02 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
- 17:39:10 [burn_]
- s/, Charles_Hemphill,/, Charles_Hemphill, Satish_Sampath, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell, Bjorn_Bringert, Dan_Druta, Debbie_Dahl, Raj_Tumuluri/
- 17:39:14 [burn_]
- rrsagent, draft minutes
- 17:39:14 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
- 17:58:30 [ddahl]
- ddahl has left #htmlspeech
- 19:26:04 [smaug]
- smaug has joined #htmlspeech
- 21:08:37 [smaug]
- smaug has joined #htmlspeech