IRC log of htmlspeech on 2011-06-02

Timestamps are in UTC.

15:56:51 [RRSAgent]
RRSAgent has joined #htmlspeech
15:56:51 [RRSAgent]
logging to http://www.w3.org/2011/06/02-htmlspeech-irc
15:56:53 [Zakim]
+Michael_Bodell
15:56:57 [Zakim]
-Robert_Brown
15:57:09 [burn_]
trackbot, start telcon
15:57:11 [smaug]
Zakim, nick smaug is Olli_Pettay
15:57:11 [Zakim]
ok, smaug, I now associate you with Olli_Pettay
15:57:11 [trackbot]
RRSAgent, make logs public
15:57:13 [trackbot]
Zakim, this will be
15:57:13 [Zakim]
I don't understand 'this will be', trackbot
15:57:14 [trackbot]
Meeting: HTML Speech Incubator Group Teleconference
15:57:14 [trackbot]
Date: 02 June 2011
15:57:25 [burn_]
Chair: Dan Burnett
15:57:34 [burn_]
Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0006.html
15:57:37 [mbodell]
mbodell has joined #htmlspeech
15:58:00 [burn_]
zakim, nick mbodell is Michael_Bodell
15:58:00 [Zakim]
ok, burn_, I now associate mbodell with Michael_Bodell
15:58:29 [burn_]
zakim, who's here?
15:58:29 [Zakim]
On the phone I see Dan_Burnett, Marc_Schroeder, Milan_Young, Charles_Hemphill, Satish_Sampath, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell
15:58:32 [Zakim]
On IRC I see mbodell, RRSAgent, bringert, smaug, rvid, Michael, Robert, Charles, Milan, marc, Zakim, burn_, trackbot
15:58:58 [Zakim]
+Ronald
15:59:26 [burn_]
zakim, Ronald is Bjorn_Bringert
15:59:26 [Zakim]
+Bjorn_Bringert; got it
15:59:44 [Zakim]
+Dan_Druta
16:00:05 [DanD]
DanD has joined #htmlspeech
16:00:12 [Zakim]
+[Microsoft]
16:00:18 [Zakim]
+??P56
16:00:19 [burn_]
zakim, nick DanD is Dan_Druta
16:00:19 [Zakim]
ok, burn_, I now associate DanD with Dan_Druta
16:00:42 [Raj]
Raj has joined #htmlspeech
16:00:52 [Robert]
can you hear me?
16:01:10 [Zakim]
-[Microsoft]
16:01:16 [Zakim]
+Debbie_Dahl
16:01:24 [burn_]
zakim, ??P56 is Raj_Tumuluri
16:01:24 [Zakim]
+Raj_Tumuluri; got it
16:01:38 [ddahl]
ddahl has joined #htmlspeech
16:01:44 [Zakim]
-Debbie_Dahl
16:01:52 [glen]
glen has joined #htmlspeech
16:02:00 [Zakim]
+Patrick_Ehlen
16:02:17 [Zakim]
+Debbie_Dahl
16:02:18 [burn_]
zakim, nick glen is Glen_Shires
16:02:18 [Zakim]
ok, burn_, I now associate glen with Glen_Shires
16:02:22 [Zakim]
+[Microsoft]
16:02:41 [burn_]
zakim, [Microsoft] is Robert_Brown
16:02:42 [Zakim]
+Robert_Brown; got it
16:05:05 [burn_]
Scribe: Michael_Johnston
16:05:12 [burn_]
ScribeNick: Michael
16:05:35 [burn_]
Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0006.html
16:06:34 [Michael]
burn: start with review of face to face minutes, will review again next week
16:06:41 [Michael]
burn: comments on minutes
16:07:05 [Michael]
topic: updated final report document
16:08:11 [Michael]
burn: comments on draft at this point?
16:08:18 [Michael]
all: silence
16:08:36 [Michael]
topic: agreed upon design decisions
16:08:49 [Michael]
topic: additional issues to add to list of issues
16:09:55 [Zakim]
-Satish_Sampath
16:10:53 [Michael]
michael: does move to have emma document in dom, remove impetus for json variant of emma
16:12:14 [Michael]
bjorn: have simple javascript api for accessing most common elements, dont need json variant of emma, for details can access emma object
16:12:24 [Michael]
milan: need to do xml parsing?
16:13:04 [Michael]
bodell: will be much the same as other http requests that return xml, dont need to parse
16:13:33 [Michael]
milan: are mobile devices a problem, verbosity of xml
16:13:43 [Michael]
bodell: know
16:13:51 [Michael]
s/know/no
16:14:24 [Michael]
dan: is there pressure from this group to build a json version of emma?
16:14:57 [Michael]
all: agreement: no push for json version of emma
16:15:13 [Michael]
burn: any other issues to add to list for discussion
16:15:37 [Michael]
topic: markup binding
16:16:01 [Michael]
bjorn: no feedback from chrome team yet
16:16:59 [Michael]
bodell: keep html binding lightweight, js constructor, simple for mechanism, small work to define, if dont want then remove the element
16:17:18 [Michael]
bodell: should not mess up js api
16:17:43 [burn_]
s/simple for /simple "for" /
16:17:58 [Michael]
olli: problem with the "for" attribute is what it can point to, what elements can be used as target, doesnt quite work with content editable, important use case
16:18:39 [Michael]
olli: clarifies issue, need to make clear which elements can be targets and what the semantics are
16:18:58 [Michael]
olli: also content editable areas
16:19:58 [Michael]
michael: have to define semantics when target is e.g. a drop down or radio button
16:20:09 [Michael]
olli: may be new kinds of elements also
16:20:38 [Michael]
bodell: assumption would be to bind to any element, but they would not all have to work,
16:20:55 [Michael]
bodell: some browsers would want to handle more input types
16:21:21 [Michael]
olli: reco would be element in the dom, what is the benefit of the reco
16:21:32 [Michael]
olli: if for is not used
16:21:49 [Michael]
bodell: google desire to have element with microphone click api
16:22:22 [Michael]
bjorn: have proposed several things along the way,
16:22:48 [Michael]
bjorn; most important aspect is to have an element you can click to start speaking without the pop up or info bar
16:22:57 [Michael]
s/bjorn;/bjorn:
16:23:20 [Michael]
olli: not clear what the element gives
16:23:47 [Michael]
robert: follow up with chrome folks
16:23:55 [Michael]
bjorn: still waiting on that
16:24:21 [Michael]
bjorn: do agree that html element discussion does not block the js api discussion
16:24:32 [Michael]
olli: issue may get solved along the way
16:25:06 [Michael]
burn: need to see concrete proposal to make decision
16:25:43 [Michael]
topic: crucial decisions partially discussed
16:26:16 [marc]
http://www.w3.org/2005/Incubator/htmlspeech/2011/05/f2fminutes201105.html
16:27:17 [Michael]
burn: will go through each ...
16:28:21 [Michael]
bjorn: audio capture topic is dealt with, should be default way, if there is an audio capture api will deal with then
16:28:35 [Michael]
burn: audio codecs mandatory
16:28:50 [Michael]
robert: even IP status around speex is unclear also
16:29:15 [Michael]
robert: are only reasonable answers pcm and mulaw, despite their flaws
16:29:24 [Michael]
bjorn: flac, high bandwidth
16:29:27 [bringert]
FLAC
16:29:38 [bringert]
http://en.wikipedia.org/wiki/Free_Lossless_Audio_Codec
16:29:50 [Michael]
milan: speex is in ietf draft on how to package in rtp
16:29:57 [Milan]
http://tools.ietf.org/html/draft-ietf-avt-rtp-speex-07
16:30:25 [Michael]
burn: rtp way to send it does not mean there are ip issues
16:30:43 [Michael]
bjorn: need to require some codecs or cant be interoperable
16:30:57 [Michael]
milan: problem sounds similar
16:31:11 [Michael]
bodell; rfc does not have a patent policy
16:31:24 [Michael]
s/bodell;/bodell:/
16:32:51 [Michael]
burn: if something is necessary to implement the spec, and it is encumbered with IP, need to make that clear
16:33:19 [Michael]
bjorn: need protocol for interoperability
16:33:27 [Michael]
milan: protocol for RTC
16:33:53 [Michael]
burn: opus, codecs from two organizations, trying to blend, not clear if IP issues are being resolved, making container
16:34:05 [Michael]
burn: can use either one if you have permission
16:35:03 [Michael]
burn: dont have an answer yet, really need one, industry wide problem, may not be ours to solve, return to this
16:35:40 [mbodell]
See http://lists.xiph.org/pipermail/speex-dev/2003-November/000753.html for some similar discussion on patent of speex
16:35:44 [Michael]
robert: will follow up re: speex again
16:37:45 [Michael]
milan: impact on protocol team if need to negotiate codec
16:37:57 [Michael]
mark: speex is not good enough for tts
16:37:59 [marc]
ogg vorbis
16:38:08 [Michael]
s/mark/marc/
16:38:12 [marc]
and FLAC
16:38:43 [Michael]
burn: few names as candidates flac, ogg vorbis, speex, pcm
16:38:59 [Michael]
bjorn: already use flac in launched clients
16:39:15 [Michael]
olli: ogg vorbis is core html audio
16:39:49 [smaug]
not core HTML audio. Some browsers just happen to support it
16:39:58 [Michael]
burn: candidates to consider flac, ogg vorbis, speex, pcm
16:40:54 [Michael]
topic: do we support audio streaming and how?
16:41:27 [Michael]
burn: think we expect streaming, less clarity on how
16:41:46 [Michael]
milan: sending audio on regular time intervals as it is collected or generated
16:42:48 [Michael]
bjorn: discussed how to get events while capturing
16:42:59 [Michael]
bjorn: how it is done is a protocol question
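A minimal sketch of what milan describes above, sending audio at regular time intervals as it is collected. It assumes raw 16 kHz, 16-bit, mono PCM and a 20 ms frame; the function name `pcm_frames` and all parameter values are hypothetical and were not agreed in the meeting.

```python
def pcm_frames(pcm: bytes, sample_rate: int = 16000, sample_width: int = 2,
               channels: int = 1, frame_ms: int = 20):
    """Yield fixed-duration chunks of raw PCM audio (the last one may be shorter)."""
    # Bytes needed to hold frame_ms milliseconds of audio.
    bytes_per_frame = sample_rate * sample_width * channels * frame_ms // 1000
    for start in range(0, len(pcm), bytes_per_frame):
        yield pcm[start:start + bytes_per_frame]


# One second of silence stands in for captured audio (assumption for the demo).
one_second = bytes(16000 * 2)
frames = list(pcm_frames(one_second))
```

Each yielded frame could be handed to whatever transport the protocol ends up defining; the transport itself is deliberately left out.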
16:43:16 [Michael]
burn: asr may begin before the user is
16:43:55 [Michael]
burn: finished speaking, result before engine comes
16:44:43 [Michael]
milan: without regular timed packets, wont get events on regular interval
16:44:54 [Michael]
bjorn: latency is what is app observable
16:45:06 [Michael]
bodell: having multiple events is not a big problem
16:45:13 [Michael]
bodell: data in events can deal with timing
16:45:36 [Michael]
milan: if app is realtime, five seconds ago got this event
16:46:13 [Michael]
bjorn: agree, what we need is low latency, not sure what we can require, part of being a good implementation
16:46:24 [Michael]
burn: market takes care of product requirements
16:46:47 [Michael]
robert: fair to say that standard should not have inherent limitations
16:46:57 [Michael]
robert: 50 ms or so is the threshold
16:47:23 [Michael]
bjorn: protocol design should not make it impossible to achieve low latency event delivery
16:47:47 [Michael]
marc: audio streaming in the tts case?
16:48:05 [Michael]
marc: send audio while still rendering rest of a long utterance
16:48:32 [Michael]
bodell: tts is generally fast enough that this is not a problem
16:49:54 [Michael]
marc: if tts has to process all text before returning audio, could be a problem,
16:50:30 [Michael]
marc: wants to make sure that what we create here does not prevent an implementation doing this
16:51:03 [Michael]
bjorn: up to engine whether it starts to synthesize
16:51:20 [Michael]
marc: wav format, header has filesize, makes proper streaming difficult
16:51:48 [Michael]
bjorn: protocol should make it possible for the tts to be streamed and start playing before
16:52:03 [Michael]
bjorn: synthesis is complete
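A sketch of marc's point about WAV and streaming: the canonical 44-byte PCM WAV header stores the payload size in two fields, so a synthesizer that has not finished rendering cannot fill them in correctly. The helper name `wav_header` is hypothetical, and writing 0xFFFFFFFF as a placeholder for an unknown length is a convention some tools use, not something decided in this meeting.

```python
import struct

UNKNOWN_SIZE = 0xFFFFFFFF  # placeholder when the total length is not yet known


def wav_header(data_size: int, sample_rate: int = 16000,
               channels: int = 1, bits: int = 16) -> bytes:
    """Build a 44-byte RIFF/WAVE header for uncompressed PCM audio."""
    block_align = channels * bits // 8
    byte_rate = sample_rate * block_align
    # The RIFF chunk size covers everything after the first 8 bytes.
    riff_size = min(36 + data_size, UNKNOWN_SIZE)
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", riff_size, b"WAVE",
        b"fmt ", 16, 1, channels, sample_rate, byte_rate, block_align, bits,
        b"data", data_size,
    )


complete = wav_header(3200)           # length known up front
streaming = wav_header(UNKNOWN_SIZE)  # length unknown while still synthesizing
```

A receiver that trusts the size fields will misbehave on the second header, which is why a container or framing that does not need the total length up front is easier to stream.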
16:53:22 [Michael]
burn: issue of supporting format coming back in video and
16:53:37 [Michael]
burn: playing the audio
16:54:03 [Michael]
bjorn: should not require playing audio from video
16:54:15 [Michael]
robert: api should not prevent this
16:54:50 [Michael]
burn: video with three audio tracks, how does the api select
16:55:15 [Michael]
robert: our proposal separated capture api from reco, could support different kinds of capture
16:55:35 [Michael]
burn: protocol design should not preclude streaming of video codecs
16:56:22 [Michael]
raj: why specify video?
16:56:55 [Michael]
robert: if codec can be packetized in real time should be ok
16:58:51 [Michael]
burn: the protocol should not inhibit the transmission of codecs that have similar requirements to audio?
16:59:55 [Michael]
topic: What is meant by "start of speech", "end of speech", and endpointing in general? How do transmission delays affect the definitions and what we want in terms of APIs?
17:01:13 [Michael]
bodell: issue of latency impacting times
17:01:25 [Michael]
bjorn; agreed UA being basis for the clock
17:01:37 [Michael]
s/bjorn;/bjorn:/
17:02:11 [Michael]
burn: dont have requirements for timing info from server
17:02:23 [Michael]
bodell: tts case?
17:02:45 [Michael]
bjorn: seems reasonable for server to include timing info
17:02:57 [Michael]
robert: could do offset from start
17:03:16 [Michael]
burn: something that UA can convert into UA local timestamp
17:03:25 [Michael]
burn: different ways to achieve that
17:03:50 [Michael]
burn: doesnt say what is made available in the api
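One way the "offset from start" idea could work, as a sketch only: the UA records its own clock when capture starts, the server reports each event as a millisecond offset from the start of the audio, and the UA adds the two. The names `to_ua_timestamp`, `ua_capture_start_ms` and `server_offset_ms` are hypothetical, and as burn notes there are different ways to achieve the same result.

```python
def to_ua_timestamp(ua_capture_start_ms: float, server_offset_ms: float) -> float:
    """Convert a server-reported offset into a timestamp on the UA's clock."""
    return ua_capture_start_ms + server_offset_ms


# Hypothetical values: capture began at 1000 ms on the UA clock, and the
# server reports events 250 ms and 1900 ms into the audio.
ua_capture_start_ms = 1000.0
event_times = [to_ua_timestamp(ua_capture_start_ms, off) for off in (250.0, 1900.0)]
```

Because only offsets cross the network, the server never needs to know or agree with the UA's clock.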
17:05:39 [Michael]
bodell: many different times, when the utterance starts etc
17:05:49 [Michael]
bodell: when received,
17:06:12 [Michael]
marc: impact on order that events are received
17:06:30 [Michael]
milan: will UA generate these events when using remote service
17:06:58 [Michael]
bodell: may assume energy detector gives you end of speech, before reco gives end of speech, hard to guarantee order
17:07:10 [Michael]
milan: start of energy is different than start of speech
17:07:37 [Michael]
milan: hard to write web app if get two start of speech events
17:08:26 [Michael]
bodell: different events,
17:08:38 [Michael]
bodell: was fixed order for the non continuous case
17:09:41 [Michael]
charles: could arrange fixed order delivery, even if times inside do not reflect this
17:09:59 [Michael]
bodell: not practical to hold events and put them in the desired order
17:11:25 [Michael]
burn: energy detector gets end of sound, then will get actual end of speech with better timing info, either get two or throw away better info
17:11:39 [Michael]
marc: dont want to override better info from remote service
17:12:11 [Michael]
burn: front is for optimization so dont have to send all the audio
17:13:06 [Michael]
bodell: events could be in different orders
17:13:18 [Michael]
bodell: not convinced in having standard order
17:13:35 [Michael]
milan: UA only have sound start, sound end
17:14:25 [Michael]
milan: avoid duplication,
17:14:39 [Michael]
bodell: already have different event names
17:15:17 [Michael]
robert: in name need to make clear some events are from energy
17:15:34 [Michael]
robert: detector others are from speech reco
17:15:44 [Michael]
milan: source of events
17:16:58 [Michael]
bodell: unmake statement about specific ordering
17:17:15 [Michael]
milan: new statement that the only events the user agent can insert are energy related events
17:17:31 [Michael]
marc: and probably capture start and end
17:17:50 [Michael]
charles: seems strong since speech service might or might not be remote
17:19:00 [Michael]
burn: removed ordering
17:19:18 [Michael]
burn: energy detector can only generate sound start stop
17:19:44 [Michael]
burn: speech service can only deliver the speech start stop
17:20:21 [Michael]
charles: if not order, can guarantee delivery
17:20:27 [Michael]
burn; how to guarantee it '
17:21:23 [Michael]
s/burn;/burn:/
17:21:58 [Michael]
milan: as long as have single source for events
17:22:08 [Michael]
michael: (need a blackboard for this)
17:22:34 [Michael]
bodell: solved by removing required ordering
17:22:50 [Michael]
bodell: allows all the use cases
17:23:29 [Michael]
bodell: also works with continuous case
17:23:44 [Michael]
bodell: thought had solved the issue
17:23:53 [Michael]
burn: but start before end?
17:24:07 [Michael]
burn: can get end without having seen a start
17:24:41 [Michael]
milan: reluctant to give up the ordering, if have single source for each type of event
17:25:33 [Michael]
burn: agreed speech service can only generate one, can't guarantee that they wont cross in time
17:25:49 [Michael]
milan: use remote speech service as the canonical
17:26:47 [Michael]
bodell: easiest to understand cross for end, UA would raise both events in the order they occurred
17:27:08 [Michael]
milan: it is possible to impose an ordering
17:27:46 [Michael]
milan: pros and cons, flexibility, or predictability for the web app developer
17:29:38 [Michael]
bjorn: events from the same source should be in the same order
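A sketch of the position the discussion ends on: no ordering is promised across sources, but events from one source keep their own order. It assumes two sources, a UA energy detector and a remote speech service, each producing events tagged with a UA-clock time in milliseconds; the event names and times are hypothetical. `heapq.merge` reads each input sequentially, so it never reorders events within a single source.

```python
import heapq

# (time_ms, source, event name); all values are illustrative assumptions.
ua_events = [(100, "ua", "soundstart"), (900, "ua", "soundend")]
service_events = [(180, "service", "speechstart"), (850, "service", "speechend")]

# Interleave by time while preserving each source's internal order.
merged = list(heapq.merge(ua_events, service_events, key=lambda e: e[0]))
names = [name for _, _, name in merged]
```

Here the service's speech end arrives before the UA's sound end, which is the kind of crossing the group decided not to rule out.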
17:31:19 [Zakim]
-Michael_Bodell
17:31:21 [Zakim]
-Raj_Tumuluri
17:31:21 [Zakim]
-Dan_Druta
17:31:22 [Zakim]
-Milan_Young
17:31:22 [Zakim]
-Marc_Schroeder
17:31:23 [Zakim]
-Bjorn_Bringert
17:31:23 [Zakim]
-Patrick_Ehlen
17:31:24 [Zakim]
-Debbie_Dahl
17:31:26 [Zakim]
-Olli_Pettay
17:31:31 [Zakim]
-Robert_Brown
17:31:36 [burn_]
rrsagent, make log public
17:31:38 [Zakim]
-Michael_Johnston
17:31:39 [Zakim]
-Charles_Hemphill
17:31:40 [burn_]
rrsagent, draft minutes
17:31:40 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
17:32:52 [Zakim]
-Dan_Burnett
17:32:59 [burn_]
zakim, bye
17:32:59 [Zakim]
leaving. As of this point the attendees were Dan_Burnett, Milan_Young, Marc_Schroeder, +1.425.828.aaaa, Robert_Brown, Patrick_Ehlen, Charles_Hemphill, +44.208.785.aabb,
17:32:59 [Zakim]
Zakim has left #htmlspeech
17:33:03 [Zakim]
... Satish_Sampath, +1.925.302.aacc, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell, Bjorn_Bringert, Dan_Druta, Debbie_Dahl, Raj_Tumuluri
17:33:53 [burn_]
s/+1.425.828.aaaa, //
17:34:08 [burn_]
s/, +44.208.785.aabb//
17:34:21 [burn_]
s/, +1.925.302.aacc//
17:34:26 [burn_]
rrsagent, draft minutes
17:34:26 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
17:36:56 [burn_]
s/17:32:59 [Zakim] Zakim has left #htmlspeech//
17:37:02 [burn_]
rrsagent, draft minutes
17:37:02 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
17:39:10 [burn_]
s/, Charles_Hemphill,/, Charles_Hemphill, Satish_Sampath, Glen_Shires, Michael_Johnston, Olli_Pettay, Michael_Bodell, Bjorn_Bringert, Dan_Druta, Debbie_Dahl, Raj_Tumuluri/
17:39:14 [burn_]
rrsagent, draft minutes
17:39:14 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/06/02-htmlspeech-minutes.html burn_
17:58:30 [ddahl]
ddahl has left #htmlspeech
19:26:04 [smaug]
smaug has joined #htmlspeech
21:08:37 [smaug]
smaug has joined #htmlspeech