15:50:00 RRSAgent has joined #htmlspeech
15:50:00 logging to http://www.w3.org/2011/04/07-htmlspeech-irc
15:50:09 Zakim has joined #htmlspeech
15:50:17 trackbot, start telcon
15:50:19 RRSAgent, make logs public
15:50:21 Zakim, this will be
15:50:21 I don't understand 'this will be', trackbot
15:50:22 Meeting: HTML Speech Incubator Group Teleconference
15:50:22 Date: 07 April 2011
15:52:16 zakim, this will be htmlspeech
15:52:16 ok, burn; I see INC_(HTMLSPEECH)12:00PM scheduled to start in 8 minutes
15:53:17 Chair: Dan Burnett
15:53:30 Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Apr/0002.html
15:54:04 INC_(HTMLSPEECH)12:00PM has now started
15:54:10 +??P26
15:54:25 zakim, code?
15:54:25 the conference code is 48657 (tel:+1.617.761.6200 tel:+33.4.26.46.79.03 tel:+44.203.318.0479), burn
15:54:34 good, apparently the network is good enough here
15:54:43 +Dan_Burnett
15:54:45 Zakim, ??P26 is Olli_Pettay
15:54:45 +Olli_Pettay; got it
15:54:53 zakim, I am Dan_Burnett
15:54:53 ok, burn, I now associate you with Dan_Burnett
15:55:36 zakim, nick smaug_ is Olli_Pettay
15:55:36 ok, burn, I now associate smaug_ with Olli_Pettay
15:57:09 +Milan_Young
15:57:46 Milan has joined #htmlspeech
15:57:51 +Michael_Bodell
15:58:09 zakim, nick Milan is Milan_Young
15:58:09 ok, burn, I now associate Milan with Milan_Young
15:59:43 +??P32
15:59:56 + +1.818.237.aaaa
15:59:56 zakim, ??P32 is Raj_Tumuluri
15:59:57 +Raj_Tumuluri; got it
16:00:28 +AZ
16:00:29 zakim, aaaa is Patrick_Ehlen
16:00:29 +Patrick_Ehlen; got it
16:00:36 + +1.425.421.aabb
16:00:37 +Jerry_Carter
16:00:50 + +1.425.391.aacc
16:01:40 zakim, aabb is Robert_Brown
16:01:40 +Robert_Brown; got it
16:01:44 ddahl has joined #htmlspeech
16:01:46 zakim, aacc is Dan_Druta
16:01:46 +Dan_Druta; got it
16:01:59 zakim, AZ is bringert
16:01:59 +bringert; got it
16:02:21 zakim, AZ is Bjorn_Bringert
16:02:22 sorry, burn, I do not recognize a party named 'AZ'
16:02:25 +Debbie_Dahl
16:02:36 zakim, bringert is Bjorn_Bringert
16:02:36 +Bjorn_Bringert; got it
16:02:42 Robert has joined #htmlspeech
16:02:59 zakim, mute Bjorn_Bringert
16:02:59 Bjorn_Bringert should now be muted
16:03:20 zakim, unmute Bjorn_Bringert
16:03:20 Bjorn_Bringert should no longer be muted
16:03:39 zakim, nick Robert is Robert_Brown
16:03:46 ok, burn, I now associate Robert with Robert_Brown
16:03:47 zakim, nick ddahl is Debbie_Dahl
16:03:50 ok, burn, I now associate ddahl with Debbie_Dahl
16:05:03 Scribe: Debbie Dahl
16:05:07 ScribeNick: ddahl
16:05:08 MichaelBodell has joined #htmlspeech
16:05:42 Meeting: HTML Speech XG
16:05:44 zakim, nick MichaelBodell is Michael_Bodell
16:05:44 ok, burn, I now associate MichaelBodell with Michael_Bodell
16:06:26 topic: f2f logistics and planning
16:06:43 bjorn: several people have asked for rooms
16:06:55 ...is there anyone else?
16:07:04 dan: I will need a room
16:07:24 +Michael_Johnston
16:07:28 bjorn: I need the maximum number of days that you'll stay there.
16:07:51 ...is anyone opposed to a better hotel, costs 7 GBP more?
16:08:09 -Olli_Pettay
16:08:15 bjorn: i will switch us to a better hotel
16:08:41 ...will send out a form to see how many are coming
16:09:03 dan_druta: will come and let you know.
16:09:15 raj: will come
16:09:29 bjorn: nothing else about arrangements
16:09:35 +??P14
16:09:41 topic: open questions about proposals
16:10:00 zakim, ??P14 is Olli_Pettay
16:10:00 +Olli_Pettay; got it
16:10:33 ehlen has joined #htmlspeech
16:10:35 Raj has joined #htmlspeech
16:11:24 danB: new person
16:11:35 patrick: Patrick Ehlen from ATT
16:12:11 dan: for each proposal would like to hear a quick summary of what your proposal does and doesn't do with respect to the other proposals.
16:13:06 ...proposers should just take the floor and discuss, even if other proposers may want to make a correction.
16:13:24 ...bjorn starts.
16:13:39 zakim, nick ehlen is Patrick_Ehlen
16:13:39 ok, burn, I now associate ehlen with Patrick_Ehlen
16:13:53 bjorn: MS proposal, there aren't a lot of commonalities between ASR and TTS.
16:13:56 satish has joined #htmlspeech
16:14:03 ...is that correct?
16:14:19 danB: will discuss later
16:14:38 zakim, nick Raj is Raj_Tumuluri
16:14:38 ok, burn, I now associate Raj with Raj_Tumuluri
16:14:44 bjorn: MS includes both a Javascript API and a browser-server protocol
16:14:53 zakim, who is on the phone?
16:14:53 On the phone I see Dan_Burnett, Milan_Young, Michael_Bodell, Raj_Tumuluri, Patrick_Ehlen, Bjorn_Bringert, Robert_Brown, Jerry_Carter (muted), Dan_Druta, Debbie_Dahl,
16:14:57 ... Michael_Johnston, Olli_Pettay
16:15:03 ...would like to break these apart
16:15:31 ...sums up MS proposal, but thinks that MS API and Google proposal could be merged.
16:15:51 zakim, nick satish is Satish_Sampath
16:15:51 sorry, satish, I do not see a party named 'Satish_Sampath'
16:17:28 ...Mozilla proposal is similar to Google, but Mozilla doesn't allow user-initiated recognition without a permission prompt, while Google does, and this is an important use case for us.
16:17:46 DanD has joined #htmlspeech
16:18:10 zakim, nick DanD is Dan_Druta
16:18:10 ok, burn, I now associate DanD with Dan_Druta
16:18:27 bjorn: the proposal for the WebApp API could say what implementation is used, and there could be a different proposal for how the browser talks to that implementation.
16:18:57 olli: how does Google's proposal do that without click-checking?
16:19:22 bjorn: the browser must make it clear that it's starting recognition
16:19:32 dan: click jacking or click checking?
16:19:51 bjorn: should be click jacking, not checking
16:20:15 ...there should also be click checking to make sure that it was really the user.
16:20:37 dan: switch to olli's discussion now.
16:21:57 olli: the differences are minor. wasn't thinking much about the network engine.
16:22:18 ...about Google's proposal, it seems that it would be difficult to handle multiple fields at once
16:22:37 ...that's one reason why X+V was so difficult
16:23:14 ...wouldn't like to bind recognition results to one input field
16:24:29 ...also, user-initiated recognition, I don't see the difference if the user is clicking something and that starts recognition, that could be ok at first, so I don't see the difference between Mozilla's and Google's proposals.
16:25:14 ...MS proposal using Web Sockets is minor but could be good if we want to allow remote speech engines
16:26:24 milan: question for Olli, you said that we must handle click-jacking?
16:27:13 olli: not sure how Google's proposal handles this
16:27:28 milan: in summary, you don't find that sufficient?
16:27:49 olli: no
16:28:43 robert: I agree that if you look at the high-level scripting API the proposals are similar.
16:29:14 ...the high-level speech semantics are very similar and we should be able to converge pretty easily.
16:29:28 ...there are only so many ways to build a speech API
16:30:04 ...one of the things that we're trying to achieve is to allow a lot of openness so that the ASR and TTS is not determined by the manufacturer of the browser.
16:30:54 ...one thing I'm concerned about with Google and Mozilla is that there's an intent to handle that later, but I think it needs to be handled now. we need in the first version to handle some interoperability.
16:31:46 ...what could we do to provide a simple protocol with existing APIs? we proposed XHR, but Web Sockets would be fine. we wanted to say that it's not a hard problem.
16:32:36 ...the second comment is that we tried to take a scenario-focused approach. our document specified a half-dozen or so apps, and tried to think about requirements.
16:32:47 ...this is why we put ASR and TTS into the same spec
16:33:17 ...there are a number of scenarios that would be difficult if the speech was just built into the browser
16:34:00 ...a comment about user-initiated speech. we're skeptical that just having a button that the user pushes ensures privacy.
16:34:25 ...there will be many kinds of devices, we believe that consent should be built into the browser implementation.
16:35:07 ...we don't want the speech API to be a de facto microphone API, we should provide microphone requirements to an existing effort.
16:35:52 ...on the question of v1 vs v2. we aren't opposed to a second version, but we don't want v1 just to be the easy things, it should include the important things.
16:36:38 ...regarding TTS, it takes the things that seem to work from the media element, but not the things that don't apply, like multiple tracks.
16:36:50 -Olli_Pettay
16:36:59 dan: questions for Robert?
16:37:17 milan: robert, how does your proposal handle a default recognizer?
16:37:56 +??P0
16:38:07 zakim, P0 is Olli_Pettay
16:38:07 sorry, burn, I do not recognize a party named 'P0'
16:38:14 zakim, ??P0 is Olli_Pettay
16:38:14 +Olli_Pettay; got it
16:38:19 robert: you use a constructor without that parameter; if there are multiple recognizers available you could use those parameters to select an appropriate one.
16:39:38 danB: this discussion will be more unstructured and open. next week we'll have a more structured discussion. first bjorn will get a chance to respond.
16:40:41 bjorn: regarding olli's point about multiple input fields, you could do that with scripting, or we could use something like MS.
16:40:55 ...(missed comment about random selection)
16:42:29 ...on the question of whether clicking implies consent, we say that clicking could imply consent, but there could be other ways. Also agree that other engines could be used, but one way to do that would be for a vendor like Nuance to write a plugin.
16:43:01 ...you could have a Javascript API with a parameter that says which engine would be used.
16:43:37 ...would like some clarification on what use cases couldn't be supported by the default recognizer
16:44:00 ...we agree that we don't want to work on a microphone API
16:44:26 dan: the floor is open. question for bjorn about the click-to-speak issue
16:45:22 ...there could be a button to click but that doesn't necessarily imply consent. it is still the browser's responsibility to ensure consent.
16:45:45 bjorn: a button could ensure consent.
16:46:30 dan: the browser could even treat lack of clicking as consent in some use cases.
16:47:05 raj: another use case for not using the default recognizer might be if you have an SLM, since SLMs aren't interoperable.
16:47:50 danB: does Google.com want the default recognizer in IE to be the MS recognizer?
16:48:20 ...individual sites may have a strong preference for a recognizer to be used.
16:49:11 robert: for example, Nuance has a lot of enterprise customer care speech applications, and customers will want to leverage that investment.
16:50:26 danD: if the web developer wants to specify an engine they should be able to do that. the browser should provide a default. also the user should be able to specify a recognizer.
16:51:20 danB: if the user has asked for another recognizer, then the web application should be able to not render.
16:51:37 robert: we've already agreed to this
16:51:59 bjorn: doesn't disagree
16:52:11 jerry: what about local resources?
16:52:19 bjorn: everyone agrees on that
16:52:52 jerry: many free-form grammars would only work with certain engines
16:53:52 bjorn: we have broad agreement. with the MS proposal we could split control of the recognizer from selection of the recognizer.
16:54:55 robert: in principle that would be reasonable, but don't want to lose track of one of those topics.
16:56:01 danB: what we do with TTS and ASR should be synchronized.
16:56:58 ...some use cases only involve TTS, for example.
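[Editor's sketch of the engine-selection model discussed above. `SpeechRecognizer` and its `engineURI` option are illustrative names invented for this example, not identifiers from any of the submitted proposals: a parameterless constructor yields the user agent's default recognizer, while a site with specific needs (e.g. its own SLM or enterprise engine) can name one explicitly.]

```javascript
// Hypothetical API sketch -- names are invented for illustration only.
class SpeechRecognizer {
  constructor(options = {}) {
    // No engineURI given -> fall back to the browser's default engine,
    // as in the Google and Mozilla proposals.
    this.engineURI = options.engineURI || "default";
  }
}

// Default recognizer: the web app does not care which engine is used.
const defaultReco = new SpeechRecognizer();

// Site-specified engine: a hypothetical remote ASR endpoint, for sites
// that want to leverage an existing engine investment.
const customReco = new SpeechRecognizer({
  engineURI: "wss://asr.example.com/reco"
});
```

[This keeps the "mandatory to implement" default intact while leaving selection open, matching the group's broad agreement that developer, browser, and user can each influence which recognizer is used.]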
16:57:18 milan: reluctant to split the solution (tts, asr, protocol) into many documents because vendors may choose to implement only select pieces
16:57:31 bjorn: two different things, ASR vs. TTS and web app vs. server
16:57:44 ...does anyone have concerns about splitting?
16:58:08 milan: only that browsers might cherry-pick specs
16:58:22 robert: would not ratify one spec if the other wasn't satisfactory
16:58:48 michaelB: if they were together it would be easier to keep things in synch.
16:59:27 bjorn: the web app api could be done, and then the server-side one could depend on that.
16:59:48 michael: the questions about synch and ratifying at the same time argue for one proposal.
17:00:17 danD: if we had two efforts it would speed up adoption but it still should be one spec.
17:01:06 bjorn: there should be a single API for the web app and another one for how the browser talks to the engine.
17:02:06 milan: I had a proposal for a way to unify the Mozilla proposal and MS proposal by using macros over the MS proposal to make it look more like Mozilla.
17:02:23 bjorn: that's mostly syntactic
17:02:30 Milan's email (thread): http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Mar/0040.html
17:02:39 milan: the MS proposal talks to the server in the web app
17:03:11 danD: we should move away from syntax to more declarative types of statements.
17:04:08 robert: not sure; for example, in SALT, html elements were simple, but then you had to write a lot of scripting
17:04:20 bjorn: this is a different discussion
17:05:24 michaelJ: it's important to keep the specification of the API to the server because there will be a lot of overlap between the APIs. we don't want to end up with different names for things.
17:06:05 milan: even if we had separate specifications, they should be ratified together.
17:06:24 bjorn: that seems reasonable, and they should be developed in parallel
17:07:08 ...they could be separate so that people who write webapps only have to look at one thing. Also, they could go to different standards organizations.
17:08:27 danB: at IETF, talked about real-time collaboration between web browsers (RTC web); won't be working on new protocols. the interface from browser to engine will introduce some requirements.
17:08:50 bjorn: there could be several protocols for talking to servers, so that more could be added later
17:09:16 ...for example, VoiceXML and SRGS aren't in the same spec
17:10:13 milan: we agreed that there should be a protocol for communicating with a speech server.
17:11:17 danB: there could be a "mandatory to implement" requirement. any web app API is not complete unless it includes a "mandatory to implement" requirement for server communication that is defined by this group.
17:11:55 ...we should begin to do this because our requirements are different.
17:12:25 robert: the VoiceXML/SRGS analogy is different because SRGS can be used independently; ours are tightly coupled.
17:12:56 bjorn: the web app API makes sense by itself, and so does the server API
17:13:27 ...we are implementing both of those at Google. We have non-browser clients that use the server API
17:13:55 milan: how about an MRCP-over-HTTP protocol?
17:14:05 ...are people familiar with MRCP?
17:14:21 bjorn: seems a lot more complex than the MS proposal
17:14:33 milan: MS is a simplified version of MRCP
17:15:00 robert: that is kind of what we've done; could also do MRCP over Web Sockets.
17:15:59 raj: MRCP is a good idea, because it's already been implemented, but wouldn't it be overkill for a local system?
17:16:11 milan: most OSes would optimize that
17:16:27 bjorn: it's more than just efficiency
17:16:49 milan: talking about using the MRCP paradigm, not full MRCP
17:18:16 dan: MRCP is a protocol that just controls ASR and TTS resources. MRCP v2 makes use of SIP to set up an MRCP session, but from then on all communication is MRCP. milan is talking about the MRCP protocol itself, which doesn't require SIP.
17:18:34 milan: would be willing to stage this.
17:18:49 jerry: MRCP in the browser is very messy.
17:19:07 robert: could we layer MRCP over Web Sockets?
17:19:52 milan: i'm not suggesting that developers would program to MRCP; in a web app you would have to have simpler concepts, or the browser could support it, which would totally mask it from the developer.
17:20:13 danD: it should definitely be abstracted from the web browser
17:20:36 ...it will enable both weekend and enterprise developers to use the spec
17:21:13 dan: any other topics that require discussion?
17:21:36 danD: the proposals lack clarity around privacy, preferences and consent.
17:21:58 bjorn: they should be up to user agents
17:22:14 danD: we need to put some mandates on the developers of user agents.
17:22:40 robert: for example, a way to indicate to the recognizer that it shouldn't log?
17:23:09 danD: yes, should have a very clear indication of what the user can specify or override in regard to the speech interaction
17:23:58 robert: it depends highly on the user agent itself. a cell phone is different from the dashboard of a car or one that's being used by a blind person. I don't feel comfortable mandating something.
17:24:36 danD: for example, where do we display what engine is being used? do we want to have a consistent way for the user to specify their profile?
17:25:09 danB: it seems clear that we must address this topic in a specification
17:25:26 bjorn: about protocols vs. web APIs.
17:25:47 we also need to discuss the microphone API
17:25:58 ...there was discussion about protocols, but we didn't talk much about web APIs. we pretty much agree on web APIs.
17:26:07 michaelB: not sure about the details.
17:26:33 michaelJ: agree, details need to be worked out.
17:26:49 bjorn: yes, but we seem to agree at a high level.
17:27:21 robert: we seem to be moving in the direction of a JavaScript API, although not in HTML, or the protocol.
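[Editor's sketch of the "MRCP paradigm over Web Sockets" idea discussed above. The header names echo MRCPv2 conventions, but this text framing is invented purely for illustration; it is not the wire format of any submitted proposal, and in practice the browser could build and send such messages itself, masking MRCP entirely from the web developer.]

```javascript
// Illustrative only: an MRCP-style RECOGNIZE request framed as one text
// message, suitable for sending over a WebSocket. Not a real protocol.
function buildRecognizeRequest(requestId, grammarURI) {
  return [
    "RECOGNIZE " + requestId,            // method + request id, MRCP-style
    "Channel-Identifier: speechrecog",   // which resource the request targets
    "Content-Type: text/uri-list",       // body carries a grammar reference
    "",                                  // blank line separates headers/body
    grammarURI
  ].join("\r\n");
}

const msg = buildRecognizeRequest(1, "http://example.com/grammars/city.grxml");

// In a browser, this could be sent over a Web Socket, e.g.:
//   const ws = new WebSocket("wss://asr.example.com/mrcp");
//   ws.onopen = () => ws.send(msg);
```

[The point of the sketch is milan's "MRCP paradigm, not full MRCP": the request/response vocabulary is reused, while SIP session setup is replaced by the WebSocket handshake.]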
17:27:46 bjorn: if we start on the web api, there are a lot of things we could agree on.
17:28:36 danB: major issues need to be worked out early in the process, but it's also good to be able to make progress. so we need to be able to do both at the same time.
17:29:19 ...that is, discuss big issues and work out details of things we roughly agree on.
17:30:29 michaelB: agree, this is a reason it's useful to have things in the same document.
17:31:07 danB: Michael and I will talk about how to structure the discussion, e.g. write down things we agree on.
17:31:38 ...it might be too soon to work out details of proposals.
17:32:22 robert: one thing we don't agree on is the microphone api.
17:32:29 milan: also result format
17:32:57 (we may not need to think about the microphone if we move to use audio streams)
17:33:00 -Patrick_Ehlen
17:33:01 -Jerry_Carter
17:33:01 -Michael_Johnston
17:33:03 -Dan_Druta
17:33:03 -Raj_Tumuluri
17:33:03 -Olli_Pettay
17:33:04 -Milan_Young
17:33:05 -Michael_Bodell
17:33:07 -Bjorn_Bringert
17:33:11 -Robert_Brown
17:33:54 zakim, who is here?
17:33:54 On the phone I see Dan_Burnett, Debbie_Dahl
17:33:55 On IRC I see DanD, satish, Raj, ehlen, MichaelBodell, ddahl, Zakim, RRSAgent, burn, smaug_, trackbot
17:34:47 -Debbie_Dahl
17:34:57 -Dan_Burnett
17:34:59 INC_(HTMLSPEECH)12:00PM has ended
17:35:00 zakim, who is on the phone?
17:35:00 Attendees were Dan_Burnett, Olli_Pettay, Milan_Young, Michael_Bodell, +1.818.237.aaaa, Raj_Tumuluri, Patrick_Ehlen, +1.425.421.aabb, Jerry_Carter, +1.425.391.aacc, Robert_Brown,
17:35:02 ... Dan_Druta, Debbie_Dahl, Bjorn_Bringert, Michael_Johnston
17:35:04 apparently INC_(HTMLSPEECH)12:00PM has ended, burn
17:35:05 On IRC I see satish, Raj, ehlen, MichaelBodell, ddahl, Zakim, RRSAgent, burn, smaug_, trackbot
17:35:31 rrsagent, draft minutes
17:35:31 I have made the request to generate http://www.w3.org/2011/04/07-htmlspeech-minutes.html burn
17:35:35 rrsagent, make log public
17:38:23 zakim, bye
17:38:23 Zakim has left #htmlspeech
17:38:32 rrsagent, bye
17:38:32 I see no action items