See also: IRC log
<inserted> ScribeNick: dom
hta: this is the media capture
task force meeting as part of the WebRTC WG F2F
... We're hoping to be at the stage where we stop making big
changes to the document
... we need to review a last set of changes before we're sure
we're done
<stefanh> mom: http://lists.w3.org/Archives/Public/public-media-capture/2014Oct/0186.html
stefanh: minutes approved
stefanh: this topic has been
discussed in the task force before
... it's been quite a heated debate
hta: we have a TAG representative on the phone
domenic: that would be me
... I work for Google, and was elected to the TAG about a year ago
hta: we wanted someone from the TAG to give us the key points on why imposing that restriction might be a good idea
domenic: the TAG has made a
statement that we should move sensitive APIs to authenticated
origins over time
... i.e. a deprecation plan for using getUserMedia on plain
http
ekr: I find that analysis uncompelling
... the attacker can detour you from an insecure origin
domenic: a more interesting
technical question is whether this provides protection to the
user
... we've seen that it does
<alexG> domenic: we think authenticated origins provide more protection against proven attacks.
<scribe> scribenick: alexG
ekr: supposing that https is
enough even if you connect to an attacker site is not going to
work.
... the problem is not asking for GUM access over https
... the problem is knowing when you can trust a site, whether it is
http / https
domenic: at least with https we have a little bit more control and can add red flags in the address bar.
<npdoty> it sounds like ekr is arguing that the user doesn't have any reason to trust the origin on http-only, but that it should be allowed in that case anyway
ekr: what is the way forward with this question in the absence of specific use cases one way or another?
domenic: i just wanted to convey the position of the TAG, thank you
hta: this is a good overview of the disagreements today
domenic: .... The specs should be consistent or it is not going to work. We should make things better for users down the road. Nobody is proposing anything crazy, really.
hta: this is actually how the first message came across. that might be the reason for the ... strength ... of the reaction
domenic: I'm happy I clarified this then.
adamR: justin, do you have a comment on this conversation?
<npdoty> it sounded to me like TAG was not suggesting a "flag day"
justin: Chrome would like to move to more https in the future, but breaking existing content is not OK. We should have a multi-year deprecation process, but having a flag day today is not going to happen.
ekr: couldn't we move ahead with GUM as it is today?
dom: I heard your point, and I find it compelling.
Today, GUM works on any origin, and there is no compelling reason that we see to move away from that today.
domenic: the worst outcome would be for users to see that you are putting together specs without making sure you're looking toward the future, just saying that's how we do it today, and so be it. We would like you to have a future direction statement.
ekr: what about a non-normative note?
hta: what about a statement like "a conformant browser MAY decide to make this available over https only"?
<npdoty> that doesn't lend itself to interoperability
ekr: I thought it was the case, but if it's not, I'd be happy to do it.
hta: let's make it the case.
matthew: source selection
<npdoty> +1
domenic: someone brought up the interop issue: one problem would be if one browser worked only over https and another one did not; then the call could not be established.
ekr: can we stop saying that
people who don't want to go for https don't care about the
users?
... we have disagreement here, and let's agree to
disagree.
... can we state that everybody wants to do what is right for
the user, we just disagree about how to do it?
domenic: ok
matthew: I was observing that the
GUM API as it stands might not have this property
... but we have other things in the spec that potentially
change the profile, hmm, the usage of this thing.
getUserDevices makes it possible to enumerate devices and expose some things. It does not change the security profile, but gives us the capacity to expose more or less information, which in turn influences the user's decision
martin: is there an opportunity there to use this ?
ekr: my bad :)
dom: we have a rough
agreement
... that non-normative note is good
domenic: yes, and I encourage that note to recommend https
hta: any volunteers to draft this note?
ekr: I suggest that justin does it
hta: ekr has volunteered to do
it, and requested help from justin.
... domenic, thank you for showing up and enabling this
discussion
dom: we're happy for you to stay of course if you want
domenic: ok, good luck guys
jib: presenting
mediastreamtrack.ended
... I'm just going to show the problem
... present a solution
... and then we can speak about it
... JS arrow functions as an example
... <slide 3>
I'm also going to use promises
jib: here is background info and links
burn: and it will also be part of the specs
jib: <slide 4>
... we have this ended event
... the only thing it tells you is that the track has
ended.
... two kind of problems
... the call could have been hung up or dropped (which one?)
... GUM could have stopped capturing, had a permission problem,
or hit a driver issue (which one?)
... <slide 5>
... so I propose an ended promise
... lets you differentiate between two cases: success and
failure
... consistent with usage of promises for state change
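For illustration, a minimal sketch of what the proposed ended promise might look like; the promise-valued ended attribute is an assumption based on this proposal, not the spec as written:

```javascript
// Sketch only: assumes MediaStreamTrack gains a promise-valued "ended"
// attribute, per the proposal. Resolves on a normal end, rejects on error.
navigator.mediaDevices.getUserMedia({ video: true }).then(stream => {
  const track = stream.getVideoTracks()[0];
  track.ended.then(
    () => console.log("track ended normally (e.g. call hung up)"),
    err => console.log("track ended due to an error:", err.name)
  );
});
```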
ekr: why not a "started"
equivalent?
... my point is that not all those events have state
changes
jib: I'm not proposing to replace the existing ones, I just want to show another way to get the errors and differentiate between several "types" of ended tracks
hta: history of the problem: can we tell the difference between a track that ended due to an error and one that did not?
ekr: my concern is API consistency
burn: you did not say you were suggesting we remove the original one
ekr: then it's even worse if we don't remove it: we have two APIs for the same thing, and I don't know when to use which
ShijunSun (MS):
is this the right way to handle all the errors from different objects?
jib: let's get to the second example
<slide 7>
jib: in this example, I don't
care if I succeeded, it is just showing the different
syntaxes
... you can do a switch
... I did a pull request where I pulled in all the existing errors
and showed how this would look.
... here the error can happen upfront, or later on
... you just don't want to end up not catching an error
... and there has been no other proposal that does all that so
far.
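A sketch of the switch-based handling being described; the error names below are illustrative stand-ins, not necessarily those in the pull request:

```javascript
// Illustrative: distinguishing why a track ended, whether the error
// happened upfront or later on. "track" is a MediaStreamTrack from GUM,
// and "ended" is the promise-valued attribute proposed above.
track.ended.catch(err => {
  switch (err.name) {
    case "PermissionRevokedError": // example name only
      console.log("the user revoked permission");
      break;
    case "SourceUnavailableError": // example name only
      console.log("device or driver problem");
      break;
    default:
      console.log("track ended unexpectedly:", err.name);
  }
});
```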
juberti: I wouldn't like to use the excuse of having promises for GUM to use them everywhere else.
ekr: i agree with justin
jib: does seem like ......
shijunsun: +1
stefan: do we have the need for this?
ekr/juberti: yes
jib: you need to make *some changes*
ekr/justin: yes
jib: then why not use promises?
ekr: why not in this case, but not all events have that need, and we should not use promises everywhere just because they are good for GUM.
jib: I hear the consistency
argument
... there is another pattern
... we should use the best language that solves the problem
ekr: I do not think promises are that language
adam: events happen more than once
jib: ended only happens once
adam: yes, but for other events that does not hold, and promises should not be used.
<Domenic> promises for state transitions that happen only once and that people might be interested in after the fact are strictly better than events
<Domenic> events are not appropriate for that case
ekr: it can t be all promises
<Domenic> and we have only used them because historically they were all we had
juberti: we moved away from a consistent use of callbacks, and now we would have some promises and some callbacks; I don't like that, as it does not bring that much added value.
<dom> Domenic, the argument that is being made is that mixing events and promises for handling events makes for a confusing API
<Domenic> you have to be looking for consistency among things that are alike
burn: we spent a lot of time defining which ones should be events and which should be callbacks
<Domenic> things that can happen more than once and things that happen once are not alike
<dom> well, both describe the object state machine
burn: and we spent a lot of time making sure that programmers could almost guess from the pattern which one should be used.
dom: we need a tech proposal.
<Domenic> dom: that's fair
ekr: right, I'm happy to write a proposal
hta: there is such a proposal in the bug that triggered this discussion
ekr: even better, I'll do nothing!
hta: there seems to be a rough consensus that we should extend events and not use promises.
jib: any other questions on
this?
... thank you
hta: we are now pretty far ahead of schedule
hta: let's have the audio output device enumeration discussion
<dom> Justin's slides
juberti: in addition to having
enumeration of input devices, we also have the same feature for
OUTPUT devices but we have no way to access it.
... why would we do that? #1 requested feature, before
screensharing and others.
... usage scenario
... changing to usb or bluetooth headset
... right now, you have to change the system settings
<npdoty> does "in chrome" mean "in the web page, not in browser chrome"?
<slide 4>
juberti: no API for setting
devices.
... we have a way to enumerate them, but no way to SET them
<npdoty> why do we even have a way to enumerate output devices?
<dom> npdoty, this idea was to enable this use case
<dom> (even though we've been missing the last piece of that puzzle)
juberti: you want to ensure that
an arbitrary webpage cannot play content on your
audio devices without user consent.
... a prompt would not be practical
... <slide 5>
... any mediaElement (<audio> or <video>) would
have an ID
... by default, set to empty, and use the default output (always
OK, today's case)
... specific devices could be set (using the enum API info)
if the application is authorized. Web Audio could also use
it.
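A rough sketch of the shape described on the slides; the sinkId attribute and setSinkId() setter are assumptions for illustration, combined with the device-enumeration API discussed earlier:

```javascript
// Sketch only: media elements carry an output-device ID, empty by default
// (empty means "use the default output device"). Names are illustrative.
navigator.mediaDevices.enumerateDevices().then(devices => {
  const headset = devices.find(d => d.kind === "audiooutput" &&
                                    /headset/i.test(d.label));
  const audio = document.querySelector("audio");
  if (headset && audio.sinkId === "") {
    // setting a specific device only works if the app is authorized
    audio.setSinkId(headset.deviceId)
         .then(() => console.log("now playing on", headset.label));
  }
});
```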
... <slide 7>
... in most cases you could use the same group ID for input /
output devices.
... for other apps that would need finer granularity, there
would be another way of doing this.
burn: permission is then for the group by default?
juberti: exactly.
dan: would that show all permutations in the grouping? How do you define the grouping?
juberti: that's for composite devices; they have the same device ID / group ID.
ekr: what would be the lifetime of the permission ?
juberti: same as GUM.
ekr: as long as the origin is tagged, the permission stays.
martin: if you have persistent permission, it means you have access to all devices at any time.
juberti: yes, if you have access to all INPUT devices, and all OUTPUT devices are grouped with input devices, that's true.
martin: I think we should make it explicit.
juberti: the coupling is quite elegant, and better than just using input devices.
martin: we don't need more prompts
<npdoty> indeed. why are we pushing this onto the page?
adam: even if I, as an application, have access to all the input devices, I might not have access to all output devices?
<dom> for proper UX, npdoty (it's hard to build a nice user experience when everything is pushed to the browser chrome)
juberti: you can already
enumerate all of them, you just can't use output devices.
... you know, by using system permissions, you already have
practical access to all devices.
<npdoty> dom, the group thinks proper UX is more likely to happen if we distribute it out to every different web developer in the world
<npdoty> ?
juberti: I think that 99% will either use the default setting, or ONE specific coupling they will give permission to.
shijunsun: how do we handle on-the-fly plugging and unplugging of devices?
juberti: not sure yet.
<dom> npdoty, I think that's a correct characterization, yes
martin: ....
shijunsun: we have the notion of a default device: if anything is plugged in, the headphone has priority, and we fall back to the default automatically. Now, it seems that webrtc would be a regression from what we propose today.
martin: I know how to solve that problem, I think
martin: there would be physical and logical devices
by using logical devices, we can switch on the fly between physical devices.
shijunsun: it does not have to be in the OS, it could be in IE.
juberti: the enumeration API should present those so the user knows which one to choose from
shijunsun: an iframe might have different settings, so we have to be careful.
juberti: things working out of an iframe would be an issue anyway, if only for PeerConnection.
shijunsun: my comment was more about the scope. Do we want it restricted? Do we want all pages to have control, including iframes, kind of overriding iframe settings?
hta: about usage,
... earlier in the week I was in the audio group
... and they are very interested in using the same
mechanism.
shijunsun: great, let's make sure the use cases are all written down.
burn: let's say you are in an
iframe
... you can only set a device as output if you have permission
to do so, even though you could see it in the enum
juberti: well, not exactly; I think we can enumerate all, but you only get access via grouping.
<Zakim> npdoty, you wanted to ask if the cases are so often coupled, why does it need to be exposed at all?
npdoty: your assumption seems to be that the coupling is very frequent. It does not seem it needs to be enumerated, and you're also adding a whole list of permission dialogs.
juberti: this avoids permission
dialogs
... having a generic API ......
... the API we have here announces that abstraction, but
underneath we have to deal with another layer ...
... we have to deal with the cases where input and output are
not a unique physical device.
npdoty: if the browser does not handle the setting, will the website allow me to do it?
martin: the site might have many
things it wants to do at different points in time
... if you play music, you might want to keep rendering that
music on the same device.
... but when you have a site that simultaneously plays music and
handles communication
... you don't really have today the flexibility to handle the
user experience the way you want
phil: many output devices in our case are not "grouped" with input devices
and it's very important for us that the app be able to use different devices.
phil: another use case: my son is listening to the radio with a headset, while I'm watching a movie locally on my computer
juberti: we did an app that does
media mixing, kind of GarageBand. There is no input for the app.
If we are saying that permission comes only from GUM, ....
... if you use a real pro audio app, you already understand the
notion of a door hanger
<npdoty> it sounds to me like we're suggesting that every page can choose non-coupled input/output (or maybe it won't have implemented it), which will cause more permission dialogs, but the user can also choose it separately on the browser
juberti: for most of web users, the permission is much simpler
<npdoty> and if the user sets it in their browser first and then the site wants to change it?
juberti: but this API also gives us the capacity to use door hanger for more professional apps.
<npdoty> and if the user asks where they're supposed to configure audio output? in the site or in the browser? or in the site first but maybe overridden in the browser?
ekr: do I understand that the goal is to allow a website to minimize the number of prompts for the most generic cases
juberti: yes
... 90% would be: use the default or use that one specific set
of devices.
<ekr> I use the system microphone and the headset for sound
fluffy: the app would enumerate
the devices
... the app would then ask permission for a specific
device
... the door hanger would then kick in, and the app would get
access?
juberti: yes, or you have given persistent permission to that device beforehand and the door hanger would not even be needed
martin: is there a need for
labels for groups ....?
... you said default and default group
juberti: ....
martin: the use case you mentioned is only for apps that are already using this API
juberti: yes
martin: then they should be aware of this problem, and have a UI, and so on
<npdoty> mt, because you think developers are unlikely to make mistakes about edge cases of hardware?
juberti: well, yes, but they could still make a bad choice. generally, the complexity is not transferred to the app.
dan: ... would it be good to be able to select input/output only to simplify the list ?
juberti: practicalities make it something we don't want.
phil: is there a way for JS to know in advance which permissions it has access to?
juberti: yes
phil: some devices are also accessible; how do we populate the drop-down with that?
juberti: good point
... how do we do for output devices what we do with input
devices? .... that's a good question, I need to think about
that.
phil: enumerateDevices might prompt once for allowing ALL devices to be used, so the enumerate API could also allow them in one step.
juberti: yes, we could do that, but it would be difficult for users to understand.
<npdoty> it would be a new permission model to say you get permission to things that are less egregious than any permissions you've already granted.
juberti/martin: discussion about how to do it right.
phil: just to clarify, I just want a way for the user to enable all the output devices.
juberti: we might need something new to enable what you propose
burn: the persistent permission
implies access to all input devices
... and that surprises me
... <reading specs>
... I'm realizing that we actually give permission to ALL
devices, while I thought it would give permission for a
specific device (the one i agree on in the prompt)
... the implementation consequences are minimal (at least in
Chrome), but for the user it's quite a shock; I was not
personally aware that I was giving away that much
dom: we have to contact other groups for that discussion; e.g. Web Audio and HTMLMediaElement belong to other groups, and so on and so forth. We need cross-group coordination.
juberti: I think we need to document the attack scenario, and reach consensus at least within the group before we bring it to other groups.
dom: my perspective is that we should really try to spec it
juberti: how do you do it?
dom: you do a partial interface .....
<Zakim> dom, you wanted to ask where to spec this, talk about coordination with other groups
juberti: yes, that would be way more efficient
ekr: the problem I typically run into is when I am using the system microphone with a non-standard headset.
ekr .....
ekr: there are also hierarchies of devices .....
dom: next steps?
juberti: take this proposal and make it into a pull request against existing specs
dom: I would make it a spec on its own.
juberti: ok, is there a template, and where should that thing reside?
dom: i can guide you.
juberti: ok great, I know who to
delegate to.
... I also think that there are a couple of questions that
showed up here today and should be written up as well in the
document.
hta: we're still ahead of schedule
I propose a 15-minute break
hta: so break until 20 past.
<inserted> scribenick: npdoty
talking about Last Call
dom: a refresher on Last
Call
... assuming we get consensus to go to Last Call
... have to make a number of decisions about how that last call
will happen
... have to decide the amount of time for comments. W3C Process
minimum is 3 weeks, but can be longer
... review will be open to everyone, but some groups we should
specifically contact
... during the time of the formal review period, need to
formally track each comment, formally respond, formally seek
feedback to our response
hta: a formal definition of "formal"?
dom: need to log each comment
(like to the mailing list), needs to send a response, best
effort to see that the comment is accepted by the
commenter
... not every comment needs to be considered an issue
... some comments may repeat existing issues without raising
new information
... even if the comment is not raising a new issue, need to
indicate to the commenter, past discussion and arguments
burn: typically we track every
comment that comes in. need to be prepared to give a proposed
resolution
... eg "we already discussed this and we decided not to do
this" or "clarification we'll want to do"
... need to communicate that proposed resolution to the
commenter
... make your best effort to get back their acceptance or
rejection of your proposed resolution
... often give a time limit, if we don't hear from you in two
weeks, then we'll assume you accept our resolution
... should separately track implied vs. explicit acceptance, in
order to have clarity for the transition call later
dom: have a tool for tracking
comments that we might or might not use
... groups we have intersection with, groups mentioned in our
charter
... first list of groups
... Webapps, TAG, Audio, HTML, WAI PF, IETF RTCWeb
... forgot for the slides, but should add the Privacy Interest
Group (PING)
npdoty: thanks
dom: might ask the RTCWeb group
to formally chime in
... just my suggestion, for reductions or extensions
... once we're done with Last Call comments
... either go to Candidate Recommendation (no substantive
changes that requires more reviews)
... otherwise, need to go back to Last Call
... transition request to the W3C Director, including the
detailed review of the comments we have received
... for commenters who don't accept the resolution, would check
whether we need a Formal Objection, with a separate
process
... Last Call can be a difficult period, which this group may
be familiar with
... attention from groups who may not have followed all the
details of your work
burn: in my experience, Last Call can effectively be first call
fluffy: do you try to get feedback from those groups before we get into the formal Last Call step?
burn: one way is to involve these
groups before Last Call
... ask them ahead of time. may save you from doing a second
Last Call
dom: we've had a number of
interactions with TAG and WebApps
... had some early reviews from Privacy Interest Group, but doc
has changed significantly
burn: met with WAI rep, indicated
an area they care about a lot
... should get involved sooner rather than later
fluffy: as comments get moved to Formal Objections, who can raise those?
dom: anyone can raise a Formal Objection.
no Membership requirement, any individual or organization
dom: Formal Objection is not something done cheaply, as a social matter. requires quite detailed documentation
hta: what constitutes a Last Call
comment?
... any message to the mailing list?
dom: if there's ambiguity, you
can ask
... most cases it's fairly clear
burn: in some groups, could say
that anything from a public list was a Last Call comment
... but now all groups are operating in public
... social issues, but that doesn't stop some people
dom: understood that WG members
should not raise Last Call comments, but can
... for example, if you understand something that's new
... could have a separate mailing list for comments
... most groups just use public mailing lists
burn: for every comment, it's
useful to have an email track as well as minutes. so that later
you can point back to it
... track discussion of comments, not just the comment
itself
dom: the tool I'm thinking of can do some of this tracking
<mic noise>
dom: when would we go to Last Call for getUserMedia?
hta: one requirement is to close
the bugs
... tomorrow we are going through the remaining bugs (8)
... and the group needs consensus to go to Last Call
... if we have wildly different opinions....
burn: time to go to Last Call is
that we don't expect substantive changes (otherwise CR)
... we have a note in the document today about things we're
expressly seeking feedback on
... about promises and backward compatibility with the navigator syntax
... and a few editorial notes in the document
hta: once we close these 8 bugs, does the group believe it's in a state where we should issue a Last Call?
fluffy: how many people have read
the document in the last six months?
... read, not looked at
burn: we should not wait long at all to request review from these other groups, whether or not Last Call
dom: one of the advantages of
wide review of Last Call is to limit ourselves about not
wanting to make big substantive changes
... developers don't like that as much
burn: the next exclusion period for intellectual property. Last Call triggers one
mt: what should we do with changes during this time? (don't want to make changes during the Last Call review)
dom: could make partial
interfaces / new specs
... or look at a new version, could be in a different
branch
fluffy: should seriously read this document, because it's going to be frozen for a while
hta: where it's possible in a
reasonable way to write a separate document that extends
interfaces, that's preferable
... a separate question about what makes sense about
integrating or keeping a separate spec
burn: if you know you have
something substantial to add to this document
... then it's not really the last Last Call
... putting the community through official review steps
mt: the tension between the idea that we have a living spec
fluffy: this is not a living spec. Last Call is a sign that we're freezing it
burn: you don't typically do a Last Call unless you're really indicating that you're done with it
hta: basic conflict between publishing Rec track vs. living specs
fluffy: if we allocate ten people
from this room to review this document beginning to end, would
get a lot of comments
... we should do that before we issue a Last Call and get those
comments from a dozen different groups
dom: goal should be a conservative approach to commenting
fluffy: we should fix the things that everyone will indicate that we fix
ekr: we should get approximate
signoff from implementers, prior to Last Call
... if those people are basically happy, we can talk about
going to Last Call. but if they're not, then we need to resolve
those issues first
fluffy: we put out a deadline for
comments twice. only two responses?
... can we get volunteers from several, separate individuals
from major implementers to review?
timeless: once we have an announce list for reviews, I'll be a part of it. I would do a pass, I would do a very detailed review
timeless: or could contact some individuals like me separately
fluffy: everybody who's ever read it before has had a lot of comments. rate doesn't seem to be dropping
burn: need a full pass through of entire document
dom: specific action items?
... who volunteers?
hta: give it two weeks for comments. 15 November
<dom> ACTION: ShijunS to make full review of getUserMedia - due Nov 21 [recorded in http://www.w3.org/2014/10/30-mediacap-minutes.html#action01]
mt: a big document. would take time, but IETF/vacation are conflicts
<dom> ACTION: martin to make full review of getUserMedia - due Nov 28 [recorded in http://www.w3.org/2014/10/30-mediacap-minutes.html#action03]
burn: November and December can be a slow time for responses
<dom> ACTION: Josh to make full review of getUserMedia - due Nov 28 [recorded in http://www.w3.org/2014/10/30-mediacap-minutes.html#action04]
<trackbot> Created ACTION-30 - Make full review of getusermedia [on Josh Soref - due 2014-11-28].
<dom> ACTION: juberti to make full review of getUserMedia - due Nov 28 [recorded in http://www.w3.org/2014/10/30-mediacap-minutes.html#action05]
hta: will note to the mailing list that we have a few volunteers for comments by November 28th, and we're soliciting more
burn: even comments indicating that you can't understand it, is useful information
<dom> ACTION: PhilCohen to do full review of getUserMedia - due Nov 28 [recorded in http://www.w3.org/2014/10/30-mediacap-minutes.html#action06]
dom: but we do want to finalize this thing
mt: will generate pull requests for editorial, grammatical things
fluffy: commits, can cherry pick, but grateful for any review at this point
stefanh: end of the morning
agenda
... will continue in this room after lunch with #webrtc
<agenda discussion>
fluffy: "volume" is underdefined
hta: could define it as a number of decibels, which would be inconsistent with HTML
fluffy: but it's not that. my
proposal is that it's a multiplier in a linear space
... 0 is silence. 1 is maximum volume
... a volume setting you can move up and down between 0 and
1
... could be a linear or logarithmic curve, just pick one. this
is linear
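For illustration, how a page might request this under the proposed linear interpretation; the constraint syntax below follows the constrainable pattern and is a sketch, not normative text:

```javascript
// Illustrative: volume as a linear multiplier, 0 = silence, 1 = maximum,
// so 0.5 asks for a level halfway along that linear scale.
navigator.mediaDevices.getUserMedia({
  audio: { volume: { ideal: 0.5 } }
}).then(stream => {
  // ... use the half-volume audio track
});
```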
hta: using a constraint as if it were a control
ekr: if you want this, why not use WebAudio?
dom: doesn't make sense as a constraint
fluffy: we had some confusion over 0.5. could remove "volume". not sure WebAudio covers all cases
ekr: is there some reason you can't do it with a filter?
fluffy: different implementations will do it in different ways
mt: isolated streams is an example
fluffy: maybe we shouldn't re-open whether to have volume or not. only proposed change is explaining the meaning of 0.5
hta: let's integrate this change and close this bug
burn: a clarification, not a
change to the requirements for it.
... if we all agree
mt: some encouragement will be provided. it's probably a bad idea to do it over an unauthenticated origin
hta: will assign that to ekr
npdoty: should be clear about whether the requirement on stored permissions is normative
ekr: it should be normative, as it is in IETF
npdoty: and that stored
permissions section would be a good place for the additional
encouragement, and should use a better definition for "secure
origin"
... may follow up in email
jib: constrainable pattern, which
is abstract. and specific use in getUserMedia
... specific use doesn't need to be abstract. should say
exactly what is returned
... reuse the existing MediaTrackConstraintSet dictionary, which
may be added to in the future
... a second dictionary, a subset of the capability set
... hopefully I get back success and get back values
... capabilities is a superset of constraints which is a
superset of settings
... pull request illuminates that the datatypes are
related
... should we write two more dictionaries (enumerating the same
keys), or should we just re-use the same type?
... re-use the same type because capabilities are exactly the
same structure (based on the prose)
burn: IDL, we don't say that, that would be a change to the document
jib: we could use a narrower data type for the returned set, but it could easily be the same data type
mt: no content-accessible type
information is available
... maybe it should return an array of strings rather than a
dictionary anyway
... don't mind about the difference between capabilities and
constraints. tough for spec authors and implementers, but oh
well
... JavaScript more natural to use an array, with indexOf
jib: could be a fourth use of the
dictionary. return a dictionary that you can enumerate, all the
keys you find in there are supported
... the UA puts in something truthy, an object
burn: trying to remember why we did it this way
fluffy, where are you?
burn: don't want to put the same
defined type for all those different returns
... because they're not the same return
jib: X, Y and Z are different
things, even if they're the same type
... we need more specific text. either this pull request with
using the same dictionary, or we define more specific
dictionaries
hta: separate discussion of
getSupportedConstraints
... capabilities, you might want to look at the value, modify
it slightly and then send it back to the browser
burn: even if you want them to be almost the same data structure, I'd rather see different names for them
jib: different names, same
type
... argument type, argument name
dom: developers are not likely to read the spec
mt: something we typically leave
to editorial discretion
... if they can address it in some way, leave it up to
them
... we will review the outcome and ensure it's not crazy
... acceptable?
jib: fine. but want to specify something, not just abstract types
burn: I hear you.
... we already have the prose for it, but now have the IDL
fluffy: legal syntax is different
dom: has anyone started implementing?
jib: hoping not to make any functional changes at this point
hta: WG position is to leave to editorial discretion
dom: WebIDL must be valid
fluffy: editors please bring us a proposal
[adjourned for lunch.]
re-convene at 1pm
<dom> Media Capture Depth Stream Extensions specification
<anssik> https://docs.google.com/presentation/d/1mwlD8H_RzlB2JheyjqXxa7sMSMTN8x96VgSzjy5B4pc/view
<dom> Anssi's slides
<dom> ScribeNick: dom
Anssi: I'm Anssi Kostiainen from
Intel, with Ningxin Hu from Intel and Rob Manson (Invited
Expert)
... we discussed the idea of bringing 3D camera to the Web last
year at TPAC
... I remember polling for interest back then
... lots has happened since then
... we collected use cases, played with the spec and wrote
code
... we will be summarizing this
... [slide 2]
... The spec is about making the 3D camera a 1st-class citizen of the
Web platform
... up to now, these have required special plugins
... the native platforms have these capabilities
<hta> Stefan is running the slides (so you don't have to say "stefan or someone")
Anssi: the approach we've taken
is to integrate with existing APIs as much as possible
... reusing primitives rather than inventing new APIs
... this means relying on getUserMedia, Canvas 2D, WebGL
... if you attended the symposium on Wednesday, you saw a live
demonstration on stage
... TimBL mentioned it as exciting :)
... [slide 3]
... Current status: we started with use cases and requirements
— thanks for the contributions!
... it took 2 to 3 months to make sure we had a solid set of
requirements
... over the summer, we started drafting the specification and
published as a FPWD two weeks ago
... parallel to this work, Ningxin has been working on an
experimental implementation which was used on stage on
Wednesday
... the code is available
Ningxin: the build is available on Windows; the source code is also available
Anssi: the references are given
on the last slide
... [slide 4]
... Regarding use cases: some of them are obvious, like video
games (e.g. Fruit Ninja with your hands)
... 3D object scanning: measure a sofa by pointing at it
... video conferencing — it would let you remove the
background; or make the experience more immersive
... lots of use cases in augmented reality too
... Rob, maybe you want to expand with your favorite AR
Rob: you can add virtual objects
behind real objects
... all AR could be improved with depth tracking
Anssi: this is only scratching
the surface — there are lots of other use cases
... I think it's as significant as bringing the RGB stream to
the Web, with lots of potential
... [slide 5]
... This summarizes our IDL interfaces
... not all of them are complete yet
... but this is our current view of what needs to be done
... we're very open to feedback on this
... we've already received good feedback from canvas
implementors — we'll adjust based on this
... I won't go into the details — look at the spec for that
... DepthData is the data structure that holds the depth
map
... CameraParameters, soon to be renamed CameraIntrinsics
... it's associated with the DepthData
... it represents the mathematical relationships between the 3D
space and its projection in the image plane
<anssik> http://en.wikipedia.org/wiki/Pinhole_camera_model
Anssi: it's the minimal data
required for the pinhole camera model
... these are the two only new interfaces we're adding; the
rest are extensions to existing interfaces
<juberti> please, please, can we add getTracks(kind) instead of getDepthTracks
Anssi: We add a boolean flag to MediaStreamConstraints; similar to the audio and video booleans
Martin: are you planning on having constraints for these devices?
anssi: we've chosen to wait for the constraints discussion to stabilize
martin: I think we're stable enough; we would need your input on what constraints would be needed in this space
Rob: in a lot of ways, the constraints can be very similar to the video constraints (e.g. minimal range for width and height)
it's also related to CameraIntrinsics - but they'll largely just be read only Settings and Capabilities
anssi: we're still looking at
this
... the group sounds open to us proposing new
constraints
... thanks for that feedback
... we will take care of that aspect
... the next interface we're extending: we add getDepthTracks(),
which returns a sequence of depth tracks
<robman> +1 to getTracksKind() or a more generic idea
dom: justin noted he would prefer to have a generic getTracks(kind) instead of the specific getDepthTracks
anssi: noted; we'll look at this
too
... Next interface is adding the "depth" kind attribute
<juberti> this of course would be generic and obsolete getAudioTracks and getVideoTracks
anssi: In addition to extending
these getUserMedia interfaces, we have also additional APIs on
the Canvas Context API
... similar to the imagedata apis
... we're having discussions with the canvas editors
... [Usage example slide]
... this is copy-pasted from the use cases doc
... this shows how easy it is for someone familiar with
getUserMedia to use that API
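The slide itself is not reproduced in the minutes; the following is a minimal sketch of the kind of usage it shows, based on the extensions just described (exact shapes may differ from the draft):

```javascript
// Sketch based on the extensions above: a boolean depth flag in the
// constraints, a getDepthTracks() accessor, and a "depth" track kind.
navigator.getUserMedia(
  { depth: true, video: true },
  stream => {
    const depthTrack = stream.getDepthTracks()[0];
    console.log(depthTrack.kind); // "depth"
  },
  err => console.error("getUserMedia failed:", err)
);
```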
... [next steps slide]
... we're engaging with the Khronos folks for a minor extension
to WebGL to be able to pipe data to the WebGL context
Ningxin: we are proposing a small
extension to the WebGL extension called
WEBGL_texture_from_depth_video
... with that extension, Web app developers need to know
whether they can upload a video element representing a depth
stream to WebGL
... using shaders
... with this extension, it defines circumstances under which
an HTML video element with depth data can be uploaded
there
... we will define the format of the texture
... this is a proposal against WebGL 1.0
... if WebGL2.0 comes, we will update the texture format to
match
... DepthData as unsigned short is to be as close as possible
to the native representation of the depth stream
... (which is what most 3D cameras give)
... so as to limit CPU processing as much as possible, and leave
as much as possible to GPU parallelism
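A rough sketch of how the proposed extension might be used; the texture format arguments are assumptions consistent with the unsigned short representation just mentioned:

```javascript
// Sketch only: enable the proposed extension, then upload a <video>
// element that is rendering a depth stream as a WebGL texture.
const canvas = document.querySelector("canvas");
const depthVideo = document.querySelector("video#depth"); // plays the depth stream
const gl = canvas.getContext("webgl");
const ext = gl.getExtension("WEBGL_texture_from_depth_video"); // proposed name
if (ext) {
  const tex = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, tex);
  // format/type are illustrative: 16-bit depth packed so that shaders
  // can reconstruct values without extra CPU-side conversion
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGB,
                gl.RGB, gl.UNSIGNED_SHORT_5_6_5, depthVideo);
}
```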
Anssi: we've talked with Dom with
regard to the collaboration with Khronos
... we're currently working on an informal technical
basis
... we'll keep both groups updated when we make progress on
either side
... that's our model for operation
... Khronos has an established model for WebGL extensions;
there are tens of extensions that are widely implemented
... ningxin and Rob are the ones watching this space most
closely
... the other part of our work is to address open issues
... we use github issue tracker to track open issues
... that's the place to go to if you want to open new
issues
... the slide shows the list of currently identified
issues
... the highest priority items should be resolved before we
publish a heartbeat wd
<robman> NOTE: a range of these issues are likely to be resolved as part of the update to use the ConstrainablePattern
bernard: the issue list talks
about transmission via WebRTC peerconnection
... that would mean changes to WebRTC 1.0?
<scribe> ... new codecs?
peter: what happens if you @@@
ningxin: this is still under
discussion
... we're looking at an extension to H264 to support 3D TV to
carry the depth data besides the RGB data in the stream
... there are already several extensions in the codec space to
do that
... there is also an extension to SDP to describe that kind of
media
... we're looking at all these to see if we can support that
transmission
peter: with regard to PeerConnection, it's critical to determine if it's a separate track or part of the same codec
Rob: our proposal is that it's a different track, that looks like a video track
peter: but that requires different RTP packets
Shijun: the codec extension
defines a different bitstream from the video
... I was the first proposer for stereo video coding for H264
10 years ago
... I'm working on this at Microsoft still
... it's a fun project, but I'm not sure it's ready for prime
time
Anssi: it's good we have the right people in the room — we'll continue the discussion on the mailing list
Martin: we need to understand
what adding a depth track to a peerconnection means
... this has impact on many things
bernard: unless the codec supports this, you simply won't get anything
stefanh: we can extend the timeslot a bit for having this discussion
anssi: is everyone here active in
the task force? we would like to keep you in the loop and we
would appreciate your continued contributions.
... we're currently at this phase where we're just getting more
and more people to look at our work and give feedback
... we appreciate feedback from people interested in this
technology and with the right background.
peter: can we do getTracks(kind) instead of getAudioTracks / getVideoTracks?
hta: we already have that
martin: let's just not add kind-specific tracks any more
Shijun: for any stereo-related
topic, it would be useful to check with other groups on whether
stereo videos can be rendered in video tags
... if we don't have any surface to render a 3D video, what
would we do with these streams?
... (even if there are other use cases without that)
anssi: note that webrtc is not required to make use of this; the same way getUserMedia is used well beyond WebRTC
shijun: I'm not saying don't do
stereo video capture
... but whether we want to make that transmissible via WebRTC
is another question
ningxin: regarding the 3D video
question, our proposal makes it possible to use the depth
stream independently
... 3D cameras can capture the depth stream without the RGB
stream
... e.g. for hand gesture detection
Martin: if we have both video and depth with different constraints, what would that do to the cameras?
Rob: we need to calibrate the
cameras
... but otherwise, the constraints should apply to both at
the same time
... for a calibrated stream with the two together, you should
consider them as a single source
Shijun: these are two sensors
with different ids
... synchronizing the signals across these is quite
challenging
... delays can induce headaches
martin: if you request a
mediastream with both video and depth, you get back a single
mediastream
... which by definition are kept in synchrony
rob: asking for both depth and video gets you a calibrated stream
dom: the WebRTC story is far from
being done
... but looking at it will be a good test of the extensibility
of the WebRTC API
hta: the depthinput kind needs to
be added to the enum of type of devices in
enumerateDevices()
... (I don't want to think of depthoutput quite yet)
martin: let's get the non-WebRTC
stuff done; the WebRTC interactions are a whole new
enterprise
... we should scope the work to reflect that
anssi: makes sense
... thanks for the great feedback
... [demos]
... [magic xylophone demo]
ningxin: this is based on a demo
that was done with a simple RGB stream analysis
... trying to detect the movements by analysing the RGB
stream
... I modified that to add depth data
... there is a video representing the depth data
... js-based hand recognition is based on js-handtracking
... also originally based on RGB data; we updated it to use
depth data
... it's more accurate, more stable and more performant
... we can extract the background and apply the recognition
algorithm only on the foreground objects, reducing
computation
... because depth cameras are infrared-based, they can be used
in low-illumination contexts
anssi: [fruit ninja demo]
ningxin: the idea is similar;
still based on js-handtracking library
... originally, this is done with a mouse or a touch
screen
... here we integrate this with finger gestures
... you can also see the integration of the depth image that we
composite above the background and beyond the foreground
... this is done via WebGL via shaders
... that demonstrates depth rendering with WebGL still in 2D
space
anssi: [RGB+Depth with WebGL]
ningxin: we initiate a request
with depth only where the user is seen only as data
... then we request also RGB data
... main idea is here to use RGB texture and depth for the
positioning
anssi: hopefully this gave a good
idea of what this technology is about
... please give feedback, and let's discuss this further on the
mailing list or in github, etc
... the spec has still lots of room for changes
... we have ongoing discussions with the canvas folks
... if you're active also in the HTML WG, the public-canvas-api
mailing list is where this is discussed
stefanh: thank you anssi!
<robman> excellent discussion and feedback - thank you everyone
stefanh: no further questions from the room
hta: I think this work is great; if scoped adequately (i.e. getusermedia stuff first), this will be useful in many contexts
<robman> 8)
hta: I'm glad you brought that here!
martin: this discussion will be
around directions and guidance in this space
... we want to create a media stream from stuff on the
screen
... [slide 3] this is what we are looking at for the API — an
additional source property to the mediastreamconstraints
... that source could apply for non-video cases (e.g. get the
audio from a given app)
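A sketch of the constraint shape under discussion; the property name and the "monitor" value are among the candidates debated just below:

```javascript
// Illustrative only: a "source" property on the video constraints that
// selects screen content rather than a camera.
navigator.mediaDevices.getUserMedia({
  video: { source: "monitor" } // candidate values: monitor, window, browser
}).then(stream => {
  document.querySelector("video").srcObject = stream;
});
```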
... [slide 5] we've come up with terrible names, but we're not sure what
to do
... "monitor" is also used in the context of audio
burn: I like "system"; it works well for audio, video, etc
martin: only issue is that system sounds global, where we would want for some sub-part (e.g. a window)
dom: we could hook the API on another end point than navigator.mediaDevices
martin: jib suggested the same
pthatcher: another suggestion would be to enumerate "system" devices to which you could then constrain your getUserMedia call
martin: that doesn't quite fit
with the security model where we want the user to be entirely
in control of what is shared
... we want to distinguish the sources that are enumerable and
the ones that aren't
... screen sharing would fit in the latter case
shijun: the browser is also in a better position to pre-select sources
martin: the UX mock-up our UI
guys have done comes with a doorhanger where the user can pick
which windows to share (with no default selection)
... none of this is exposed to the app until it is agreed by
the user
... I don't think that's a problem that is too troubling
here
... but we need a common taxonomy to have these
discussions
... I'm gravitating towards "monitor", "window", and "browser"
(without distinguishing tab or window)
<robman> display?
peter: window doesn't really
apply well across mobile / desktop
... leaving it a bit generic is valuable
martin: "system" is too generic because my microphone is part of system
burn: maybe getOutputMedia?
peter: (or alternatively 'source: output')
martin: I'll go with one of
these
... for application, our Cisco friends who have experience in
this area have shared that "application" doesn't work very well
for users
... someone decides to share a PowerPoint presentation
... they choose to share the whole PPT application rather than
just the single presentation they want to show
... and leak information without realizing
... so the suggestion was to simplify the interface and leave
it to the browser to determine how to e.g. composite several
window in one if they feel that's adequate
... this would be a new enum constraint
... we distinguish browser windows from other app windows for
same origin protection
... breaking that isolation is too potentially scary to be just
dismissed via a getUserMedia prompt
dom: for "browser" filtering, would you also identify other instances of the current browser? of other browsers?
martin: typically, only the
current instance since that's the only one the attacker is
potentially in control of
... doing that with other browsers requires a lot more difficult
social engineering
... it's a bit more far-fetched for other browsers, but not
completely unthinkable either
... we may want to filter other browsers out
... clearly sharing another browser window is something we want
to enable without too much work for the user
... we're currently whitelisting, but want to have a more
generic approach
... but we're still trying to figure out how to get a clear signal
from the user that they trust the site enough to share that
other site
bernard: there are other risks we're not protecting against (e.g. an app revealing the user's password)
martin: users are likely to
understand this
... but users probably don't understand the cross-origin risks
where we need to protect a site from another site
alexandre: what's the impact on
the API?
... independently of the UI impact
martin: it doesn't have impact on
the API
... there will be a number of sources that can produce screen
sharing
... and access to any one of those will be what the browser
allows combined with the consent mechanism
shijun: for screen sharing, should we isolate by default?
martin: it doesn't solve the problem - the attacker could simply send the screen sharing stream to itself
shijun: I think there won't be a unique solution; but I think isolated streams might be part of the solution
dom: what if one could only start screen sharing after a peerIdentity-certified connection has already been established?
martin: I'm not convinced this would really help
dom: but I guess this shows that not all policy decisions can be API-neutral
martin: right... Will need to
think more about this; please send more ideas and
suggestions
... [slide 6]
... some platforms offer distinction between logical and
visible windows
shijun: on some platforms, content that is not visible is not fully rendered
martin: screen sharing guys have tricks to circumvent this
alex: we need to make sure that the security considerations don't break interop and make it horrible to develop for
hta: what are our next steps?
martin: we've been working on
this informally so far
... there is a w3c github repo but that's as much formality
as we've got
... we probably need to decide whether this group wants to take
it on
hta: so the next step should be to present a draft to the group and push it to FPWD
martin: ok, let's proceed with that; I'll make a few updates on terminology before we get that reviewed
hta: the rough plan would be to achieve
consensus on FPWD by the week after IETF
... is there rough consensus here we should proceed with
that?
shijun: agreed
[lots of heads nodding]
<stefanh> slides shown on the getUserMedia test suite
<stefanh> (webrtc test suite non-existent right now)