See also: IRC log
<inserted> ScribeNick: dom
hta: this is the media capture
task force meeting as part of the WebRTC WG F2F
... We're hoping to be at the stage where we stop making big
changes to the document
... we need to review a last set of changes before we're sure
we're done
<stefanh> mom: http://lists.w3.org/Archives/Public/public-media-capture/2014Oct/0186.html
stefanh: minutes approved
stefanh: this topic has been
discussed in the task force before
... it's been quite a heated debate
hta: we have a TAG representative on the phone
domenic: that would be me
... I work for Google, and was elected to the TAG about a year ago
hta: we wanted someone from the TAG to give us the key points on why imposing that restriction might be a good idea
domenic: the TAG has made a
statement that we should move sensitive APIs to authenticated
origins over time
... i.e. a deprecation plan for using getUserMedia on plain
http
ekr: I find that analysis uncompelling
... the attacker can detour you from an insecure origin
domenic: a more interesting
technical question is whether this provides protection to the
user
... we've seen that it does
<alexG> domenic: we think authenticated origins provide more protection against proven attacks.
<scribe> scribenick: alexG
ekr: supposing that https is
enough even if you connect to an attacker site is not going to
work.
... the problem is not asking for GUM access over https
... the problem is knowing when you can trust a site, whether it is
http / https
domenic: at least with https we have a little bit more control and can add red flags in the address bar.
<npdoty> it sounds like ekr is arguing that the user doesn't have any reason to trust the origin on http-only, but that it should be allowed in that case anyway
ekr: what is the way forward with this question in the absence of specific use cases one way or another?
domenic: i just wanted to convey the position of the TAG, thank you
hta: this is a good overview of the disagreements today
domenic: .... The specs should be consistent or it is not going to work. We should make things better for users down the road. Nobody is proposing anything crazy, really.
hta: this is actually how the first message came across. that might be the reason for the ... strength ... of the reaction
domenic: I'm happy I clarified this then.
adamR: justin, do you have a comment on this conversation?
<npdoty> it sounded to me like TAG was not suggesting a "flag day"
justin: Chrome would like to move to more https in the future, but breaking existing content is not OK. We should have a multi-year deprecation process, but having a flag day today is not going to happen.
ekr: couldn't we move ahead with GUM as it is today?
dom: I heard your point, and I find it compelling.
Today, GUM works on any origin, and there is no compelling reason that we see to move away from that today.
domenic: the worst outcome would be for users to see that you are putting together specs without making sure you're looking toward the future, just saying that's how we do it today, and so be it. We would like you to have a future direction statement.
ekr: what about a non-normative note?
hta: what about a statement like "a conformant browser MAY decide to make this available over https only"?
<npdoty> that doesn't lend itself to interoperability
ekr: I thought it was the case, but if it's not, I'd be happy to do it.
hta: let's make it the case.
matthew: source selection
<npdoty> +1
domenic: someone brought up the interop issue: one problem would be if one browser worked only over https and another one did not; then the call could not be established.
ekr: can we stop saying that
people who don't want to go for https don't care about the
users?
... we have disagreement here, and let's agree to
disagree.
... can we state that everybody wants to do what is right for
the user, we just disagree about how to do it?
domenic: ok
matthew: I was observing that the
GUM API as it stands might not have this property
... but we have other things in the spec that potentially
change the profile, hmm, the usage of this thing.
getUserDevices makes it possible to enumerate devices and expose some things. It does not change the security profile, but gives us the capacity to expose more or less information, which in turn influences the user's decision
martin: is there an opportunity there to use this ?
ekr: my bad :)
dom: we have a rough
agreement
... that non-normative note is good
domenic: yes, and I encourage that note to recommend https
hta: any volunteers to draft this note?
ekr: I suggest that justin does it
hta: ekr has volunteered to do
it, and requested help from justin.
... domenic, thank you for showing up and enabling this
discussion
dom: we're happy for you to stay of course if you want
domenic: ok, good luck guys
jib: presenting
mediastreamtrack.ended
... I'm just going to show the problem
... present a solution
... and then we can speak about it
... JS arrow functions as an example
... <slide 3>
I'm also going to use promises
jib: here is background info and links
burn: and it will also be part of the specs
jib: <slide 4>
... we have this ended event
... the only thing it tells you is that the track has
ended.
... two kind of problems
... the call could have been hung up or dropped (which one?)
... GUM could have stopped capturing, had a permission problem,
or hit a driver issue (which one?)
... <slide 5>
... so I propose an ended promise
... lets you differentiate between two cases: success and
failure
... consistent with usage of promises for state change
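For illustration, a minimal sketch of what the proposed ended promise might look like; the promise-valued ended attribute is an assumption based on this proposal, not the spec as written:

```javascript
// Sketch only: assumes MediaStreamTrack gains a promise-valued "ended"
// attribute, per the proposal. Resolves on a normal end, rejects on error.
navigator.mediaDevices.getUserMedia({ video: true }).then(stream => {
  const track = stream.getVideoTracks()[0];
  track.ended.then(
    () => console.log("track ended normally (e.g. call hung up)"),
    err => console.log("track ended due to an error:", err.name)
  );
});
```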
ekr: why not a "started"
equivalent?
... my point is that not all those events have state
changes
jib: I'm not proposing to replace the existing ones, I just want to show another way to get the errors and differentiate between several "types" of ended tracks
hta: history of the problem: can we tell the difference between a track that ended due to an error and one that did not?
ekr: my concern is API consistency
burn: you did not say you were suggesting we remove the original one
ekr: then it's even worse if we don't remove it: we have two APIs for the same thing, and I don't know when to use which
ShijunSun (MS):
is this the right way to handle all the errors from different objects?
jib: let's get to the second example
<slide 7>
jib: in this example, I don't
care if I succeeded, it is just showing the different
syntaxes
... you can do a switch
... I did a pull request where I pulled in all the existing errors
and showed how this would look.
... here the error can happen upfront, or later on
... you just don't want to end up not catching an error
... and there has been no other proposal that does all that so
far.
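A sketch of the switch-based handling being described; the error names below are illustrative stand-ins, not necessarily those in the pull request:

```javascript
// Illustrative: distinguishing why a track ended, whether the error
// happened upfront or later on. "track" is a MediaStreamTrack from GUM,
// and "ended" is the promise-valued attribute proposed above.
track.ended.catch(err => {
  switch (err.name) {
    case "PermissionRevokedError": // example name only
      console.log("the user revoked permission");
      break;
    case "SourceUnavailableError": // example name only
      console.log("device or driver problem");
      break;
    default:
      console.log("track ended unexpectedly:", err.name);
  }
});
```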
juberti: I wouldn't like to use the excuse of having promises for GUM to use them everywhere else.
ekr: i agree with justin
jib: does seem like ......
shijunsun: +1
stefan: do we have the need for this?
ekr/juberti: yes
jib: you need to make *some changes*
ekr/justin: yes
jib: then why not use promises?
ekr: why not in this case, but not all events have that need, and we should not use promises everywhere just because they are good for GUM.
jib: I hear the consistency
argument
... there is another pattern
... we should use the best language that solves the problem
ekr: I do not think promises are that language
adam: events happen more than once
jib: ended only happens once
adam: yes, but for other events that does not hold, and promises should not be used.
<Domenic> promises for state transitions that happen only once and that people might be interested in after the fact are strictly better than events
<Domenic> events are not appropriate for that case
ekr: it can t be all promises
<Domenic> and we have only used them because historically they were all we had
juberti: we moved away from a consistent use of callbacks, and now we would have some promises and some callbacks; I don't like that, as it does not bring that much added value.
<dom> Domenic, the argument that is being made is that mixing events and promises for handling events makes for a confusing API
<Domenic> you have to be looking for consistency among things that are alike
burn: we spent a lot of time defining which ones should be events and which should be callbacks
<Domenic> things that can happen more than once and things that happen once are not alike
<dom> well, both describe the object state machine
burn: and we spent a lot of time making sure that programmers could almost guess from the pattern which one should be used.
dom: we need a tech proposal.
<Domenic> dom: that's fair
ekr: right, I'm happy to write a proposal
hta: there is such a proposal in the bug that triggered this discussion
ekr: even better, I'll do nothing!
hta: there seems to be a rough consensus that we should extend events and not use promises.
jib: any other questions on
this?
... thank you
hta: we are now pretty far ahead of schedule
hta: let's have the audio output device enumeration discussion
<dom> Justin's slides
juberti: in addition to having
enumeration of input devices, we also have the same feature for
OUTPUT devices but we have no way to access it.
... why would we do that? #1 requested feature, before
screensharing and others.
... usage scenario
... changing to usb or bluetooth headset
... right now, you have to change the system settings
<npdoty> does "in chrome" mean "in the web page, not in browser chrome"?
<slide 4>
juberti: no API for setting
devices.
... we have a way to enumerate them, but no way to SET them
<npdoty> why do we even have a way to enumerate output devices?
<dom> npdoty, this idea was to enable this use case
<dom> (even though we've been missing the last piece of that puzzle)
juberti: you want to ensure that
an arbitrary webpage cannot play content on your
audio devices without user consent.
... a prompt would not be practical
... <slide 5>
... any mediaElement (<audio> or <video>) would
have an ID
... by default, set to empty, and use the default output (always
OK, today's case)
... specific devices could be set (using the enum API info)
if the application is authorized. Web Audio could also use
it.
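A rough sketch of the shape described on the slides; the sinkId attribute and setSinkId() setter are assumptions for illustration, combined with the device-enumeration API discussed earlier:

```javascript
// Sketch only: media elements carry an output-device ID, empty by default
// (empty means "use the default output device"). Names are illustrative.
navigator.mediaDevices.enumerateDevices().then(devices => {
  const headset = devices.find(d => d.kind === "audiooutput" &&
                                    /headset/i.test(d.label));
  const audio = document.querySelector("audio");
  if (headset && audio.sinkId === "") {
    // setting a specific device only works if the app is authorized
    audio.setSinkId(headset.deviceId)
         .then(() => console.log("now playing on", headset.label));
  }
});
```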
... <slide 7>
... in most cases you could use the same group ID for input /
output devices.
... for other apps that would need finer granularity, there
would be another way of doing this.
burn: permission is then for the group by default?
juberti: exactly.
dan: would that show all permutations in the grouping? How do you define the grouping?
juberti: that's for composite devices; they have the same device ID / group ID.
ekr: what would be the lifetime of the permission ?
juberti: same as GUM.
ekr: as long as the origin is tagged, the permission stays.
martin: if you have persistent permission, it means you have access to all devices at any time.
juberti: yes, if you have access to all INPUT devices, and all OUTPUT devices are grouped with input devices, that's true.
martin: I think we should make it explicit.
juberti: the coupling is quite elegant, and better than just using input devices.
martin: we don't need more prompts
<npdoty> indeed. why are we pushing this onto the page?
adam: even if I, as an application, have access to all the input devices, I might not have access to all output devices?
<dom> for proper UX, npdoty (it's hard to build a nice user experience when everything is pushed to the browser chrome)
juberti: you can already
enumerate all of them, you just can't use output devices.
... you know, by using system permissions, you already have
practical access to all devices.
<npdoty> dom, the group thinks proper UX is more likely to happen if we distribute it out to every different web developer in the world
<npdoty> ?
juberti: I think that 99% will either use the default setting, or ONE specific coupling they will give permission to.
shijunsun: how do we handle on-the-fly plugging and unplugging of devices?
juberti: not sure yet.
<dom> npdoty, I think that's a correct characterization, yes
martin: ....
shijunsun: we have the notion of a default device: if anything is plugged in, the headphone has priority, and we fall back to the default automatically. Now, it seems that webrtc would be a regression from what we propose today.
martin: I know how to solve that problem, I think
martin: there would be physical and logical devices
by using logical devices, we can switch on the fly between physical devices.
shijunsun: it does not have to be in the OS, it could be in IE.
juberti: the enumeration API should present those so the user knows which one to choose from
shijunsun: an iframe might have different settings, so we have to be careful.
juberti: things working out of an iframe would be an issue anyway, if only for PeerConnection.
shijunsun: my comment was more about the scope. Do we want it restricted? Do we want all pages to have control, including iframes, kind of overriding iframe settings?
hta: about usage,
... earlier in the week I was in the audio group
... and they are very interested in using the same
mechanism.
shijunsun: great, let's make sure the use cases are all written down.
burn: let's say you are in an
iframe
... you can only set a device as output if you have permission
to do so, even though you could see it in the enum
juberti: well, not exactly; I think we can enumerate all, but you only get access via grouping.
<Zakim> npdoty, you wanted to ask if the cases are so often coupled, why does it need to be exposed at all?
npdoty: your assumption seems to be that the coupling is very frequent. It does not seem it needs to be enumerated, and you're also adding a whole list of permission dialogs.
juberti: this avoids permission
dialogs
... having a generic API ......
... the API we have here announces that abstraction, but
underneath we have to deal with another layer ...
... we have to deal with the cases where input and output are
not a unique physical device.
npdoty: if the browser does not handle the setting, will the website allow me to do it?
martin: the site might have many
things it wants to do at different points in time
... if you play music, you might want to keep rendering that
music on the same device.
... but when you have a site that simultaneously plays music and
handles communication
... you don't really have today the flexibility to handle the
user experience the way you want
phil: many output devices in our case are not "grouped" with input devices
and it's very important for us that the app be able to use different devices.
phil: another use case: my son is listening to the radio with a headset, while I'm watching a movie locally on my computer
juberti: we did an app that does
media mixing, kind of GarageBand. There is no input for the app.
If we are saying that permission comes only from GUM, ....
... if you use a real pro audio app, you already understand the
notion of a door hanger
<npdoty> it sounds to me like we're suggesting that every page can choose non-coupled input/output (or maybe it won't have implemented it), which will cause more permission dialogs, but the user can also choose it separately on the browser
juberti: for most of web users, the permission is much simpler
<npdoty> and if the user sets it in their browser first and then the site wants to change it?
juberti: but this API also gives us the capacity to use door hanger for more professional apps.
<npdoty> and if the user asks where they're supposed to configure audio output? in the site or in the browser? or in the site first but maybe overridden in the browser?
ekr: do I understand that the goal is to allow a website to minimize the number of prompts for the most generic cases
juberti: yes
... 90% would be: use the default or use that one specific set
of devices.
<ekr> I use the system microphone and the headset for sound
fluffy: the app would enumerate
the devices
... the app would then ask permission for a specific
device
... the door hanger would then kick in, and the app would get
access?
juberti: yes, or you have given persistent permission to that device beforehand and the door hanger would not even be needed
martin: is there a need for
labels for groups ....?
... you said default and default group
juberti: ....
martin: the use case you mentioned is only for apps that are already using this API
juberti: yes
martin: then they should be aware of this problem, and have a UI, and so on
<npdoty> mt, because you think developers are unlikely to make mistakes about edge cases of hardware?
juberti: well, yes, but they could still make a bad choice. generally, the complexity is not transferred to the app.
dan: ... would it be good to be able to select input/output only to simplify the list ?
juberti: practicalities make it something we don't want.
phil: is there a way for JS to know in advance which permissions it has access to?
juberti: yes
phil: some devices are also accessible; how do we populate the drop-down with that?
juberti: good point
... how do we do for output devices what we do with input
devices? .... that's a good question, I need to think about
that.
phil: enumerateDevices might prompt once for allowing ALL devices to be used, so the enumerate API could also allow them in one step.
juberti: yes, we could do that, but it would be difficult for users to understand.
<npdoty> it would be a new permission model to say you get permission to things that are less egregious than any permissions you've already granted.
juberti/martin: discussion about how to do it right.
phil: just to clarify, I just want a way for the user to enable all the output devices.
juberti: we might need something new to enable what you propose
burn: the persistent permission
implies access to all input devices
... and that surprises me
... <reading specs>
... I'm realizing that we actually give permission to ALL
devices, while I thought it would give permission for a
specific device (the one i agree on in the prompt)
... the implementation consequences are minimal (at least in
Chrome), but for the user it's quite a shock; I was not
personally aware that I was giving away that much
dom: we have to contact other groups for that discussion; e.g. Web Audio and HTMLMediaElement belong to other groups, and so on and so forth. We need cross-group coordination.
juberti: I think we need to document the attack scenario, and reach consensus at least within the group before we bring it to other groups.
dom: my perspective is that we should really try to spec it
juberti: how do you do it?
dom: you do a partial interface .....
<Zakim> dom, you wanted to ask where to spec this, talk about coordination with other groups
juberti: yes, that would be way more efficient
ekr: the problem I typically run into is when I am using the system microphone with a non-standard headset.
ekr .....
ekr: there are also hierarchies of devices .....
dom: next steps?
juberti: take this proposal and make it into a pull request against existing specs
dom: I would make it a spec on its own.
juberti: ok, is there a template, and where should that thing reside?
dom: i can guide you.
juberti: ok great, I know who to
delegate to.
... I also think that there are a couple of questions that
showed up here today and should be written up as well in the
document.
hta: we're still ahead of schedule
I propose a 15-minute break
hta: so break until 20 past.
<inserted> scribenick: npdoty
talking about Last Call
dom: a refresher on Last
Call
... assuming we get consensus to go to Last Call
... have to make a number of decisions about how that last call
will happen
... have to decide the amount of time for comments. W3C Process
minimum is 3 weeks, but can be longer
... review will be open to everyone, but some groups we should
specifically contact
... during the time of the formal review period, need to
formally track each comment, formally respond, formally seek
feedback to our response
hta: a formal definition of "formal"?
dom: need to log each comment
(like to the mailing list), needs to send a response, best
effort to see that the comment is accepted by the
commenter
... not every comment needs to be considered an issue
... some comments may repeat existing issues without raising
new information
... even if the comment is not raising a new issue, need to
indicate to the commenter, past discussion and arguments
burn: typically we track every
comment that comes in. need to be prepared to give a proposed
resolution
... eg "we already discussed this and we decided not to do
this" or "clarification we'll want to do"
... need to communicate that proposed resolution to the
commenter
... make your best effort to get back their acceptance or
rejection of your proposed resolution
... often give a time limit, if we don't hear from you in two
weeks, then we'll assume you accept our resolution
... should separately track implied vs. explicit acceptance, in
order to have clarity for the transition call later
dom: have a tool for tracking
comments that we might or might not use
... groups we have intersection with, groups mentioned in our
charter
... first list of groups
... Webapps, TAG, Audio, HTML, WAI PF, IETF RTCWeb
... forgot for the slides, but should add the Privacy Interest
Group (PING)
npdoty: thanks
dom: might ask the RTCWeb group
to formally chime in
... just my suggestion, for reductions or extensions
... once we're done with Last Call comments
... either go to Candidate Recommendation (no substantive
changes that requires more reviews)
... otherwise, need to go back to Last Call
... transition request to the W3C Director, including the
detailed review of the comments we have received
... for commenters who don't accept the resolution, would check
whether we need a Formal Objection, with a separate
process
... Last Call can be a difficult period, which this group may
be familiar with
... attention from groups who may not have followed all the
details of your work
burn: in my experience, Last Call can effectively be first call
fluffy: do you try to get feedback from those groups before we get into the formal Last Call step?
burn: one way is to involve these
groups before Last Call
... ask them ahead of time. may save you from doing a second
Last Call
dom: we've had a number of
interactions with TAG and WebApps
... had some early reviews from Privacy Interest Group, but doc
has changed significantly
burn: met with WAI rep, indicated
an area they care about a lot
... should get involved sooner rather than later
fluffy: as comments get moved to Formal Objections, who can raise those?
dom: anyone can raise a Formal Objection.
no Membership requirement, any individual or organization
dom: Formal Objection is not something done cheaply, as a social matter. requires quite detailed documentation
hta: what constitutes a Last Call
comment?
... any message to the mailing list?
dom: if there's ambiguity, you
can ask
... most cases it's fairly clear
burn: in some groups, could say
that anything from a public list was a Last Call comment
... but now all groups are operating in public
... social issues, but that doesn't stop some people
dom: understood that WG members
should not raise Last Call comments, but can
... for example, if you understand something that's new
... could have a separate mailing list for comments
... most groups just use public mailing lists
burn: for every comment, it's
useful to have an email track as well as minutes. so that later
you can point back to it
... track discussion of comments, not just the comment
itself
dom: the tool I'm thinking of can do some of this tracking
<mic noise>
dom: when would we go to Last Call for getUserMedia?
hta: one requirement is to close
the bugs
... tomorrow we are going through the remaining bugs (8)
... and the group needs consensus to go to Last Call
... if we have wildly different opinions....
burn: time to go to Last Call is
that we don't expect substantive changes (otherwise CR)
... we have a note in the document today about things we're
expressly seeking feedback on
... about promises and backward compatibility with the navigator syntax
... and a few editorial notes in the document
hta: once we close these 8 bugs, does the group believe it's in a state where we should issue a Last Call?
fluffy: how many people have read
the document in the last six months?
... read, not looked at
burn: we should not wait long at all to request review from these other groups, whether or not Last Call
dom: one of the advantages of
wide review of Last Call is to limit ourselves about not
wanting to make big substantive changes
... developers don't like that as much
burn: the next exclusion period for intellectual property. Last Call triggers one
mt: what should we do with changes during this time? (don't want to make changes during the Last Call review)
dom: could make partial
interfaces / new specs
... or look at a new version, could be in a different
branch
fluffy: should seriously read this document, because it's going to be frozen for a while
hta: where it's possible in a
reasonable way to write a separate document that extends
interfaces, that's preferable
... a separate question about what makes sense about
integrating or keeping a separate spec
burn: if you know you have
something substantial to add to this document
... then it's not really the last Last Call
... putting the community through official review steps
mt: the tension between the idea that we have a living spec
fluffy: this is not a living spec. Last Call is a sign that we're freezing it
burn: you don't typically do a Last Call unless you're really indicating that you're done with it
hta: basic conflict between publishing Rec track vs. living specs
fluffy: if we allocate ten people
from this room to review this document beginning to end, would
get a lot of comments
... we should do that before we issue a Last Call and get those
comments from a dozen different groups
dom: goal should be a conservative approach to commenting
fluffy: we should fix the things that everyone will indicate that we fix
ekr: we should get approximate
signoff from implementers, prior to Last Call
... if those people are basically happy, we can talk about
going to Last Call. but if they're not, then we need to resolve
those issues first
fluffy: we put out a deadline for
comments twice. only two responses?
... can we get volunteers from several, separate individuals
from major implementers to review?
timeless: once we have an announce list for reviews, I'll be a part of it. I would do a pass, I would do a very detailed review
timeless: or could contact some individuals like me separately
fluffy: everybody who's ever read it before has had a lot of comments. rate doesn't seem to be dropping
burn: need a full pass through of entire document
dom: specific action items?
... who volunteers?
hta: give it two weeks for comments. 15 November
<dom> ACTION: ShijunS to make full review of getUserMedia - due Nov 21 [recorded in http://www.w3.org/2014/10/30-mediacap-minutes.html#action01]
mt: a big document. would take time, but IETF/vacation are conflicts
<dom> ACTION: martin to make full review of getUserMedia - due Nov 28 [recorded in http://www.w3.org/2014/10/30-mediacap-minutes.html#action03]
burn: November and December can be a slow time for responses
<dom> ACTION: Josh to make full review of getUserMedia - due Nov 28 [recorded in http://www.w3.org/2014/10/30-mediacap-minutes.html#action04]
<trackbot> Created ACTION-30 - Make full review of getusermedia [on Josh Soref - due 2014-11-28].
<dom> ACTION: juberti to make full review of getUserMedia - due Nov 28 [recorded in http://www.w3.org/2014/10/30-mediacap-minutes.html#action05]
hta: will note to the mailing list that we have a few volunteers for comments by November 28th, and we're soliciting more
burn: even comments indicating that you can't understand it, is useful information
<dom> ACTION: PhilCohen to do full review of getUserMedia - due Nov 28 [recorded in http://www.w3.org/2014/10/30-mediacap-minutes.html#action06]
dom: but we do want to finalize this thing
mt: will generate pull requests for editorial, grammatical things
fluffy: commits, can cherry pick, but grateful for any review at this point
stefanh: end of the morning
agenda
... will continue in this room after lunch with #webrtc
<agenda discussion>
fluffy: "volume" is underdefined
hta: could define it as a number of decibels, which would be inconsistent with HTML
fluffy: but it's not that. my
proposal is that it's a multiplier in a linear space
... 0 is silence. 1 is maximum volume
... a volume setting you can move up and down between 0 and
1
... could be a linear or logarithmic curve, just pick one. this
is linear
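For illustration, how a page might request this under the proposed linear interpretation; the constraint syntax below follows the constrainable pattern and is a sketch, not normative text:

```javascript
// Illustrative: volume as a linear multiplier, 0 = silence, 1 = maximum,
// so 0.5 asks for a level halfway along that linear scale.
navigator.mediaDevices.getUserMedia({
  audio: { volume: { ideal: 0.5 } }
}).then(stream => {
  // ... use the half-volume audio track
});
```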
hta: using a constraint as if it were a control
ekr: if you want this, why not use WebAudio?
dom: doesn't make sense as a constraint
fluffy: we had some confusion over 0.5. could remove "volume". not sure WebAudio covers all cases
ekr: is there some reason you can't do it with a filter?
fluffy: different implementations will do it in different ways
mt: isolated streams is an example
fluffy: maybe we shouldn't re-open whether to have volume or not. only proposed change is explaining the meaning of 0.5
hta: let's integrate this change and close this bug
burn: a clarification, not a
change to the requirements for it.
... if we all agree
mt: some encouragement will be provided. it's probably a bad idea to do it over an unauthenticated origin
hta: will assign that to ekr
npdoty: should be clear about whether the requirement on stored permissions is normative
ekr: it should be normative, as it is in IETF
npdoty: and that stored
permissions section would be a good place for the additional
encouragement, and should use a better definition for "secure
origin"
... may follow up in email
jib: constrainable pattern, which
is abstract. and specific use in getUserMedia
... specific use doesn't need to be abstract. should say
exactly what is returned
... reuse the existing MediaTrackConstraintSet dictionary, which
may be added to in the future
... a second dictionary, a subset of the capability set
... hopefully I get back success and get back values
... capabilities is a superset of constraints which is a
superset of settings
... pull request illuminates that the datatypes are
related
... should we write two more dictionaries (enumerating the same
keys), or should we just re-use the same type?
... re-use the same type because capabilities are exactly the
same structure (based on the prose)
burn: IDL, we don't say that, that would be a change to the document
jib: we could use a narrower data type for the returned set, but it could easily be the same data type
mt: no content-accessible type
information is available
... maybe it should return an array of strings rather than a
dictionary anyway
... don't mind about the difference between capabilities and
constraints. tough for spec authors and implementers, but oh
well
... JavaScript more natural to use an array, with indexOf
jib: could be a fourth use of the
dictionary. return a dictionary that you can enumerate, all the
keys you find in there are supported
... the UA puts in something truthy, an object
burn: trying to remember why we did it this way
fluffy, where are you?
burn: don't want to put the same
defined type for all those different returns
... because they're not the same return
jib: X, Y and Z are different
things, even if they're the same type
... we need more specific text. either this pull request with
using the same dictionary, or we define more specific
dictionaries
hta: separate discussion of
getSupportedConstraints
... capabilities, you might want to look at the value, modify
it slightly and then send it back to the browser
burn: even if you want them to be almost the same data structure, I'd rather see different names for them
jib: different names, same
type
... argument type, argument name
dom: developers are not likely to read the spec
mt: something we typically leave
to editorial discretion
... if they can address it in some way, leave it up to
them
... we will review the outcome and ensure it's not crazy
... acceptable?
jib: fine. but want to specify something, not just abstract types
burn: I hear you.
... we already have the prose for it, but now have the IDL
fluffy: legal syntax is different
dom: has anyone started implementing?
jib: hoping not to make any functional changes at this point
hta: WG position is to leave to editorial discretion
dom: WebIDL must be valid
fluffy: editors please bring us a proposal
[adjourned for lunch.]
re-convene at 1pm
<dom> Media Capture Depth Stream Extensions specification
<anssik> https://docs.google.com/presentation/d/1mwlD8H_RzlB2JheyjqXxa7sMSMTN8x96VgSzjy5B4pc/view
<dom> Anssi's slides
<dom> ScribeNick: dom
Anssi: I'm Anssi Kostiainen from
Intel, with Ningxin Hu from Intel and Rob Manson (Invited
Expert)
... we discussed the idea of bringing 3D camera to the Web last
year at TPAC
... I remember polling for interest back then
... lots has happened since then
... we collected use cases, played with the spec and wrote
code
... we will be summarizing this
... [slide 2]
... The spec is about making the 3D camera a 1st-class citizen of the
Web platform
... up to now, these have required special plugins
... the native platforms have these capabilities
<hta> Stefan is running the slides (so you don't have to say "stefan or someone")
Anssi: the approach we've taken
is to integrate with existing APIs as much as possible
... reusing primitives rather than inventing new APIs
... this means relying on getUserMedia, Canvas 2D, WebGL
... if you attended the symposium on Wednesday, you saw a live
demonstration on stage
... TimBL mentioned it as exciting :)
... [slide 3]
... Current status: we started with use cases and requirements
— thanks for the contributions!
... it took 2 to 3 months to make sure we had a solid set of
requirements
... over the summer, we started drafting the specification and
published as a FPWD two weeks ago
... parallel to this work, Ningxin has been working on an
experimental implementation which was used on stage on
Wednesday
... the code is available
Ningxin: the build is available on Windows; the source code is also available
Anssi: the references are given
on the last slide
... [slide 4]
... Regarding use cases: some of them are obvious, like video
games (e.g. Fruit Ninja with your hands)
... 3D object scanning: measure a sofa by pointing at it
... video conferencing — it would let you remove the
background; or make the experience more immersive
... lots of use cases in augmented reality too
... Rob, maybe you want to expand with your favorite AR
Rob: you can add virtual objects
behind real objects
... all AR could be improved with depth tracking
Anssi: this is only scratching
the surface — there are lots of other use cases
... I think it's as significant as bringing the RGB stream to
the Web, with lots of potential
... [slide 5]
... This summarizes our IDL interfaces
... not all of them are complete yet
... but this is our current view of what needs to be done
... we're very open to feedback on this
... we've already received good feedback from canvas
implementors — we'll adjust based on this
... I won't go into the details — look at the spec for that
... DepthData is the data structure that holds the depth
map
... CameraParameters, soon to be renamed CameraIntrinsics
... it's associated with the DepthData
... it represents the mathematical relationships between the 3D
space and its projection in the image plane
<anssik> http://en.wikipedia.org/wiki/Pinhole_camera_model
Anssi: it's the minimal data
required for the pinhole camera model
... these are the two only new interfaces we're adding; the
rest are extensions to existing interfaces
<juberti> please, please, can we add getTracks(kind) instead of getDepthTracks
Anssi: We add a boolean flag to MediaStreamConstraints; similar to the audio and video booleans
Martin: are you planning on having constraints for these devices?
anssi: we've chosen to wait for the constraints discussion to stabilize
martin: I think we're stable enough; we would need your input on what constraints would be needed in this space
Rob: in a lot of ways, the constraints can be very similar to the video constraints (e.g. minimal range for width and height)
it's also related to CameraIntrinsics - but they'll largely just be read only Settings and Capabilities
anssi: we're still looking at
this
... the group sounds open to us proposing new
constraints
... thanks for that feedback
... we will take care of that aspect
... the next interface we're extending: we add getDepthTracks(),
which returns a sequence of depth tracks
<robman> +1 to getTracksKind() or a more generic idea
dom: justin noted he would prefer to have a generic getTracks(kind) instead of the specific getDepthTracks
anssi: noted; we'll look at this
too
... Next interface is adding the "depth" kind attribute
<juberti> this of course would be generic and obsolete getAudioTracks and getVideoTracks
anssi: In addition to extending
these getUserMedia interfaces, we have also additional APIs on
the Canvas Context API
... similar to the imagedata apis
... we're having discussions with the canvas editors
... [Usage example slide]
... this is copy-pasted from the use cases doc
... this shows how easy it is for someone familiar with
getUserMedia to use that API
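The slide itself is not reproduced in the minutes; the following is a minimal sketch of the kind of usage it shows, based on the extensions just described (exact shapes may differ from the draft):

```javascript
// Sketch based on the extensions above: a boolean depth flag in the
// constraints, a getDepthTracks() accessor, and a "depth" track kind.
navigator.getUserMedia(
  { depth: true, video: true },
  stream => {
    const depthTrack = stream.getDepthTracks()[0];
    console.log(depthTrack.kind); // "depth"
  },
  err => console.error("getUserMedia failed:", err)
);
```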
... [next steps slide]
... we're engaging with the Khronos folks for a minor extension
to WebGL to be able to pipe data to the WebGL context
Ningxin: we are proposing a small
extension to the WebGL extension called
WEBGL_texture_from_depth_video
... with that extension, Web app developers need to know
whether they can upload a video element representing a depth
stream to WebGL
... using shaders
... with this extension, it defines circumstances under which
an HTML video element with depth data can be uploaded
there
... we will define the format of the texture
... this is a proposal against WebGL 1.0
... if WebGL2.0 comes, we will update the texture format to
match
... DepthData as unsigned short is to be as close as possible
to the native representation of the depth stream
... (which is what most 3D cameras give)
... so as to limit CPU processing as much as possible, and leave
as much as possible to GPU parallelism
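A rough sketch of how the proposed extension might be used; the texture format arguments are assumptions consistent with the unsigned short representation just mentioned:

```javascript
// Sketch only: enable the proposed extension, then upload a <video>
// element that is rendering a depth stream as a WebGL texture.
const canvas = document.querySelector("canvas");
const depthVideo = document.querySelector("video#depth"); // plays the depth stream
const gl = canvas.getContext("webgl");
const ext = gl.getExtension("WEBGL_texture_from_depth_video"); // proposed name
if (ext) {
  const tex = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, tex);
  // format/type are illustrative: 16-bit depth packed so that shaders
  // can reconstruct values without extra CPU-side conversion
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGB,
                gl.RGB, gl.UNSIGNED_SHORT_5_6_5, depthVideo);
}
```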
Anssi: we've talked with Dom with
regard to the collaboration with Khronos
... we're currently working on an informal technical
basis
... we'll keep both groups updated when we make progress on
either side
... that's our model for operation
... Khronos has an established model for WebGL extensions;
there are tens of extensions that are widely implemented
... ningxin and Rob are the ones watching this space most
closely
... the other part of our work is to address open issues
... we use github issue tracker to track open issues
... that's the place to go to if you want to open new
issues
... the slide shows the list of currently identified
issues
... the highest priority items should be resolved before we
publish a heartbeat wd
<robman> NOTE: a range of these issues are likely to be resolved as part of the update to use the ConstrainablePattern
bernard: the issue list talks
about transmission via WebRTC peerconnection
... that would mean changes to WebRTC 1.0?
<scribe> ... new codecs?
peter: what happens if you @@@
ningxin: this is still under
discussion
... we're looking at an extension to H264 to support 3D TV to
carry the depth data besides the RGB data in the stream
... there are already several extensions in the codec space to
do that
... there is also an extension to SDP to describe that kind of
media
... we're looking at all these to see if we can support that
transmission
peter: with regard to PeerConnection, it's critical to determine if it's a separate track or part of the same codec
Rob: our proposal is that it's a different track, that looks like a video track
peter: but that requires different RTP packets
Shijun: the codec extension
defines a different bitstream from the video
... I was the first proposer for stereo video coding for H264
10 years ago
... I'm working on this at Microsoft still
... it's a fun project, but I'm not sure it's ready for prime
time
Anssi: it's good we have the right people in the room — we'll continue the discussion on the mailing list
Martin: we need to understand
what adding a depth track to a peerconnection means
... this has impact on many things
bernard: unless the codec supports this, you simply won't get anything
stefanh: we can extend the timeslot a bit for having this discussion
anssi: is everyone here active in
the task force? we would like to keep you in the loop and we
would appreciate your continued contributions.
... we're currently at this phase where we're just getting more
and more people to look at our work and give feedback
... we appreciate feedback from people interested in this
technology and with the right background.
peter: can we do getTracks(kind) instead of getAudioTracks / getVideoTracks?
hta: we already have that
martin: let's just not add kind-specific tracks any more
Shijun: for any stereo-related
topic, it would be useful to check with other groups on whether
stereo videos can be rendered in video tags
... if we don't have any surface to render a 3D video, what
would we do with these streams?
... (even if there are other use cases without that)
anssi: note that webrtc is not required to make use of this; the same way getUserMedia is used well beyond WebRTC
shijun: I'm not saying don't do
stereo video capture
... but whether we want to make that transmissible via WebRTC
is another question
ningxin: regarding the 3D video
question, our proposal makes it possible to use the depth
stream independently
... 3D cameras can capture the depth stream without the RGB
stream
... e.g. for hand gesture detection
Martin: if we have both video and depth with different constraints, what would that do to the cameras?
Rob: we need to calibrate the
cameras
... but otherwise, the constraints should apply to both at
the same time
... for a calibrated stream with the two together, you should
consider them as a single source
Shijun: these are two sensors
with different ids
... synchronizing the signals across these is quite
challenging
... delays can induce headaches
martin: if you request a
mediastream with both video and depth, you get back a single
mediastream
... which by definition are kept in synchrony
rob: asking for both depth and video gets you a calibrated stream
dom: the WebRTC story is far from
being done
... but looking at it will be a good test of the extensibility
of the WebRTC API
hta: the depthinput kind needs to
be added to the enum of type of devices in
enumerateDevices()
... (I don't want to think of depthoutput quite yet)
martin: let's get the non-WebRTC
stuff done; the WebRTC interactions are a whole new
enterprise
... we should scope the work to reflect that
anssi: makes sense
... thanks for the great feedback
... [demos]
... [magic xylophone demo]
ningxin: this is based on a demo
that was done with a simple RGB stream analysis
... trying to detect the movements by analysing the RGB
stream
... I modified that to add depth data
... there is a video representing the depth data
... js-based hand recognition is based on js-handtracking
... also originally based on RGB data; we updated it to use
depth data
... it's more accurate, more stable and more performant
... we can extract the background and apply the recognition
algorithm only on the foreground objects, reducing
computation
... because depth cameras are infrared-based, they can be used
in low-illumination contexts
anssi: [fruit ninja demo]
ningxin: the idea is similar;
still based on js-handtracking library
... originally, this is done with a mouse or a touch
screen
... here we integrate this with finger gestures
... you can also see the integration of the depth image that we
composite above the background and beyond the foreground
... this is done via WebGL via shaders
... that demonstrates depth rendering with WebGL still in 2D
space
anssi: [RGB+Depth with WebGL]
ningxin: we initiate a request
with depth only where the user is seen only as data
... then we request also RGB data
... main idea is here to use RGB texture and depth for the
positioning
anssi: hopefully this gave a good
idea of what this technology is about
... please give feedback, and let's discuss this further on the
mailing list or in github, etc
... the spec has still lots of room for changes
... we have ongoing discussions with the canvas folks
... if you're active also in the HTML WG, the public-canvas-api
mailing list is where this is discussed
stefanh: thank you anssi!
<robman> excellent discussion and feedback - thank you everyone
stefanh: no further questions from the room
hta: I think this work is great; if scoped adequately (i.e. getusermedia stuff first), this will be useful in many contexts
<robman> 8)
hta: I'm glad you brought that here!
martin: this discussion will be
around directions and guidance in this space
... we want to create a media stream from stuff on the
screen
... [slide 3] this is what we are looking at for the API — an
additional source property to the mediastreamconstraints
... that source could apply for non-video cases (e.g. get the
audio from a given app)
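A sketch of the constraint shape under discussion; the property name and the "monitor" value are among the candidates debated just below:

```javascript
// Illustrative only: a "source" property on the video constraints that
// selects screen content rather than a camera.
navigator.mediaDevices.getUserMedia({
  video: { source: "monitor" } // candidate values: monitor, window, browser
}).then(stream => {
  document.querySelector("video").srcObject = stream;
});
```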
... [slide 5] we've come up with terrible names, but we're not sure what
to do
... "monitor" is also used in the context of audio
burn: I like "system"; it works well for audio, video, etc
martin: only issue is that system sounds global, where we would want for some sub-part (e.g. a window)
dom: we could hook the API on another end point than navigator.mediaDevices
martin: jib suggested the same
pthatcher: another suggestion would be to enumerate "system" devices to which you could then constrain your getUserMedia call
martin: that doesn't quite fit
with the security model where we want the user to be entirely
in control of what is shared
... we want to distinguish the sources that are enumerable and
the ones that aren't
... screen sharing would fit in the latter case
shijun: the browser is also in a better position to pre-select sources
martin: the UX mock-up our UI
guys have done comes with a doorhanger where the user can pick
which windows to share (with no default selection)
... none of this is exposed to the app until it is agreed by
the user
... I don't think that's a problem that is too troubling
here
... but we need a common taxonomy to have these
discussions
... I'm gravitating towards "monitor", "window", and "browser"
(without distinguishing tab or window)
<robman> display?
peter: window doesn't really
apply well across mobile / desktop
... leaving it a bit generic is valuable
martin: "system" is too generic because my microphone is part of system
burn: maybe getOutputMedia?
peter: (or alternatively 'source: output')
martin: I'll go with one of
these
... for application, our Cisco friends who have experience in
this area have shared that "application" doesn't work very well
for users
... someone decides to share a PowerPoint presentation
... they choose to share the whole PPT application rather than
just the single presentation they want to show
... and leak information without realizing
... so the suggestion was to simplify the interface and leave
it to the browser to determine how to e.g. composite several
window in one if they feel that's adequate
... this would be a new enum constraint
... we distinguish browser windows from other app windows for
same origin protection
... breaking that isolation is too potentially scary to be just
dismissed via a getUserMedia prompt
dom: for "browser" filtering, would you also identify other instances of the current browser? of other browsers?
martin: typically, only the
current instance since that's the only one the attacker is
potentially in control of
... doing that with other browsers requires a lot more difficult
social engineering
... it's a bit more far-fetched for other browsers, but not
completely unthinkable either
... we may want to filter other browsers out
... clearly sharing another browser window is something we want
to enable without too much work for the user
... we're currently whitelisting, but want to have a more
generic approach
... but we're still trying to figure out how to get a clear signal
from the user that they trust the site enough to share that
other site
bernard: there are other risks we're not protecting against (e.g. an app revealing the user's password)
martin: users are likely to
understand this
... but users probably don't understand the cross-origin risks
where we need to protect a site from another site
alexandre: what's the impact on
the API?
... independently of the UI impact
martin: it doesn't have impact on
the API
... there will be a number of sources that can produce screen
sharing
... and access to any one of those will be what the browser
allows combined with the consent mechanism
shijun: for screen sharing, should we isolate by default?
martin: it doesn't solve the problem - the attacker could simply send the screen sharing stream to itself
shijun: I think there won't be a unique solution; but I think isolated streams might be part of the solution
dom: what if one could only start screen sharing after a peerIdentity-certified connection has already been established?
martin: I'm not convinced this would really help
dom: but I guess this shows that not all policy decisions can be API-neutral
martin: right... Will need to
think more about this; please send more ideas and
suggestions
... [slide 6]
... some platforms offer distinction between logical and
visible windows
shijun: on some platforms, content that is not visible is not fully rendered
martin: screen sharing guys have tricks to circumvent this
alex: we need to make sure that the security considerations don't break interop and make it horrible to develop for
hta: what are our next steps?
martin: we've been working on
this informally so far
... there is a w3c github repo but that's as much formality
as we've got
... we probably need to decide whether this group wants to take
it on
hta: so the next step should be to present a draft to the group and push it to FPWD
martin: ok, let's proceed with that; I'll make a few updates on terminology before we get that reviewed
hta: the rough plan would be to achieve
consensus on FPWD by the week after IETF
... is there rough consensus here we should proceed with
that?
shijun: agreed
[lots of heads nodding]
<stefanh> slides shown on the getUserMedia test suite
<stefanh> (webrtc test suite non-existent right now)