Media Capture Task Force teleconference -- 24 Apr 2012

<scribe> Scribe: Josh_Soref

Administrivia

Move MediaStream base to getUserMedia doc

<anant> proposed way to integrate MediaStream into the getUserMedia doc: http://mozilla.github.com/webrtc-w3c/getusermedia.html

Minutes approval

Resolution: Minutes from last call are approved

anant: we'd like to propose integrating the change and publishing an editor's draft

<richt> +1 to Anant's proposal.

anant: if there are no major comments, i'd like to move this into VC on Friday

adambe: we can always back out changes

anant: we don't gain anything by publishing sooner

Travis: publishing, or getting a new editor's draft?

anant: we'd like to have a new editor's draft

adambe: the reason i proposed that we should move it as soon as possible
... is that i don't think we need to introduce another step
... there's already the Editor's Draft before the Working Draft

Travis: this change involves moving part from the CVS repo to the Hg repo

<adambe> <- speaker

adambe: we're currently working in Github and then publishing to CVS

<jesup> I'm on the 610 number

anant: how does Hg relate to the ED on www?

Travis: pushing to Hg on w3 updates the ED

adambe: where is the mercurial repository in the picture?

anant: getUserMedia is on Hg (dvcs.w3.org)
... WebRTC is in CVS (cvs.w3.org)

<stefanh> I said 'Anant, please post a link on the list with a link to the updated version'

adambe: we have git (internal edits)

stefanh: for the time being it will probably stay in the same repo as WebRTC (CVS)

Resource reservation

Constraints and Capabilities

http://lists.w3.org/Archives/Public/public-media-capture/2012Apr/0027.html

burn_: I put a summary at the top
... a number of comments on the list were relating to not understanding the structure
... I looked and realized that a number of things were relating to violating JavaScript syntax
... i'd much rather MediaStreamDeviceCapabilities be a dictionary
... but elements of an array can't be a dictionary
... so i don't know how to do that
... suggestions (offline) welcome
... I distinguished between MediaStream Constraints and XXW
... when you have an object that has only one key-value pair, and when you have an object with one-or-more key-value pairs
... looking at Constraints
... i didn't change the core algorithm
... I changed 2A and 3C1
... relating to what to do when a constraint is not supported by the browser
... 2A is in the Mandatory set
... if the author specified a Mandatory constraint and the browser doesn't support it
... then it must count as an error
... 3C1 is for Optional constraints
... the instructions say to skip it if the browser doesn't support it
... I updated the first example
... to use the new syntax
... I also added two more examples
... in the first example, i included mandatory and optional lists
... both parts are Sequences
... we could change mandatory to be a Set
... the browser is specified to honor all constraints for Mandatory, so order doesn't matter
... but for Optional constraints, order matters
... if you have conflicting optional constraints
... the algorithm requires satisfying the first one
... for simplicity in the mind of the author, both are sequences
... with ONE value permitted in each array element
... i'm open to improvement in how we structure each element
... the requirement stems from having key-value pairs
... the first example has Mandatory and Optional sections
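
A minimal sketch of the mandatory/optional handling described above, not the proposal's actual text. The names `Candidate`, `selectCandidates`, and the `supported` list are hypothetical, and treating an empty result after the mandatory pass as an error is an assumption.

```typescript
// One constraint per array element: a single key-value pair, e.g. { width: 640 }.
type Constraint = Record<string, string | number>;

interface ConstraintSet {
  mandatory: Constraint[]; // all must hold, so order does not matter
  optional: Constraint[];  // tried in order, so earlier entries win conflicts
}

// Hypothetical description of what one source can deliver.
type Candidate = Record<string, string | number>;

function isSupported(constraint: Constraint, supported: string[]): boolean {
  return Object.keys(constraint).every((key) => supported.indexOf(key) !== -1);
}

function satisfies(candidate: Candidate, constraint: Constraint): boolean {
  return Object.keys(constraint).every((key) => candidate[key] === constraint[key]);
}

function selectCandidates(
  candidates: Candidate[],
  constraints: ConstraintSet,
  supported: string[], // constraint names this hypothetical browser understands
): Candidate[] {
  let remaining = candidates;

  for (const constraint of constraints.mandatory) {
    // A mandatory constraint the browser does not support counts as an error.
    if (!isSupported(constraint, supported)) {
      throw new Error("unsupported mandatory constraint");
    }
    remaining = remaining.filter((c) => satisfies(c, constraint));
  }
  // Assumption: nothing left after the mandatory pass is also an error.
  if (remaining.length === 0) {
    throw new Error("mandatory constraints cannot be satisfied");
  }

  for (const constraint of constraints.optional) {
    // An optional constraint the browser does not support is skipped.
    if (!isSupported(constraint, supported)) continue;
    const narrowed = remaining.filter((c) => satisfies(c, constraint));
    // Keep the narrowing only if something still matches; otherwise drop the constraint.
    if (narrowed.length > 0) remaining = narrowed;
  }
  return remaining;
}
```

Because optional constraints are applied in sequence, an earlier optional entry narrows the set before a conflicting later one is tried, which is why order matters for that list and not for the mandatory one.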

[ burn_ reads through example ]

burn_: the browser can fail to satisfy optional constraints
... the goal is to satisfy as many as it can in the order given
... for the second example
... what to do if you don't want to set any constraints
... i came up with a name
... video enum provide
... the registry provides for max, min, enum
... the principle is that there's a constraint for each one
... solely to say "i want a stream of that type returned"
... the example says "i must have a video"
... "audio would be nice"

anant: I like the approach in general

burn_: I've only defined audio + video
... i haven't seen any others

anant: i'd propose maintaining the dictionary

<anant> {audio: true, video: true} <-- simple case

stefanh: didn't richt propose something like that?

<anant> {audio: {mandatory: [], optional:[]}, video: {mandatory: [], optional: []}} <-- constraint case
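
The two shapes anant pasted could be typed roughly as follows. This is a sketch only: the type names and the `normalize` helper are hypothetical, and reading `false` or a missing key as "not requested" is an assumption.

```typescript
// One constraint per array element: a single key-value pair.
type Constraint = Record<string, string | number>;

interface TrackConstraints {
  mandatory: Constraint[];
  optional: Constraint[];
}

// Either the simple boolean form or the full constraint form.
type MediaRequest = boolean | TrackConstraints;

interface GetUserMediaOptions {
  audio?: MediaRequest;
  video?: MediaRequest;
}

// Reduce both forms to one shape so later code has a single path.
function normalize(request: MediaRequest | undefined): TrackConstraints | null {
  if (request === undefined || request === false) return null; // assumption: not requested
  if (request === true) return { mandatory: [], optional: [] }; // requested, unconstrained
  return request;
}

// Simple case.
const simple: GetUserMediaOptions = { audio: true, video: true };

// Constraint case.
const constrained: GetUserMediaOptions = {
  audio: { mandatory: [], optional: [] },
  video: { mandatory: [{ width: 640 }], optional: [] },
};
```

Normalizing up front is one way to address the later concern about two pathways through the code: the boolean form becomes shorthand for an empty constraint set.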

richt: it comes back to a more simple thing
... it's really the use cases
... which we haven't talked about at all

<richt> My feedback: http://lists.w3.org/Archives/Public/public-media-capture/2012Mar/0042.html

<richt> Travis' feedback: http://lists.w3.org/Archives/Public/public-media-capture/2012Mar/0072.html

richt: the use case here is user/environment facing cameras

burn_: you and i have talked about it
... just because you don't believe it
... doesn't mean it isn't real
... i put in a proposal for a user/environment as a constraint
... i remember someone asking for other things
... maybe jesup asked about resolution?

richt: i'd really like to see the UCs
... on the list
... so we could see it and discuss it

anant: burn_, stating that a web app needs to control resolution
... isn't a valid use case
... you need to frame it in terms of a real application someone wants to build

hta: there is a UC for RTCWeb needing [rec:F25] to control resolution

<anant> https://tools.ietf.org/html/draft-ietf-rtcweb-use-cases-and-requirements-06#section-5.3 that's the document

<anant> section 5.3 lists API requirements

<anant> 5.2 has browser requirements

<hta> Requirement F25 of draft-ietf-rtcweb-use-cases-and-requirements

Travis: is controlling resolution similar to min-height/max-height/min-width/max-width

anant: i think width/height are different from resolution because pixel density is different

Travis: ah, pixel density

anant: we typically try to avoid specifying width/height in pixels
... typically you use `1em`
... given that devices have different dimensions

derf: we hard code pixel count for em in CSS

anant: but by using a different unit than pixels
... it gives us the freedom to vary

Josh_Soref: No it doesn't

derf: not at all

anant: we should use a different dimension for width-height
... and then use pixels for dpi

Travis: if we want media stream to interoperate with Canvas which deals in pixels
... then we need to have a way to translate
... i'd like to get back to burn_
... on mandatory/optional for video/audio

burn_: the pixel dimension discussion is orthogonal
... wrt audio/video with mandatory/optional
... and a simplified form of true
... while audio/video are the two stream types we talk about today
... those are the types that my company cares about
... i've heard people say they want to support other media types in the future
... so i tried to come up with a data structure that doesn't constrain to audio/video
... only the registry constrains it
... if a new media type comes up, then you can just add items to the registry
... without revving the document

Travis: one of the UCs suggested a while ago
... was to record your screen
... while that technically falls under the category of Video
... it seems like you could define that as a new Provider
... [The Screen]

burn_: that's possible
... you could go to the registry and do that
... maybe there are kinds of Text sent as something other than audio/video
... i don't know what the future holds

Travis: i think it's definitely good that we design this API so that it's easily extensible/not limiting
... i applaud that

anant: the reason you don't want it to be top level is to avoid revising the API spec
... it seems like we're working around revising the api

<jesup> media == time-labelled sampled data, typically from a sensor of some sort

anant: maybe we should make the first argument to getUserMedia be extensible
... so that we could extend things in the future

burn_: i think we should decide how the registry should operate
... i was trying to make the registry easy
... if you and others would rather audio+video be top level, we could do it

anant: we should prioritize the API being better over curating the registry

burn_: i agree with you
... in general i'm in favor of making things easier for the end user of the api
... vs the implementers

Travis: may i propose that we just add audio+video to the constraints dictionary

hta: can i propose to not do that
... having two pathways through the code makes it more complex

adambe: Travis is proposing something similar to the one i proposed
... you'd remove the audio-enum-provide

Travis: that's what i'm trying to suggest
... we should take this to the list

adambe: there's already a thread about the syntax of this
... paul neave
... burn_ pointed out that order is important
... we have a thread for this
... regarding anant's proposal
... and revising the w3c spec
... i don't think there's a problem with adding screen at a later point
... adding stuff is pretty straightforward

Travis: i agree
... and adding things would be backwards compatible

richt: to be clear on that
... we have implemented it
... but we can make it work
... we shouldn't be wed to the current bit

Travis: that's a very politically correct thing to say

burn_: if the way to actually make this work is to have mandatory, optional, audio, video
... and that we can add others later
... maybe that's the way to go

stefanh: maybe that's the way to go

richt: there's concern about booleans
... dictionaries would be better

burn_: closer to what anant proposed?

richt: yes

burn_: i'm fine with that too
... maybe the thing to do is to write up both

<richt> I'll try to dig up the email referring to why booleans are bad in web apis :)

burn_: and see what people think

ACTION burn_ to write up the audio-video-mandatory proposals to the list

<trackbot> Sorry, couldn't find user - burn_

ACTION burn to write up the audio-video-mandatory proposals to the list

<trackbot> Created ACTION-41 - Write up the audio-video-mandatory proposals to the list [on Daniel Burnett - due 2012-05-01].

Constraints Example 3

UNKNOWN_SPEAKER: in the third example, audio+video are in mandatory
... the browser has to return audio+video, or it's an error

Capabilities

<adambe> richt: this was the mail I was talking about earlier http://lists.w3.org/Archives/Public/public-media-capture/2011Dec/0061.html

burn_: I cleaned up the definition of the structure
... so it's now an array
... with one entry for each device/channel
... i'll let someone else suggest the appropriate term for that
... each entry contains an id + capabilities
... an id needs to be unique relative to the other ids in the same capabilities array
... "camera001" v. "camera002"
... the name must be composed of the high level media type
... Camera should have said Video
... followed by an opaque alphanumeric id
... so that you can distinguish by media type and per device
... but nothing else
... i used "supported" and "satisfiable"
... the description of the Trusted Scenario hasn't changed
... I added a bit for the Untrusted Scenario
... for the uses at my company, we're pretty much interested in the trusted scenario
... if people have suggestions for untrusted, please do
... my suggestion was just listing IDs, but no capabilities
... determining trust levels, I wrote TBD
... other TF members are better able to comment
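
A rough sketch of the capabilities array as described: one entry per device, each with an id built from the media type plus an opaque alphanumeric suffix, unique within the array. The names `DeviceCapabilities`, `validate`, and `redactForUntrusted` are hypothetical, and the capability keys are left open because they were not settled on the call.

```typescript
interface DeviceCapabilities {
  id: string;                             // e.g. "video001"
  capabilities?: Record<string, unknown>; // left out in the untrusted sketch
}

// The id is the high-level media type followed by an opaque alphanumeric suffix.
function isValidId(id: string): boolean {
  return /^(audio|video)[A-Za-z0-9]+$/.test(id);
}

// Every id must be well formed and unique within the same array.
function validate(entries: DeviceCapabilities[]): boolean {
  const seen: Record<string, boolean> = {};
  for (const entry of entries) {
    if (!isValidId(entry.id) || seen[entry.id]) return false;
    seen[entry.id] = true;
  }
  return true;
}

// Untrusted scenario as suggested: list the ids, but no capabilities.
function redactForUntrusted(entries: DeviceCapabilities[]): DeviceCapabilities[] {
  return entries.map((entry) => ({ id: entry.id }));
}
```

With this shape a page can tell devices apart by media type and by entry, but learns nothing else about them in the untrusted case.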

anant: the getCapabilities call
... you defined it under navigator

<richt> s#I'll try to dig up the email referring to why booleans are bad in web apis :)#Why booleans are bad in Web APIs: http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/0349.html/#

anant: we can't have it there
... we could have navigator.media.getCapabilities

burn_: good point
... i wasn't paying attention to that. you're absolutely right
... let's do that on the list

<Travis> navigator.getUserMediaCapabilities :) (bikeshedding)

burn_: that's related to the discussion about how many different places should you be able to get capabilities/set constraints
... we should decide what these mean

anant: i agree
... your company is the trusted case
... at mozilla we're more interested in the untrusted case and expect it to be more common

burn_: the reason it isn't there is because although i think it's important, i don't know how to do it right and want someone to do it right

hta: if you have a camera (with microphone), is it really two devices?

burn_: i was, but i am open to that
... we have a challenge anyway
... earlier proposals didn't really distinguish
... they asked for "give me audio+video from the same device"
... this is where anant's more general constraints come in
... maybe i want audio+video from the same device

adambe: isn't that incompatible?
... where would those go?

burn_: under General

hta: we need to be clear about whether a camera with audio is two or one device

adambe: it doesn't have to be two devices
... say i want to use my headset with a mic and my webcam
... you can't force people to use the crappy mic

hta: you might want to know if the mic is next to the camera
... i expect CLUE people will want the 6-dimensional coordinates of the microphone

adambe: about the algorithm
... 1C the first pass is through all possible streams the browser could return
... i'm not sure how practical that is

[ burn_ points to sentence ]

burn_: there may be more efficient ways to implement this
... it's easier to describe this algorithm as a process of elimination
... than a process of addition
... if the browser can take its 5 streams and know which satisfy the algorithm, that's fine

Travis: algorithm step 4
... will call success callback with the final set
... does each callback get a single track?
... or does the UA group them into as many compatible stream objects as possible?

burn_: i wasn't clear about this
... for quite a while, it wasn't clear what a Stream was v. Track v. Channel
... it depends on what we say getUserMedia should return
... if one media stream is returned containing multiple tracks
... then i'd say it has at most 1 audio and 1 video
... if the group says that they're separate, then i'm fine with that too
... when you get to this algorithm
... See 3
... 3D, select one stream from the candidate set and add it to the final set
... we add one set of video-data (if requested) and one set of audio-data (if requested)
... we're not merging
... [ burn_ tries to avoid saying Stream/Track ]

hta: i think it should be Track

burn_: the closest to my understanding is Track
... so i should probably rewrite it

Travis: the algorithm will return at most one Track of audio and one Track of video
... and return it in some container

adambe: the output is what the user gives it
... it isn't a problem if

burn_: the point of constraints is to say what you as an application author cares about
... and the browser selects one

adambe: the user needs to select a camera
... even if you have constraints, the user needs to select from the satisfiable list

hta: step 3D
... select one stream [s.b track]
... this step may be automatic or involve user interaction
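
Step 3D as discussed, sketched under the single-track reading: at most one audio Track and one video Track reach the final set, and the pick may be automatic or made by the user. `CandidateTrack`, `Chooser`, and `selectFinalSet` are hypothetical names, not part of any draft.

```typescript
type Kind = "audio" | "video";

interface CandidateTrack {
  kind: Kind;
  label: string;
}

// Stands in for either an automatic pick or a user prompt.
type Chooser = (options: CandidateTrack[]) => CandidateTrack | undefined;

function selectFinalSet(
  candidates: CandidateTrack[],
  requested: Kind[],
  choose: Chooser,
): CandidateTrack[] {
  const finalSet: CandidateTrack[] = [];
  for (const kind of requested) {
    // At most one track per media type in the final set.
    if (finalSet.some((t) => t.kind === kind)) continue;
    const options = candidates.filter((t) => t.kind === kind);
    if (options.length === 0) continue;
    const picked = choose(options);
    if (picked !== undefined) finalSet.push(picked);
  }
  return finalSet;
}

// Automatic selection: take the first satisfiable candidate.
const pickFirst: Chooser = (options) => options[0];
```

Passing the chooser in keeps the selection rule separate from where user involvement happens, which the group had not yet settled.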

jesup: if these algorithms are limited to returning a single video/audio track
... how do we handle the UCs that handle multiple synchronized cameras?
... this algorithm concerns me

burn_: that's a great question
... there's a separate thread on the list about that
... whether it should return one track per media type
... or should return multiple ones
... the algorithm is written for the single case
... it's possible to extend it for multiple
... but i'd like to scope that to a distinct discussion
... i agree jesup, this algorithm doesn't cover that

stefanh: burn_, there's a step where the user would be involved

burn_: i don't know if we've been completely clear about how permissions work
... or where user involvement occurs
... i'm open to suggestions
... maybe hta's 3D is correct

adambe: it needs to be before the success callback

burn_: i don't have a particular opinion on that
... i'm happy to have other people duke it out
... and we can add it explicitly if it's necessary

richt: i believe it's covered in the existing algorithm
... it says the user must select something
... i need to go check

burn_: the last bit is registration
... audio/video were the only two listed as required
... but if we change the structure,...
... i put example definitions for width/height/direction
... those are _examples_
... it is up to this group to decide

<richt> fyi: Step 10 in existing getUserMedia algorithm is the point that permissions occur: http://dev.w3.org/2011/webrtc/editor/getusermedia.html#navigatorusermedia

burn_: on constraints
... i'd almost rather other people suggest them
... it seems whatever i propose, some want and some oppose
... but i think we can take that to the list

stefanh: thanks burn_
... it seems we're discussing details
... are there objections?

richt: i'm objecting
... i want to see some UCs
... that's the premise
... i think it's very important

stefanh: we have the Scenarios document that Travis wrote

richt: i looked through the requirements from Travis's document, and i don't think anything needs this

stefanh: hta pointed out F25

richt: I think F24
... was very vague

<richt> The WebRTC requirement says: "The browser MUST be able to take advantage of capabilities to prioritize voice and video appropriately." That doesn't necessarily pre-ordain constraints are required IMO.

<richt> F24 in https://tools.ietf.org/html/draft-ietf-rtcweb-use-cases-and-requirements-06

Travis: i've been planning to update the document
... i anticipate in the next several weeks to be able to put forth an update

ACTION Travis to update the scenarios document requirements portion

<trackbot> Sorry, couldn't find user - Travis

Travis: if anyone has other things they want to see added
... please send them my way

ACTION stefanh to check with Travis on updating the scenarios document requirements portion in about 4 weeks

<trackbot> Created ACTION-42 - Check with Travis on updating the scenarios document requirements portion in about 4 weeks [on Stefan Håkansson - due 2012-05-01].

stefanh: maybe hta and i should give help

Travis: i'm certainly open to that
... we should keep in mind what happens when getUserMedia is called a second/third time with different constraints
... it came up while microsoft was investigating this feature

burn_: that's a good point

<jesup> g+

burn_: it's good to know when you're requesting new streams, and when you're requesting replacement streams

anant: is there a reason to distinguish?
... as opposed to you just closing a stream and requesting a new one

jesup: replacing may cause it to choose a different camera
... or querying the user may be a problem
... or interrupting the stream may cause a glitch
... i'm very much in favor of an API which allows for requesting modifications to existing streams
... i made comments on the list
... we talked about using the constraint language
... for modifying an existing request

anant: i agree there are valid UCs for replacement v. new
... i'd like to make a straw man that we don't need to use getUserMedia
... for replacement
... I think there's a set of constraints the browser could change automatically

<jesup> anant: agree

anant: there are some which the browser would need to do with User interaction
... for Modification, we could have the API be on the Stream object

Travis: i'll second that
... and suggest we describe what the action is for the affected stream

hta: i also like the idea of modifying the capabilities of an existing stream
... because it also maps well to modifying remote streams

richt: I also agree
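
A purely illustrative sketch of what modification on the stream object might look like. Nothing here was specified on the call: `StreamState`, `requestChange`, and the split between automatically adjustable settings and those needing user interaction are all hypothetical.

```typescript
type Settings = Record<string, string | number>;
type ChangeResult = "applied" | "needs-user";

// Hypothetical stand-in for the state of an existing stream.
interface StreamState {
  settings: Settings;
}

function requestChange(
  stream: StreamState,
  change: Settings,
  autoAdjustable: string[], // settings the browser may change without asking
): ChangeResult {
  const keys = Object.keys(change);
  // Any setting outside the automatic list requires user interaction,
  // and the existing stream is left untouched in the meantime.
  if (keys.some((key) => autoAdjustable.indexOf(key) === -1)) {
    return "needs-user";
  }
  for (const key of keys) {
    const value = change[key];
    if (value !== undefined) stream.settings[key] = value;
  }
  return "applied";
}
```

Modifying in place avoids the problems raised with close-and-reopen: the browser does not pick a different camera, and the stream is not interrupted.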

[ Time check ]

Next Call

stefanh: we should arrange for a new call

<richt> richt: I was going to suggest the same thing as Anant. That changing the capabilities of an existing stream should happen on the LocalMediaStream object.

stefanh: when should the next call be?
... after the WebApps F2F meeting?
... [ the first week of May ]

hta: at least 2 weeks from now

stefanh: yes

burn_: that should be easy if we're just continuing the agenda

stefanh: hta and I will put up a doodle

hta: from the week of May 6th to the 12th

stefanh: yeah
... thanks everyone for joining

[ Thanks to the scribe for scribing ]

[ Adjourned ]

<jesup> I can stay a few

<Travis> Zakim: [Microsoft] is Travis

<Travis> :-)

trackbot, end meeting

- DRAFT -
