W3C

– DRAFT –
WebRTC September 2021 virtual interim

20 September 2021

Attendees

Present
ArneSchramm, BenWagner, BernardA, BrianBaldino, Carine, Dom, EladAlon, GuidoUrdaneta, Harald, Jan-Ivar, SergioMurillo, SongXu, ThomasGuilbert, TimPanton, TonyHerre, YouennFablet
Regrets
-
Chair
Bernard, Harald, Jan-Ivar
Scribe
dom

Meeting minutes

Slides

Next meetings 🎬

Bernard: October VI to be scheduled 1st week of October - Doodle poll open till next week
… then TPAC meetings (joint & solos)

Status of recent CfCs 🎬

Bernard: Republishing media capture and streams as CR - completed positively on Sep 17
… Jan-Ivar will summarize the chairs decision on it
… Another CfC on Transferable MediaStreamTracks running until Sep 27
… our next meeting in October will build on this

WHATWG Streams 🎬

Bernard: we have potential dependencies to WHATWG streams
… a number of discussions in their repo relate to issues we've discussed in terms of our media processing pipelines

Agenda review 🎬

Bernard: main topics: Conditional focus, getViewportMedia, Display surface constraints, echo cancellation

Conditional Focus 🎬

Elad: depending on use cases, switching the focus from the browser to the captured window makes more or less sense
… focus control is an important part of the user experience, given that making a presentation can be stressful
… e.g. if you're capturing a window where you're writing text, focus needs to be there
… but there are situations where the browser can be used directly to control the captured window
… the challenge is that the browser cannot determine one situation from another
… when the capturing application has a lot more situational awareness
… not necessarily complete knowledge, but at least some
… I'm proposing an API that associates stream capture with the ability to give a specific limited focus switch opportunity
… to the capturing application
… because this is done right after the capture is starting (although before a frame is being captured), the capturing application has all the context it can get to make its decision
… the idea is to give that focus-switching opportunity in a microtask in a promise resolution of the capture request
… the proposal includes a number of mitigations (e.g. a 1s timeout) to avoid risks of focus-switching attacks
… the particular API I'm proposing is exposed via a method on a subclass of MediaStreamTrack - that way it's only available when obtained through a captured tab or window
… we could look at a more finegrained inheritance tree if there is interest

Jan-Ivar: this is a reasonable problem to solve; I have some concerns with the API surface
… since focus switching is global to the user, it doesn't need to be on a mediastreamtrack subclass
… it could live e.g. on navigator.mediaDevices
… I think a microtask is too narrow - we should queue a task instead, this would give the same presentation
… Without having received a frame, how can app determine whether to switch or not?
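For readers less familiar with the distinction debated here: on the web platform, all pending microtasks (such as promise reactions) run before the event loop picks up the next task. The sketch below is a deliberately simplified, synchronous model of that ordering, not browser code; `MiniEventLoop`, `demoOrdering` and the log labels are made up for illustration.

```typescript
// Simplified model of an event loop: after each task, the microtask queue
// is drained completely before the next task runs.
type Job = () => void;

class MiniEventLoop {
  private tasks: Job[] = [];
  private microtasks: Job[] = [];

  queueTask(job: Job): void {
    this.tasks.push(job);
  }

  queueMicrotask(job: Job): void {
    this.microtasks.push(job);
  }

  run(): void {
    while (this.tasks.length > 0) {
      const task = this.tasks.shift() as Job;
      task();
      // Microtask checkpoint: also runs microtasks queued by other microtasks.
      while (this.microtasks.length > 0) {
        const microtask = this.microtasks.shift() as Job;
        microtask();
      }
    }
  }
}

function demoOrdering(): string[] {
  const log: string[] = [];
  const loop = new MiniEventLoop();
  loop.queueTask(() => {
    log.push("task: capture promise resolves");
    // The app's promise reaction is modeled as a microtask.
    loop.queueMicrotask(() => {
      log.push("microtask: app reaction");
      loop.queueMicrotask(() => {
        log.push("microtask: chained app reaction");
      });
    });
    // Work queued as a task waits until every microtask above has run.
    loop.queueTask(() => {
      log.push("task: queued later");
    });
  });
  loop.run();
  return log;
}
```

In this model, an opportunity scoped to a microtask ends before "task: queued later" runs, while one scoped to a queued task would still be open after both app reactions.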

Elad: getSettings() on the captured stream can tell you the kind of display surface
… checking the content of a frame is likely challenging to get right in any case
… looking just at the metadata is easier
… re global vs mediastreamtrack, it was partly to protect against attacks based on cloning - but happy to look more into alternatives
… task vs microtask - can you say more about your concerns about shim-ability?

Jan-Ivar: it's a general principle, and I'm not sure of the advantages of a microtask in the first place

Elad: part of it was a concern of backwards compatibility and performance

Jan-Ivar: I think track & microtask can both address these aspects
… in any case, my main concern is where the API lives at the moment

Youenn: cloning of tracks is known; when you subtype tracks, it starts to be messy
… what type would be assigned to a cloned track?
… we should avoid subtypes if possible
… mitigations of 1s and against busy-looping sound good
… I need to think more about the 1s delay

Harald: re cloning and MST subtracks - we have one case like that, and I think we should change it
… we have 2 options: subclassing or making the method return an error
… I don't think JS devs care one way or another
… subclassing feels a bit tidier

Elad: the goal was to reflect our design in the class hierarchy indeed

Youenn: to get there, I think we should first list the use cases where subtypes actually help - just one method feels not enough to consider changing clone()

Elad: 3 methods would fit: captureHandler, @@@ only apply to captured media

Jan-Ivar: I'm opposed to subclassing - I think that API should live in a global space e.g. navigator.mediaDevices.focus

Harald: where will that be written up? I would like to see the argument in more detail

Elad: I'm hearing interest in the API

Jan-Ivar: interested in solving the problem with a slightly different shape

Youenn: +1 on a different shape, and discussion on the 1s delay; but sounds like a good space to work on

[clarification on the 1s requirement makes Youenn happy]
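A minimal sketch of the metadata-based decision Elad describes, assuming the app inspects only the `displaySurface` value reported by `getSettings()`. The `FocusDecision` strings and the `decideFocus` helper are hypothetical names for illustration; the minutes do not fix the API's name or shape, and the policy shown (keep focus on the app when a tab is captured) is just one example.

```typescript
// Subset of the settings an app would read from track.getSettings().
interface CaptureSettings {
  displaySurface?: string; // e.g. "browser" (tab), "window", "monitor"
}

// Hypothetical values; not taken from any spec text in these minutes.
type FocusDecision = "focus-captured-surface" | "keep-focus-on-app";

function decideFocus(settings: CaptureSettings): FocusDecision {
  // Example policy only: a captured tab may be controllable from the
  // capturing app, so keep focus there; otherwise switch to the capture.
  if (settings.displaySurface === "browser") {
    return "keep-focus-on-app";
  }
  return "focus-captured-surface";
}
```

Because the decision relies on metadata alone, it can be made before any frame is delivered, which is the point Elad makes in response to Jan-Ivar's question.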

getViewportMedia 🎬

getViewportMedia(): Let pages opt-in to capture #155

Elad: getViewportMedia is an API allowing to capture the current viewport (what is visible in the tab launching the API call)
… equivalent of calling getDisplayMedia and selecting the current tab
… there is danger associated with self-capture
… to protect against this, we're requiring crossOriginIsolation, opt-in via a header (most likely document policy, but to-be-confirmed)
… and only available to top-level docs or privileged iframes
… Jan-Ivar and I have been discussing a lot and have converged on a number of proposals as summarized in the slide
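The gating conditions Elad lists can be sketched as a single check. This is an illustration under assumptions, not spec text: `ViewportCaptureContext` and `mayCaptureViewport` are hypothetical names, and the exact opt-in mechanism was still to be confirmed at the time of the meeting.

```typescript
// Hypothetical description of the calling context.
interface ViewportCaptureContext {
  crossOriginIsolated: boolean; // page is cross-origin isolated
  optedInViaHeader: boolean;    // opt-in header present (likely Document Policy)
  isTopLevel: boolean;          // call comes from the top-level document
  isPrivilegedIframe: boolean;  // or from an iframe granted the privilege
}

function mayCaptureViewport(ctx: ViewportCaptureContext): boolean {
  const allowedFrame = ctx.isTopLevel || ctx.isPrivilegedIframe;
  // All three conditions must hold together.
  return ctx.crossOriginIsolated && ctx.optedInViaHeader && allowedFrame;
}
```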

Jan-Ivar: we're proposing that getViewportMedia would capture the entire viewport when called from an iframe
… and we're proposing using Document Policy with names built on "viewport-capture"
… the first proposal is basically deferring the approach to cropping to later

RESOLUTION: getViewportMedia capture the full viewport when called from an iframe

Harald: re "viewport-capture", is it aligned with the naming convention of Document Policy?

Tim: just noting the two decisions (iframe capturing the full viewport, and naming) are linked

RESOLUTION: use viewport-capture as naming basis for Document Policy of getViewportMedia

Harald: these will be confirmed on the mailing list

Elad: I also intend to suggest a cropping API that might complement getViewportMedia in the upcoming months

Jan-Ivar: getViewportMedia should require user activation

Dom: +1

Elad: I can imagine certain cases where user activation makes sense, but others where less so
… e.g. if you open a new tab

Youenn: this feels like a general problem for user activation that is worth discussing in general
… but given that this is privileged API, user activation feels like a must

Dom: +1 on solving it generically for user activation unless we can demonstrate something specific to capturing

Youenn: note that changing user activation rules is really hard, so we need to get our answer right before shipping

jan-ivar: removing user activation shouldn't be as hard as adding it afterwards

Elad: I would want more time to make a decision on that particular bit

Display surface constraint 🎬

Revisit: Let getDisplayMedia() influence the default type choice in the picker #184

Elad: getDisplayMedia doesn't let the app influence the user's choice
… user's choice is already being influenced though, by virtue of having a 1st item in the list of choices
… Chrome has Screen-first
… Safari has only one choice (so a major influence)
… FF is evolving
… Influence could be wielded positively - towards the safer choice, or the more relevant one
… a lot of Web developers have expressed interest in being able to influence or limit the user's choice:
… - save clicks (if the app knows they only want tab, or only want windows)
… - apps want to capture audio - only available on a subset of capture sources
… - tabs provide higher FPS
… - the app knows from context - e.g. allowing to favor slides over other content when doing a presentation
… - avoid risk with over sharing
… The proposal I'm making is to add a hint as part of the constraints, e.g. "ideal: browser"
… the user agent may choose how to apply that hint - from using it to prioritize, to ignoring it or adding warnings in case the UA determines it's not safe to apply the hint
… [showing the specific text proposal in #184]
… all other constraints are still processed after the user made their choice, only that one gets processed before
… it's only a hint, it cannot limit user's choice
… e.g. Chrome would show the list of tabs in preference when "browser" is hinted
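One way a user agent might apply such a hint is to reorder the picker's categories without removing any. The sketch below is hypothetical (`orderPickerCategories` is not an API); it only shows that a hint changes priority while every choice stays available, and that a UA can ignore a hint it does not consider safe to apply.

```typescript
type Surface = "monitor" | "window" | "browser";

function orderPickerCategories(
  defaultOrder: Surface[],
  hint?: Surface,
  ignoredHints: Surface[] = []
): Surface[] {
  // No hint, or a hint the UA chooses to ignore: keep the default order.
  if (hint === undefined || ignoredHints.indexOf(hint) !== -1) {
    return defaultOrder.slice();
  }
  // A hint for a category the picker does not offer changes nothing.
  if (defaultOrder.indexOf(hint) === -1) {
    return defaultOrder.slice();
  }
  // Move the hinted category first; keep the others in their original order.
  const rest = defaultOrder.filter((surface) => surface !== hint);
  return [hint].concat(rest);
}
```

For example, with a screen-first default order and a "browser" hint, tabs would be listed first, with screens and windows still offered after them.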

Jan-Ivar: in the github discussion, we mentioned additional mitigations - e.g. not listing the requesting tab/window in the list of tabs
… would like to see some of these ideas reflected in the text
… min & exact constraints are disallowed in gDM, so it would have to be "ideal"
… I think it makes sense to use a hint to steer these selector UIs
… for clarification, "influence/limiting" requirements discussed earlier were about the app, not the user agent
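A sketch of what this implies for callers: the hint would be passed as an `ideal` value, since getDisplayMedia rejects `min` and `exact`. The placement of the hint under `displaySurface` follows the proposal under discussion; `usesMinOrExact` is a hypothetical helper that only inspects one level of nesting.

```typescript
type ConstraintValue = string | number | boolean | { [key: string]: unknown };
type TrackConstraints = { [name: string]: ConstraintValue };

// How an app might express the hint under the proposal being discussed.
const hintedVideo: TrackConstraints = {
  displaySurface: { ideal: "browser" },
};

// Returns true if any constraint uses a form getDisplayMedia would reject.
function usesMinOrExact(constraints: TrackConstraints): boolean {
  return Object.keys(constraints).some((name) => {
    const value = constraints[name];
    if (typeof value !== "object") {
      return false; // bare values carry no min/exact
    }
    return "min" in value || "exact" in value;
  });
}
```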

Harald: re removing the calling tab, would it be only for this usage of the hint, or any use of gDM?

Jan-Ivar: I think they need to be considered before we add this

Elad: my recollection was we would encourage the UA to warn of risks of self-capture rather than removing the option altogether
… there are other ways of adding friction that don't require removing the option completely
… removing it completely might create risks of oversharing via sharing of the entire screen

Jan-Ivar: I think we can probably converge on mitigations for self-capture
… ideally, I would like normative language

Youenn: should we allow a hint for capturing the entire screen? that's the riskiest
… let's focus on hinting towards capturing less
… In general, I dislike constraints - can we add a dedicated parameter instead of reusing the constraints syntax?
… this may open further extensibility down the line (e.g. highlight tabs from a given origin?)
… can you share more about Chrome's plans in terms of mitigations against self-capture and its dangers?

Elad: we haven't prototyped the warning mechanism yet
… re constraints, I have no objection to using a parameter instead of constraints
… re removing "screen" - it's interesting, but if that is the default when no hint is given, this isn't really helping

Youenn: that default behavior is specific to Chrome
… Safari only allows screen, but we will have a picker at some point where screen won't be the default
… and I don't think apps should have a way to default to screen

Jan-Ivar: FF already doesn't default to screen, and +1 to youenn of not allowing (or just ignoring) screen as a constraint

Elad: the user agent would already be free to ignore the hint
… for Chromium, getting visibility on dev's intent would be useful in migrating away from that default

Bernard: in terms of the requests from developers, is audio capture only available on screen?

Elad: no, it's available on tab, and screen on windows

Bernard: re high-FPS capture - is that typically tab?

Elad: in Chromium, yes
… but it's in general, a way for developers to steer toward what they know will work for their use cases

Bernard: is "screen"-level capturing key to any of these requests?

Elad: right; but note that "screen" could be used to capture from a different monitor

Jan-Ivar: but all monitors are dangerous

Elad: so I'm hearing support except for the screen-hint

TimP: I dislike heuristics-based picker - it makes it a nightmare to test and makes everything unpredictable

Elad: the mention for heuristics was for apps to use, not the UA

Jan-Ivar: supporting, but with stronger language on warnings for self-capture

Echo Cancellation 🎬

Echo cancellation: Need to specify the source of the echo cancellation reference signal #31

Specify constraint echoCancellationReferenceSinkId #32

Harald: this is a request coming from our audio team
… echo cancellation is about removing the audio picked up by the microphone in the room to keep only the audio generated *in* the room
… it's in general complicated - a complicated part is knowing what to remove
… current implementation in Chrome just looks at what's coming in via the peerconnection
… this has proven insufficient and we want to revise this
… if we want to remove audio output, you can hit issues with specific headphones or setups
… from the application perspective, you want to identify what output has been used that is most relevant to echo cancellation and feed that to the algorithm
… to keep it simple, we have an enumeration of output devices via sinkIds
… the proposal is to re-use this sinkId in the constraint for echo cancellation
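A sketch of how an app might use the proposed constraint, assuming the shape named in #32 (`echoCancellationReferenceSinkId`). The constraint is a proposal under discussion, not a shipped API; `OutputDevice` mirrors only the fields of a device entry used here, and `buildAudioConstraints` is a hypothetical helper.

```typescript
interface OutputDevice {
  kind: string;     // "audiooutput" for speakers and headsets
  deviceId: string; // used as the sinkId for output devices
  label: string;
}

function buildAudioConstraints(
  devices: OutputDevice[],
  preferredLabel: string
): { [name: string]: unknown } {
  const outputs = devices.filter((d) => d.kind === "audiooutput");
  const match = outputs.filter((d) => d.label === preferredLabel)[0];
  const constraints: { [name: string]: unknown } = { echoCancellation: true };
  if (match !== undefined) {
    // Proposed constraint from #32; an assumption, not a shipped API.
    constraints["echoCancellationReferenceSinkId"] = match.deviceId;
  }
  return constraints;
}
```

If no matching output device is found, the sketch falls back to plain `echoCancellation: true` and leaves the choice of reference signal to the user agent.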

TimP: +1 to do something in this space
… will it help if you mix WebAudio in?
… i.e. when the audio output comes from WebAudio processing

Harald: yes, it should cover this (as long as the output makes it to the speaker)

Jan-Ivar: Mozilla doesn't believe this API is needed to do correct echo cancellation
… why does the UA need JS input on this? The UA already knows which headset is being used
… it's not clear why getting input from the app is useful here

Harald: which audio output is currently used by the echo cancellation?

Jan-Ivar: I believe we have access to the rendered output (incl out of WebAudio)
… Paul Adenot is our key person on this

Harald: would like his opinion on the headset case

Youenn: +1 to Jan-Ivar - the UA should already have access to all the info it needs
… and it has more info than apps would have on this

bernard: Harald, you said chrome currently uses sum of all audio outputs from peerconnection
… is the intent here to improve the chromium implementation or to let them do better echo cancellation?

harald: this is not for app-based echo cancellation

bernard: I've heard requests from apps to have an adjustable echo cancellation - e.g. an echo cancellation transform stream

Harald: that is orthogonal to this proposal
… echo cancellation can't be modeled as a transform stream: it's a 2-input object
… it can be modeled as a process that takes 2 audio inputs

youenn: you could still do 1 input / 1 output with an additional parameter
… in the transform stream creation with the reference stream
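The two shapes being contrasted can be sketched as follows. The subtraction is only a stand-in so that the example runs; real echo cancellation estimates the echo path adaptively. All names here are hypothetical.

```typescript
type Frame = number[];

// Harald's framing: a process that takes two audio inputs.
function cancelTwoInput(mic: Frame, reference: Frame): Frame {
  return mic.map((sample, i) => {
    const ref = reference[i];
    // Placeholder arithmetic, not an actual echo canceller.
    return sample - (ref === undefined ? 0 : ref);
  });
}

// Youenn's framing: one input and one output, with the reference stream
// supplied as a parameter when the transform is created.
function createCancelTransform(
  nextReference: () => Frame
): (mic: Frame) => Frame {
  return (mic) => cancelTwoInput(mic, nextReference());
}
```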

Harald: interesting thing to do, but not this proposal

TimP: there are situations where you don't want to cancel part of the stream being output - e.g. background music
… with the room acoustics
… maybe a rare use case, but one we've stumbled upon for immersiveness

harald: you could turn echo cancellation off?

timP: but that generates other issues

Sergio: I don't think this proposal would help solve the Chrome issue
… there are 3 different issues being discussed: echo cancellation in Chrome, new echo cancellation tuning use cases (that would need clarification/refinement), and exposing echo cancellation separately from WebRTC (maybe in Web Audio)

Harald: I'm hearing opposition to making an API of the specific proposal because the UA should be able to figure it out
… I find it interesting that only browser output should be cancelled - if you have another app than the browser producing audio, shouldn't it be removed too?

Jan-Ivar: RNNoise has been exploring some of this; but echoCancellation: true is likely focused on the meeting use case

Youenn: the OS can also provide user-configurable echo cancellation styles

Guido: the motivation for Chrome is to help figure which of the output devices should be used as the reference signal for echo cancellation
… if there are several audio output devices with one being preferred by the app

Harald: I'd like to invite comments on the issue on whether this API is needed or not
… I haven't seen many comments on the shape of the API
… if we were to conclude there was such a need, this API may be OK
… but no consensus on the need for such an API

Wrapping up 🎬

Bernard: any CfC needed based on our discussions?

Jan-Ivar: re getViewportMedia, should we put this in a new doc or an existing one?

Dom: having a single document couples their process progress

elad: also keeping them separate helps making clear how distinct they are

youenn: it also helps in terms of separating the test cases in different folders

harald: sounds like convergence towards a separate spec

jan-ivar: would still prefer a single doc

October meeting 🎬

Bernard: next meeting will be devoted to mediacapture-transform - proposed content and agenda was shared on the list

Preview of October Virtual Interim slide deck

Bernard: there is overlap between mediacapture-transform and WHATWG streams issues

Youenn: I will try to mark more explicitly issues in MC-T that are linked to WHATWG streams

Bernard: part of what I thought might be useful to hear is where these upstream WHATWG stream issues are on the roadmap (if at all)

Jan-Ivar: the new proposal we want to present is streams-based, but with improvements over the existing one
… still needs some fixes in WHATWG streams
… I have linked demos in the slides for some of the issues we're trying to address

TimP: it would be good to start these presentations with use cases to scope our discussions

Jan-Ivar: the slides Youenn and I developed include goals of the proposals

Harald: Media Capture Transform starts with use cases

Bernard: Streams have been adopted to use streams to manage pipelines

Youenn: please send early feedback on the proposals

Summary of resolutions

  1. getViewportMedia capture the full viewport when called from an iframe
  2. use viewport-capture as naming basis for Document Policy of getViewportMedia
Minutes manually created (not a transcript), formatted by scribe.perl version 185 (Thu Dec 2 18:51:55 2021 UTC).