W3C

– DRAFT –
WebRTC August 27 2024 meeting

27 August 2024

Attendees

Present
Alfred_Heggestad, Bernard, Carine, Dom, Elad, Florent, Frederick_Google, Guido, Harald, Henrik, Jan-Ivar, JohannesKron, Lucia_Google, Markus_Handell, PatrickRockhill, PeterT, Sameer, SunShin, TimP, Tove, Varun_Singh, Youenn
Regrets
-
Chair
Bernard, HTA, Jan-Ivar
Scribe
dom

Meeting minutes

Recording: https://www.youtube.com/watch?v=SDKG463dvfI

Slideset: https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf

Bernard: TPAC is ahead of us - please send requests for agenda time, taking advantage of the longer meetings we'll have there

Captured Surface Control 🎞︎

[Slide 11]

[Slide 12]

[Slide 13]

[Slide 14]

[Slide 15]

[Slide 16]

[Slide 17]

Jan-Ivar: the capture wheel solution looks promising, I'm supportive; couldn't we use it for zoom as well, through the preview tile with some browser controls?
… re zoom level, would there be an opportunity to give feedback on the API shape? e.g. use an attribute instead of a method
… re transient activation, would it be consumed? would this be through a button?

Elad: for instance, but it would vary across apps

Jan-Ivar: why 0-100 integers rather than floating point?

Elad: it matches what browsers show in their UI; also helps with other UI (e.g. dropdown, slider, radio buttons), which is also why we want to leave the UI to the app

Jan-Ivar: I still would prefer to use the same solution for zoom; does the zoom affect only the capture or also the original doc?

Elad: also the original document

Youenn: I discussed this internally; being able to send commands to another app breaks a pretty high security boundary, which got pushback
… +1 on consuming user activation
… re scrolling - how should this work on touch devices (e.g. iPad)? limiting this to "wheels" isn't ideal

Guido: scrolling might be a better name indeed
… we could limit this to a browser surface for the time being and leave window to a later iteration

Youenn: in terms of UX, either you embed everything in the capturing app, or you leave the capturing app aside
… in the latter case, managing scrolling is of less interest

Elad: yes, but that pattern doesn't work across all apps/UXes
… there is finite real estate on the screen to make use of

Youenn: this is an area of experimentation, e.g. macOS provides new options in this space
… but in general, having inconsistent behavior across browser/non-browser apps would be suboptimal
… conversely, if the plan is to integrate both, we need to understand how that would work and if that could work

Guido: how about starting with tab?

Youenn: tab is interesting, but if we limit ourselves to tab, this isn't necessarily the best API

Elad: but shipping tab would be a good way to validate the interest before we invest in the more complicated space for "window" (which requires different OS adaptation and different security barriers)

Jan-Ivar: re transient activation, it doesn't resolve the remote attack - e.g. setting a very high zoom would confuse the user
… hence why I would prefer the wheel approach
… the PiP button in the media element in FF could serve as an example of a browser-provided UI

Elad: so I hear support for send wheel from Jan-Ivar

Youenn: on our end, feedback is negative at the moment - having something that keeps more control under the user agent would be preferable

Elad: does that apply if we only do tabs?

Youenn: not currently opposed to tabs, but it remains that the more control left in the UA, the better
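
For reference, a rough sketch of the shape discussed above. This is a sketch only: the names (CaptureController.sendWheel, getZoomLevel, setZoomLevel) come from the proposal slides and may change; previewVideo and zoomInButton are assumed app-provided elements, and the code runs in an async-capable context.

  // Sketch only: API names follow the Captured Surface Control proposal
  // discussed above and are subject to change.
  const controller = new CaptureController();

  async function startCapture() {
    const stream = await navigator.mediaDevices.getDisplayMedia({ controller });
    previewVideo.srcObject = stream;
  }

  // Forward scrolling gestures from the local preview tile to the captured tab.
  previewVideo.addEventListener('wheel', async (event) => {
    try {
      await controller.sendWheel({
        x: event.offsetX,
        y: event.offsetY,
        wheelDeltaX: -event.deltaX,
        wheelDeltaY: -event.deltaY,
      });
    } catch (e) {
      // Capture ended or the action was not permitted.
    }
  });

  // Zoom driven from an app-provided button, so transient activation can be
  // consumed as discussed; the integer scale (0-100 in the slides) and whether
  // zoom ends up as an attribute or a method are still open questions.
  zoomInButton.addEventListener('click', async () => {
    await controller.setZoomLevel(Math.min(100, controller.getZoomLevel() + 10));
  });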

Moving Forward with Mute 🎞︎

[Slide 20]

[Slide 21]

[Slide 22]

[Slide 23]

[Slide 24]

[Slide 25]

[Slide 26]

Youenn: track.muted means no frame, not black frame - we should decide first what to do with black frames
… this is JS-observable
… in Safari, there will be no rVFC (requestVideoFrameCallback) callback from a muted track
… we should have a consistent implementation

Guido: not opposed to that, but the spec currently allows including black frames when muted

Youenn: so let's try to converge on muted = no frame

Guido: the goal would be to transition existing apps to the new attribute, and then to the frame counter

Youenn: I'm not sure Safari would implement this, but this may not impact compat
… re "isSendingFrames = false", it would be best to use "isNotSendingFrames" for compat with UAs that don't implement it
… if the source is generating black frames, I'm happy for them to have a counter

Bernard: I share some of Youenn's concerns
… originally, we did say that black frames would be sent on muted, but I don't think we thought this through across the whole system
… inferring muted from seeing black frames feels like it may generate many interop issues across many APIs
… Why did we decide to send black frames (vs not sending)?

HTA: sending a single black frame to replace the content of a muted stream would be sufficient, but the spec allows continuing to send black frames

Jan-Ivar: I appreciate the migration path you've identified; +1 to using the negative form, and maybe not "sending", but e.g. "producing"

Guido: happy to bikeshed if there is interest in the direction

Jan-Ivar: adding 3 stats feels a bit excessive; maybe we can count which of the frames are black

Youenn: Safari only sends black frames on a peer connection (maybe MediaRecorder)
… it's on a per-consumer basis

JSFiddle exploring what happens on mute

Guido: the goal is to simplify the spec by removing the flexibility the spec currently allows
… so that mute becomes more useful with better interop

TimP: if you use stats, everyone is already using polling

Guido: the goal is to have a smooth migration path, with clarity that it will be deprecated later

Henrik: I think the boolean is needed for the migration path; isMuted stops the counter increment in Chrome IIRC

Guido: I'll start a PR to iterate on this

Youenn: I'll file an issue to get us to converge on muted=no frame
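
As background, a minimal sketch of how an app can observe mute today and how the counter-based migration discussed above might look, assuming a camera MediaStream named stream and the MediaStreamTrack video stats from mediacapture-extensions (track.stats.deliveredFrames) where implemented; the proposed boolean's final name is still being bikeshedded.

  // Today: mute is observable via the track's muted attribute and events,
  // but whether a muted track delivers black frames or no frames varies.
  const [track] = stream.getVideoTracks();
  track.addEventListener('mute', () => console.log('muted:', track.muted));
  track.addEventListener('unmute', () => console.log('muted:', track.muted));

  // Migration idea from the slides: rely on a frame counter rather than on
  // black-frame heuristics. track.stats (MediaStreamTrackVideoStats) is only
  // available where implemented.
  let lastDelivered = 0;
  setInterval(() => {
    if (track.stats) {
      const producing = track.stats.deliveredFrames > lastDelivered;
      lastDelivered = track.stats.deliveredFrames;
      console.log('producing frames:', producing);
    }
  }, 1000);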

Speaker selection 🎞︎

Issue #142 / PR #143 Why prompt for a subset of stored speakers or speakers setSinkId already accepts? 🎞︎

[Slide 30]

Youenn: this seems fine to me; small caveat: all output devices are exposed in enumerateDevices vs only the output speakers associated with a microphone in getUserMedia
… PR #143 is fuzzy about that - not sure if you mean the restricted or broader scope for getUserMedia
… maybe add a note to be explicit that this is only for the speakers tied to a microphone exposed via gUM
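
A short illustration of the distinction Youenn raises, assuming an app-provided audioElement and the usual permission requirements for exposing output devices:

  async function pickSpeaker(audioElement) {
    // enumerateDevices() may list all audiooutput devices, whereas the
    // gUM-driven exposure under discussion would cover only speakers tied
    // to a captured microphone.
    const devices = await navigator.mediaDevices.enumerateDevices();
    const speakers = devices.filter((d) => d.kind === 'audiooutput');
    // Any exposed audiooutput deviceId is already accepted by setSinkId().
    if (speakers.length) await audioElement.setSinkId(speakers[0].deviceId);
  }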

Issue #133: The first "audiooutput" MediaDeviceInfo returned from enumerateDevices() is not the default device when the default device is not exposed 🎞︎

[Slide 31]

[Slide 32]

Youenn: if we already have an audio output entry, it means we're already out of passive fingerprinting - we could expose the "real" deviceId of the default?

Jan-Ivar: setSinkId("") has different semantics from setSinkId("the-actual-deviceid-of-the-default")

Youenn: indeed, the latter wouldn't change if the default changes
… OK, I'm fine with either proposal, with a slight preference for the non-empty string solution

Guido: UA & System defaults aren't the same
… system default maps to what the underlying platform calls system default
… "default" is semantically different from the specific deviceId that is currently the default
… the UA might have a different default than the OS, which would track a different device than the system's
… I think we need to be more specific about what we mean by system-default device (the one we use "default" for in Chromium)
… I'm partial to proposal B to avoid overloading the meaning of the empty string

Jan-Ivar: the spec only talks about system-default, not about UA-default; I'm not aware of any UA with a default speaker

Youenn: I agree with Guido there is a difference

Harald: "default" is a tricky concept; windows had two default devices (one of telephony, the other for general audio)
… referring to a UA default might make more sense since system-default isn't a well-defined term

Jan-Ivar: the empty string is already identified as dynamically following the system-default
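
In code, the semantic difference under discussion looks roughly like this (fragment assumes an async context, an audioElement, and a previously obtained someSpeakerDeviceId; the open question is how enumerateDevices() should surface the default entry, not the setSinkId behavior itself):

  // "" keeps following the (system) default output device, even if that
  // default changes later.
  await audioElement.setSinkId('');

  // A concrete deviceId pins output to that device, regardless of later
  // changes to the default.
  await audioElement.setSinkId(someSpeakerDeviceId);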

RTCRtpEncodingParameters: scaleResolutionTo 🎞︎

[Slide 35]

[Slide 36]

Jan-Ivar: this SGTM; I would use our own dictionary, and find a better name than rect

Henrik: e.g. resolution

Jan-Ivar: re aspect ratio, what you propose seems to match what we do for constraints, I like that
… my only question is if the UA could do it on its own without new API

Henrik: I don't think it's possible, it's inherently racy and buffers make it even more uncertain

Youenn: this is maxWidth and maxHeight really?

Henrik: yes, we can call it that

Jan-Ivar: what happens if the aspect ratio set by width & height is different from the source?

Henrik: it will make it fit in the specified width & height

Florent: what happens if either width or height isn't specified?

Henrik: I think we should require them both

Florent: that might help deal with aspect ratio issues

Henrik: but that breaks the orientation-agnostic approach

Florent: if you only care about maxHeight (as is typical, e.g. for a presentation)...

Elad: windows or tabs can be resized, so we should probably expect that API to be called more than once

Henrik: the point of the API is to avoid reconfiguration as much as possible, not in all cases

Florent: scaleResolutionDownBy would be a better fit for that situation

Henrik: this is mostly about optimizing processing when dropping layers in simulcast

Jan-Ivar: what happens when setting both?

Henrik: we throw an exception

RESOLUTION: proceed with a PR for #159 with revised names
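
A sketch of what the resolved direction might look like in use; this is illustrative only, since the member and field names (scaleResolutionTo, maxWidth, maxHeight) follow the discussion and are subject to the renaming agreed above. pc and track are assumed to be an existing RTCPeerConnection and video MediaStreamTrack.

  // Sketch only: names may change in the PR for #159.
  const transceiver = pc.addTransceiver(track, {
    sendEncodings: [
      { rid: 'q', scaleResolutionTo: { maxWidth: 320,  maxHeight: 180 } },
      { rid: 'h', scaleResolutionTo: { maxWidth: 640,  maxHeight: 360 } },
      { rid: 'f', scaleResolutionTo: { maxWidth: 1280, maxHeight: 720 } },
    ],
  });

  // Per the discussion, setting both scaleResolutionDownBy and
  // scaleResolutionTo on the same encoding would throw.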

RTCRtpParameters.codec matching is probably too strict 🎞︎

[Slide 39]

Florent: there is a provision in the spec about unsetting a codec (pointed to the relevant step in the GitHub issue)
… hidden in the long "apply a description" algorithm
… using the "codec dictionary match" algorithm (which may need to be improved)
… maybe we need to focus it on what the other side wants to receive, which, as we've grown aware, has a lot of subtleties

[Slide 40]

Harald: the two codecs in the slide can't match, since one of them says it can only deal with 30 fps
… codec matching is defined by SDP O/A, on a per-codec basis

Jan-Ivar: but there are other examples of fmtp that would be compatible, right?

Harald: yes, e.g. most h264 profiles would accept baseline
… but main and high are different supersets of baseline, so they shouldn't match
… illustrating again this is codec dependent

Bernard: a non-match should only occur in situations where you need symmetry (which most codecs don't require)

HTA: that's about negotiation - what we're discussing is what we want to send

Bernard: I thought the original issue was about negotiation; in this particular example, this is about receiver capabilities, which aren't incompatible as a result, since no symmetry is required

HTA: we need a matching algorithm for negotiation, and a different one for setParameters

Jan-Ivar: codec-dict-match shouldn't be confused with the negotiation algorithm
… we should specify a selection algorithm
… the spec allows clearing the codec parameter after negotiation - the UA might still use it as a hint (but then we should specify it for interop)

Florent: the order in the SDP expresses a preference, but not a requirement

Bernard: +1 - the sender can change to a different negotiated codec at any time (e.g. in case of a hardware codec failure)

HTA: we could argue that if the specified codec description fits within the parameters of the negotiated codecs, then it should use that one
… if it's a superset, it needs clearing
… I don't want our spec to be dealing with codec matching across all codecs, but we could add a note about acceptability
… this is usually covered in offer/answer considerations in the relevant RFCs

Florent: if the developer really wants a codec, they can call setParameters again
… we will probably learn about additional needs from developers as they adopt it
… Clearing the codec parameter already signals that it has been ignored, and stats expose what codec is in use

HTA: let's continue these clarifications in the issue
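
For context, the codec-selection API whose matching rules are being discussed, roughly as it is used today (fragment assumes an async context and an existing video RTCRtpSender named sender):

  // Ask for H.264 on the first encoding; whether the request survives depends
  // on the matching against what was negotiated, which is the subject above.
  const caps = RTCRtpSender.getCapabilities('video');
  const h264 = caps && caps.codecs.find((c) => c.mimeType === 'video/H264');
  if (h264) {
    const params = sender.getParameters();
    params.encodings[0].codec = h264;  // per-encoding codec selection (webrtc-extensions)
    await sender.setParameters(params);
  }
  // After (re)negotiation, a cleared codec member signals the request was
  // ignored; outbound-rtp stats show which codec is actually in use.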

Summary of resolutions

  1. proceed with a PR for #159 with revised names
Minutes manually created (not a transcript), formatted by scribe.perl version 222 (Sat Jul 22 21:57:07 2023 UTC).