Meeting minutes
Recording: https://
Slideset: https://
Encoded Transform - Overflow from TPAC 🎞︎
Harald: during TPAC, we discussed the concept of a packet API, with an explainer, use cases and architecture - not yet done
… other issues didn't get covered
Issue #109 & #119 Depacketization order 🎞︎
Harald: packets don't arrive in order on the network (they get lost or retransmitted)
… frames need to be in order for the decoder
… in general, a transformation is simpler when happening in decoding order
… this requires a jitter buffer in front of the decoder
… if the transformer itself introduces jitter, it doesn't get compensated
… currently Chromium has the jitter after the transformer
Bernard: this isn't the only place where we're encountering this problem
… are you imagining an explicit API for jitter buffer - e.g. a jitter buffer provided as a transform stream?
Harald: we could say "frames arrive in the order they arrive in", vs "the UA reorders them, incl waiting for frames" (probably not good), with a flag allowing one or the other
Youenn: I recall we discussed this previously
… iirc, we thought that in-order matched the Web developers expectations
… it may make it harder to implement for UA
… we should look at use cases where having out-of-order would be a benefit
… it's a possible footgun; if there are good use cases for it, then we should look for a solution, but otherwise, we should stick with in-order as in the spec
Bernard: for the crypto use case, is out-of-order even doable?
Youenn: for SFrame yes
… the counter may not be monotonic in that situation
… which would lead to dropped frames
… but it shouldn't be an issue from a decryption perspective
Bernard: so is out-of-order a speed concern?
hta: my worry about in-order is the case of lost frames
… without nack, rtx - you have to give up at some point
… if we accept in-order frames, we accept that lost frames will cause delays of some magnitude
dom: the wait-for-loss delay could be provided by the developer?
youenn: having both options would create complexity for developers
… if the transform is taking sometimes 2ms and sometimes much longer, a jitter buffer would then be beneficial
… for decryption or metadata passing, it should be fairly stable
… not sure of the value of a jitter buffer positioned after
hta: sounds like we need more time on use cases
Tony: moving the jitter buffer earlier means increased packet loss (given that it removes the processing time from the jitter buffer)
… there will be delays introduced from operating in a worker (rather than say a real time worklet)
youenn: currently chrome & safari implementations do out-of-order, which doesn't match the spec
… is Chrome planning to move to in-order? if implementations don't intend to align with the spec, that's also a consideration
hta: switching to in-order would require a compelling argument
jib: unless the transform has side effects (time-dependent), it shouldn't matter too much
… use cases would be helpful
… out-of-order seems a footgun - why should developers worry about that?
hta: if delay matters, in-order is a footgun
youenn: so we should collect use cases for both in-order and out-of-order
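The trade-off hta describes (in-order delivery means waiting for lost frames and giving up at some point) can be illustrated with a minimal reorder buffer. This is only a sketch under stated assumptions: the `Frame` shape, the `seq` counter, and the `maxWaitMs` give-up threshold are hypothetical names chosen for illustration, not anything defined by the spec or by an implementation.

```typescript
// Hypothetical frame shape: `seq` is an assumed per-frame counter and
// `arrivalMs` an assumed arrival timestamp; neither comes from the spec.
interface Frame {
  seq: number;
  arrivalMs: number;
}

// Releases frames in `seq` order. When a frame is missing, later frames are
// held back until the oldest one has waited `maxWaitMs`, then the gap is skipped.
class ReorderBuffer {
  private pending: Frame[] = [];
  private nextSeq: number;
  private maxWaitMs: number;

  constructor(firstSeq: number, maxWaitMs: number) {
    this.nextSeq = firstSeq;
    this.maxWaitMs = maxWaitMs;
  }

  // Adds a frame and returns whatever can now be delivered in order.
  push(frame: Frame, nowMs: number): Frame[] {
    const isOld = frame.seq < this.nextSeq;
    const isDuplicate = this.pending.some((f) => f.seq === frame.seq);
    if (!isOld && !isDuplicate) {
      this.pending.push(frame);
      this.pending.sort((a, b) => a.seq - b.seq);
    }
    return this.drain(nowMs);
  }

  // Delivers consecutive frames; call periodically so timeouts can fire.
  drain(nowMs: number): Frame[] {
    const out: Frame[] = [];
    while (this.pending.length > 0) {
      const head = this.pending[0];
      if (head.seq !== this.nextSeq) {
        // There is a gap before `head`: keep waiting unless we waited too long.
        const oldestArrival = Math.min(...this.pending.map((f) => f.arrivalMs));
        if (nowMs - oldestArrival < this.maxWaitMs) break;
      }
      out.push(head);
      this.pending.shift();
      this.nextSeq = head.seq + 1;
    }
    return out;
  }
}
```

With `maxWaitMs` set to 0 the buffer degenerates to delivering frames as they arrive, which is the out-of-order behaviour; any larger value trades added delay for ordering.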
Issue #143 generateKeyFrame 🎞︎
Fippo: I wanted to suggest a 4th proposal - an empty return value, but allow the app to pass any subset of the rids to generate keyframes
… some encoders can generate keyframes from individual rids, others can't - it depends on the codecs
hta: the argument list API would thus be strictly more powerful without additional implementor burden
youenn: at TPAC our conclusion was one rid was good & simple enough; we didn't have use cases for 2 layers hitting the same frame
… an encoder-behavior dependent API isn't so helpful, but I agree it isn't a big burden to add either
hta: medium objection to single value, no strong objection to array - should we go with the array args?
RESOLUTION: pass an array argument to generateKeyFrame
fippo: I'll do the PR
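A rough sketch of how an array-of-rids argument might be resolved against the configured layers. Everything here is an assumption for illustration: the helper name `selectKeyFrameLayers` is hypothetical, and the choices that an omitted or empty list means "all layers" and that an unknown rid is rejected were not settled in this discussion.

```typescript
// Hypothetical helper, not part of any spec: decides which simulcast layers
// get a key frame for a given list of requested rids.
function selectKeyFrameLayers(configuredRids: string[], requestedRids?: string[]): string[] {
  // Assumption: no list (or an empty list) means every configured layer.
  if (requestedRids === undefined || requestedRids.length === 0) {
    return configuredRids.slice();
  }
  // Assumption: asking for a rid that is not configured is an error.
  const unknown = requestedRids.filter((rid) => configuredRids.indexOf(rid) === -1);
  if (unknown.length > 0) {
    throw new Error("unknown rid(s): " + unknown.join(", "));
  }
  // Return the requested subset in configured order, without duplicates.
  return configuredRids.filter((rid) => requestedRids.indexOf(rid) !== -1);
}
```

An encoder that can only produce key frames for all layers at once could still accept this argument and treat any non-empty subset as "all", which is why the array form adds little implementation burden.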
Issue #158 / PR #140: add mimeType to metadata 🎞︎
HTA: working out the meaning of a payload requires parsing the SDP to find out what was negotiated
… the UA already knows which mime type is associated with which payload type
Fippo: another argument for it is that we don't specify how the data is structured
… being able to specify it as depending on the mime type would be good
youenn: thanks, this provides a good use case
… I think that's a pattern we already apply elsewhere
Fippo: in stats, indeed
Florent: isn't that available via getParameters? that exposes the list of payload types
HTA: but only if you have the PC
Fippo: that's harder in workers
RESOLUTION: Add mimeType to metadata
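As an illustration of why this helps in a worker, a transform could pick its payload handling from the frame metadata alone, without access to the peer connection or the SDP. This is a sketch only: `FrameMetadataLike`, `handlers` and `describeFrame` are hypothetical names, and the exact shape of the metadata dictionary is whatever the eventual PR defines.

```typescript
// Hypothetical subset of frame metadata; only `mimeType` is what the
// resolution adds, and its final shape is up to the PR.
interface FrameMetadataLike {
  payloadType?: number;
  mimeType?: string;
}

type Handler = (payload: Uint8Array) => string;

// Illustrative per-codec handlers keyed by lower-cased mime type.
const handlers: Record<string, Handler> = {
  "video/vp8": (p) => "vp8 frame, " + p.byteLength + " bytes",
  "audio/opus": (p) => "opus frame, " + p.byteLength + " bytes",
};

// Chooses a handler from the mime type instead of mapping the payload type
// through the negotiated SDP.
function describeFrame(metadata: FrameMetadataLike, payload: Uint8Array): string {
  const key = (metadata.mimeType || "").toLowerCase();
  const handler = handlers[key];
  if (handler) return handler(payload);
  const pt = metadata.payloadType === undefined ? "unknown" : String(metadata.payloadType);
  return "unhandled payload (pt=" + pt + ")";
}
```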
Issue #154: add rtp seqNum to inbound audio 🎞︎
Fippo: we have a custom decoder that relies on the rtp sequence number to detect loss in the audio
… relatively easy to add to incoming frames for audio
… more complicated for video, or for outgoing frames
HTA: for incoming audio, you have one packet resulting in one set of samples
youenn: coming back to in/out-of order, this would expose that
… if we're not doing in-order, this may create confusion
Fippo: in our use case, we have our custom JS jitter buffer; we don't reenqueue the frame into the pipeline
HTA: so that's also a use case for out-of-order: bring your own jitter buffer
Fippo: I can get that written up as input to the other discussion
HTA: are we happy to expose this only for audio incoming frames, as a non required dictionary?
jib: I think it would still be interesting to better understand these one-ended use cases
HTA: ok, so let's wait for the use cases before proceeding then
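For context, the kind of loss detection Fippo describes could look roughly like the following, assuming each incoming audio frame exposes the 16-bit RTP sequence number of the single packet it came from. The function name and result shape are hypothetical; the half-range test for telling a forward jump from a late packet is a common convention, not something specified in this discussion.

```typescript
type SeqResult = {
  kind: "in-order" | "gap" | "late-or-duplicate";
  lost: number; // packets missing between `prev` and `current`
};

// Compares two RTP sequence numbers, which are 16 bits wide and wrap from
// 65535 back to 0.
function classifySeq(prev: number, current: number): SeqResult {
  const delta = (current - prev) & 0xffff; // forward distance modulo 2^16
  if (delta === 0 || delta >= 0x8000) {
    // Same number again, or more than half the range "ahead": treat it as a
    // duplicate or a reordered (late) packet rather than a huge loss.
    return { kind: "late-or-duplicate", lost: 0 };
  }
  if (delta === 1) return { kind: "in-order", lost: 0 };
  return { kind: "gap", lost: delta - 1 };
}
```

Note that with out-of-order delivery a "gap" may later be filled by a late packet, which is the confusion youenn points out; a bring-your-own jitter buffer would have to account for that.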
Issue #131: Packetization API 🎞︎
HTA: any more comment on the packetization API beyond what was discussed at TPAC?
Youenn: we could start with things like MTU
HTA: in the frame API?
Fippo: MTU is mostly an issue for audio; I don't think we hit that threshold even with redundancy
… it becomes an issue with transforms that change the size significantly
Youenn: I don't think adding the MTU to the frame API would make sense - more at the context level, with changes signaled via events
… the frame is coming from the encoder, that's not where the MTU info lives
Media Capture Extensions 🎞︎
PR #77: Add MediaStreamTrack framesCaptured and framesEmitted 🎞︎
Henrik: `track.getSettings().frameRate` tells you the configured frame rate, but not the actual one
… knowing the actual frame rate and the dropped frames would be useful
… some of that is exposed in stats, or in media playback metrics
… but the measurements are happening later in the pipeline - e.g. if the frame is dropped as soon as it is produced, it won't show up
… and we shouldn't force a webrtc PC to get track specific info
henrik: my proposal is to add a frame counter to track API, with a `getStats()` method
youenn: all APIs that are using an MST will allow you to get the number of frames that you're actually receiving
… Media capture transform gives you the count of frames, likewise for WebRTC & HTMLMediaElement
… what you want is focused between the sink & the source
… not sure I understand the diff between emitted and captured - that feels tied to a particular pipeline
… in our model, it's not clear it would be easy to specify an interoperable way to distinguish captured from emitted
… so maybe focusing first on captured?
henrik: that makes sense; captured is the main gap in any case
jan-ivar: framesCaptured makes sense with a low-lighting camera use case (although we could revisit the constraint model for that)
… share Youenn's concerns for emitted, which feels implementation dependent
… I'm not sure about `getStats()` vs a constraint
Bernard: next step?
Henrik: I'm hearing support for framesCaptured in some form, and leave emitted for later
HTA: framesEmitted makes sense for consistency, but I see the argument that it may be redundant
… so let's start with framesCaptured as accepted
RESOLUTION: move forward with framesCaptured only for now
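To show what a cumulative counter buys the application, here is a small sketch that derives the actual capture frame rate from two readings of `framesCaptured`. The `CaptureSnapshot` shape and the function name are hypothetical; how the counter is exposed on the track (a stats-style method or something else) was still open in the discussion above.

```typescript
// Hypothetical reading of the counter: the cumulative number of captured
// frames and the time at which it was read.
interface CaptureSnapshot {
  framesCaptured: number;
  timestampMs: number;
}

// Average frames per second between two readings; 0 if no time has elapsed.
function actualFrameRate(previous: CaptureSnapshot, current: CaptureSnapshot): number {
  const elapsedMs = current.timestampMs - previous.timestampMs;
  if (elapsedMs <= 0) return 0;
  const frames = current.framesCaptured - previous.framesCaptured;
  return (frames * 1000) / elapsedMs;
}
```

Comparing this value with `track.getSettings().frameRate` would reveal, for example, a camera that lowers its rate in poor lighting.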
WebRTC & Simulcast 🎞︎
Issue #2732: Inconsistent rules for rid in RTCRtpEncodingParameters 🎞︎
jib: following up to our discussions started in TPAC about rid length
… limiting RID length to 16 characters would help with web compat
… an erratum has been published on RFC8851 removing the - and _ characters
… feedback is that restricting the length would be hard as an erratum, but could be done in a -bis
hta: note that the empty string is outlawed by the BNF
dom: if we wait for -bis, are implementations going to be updated to match the allowed lengths?
florent: it should be possible to update chrome in that direction if we think it's a good idea
hta: we don't know of any use case where 17 characters are necessary
youenn: we could limit to 16 characters with a note mentioning ongoing IETF discussion
jib: we could also have a separate decision on addTransceiver vs accepting incoming offers and answers
dom: I don't think it goes against the protocol to limit what the API accepts to generate rids (we should definitely accept any valid rid in O/A)
jib: but then you have an API that doesn't let you set values that you accept from a remote description
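The API-side check being discussed could be sketched as follows, assuming the rules mentioned above: alphanumeric characters only (per the erratum), no empty string (per the BNF), and the proposed 16-character cap, which the group had not finalized. The function name and the `maxLength` parameter are hypothetical.

```typescript
// Validates a rid for use in the API. The 16-character default reflects the
// proposal under discussion, not an agreed limit.
function isValidRid(rid: string, maxLength: number = 16): boolean {
  if (rid.length === 0 || rid.length > maxLength) return false;
  // Letters and digits only; "-" and "_" are excluded per the erratum.
  return /^[A-Za-z0-9]+$/.test(rid);
}
```

Per dom's point, such a check would apply to rids the application supplies (e.g. via addTransceiver), while any valid rid in a remote offer or answer would still be accepted; jib's concern is that the two sets then differ.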
Issue #2764: What is the intended behavior of rollback of remote simulcast offer? 🎞︎
RESOLUTION: proceed with the proposed clarification
Issue #2737 / PR #2788: Modifications to [[SendEncodings]] from setParameters and sLD/sRD can be racy 🎞︎
hta: should that addition also be guarded by "if remote is true"?
jib: it would have to also have a "is an answer" gate - I can update the PR
henrik: if you restart and apply the steps again, wouldn't you implicitly rollback anything changed by the in-parallel operations?
… to do that correctly, you would have to wait until the SDP is applied
jib: this is run before we call the success callback
… we would wait until all setParameters are settled
… similar to if a remote description came right after
henrik: so this is done before the SDP process?
jib: right
Issue #2762: Simulcast: Implementations do not fail (and that seems good) 🎞︎
[Varun, Youenn depart]
RESOLUTION: close #2762 as is
WebRTC Extensions: Data Channels 🎞︎
Issue #114: RTCDataChannel transfer and maxMessageSize 🎞︎
florent: RTCDataChannels are transferable; maxMessageSize in RTCSctpTransport needs to be checked before sending data over a channel
… with a channel transferred to a worker, the maxMessageSize may be renegotiated on the main thread, which wouldn't be visible to the worker trying to send data
Florent: we could prevent changing the maxMessageSize during renegotiation - doesn't really happen in practice
… then that value could be kept in the transferred rtcdatachannel and keep the send algorithm as is
… the other aspect to consider is that the datachannel might have been transferred before the initial negotiation
… updating that value of maxMessageSize could be done as part of the "announcing a data channel as open" algorithm
dom: how confident are we that freezing maxMessageSize in renegotiation is web compatible?
florent: we would want to confirm that indeed
… sending too much data closes the data channel, so developers already need to pay attention
Bernard: the only time you would see this is in some weird maintenance scenarios - it should be very rare
florent: we can add some measurement in Chrome to see if that happens
dom: +1 to these solutions if they're web compatible
florent: so we can start with copying the value in opening, and measure web-compatibility of rejecting a renegotiated size
jib: would maxMessageSize end up being exposed on the data channel?
florent: we could do that, but that's not part of this proposal
… this wasn't useful in the context of running everything in the same context as peerconnection
… but with transferred channels, this makes more sense to consider
dom: it would be clunky not to expose it
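A minimal model of the approach florent outlines: the channel copies the negotiated limit when it is announced as open, and `send()` checks against that local copy, so a channel living in a worker never needs to consult the transport on the main thread. The class and method names are hypothetical, and the use of `TypeError` for oversized messages reflects an assumption about the existing send algorithm rather than anything stated in these minutes.

```typescript
// Hypothetical stand-in for a transferred data channel; not a real API.
class TransferredChannelSketch {
  private state: "connecting" | "open" = "connecting";
  private maxMessageSize = 0; // placeholder until the channel opens

  // Models the "announcing a data channel as open" step: snapshot the limit.
  announceOpen(negotiatedMaxMessageSize: number): void {
    this.maxMessageSize = negotiatedMaxMessageSize;
    this.state = "open";
  }

  // Checks the message against the snapshot and returns the bytes accepted.
  send(data: Uint8Array): number {
    if (this.state !== "open") {
      throw new Error("channel is not open");
    }
    if (data.byteLength > this.maxMessageSize) {
      // Assumed error type; see the lead-in.
      throw new TypeError("message exceeds maxMessageSize");
    }
    return data.byteLength;
  }
}
```

This only stays correct if the limit cannot change in a later renegotiation, which is the web-compatibility question florent proposes to measure.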
Issue #115: Need to specify behavior of detached RTCDataChannel objects 🎞︎
florent: we need to document a [[Detached]] internal slot per the HTML spec for transferable platform objects
… we would keep [[isTransferable]] for a datachannel that has already sent data
[no objection]
jib: it remains unclear what happens to data channels when they're transferred in the main thread
florent: should transferred data channels be garbage collectable in the main thread? they're "closed" which makes them collectable without a strong reference
… we could add a new state "detached" on top of opening, open, closed etc
Bernard: I prefer Proposal 2
jib: transferable objects are more like a clone, leaving an inoperative clone
… so the broader question is how a [[Detached]] data channel should behave, how it should affect the existing algorithms
florent: because they're closed, this already impacts the methods close() and send()
florent: hearing some support for introducing a "detached" state, and a [[Detached]] internal slot
hta: what about garbage collection?
florent: let's discuss on github
Capture Handle 🎞︎
Elad: the proposal is to add some structure to capture handle
… for crop targets (possibly with specific content hints)
jib: what about a messageport?
elad: still not structured, so leads to tight coupling
jib: I'm not sure we want to specify all the different things that applications might need to agree on
… per #11, I don't think we should re-invent postMessage
elad: a messageport informs the capturee they're being captured
… capture handle is a unidirectional message port
… being able to update the handle is useful given that the captured content is going to change
… a messageport can be useful in general, but for different use cases
jib: can we take a step back to understand the requirements we have?
… what API surface would we expose here?
elad: adding structure for a crop target in the capture handle instead of a simple string
… croptarget would have contenthints, and also add a messageport as a separate suggestion
dom: I think maybe a unidirectional messageport would work for what we want?
elad: several suggestions: move from string to object in capture handle - already needed for tightly coupled apps
… for loosely coupled apps, similar to what capture actions already allow, adding explicit support for croptargets / contenthints would go a long way to help
elad: what about the first suggestion - moving from a string to an object?
jib: would re-iterate #11 - let's not reinvent postMessage
elad: but this adds ability to decouple capturees/capturer
jib: but adding this to the browser API when in fact it's down to the app to use it or not
… that's odd
elad: it's similar to capture actions, not really more formalized in semantics
hta: are there established standardized protocols over messageport already?
dom: don't know off the top of my head, would have to check
hta: if we were to have to come up with that, this feels scary
dom: re going with an object, would that be for serializable objects?
elad: yes
jib: the original purpose for handle was an identifier; now we're talking about passing objects, that changes the nature of the API
elad: a messageport doesn't address all the use cases - it's not structured
… I'm hearing support for the use cases, and not seeing an alternative proposal
jib: I remain a bit lost on the requirements we're solving with this API
… e.g. it could be a separate field instead of being part of the handle
… I'm not sure why we should allow random web sites to specify crop targets
elad: slide 47 illustrates how this could be a purely user-driven process to avoid any user tricking
jib: but I'm not sold we need to allow this for random web sites
harald: what criteria would make a web site eligible for this?
jib: with a messageport?
elad: but that makes it more likely to create situations where a web site might want to trick another provider?
jib: I still don't see a compelling case for making handle an object
elad: is the video provider / vc collaboration use case compelling?
jib: yes - we should figure out a better way
elad: what way though?
hta: I'm hearing 2 proposals: make handle with some pre-defined fields for specific purposes (e.g. listing croptargets); and a messageport for tightly coupled apps
… these are 2 independent proposals that should be evaluated separately
jib: would allowing any serializable object be safe to expose to the capturer? that seems problematic
elad: the security properties are similar (or even somewhat safer) than a messageport
Ben: with arbitrary objects, could that raise OOM concerns?
elad: 1. The captured page would be attacking itself first and foremost.
… 2. The captured page would be attacking an unknown capturer that likely doesn't even exist.
… 3. We can neuter the attack by ensuring the capture-handle is no-op on the capturer if the capturer does not read the handle. But that's for the future, if the attack comes up in the wild, which is unlikely.
ben: are there objects that could create risks for the receiver?
elad: not that I'm aware of
dom: I think the chairs will have to propose steps to unblock this conversation
… maybe an explainer would help figure out all the considerations that need to be taken into account
hta: the chairs will do so