Meeting minutes
Slideset: https://
WebRTC Encoded Transform
Harald: encoded transform is fine for crypto, but not fine for other manipulation use cases
Issue #106: Add use cases that require one-ended encoded streams
Harald: several use cases where you want to connect a stream to somewhere else after processing
… not sure what a proper API would look like, so thought we should go back to requirements
youenn: looking at the use cases - they probably deserve different solutions
… e.g. webtransport probably shouldn't use peerconnection
… alternative encoders/decoders - sounds like a different API altogether
… metadata may be done prior to PC
Harald: encoded transform is a stream source connected to stream sink
… a one-ended stream has only one of these
… we have an ecosystem of media & encoders that people have gotten used to
… if we can plug into this ecosystem, it seems a better solution than creating novel solutions for this
… it might be that we decide that it's not the same ecosystem
… in which case we might kick the ball over to media
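The source/sink framing above can be sketched with plain WHATWG streams. This is an illustration only, assuming stand-in frame objects rather than real RTCEncodedVideoFrame instances: a "one-ended" stream keeps the encoded-frame source but hands its output to app code (here a WritableStream that could, e.g., forward over WebTransport) instead of a peer-connection sink.

```javascript
// Sketch only: a "one-ended" encoded stream has a source but no
// peer-connection sink. Frames are plain stand-in objects, not real
// RTCEncodedVideoFrame instances.
const fakeEncodedFrames = [
  { type: "key", timestamp: 0, data: new Uint8Array([1, 2, 3]) },
  { type: "delta", timestamp: 3000, data: new Uint8Array([4, 5]) },
];

const source = new ReadableStream({
  start(controller) {
    for (const f of fakeEncodedFrames) controller.enqueue(f);
    controller.close();
  },
});

// The app-provided sink, e.g. forwarding frames over WebTransport
// instead of handing them back to the RTP packetizer.
const forwarded = [];
const sink = new WritableStream({
  write(frame) {
    forwarded.push(frame.timestamp);
  },
});

const done = source.pipeTo(sink).then(() => forwarded);
done.then((timestamps) => console.log(timestamps));
```

The open question in the minutes is precisely what would stand in for `sink` (or `source`) in the platform when the other end is not a PeerConnection.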
youenn: starting from use cases and then deriving requirements as done for WebRTC-NV would be useful to do here
… it's easier to derive APIs from requirements
harald: the SFU-in-browser is a technique to achieve the scalable video conferencing use case we discussed yesterday
youenn: describing use cases in more details and then derive requirements from there
jib: +1 on better description of use cases
Bernard: the NV use cases have no API that satisfies the requirements
… WebTransport doesn't support P2P; the only path is RTP
JIB: so the idea would be to expose an RTP transport to JS
Bernard: or make the datachannel a low-latency media transport, but there doesn't seem to be much stomach for that
Harald: we have a discussion scheduled on whether to consider a packet interface in addition to a frame interface
… We'll detail the use cases more to figure out if an extension of media stream is relevant or if we need something completely different
Issue #90: Pluggable codecs
Harald: we've been lying to SDP about what we're transporting
<martinthomson> what is ECE?
Harald: to stop lying, we need a mechanism to allow the app to tell the SDP negotiation that they're doing something else than the obvious thing
<fippo> probably EME (there was a comment + question in the slides)
youenn: this may lead us to a conclusion that encoded transform was a mistake
<martinthomson> ...I tend to think that this is already broken
youenn: the other possibility, you could state during negotiation that you're going to use app-specific transforms
… letting intermediaries know about this
… we tried to push this to IETF AVTCore, without a lot of success
Harald: maybe MMUSIC instead?
Cullen: it's worth trying again - slow move has been the pattern in the past 2 years, not a signal
Bernard: the reason why SFrame has not moved in AVTCore is because nobody showed up, drafts were not submitted, and the area director is considering shutting down the SFrame WG
Youenn: I went to several meetings, tried to understand the submitted issues, but struggled to find solutions that would satisfy everyone
… the work has been stalled for lack of consensus
herre: can we move forward without the dependency on IETF, by allowing the JS to describe its transform to the other party?
Youenn: encoded transform has a section on SFrame transform, which wasn't pointing to an IETF draft until recently
Harald: the scripttransform is fully under the app control, but it doesn't have a way to tell the surrounding system it changed the format
… we could add an API before the IETF work emerges
Martin: SFrame is very close to death, I expect some more work to be done though
… once you give script access to the payload, anything is possible
… this breaks the assumptions under which the encoder and packetization operate
… I don't think we should let the script write the SDP; we need a model that makes sense, not sure what it would be
Youenn: we had a model based on the traditional video pipeline, with a break into it
… we could open it up more and expose more of the states of the pipeline
… we could expose e.g. bitrate if useful, based on use cases
… for pluggable codecs, you need to set a break before webrtc encoded transform & the track, and be able to set a special packetization
martin: you'd want to deal with raw media (the track), then do the encoding and the packetization
youenn: not sure we need all the breaks
Issue #31 & Issue #50: Congestion Control
Martin: none of this is necessary if you're looking at just mutating packets
Harald: not if the size or number of packets can change
Martin: some of it can be modeled as a network-specific MTU for the SFrame transform
Harald: the downstream would need to expose its MTU, and the SFrame transform would share its MTU upstream
Martin: but beyond, this is looking at the entire replacement of the chain
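The MTU exchange Harald and Martin discuss can be reduced to simple arithmetic. A minimal sketch, assuming a transform (e.g. an SFrame-style encryptor) that adds a fixed per-frame overhead; the numbers are illustrative, not taken from any spec:

```javascript
// Sketch: if a transform adds a fixed per-frame overhead, the MTU it
// advertises upstream must shrink so the downstream packetizer's MTU
// is still respected. The 1200/20 figures below are illustrative only.
function upstreamMtu(downstreamMtu, perFrameOverhead) {
  const mtu = downstreamMtu - perFrameOverhead;
  if (mtu <= 0) throw new RangeError("overhead exceeds downstream MTU");
  return mtu;
}

// e.g. a 1200-byte downstream payload budget, minus a hypothetical
// 20 bytes of SFrame header + auth tag:
console.log(upstreamMtu(1200, 20)); // 1180
```

This models the "network-specific MTU" view; it breaks down once the transform changes the number of packets rather than just their size, which is Harald's point.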
Youenn: the AR/VR use case is where data can be much bigger when you attach metadata
… one possible implementation is to do this with ScriptTransform to stuff metadata in the stream, as a hack
… not sure if we should accept this as a correct use of the API
… in such a use case, expanding the frame size means the bitrate is no longer correct
… the UA could instruct the encoder to adapt to the new frame size
… or we could expose new APIs
<peter> Isn't the targetBitrate already in webrtc-stats?
martin: AR/VR is probably a wrong usage of ScriptTransform
… it would better be handled as a different type of MediaStreamTrack
… this points toward being able to build a synthetic media flow
martinthomson: it would seem better to look at it this way rather than through a piecemeal approach
… the AR/VR points toward synthetic media flows
Bernard: people have tried using the datachannel for AR/VR
… didn't work for A/V sync or congestion control
… they want an RTP transform
… the A/V stream helps with sync
… if you put it in a different flow, how do you expose it in SDP
… it's the only way available in the WebRTC model today
fluffy: on WebEx hologram, we do exactly what Martin describes
… we send a lightfield in a stream that looks like a video stream
… same for hand gestures etc
… all of this sent over RTP
… it's low bit-rate data, doesn't need to adapt like audio
… lightfield instead needs bandwidth adaptation
… this could apply to haptics, medical device data being injected in a media stream
TimP: part of our problem has been mapping all of this to SDP, for things created on the fly
… describing things accurately in SDP is a lost cause as we'll keep inventing new things
<martinthomson> Steely_Glint_: SDP is extensible....
TimP: we should be describing the way we're lying (e.g. we're going to add 10% to the bandwidth; it won't be H264 on the way through)
… without trying to describe it completely
Peter: I had proposed an RTP data mechanism a few years ago, which sounds similar
… we could have an SDP field to say this is arbitrary bytes
… or construct something without SDP
Martin: I was suggesting new type of RTP flows with new "codecs"
… browsers can't keep up with all the ways that SDP would be used; we should instead give a way for apps to describe their "codecs" via a browser API
Issue #99 & Issue #141: WebCodecs & WebRTC
youenn: Both WebRTC and WebCodecs expose similar states
… but there are differences e.g. in mutability
<jesup> I strongly agree with Martin's comments; these data-like should be "codecs", which allows for much more flexibility, specification, and interoperability
youenn: should we try to reconcile? should we reuse webcodecs as much as possible?
<Steely_Glint_> But we do need (in sframe) to allocate a suitable codec (say h264) - the 'generic' pass through drops that into
youenn: I propose we stick to what we shipped
DanSanders: from the WebCodecs side, that sounds like a good approach
… we don't have a generic metadata capability
harald: so we should document how you transform from one to the other
… it's fairly easy to go from webrtc to web codecs
… the reverse is not possible at the moment
<Bernard> Youenn: we can create constructors to build RTCEncodedVideoFrame from EncodedVideoChunk
herre: if we move to the one-ended model, this creates trouble in terms of ownership and lifecycle
youenn: we deal with that problem in Media Capture transform through enqueuing via cloning (which is effectively a transfer)
<peter> +1 to constructors for RTCEncodedVideoFrame/RTCEncodedAudioFrame
Bernard: re constructors to get from one type to another, allowing conversion between the two
jib: your proposal doesn't address the mutability of metadata
youenn: the particular metadata I'm referring to aren't mutable
<Bernard> Harald: this model does not support the use cases we have been discussing.
youenn: can we close the issue, or should we wait until the architecture gets designed?
Harald: I hear support for the two-way transform
youenn: let's file an issue specifically about that and close these 2 issues
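The conversion Youenn and Bernard discuss is only possible in one direction today. A hedged sketch of the WebRTC-to-WebCodecs direction, using plain stand-in objects so the shape is visible outside a browser (the reverse constructor, `RTCEncodedVideoFrame` from an `EncodedVideoChunk`, is the part that does not exist yet):

```javascript
// Sketch: build an EncodedVideoChunk-style init dictionary from an
// encoded-frame-like object. Stand-in objects only; in a browser the
// input would be an RTCEncodedVideoFrame from a script transform.
function toChunkInit(encodedFrame) {
  // Copy the payload out: encoded frames are owned by the pipeline,
  // so the chunk needs its own buffer.
  const data = new Uint8Array(encodedFrame.data.byteLength);
  data.set(new Uint8Array(encodedFrame.data));
  return {
    type: encodedFrame.type, // "key" | "delta"
    timestamp: encodedFrame.timestamp,
    data,
  };
}

const frame = { type: "key", timestamp: 42, data: new Uint8Array([9, 8, 7]).buffer };
const init = toChunkInit(frame);
// In a browser: new EncodedVideoChunk(init)
console.log(init.type, init.timestamp, init.data.length); // key 42 3
```

The copy is the ownership point raised at the end of the discussion: cloning (effectively a transfer) avoids two pipelines mutating one buffer.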
Issue #70: WebCodecs & MediaStream transform
DanSanders: proposal 1 is straightforward
… we don't have a metadata API for lack of a good enough technical proposal
… the mutation/cloning aspect is the challenge
… e.g. cropping may generate no longer accurate data about face detection
… it depends on what cloning does
peter: are we talking about how the metadata would go over the network?
youenn: here we're focusing on mediastreamtrack as a series of frames
… we don't have a good solution for moving it over the network as we discussed in the previous item
… the WebRTC encoder could be a pass-through for the metadata, but it's still up in the air - we welcome contributions
chris: in webcodecs, there is some request to expose H.265 SEI metadata for user-defined data
<miseydl> some meta information might be provided by the containerization of the video codec itself (NAL info etc) - would we populate that generic meta array with those infos?
chris: that would presumably be exposed with VideoFrame
… it would be useful to look at the use cases together
Dan: this is kind of low priority because of low multiplatform support
… if we have a metadata proposal that works, it could be used here
youenn: we had someone sharing such an approach - although it's codec specific
chris: we'll also continue discussing this at the joint meeting with Media
harald: metadata has some specific elements: timestamp, ssrc, dependency descriptors
… the last one obviously produced by the encoder
… mutable metadata - if constructing a new frame is very cheap, we don't need mutability
DanSanders: it's quite cheap, just the GC cost
Harald: we'll continue the discussion at the joint meeting & on github
Issue #143: generateKeyFrame
<Ben_Wagner> WebCodecs spec requires reference counting: https://
Peter: what about returning multiple timestamps?
youenn: that's indeed another possibility
<martinthomson> does it even need to return something?
youenn: but then the promise will resolve at the time of the last available keyframe
martinthomson: does it need to return anything, since you're going to get the keyframes as they come out?
youenn: it's a convenience to web developers to return a promise (which also helps with error reporting)
martinthomson: the promise resolves after the keyframe is available, which isn't the time you want
<miseydl> one could also use the timestamp to associate/balance keyframerequests, which is useful for various reasons.
youenn: it's resolved when the frame is enqueued, before the readablestream
martinthomson: this seems suboptimal if what you want is the key frame
… if frames are enqueued ahead of the keyframe
youenn: in practice, the expectation is that you'll be polling the stream; otherwise your app is broken
martinthomson: but machines can jank for 100s of ms
youenn: the promise can also be used to return an error, which I don't think can be validated asynchronously
martinthomson: that argues for a promise indeed; it's not clear the timestamp return value is needed
fluffy: what you want to know is that the keyframe has been encoded; the timestamp is irrelevant
youenn: so a promise at the timing we said, but not timestamp
Peter: would it be reasonable to have an event when a keyframe is produced?
youenn: you do that by reading the stream and detecting keyframes
Peter: I like proposal 3 as a way to cover the situations you want
TimP: as I recall, the purpose of the timestamp was to help with encryption through SFrame for key changes
martinthomson: this can be done by waiting for a keyframe in the stream before doing the key change
… I also don't think it's strictly necessary to resolve the promise upon enqueuing
<jesup> +1 for proposal 3. Simple. Agree with mt
martinthomson: it could be done when the input has been validated
RESOLUTION: go with proposal 3 without returning a timestamp
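The pattern Martin describes (watch the stream and rotate the key once a keyframe goes by, rather than relying on a returned timestamp) can be sketched with a pass-through TransformStream. Frames here are stand-in objects with a `type` field like the real API; nothing below is a proposed interface:

```javascript
// Sketch: a pass-through transform that resolves a promise the first
// time a keyframe passes, as a trigger for an SFrame key change.
function keyFrameGate() {
  let signal;
  const sawKeyFrame = new Promise((resolve) => (signal = resolve));
  const transform = new TransformStream({
    transform(frame, controller) {
      if (frame.type === "key") signal(frame.timestamp);
      controller.enqueue(frame); // frames flow through unchanged
    },
  });
  return { transform, sawKeyFrame };
}

const { transform, sawKeyFrame } = keyFrameGate();
const frames = [
  { type: "delta", timestamp: 1000 },
  { type: "key", timestamp: 2000 },
];
const source = new ReadableStream({
  start(c) { frames.forEach((f) => c.enqueue(f)); c.close(); },
});
source.pipeThrough(transform).pipeTo(new WritableStream({ write() {} }));

sawKeyFrame.then((ts) => console.log("rotate key at", ts));
```

This is why proposal 3 can drop the timestamp: the app learns about the keyframe from the stream itself.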
Conditional Focus
Elad: screen sharing can happen in situations of high stress for the end user
… anything that distracts the user in that moment is unhelpful
… the API we're discussing is to help the app set the focus on the right surface
elad: still open discussion on default behavior when there is a controller
youenn: re task, we want to allow for the current task - there is no infrastructure for that, but implementations should be able to do that
… a bigger issue: Chrome and Firefox have a model where the screenshare picker always happens within the browser chrome
… it's very different in Safari - picking a window focuses the window
… so the behavior would be to focus back on the browser window
… being explicit on what is getting the focus would be better, so setFocusBehavior would be an improvement
… I don't think we should define a default behavior since we already see different UX across browsers
… I would also think it's only meaningful for tabs - for windows, this could be determined at the time of the gDM call
elad: re different UX models, we could fall back to making that a hint
… re window vs tab, it may still be useful as a hint to adapt the picker
youenn: unlikely we would do something as complex
jan-ivar: I'm actually supportive of option 2
… regarding applicability to window - for screen recording apps, the current behavior hasn't proved helpful
youenn: but this could be done via a preset preference in the gDM call
jan-ivar: we could, although maybe a bit superfluous
jib: setFocusBehavior is a little more complicated, more of a constraint pattern with UA dependent behavior
… but don't feel very strongly
… but yeah, turning off focus by adding a controller doesn't sound great
RESOLUTION: setFocusBehavior as a hint with unspecified default applicable to tabs & windows
youenn: deciding not to focus is a security issue - it increases the possibility of Web pages getting a wrong surface selected
… since this lowers security, there should be guidelines for security considerations
Elad: should this be a separate doc?
youenn: let's keep it in screen-share
jib: +1 given that we're adding a new parameter to getDisplayMedia
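The resolution above could look roughly like the following sketch: setFocusBehavior as a hint on the controller passed to getDisplayMedia. The `CaptureController` name and the `"no-focus-change"` enum value follow the proposal under discussion and may differ from what ships; the call is guarded so the sketch is inert outside a browser:

```javascript
// Sketch of the resolved shape: setFocusBehavior as a hint with an
// unspecified default, applicable to tabs & windows. Guarded so it is
// a no-op outside a browser that supports the proposal.
async function startShare() {
  if (typeof CaptureController === "undefined") {
    return null; // not in a supporting browser
  }
  const controller = new CaptureController();
  // Hint only: the UA may ignore it, and the default is unspecified.
  controller.setFocusBehavior("no-focus-change");
  return navigator.mediaDevices.getDisplayMedia({ video: true, controller });
}

startShare().then((stream) => console.log(stream ? "sharing" : "not supported"));
```

As jib notes, this is effectively a constraint-like pattern: UA-dependent behavior behind a declarative knob.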
<fluffy> Proposing cropTargets in a capture handle
Screen-sharing Next Steps
Slideset: https://
mark: setting the crop target on the capture handle - is that serializable / transferable ?
youenn: serializable
mark: then it could be transferred over the messageport
elad: but there is no standard format for that
youenn: re crop target serializability, +1
… I'm not sure yet about having cropTargets in capture handle
… it may require more data, e.g. different cropping for different sinks
… having app specific protocol might be a better way to start before standardizing a particular one
… re MessagePort, the security issues can be solved
… re content hint, I'm not convinced
… the capturer doesn't have to provide the hint, the UA can do it itself
elad: so 3 comments:
… - cropTargets may need more context (although my main use case is for a single cropTarget)
youenn: this could be dealt with via a per-origin protocol agreement
elad: but that doesn't work with non-pre-arranged relationship
jan-ivar: this MessagePort would be a first in terms of going cross-storage (not just cross-origin) - definitely needs security review
… this could still be OK given how tied it is to user action and the existing huge communication path via the video sharing
… In the past, we've tried to piecemeal things by not having a MessagePort
… part of the feedback I've been getting is maybe to just have a MessagePort, as that would be simpler and help remove some of the earlier mechanisms we had to invent
… thank you for suggesting cropTargets to allow non-tightly-coupled capturee-capturer
… I'm not sure if it's necessary if we're moving to a MessagePort
<youenn> @jib, window.opener can postMessage probably.
elad: I don't think a MessagePort could replace the capture handle, since it only works for cooperative capturee/capturer
… also the MessagePort alerts the capturee of an ongoing capture, with possible censorship concerns
… I think we need to address them separately
hta: thanks for the clarification on MessagePort being orthogonal to CropTarget
… MessagePort is two-way where capture handle is one-way, this may have a security impact
… I think these 2 proposals are worth pursuing (as a contributor)
… not convinced yet about content hint
… should this linked to a crop target instead?
elad: would make sense
TimP: I like all of this, and do like the multiple crop targets and notes
… the MessagePort shouldn't replace the rest of this, it's more complicated for many developers
… I like the 2 layers approach
fluffy: I find the security issues with MessagePort concerning without more details
… re trusting or not web sites for content hint - the capturer could determine it
elad: content hint helps with setting the encoder correctly
jib: I don't think there is new information to change our current decision, nor have I had enough time to consider this
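The capturee/capturer handshake discussed above can be sketched over a MessageChannel. This assumes nothing about the eventual API: the `kind`/`token` message shape is invented, and the string is a placeholder for a serialized CropTarget (a real CropTarget would be posted as a serializable object, per the earlier discussion):

```javascript
// Sketch: capturee advertises a crop target to the capturer over a
// MessagePort. Message shape and names are invented for illustration.
const { port1, port2 } = new MessageChannel();

// Capturer side: wait for the capturee to advertise a crop target.
const received = new Promise((resolve) => {
  port2.addEventListener("message", ({ data }) => {
    port1.close();
    port2.close();
    resolve(data);
  });
  port2.start();
});

// Capturee side: advertise the region of interest.
port1.postMessage({ kind: "cropTarget", token: "placeholder-crop-target" });

received.then((msg) => console.log("capturer received", msg.kind));
```

The security questions in the minutes are about who may hold such a port across storage boundaries, not about this plumbing, which is standard MessageChannel usage.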
Encoded transform
Issue #131 Packetization API
hta: would this include packetization & depacketization?
youenn: we would probably need both, good point
Peter: we could add custom FEC to the list as a valid use case
… being able to send your own custom RTP header would be nice
… although that would be possible to put in the payload if you had control over it
richard: this points toward an API that transforms the packets à la insertable stream
… SPacket is simpler for encryption
Bernard: we need to be able to packetize and depacketize if we use it for RED or FEC
… you need to be able to insert packets that you recover
HTA: I don't think we can extend the EncodedVideoFrame for this, it's the wrong level
… we probably need an RTCEncodedPacket object
… any impression on whether that's something we should do?
… do we have enough energy to pursue this?
Bernard: a bunch of use cases would benefit from this
Peter: I'm energetic on it
richard: +1
… esp if we focus on a transformation API
HTA: next steps would be writing up an explainer with use cases, and a proposed API shape
<rlb> happy to help, if youenn is willing to drive :)
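A minimal sketch of what such an explainer might illustrate: splitting a frame payload into MTU-sized packets and reassembling them. The packet shape (`marker`, `payload`) is invented for illustration, not a proposed RTCEncodedPacket design:

```javascript
// Illustrative sketch only: app-level packetization/depacketization
// over an invented packet shape. Nothing here is a proposed API.
function packetize(frameData, mtu) {
  const packets = [];
  for (let offset = 0; offset < frameData.length; offset += mtu) {
    packets.push({
      marker: offset + mtu >= frameData.length, // last packet of the frame
      payload: frameData.subarray(offset, offset + mtu),
    });
  }
  return packets;
}

function depacketize(packets) {
  const total = packets.reduce((n, p) => n + p.payload.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const p of packets) {
    out.set(p.payload, offset);
    offset += p.payload.length;
  }
  return out;
}

const frame = Uint8Array.from({ length: 10 }, (_, i) => i);
const pkts = packetize(frame, 4);
console.log(pkts.length, depacketize(pkts).length); // 3 10
```

The use cases in the discussion (custom FEC, RED, recovering lost packets) all need exactly this pair of hooks, plus access to packet-level metadata that frames don't carry.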
Action items & next steps
HTA: we had some serious architecture discussions on encoded media - I'll take the action item to push that forward
… Elad is on the hook for capture handle
… and we have 3 signed up volunteers for packetization
Bernard: we had good discussion on use cases we want to enable
JIB: we also closed almost all of the simulcast issues
Elad: I'm looking into a proposal for an element capture API to generate a mediastreamtrack without occluded content - it has security issues that we will need to look into
… this will be discussed at a breakout session tomorrow at 3pm PT
HTA: we also have a joint meeting with Media WG on Thursday - we'll discuss metadata for video frames there
[adjourned]