Evolved Video Encoding with WebCodecs

Event details

Date:
Pacific Daylight Time
Status:
Confirmed
Location:
-1 Lower Level - Catalina 2
Participants:
Paul Adenot, Harald Alvestrand, Rijubrata Bhaumik, Florent Castelli, Jasper Hugo, Joone Hur, Randell Jesup, Kensaku KOMATSU, Akira Nagaoki, Suhas Nandakumar, Marina Sokolova, Erik Språng, Vlad Sviatetskyi, Kunihiko Toumura, Kota Yatagai, Eugene Zemtsov
Big meeting:
TPAC 2024 (Calendar)

WebCodecs provides a low-level API for encoding and decoding video, with control over settings on a per-frame basis. As a relatively young API, it currently lacks some more advanced features, such as temporal/spatial scalability, that are important for real-time use cases like video conferencing.
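For context, here is a minimal sketch of how an encoder is driven through the current API, assuming browser support for VP9 and the "L1T3" scalability mode; per-frame control today is essentially limited to requesting a key frame, with SVC configured only through the opaque scalabilityMode string.

```ts
// Minimal sketch of today's WebCodecs encode path, assuming the browser
// supports VP9 and the "L1T3" temporal scalability mode.
const encoder = new VideoEncoder({
  output: (chunk: EncodedVideoChunk, metadata?: EncodedVideoChunkMetadata) => {
    // For SVC, metadata?.svc?.temporalLayerId reports the chunk's temporal layer.
    console.log(chunk.type, chunk.byteLength, metadata?.svc?.temporalLayerId);
  },
  error: (e: DOMException) => console.error(e.message),
});

encoder.configure({
  codec: "vp09.00.10.08",
  width: 1280,
  height: 720,
  bitrate: 1_500_000,
  framerate: 30,
  latencyMode: "realtime",
  scalabilityMode: "L1T3", // temporal layers are the main SVC knob exposed today
});

// Per-frame control is currently limited to requesting a key frame.
function encodeFrame(frame: VideoFrame, forceKeyFrame: boolean): void {
  encoder.encode(frame, { keyFrame: forceKeyFrame });
  frame.close();
}
```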

This session is intended to discuss a number of potential next steps, to find which features are the highest priority, and what benefits or problems we face with each of them.

Some of the topics for discussion:

Explicit reference frame control

By allowing the user to specify which reference buffers to reference and which to update on a per-frame basis, it is possible to implement a number of important reference structures and coding features including temporal/spatial/quality layers, long-term references, low-latency 2-pass rate control, etc.
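As a rough illustration of the idea (not the concrete proposal; the `references` and `updateBuffers` names below are hypothetical), an L1T2 temporal structure could be built by deciding, per frame, which buffer slots to predict from and which to refresh:

```ts
// Hypothetical per-frame reference control options; the field names are
// illustrative only and are not part of the current WebCodecs spec.
interface ReferenceControlOptions extends VideoEncoderEncodeOptions {
  references?: number[];    // buffer slots this frame may predict from
  updateBuffers?: number[]; // buffer slots the encoded frame is written into
}

// An L1T2 temporal structure: T0 frames predict from and refresh slot 0,
// T1 frames predict from slot 0 but refresh nothing, so they can be dropped
// without breaking the reference chain.
function optionsForFrame(frameIndex: number): ReferenceControlOptions {
  if (frameIndex === 0) {
    return { keyFrame: true, references: [], updateBuffers: [0] };
  }
  return frameIndex % 2 === 0
    ? { keyFrame: false, references: [0], updateBuffers: [0] } // T0
    : { keyFrame: false, references: [0], updateBuffers: [] }; // T1
}
```

The same small vocabulary also expresses long-term references (for example, parking a recovery frame in a dedicated slot and predicting from it after loss) and the other structures mentioned above.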

In short, any of the scalability modes listed in the Scalable Video Coding (SVC) Extension for WebRTC, and many more, can be implemented with a small set of tools. If done right, this could even be done in a manner that is codec- and implementation-agnostic.

This way of modeling an encoder also presents some issues. The user needs to be able to determine how many reference buffers are available, how many can be referenced per frame, and which references are allowed or disallowed under various circumstances. How do we expose such data in a way that is user friendly, compatible with the current API, and avoids unnecessary fingerprinting surfaces?
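One conceivable direction, sketched below, is to surface such limits through the existing VideoEncoder.isConfigSupported() query rather than a new capability API; the maxReferenceBuffers field is purely hypothetical and only illustrates the shape of the question:

```ts
// isConfigSupported() exists today and echoes back the recognized parts of the
// config. The maxReferenceBuffers field below is hypothetical: one conceivable
// way to report how many controllable buffer slots an implementation offers
// without opening an extra fingerprinting surface.
const support = await VideoEncoder.isConfigSupported({
  codec: "vp09.00.10.08",
  width: 1280,
  height: 720,
});

if (support.supported) {
  const maxRefs = (support.config as any).maxReferenceBuffers ?? 1; // hypothetical field
  console.log(`encoder exposes ${maxRefs} controllable reference buffer slot(s)`);
}
```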

There are also tradeoffs when it comes to integrating with existing encoder implementations, a small subset of which may not fit well into this model.

Spatial/Quality Scalability

Spatial scalability can be achieved by changing the encode call to take a sequence of encoding options, instead of a single option, per input frame. Each option would then represent a different layer and would include a desired encoded resolution. With reference frame scaling, a user may reference a buffer containing a different resolution.
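A sketch of what that could look like follows; the array-of-options encode signature and all field names below are illustrative, not current spec text:

```ts
// Hypothetical per-layer encode options: one entry per spatial layer, each
// carrying its own target resolution. None of these fields exist in the
// current spec; this only sketches the idea described above.
interface LayerEncodeOptions {
  keyFrame?: boolean;
  width: number;            // desired encoded resolution for this layer
  height: number;
  references?: number[];    // may point at a buffer holding a different resolution
  updateBuffers?: number[];
}

// Two spatial layers per input frame: the 720p layer predicts from the 360p
// layer's buffer via reference frame scaling.
const perFrameLayers: LayerEncodeOptions[] = [
  { width: 640,  height: 360, references: [0], updateBuffers: [0] }, // S0
  { width: 1280, height: 720, references: [0], updateBuffers: [1] }, // S1
];
```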

Again, this comes with some challenges. Different codec types might have different bounds on the scaling factors, and even individual implementations have limitations in this regard, if scaling is supported at all. Some codecs allow reference frame scaling only within the same temporal unit, while others support any reference at any time. How do we handle encoders with specially optimized modes such as "multi-res" or "S-mode aware" encoding?

Rate Control

When dealing with layered encoding, rate control becomes much more involved. The easiest option is to support only CQP, leaving all of the rate control to the user. If CBR is desired, the encoder needs to know the bitrate target and expected frame rate for each spatio-temporal layer, which means it suddenly needs to be SVC-aware even if the user is doing all of the reference frame control.
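For comparison, much of the CQP route can already be expressed today through bitrateMode "quantizer" and the per-codec quantizer encode option, where an implementation supports it; a rough sketch:

```ts
// Rough sketch of application-driven rate control with the existing API,
// assuming the implementation supports bitrateMode "quantizer" and the
// per-codec quantizer encode option.
declare const encoder: VideoEncoder; // an already-constructed encoder

encoder.configure({
  codec: "vp09.00.10.08",
  width: 1280,
  height: 720,
  framerate: 30,
  latencyMode: "realtime",
  bitrateMode: "quantizer", // the application owns the bitrate budget
});

function encodeWithQp(frame: VideoFrame, qp: number, keyFrame = false): void {
  // VP9 QP range is 0-63; the application picks it per frame (and per
  // spatio-temporal layer) from its own rate control loop.
  const options = { keyFrame, vp9: { quantizer: qp } };
  encoder.encode(frame, options);
  frame.close();
}
```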

Auxiliary

There are many other knobs that could potentially be added: speed/quality control, segmentation/ROI mapping, etc.
What's on the wish-list of the community?

Other Sessions of Interest

Note that there will also be a first-step proposal discussed at the joint Media/WebRTC WG Meeting on the 26th.

Further, there is a proposed breakout session on RtpTransport, an API that allows users to send custom-encoded frames over the RTP channel of a PeerConnection and is intended to go hand-in-hand with WebCodecs.

Agenda

Chairs:
Erik Språng, Eugene Zemtsov

Description:
See the session description above.

Goal(s):
Find the highest-priority features in the community, and which aspects need more consideration

Agenda:
The agenda is to discuss the proposal to add reference frame control to WebCodecs, and to gather feedback and comments on the path forward. The session consists of a few parts:

  • General goals
  • Our initial proposal, a minimum viable useful implementation of reference control
  • How to query what the encoders are capable of
  • Spatial Scalability (SVC, Simulcast)
  • Rate Control
  • Miscellaneous

The slides can be found here.

See also the WebCodecs spec and the GitHub issue for the reference control proposal.

Materials:

Track(s):

  • Real-time Web
