Games Community Group meeting

WebCodecs
15 April 2021

Table of contents

  1. Video
  2. Transcript

Video

Transcript

Chris Cunningham (Google) - Introduction

Okay, so WebCodecs, thank you for having us. I'm actually not the only presenter; I'm going to share the stage with my co-editors.

But the simple introduction slide is that this is a new low-level decoding/encoding API. If you're familiar with the analogues on other platforms, those would be Media Foundation on Windows, MediaCodec on Android, AVFoundation on macOS, or ffmpeg. This is an API in that space, kind of modeled on those and exposing those APIs to the web.

I'm going to hand it over to Bernard to talk a little bit about use cases or motivations.

Bernard Aboba (Microsoft) - Motivations

Thanks, Chris. To start off, I'd like to use streaming as an illustration of an application that uses the browser for decoding, in order to show how WebCodecs fits in. Streaming architectures have a few elements in common. Media is acquired by a server; for cloud gaming, this is typically the screen capture of a virtual machine running a game. Then the media is encoded, containerized, and sent to the browser.

The transport could be conventional streaming over HLS or DASH, low-latency streaming over a WebRTC data channel, or streaming over WebSockets or WebTransport.

The browser then receives the media, decodes and renders it, as well as sending mouse, keyboard, and game controller input back to the server. In addition to the APIs used for media transport, these architectures also rely upon Media Source Extensions (MSE), and for applications requiring low latency, such as cloud gaming, low-latency MSE is used.

Next slide.

So what architectural changes are required when WebCodecs is introduced?

WebCodecs is substituted for MSE; none of the other blocks need to change. WebCodecs is agnostic to container formats and works with any transport. Next slide.

So what are the benefits of WebCodecs?

As we mentioned, WebCodecs is transport agnostic and works with any transport. Since low-latency operation is inherent and fully specified, application developers don't have to familiarize themselves with the browser-specific differences introduced by low-latency MSE support.

WebCodecs leverages the GPU and supports dedicated workers, which MSE currently does not. This improves performance on lower-end hardware and enables processing of raw video for effects or machine learning.

In scenarios where encoding is done in the browser, WebCodecs offers access to advanced video features such as scalable video coding. Simulcast can also be supported by creating multiple encoders.
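As a rough illustration of that multiple-encoder approach to simulcast (the layer list, codec string, and onChunk handler are illustrative assumptions, not part of the talk):

  // Run one VideoEncoder per simulcast layer; each captured frame is encoded at every layer.
  const layers = [
    { width: 1280, height: 720, bitrate: 2_500_000 },
    { width: 640, height: 360, bitrate: 800_000 },
  ];

  const encoders = layers.map((layer) => {
    const encoder = new VideoEncoder({
      output: (chunk, metadata) => onChunk(layer, chunk, metadata), // onChunk: assumed app callback
      error: (e) => console.error(e),
    });
    encoder.configure({
      codec: 'vp8',
      width: layer.width,
      height: layer.height,
      bitrate: layer.bitrate,
    });
    return encoder;
  });

  // Feed each captured frame to every encoder (per-layer scaling is omitted for brevity).
  function encodeFrame(frame) {
    for (const encoder of encoders) encoder.encode(frame);
    frame.close();
  }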

Since WebCodecs is a low level API, the application can control aspects such as recovery and concealment on the receiver side or rate adaptation on the server side.

In the interest of full disclosure, there are a few current limitations. WebCodecs support for AV1 and for H.264 with temporal scalability is a work in progress, and DRM is currently not supported.

Back to you, Chris.

Chris Cunningham (Google) - API shape

All right, thank you so much. So I just want to do three quick slides on the API shape to make it real.

And so let's start off with the video side. You manage rendering the video yourself by painting frames to a canvas.

Video frames are a CanvasImageSource, so here you can see I'm using drawImage.

You could alternatively use texImage2D, texImage3D, or any of the other APIs that accept a CanvasImageSource.
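A minimal sketch of that rendering callback (the canvas lookup and function name are illustrative, not taken from the slides):

  // Paint each decoded VideoFrame onto a 2D canvas.
  const canvas = document.querySelector('canvas');
  const ctx = canvas.getContext('2d');

  function paintFrameToCanvas(frame) {
    // A VideoFrame is a CanvasImageSource, so drawImage accepts it directly.
    ctx.drawImage(frame, 0, 0);
    // Frames hold decoder resources; close them as soon as they're no longer needed.
    frame.close();
  }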

So here's the decoder being instantiated; the output callback is that paint-frame-to-canvas function from the previous slide. This is a very simple configuration; there are a lot more knobs in the config dictionary, but for the sake of brevity here it's just the one.
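Putting the construction, configuration, and decode calls together, a minimal sketch might look like this (the codec string and encodedChunks source are assumptions; in practice you demux the container yourself):

  // Decoder outputs are delivered asynchronously to the callback from the previous sketch.
  const decoder = new VideoDecoder({
    output: paintFrameToCanvas,
    error: (e) => console.error(e),
  });

  // Real configurations carry more fields (codedWidth, hardwareAcceleration, ...).
  decoder.configure({ codec: 'vp8' });

  // Fire and forget: queue as many EncodedVideoChunk objects as you like.
  for (const chunk of encodedChunks) {
    decoder.decode(chunk);
  }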

And then I'm calling decode. So you call decode, you can decode as much as you want, queue up as much work as you like; the API is designed in a fire-and-forget style, and when the outputs are ready, they will asynchronously invoke the callback. And with that I'm gonna hand it over to Paul to talk about audio stuff.

Paul Adenot (Mozilla) - Audio features in WebCodecs

Over the years, there have been quite a lot of issues when it comes to audio asset decoding, especially with web games. In particular, there was no progress reporting, so we don't know how long an asset is going to take to decode. You can't cancel once you've started to decode. You have to do everything in one go, so the file needs to be a complete file. If the soundtrack is three minutes long, everything needs to be decoded ahead of time, and that slows down game start, for example.

And then you'd have very big in-memory PCM buffers resident for all your audio assets: the soundtrack, all the dialogue, and all the sound effects. You'd use a lot of memory for no good reason.

So instead, WebCodecs is essentially the opposite of all that. Being low-level, it does not make assumptions about what you're going to be doing, and it's supposed to let you do anything you can do natively. But it's going to use the codecs that are already present in the web browser, instead of you having to compile, let's say, ffmpeg or libopus or whatever, and ship that as part of your WASM code.

So it's going to integrate well with SharedArrayBuffer, ArrayBuffer, and AudioBuffer if you want to use the Web Audio API directly. And the goal is to allow formats other than float32, for example int16, which essentially halves your memory footprint. But you can go further and do other things as well.
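For instance, requesting int16 output from decoded audio might look roughly like this (a sketch assuming the 's16' entry of the WebCodecs AudioSampleFormat enum; audioData stands for a decoder output):

  // Copy interleaved int16 samples out of a decoded AudioData object.
  const totalSamples = audioData.numberOfFrames * audioData.numberOfChannels;
  const pcm16 = new Int16Array(totalSamples);
  audioData.copyTo(pcm16, { planeIndex: 0, format: 's16' });
  // Int16 samples take half the memory of float32 for the same audio.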

Next slide please.

So here are two use cases that I know are used quite a lot when it comes to game engines, regardless of whether it's on the web or native, console, PC, it doesn't matter. So, for example, you want to do progressive rendering of a very long audio track, but you want sample-accurate looping, so you want it to loop absolutely perfectly. But you don't want everything in memory at once. So what you do is you decode the audio progressively, you stick your decoded audio samples in a ring buffer, you play them out, and then you loop. You loop around and you decode as you go. Right, so you have a very limited amount of PCM data in memory at all times, and you can post-process in the worker, for example. So let's say you decode an Opus file at 48 kHz but you want to render at 44.1 kHz, then you would do your resampling there. And that's what we do in native: a lock-free, in fact wait-free, ring buffer feeding an AudioWorklet, so it's going to be extremely low latency, as low as the platform can go. And if these slides are released later, there are two links: to an article that explains the pattern, and to a library that is already used in production and implements this for you.

So this is the case where you want to render your soundtrack or your ambient sound or that kind of thing with low latency. It's no more expensive than doing it with high latency, and you have low-latency sound elsewhere in the game anyway.
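A rough sketch of that decode-as-you-go pattern (RingBuffer and nextChunk are illustrative assumptions, not real library APIs; the real thing would run in a worker and feed an AudioWorklet through a SharedArrayBuffer):

  // Decode progressively and push PCM into a wait-free ring buffer drained by an AudioWorklet.
  const ring = new RingBuffer(new SharedArrayBuffer(48000 * 2 * 4)); // ~1 second of stereo float32

  const audioDecoder = new AudioDecoder({
    output: (audioData) => {
      const pcm = new Float32Array(audioData.numberOfFrames * audioData.numberOfChannels);
      audioData.copyTo(pcm, { planeIndex: 0, format: 'f32' });
      audioData.close();
      ring.push(pcm); // hand the samples to the real-time side
    },
    error: (e) => console.error(e),
  });

  audioDecoder.configure({ codec: 'opus', sampleRate: 48000, numberOfChannels: 2 });

  // Keep only a little audio queued; loop back to the first chunk for sample-accurate looping.
  function pump() {
    while (ring.availableWrite() > 0 && audioDecoder.decodeQueueSize < 4) {
      audioDecoder.decode(nextChunk()); // nextChunk(): assumed demuxer helper that wraps around
    }
    setTimeout(pump, 10);
  }
  pump();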

Next slide please.

Then the other scenario or use case we can have is when you want to decode audio, but instead of copying and being wait-free, you'd rather really, really minimize the amount of memory used, and you want to transfer buffers instead. So you can decode some audio to PCM via the methods that Chris presented, and then immediately postMessage the resulting buffer to the audio worklet for your rendering. This doesn't copy, it just sends a pointer to the other side, so it's going to be very useful as well to limit copies; some games have problems with memory bus pressure, and that can help.
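In code, that transfer might look roughly like this (workletNode is an assumed AudioWorkletNode whose processor knows how to consume the posted buffers):

  // Decode to PCM, then transfer (not copy) the buffer to the AudioWorklet's message port.
  const transferDecoder = new AudioDecoder({
    output: (audioData) => {
      const pcm = new Float32Array(audioData.numberOfFrames * audioData.numberOfChannels);
      audioData.copyTo(pcm, { planeIndex: 0, format: 'f32' });
      audioData.close();
      // Listing pcm.buffer in the transfer list moves ownership instead of copying the bytes.
      workletNode.port.postMessage({ pcm }, [pcm.buffer]);
    },
    error: (e) => console.error(e),
  });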

But those are just two use cases that we think will be useful for games, on top of the advantages that I explained at the start. That's it.

Chris Cunningham (Google) - MediaStreamTrackProcessor

One last thing I'll mention, which I didn't know where else to throw in: it's important that we be able to get video frames and audio frames from user media, so WebCodecs is integrating with another specification that provides frames from user media tracks. This is the MediaStreamTrackProcessor interface; it gives you a readable stream of video and audio frames. And then you can use that to encode user video.
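A rough sketch of what that looks like (the getUserMedia constraints and the videoEncoder it feeds are assumptions for illustration):

  // Pull VideoFrames from a camera track and hand them to an encoder.
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = stream.getVideoTracks();
  const processor = new MediaStreamTrackProcessor({ track });
  const reader = processor.readable.getReader();

  for (;;) {
    const { value: frame, done } = await reader.read();
    if (done) break;
    videoEncoder.encode(frame); // videoEncoder: an assumed, already-configured VideoEncoder
    frame.close();
  }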