Browser Hosted Video Editing

Presenter: James Pearce (Grass Valley)
Duration: 10 minutes


This is our web-based editor, and for those who aren't familiar with editing, it follows the fairly standard three-window layout that editing applications have. At the top left, we have a source viewer, which is used to load in source clips, cut them up, and add them to the timeline. At the bottom, we have a timeline view, which shows us our various tracks and the segments in those tracks. In the top right, we have a sequence player, which lets me scrub through and play the edit that I'm building in the view below. On the far left, we have a view of all our assets in the system, which lets us find a source and load it into the source viewer, or just drag it into the timeline.

This editor is pretty much fully featured, in the sense that we can have arbitrary numbers of tracks and arbitrary numbers of segments. We can have effects on those segments, and transitions between those segments. You know, I could add a blur to a particular segment, and that will give me a blurred view of it, and I've got some controls that I can use to tweak how blurry that is.

We have a whole bunch of effects that we've implemented, but, obviously, that list will grow as we build on user requirements.

In terms of transitions, as well, we have a variety, based on feedback from users, so I could add a simple circle wipe, if I play through that. And again, controls to adjust what that transition does.

That's a brief summary of some of the richer stuff that I can add to the timeline.

In terms of playback, as well, we have, obviously, pretty high-performance scrubbing here, so I can scrub around on the timeline and see the cuts, the transitions, and all the effects being rendered into that sequence viewer. I can play through at different speeds, so I can... (video audio distorts) I'm playing at one and a half times there, and we're getting audio as well. (video audio speeds up) As I play faster and faster, at some point the audio stops, because it becomes unusable. And that works in reverse, as well. (video audio plays in reverse) I can review a section of the material just by going backwards and forwards.

I can move through frame by frame, as well. And if you can hear, you get chirps to tell me... (video audio plays back in chirps) where I am in the audio. If I'm looking for a particular point in the presenter's speech, say a final "p" or a final "s", I can hear that just by frame-stepping through.

This is just a demonstration of the quite good performance we're getting from the media playback, both audio and video.

To talk about the technologies underpinning that, and the APIs we're using: the primary one, which is probably of most interest to this meeting, is WebCodecs, and we're using WebCodecs to decode both H.264 and AAC.
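As a rough sketch of what that decode path can look like (the helper names and codec strings here are illustrative, not taken from Grass Valley's code; real codec strings come from the MP4's sample-description boxes):

```javascript
// Map the two elementary streams the talk mentions to WebCodecs codec
// strings. These particular strings are assumptions for illustration.
function codecString(kind) {
  return kind === "h264" ? "avc1.64001f" : "mp4a.40.2";
}

// Decoded frames arrive via a callback. VideoDecoder only exists in the
// browser, so this would only be called where WebCodecs is available.
function createH264Decoder(onFrame, onError) {
  const decoder = new VideoDecoder({ output: onFrame, error: onError });
  decoder.configure({
    codec: codecString("h264"),
    hardwareAcceleration: "prefer-hardware", // fall back is still allowed
  });
  return decoder;
}
```

An `AudioDecoder` for the AAC stream would be configured the same way, with the decoded `AudioData` buffered for playback.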

Prior to WebCodecs, we were using WebAssembly: we built our own decoders, compiled them to WebAssembly, and ran them in a Web Worker to decode the material, buffer it, and have it available for the rapid, random access that we need in the player.

Because of the way we built our playback engine, we were able to swap out the WebAssembly decoders for the WebCodecs ones pretty quickly. The code hasn't changed very much, but performance has improved significantly, along with power consumption and other related aspects of performance.

That's been really good. I mentioned Web Workers previously, as well. For the most part, we're trying to perform our entire decode and render pipeline in a Worker.

And for the most part, we can: where OffscreenCanvas is available, we can do the entire end-to-end video decode and video rendering in a Worker.
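A minimal sketch of how a canvas gets handed to a Worker with OffscreenCanvas (the worker filename and message shape are assumptions for illustration):

```javascript
// Feature-detect whether the decode + render pipeline can live in a
// Worker, as described above: it needs OffscreenCanvas and Worker support.
function canRenderInWorker() {
  return typeof OffscreenCanvas !== "undefined" &&
         typeof Worker !== "undefined";
}

// Main-thread side: hand control of the canvas to the worker once, after
// which all drawing happens off the UI thread. "render-worker.js" is a
// hypothetical file name.
function moveRenderingToWorker(canvasElement) {
  const offscreen = canvasElement.transferControlToOffscreen();
  const worker = new Worker("render-worker.js");
  worker.postMessage({ canvas: offscreen }, [offscreen]); // transfer, not copy
  return worker;
}
```

Inside the worker, the transferred canvas behaves like a normal canvas: you can take a WebGL context from it and render decoded `VideoFrame`s directly.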

Audio is a little bit more problematic, because most of the Web Audio API is tied to the main UI thread, which has caused us some problems, historically. We've had to jump through a few hoops to buffer as much as possible, so that playback isn't affected if, say, I start scrolling through the list over here and put a significant load on the UI thread. We want to avoid pauses in the video, or pauses in the playback, because the audio can't be decoded in time.

Web Audio is one of the areas that has caused us problems. We've got a solution that works for us, but I think, in the future, we'd like to see a solution that, perhaps, pushes the Web Audio API onto a background Worker, if possible.

We're also using WebGL for the compositing, for the transitions, for the effects, anything that involves rendering the video to the screen. And the nice thing about WebGL is that, because it's a standard and the shader language is standard, we can share that shader code with our render engine.

And that means that when the final timeline gets rendered out to a high-res form, the render engine is able to use those same shaders to generate the same result that we're seeing in this low-quality proxy, this browse-quality media, that we're rendering to the screen here.
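As an illustration of the kind of shader that can be shared between the browser preview and a server-side render engine, here is a hypothetical circle-wipe fragment shader like the transition shown earlier (the uniform names and constants are assumptions, not Grass Valley's code), kept as a plain GLSL string so both sides can compile it:

```javascript
// GLSL ES fragment shader for a circle wipe between two clips. The same
// source string can be compiled by WebGL in the browser and by the
// high-res render engine, giving matching results.
const circleWipeFrag = `
precision mediump float;
uniform sampler2D uFrom;   // outgoing clip
uniform sampler2D uTo;     // incoming clip
uniform float uProgress;   // 0.0 -> 1.0 over the transition
varying vec2 vUv;

void main() {
  // Distance from the frame centre; the circle grows with uProgress.
  float dist = length(vUv - vec2(0.5));
  float inside = step(dist, uProgress * 0.7071); // ~covers frame at 1.0
  gl_FragColor = mix(texture2D(uFrom, vUv), texture2D(uTo, vUv), inside);
}`;
```

Effect parameters like the wipe's softness or centre would map onto additional uniforms, which is what the on-screen transition controls adjust.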

I think that covers the primary APIs that we're using. To summarize: we're using WebCodecs; we still fall back to WebAssembly where WebCodecs isn't available; and we're still using WebAssembly to parse the MP4 files, to demultiplex the elementary streams.
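The fallback selection can be as simple as a feature check; a sketch under the assumption that the Wasm path is always available as a last resort (the function name is hypothetical):

```javascript
// Choose a video decode backend: WebCodecs where the browser provides it,
// otherwise the WebAssembly decoders compiled in-house, run in a Worker.
// Note the MP4 demuxing stays in Wasm either way, per the talk.
function decoderBackend() {
  if (typeof VideoDecoder !== "undefined") return "webcodecs";
  return "wasm";
}
```

A production version would also call `VideoDecoder.isConfigSupported()` with the actual codec string before committing to the WebCodecs path.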

We're using the Web Audio API (that's the one we'd like to see move to Web Workers), and we're using WebGL, plus various other long-standing APIs that tie the whole thing together.

I think that pretty much covers what I wanted to talk about. I don't know, James, if you had any questions, or you wanted me to go back and cover anything else in more detail?

Yes, thanks, James. One thing I'm interested in, is how you manage memory buffers, and manage their lifetimes? You get frames coming out of the codec, what happens with them?

The buffers we keep tend to be centered around the current position, so we have a cursor, as you can see here on the screen. What we try to do is predict, given what the user is doing, what we want to buffer and how long we want to buffer for. If the user's playing forwards at 1x, then we tend to weight the buffering in the forwards direction. We'll primarily buffer frames ahead of the current cursor position, and we'll more aggressively discard buffers that have been played and are behind the current cursor position.

If we're playing backwards, we'll flip that round, so we're still trying to buffer ahead, but we're always buffering at least a few frames either side of that cursor, and in some cases a second or two around it, because we can't predict that the user won't stop playing and suddenly reverse, so we do need to be able to change direction pretty quickly.
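The direction-weighted window described above can be sketched as a pure policy function; the 3:1 ahead/behind split and the frame counts here are assumptions for illustration, not figures quoted in the talk:

```javascript
// Given the cursor frame, playback direction (+1 forwards, -1 backwards),
// and a frame budget, return the window of frames worth keeping buffered.
function bufferWindow(cursorFrame, direction, capacityFrames) {
  const ahead = Math.round(capacityFrames * 0.75); // weight towards travel
  const behind = capacityFrames - ahead;           // but keep some behind
  return direction >= 0
    ? { first: cursorFrame - behind, last: cursorFrame + ahead }
    : { first: cursorFrame - ahead, last: cursorFrame + behind };
}

// A frame outside the window is a candidate for eviction.
function shouldEvict(frameNumber, win) {
  return frameNumber < win.first || frameNumber > win.last;
}
```

Recomputing the window on every cursor move and evicting outside it gives the behaviour described: played frames behind the cursor go first, while a small reserve on both sides survives a sudden reversal.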

And that gives us the ability to do this, where we're jogging up and down, and playing backwards and forwards. (video audio distorts) You know, if I'm reviewing a frame, I can skip backwards and forwards.

We can do that, because we are maintaining a buffer around that current cursor position. We can't, obviously, buffer huge amounts, we're constrained by what is both achievable in the browser and what is reasonable. If I was to just seek to another position on the timeline, that invalidates everything we've buffered, and we have to go and fetch again, but we'll build up a new set of buffered frames again, based on that cursor position.

Does that answer the question?

Yes, I think it does, and I think we should stop there.

Okay.
