Media Element accuracy and synchronization with the DOM

Presenter: Sacha Guddoy (Grabyo)
Duration: 7 minutes
Slides: PDF


Slide 1 of 10

Hello there, my name is Sacha Guddoy. I'm the Lead Front-end Engineer at Grabyo. Today, I'm going to be talking about accuracy and synchronization in HTML media elements, our use cases, and some of the challenges that we've faced.

Slide 2 of 10

Firstly, let me tell you a bit about Grabyo.

Grabyo is a SaaS platform for broadcast media production. It's aimed at commercial broadcasters.

We use web and cloud-based technologies to bring remote distributed capabilities to broadcast workflows.

Some of our offerings include live video production, clipping from live streams, non-linear editing and publishing creative media to various endpoints, for example, social media or livestream.

Slide 3 of 10

Firstly, I'm going to talk about accuracy in HTML media APIs.

A common technique in media production workflows is to create a lower resolution, low bit rate draft version of your media.

Slide 4 of 10

And when you're working on that media, when you're making edits and such, that's what you'll be looking at.

Then when you've finished making your edits, you export them to a more powerful machine, or perhaps your own machine, which uses a lot of resources to render the final high-quality version from the original source media.

A common pattern that we use is for users to select their in and out points in a video in the web browser, and then we'll send that off to another service which could be in the cloud, and then we'll use that to process the final version of the media from the high quality source media.
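As a rough sketch of that pattern, the browser side only has to capture the in and out points and hand them to a rendering service. All names here (`buildClipJob`, the payload fields, the service URL) are illustrative assumptions, not Grabyo's actual API:

```javascript
// Hypothetical sketch: capture in/out points from a <video> element
// and build the payload sent to a cloud rendering service.
function buildClipJob(sourceId, inTime, outTime) {
  // Times come straight from video.currentTime, in seconds.
  return {
    sourceId,   // identifies the high-quality source media
    inTime,     // seconds from the start of the media
    outTime,
    requestedAt: Date.now(),
  };
}

// In the browser you would capture the points from the draft player:
//   const markIn  = video.currentTime;  // user presses "in"
//   const markOut = video.currentTime;  // user presses "out"
//   fetch("/render", {
//     method: "POST",
//     body: JSON.stringify(buildClipJob("clip-123", markIn, markOut)),
//   });
```

The accuracy problem described next is exactly about what those `inTime`/`outTime` numbers mean to the downstream renderer.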

Slide 5 of 10

One of the issues that we faced with this kind of workflow is that timestamps that you get from media elements in the web are just not accurate enough. They won't necessarily represent the same frame of the video when another application loads up that same timestamp.

So what that means is that when you've exported your video, you play it back and the start and the end can be one frame off, which is not very good.

In professional workflows, it's really important to be frame accurate. And it can be very noticeable in a lot of situations.

The most common way of getting the time that the user is at, in the video, is using the currentTime property of the HTMLVideoElement interface.

The problem with this is that it's not very precise. It's measured in seconds and it only gives you two decimal places of precision, so you can only be accurate to a hundredth of a second.

In a 60 FPS video, a single frame only lasts for about 16 milliseconds. So your timestamp, which is only accurate to a hundredth of a second, isn't going to be accurate enough to convey that exact frame.
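A small sketch makes the ambiguity concrete. Mapping a timestamp to a frame index is just `floor(seconds * fps)`, but a timestamp rounded to two decimal places can land on a different frame than the exact one:

```javascript
// Map a timestamp in seconds to a frame index.
function frameForTime(seconds, fps) {
  return Math.floor(seconds * fps);
}

// Frame 29 of a 60 fps video starts at 29/60 ≈ 0.48333 s.
const exact = 29 / 60;
// Rounded to two decimal places (hundredth-of-a-second precision):
const rounded = Math.round(exact * 100) / 100; // 0.48

frameForTime(exact, 60);   // → 29
frameForTime(rounded, 60); // → 28, one frame early
```

So two tools that agree on the timestamp to a hundredth of a second can still disagree about which frame it names.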

Different video rendering tools might have different interpretations of exactly what that time is.

Slide 6 of 10

The next point is about synchronization with the DOM.

Another common use case for us is to manipulate and monitor media in real time using DOM interfaces, for example, controlling state of the playback, controlling playback position, monitoring the audio levels, analyzing or manipulating video or audio, displaying overlays, and synchronizing different pieces of media together.

Slide 7 of 10

In these scenarios, it's really important for the DOM to reflect the current state of media as accurately as possible.

Typically, these elements in the DOM would be updated based on the media.

There are two ways you can do that: event listeners or requestAnimationFrame. Either way, there is an inherent level of desynchronization, because the DOM and the media element are not running in the same thread. The media element plays in its own thread, but updates to the DOM rely on the main UI thread, which means that user interaction or anything else running in that thread can easily block your updates. Your DOM representation of what's happening in the media can end up lagging behind.

Furthermore, this can just be expensive. If you are updating your DOM 60 times per second, because your audio level, for example, is changing 60 times a second, then that is taxing on the UI thread and degrades the user experience.
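A minimal sketch of that pattern, assuming a browser environment: a requestAnimationFrame loop reads `video.currentTime` and updates a timecode overlay, skipping redundant DOM writes to keep the cost down. `formatTimecode` and `startOverlaySync` are illustrative names, not part of any standard API:

```javascript
// Pure helper: seconds → "HH:MM:SS:FF" timecode at a given frame rate.
function formatTimecode(seconds, fps) {
  const totalFrames = Math.floor(seconds * fps);
  const ff = totalFrames % fps;
  const ss = Math.floor(seconds) % 60;
  const mm = Math.floor(seconds / 60) % 60;
  const hh = Math.floor(seconds / 3600);
  const pad = (n) => String(n).padStart(2, "0");
  return `${pad(hh)}:${pad(mm)}:${pad(ss)}:${pad(ff)}`;
}

// Browser-only wiring: keep an overlay in sync with the video.
function startOverlaySync(video, overlay, fps) {
  let last = "";
  function tick() {
    // currentTime is read on the main thread, so this value can lag
    // the media thread whenever the UI thread is busy.
    const tc = formatTimecode(video.currentTime, fps);
    if (tc !== last) {          // skip redundant DOM writes
      overlay.textContent = tc;
      last = tc;
    }
    requestAnimationFrame(tick);
  }
  requestAnimationFrame(tick);
}
```

Where it's supported, `HTMLVideoElement.requestVideoFrameCallback` is a better fit than plain requestAnimationFrame for this, since its callback fires per presented video frame and reports the frame's `mediaTime`.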

Slide 8 of 10

The next point is on codecs.

All major browsers ship with a range of video codecs. Expansion of codec support and lower level access to the decoding pipeline could enable new features.

For example, you could use intra-frame coding to improve seeking performance. You could extract thumbnails faster than real time. You could decode and store specific time ranges for fast referencing, for example, in a nonlinear editing scenario.

WebCodecs is the elephant in the room here that can help solve a lot of these issues by giving us lower-level access to that decoding pipeline, which is something I'm very excited about. Obviously the browser support at the moment, as of October 2021, is not the greatest.
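To sketch what that lower-level access looks like: with WebCodecs you can decode just the chunks covering a target time range, e.g. for thumbnail extraction, as long as you start from a keyframe. This assumes you already have demuxed `EncodedVideoChunk`-like objects (WebCodecs itself does not demux containers); `findDecodeStart` and `decodeRange` are illustrative names:

```javascript
// Pure helper: decoding must begin at the keyframe at or before the
// target timestamp; intra-coded frames are the only valid entry points.
// Each chunk looks like { type: "key" | "delta", timestamp /* µs */ }.
function findDecodeStart(chunks, targetUs) {
  let start = 0;
  for (let i = 0; i < chunks.length; i++) {
    if (chunks[i].timestamp > targetUs) break;
    if (chunks[i].type === "key") start = i;
  }
  return start;
}

// Browser-only wiring (requires WebCodecs support):
function decodeRange(chunks, targetUs, config, onFrame) {
  const decoder = new VideoDecoder({
    output: (frame) => { onFrame(frame); frame.close(); },
    error: (e) => console.error(e),
  });
  decoder.configure(config); // e.g. { codec: "avc1.42E01E" }
  for (let i = findDecodeStart(chunks, targetUs); i < chunks.length; i++) {
    decoder.decode(chunks[i]);
    if (chunks[i].timestamp >= targetUs) break;
  }
  return decoder.flush(); // resolves once all frames are emitted
}
```

Because the decoder only has to chew through one group of pictures rather than play the file in real time, this is how faster-than-real-time thumbnailing becomes possible.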

Slide 9 of 10

The next part I want to talk about is multi-threading.

So it's often necessary to perform media operations on the same thread as the UI handlers, so for example, rendering a video onto a canvas. This is not particularly desirable, because these tasks can be really resource intensive and run very frequently, blocking the UI thread, degrading performance and the user experience.

If you're painting a video to a canvas, that's probably going to be running on every requestAnimationFrame.

Obviously, the conventional solution to this last problem is multi-threading, i.e. web workers: you do your intensive work in a web worker. And there is an OffscreenCanvas interface, which allows WebGL to render to a canvas within the scope of a worker.

But there isn't an equivalent API for video. A video element cannot be used, or even accessed, from within a worker.

This is problematic because then your video to canvas rendering pipeline has to be in the DOM thread. It can't live in a worker by itself.
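A sketch of the workaround this forces, assuming a browser environment: the canvas can move to a worker via `transferControlToOffscreen`, but the frame grab itself has to stay on the main thread, shipping ImageBitmaps across. The worker script name `render-worker.js` and both function names are illustrative assumptions:

```javascript
// Pure helper: scale a frame to fit a target box, preserving aspect.
function fitWithin(srcW, srcH, maxW, maxH) {
  const scale = Math.min(maxW / srcW, maxH / srcH);
  return { width: Math.round(srcW * scale), height: Math.round(srcH * scale) };
}

// Main thread (browser-only):
function startWorkerRendering(video, canvas) {
  const worker = new Worker("render-worker.js"); // assumed worker script
  const offscreen = canvas.transferControlToOffscreen();
  worker.postMessage({ canvas: offscreen }, [offscreen]);

  async function pump() {
    // This step cannot move off the main thread: the video element
    // is only reachable here. ImageBitmap is transferable, so the
    // pixel data itself is not copied.
    const bitmap = await createImageBitmap(video);
    worker.postMessage({ frame: bitmap }, [bitmap]);
    requestAnimationFrame(pump);
  }
  requestAnimationFrame(pump);
}
```

The drawing happens in the worker, but the per-frame capture loop still ties up the main thread, which is exactly the limitation being described.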

Slide 10 of 10

So that's all I have for you on this topic. I hope that was enlightening in terms of some of the use cases and issues that we were facing. And I'm looking forward to hearing your questions and feedback. Thanks very much. Take care.


Workshop sponsor

Adobe
