Media and Entertainment IG

Meeting minutes

Agenda

cpn: 2 topics
… updates on TextTrackCue and DataCue proposals
… Mile High Video Conf report

Alicia: Text tracks in MSE

TextTrackCue

Chris: Thank you Rob for the work you've been doing

Rob: Quick background, I started trying to write a DataCue polyfill
… I discovered that DataCue is implemented in WebKit and Safari

<nigel> Proposal to expose HTML TextTrackCue constructor WICG/datacue#35

Rob: But in doing that I found that I needed to go back to TextTrackCue, given what's happened with DataCue: it's only implemented in Safari, so it was dropped from the HTML spec

cpn: This was in HTML but then dropped.
… Since then we ran a task force and produced a report that was focused on these emsg boxes
… that can be carried in CMAF, and how to surface those to applications.
… Then all of that work stalled, and it didn't feel there was enough interest to push it forwards.
… Perhaps it was premature - the DASH-IF had a subgroup that looked at event processing, which this tied in with.
… At the time they were defining a processing model for these "media in-band events".
… There ended up being a feeling that it wasn't quite ready for standardisation so we left it.
… As a result it meant that the use case that Rob is interested in, independent of emsg boxes etc.,
… more to do with associating metadata with video and location information and visualising on maps,
… a worthwhile use case, coupled with in-band processing meant that we weren't able to progress
… the thing that was useful, independently of the not-quite-ready thing.
… In the discussion that followed Rob came and said he'd like to move forward with something
… he could use to associate metadata with media times to build applications.
… What's the simplest thing to do to solve that problem? Can we focus on that?
… I'm thankful to Rob for helping to move this forwards.

Rob: My particular interest as Chris hinted is for out-of-band metadata.
… I've been leading the development of WebVMT, a variant of WebVTT which has sort of branched into
… its own now, published in Sep 2023 as a Group Note, by the Spatial Data on the Web Group,
… which provides a v1 for implementers to work to, and I can continue with the development.
… Given the history with DataCue and VTTCue, both inherit from TextTrackCue.
… I started looking at TextTrackCue as the root of this and whether I could adapt something on top of that,
… given that DataCue had not been adopted. Could I do a minimal change?
… I identified that if the constructor for TextTrackCue were available that would solve my problem.
… I am puzzled about why it isn't, given that it's supposed to be an abstract base class.
… That was my direction, so I wrote a TextTrackCue polyfill, which I can show you.
… [shares screen] 5s long video, on Firefox. Video element, with a "count" and "colour" indicator below.
… Those indicators flicker with different values, all driven by DataCue.
… The log shows what's going on in the background.

Rob: The log shows the cue enter and exit times, and the cue data content
… When I move around the timeline, you can see them coming and going
… I found that DataCue is redundant if you take this approach
… I made another demo. If you extend TextTrackCue to create a derived cue, it does the same thing as the 'type' attribute
… This has CountdownCue and ColourCue, which are derived from TextTrackCue. It works across browsers
… What I'm looking to do is put it forward as a proposal to extend TextTrackCue to operate in this way
… Alicia has been very helpful, providing technical input (thank you!)
… Four points have come up in the discussion
… The abstract class, the constructor attributes, there's backward compatibility, and ...
… The proposal is the smallest change possible, as far as I can tell
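
A minimal sketch of the derived-cue idea Rob describes, where the cue's class takes the place of a 'type' attribute. It is not the demo code: CueBase is a hypothetical stand-in for TextTrackCue so the example runs outside a browser, and the count and colour fields are assumed for illustration.

```typescript
// Hypothetical stand-in for the browser's TextTrackCue interface.
class CueBase {
  readonly startTime: number;
  readonly endTime: number;
  constructor(startTime: number, endTime: number) {
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

// Derived cue types named after those in the demo; their fields are assumptions.
class CountdownCue extends CueBase {
  readonly count: number;
  constructor(startTime: number, endTime: number, count: number) {
    super(startTime, endTime);
    this.count = count;
  }
}

class ColourCue extends CueBase {
  readonly colour: string;
  constructor(startTime: number, endTime: number, colour: string) {
    super(startTime, endTime);
    this.colour = colour;
  }
}

// Dispatch on the cue's class instead of inspecting a 'type' string.
function describeCue(cue: CueBase): string {
  if (cue instanceof CountdownCue) return `count=${cue.count}`;
  if (cue instanceof ColourCue) return `colour=${cue.colour}`;
  return "unknown cue";
}

const demoCues: CueBase[] = [
  new CountdownCue(0, 1, 5),
  new ColourCue(0, 2.5, "red"),
];
const descriptions = demoCues.map(describeCue);
```

In a browser that exposed the constructor, the derived classes would extend TextTrackCue directly and could be added to a text track like any other cue.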

Rob: It has broad applicability, sensors, dashcam, accelerometer, vehicle collision monitoring
… by parsing the accelerometer data and looking for spikes
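
One simple way to look for spikes in accelerometer data is a fixed threshold on the acceleration magnitude. The sketch below assumes that approach; the AccelSample shape, the units and the threshold value are all illustrative and were not discussed in the meeting.

```typescript
// Hypothetical sample shape: one accelerometer reading on the media timeline.
interface AccelSample {
  time: number;      // seconds from the start of the video
  magnitude: number; // acceleration magnitude, e.g. in g
}

// Returns the times of all samples whose magnitude exceeds the threshold.
function findSpikes(samples: AccelSample[], threshold: number): number[] {
  return samples
    .filter((sample) => sample.magnitude > threshold)
    .map((sample) => sample.time);
}

const trace: AccelSample[] = [
  { time: 0.0, magnitude: 1.0 },
  { time: 0.5, magnitude: 1.1 },
  { time: 1.0, magnitude: 4.2 }, // the only reading above 3 g
  { time: 1.5, magnitude: 1.0 },
];
const spikeTimes = findSpikes(trace, 3.0);
```

Each returned time could then become the start time of a cue that marks a possible collision on the video timeline.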

Nigel: I want to highlight a discussion in the GitHub about having an abstract base class

<kaz> WICG datacue issue 35 - Proposal to expose HTML TextTrackCue constructor

cpn: We're limited in how much of the solution we can discuss in an IG.
… I understand that you've implemented a constructor that prevents instantiation of TextTrackCue
… and you're preserving existing behaviours while introducing the extensibility point.
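
A sketch of how a constructor can refuse direct instantiation while still allowing derived classes, using `new.target`. This is an assumed technique for illustration, not necessarily how the polyfill does it; BaseCue and SensorCue are hypothetical names.

```typescript
// Hypothetical stand-in for TextTrackCue; not the browser's implementation.
class BaseCue {
  readonly startTime: number;
  readonly endTime: number;
  constructor(startTime: number, endTime: number) {
    // new.target is the constructor that was actually invoked with `new`.
    if (new.target === BaseCue) {
      throw new TypeError("BaseCue cannot be constructed directly");
    }
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

// A derived cue passes the guard because new.target is SensorCue.
class SensorCue extends BaseCue {
  readonly reading: number;
  constructor(startTime: number, endTime: number, reading: number) {
    super(startTime, endTime);
    this.reading = reading;
  }
}

// Reports whether `new BaseCue(...)` succeeds.
function canConstructBase(): boolean {
  try {
    new BaseCue(0, 1);
    return true;
  } catch {
    return false;
  }
}

const derivedCue = new SensorCue(1, 2, 9.81);
const baseConstructible = canConstructBase();
```

Code that never calls the base constructor directly is unaffected, which is the sense in which existing behaviour is preserved.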

Alicia: There is already a constructor implemented in WebKit.
… This has already happened with IDL and everything.
… I'm not sure about talking about abstract classes and such.
… In the HTML spec everything is interfaces anyway, and some of them happen to have constructors.
… Classes are only one way to think about it, but it's not in the spec as far as I know.

Rob: Francois has confirmed that in the thread.

<kaz> Francois' response on Issue 35

Rob: I appreciate Francois's input into this. I think what he has explained is not inconsistent with this approach.
… I accept that there may not be another example of abstract classes in HTML (prepared to be corrected!)
… but I think it is possible to do, because I have. My demo does exactly that.
… There's an inheritance test.

Nigel: Can you share the code for that demo?

Rob: Happy to.

<kaz> Breakouts Day 2025 on GitHub

cpn: W3C has a breakouts day in a few weeks, and this could be a good topic for that
… people like Marcos and Eric from Apple for example.
… I'd like to propose that we bring this to one of those breakout sessions.
… In order to do that we need to have more of an agreed approach among ourselves to start with.
… Plus how this relates to some of the other TextTrackCue extension proposals that have been put forward
… to show that this is compatible, e.g. with the HTMLCue idea that Apple put forward.
… Targeting one of those breakout sessions on the 26th may require us to have a more agreed understanding.
… We could organise a separate focused session to talk about this under WICG participation rather than
… the IG.
… So that we're doing it under the community licence.
… Try to fit that in in the next week or so with an aim for having something presentable in the
… wider breakouts.

Rob: I have the demos working and they are compliant with the proposal I put forward.

Nigel: Another option, we could just take the discussion to the breakout
… Having some external viewpoints might help resolve things

Chris: Suggest not using the C++ terminology so much, and focusing on WebIDL and web platform capabilities

<kaz> Schedule for breakouts proposals

Chris: We could hold a meeting to prepare, if needed

<kaz> [ 12 March: Deadline for submitting initial list of proposals. ]

Rob: I can share the code

Mile High Video

Piers: This was a couple of weeks ago. I gave a talk about an enhanced approach to respond to low latency, and minimising stalling,
… by analysing the performance of the download over time, before it ends
… then we can achieve improved QoE using this approach. Intra-segment information could be provided by the network
… fits with some of the edge computing and network quality work
… This code will go into dash.js in the coming months
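
To illustrate the general idea of analysing a download before it ends, the sketch below projects the total download time from a mid-download progress sample and compares it with a time budget. It is not the algorithm from the talk or from dash.js; the ProgressSample shape, the constant-throughput assumption and the budget are all hypothetical.

```typescript
// Hypothetical progress report taken while a segment is still downloading.
interface ProgressSample {
  elapsedMs: number;     // time since the request started
  bytesReceived: number; // cumulative bytes received so far
}

// Projects the total download time, assuming throughput stays at its average so far.
function projectDownloadMs(latest: ProgressSample, totalBytes: number): number {
  if (latest.elapsedMs <= 0 || latest.bytesReceived <= 0) {
    return Infinity; // nothing measured yet, so no finite projection
  }
  const bytesPerMs = latest.bytesReceived / latest.elapsedMs;
  return totalBytes / bytesPerMs;
}

// True when the projected download time exceeds the budget for this segment.
function likelyToStall(latest: ProgressSample, totalBytes: number, budgetMs: number): boolean {
  return projectDownloadMs(latest, totalBytes) > budgetMs;
}

const midDownload: ProgressSample = { elapsedMs: 500, bytesReceived: 100000 };
const projectedMs = projectDownloadMs(midDownload, 400000); // 200 bytes/ms gives 2000 ms
const stallRisk = likelyToStall(midDownload, 400000, 1500);
```

A player could use such a signal to switch to a lower bitrate before the segment finishes downloading, rather than after a stall has already occurred.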

<piers> https://www.svta.org/2025/02/04/exploring-the-edge-in-the-streaming-video-workflow/

Piers: In terms of other work, SVTA are looking at edge functionality with respect to video streaming
… Initial focus is around edge caching, the open caching initiative. Glenn Deen is organising
… Potentially things could fit with this interest group.
… Other topics that came up: more use of GPU for video encoding. Once video data is in the GPU it could also be used for other purposes, such as AI analysis, or quality metrics on the fly
… It's a new concept that might be enabled, could feed into work on quality monitoring
… There were other talks about doing quality metrics using predictive VMAF implementation, allowing calculation on the fly
… Other talks on processing such as super resolution, to enhance the quality of video. It could be done at the edge, to offset the transmission costs
… Talks from Akamai and Meta on Media over QUIC. Also content-aware transport metrics over QUIC. Adjust the server side config based on the network situation, provided to the client by CMSD to allow adaptation at the client
… A number of presentations on server-guided ad insertion. This is at the final stages of standardisation at MPEG
… Related functionality may benefit from some of the edge functionality
… Ad decision servers and related in-network functionality
… Any thoughts or questions, let me know

Chris: Interesting, also relates to Web & Networks IG, looking at cloud/edge. Also research projects looking at edge composition and compute offload

<kaz> newly created CG on Cloud-Edge-Client Coordination

Kaz: There's a CG working on cloud-edge-client coordination, and the combination of GPU approach and QUIC based handling would be useful

Piers: Also optimising for QUIC delivery, network parameters for congestion control and tuning to the connection

Chris: TPAC breakout on low latency event signalling

Piers: There are different ways to signal having an updated manifest. emsg is one way. Need to deal with the "thundering herd" problem when things happen all at the same point
… Relevant particularly in live sessions
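
A common mitigation for a thundering herd is to add random jitter so that clients do not all refresh at the same instant. The sketch below assumes that technique; it was not proposed in the meeting, and the function name and delay values are illustrative.

```typescript
// Returns a refresh delay spread uniformly over [baseMs, baseMs + spreadMs).
// `random` is injectable so the behaviour can be checked deterministically.
function jitteredDelayMs(
  baseMs: number,
  spreadMs: number,
  random: () => number = Math.random,
): number {
  return baseMs + random() * spreadMs;
}

// Deterministic checks with a fixed random source.
const earliestDelay = jitteredDelayMs(1000, 2000, () => 0);
const midpointDelay = jitteredDelayMs(1000, 2000, () => 0.5);

// Typical use: each client draws its own delay before refetching the manifest.
const actualDelay = jitteredDelayMs(1000, 2000);
```

With these values, clients that receive the same signal at the same moment would spread their manifest requests over a two second window instead of arriving together.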

Kaz: Also relates to IoT and smart city use cases

Text Tracks in MSE

Alicia: WebKit has landed support for them in Safari, and in WebKit GStreamer this week. I found that the MSE spec doesn't seem to handle text tracks very well
… I found there are no WPTs for TextTracks in MSE at all
… The spec has some concerning points. It doesn't have the concept of gaps. Workarounds feel like hacks
… In some text formats, e.g., WebVTT you can have overlapping cues in time

<kaz> Media Source Extensions - 5.1 Attributes

Alicia: Not sure MSE spec handles this properly
… Overlapping cues only work reliably if one overlaps with another, but if you have a second one, splicing might happen where it shouldn't
… Also the MSE spec doesn't contemplate having an MP4 or WebM file with just a single text track and no video or audio
… The SourceBuffer buffered attribute only considers audio and video buffers. It works around having a gap concept for text tracks. What if the SourceBuffer only has text?
… Could be reasonable to support, but not possible because of how the spec works
… We were talking about gaps in MSE for another purpose. I've tested Firefox and Chrome, neither supports inband text tracks
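
To make the text-only case concrete, the sketch below shows one way buffered ranges could be derived from cue timings alone, by merging overlapping cue intervals and leaving gaps where no cue is present. This is an assumption for illustration, not what the MSE spec defines; TimeRange and mergeCueRanges are hypothetical names.

```typescript
// Hypothetical time range, in seconds.
interface TimeRange {
  start: number;
  end: number;
}

// Merges overlapping or touching cue intervals into disjoint ranges.
function mergeCueRanges(cueRanges: TimeRange[]): TimeRange[] {
  const sorted = [...cueRanges].sort((a, b) => a.start - b.start);
  const merged: TimeRange[] = [];
  for (const range of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && range.start <= last.end) {
      // Overlaps or touches the previous range: extend it.
      last.end = Math.max(last.end, range.end);
    } else {
      merged.push({ start: range.start, end: range.end });
    }
  }
  return merged;
}

const textRanges = mergeCueRanges([
  { start: 0, end: 2 },
  { start: 1, end: 3 }, // overlaps the first cue
  { start: 5, end: 6 }, // leaves a gap from 3 to 5
]);
const rangeSummary = textRanges.map((r) => `${r.start}-${r.end}`).join(",");
```

Here the gap between 3 and 5 shows up as unbuffered time even though the text track simply has nothing to display there, which is the ambiguity a gap concept would need to resolve.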

Chris: Discussed at last TPAC

Nigel: There was agreement something needs working on

<alicia> original issue in WebKit to add MSE text tracks https://bugs.webkit.org/show_bug.cgi?id=125161

Chris: I could gather info on the previous discussions

<RobSmith> https://www.w3.org/2025/03/breakouts-day-2025/

Chris: Could bring to Media WG, or propose for breakouts day

Nigel: The issue about not being able to define gaps could be a VTT thing, TTML has a model where there's always an active document for a period of time

Alicia: VTT cues are ordered, and there's an MP4 "no cue" sample tag
… I think they also do something to represent overlapping cues
… A sample may have several cues, not necessarily a 1:1 mapping
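
One way to picture the mapping Alicia describes is to cut the timeline at every cue boundary, so that each resulting sample lists the cues active during it and is empty where there is no cue. The sketch below is an illustration under that assumption, not the packaging algorithm of any spec; all names in it are hypothetical.

```typescript
// Hypothetical shapes for illustration.
interface SimpleCue {
  start: number;
  end: number;
  text: string;
}
interface TextSample {
  start: number;
  end: number;
  cues: string[]; // an empty list stands for a "no cue" sample
}

// Splits the interval [0, trackEnd] into samples at every cue boundary.
function cuesToSamples(cues: SimpleCue[], trackEnd: number): TextSample[] {
  const bounds = new Set<number>([0, trackEnd]);
  for (const cue of cues) {
    bounds.add(cue.start);
    bounds.add(cue.end);
  }
  const times = Array.from(bounds)
    .filter((t) => t >= 0 && t <= trackEnd)
    .sort((a, b) => a - b);

  const samples: TextSample[] = [];
  let previous: number | undefined;
  for (const time of times) {
    if (previous !== undefined) {
      const start = previous;
      // A cue belongs to this sample if it overlaps the interval [start, time).
      const active = cues
        .filter((c) => c.start < time && c.end > start)
        .map((c) => c.text);
      samples.push({ start, end: time, cues: active });
    }
    previous = time;
  }
  return samples;
}

const samples = cuesToSamples(
  [
    { start: 0, end: 2, text: "A" },
    { start: 1, end: 3, text: "B" }, // overlaps A between 1 and 2
  ],
  4,
);
const sampleLayout = samples.map((s) => s.cues.join("+")).join("|");
```

In this example two overlapping cues produce four samples: A alone, A and B together, B alone, and a final empty sample.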

[adjourned]

– DRAFT –
04 March 2025
