Meeting: Media Timed Events / Unbounded VTT Cues
Chair: Chris_Needham
Agenda: https://www.w3.org/events/meetings/257432ab-e123-4986-bcbd-a006e9ddbf2c
Gary: At the last meeting, we concluded that the way things are now, there's no benefit to having unbounded cues in the text format
Topic: Unbounded cues in WebVTT
Gary: The reason is seeking to the middle of a stream: as far as we can tell, you'd have to copy each unbounded cue into every VTT segment. Otherwise you'd have to load all the text track data since the beginning of time, which isn't reasonable
Rob: I've read the minutes from last time, including the discussion about updating cue attributes
... There's a requirements document
https://github.com/w3c/media-and-entertainment/blob/master/media-timed-events/unbounded-cues.md
-> https://github.com/w3c/media-and-entertainment/pull/77 PR with updates to unbounded cues
Rob: From an unbounded cues point of view, requirement 1a is what unbounded cues do
... There was some discussion about changing other cue attributes, but I don't think we had any use cases for that. Has that changed?
Gary: I think the main thing with other attributes is that it's probably fine if the scope is narrow, but the worry is about how that affects extending it in the future to allow cues to be updated
... If we narrow the use case to updating unbounded cues to be bounded, so only the end time is set and nothing else changes, that could be restrictive enough to not be an issue
Rob: I'd generally agree. The scope should be limited as to what can be changed. There aren't use cases for changing other attributes
... We shouldn't rule it out
Gary: There are some for live captioning, but it's too early
Rob: That can be done with unbounded cues in a different way, where the update is sent as a new cue, linked at a higher level. How to do that is implementation-specific
... This comes back to matching by start time and content, which would allow content to be updated
... I'd argue that changing the start time or content should be a new cue, rather than changing an existing cue
... from the point of view of the VTT file format
... There isn't a mechanism to change existing cues. I don't think the syntax supports updating cues currently
Nigel: The discussion last time didn't identify a reason to do it
... We don't have a delivery mechanism in WebVTT. For video we have segments, and we can bound the VTT cue times to the segment interval
... If necessary, repeat the cues. Then you don't run into acquisition issues doing that
... With some kind of external model, if you need to update the state of a metadata entity, you can do that in segmented delivery in the same way
... Updating from chapter 1 to chapter 2 without needing to hunt back for old cues
Rob: I agree, you don't want to have to look back at previous data. I didn't understand what you meant by metadata. Do you mean a (time, value) pair?
Nigel: The entity you're modelling has a lifecycle, which is application specific
Rob: Are you treating changes as instantaneous events at a point in time, or as a value with duration?
Nigel: The information you send can be bounded to an interval, but what it's about can be changing in time
... The cue has a duration, but the entity it provides information about may not have the same duration. It's a model maintained in the client application
Rob: Consider a single segment that starts in chapter 1 and then moves to chapter 2. Is there an instantaneous event that says "chapter 2 starts now"?
Nigel: In that application, I'd probably build it by saying there's always an active cue that describes the current chapter
Rob: WebVMT supports that, values can be set in an interval and unset at a later time
... Unbounded cues allow that to be solved
Chris: Would it help to write this down?
... A worked example could be helpful
... There's an open PR to the use case document: https://github.com/w3c/media-and-entertainment/pull/77/files
Gary: This describes the sports score example, and live captioning
Chris: The requirements in the document might not be useful
Gary: You should always represent unbounded cues as multiple bounded cues; the unboundedness isn't expressed in the cues themselves
... You might overrun by a second when the cue becomes bounded
... WebVTT gets delivered a segment at a time. You can make cues last for the duration of the segment. If you know the end time by the time you're ready to deliver the segment, you can set it instead of the end-of-segment time
Rob: I agree. The issue is that the unbounded cue has an unknown end time, so you're setting a bounded cue with a known end time
... so the problem comes if you set a bounded cue across the current segment
Gary: Yes, you're not sending ahead of time
Nigel: For live, the content is segmented
Gary: The fragmented MP4 is basically the same, with small chunks
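A minimal sketch of the producer side of the model Gary describes, assuming integer millisecond times, fixed-length segments, and a hypothetical `Cue` type and `split_into_segments` helper (none of these come from the WebVTT spec). A cue whose end time is not yet known (`end=None`) is clipped to each segment boundary up to `written_until`, assumed to be the end of the last segment produced so far; once the end time is known, the last copy carries the real end time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cue:
    start: int  # milliseconds
    end: int    # milliseconds
    text: str


def vtt_timestamp(ms: int) -> str:
    """Format milliseconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def split_into_segments(start: int, end: Optional[int], text: str,
                        seg_len: int, written_until: int) -> List[Cue]:
    """Repeat one logical cue in every segment it overlaps (hypothetical helper).

    end=None models a cue whose end time is not yet known: each copy is
    bounded by its segment's end, up to written_until.
    """
    effective_end = written_until if end is None else min(end, written_until)
    if effective_end <= start:
        return []
    cues = []
    seg_start = (start // seg_len) * seg_len
    while seg_start < effective_end:
        seg_end = seg_start + seg_len
        cues.append(Cue(max(start, seg_start), min(effective_end, seg_end), text))
        seg_start = seg_end
    return cues


def segment_to_vtt(cues: List[Cue]) -> str:
    """Serialise the cues for one segment as a WebVTT document."""
    blocks = [f"{vtt_timestamp(c.start)} --> {vtt_timestamp(c.end)}\n{c.text}"
              for c in cues]
    return "WEBVTT\n\n" + "\n\n".join(blocks) + "\n"
```

For example, a chapter cue from 3 s to 14.5 s with 6 s segments becomes three copies: 3 to 6 s, 6 to 12 s and 12 to 14.5 s. Each segment is then self-contained, at the cost of the repetition discussed earlier.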
Rob: That differs from the measurement observation use case. If you take a temperature measurement now, but you don't know when the next one will be
... When the next one arrives, you can update it to the next value
... In a live case, just use the last known value. But when you re-play it, you can interpolate from the last to the next value
... Makes it simple for implementations, just take a sequence of time values
Gary: The way I'd represent a time measurement in WebVTT is with each measurement covering a preceding period of time, looking back instead of looking forward
Rob: In the case I described, there's no need to look back
Gary: Some people use cues with same start and end time to represent an event, maybe that's the answer
Rob: That's how WebVMT deals with discontinuities in the data: if there's a break in the data, where there's no value, you make an instantaneous cue to say there's no data
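A minimal sketch of the two readings Rob describes for the measurement use case, assuming the measurements are a time-ordered list of hypothetical `(time, value)` pairs: a live client can only hold the last known value, while a client replaying a recording can interpolate between the samples on either side. Linear interpolation is an illustrative choice here, not something the discussion specified.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (time in seconds, measured value), sorted by time


def last_known(samples: List[Sample], t: float) -> Optional[float]:
    """Live reading: the most recent value at or before t."""
    i = bisect_right([time for time, _ in samples], t)
    return samples[i - 1][1] if i > 0 else None


def interpolated(samples: List[Sample], t: float) -> Optional[float]:
    """Playback reading: interpolate linearly between the samples either side of t."""
    times = [time for time, _ in samples]
    i = bisect_right(times, t)
    if i == 0:
        return None                # before the first sample
    if i == len(samples):
        return samples[-1][1]      # no later sample: hold the last value
    # bisect_right guarantees t0 <= t < t1, so t1 > t0 here.
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```

With samples of 20.0 at 0 s and 22.0 at 10 s, a live client shows 20.0 at 5 s, while a replaying client shows 21.0.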
Gary: If you seek to the middle of the video, how do you know the state? Do you need to parse all the history?
Rob: Yes
Gary: That's a requirement we're trying to avoid
... With segmented WebVTT it'll parse just the current segment rather than any previous segments
Rob: You could solve that with unbounded cues: if you're assembling segments retrospectively and there's an active unbounded cue, it's reasonable to assume it's still active in the next segment
Gary: Yes, and we concluded that you'd have to copy cues from segment to segment
... In the discussion, having that extra signal of unboundedness wasn't adding much as you have to copy the cues between segments
Rob: You'll either know the cue ends within the segment, or it ends at the same time as the segment
Gary: [correction to previous speaker]
Rob: So it seems unbounded cues can be handled using bounded cues in segmented WebVTT
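A client-side sketch of why this avoids parsing history, assuming fixed-length segments and a hypothetical mapping from segment index to that segment's cues: after a seek, the client looks only at the one segment containing the seek time, because the producer has repeated every still-active cue into it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Cue:
    start: int  # milliseconds
    end: int    # milliseconds
    text: str


def active_after_seek(segments: Dict[int, List[Cue]], seg_len: int,
                      t: int) -> List[Cue]:
    """Return the cues active at time t, reading only the segment containing t.

    Relies on the producer having copied any cue still active at a segment
    boundary into the following segment.
    """
    current = segments.get(t // seg_len, [])
    return [c for c in current if c.start <= t < c.end]
```

Seeking to 13 s with 6 s segments reads only segment index 2; earlier segments are never fetched or parsed.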
Chris: Does the client coalesce the cues into a contiguous long cue?
Gary: It doesn't. From a previous FOMS discussion, we talked about writing a note to describe avoiding flicker in rendering
... That may be something we want to do as part of this work
Rob: That could be on a per-use-case basis
Gary: It's not specific to the format, it's about player implementations, so it belongs in a Note rather than in the spec
Rob: For timed metadata you wouldn't want it to repeat
Nigel: Good point. The ability to say that a cue is the same as a previous one is orthogonal but could be worth looking at
... You need a contract between producer and consumer of the files
... All of the specs at the moment only define well-formedness in a single file, not across multiple files
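A sketch of the kind of client processing a Note could describe, assuming (as a stand-in for the cross-segment identity Nigel mentions) that two cues are the same when their text matches and one ends exactly where the next begins. Merging them gives one continuous cue, which avoids re-rendering captions at segment boundaries and stops a metadata handler firing twice for one logical event. The `coalesce` function is hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Cue:
    start: int  # milliseconds
    end: int    # milliseconds
    text: str


def coalesce(cues: List[Cue]) -> List[Cue]:
    """Merge abutting cues with identical text into single longer cues."""
    merged: List[Cue] = []
    for cue in sorted(cues, key=lambda c: (c.start, c.end)):
        # Search backwards for an already-merged cue with the same text
        # that ends exactly where this one starts.
        match = next((i for i in range(len(merged) - 1, -1, -1)
                      if merged[i].text == cue.text and merged[i].end == cue.start),
                     None)
        if match is None:
            merged.append(cue)
        else:
            merged[match] = replace(merged[match], end=cue.end)
    return merged
```

Matching on text alone would wrongly merge two genuinely separate cues that happen to be identical and back to back, which is one reason an explicit way to identify cues across segment boundaries is worth looking at.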
Chris: Rob, what was your understanding of live distribution?
Rob: For WebVMT, if you have recordings from a sensor on a resource limited device, send readings as they're taken. Unbounded cues help, because you don't know when the next reading will be
... So you need to be able to supersede a value with a new value, recorded such that you can interpolate during playback
... If you record an unbounded cue at time A with a value, you can supersede it with another cue at time B, using an identity to link those two things together
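A minimal sketch of superseding by identity, assuming each reading is recorded as an open-ended cue with a hypothetical `id` shared by all readings from the same sensor: when a new reading with that id arrives at time B, the open cue from time A is closed at B. The only change to the earlier cue is setting its end time, which matches the narrow kind of update discussed at the start of the call. `CueLog` and its fields are illustrative names, not WebVMT syntax.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cue:
    id: str
    start: float
    end: Optional[float]  # None while the cue is unbounded
    value: float


class CueLog:
    """Records readings as cues; a new cue with the same id supersedes the open one."""

    def __init__(self) -> None:
        self.cues: List[Cue] = []
        self._open: Dict[str, int] = {}  # id -> index of its current unbounded cue

    def record(self, cue_id: str, time: float, value: float) -> None:
        idx = self._open.get(cue_id)
        if idx is not None:
            prev = self.cues[idx]
            if time < prev.start:
                raise ValueError("readings must arrive in time order")
            # Bound the previous cue: only its end time changes.
            self.cues[idx] = replace(prev, end=time)
        self.cues.append(Cue(cue_id, time, None, value))
        self._open[cue_id] = len(self.cues) - 1
```

Replaying `log.cues` yields the ordered (time, value) sequence needed for interpolation; the latest cue for each id stays unbounded until its next reading arrives.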
Chris: Is there an example we can look at?
Rob: It's an open item to add that. It's been discussed but not added to the document
Chris: Also, with WebVMT, do you use WebSockets for live delivery?
Rob: Yes
Chris: Let's follow up on that on another call
Gary: So we can confirm to David that no syntax changes are to be made, and for the unbounded case you copy cues between segments, and they're bounded by segments
WebVMT live interpolation examples: https://github.com/webvmt/community-group/issues/2#issuecomment-708529659
Chris: So we have:
... 1. A proposed model for delivering unbounded cues in segmented VTT
... 2. A client processing model to describe how cues are coalesced (write as a Note)
... 3. How to identify cues across segment boundaries?
... 4. (possibly) live delivery over WebSockets or other non segmented media delivery
Chris: Would MPEG also need to have a solution for identifiers across segments?
Gary: I don't think so, at this stage
Chris: What next for this group?
Gary: Consider whether to adopt some WebVMT syntax changes into WebVTT. I'm unsure, but it's an interesting topic
Rob: I'll need to look into live streams
Gary: Is the idea that you merge documents from multiple streams client-side?
Rob: Yes, there's a video stream and a VMT stream. The way it currently works, a drone embeds metadata into the MPEG file, so there's a postprocessing step to export that into WebVMT
... That's the main case I've been looking at so far. The live case would also be interesting
Gary: Longer term, useful to think about live captioning and potential for updating cues, e.g., a stenographer who wants to correct text already sent
Rob: Or voice recognition. You can mis-hear things and then go back and correct, with additional later context
Gary: 608 captions has some ability to do that
Topic: Next meeting
Chris: TPAC is coming up. Could meet on 11th?
Kaz: I can't make the 11th, but you can go ahead
Chris: I'll send an invite
[adjourned]