IRC log of me on 2021-09-27

Timestamps are in UTC.

14:55:01 [RRSAgent]
RRSAgent has joined #me
14:55:01 [RRSAgent]
logging to https://www.w3.org/2021/09/27-me-irc
14:55:06 [Zakim]
Zakim has joined #me
15:01:00 [ChrisLorenzo]
ChrisLorenzo has joined #me
15:01:25 [kaz]
present+ Kaz_Ashimura, Chris_Lorenzo, Chris_Needham, Rob_Smith
15:02:13 [cpn]
Meeting: Media Timed Events / Unbounded VTT Cues
15:02:21 [cpn]
Chair: Chris_Needham
15:02:24 [cpn]
scribenick: cpn
15:03:02 [nigel]
nigel has joined #me
15:03:06 [cpn]
present+ Gary_Katsevman
15:03:58 [RobSmith]
RobSmith has joined #me
15:04:07 [cpn]
Agenda: https://www.w3.org/events/meetings/257432ab-e123-4986-bcbd-a006e9ddbf2c
15:05:19 [calvaris]
calvaris has joined #me
15:05:29 [cpn]
present: Xabier_Rodriguez_Calvar
15:07:19 [kaz]
rrsagent, make log public
15:07:25 [kaz]
rrsagent, draft minutes
15:07:25 [RRSAgent]
I have made the request to generate https://www.w3.org/2021/09/27-me-minutes.html kaz
15:07:55 [cpn]
Gary: At the last meeting, we concluded that, the way things are now, there's no benefit to having unbounded cues in the text format
15:08:00 [kaz]
topic: Unbounded cues in WebVTT
15:08:57 [nigel]
Present+ Nigel_Megitt
15:09:12 [kaz]
s/topic: Unbounded cues in WebVTT//
15:09:12 [cpn]
Gary: The reason why is that, to support seeking to the middle of a stream, as far as we can tell you'd have to copy each unbounded cue into every VTT segment. Otherwise you'd have to load all the text tracks since the beginning of time, which isn't reasonable
15:09:23 [kaz]
i/At the/topic: Unbounded cues in WebVTT/
15:09:29 [kaz]
rrsagent, draft minutes
15:09:29 [RRSAgent]
I have made the request to generate https://www.w3.org/2021/09/27-me-minutes.html kaz
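To make the copying approach concrete, a minimal sketch of two consecutive hypothetical WebVTT segments (all timings, identifiers and payloads here are illustrative, not from the meeting):

    Segment N:
      WEBVTT

      score
      00:00:42.000 --> 00:01:00.000
      Score: Home 1 - Away 0

    Segment N+1:
      WEBVTT

      score
      00:01:00.000 --> 00:01:30.000
      Score: Home 1 - Away 0

The still-open cue is copied into each segment, clipped to the segment interval, so a client seeking into segment N+1 recovers the state without fetching earlier segments.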
15:10:22 [cpn]
Rob: I've read the minutes from last time, and the discussion about updating cue attributes
15:10:35 [cpn]
... There's a requirements document
15:10:54 [cpn]
https://github.com/w3c/media-and-entertainment/blob/master/media-timed-events/unbounded-cues.md
15:11:30 [gkatsev]
-> https://github.com/w3c/media-and-entertainment/pull/77 PR with updates to unbounded cues
15:11:41 [cpn]
Rob: From an unbounded cues point of view, requirement 1a is what unbounded cues do
15:12:01 [cpn]
... There was some discussion about changing other cue attributes, but I don't think we had any use cases for that. Has that changed?
15:13:08 [cpn]
Gary: I think the main thing with other attributes is that it's probably fine if narrow, but the worry is about preventing extension in the future to allow cues to be updated
15:13:43 [cpn]
... If we narrow the use case to updating unbounded cues to be bounded, so only the end time is set and nothing else changes, that could be restrictive enough to not be an issue
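If this were expressed through the browser's TextTrack API, bounding would amount to setting only endTime. A hedged TypeScript sketch (the far-future sentinel end time and variable names are assumptions, not part of any proposal):

    const video = document.querySelector('video') as HTMLVideoElement;
    const track = video.addTextTrack('metadata');

    // Represent a cue whose real end is not yet known, using a finite,
    // far-future sentinel end time.
    const liveCue = new VTTCue(42.0, Number.MAX_VALUE, 'Score: Home 1 - Away 0');
    track.addCue(liveCue);

    // Later, when the real end time becomes known, only endTime changes;
    // the start time, payload and all other attributes stay untouched.
    liveCue.endTime = 95.5;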
15:14:14 [cpn]
Rob: I'd generally agree. The scope should be limited as to what can be changed. There aren't use cases for changing other attributes
15:14:29 [cpn]
... We shouldn't rule it out
15:14:41 [cpn]
Gary: There are some for live captioning, but it's too early
15:15:11 [cpn]
Rob: It can be done with unbounded cues, in a different way, where the update is done as a new cue, linked at a higher level. It's implementation-specific how to do that
15:15:32 [cpn]
... This comes back to matching by start time and content, which would allow content to be updated
15:15:48 [cpn]
... I'd argue that changing the start time or content should be a new cue, rather than changing an existing cue
15:15:58 [cpn]
... from the point of view of the VTT file format
15:16:19 [cpn]
... There isn't a mechanism to change existing cues. I don't think the syntax supports updating cues currently
15:16:32 [cpn]
Nigel: The discussion last time didn't identify a reason to do it
15:16:58 [cpn]
... We don't have a delivery mechanism in WebVTT. For video we have segments, and we can bound the VTT cue time to the segment interval
15:17:13 [cpn]
... If necessary, repeat the cues; then you don't run into acquisition issues
15:17:39 [cpn]
... Having some kind of external model, if you need to update the state of a metadata entity, you can do that in segmented delivery in the same way
15:18:09 [cpn]
... Updating from chapter 1 to chapter 2 without needing to hunt back for old cues
15:18:52 [cpn]
Rob: I agree we don't want to have to look back. I didn't understand what you meant by metadata; do you mean a (time, value) pair?
15:19:09 [cpn]
Nigel: The entity you're modelling has a lifecycle, which is application-specific
15:19:31 [cpn]
Rob: Are you treating changes as instantaneous events at a point in time, or as a value with duration?
15:20:08 [cpn]
Nigel: The information you send can be bounded to an interval, but what it's about can change over time
15:20:50 [cpn]
... The cue has a duration, but the entity it describes may not have the same duration. It's a model maintained in the client application
15:21:28 [cpn]
Rob: Take a single segment that starts in chapter 1 and then moves to chapter 2. Is there an instantaneous event that says "chapter 2 starts now"?
15:22:00 [cpn]
Nigel: In that application, I'd probably build it by saying there's always an active cue that describes the current chapter
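A sketch of that model (chapter payloads and timings are hypothetical): every segment carries one metadata cue, bounded to the segment, naming the current chapter, so seeking into any segment yields the chapter state without consulting older segments:

    WEBVTT

    00:10:00.000 --> 00:10:06.000
    {"chapter": 2, "title": "Second movement"}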
15:22:39 [cpn]
Rob: WebVMT supports that; values can be set in an interval and unset at a later time
15:22:52 [cpn]
... Unbounded cues allow that to be solved
15:24:39 [cpn]
Chris: Would it help to write this down?
15:25:38 [cpn]
... A worked example could be helpful
15:26:44 [cpn]
... There's an open PR to the use case document: https://github.com/w3c/media-and-entertainment/pull/77/files
15:26:58 [cpn]
Gary: This describes the sports score example, and live captioning
15:30:25 [cpn]
Chris: The requirements in the document might not be useful
15:31:24 [cpn]
Gary: You should always represent unbounded cues as multiple bounded cues, and the unbounded-ness isn't in the cues themselves
15:32:08 [cpn]
... You might overrun by a second when the cue becomes bounded
15:32:49 [cpn]
... WebVTT gets delivered a segment at a time. You can make cues be the duration of the segment. By the time you're ready to deliver the segment, you can set the real end time rather than the end-of-segment time
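Continuing the earlier hypothetical sketch, the segment in which the cue actually ends carries the real end time instead of the end-of-segment time:

    Segment N+2:
      WEBVTT

      score
      00:01:30.000 --> 00:01:35.500
      Score: Home 1 - Away 0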
15:33:49 [cpn]
Rob: I agree. The issue is that the unbounded cue has an unknown end time, so you're setting a bounded cue with a known end time
15:34:02 [cpn]
... so the problem comes if you set a bounded cue across the current segment
15:34:12 [cpn]
Gary: Yes, you're not sending cues ahead of time
15:34:43 [cpn]
Nigel: For live, the content is segmented
15:35:04 [cpn]
Gary: The fragmented MP4 case is basically the same, with small chunks
15:35:38 [cpn]
Rob: That differs from the measurement observation use case, where you take a temperature measurement now but don't know when the next one will be
15:35:56 [cpn]
... When the next one arrives, you can update it to the next value
15:36:27 [cpn]
... In a live case, just use the last known value. But when you replay it, you can interpolate from the last value to the next
15:36:59 [cpn]
... It makes it simple for implementations: just take a sequence of timed values
15:37:38 [cpn]
Gary: The way I'd represent a time measurement in WebVTT is to have each measurement cover the preceding period of time, looking back instead of looking forward
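A sketch of that backward-looking layout (readings and timings are hypothetical): each cue spans from the previous reading up to the moment the new reading is taken, so it can be written out as soon as its end time is known:

    WEBVTT

    00:00:00.000 --> 00:00:47.000
    {"temperature": 18.2}

    00:00:47.000 --> 00:02:05.000
    {"temperature": 18.6}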
15:37:57 [cpn]
Rob: In the case I described, there's no need to look back
15:38:21 [cpn]
Gary: Some people use cues with the same start and end time to represent an event; maybe that's the answer
15:39:08 [cpn]
Rob: That's the way WebVMT deals with discontinuities in the data: if there's a break in the data where there's no value, make an instantaneous cue to say there's no data
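A sketch of that convention (the payload is hypothetical): a zero-duration cue marks the instantaneous event:

    WEBVTT

    00:03:00.000 --> 00:03:00.000
    {"event": "no-data"}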
15:39:40 [cpn]
Gary: If you seek to the middle of the video, how do you know the state? Do you need to parse all the history?
15:39:43 [cpn]
Rob: Yes
15:39:51 [cpn]
Gary: That's a requirement we're trying to avoid
15:40:28 [cpn]
... With segmented WebVTT it'll parse just the current segment rather than any previous segments
15:41:10 [cpn]
Rob: You could solve that with unbounded cues: if you're assembling segments retrospectively and there's an active unbounded cue, it's reasonable to assume it's still active in the next segment
15:41:30 [cpn]
Gary: Yes, and we concluded that you'd have to copy cues from segment to segment
15:41:58 [cpn]
... In the discussion, having that extra signal of unboundedness wasn't adding much, as you have to copy the cues between segments anyway
15:43:09 [cpn]
Rob: You'll either know the cue ends within the segment, or it ends at the same time as the segment
15:43:21 [cpn]
s/Rob/Gary/
15:43:45 [cpn]
Rob: So it seems unbounded cues can be handled using bounded cues in segmented WebVTT
15:44:42 [cpn]
Chris: Does the client coalesce the cues into one long contiguous cue?
15:45:11 [cpn]
Gary: It doesn't. In a previous FOMS discussion, we talked about writing a note to describe avoiding flicker in rendering
15:45:19 [cpn]
... That may be something we want to do as part of this work
15:45:33 [cpn]
Rob: That could be on a per-use case basis
15:46:00 [cpn]
Gary: It's not specific to the format, it's about player implementations, so for a Note instead of in the spec
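A hedged TypeScript sketch of what such player-side coalescing might look like (the function and its matching heuristic are assumptions, not from any spec or from the meeting):

    interface CueLike { id: string; startTime: number; endTime: number; text: string; }

    // Merge abutting or overlapping cues that carry the same id and payload,
    // so a cue copied across segment boundaries renders as one continuous cue.
    function coalesce(cues: CueLike[]): CueLike[] {
      const out: CueLike[] = [];
      for (const cue of [...cues].sort((a, b) => a.startTime - b.startTime)) {
        const prev = out[out.length - 1];
        if (prev && prev.id === cue.id && prev.text === cue.text &&
            cue.startTime <= prev.endTime) {
          prev.endTime = Math.max(prev.endTime, cue.endTime); // extend, no re-render
        } else {
          out.push({ ...cue });
        }
      }
      return out;
    }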
15:46:31 [cpn]
Rob: For timed metadata you wouldn't want it to repeat
15:47:00 [cpn]
Nigel: Good point. The ability to say that a cue is the same as a previous one is orthogonal but could be worth looking at
15:47:28 [cpn]
... You need a contract between producer and consumer of the files
15:47:54 [cpn]
... All of the specs at the moment only define well-formedness in a single file, not across multiple files
15:49:14 [cpn]
Chris: Rob, what was your understanding of live distribution?
15:49:54 [cpn]
Rob: For WebVMT, if you have recordings from a sensor on a resource-limited device, send readings as they're taken. Unbounded cues help, because you don't know when the next reading will be
15:50:25 [cpn]
... So being able to supersede a value with a new value, recorded such that you can interpolate during playback
15:53:19 [cpn]
... If you record an unbounded cue at time A with a value, you can supersede it with another cue at time B, using an identity to link those two things together
15:53:57 [cpn]
Chris: Is there an example we can look at?
15:54:12 [cpn]
Rob: It's an open item to add that. It's been discussed but not added to the document
15:54:54 [gkatsev]
q+ to ask about response to David and webvmt/webvtt alignment?
15:55:17 [cpn]
Chris: Also, with WebVMT, would you use WebSockets for live delivery?
15:55:33 [cpn]
Rob: Yes
15:57:56 [cpn]
Chris: Let's follow up on that on another call
15:58:31 [cpn]
Gary: So we can confirm to David that no syntax changes are to be made, and for the unbounded case you copy cues between segments, and they're bounded by segments
15:59:01 [RobSmith]
WebVMT live interpolation examples: https://github.com/webvmt/community-group/issues/2#issuecomment-708529659
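A generic sketch of the supersede-and-interpolate idea behind those examples, expressed as hypothetical WebVTT metadata cues rather than actual WebVMT syntax (identifiers, payloads and timings are illustrative):

    WEBVTT

    sensor1
    00:00:47.000 --> 00:02:05.000
    {"temperature": 18.2}

    sensor1
    00:02:05.000 --> 00:03:30.000
    {"temperature": 18.6}

A client that links the two cues by their shared identity could, on replay, interpolate from 18.2 to 18.6 across the first interval instead of stepping at the boundary.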
15:59:54 [cpn]
Chris: So we have:
16:00:14 [cpn]
... 1. A proposed model for delivering unbounded cues in segmented VTT
16:00:38 [gkatsev]
ack gkatsev
16:00:38 [Zakim]
gkatsev, you wanted to ask about response to David and webvmt/webvtt alignment?
16:00:47 [cpn]
... 2. A client processing model to describe how cues are coalesced (write as a Note)
16:01:07 [cpn]
... 3. How to identify cues across segment boundaries?
16:01:23 [cpn]
... 4. (possibly) live delivery over WebSockets or other non-segmented media delivery
16:02:35 [cpn]
Chris: Would MPEG also need to have a solution for identifiers across segments?
16:04:42 [cpn]
Gary: I don't think so, at this stage
16:04:50 [cpn]
Chris: What next for this group?
16:05:15 [cpn]
Gary: Consider whether to adopt some WebVMT syntax changes into WebVTT. I'm unsure, but it's an interesting topic
16:06:19 [cpn]
Rob: I'll need to look into live streams
16:06:32 [cpn]
Gary: Is the idea that you merge documents client-side from multiple streams?
16:07:11 [cpn]
Rob: Yes, there's a video stream and a VMT stream. The way it currently works, a drone embeds metadata into the MPEG file, so there's a postprocessing step to export that into WebVMT
16:07:29 [cpn]
... That's the main case I've been looking at so far. The live case would also be interesting
16:08:04 [cpn]
Gary: Longer term, it's useful to think about live captioning and the potential for updating cues, e.g., a stenographer who wants to correct text already sent
16:08:46 [cpn]
Rob: Or voice recognition. You can mishear things and then go back and correct them, with additional later context
16:09:08 [cpn]
Gary: 608 captions have some ability to do that
16:09:33 [cpn]
Topic: Next meeting
16:10:46 [cpn]
Chris: TPAC is coming up. Could we meet on the 11th?
16:11:11 [cpn]
Kaz: I can't make the 11th, but you can go ahead
16:13:11 [cpn]
Chris: I'll send an invite
16:13:11 [cpn]
[adjourned]
16:13:23 [cpn]
rrsagent, draft minutes
16:13:23 [RRSAgent]
I have made the request to generate https://www.w3.org/2021/09/27-me-minutes.html cpn
16:13:28 [cpn]
rrsagent, make log public
16:14:21 [cpn]
present+ Gary_Katsevman, Chris_Needham, Chris_Lorenzo
16:14:25 [cpn]
rrsagent, draft minutes
16:14:25 [RRSAgent]
I have made the request to generate https://www.w3.org/2021/09/27-me-minutes.html cpn
16:14:50 [cpn]
present+ Rob_Smith
16:14:52 [cpn]
rrsagent, draft minutes
16:14:52 [RRSAgent]
I have made the request to generate https://www.w3.org/2021/09/27-me-minutes.html cpn
16:17:50 [kaz]
rrsagent, bye
16:17:50 [RRSAgent]
I see no action items