<scribe> scribenick: cpn
Kaz: I've been busy elsewhere, so will work on it this week
Chris: Thank you!
<tidoust> scribe: tidoust
Chris: There are changes that we
could make, but it's fine to leave it as it is. The section
about synchronized rendering could feature the
videoFrameCallback API, but given that we ran a call for
consensus, we should just publish.
... This is coming up in the bullet chat topic, so we can
consider that separately from this particular IG Note
document.
... In this group, we can focus much more on the DataCue API
itself, and leave synchronized rendering aspects to the wider
IG to follow up on.
Chris: The proposal here is to
add some wording in the Time marches on algorithm to say that
there is an expectation that cues will be triggered ideally
within 20ms of their position on the media timeline.
... We discussed a couple of months ago in the Media WG
whether this should go in MDN or the spec itself. I believe it
would be useful to capture it in the HTML spec itself, since
that's the mother of all docs.
... I have an open action to turn the wording in this issue
into a pull request.
... My understanding of implementations is that it would in
effect reflect the current state of implementations, given the
work in Chromium to improve accuracy of cue events.
... It might be worthwhile reviewing the issue that's open and
the proposed wording that we put in.
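To make the proposed accuracy expectation concrete, here is a minimal sketch, in TypeScript, of the check it implies: a cue event should fire within about 20 ms of the cue's position on the media timeline. The function name and parameters are illustrative, not wording from the issue or the HTML spec.

```typescript
// Hypothetical sketch of the proposed tolerance: a cue event should
// fire within ~20 ms of the cue's position on the media timeline.
// Media times are in seconds; the tolerance is in milliseconds.
function firedWithinTolerance(
  cueTime: number,      // cue start (or end) time on the timeline, seconds
  observedTime: number, // media time observed when the handler ran, seconds
  toleranceMs = 20      // proposed accuracy target
): boolean {
  return Math.abs(observedTime - cueTime) * 1000 <= toleranceMs;
}
```

For example, an event observed at 10.015 s for a cue at 10.0 s would be within tolerance, while one observed at 10.05 s would not.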
Eric: That sounds fine to me, and I don't think there was any disagreement within the Media WG, so I think you can just move forward and finish it up.
Chris: I have been discussing
offline with John Simmons and members of the DASH-IF group.
We're hoping that, by TPAC time this year, we'll have enough
implementation support in place that we can actually develop
the API specification.
... We really need to engage much more with implementers to
make that happen. The plan that we discussed is, first of all,
to make sure that what we're describing aligns with what the
DASH-IF is doing.
... I believe that is the case, but we need to run a review to
ensure that it is true.
... Once we have that, we need to engage with media companies
to make sure that we have captured all requirements.
... We really need to be showing that media companies want the
API, as it may not be a priority for some implementers.
... So: finish the explainer, reach out to media companies, and
in parallel invite people from Apple, Google, Microsoft,
Mozilla... to make the case about the API.
... I think that's what's preventing me from turning the
explainer into a spec directly; I'd like to make sure that
everyone is on board first.
... There are no firm dates set for meetings with DASH-IF, but
we'll make sure to advertise them so that you can join if
interested.
Eric: That sounds fine. I
agree that it may be a challenge to drum up interest from other
browser vendors, but that is what it is.
... Obviously, you need to be prepared for possible
disagreement for particulars of the API, but that's just always
true...
<cpn> WICG Issue 21
Chris: This is feedback from someone from Microsoft
... This is one of the areas of disagreement that we keep coming back to since we started this work. Discussed at TPAC last year.
... Eric, I believe you argued to expose parsed data to the
developer. I agree with that.
... On the other hand, with emsg box, applications may want to
add additional parsers to parse binary messages in the media,
and the question became: as new message types are invented and
introduced, how do we know that an implementation supports
parsing and presenting a particular message type in parsed
form?
... Some implementations may expose the message in its raw form
while others may expose it as a parsed message.
... We had some back and forth with the contributor from
Microsoft, and ended up with a proposal that allows both
options, and the user agent can choose which to use: either
expose the parsed data (linked to some spec that describes the
structure per cue type), or if it doesn't support the parsing
of a particular cue type, then it could still expose the
message as an ArrayBuffer field.
... One of the implications is that, I think, web applications
would always have to ship a parsing library to handle the
second case.
... Unless we can get to a situation where there is a core set
of cue types that are supported across all implementations, I
don't know how we can avoid that scenario.
... I'm trying to get to the heart of one of these potential
points of disagreement.
... I've modified the interface definition of DataCue slightly
to make the data and value properties nullable.
... This would provide a migration path for HbbTV from unparsed
data to parsed data.
Eric: Given that the proposed interface already allows value to be an array buffer, why would we want to have an extra field that is an array buffer?
Chris: I was thinking of an "either...or", not both.
Eric: In that case, why
not use only "value"?
... That is exactly what I do in WebKit now.
... I don't know how to parse emsg, so I don't; I just put
it as an ArrayBuffer in the "value" field.
Chris: I think it makes sense to not have redundant fields. From an application perspective, you need a way to detect in which case you are.
Eric: which you'll have
to do in any case, since type is "any".
... The comment that "value" is always null in WebKit is true.
I didn't remove it because I didn't know whether there was any
existing content that would assume that the property would be
there.
... I think, moving forward, that we should remove it.
Chris: I think the only hiccup is
that HbbTV uses this field.
... But with the next issue, we're already introducing breaking
changes anyway for HbbTV.
Eric: That is also easy to polyfill.
Chris: True.
... I'm happy to update the explainer and propose removal.
Eric: That makes sense to me.
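The single-field model discussed above could look like the following sketch from the application's side; the `DataCueValue` alias and helper name are illustrative, not part of any spec.

```typescript
// Illustrative only: with a single "value" field, an application
// checks whether the user agent parsed the message (a structured
// object) or passed it through raw (an ArrayBuffer), in which case
// the app's own parsing library (which it must ship) takes over.
type DataCueValue = ArrayBuffer | Record<string, unknown>;

function describeCueValue(value: DataCueValue): "raw" | "parsed" {
  if (value instanceof ArrayBuffer) {
    // The UA did not recognise this cue type; raw bytes are exposed.
    return "raw";
  }
  // The UA exposed structured data per the cue type's spec.
  return "parsed";
}
```

This is the type check an application would have to do in any case, since the exposed value is typed as `any` in the proposed IDL.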
<cpn> https://github.com/WICG/datacue/issues/20
Chris: Single model for cue type.
HbbTV uses the event metadata dispatch type field to identify
what kind of event each track is carrying.
... I think it's inconvenient from the application developer
perspective. With multiple tracks, the application has to
create multiple TextTracks.
... What we're proposing is to consolidate all messages onto a
single metadata TextTrack, and then the type information is
carried in each individual cue.
... My understanding is that this matches the WebKit model.
Eric: Yes.
Chris: There is a requirement
that we captured from the DASH-IF where they wanted to make
receipt of particular cue types an opt-in from the application
point of view.
... In DASH, the manifest describes which types of events the
player should expect to see, and the application subscribes to
specific types (ID3 messages, manifest updates, etc.)
... With the model that we're proposing as it stands, the user
agent would expose all of the events that it supports to the
application and the application would be responsible for
filtering events it is interested in.
... I need to get feedback from DASH-IF on whether this is a
critical requirement, or whether they're happy to have it at
the application level.
Eric: In an HLS stream
which can contain any number of types of metadata, how would an
application know which types are in the stream so that it can
subscribe to those it is interested in following?
... If you don't have a manifest that describes what is in the
stream, how do you handle the situation?
Chris: That's right and in the general case, we don't have a manifest.
Eric: That's right. I would argue that it is not difficult to set up an event listener and filter on the cue type. That does not create a lot of overhead. Think about mouse events, for instance, which fire far more frequently than the events envisioned here.
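Application-side filtering on a single metadata track could be sketched as follows; the cue shape and `type` field follow the single-track model being proposed, and all names here are assumptions.

```typescript
// Sketch: with all messages consolidated onto one metadata track,
// each cue carries its own type string, and the application keeps
// only the types it has subscribed to. The cue shape is hypothetical.
interface MetadataCueLike {
  type: string;       // e.g. "org.id3" or an emsg scheme URI
  startTime: number;  // seconds on the media timeline
}

function filterByType<T extends MetadataCueLike>(
  cues: readonly T[],
  subscribed: ReadonlySet<string>
): T[] {
  return cues.filter((cue) => subscribed.has(cue.type));
}
```

In a `cuechange` handler this is a single predicate per cue, which supports the point that the filtering overhead is small.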
Chris: I tend to agree, but I'm
not an implementor of this on TV devices, I do not have the
context, I'll take this back to DASH-IF.
... If the actual cue type is always carried in the cue itself,
does the TextTrack type make any sense?
Eric: I agree that it is
not at all useful (and I disagreed with its inclusion in the
first place, but lost that fight).
... I think it makes sense to propose to remove it.
Chris: OK, I'll add an issue to track that.
<cpn> WHATWG Issue 5297
Chris: To recap, this is the idea
that cues can be unbounded. They have a known start time, but
we don't necessarily know when they are scheduled to end, and
it may be that it is when they end that we can tell.
... At the moment, we don't have a direct way to express this
situation. The proposal is to add an unbounded value,
Infinity. This also aligns with media.duration, which can be
infinite too.
Eric: In the live HLS
case for instance, duration is used this way.
... I don't think there is any issue here. Silvia was
disagreeing, but she recognized this alignment, and her last
comment suggests she's fine with the update.
Chris: How to make progress?
Eric: I'd have somebody write a pull request
Chris: And get implementer's feedback?
Eric: I think it is going to be easier for people to share an opinion when there is a concrete proposal at hand.
Chris: OK, maybe Rob or I can
draft something then.
... My recollection of the feedback from the Media WG was that
there was some concern of allowing cue times beyond the
duration of a stream.
Eric: This already happens in WebKit. If it's a cue of a live stream, then its end time is set to the infinite duration of the stream. I doubt Jer would object to this.
Nigel: Allowing it to be infinity makes a lot of sense. Do we need an algorithm for when time changes from infinity to a finite number?
Rob: I think this is a separate issue and that we should address it separately.
Eric: I agree.
Chris: Having reviewed time marches on, I believe this is actually covered.
Nigel: The meaning of infinity with respect to some media is ambiguous. It may mean "never fired" or "fired when the media ends".
Rob: To be consistent with the current definition, infinity should mean the end of media.
Nigel: I think it's important that we specify this, for interop reasons.
Eric: Another angle that may have already been specified: are you proposing that a cue may have an infinite endTime in a finite file as well?
Rob: Yes.
... WebVMT provides a single example of this.
... Time A, you are at location A, that won't change in the
future.
... If you imagine a capturing scenario, you create cues with
unbounded end times.
... But when you stop capturing, you now have a bounded
stream; requiring a finite end time means that all your cues
are invalid.
Eric: I don't quite follow. If you're recording and you open that file, the media stream has a finite duration.
Rob: Time A, cue runs to infinity. That's valid during capturing. But then when you stop, the infinity value is no longer valid.
Eric: My issue is that it
is logically a problem.
... For example, it is perfectly valid to have a file with
audio and video tracks that have different amount of media in
them.
... Different durations for the tracks for instance.
... But the duration of the file is defined with respect to
those tracks. You have to pick a duration.
... I'll have to think about it.
Rob: I don't follow you. What
would be the problem with having infinite cue endTime?
... Hmm, I see, the end of the media for the video and audio
would be different.
Eric: Right.
... To be clear, if WebKit gets a cue in a file with an
infinite duration, it sets endTime to Infinity. If the
duration is finite, it sets endTime to the duration of the
stream.
Nigel: Perhaps a test is that behavior should be the same regardless of whether media is infinite or finite: when the current media time reaches the cue's end time, the exit event gets fired.
Eric: The duration of the
media stream is defined by the duration of the longest track.
The file duration is 5 min if the audio is 1 min and the video
is 5 min.
... If you have a cue whose duration is infinite, that would
imply that the duration of that text track is infinite, and
potentially then the duration of the media stream is infinite,
which is clearly not what you want in a file with finite media
tracks.
Rob: Cue end time = Infinity => at end of media, which is consistent with the 5-min video, 1-min audio definition
Eric: The duration of a
media file is not defined in HTML. It is defined in individual
specs for different media types.
... Fine to leave that as open issue.
<Zakim> gkatsev, you wanted to mention MSE based players
Gary: MSE players will often
have to update their duration based on extra information.
... Ex: 10s segments but actual duration is 9.7s.
... Being able to say: "I want this cue to trigger whenever
playback finishes" is useful because it's possible that
duration may end up being less than initially planned.
... and therefore we would never trigger that cue.
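Gary's scenario could be sketched as follows; the durations are the example figures from the discussion, and the function name is illustrative.

```typescript
// Sketch of the MSE case Gary describes: a player plans a duration
// from segment metadata (e.g. 10 s segments) but the real duration
// comes up short (e.g. 9.7 s). A cue scheduled at or after the actual
// end never fires, whereas an end time of Infinity always resolves to
// "when playback finishes".
function cueWillFire(cueStartTime: number, actualDuration: number): boolean {
  return cueStartTime <= actualDuration;
}
```

A cue placed at the planned 10 s mark would never fire against a 9.7 s duration, which is the case the unbounded value is meant to cover.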
Chris: Interesting. We need to
follow up on definitions of duration.
... OK, let's capture this in issues.
Chris: 20th of July would be our next scheduled call, same time.
<kaz> [adjourned]