Next Generation TextTrackCue -- 18 Sep 2019

<scribe> scribe: tidoust

Tess: [introduces the problem space]
... If you use the browser's built-in captioning system, you get things such as user stylesheets for free, but if you don't, you're on your own.
... Also you don't get to participate in platform integration such as PiP.
... Why? We have a large corpus of existing captions. Also, WebVTT does not handle all cases.
... All this suggests that a smart thing to do would be to decouple delivery from display.
... So that the user agent can at least participate in the rendering of captions.
... Essentially, goal is to insert a cue model before the cue gets displayed so that, in the future, you can add support for a variety of formats.
... In-band captions come in a variety of formats; if the browser supports a format, this proposal would allow it to be handled.
... What requirements for the data model?
... It has to be reasonably expressive, and it should be really easy to use for the common caption formats out there, starting with WebVTT.
... Also the data model itself should be easy to manipulate in JS.
... [showing an example in WebVTT and IMSC1, and how a data model could represent that]
... Web app should be able to create these things by hand
... Basic proposal is to extend the basic TextTrackCue interface and to restructure parts of WebVTT and TTML.

ericc: [showing a demo of a modified version of WebKit]
... TextTrackCue is an abstract interface in HTML. No constructor. I have modified that so that it has a constructor that takes a start/end time and an object that follows the data model that describes the cue
... [Big Buck Bunny demo]

[Polyfill demo at https://sandflow.com/ttapi-demo/big-buck-bunny.html]

ericc: The data model is verbose when there are styles. But caption can be text only, with default styles.
... [showing demo with more styles and regions]
... It's obviously possible to apply very complex styles.
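
[Illustration, not shown in the meeting: a minimal sketch of what a constructible cue with a structured content object might look like. All names here (SketchCue, SketchCueContent, SketchSpan, the region fields) are hypothetical and are not taken from the proposal; a real implementation would extend TextTrackCue in the browser.]

```typescript
// Hypothetical data-model shapes, for illustration only.
type SketchStyles = Record<string, string>; // e.g. { color: "yellow" }

interface SketchSpan {
  text: string;
  styles?: SketchStyles; // omitted means "use the default styles"
}

interface SketchCueContent {
  // Assumed to be percentages of the video area.
  region?: { x: number; y: number; width: number; height: number };
  spans: SketchSpan[];
}

// Stand-in for a cue with a constructor taking start/end times and a content object.
class SketchCue {
  readonly startTime: number;
  readonly endTime: number;
  readonly content: SketchCueContent;

  constructor(startTime: number, endTime: number, content: SketchCueContent) {
    if (endTime < startTime) {
      throw new RangeError("endTime must not precede startTime");
    }
    this.startTime = startTime;
    this.endTime = endTime;
    this.content = content;
  }
}

// A text-only cue that relies on default styles stays short.
const plainCue = new SketchCue(1.0, 3.5, { spans: [{ text: "Hello, Bunny!" }] });

// A cue with a region and per-span styles is more verbose.
const styledCue = new SketchCue(4.0, 6.0, {
  region: { x: 10, y: 80, width: 80, height: 15 },
  spans: [
    { text: "[Narrator] ", styles: { fontStyle: "italic" } },
    { text: "Once upon a time", styles: { color: "yellow" } },
  ],
});
```

[The two cues above contrast the text-only case with the styled case mentioned by ericc.]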

pal: The basic idea is to use the TTML model constrained by IMSC, because that seems to be a good place to start.
... If gaps are found, we can backport them in TTML and IMSC.
... It's exactly the same model. Both the content model and the style properties.
... Most of the styles are direct mappings to their CSS counterparts.
... Some people may ask how to define the HTML rendering. There's some open source code that can show how to do it.

ericc: What I've done in this version of WebKit is to take a JS library that understands this data model and outputs a document fragment.
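
[Illustration, not shown in the meeting: a rough sketch of the kind of translation step described here. It is not the library used in the demo. To stay runnable outside a browser it emits an HTML markup string rather than a real DocumentFragment, and the names MarkupSpan, escapeHtml, toCssName and renderSpans are hypothetical.]

```typescript
// Hypothetical span shape: text plus optional camelCase style properties.
interface MarkupSpan {
  text: string;
  styles?: Record<string, string>;
}

// Escape characters that are significant in HTML text and attribute values.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Convert a camelCase property name (fontStyle) to its CSS form (font-style).
function toCssName(prop: string): string {
  return prop.replace(/[A-Z]/g, (c) => "-" + c.toLowerCase());
}

// Render each span as a <span> element with an inline style attribute.
function renderSpans(spans: MarkupSpan[]): string {
  return spans
    .map((span) => {
      const css = Object.entries(span.styles ?? {})
        .map(([name, value]) => `${toCssName(name)}: ${value}`)
        .join("; ");
      const attr = css ? ` style="${escapeHtml(css)}"` : "";
      return `<span${attr}>${escapeHtml(span.text)}</span>`;
    })
    .join("");
}
```

[Because the cue text is escaped rather than parsed, markup inside a caption string cannot inject elements or scripts in this sketch.]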

glenn: Could you represent the data model as a document fragment directly?
... Instead of going through the process of translating from this data model to document fragment, could you use a serialized version of a document fragment?

ericc: Allowing that as an input?
... Allowing to take that as input might make some people nervous.

glenn: Yes, with scripts disabled.

nigel: We discussed this last year and the years before that. These concerns were raised.

glenn: The reason I ask is that there are other CSS properties than those defined in this list and if I want to construct a TextTrackCue object that uses them, it would be good to have a mechanism that makes it possible.
... E.g. TTML2.

pal: There are pseudo-classes in the proposal that follow WebVTT and address a lot of use cases.
... The model is easily extensible.

glenn: Adding means writing more code and landing the changes.

ericc: If CSS can be used, I don't see why you wouldn't be able to use your own stylesheet directly.

gary: WebVTT only allows certain CSS. Do we want to open it up?

ericc: No, I don't think so.

glenn: IMSC is evolving. We're adding new properties to it. I just don't want to be restricted. It would be good to have a built-in extensibility mechanism.

pal: In my mind, this evolves as well.

glenn: Yes, but I'm trying to avoid changing the code in the browsers. The way you suggest it is to have browser vendors update their code. I'd like to avoid that.

ericc: We should talk to find a safe way to do it.

nigel: What about the metadata?

pal: No metadata for now.

nigel: Would it be useful to have an API that allows the author to access metadata?

ericc: Since this is a JS API, the cues need to be created from script. You can add any attribute you want to; when the event fires, you'll get it back.
... There is a "content" attribute, I think.
... If you want to add something else to the cue, you can certainly do that.

mounir: We talked about that at FOMS. I don't have a strong opinion on the API. Feedback is the same as for WebVTT: people got fed up because of slightly different implementations. I think the same would happen here.

tess: How do you do captions in PiP?

mounir: We don't support captions in PiP.

nigel: Question about positioning.
... Page gets some video in there and cues. How do you relate the pixels of what you draw to the video?

pal: Each cue renders a rectangular region which typically overlaps the video.
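
[Illustration, not shown in the meeting: one way to relate a cue's rectangular region to the pixels of the video, assuming the region is expressed as percentages of the video area. That assumption, and the names PercentRegion, PixelRect and regionToPixels, are hypothetical and not taken from the proposal.]

```typescript
// Region expressed as percentages (0 to 100) of the video's width and height.
interface PercentRegion {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Rectangle in page pixels.
interface PixelRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Map a percentage-based cue region onto the rectangle occupied by the video.
function regionToPixels(region: PercentRegion, video: PixelRect): PixelRect {
  return {
    left: video.left + (region.x / 100) * video.width,
    top: video.top + (region.y / 100) * video.height,
    width: (region.width / 100) * video.width,
    height: (region.height / 100) * video.height,
  };
}
```

[Under this assumption the same cue lands in the same relative place whether the video is shown inline or at a different size elsewhere.]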

nigel: How does that relate to PiP?

mounir: People cannot do that by design if they use WebVTT. Assuming we can have web sites that want to make use of this solution, that would be much easier.

greg: At Netflix, we'd be interested from a rendering, accessibility, synchronization, and performance perspective in this solution.

markw: For accessibility, we do have site-wide customization. If the customization in the browser has default values, there may be a conflict between Netflix default values and UA default values.

<inserted> nigel: Same issue for BBC

ericc: Right, that may be an issue.

greg: In this model, would there be device settings?

pal: Regulatory requirements whereby users can select particular styles?
... Generally providing hooks allows these styles to apply in the first place.

ericc: [showing Mac OS style UI]. A checkbox lets the video override some of the settings.
... We honor that.

pal: Having a common API gives us an opportunity to have a common approach.

<inserted> nigel: You could probably spec how the OS settings are applied by defining a place where OS style settings are inserted into the JSON structure before presentation.

greg: OK, it looks like we can, we just have to do things in JS.
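
[Illustration, not shown in the meeting: a sketch of nigel's suggestion that OS style settings be merged into the cue structure before presentation, combined with the "let video override" checkbox ericc showed. The names UserCaptionSettings, SettingsSpan and applyUserSettings, and the merge rule itself, are assumptions for illustration, not part of the proposal.]

```typescript
type SettingStyles = Record<string, string>;

interface SettingsSpan {
  text: string;
  styles?: SettingStyles; // styles supplied by the caption author
}

// Hypothetical view of the user's OS-level caption preferences.
interface UserCaptionSettings {
  styles: SettingStyles;
  allowVideoOverride: boolean; // mirrors the checkbox in the demo
}

// Merge user preferences into each span before the cue is presented.
// Later spreads win, so the order decides who has the final say.
function applyUserSettings(
  spans: SettingsSpan[],
  user: UserCaptionSettings,
): SettingsSpan[] {
  return spans.map((span) => ({
    text: span.text,
    styles: user.allowVideoOverride
      ? { ...user.styles, ...(span.styles ?? {}) } // author styles override user defaults
      : { ...(span.styles ?? {}), ...user.styles }, // user styles override author styles
  }));
}
```

[In this sketch the input spans are left untouched and a new array is returned, so the merge could run inside the user agent without exposing the user's settings back to the page.]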

james: I want to clarify that this is not specific to Mac OS. That's written in the FCC mandate that the user style should be able to override.

greg: Not one by one, as a whole.

<inserted> nigel: Please don't force FCC requirements on the whole world through standards - they're not accepted as good everywhere.

pal: My point is that by having a common model, we can at least have a common ground for discussion.

andreas: I agree with Pierre.
... In general, I think that's a really great proposal. Speaking from a German / Swiss perspective, we have trouble bringing captions from broadcasters to the Web.
... Different formats, etc.
... That leads to accessibility issues.
... Really important that we try to work on this.

chcunningam: Can you find a way to work with HTML directly to avoid a JSON format?

pal: That's where I started, but it doesn't work in the end.

ericc: The browser needs to understand exactly what's what so that we can apply the user styles to the right portions of the captions.

cyril: Having pre-defined classes.

pal: That's how WebVTT works.

chcunningam: By adding an API that also doesn't respect Netflix's default styles, we're not solving the issue, right?

ericc: It depends on the perspective. For us, it's extremely important to respect user's accessibility settings.
... This proposal is intended to make it easier for people who feel that they need to render their captions themselves.

greg: Having a proposal that allows user to go to the site or to the device is good.

ericc: We can't expose user's styles outside of the shadow DOM because that would be a massive fingerprinting issue.

chcunningam: If you had class names that say "this is the text" or "this is the speaker name", then perhaps you could handle some of it that way

ericc: You mean an alternative that uses a document fragment instead of this?

chcunningam: yes.

tess: I would just have to see it work with concrete examples.

cyril: I heard support for this activity. Where do we work on it?

tess: I understand that there's a Timed Text WG that meets tomorrow. That's a start. We could also start with WICG and take it from there.
... The shape of the current proposal is patches to WebVTT and HTML specs.
... This might be a temporary document.

andreas: I would definitely support proposing this to WICG. We need more experts than we have in the Timed Text WG.

tess: I'm hearing we should start a WICG Discourse thread.

pal: Any new feature that may arise would need to be backported in TTML.

glenn: As this stands, the defined classes are close to IMSC

pal: If we need something that is neither in CSS nor TTML, then we need to take the discussion back to TTWG to understand why it needs to be added.

glenn: It could also add new entropy to the process.

mounir: The main point is the API design, not really the accessibility part of it.

pal: yes, just join the TTWG if you want to talk about accessibility.

cyril: If the WICG says that a Document fragment can be used, then it opens up other possibilities such as graphics overlay.

tess: We're out of time, thanks!
