W3C

Media WG

12 December 2023

Attendees

Present
Andy Estes, Bernard Aboba, Chris Needham, Eric Carlson, Tess O'Connor, Eugene Zemtsov, Francois Daoust, Jean-Yves Avenard, Jer Noble, Marcos Caceres, Mark Foltz, Tommy Steimel
Regrets
-
Chair
Chris Needham
Scribe
cpn, tidoust

Meeting minutes

Slideset: https://docs.google.com/presentation/d/176UKXIpelSAEJ58xReZnFSPQvaDqI4bD3RSOdzGtBZo/edit

Slideset: https://lists.w3.org/Archives/Public/www-archive/2023Dec/att-0003/W3C_Media_WG_Meeting_2023-12-12.pdf

Media Session

[Slide 3]

Tommy: Chrome is interested in adding chapters to MediaMetadata, and wants artwork per chapter. Questions: is startTime sufficient, or should we add an action? We think it's better not to add an action for now; we can leave that for later. Also, is seconds the right granularity?

Jer: VTT supports chapter tracks, in seconds. That said, there might be some in-band caption formats with a rational time, related to a media timebase. But I've never heard someone ask for sample accurate chapters.

Chris: Agree, seconds is good enough.

Jer: Also don't need "Secs" in the attribute name.

Tommy: I agree. Are people happy adding this to MediaSession?

Jer: Are these optional, especially the artwork?

Tommy: Can do.

Chris: Would updating the artwork and title update automatically during playback?

Tommy: It would, based on the playback time.
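The proposal discussed above can be sketched as plain data. The dictionary and member names below (ChapterInformation, chapterInfo) are assumptions inferred from the discussion, not final spec text; artwork reuses the shape of the existing MediaImage dictionary.

```typescript
// Hypothetical shape for per-chapter metadata (names are assumptions):
interface ChapterInformation {
  title: string;
  startTime: number; // seconds, per the granularity agreed in the discussion
  artwork?: { src: string; sizes?: string; type?: string }[]; // optional, as Jer asked
}

function buildChapters(): ChapterInformation[] {
  return [
    { title: "Intro", startTime: 0 },
    {
      title: "Main topic",
      startTime: 95,
      artwork: [{ src: "https://example.com/ch2.png", sizes: "128x128", type: "image/png" }],
    },
  ];
}

// In a browser, this would be supplied when constructing MediaMetadata,
// and the platform UI would update artwork/title as playback crosses
// chapter boundaries, e.g.:
// navigator.mediaSession.metadata =
//   new MediaMetadata({ title: "Episode 1", chapterInfo: buildChapters() });
```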

Chris: Other implementer interest?

Jer: Cocoa added chapters in the last major release. Sounds like a neat idea, so seems reasonable to me.

Chris: Look forward to the PR.

Media Capabilities #209

[Slide 4]

Bernard: This came up in a PING review of WebRTC SVC in its use of MCAPI. They're blocking anything that uses MCAPI from advancing in W3C.
… They want MCAPI to be restricted to contexts where the user has granted camera permission, as they say MCAPI is a fingerprinting issue. So I moved the issue here. The question is whether the Media WG agrees it should be limited to camera capture. The privacy concern is about exposing powerEfficient, supported, smooth.
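As a minimal sketch of the kind of query at issue: WebRTC-SVC extends the Media Capabilities video configuration with a scalabilityMode member, and the result's three booleans are the surface PING flags as a fingerprinting risk. The values below are illustrative, not from the minutes.

```typescript
// Illustrative Media Capabilities query probing SVC encode support.
const svcEncodeQuery = {
  type: "webrtc" as const,
  video: {
    contentType: "video/VP9",
    scalabilityMode: "L3T3", // three spatial and three temporal layers
    width: 1280,
    height: 720,
    bitrate: 1_500_000,
    framerate: 30,
  },
};

// In a browser:
// const info = await navigator.mediaCapabilities.encodingInfo(svcEncodeQuery);
// Each of info.supported, info.smooth, info.powerEfficient is an entropy
// bit in the fingerprinting analysis discussed here.
```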

Jer: Applies to both encoding and decoding? Do we have privacy considerations in spec?

Bernard: We do, but doesn't cover their demand.

Chris: Broader issue is #176.

[Slide 5]

Jer: What they want is something like HLS, roundly rejected by all websites that want media streaming. The solution exists, but nobody wants it.

Jer: Whether they'd prefer an API where you pass a bundle of configurations and the browser chooses one, without exposing smooth/powerEfficient. The editor at the time decided against that option. That may be preferable to PING.

Bernard: A reason it's problematic: if peers are involved, you can't just use local capability. It depends on whether the other side can decode.

Jer: Right, it needs something for both sides. A ranked list of approved formats would also be a fingerprinting risk. We should work through the use cases. The negotiating case would be difficult to support.

Bernard: Bigger privacy concern for users is asking for a camera permission while watching a movie. Much bigger issue than anything you leak.

Jer: On Apple platforms we have limited hardware, so everyone with an iPhone 13 Pro will have the same media capabilities. That's how we chose to mitigate the problem.

Bernard: Good to write that down. Only Apple supports hardware HEVC today, so if you can encode HEVC, you have an Apple product, but so what?

Jer: Other ways to determine that, e.g., UA string, so entropy not great.

Jer: Sounds like what the spec needs is a more fleshed out privacy considerations section, possibly.

Chris: Agree. Also worth capturing rationale for the current design.

Jer: Fingerprinting still exists with HLS and DASH, can author a playlist and we expose through the video element width and height. You can serve a carefully crafted playlist and see what resolution was picked. But as far as additional entropy goes, it's not as dire a situation as PING may think.

Chris: So coming back to #209, does limiting scope to capture help?

Bernard: SVC is not supported in hardware currently, so this doesn't tell you anything. If it were, the support would be in the chipset, so what does it tell you about the hardware?

Jer: People worry about adding even partial entropy bits.

Tess: 31 bits needed to identify individuals.

Jer: So what use cases does limiting prevent?

Bernard: All decode uses, you couldn't do gaming or streaming with SVC, you'd be linking decode to a camera.

Jer: Doesn't WebRTC have something on a per-connection basis between two peers?

Bernard: SVC use isn't negotiated.

Jer: Could it be added?

Bernard: It's omitted as it would require offer/answer, and the design was that it doesn't need to. Implementations are required to include SVC decode. And it wouldn't work if negotiated.

Jer: So why expose in Media Capabilities?

Bernard: Some HW decoders can't handle [missed]. On the encoder side, you could try and see what fails, but this would delay startup.

Jer: I don't see why you wouldn't need negotiation in WebRTC but would on the web app side. So if you try and fail, you expose the same info as before.

Jer: So I think we should document the privacy considerations.

Bernard: Agree

Chris: Let's do that

Jer: The general problem we have is detecting layer support, Dolby Vision, alpha channels. All things we're being asked for. We'll need to be more robust in our privacy considerations.

Media Capabilities #203

[Slide 6]

cpn: First and third look like browser bugs. Second one is interesting as browser returns an object that does not exist in the spec.

jernoble: We thought we raised that a long time ago and had resolved to do that.
… We implemented support for it, but I guess it never made it into the spec. We should go back and figure out what was proposed.
… How do you know the properties you pass in are understood by the user agent? The solution was to pass them back. Forward-compatibility idea.
… Definitely something that we need to fix.
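The forward-compatibility idea described above can be sketched as follows. The function name and key list are stand-ins for illustration, not anything in the spec: the user agent echoes back the configuration members it recognized, so the caller can detect fields that were silently ignored.

```typescript
// Sketch: echo back only the configuration members the UA recognized
// (`echoConfiguration` and `recognizedKeys` are hypothetical names).
function echoConfiguration<T extends Record<string, unknown>>(
  requested: T,
  recognizedKeys: string[],
): Partial<T> {
  const echoed: Partial<T> = {};
  for (const key of recognizedKeys) {
    if (key in requested) {
      (echoed as Record<string, unknown>)[key] = requested[key];
    }
  }
  return echoed;
}

// A caller compares keys to spot members the UA did not understand:
const requested = {
  contentType: 'video/webm; codecs="vp9"',
  hdrMetadataType: "smpteSt2086",
};
// Suppose this UA only knows contentType:
const echoed = echoConfiguration(requested, ["contentType"]);
const ignored = Object.keys(requested).filter((k) => !(k in echoed));
// The app learns hdrMetadataType was not understood.
```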

Media Capabilities prioritisation

[Slide 7]

cpn: I started to go through open issues with a view to prioritizing them.
… I'd like to get a sense of that from the group.
… Which of those do we want to focus on?
… Example of an API proposal to do configuration transition which we never settled on. There is a PR but three different proposals on the API shape.
… Alpha channels, Dolby vision type things, that were things that we talked about at TPAC in particular. And there was a proposal to add some sort of registry with identifiers that let you talk about Dolby Vision.

mfoltz: Unfortunately, Google folks focused on Media Capabilities couldn't attend the call today. I'm here as a delegate for the Google folks interested. Do we have GitHub issues categorised as fixes or improvements, feature requests?

cpn: Media Capabilities API priorities (PDF version)
… Some clarification about what we mean by variable bitrate and constant bitrate.
… Are we using MIME sniff properly.
… A number of things around WebRTC integration.
… Big feature request around transition (what I mentioned before)
… PR #107 just needs a review
… There's the whole capability fingerprinting thing
… Question on how it relates to WebCodecs, can we provide an example.
… A number of questions about the relationship in Media Capabilities between decode capabilities and display capabilities.
… I think we said that display capabilities were out of scope, to be solved elsewhere, e.g., attached to Screen. Needs review at least.
… #136 is the HDR metadata thing we were just talking about.
… #113 is an issue coming from people working on HbbTV with constrained devices and you want to know whether you can decode more than one stream at once and with what parameters.
… A couple of questions around CMAF in particular.
… [going through other issues]
… #99 is a general issue about supplemental data in CMAF (example of 608/708 captions). Should it be in scope?

jernoble: The question is whether, if you give a stream with 608/708 captions, you get captions?

cpn: We need to clarify that.
… There's another spec about mapping in-band tracks.
… Clarifications about the video frame rate (#95). What if you've got a variable frame rate.
… A lot of these were answered but we've never closed the issues with a resolution to change the spec or say that you can do it without changes, or that it is out of scope.
… #88 is an IDL thing that needs somebody to take a close look at.
… #73 is about audio channels. What we have at the moment is a string. How do you express more complex audio channel configurations?
… Interesting to see what current implementations do.
… Also what happens to audio streams? Do they get downmixed to stereo, in which case you might prefer to provide stereo directly?
… As far as I can tell, Web Audio has a number of channels, but does not go into more details.
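The situation in #73 can be illustrated with plain data: the audio configuration's channels member is an opaque string today, which cannot express richer layouts. The channelLayout member commented out below is a hypothetical illustration of a structured alternative, not a proposal from the minutes.

```typescript
// Today: channels is a string in the Media Capabilities audio configuration.
const stereoAudioConfig = {
  contentType: 'audio/mp4; codecs="mp4a.40.2"',
  channels: "2", // opaque string: plain stereo
  bitrate: 128_000,
  samplerate: 44_100,
};

// A more complex layout still has to be squeezed into a string;
// a structured form (hypothetical) could carry real channel semantics:
const surroundAudioConfig = {
  ...stereoAudioConfig,
  channels: "5.1", // string today...
  // channelLayout: { front: 3, surround: 2, lfe: 1 }, // ...vs. structured (hypothetical)
};
```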

jernoble: Audio is one part where we mixed decoding and rendering.
… It talks about the current audio device, which is dynamic. Whether that device is capable of downmixing audio to something spatial capable. We did that because we did not have a better place to do the work, I think.
… Probably not a good idea at the time, but expedient.
… Some other spec allows you to get more information about the output capabilities.

cpn: Audio Output Device API? I glanced at it, it does not really say anything about capabilities.

mfoltz: If issues were filed by folks in this group, maybe we can ask them to label them as v1 or vnext issues, and then I can look whether editors can take a look at them.

cpn: If there's somebody who would like to become an editor, we could use the help.

mfoltz: We're constrained but I'll see what I can do to find people.

cpn: Some of these issues have been around for a number of years.
… I can probably do that passthrough to identify what is in v1 scope and what is a feature request. I’ll take an action item to add labels to the issues.
… And then I can send that through to you.
… I don't think we have time to go through more of this stuff.

cpn: See you in the new year!

Minutes manually created (not a transcript), formatted by scribe.perl version 222 (Sat Jul 22 21:57:07 2023 UTC).