W3C

Media Capabilities enhancements - GitHub issues 118, 120

20 Aug 2019

Attendees

Present
Greg Whitworth (gregwhitworth), Scott Low (scottlow), Gurpreet Virdi (gurpreetvirdi), Jean-Yves Avenard (jya), Vi Nguyen (vi-dot-cpp), Mark Watson (markw), Isuru Pathirana (Isuru), Mounir Lamouri (mounir), Chris Cunningham (chcunningham), Greg Freedman (GregFreedman), Jer Noble (jernoble), Jon Piesing (jon)
Chair
Jer Noble
Scribe
Scott Low

Contents

Topics
    Issue #120 (Spatial Audio)
    Issue #118 (HDR Capability Detection)
Summary of Action Items

gurpreetvirdi: Thanks everyone for meeting to discuss these issues. It's great to "meet" everyone
... Let's do a quick round of introductions and jump into the discussion
... We really want to hear industry requirements and reach a consensus on the proposals we're discussing here

Issue #120 (Spatial Audio)

https://github.com/w3c/media-capabilities/issues/120

vi-dot-cpp: The goal here is to expose spatial audio rendering via audioConfiguration
... we're adding a single boolean to the audioConfiguration dictionary to avoid vendor-specific names
... contentType determines the codec, so a single boolean can be used to indicate spatial capabilities
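
For illustration only (an editor's sketch, not part of the discussion): a query under this proposal might look like the following, assuming the boolean ends up named spatialRendering; that name, and the ec-3 example content type, are assumptions rather than settled API.

  // Sketch only: `spatialRendering` is an assumed name for the proposed
  // boolean. ec-3 (Dolby Digital Plus) is an illustrative codec that can
  // carry Atmos spatial metadata.
  async function supportsSpatialAudio(): Promise<boolean> {
    const info = await navigator.mediaCapabilities.decodingInfo({
      type: "media-source",
      audio: {
        contentType: 'audio/mp4; codecs="ec-3"',
        channels: "6",          // existing AudioConfiguration member
        bitrate: 768_000,
        samplerate: 48_000,
        spatialRendering: true, // the proposed addition
      } as AudioConfiguration,  // cast, since the field predates typings
    });
    return info.supported;
  }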

markw: If we load everything into Media Capabilities, the answer you get is dependent on decoding capabilities and rendering capabilities bundled together. Do we need a mechanism for when those things change?
... for things like detecting headphones

jya: We've had a long issue on how we can detect change. We've never come to any conclusions on it

chcunningham: I think it's a fair question. There's definitely an analog to screen change events. We've resisted this on the decoding side
... it's hard to surface (for example) whether playback will still be smooth, but for this use case I can see value
... is there a pattern in WebAudio that could help here?

jya: I'm not aware of such capabilities with WebAudio

gurpreetvirdi: Do we want to track general Media Capabilities events as part of a separate issue? Or should we track it as part of this change specifically?

markw: I understand the hard-problem aspect here. I think that could go into future work. The particular question of display/rendering device changes feels in scope
... unfortunately, where we got to in the GitHub issue is that it's hard to split out the display capabilities and still ask the question of what's supported
... since the answer depends on the display capabilities

vi-dot-cpp: I see the connection here. Can we defer this question to the 118 discussion and come back to 120 afterwards?

gurpreetvirdi: Perhaps this is a good time to jump into 118 and come back to 120 in the end. Let's switch gears

Issue #118 (HDR Capability Detection)

https://github.com/w3c/media-capabilities/issues/118

vi-dot-cpp: The proposal adds three enums to videoConfiguration: hdrMetadataType, colorGamut, and transferFunction
... modeled on chcunningham's most recent proposal
... for desktop UAs, decodingInfo() does not check the display capabilities of screens
... for UAs like Cast, decodingInfo() will check display capabilities but will require user interaction
... this will address fingerprinting concerns
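
For concreteness (again an editor's sketch, not proposal text): a decodingInfo() call carrying the three proposed fields might look like the following. The enum values shown (smpteSt2086, rec2020, pq) and the HEVC content type follow the proposal under discussion but were not finalized at this meeting.

  // Sketch only: hdrMetadataType, colorGamut, and transferFunction are the
  // proposed VideoConfiguration additions; the values below are assumptions.
  async function supportsHdrPlayback(): Promise<boolean> {
    const info = await navigator.mediaCapabilities.decodingInfo({
      type: "media-source",
      video: {
        contentType: 'video/mp4; codecs="hvc1.2.4.L153.B0"', // HEVC Main 10, illustrative
        width: 3840,
        height: 2160,
        bitrate: 20_000_000,
        framerate: 30,
        hdrMetadataType: "smpteSt2086",  // proposed: static HDR metadata
        colorGamut: "rec2020",           // proposed: content color gamut
        transferFunction: "pq",          // proposed: EOTF
      } as VideoConfiguration,           // cast, since fields predate typings
    });
    return info.supported && info.smooth && info.powerEfficient;
  }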

gurpreetvirdi: does this address fingerprinting?

jernoble: I have a question on what you expect user consent to be in the scope of this document
... are you proposing a prompt?

vi-dot-cpp: Yes. Thinking about a permission model

mounir: Shouldn't that be left to the user agent? The specification can suggest that a prompt MAY be displayed

markw: Anything that has to do with user prompts will require everyone familiar with this space to weigh in. Prompting makes sense when you're giving access to something useful, but not necessarily when you're exposing an additional fingerprinting bit
... does this open a paradigm where we show prompts for every bit of entropy? It seems like we should leave this to UAs to implement in their users' best interest

jernoble: I don't think we'd put a user prompt up for this since it doesn't expose a high security risk.
... user prompts have been recommended often for fingerprinting risks, but this isn't a mitigation we're interested in

gurpreetvirdi: Does it sound reasonable to leave this up to UAs then?

jernoble: A prompt for fingerprinting reduces the security of every other prompt by creating prompt fatigue

<jernoble> https://w3c.github.io/fingerprinting-guidance/

jernoble: there are other mitigations for fingerprinting coming out of the Privacy WG. We should read this document to come up with the best possible solution

markw: I agree completely. We should characterize the fingerprinting risk and refer to the other guidance for what UAs should do about it

jon: When people are sitting in front of the TV, every question you ask them is an intrusion. We shouldn't ask questions that are hard to explain to them. Fingerprinting would be difficult to explain.

gurpreetvirdi: Is there a recommended way we should proceed here based on the guidance?

jernoble: The document talks about how to classify fingerprinting risk. There are also other things like "is it likely that there will be unique combinations of displayCapabilities that will increase entropy?"
... if the API lets you detect things that have been configured differently (different headsets/monitors), then we are increasing entropy and the consequence of fingerprinting risk is much higher
... I'm not sure if we've gone through the guidance document yet and have decided whether there is a large amount of risk or a small amount of risk
... we won't know until we've actually made the effort to go through the document
... before we talk about mitigation, it may be worthwhile to look through this document to see how severe the risk may or may not be

<scribe> ACTION: vi-dot-cpp and gurpreetvirdi to review the fingerprinting document and to come up with a position on where we stand in terms of risk

gurpreetvirdi: We will then update the GitHub issue with our findings
... does anybody else have feedback on this?

chcunningham: On the GitHub issue, there was lobbying for a bool due to fingerprinting concerns
... there was a discussion about whether the boolean is actually functional. I wanted to ask Jon whether the boolean on screen is sufficient

jon: With the latest comment from markw, yes. If the display is not SDR, the content provider will assume it's a better screen. And if it is SDR, then we likely want to provide an SDR stream

chcunningham: It'll be fun to propose the boolean

Jon: BT709 vs. BT2020. Our thought would be that BT709 maps to true and anything else to false, or the other way around depending on how you name the boolean
... converting 709 to 2020 is much easier than 2020 to 709
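
As a sketch of how the split being discussed (enums on decodingInfo(), a boolean on window.screen) could be used together; the property name highDynamicRange is invented for illustration, as no name was agreed here.

  // Sketch only: `screen.highDynamicRange` is an invented name for the
  // discussed window.screen boolean (false for BT.709/SDR-class displays,
  // true for anything better).
  async function pickStream(
    hdrConfig: MediaDecodingConfiguration,
    sdrConfig: MediaDecodingConfiguration
  ): Promise<MediaDecodingConfiguration> {
    const scr = screen as Screen & { highDynamicRange?: boolean };
    if (scr.highDynamicRange === true) {
      // decodingInfo() answers the decode/render question;
      // the boolean answers the display question.
      const info = await navigator.mediaCapabilities.decodingInfo(hdrConfig);
      if (info.supported) return hdrConfig;
    }
    // Converting 709-class content upward is the easier fallback direction.
    return sdrConfig;
  }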

vi-dot-cpp: Can we close on this now so that we have a full proposal that we can take through the fingerprinting exercise?

markw: My comment is that if we now have all these enums on the decodingInfo() side, then decodingInfo() actually checks renderer and display capabilities
... with that understanding, is decodingInfo() the right name for the method?

chcunningham: History is that this API used to be called mediaCapabilities.query(). I take your point that decoding and rendering are not the same thing. I'm not sure I'm at peace with the encodingInfo API
... Happy to reconsider calling this renderingInfo(). Also okay with pretending the word decoding means playback

vi-dot-cpp: is playback reasonable?

jernoble: I'm not sure why casting is implemented differently. If things are run on a Chromecast versus a normal UA, there shouldn't be different code required
... what I'm wondering here is whether we can remove the wording around casting devices checking display

markw: depends on the capabilities of the casting device. If the casting device needs to offload to the display, then we need to know what the display can support

jernoble: This is fallout from moving to a boolean for HDR capabilities because this doesn't give enough information about mapping from input to output types

markw: Having more details on the window.screen side doesn't solve the problem either. You can determine what the decoder and the display support, but you don't know what the renderer (or the piece in the middle) supports
... by lumping renderer and decoder together, you solve that, but then you still need to check display via the boolean to make the decision between multiple streams

chcunningham: My intention was not to call out these devices separately in the spec, but instead to motivate an API shape that could support them both

gurpreetvirdi: It sounds like we're heading towards a consensus on three enums on decodingInfo() and a boolean on window.screen
... any other concerns?

Jon: Some detailed comments on some of the values. If you look at colorGamut, p3 and rec2020 aren't really different
... p3 is a subset of rec2020
... nobody does more than p3 (difficult to hear)

chcunningham: This comes from the CSS Media Queries world

mounir: Correct. I don't expect any screen to return rec2020. That being said, I don't think we should change what CSS is doing

<gregwhitworth> here's the CSS spec for that: https://drafts.csswg.org/css-color-4/#predefined

jernoble: The current CSS spec is hand-wavy when it comes to defining what it means to support "p3". I don't think we should remove anything and instead be forward looking here

markw: Aren't we asking here whether the system actually supports the rec2020 primaries?

jernoble: Those are kind of conflated. It's really easy to understand primaries in 2020, but this isn't quite how CSS works. CSS is more about whether the screen can render. I'd hesitate to deviate from the web's understanding of color

markw: We're asking whether the device can support content in a certain format. This is easy to define

Jon: I agree with markw. "Do you support it?" is a different question from "how much of it do you support?"

jernoble: Is there ever going to be a TV that can only display sRGB but advertises that it can understand the full 2020 gamut?
... we can shortcut to say it's not whether you understand 2020, but whether you're capable of rendering a specific color gamut
... I was thinking about this on screen, not on decodingInfo().

chcunningham: Jon brought up a good point about whether this enum makes sense in terms of the colors you can understand versus the colors you can display
... the AV1 and VP9 codec strings describe the transfer function but also color primaries and color matrix, which is a different structure from what we're seeing here
... how do we backport this to older codecs that don't have these things? Should color primaries and color matrix be the units here rather than gamut?
... rec2020 defines a set of primaries explicitly, but I'm not sure p3 does
... there is a spec that covers these three things separately from any individual codec spec. There is a file in Chrome that has the enums defined for these

Jon: <inaudible>

<chcunningham> the spec I'm referring to - ISO 23001-8:2016

<jya> nit: only AV1 defines things like the transfer function, primaries, etc. in the bitstream; for VP9 you find this in the container

jernoble: In the end, do TVs actually render in 709? Or do you convert to sRGB?

Jon: <inaudible>

<chcunningham> @jya - the new vp9 string has additional bits

<chcunningham> https://www.webmproject.org/vp9/mp4/

<jya> @chcunningham I'm referring to the bytestream.

<chcunningham> ah, apologies - read too fast

Jon: <mostly inaudible comment on metadataType>

<markw> https://www.webmproject.org/vp9/mp4/

markw: Question was raised whether we should use gamut or primaries and matrix for color

<markw> https://standards.iso.org/ittf/PubliclyAvailableStandards/c069661_ISO_IEC_23001-8_2016.zip

markw: the difference is that when we say color primaries here, it's still an enum; it just has more values
... matrix coefficients are sometimes there and sometimes not. I'm not certain whether we need to call these out explicitly or not

jernoble: If this is going to be a decoding thing, can we say "if you say you support HDR, you must support reading all three color gamuts"?
... is that an issue for set top boxes?

markw: Not as far as I know
... we would be overloading the 2020 value to mean we want you to support 2020 as well as the most common luma/chroma profiles used

gurpreetvirdi: Lots to ponder over. We'll publish a new proposal on the GitHub repository. I wanted to touch on spatial audio. What's left?
... what's the process for getting sign-off from key folks on this?

<chcunningham> @jer - can you write up that proposal on the github?

jernoble: Should there be a separation between the act of decoding and the act of rendering? We don't have a capability to choose a rendering device yet
... if we want to design the API to be future proof, we may want to move that to a future API that lets you choose which device you want to pick
... we should look at the WebRTC community for this as they're doing some work here
... it might be best to change the language from "can you render spatialized audio" to "can you decode it"
... if you make it hand-wavy enough, then UAs can implement it as both decode and render
... then we can move spatial audio queries to device picker later

chcunningham: If we do that, is this question already answered?
... for Dolby, there's Atmos and DD Plus. I believe the codec strings are unique.

Isuru: Correction: Atmos can be embedded into multiple codecs, so there's no specific codec string that can be used to detect it

jernoble: Ah. So Atmos is metadata. So you can have a file of the same codec with spatialization and without?

Isuru: Correct

jernoble: So we will still need the boolean. Does that answer your question chcunningham?

chcunningham: Yes. So even if there is a future where you have a device picker, we'd still need this boolean?

jernoble: Correct

Isuru: There's also DTS:X

jernoble: Great. Then this boolean makes sense
... We might want to say something about "default destination" to future proof the API

mounir: WebRTC has the idea of a deviceId. Could we imagine the spec returning a deviceId for the device that matches the specified configuration?

jernoble: We may want to reach out to WebRTC to see if we can solve this problem together. Let's make the language handwavy enough to say if the "current device" is able to decode spatial audio, then we're good

<scribe> ACTION: vi-dot-cpp to make the proposed changes to the spatial audio issue

gurpreetvirdi: Assuming we edit this text, what is the process for sign off?

mounir: In general, as long as folks are in agreement, either chcunningham or I can merge

gurpreetvirdi: Thanks again everyone for an ad hoc meeting. Regrets that I can't make it to TPAC, but thanks for meeting!

<scribe> ACTION: jernoble to review vi-dot-cpp's PR and provide feedback that vi-dot-cpp can incorporate

Summary of Action Items

[NEW] ACTION: jernoble to review vi-dot-cpp's PR and provide feedback that vi-dot-cpp can incorporate
[NEW] ACTION: vi-dot-cpp and gurpreetvirdi to review the fingerprinting document and to come up with a position on where we stand in terms of risk
[NEW] ACTION: vi-dot-cpp to make the proposed changes to the spatial audio issue
 
[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version 1.154 (CVS log)
$Date: 2019/08/20 15:37:29 $