gurpreetvirdi: Thanks everyone for meeting to discuss these three issues. It's great to "meet" everyone
... Let's do a quick round of introductions and jump into the discussion
... We really want to hear industry requirements and reach a consensus on the proposals we're discussing here
https://github.com/w3c/media-capabilities/issues/120
vi-dot-cpp: The goal here is to
expose spatial audio rendering to audioConfiguration
... we're adding a single boolean to the audioConfiguration
dictionary to avoid vendor specific names
... and so that contentType determines the codec; a single
boolean can be used to indicate spatial capabilities
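[Editor's note] A minimal sketch of the query shape described above, assuming the proposed boolean is named `spatialRendering` (the minutes do not fix a name) and that the rest of the query follows the existing decodingInfo() dictionary. `AudioConfigurationSketch`, `DecodingQuery`, and `buildSpatialAudioQuery` are hypothetical names used only for illustration.

```typescript
// Sketch only: `spatialRendering` is an assumed name for the proposed boolean.
interface AudioConfigurationSketch {
  contentType: string;        // MIME type plus codec string; this identifies the codec
  spatialRendering?: boolean; // proposed flag: true asks about spatial audio support
}

interface DecodingQuery {
  type: "file" | "media-source";
  audio: AudioConfigurationSketch;
}

// The codec travels in contentType; spatial capability is a single,
// vendor-neutral boolean alongside it.
function buildSpatialAudioQuery(contentType: string, spatial: boolean): DecodingQuery {
  return {
    type: "media-source",
    audio: { contentType: contentType, spatialRendering: spatial },
  };
}

// Example: an E-AC-3 stream queried with the spatial flag set.
const query = buildSpatialAudioQuery('audio/mp4; codecs="ec-3"', true);
console.log(JSON.stringify(query));
```

In a browser the resulting object would be passed to `navigator.mediaCapabilities.decodingInfo()`.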
markw: If we load everything into
Media Capabilities, the answer you get is dependent on decoding
capabilities and rendering capabilities bundled together. Do we
need a mechanism for when those things change?
... for things like detecting headphones
jya: We've had a long issue on how we can detect change. We've never come to any conclusions on it
chcunningham: I think it's a fair question. There is definitely an analog to screen change. We've resisted on the decoding side
... it's hard to surface (for example) whether playback will
still be smooth, but for this use case I can see value
... is there a pattern on WebAudio that could help here?
jya: I'm not aware of such capabilities with WebAudio
gurpreetvirdi: Do we want to track general Media Capabilities events as part of a separate issue? Or should we track it as part of this change specifically?
markw: I understand the hard
problem aspect here. I think that could go into future work.
Particular question of display/rendering device changes feels
in scope
... unfortunately, where we got to in the GitHub issue is that
it's hard to split out the display capabilities and still ask
the question of what's supported
... since the answer depends on the display capabilities
vi-dot-cpp: I see the connection here. Can we defer this to 118 and tackle 120 first?
gurpreetvirdi: Perhaps this is a good time to jump into 118 and come back to 120 in the end. Let's switch gears
https://github.com/w3c/media-capabilities/issues/118
vi-dot-cpp: The proposal adds three enums to videoConfiguration (hdrMetadataType, colorGamut, and transferFunction)
... modelling after chcunningham's most recent proposal
... for desktop UAs, decodingInfo() does not check display
capabilities of screens
... for UAs like Cast, decodingInfo() will check display
capabilities but will require user interaction
... this will address fingerprinting concerns
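[Editor's note] A minimal sketch of a videoConfiguration carrying the three fields named above. The field names come from the proposal; the enum values listed here are assumptions for illustration, as are the helper names `VideoConfigurationSketch`, `VideoDecodingQuery`, and `buildHdrVideoQuery`.

```typescript
// Assumed value sets; the minutes name the fields but not the full value lists.
type HdrMetadataType = "smpteSt2086" | "smpteSt2094-10" | "smpteSt2094-40";
type ColorGamut = "srgb" | "p3" | "rec2020";
type TransferFunction = "srgb" | "pq" | "hlg";

interface BaseVideo {
  contentType: string;
  width: number;
  height: number;
  bitrate: number;
  framerate: number;
}

interface HdrFields {
  hdrMetadataType: HdrMetadataType;
  colorGamut: ColorGamut;
  transferFunction: TransferFunction;
}

interface VideoConfigurationSketch extends BaseVideo, Partial<HdrFields> {}

interface VideoDecodingQuery {
  type: "file" | "media-source";
  video: VideoConfigurationSketch;
}

// Combine the usual video fields with the three proposed HDR fields.
function buildHdrVideoQuery(base: BaseVideo, hdr: HdrFields): VideoDecodingQuery {
  return {
    type: "media-source",
    video: {
      contentType: base.contentType,
      width: base.width,
      height: base.height,
      bitrate: base.bitrate,
      framerate: base.framerate,
      hdrMetadataType: hdr.hdrMetadataType,
      colorGamut: hdr.colorGamut,
      transferFunction: hdr.transferFunction,
    },
  };
}

const hdrQuery = buildHdrVideoQuery(
  { contentType: 'video/mp4; codecs="hvc1.2.4.L153.B0"', width: 3840, height: 2160, bitrate: 20000000, framerate: 24 },
  { hdrMetadataType: "smpteSt2086", colorGamut: "rec2020", transferFunction: "pq" }
);
console.log(JSON.stringify(hdrQuery.video));
```

Whether a UA consults the display when answering such a query is exactly the desktop versus Cast distinction described above; the sketch only shows the input shape.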
gurpreetvirdi: does this address fingerprinting?
jernoble: I have a question on
what you expect user consent to be in the scope of this
document
... are you proposing a prompt?
vi-dot-cpp: Yes. Thinking about a permission model
mounir: Shouldn't that be left to the user agent? The specification can suggest that a UA MAY want to display a prompt
markw: Anything that has to do
with user prompts will require everyone familiar in this space
to weigh in. Prompting makes sense when you're giving access to
something useful, but if you're exposing an additional
fingerprinting bit
... not sure if that makes sense. Does this open a paradigm where we show prompts for every bit of entropy? Seems like we should leave this to UAs to implement in terms of users' best interest
jernoble: I don't think we'd put
a user prompt up for this since it doesn't expose a high
security risk.
... user prompts have been recommended often for fingerprinting
risks, but this isn't a mitigation we're interested in
gurpreetvirdi: Does it sound reasonable to leave this up to UAs then?
jernoble: A prompt for fingerprinting reduces the security of every other prompt by creating prompt fatigue
<jernoble> https://w3c.github.io/fingerprinting-guidance/
jernoble: there are other mitigations for fingerprinting coming out of the Privacy WG. We should read this document to come up with the best possible solution
markw: I agree completely. We should characterize the fingerprinting risk and refer to the other guidance for what UAs should do about it
jon: When people are sitting in front of the TV, every question you ask them is an intrusion. We shouldn't ask questions that are hard to explain to them. Fingerprinting would be difficult to explain.
gurpreetvirdi: Is there a recommended way we should proceed here based on the guidance?
jernoble: The document talks
about how to classify fingerprinting risk. There are also other
things like "is it likely that there will be unique
combinations of displayCapabilities that will increase
entropy?"
... if the API lets you detect things that have been
configured differently (different headsets/monitors), then we
are increasing entropy and the consequence of fingerprinting
risk is much higher
... I'm not sure if we've gone through the guidance document
yet and have decided whether there is a large amount of risk or
a small amount of risk
... we don't know until we've gone through the actual effort to
go through the document
... before we talk about mitigation, it may be worthwhile to
look through this document to see how severe the risk may or
may not be
<scribe> ACTION: vi-dot-cpp and gurpreetvirdi to review the fingerprinting document and to come up with a position on where we stand in terms of risk
gurpreetvirdi: We will then
update the GitHub issue with our findings
... does anybody else have feedback on this?
chcunningham: In the GitHub issue, there was a push for a boolean due to fingerprinting concerns
... there was a discussion on whether the boolean is actually
functional. I wanted to ask Jon if the boolean on screen is
sufficient
jon: With the latest comment from markw, yes. If the display is not SDR, the content provider will assume it's a better screen. And if it is SDR, then we likely want to provide an SDR stream
chcunningham: It'll be fun to propose the boolean
Jon: BT709 vs. BT2020. Our
thought would be that BT709 is true and anything else is false
or the other way around depending on how you name the
boolean
... 709 to 2020 is much easier than 2020 to 709
vi-dot-cpp: Can we close on this now so that we have a full proposal that we can take through fingerprinting exercise?
markw: My comment is that if everything, including these enums, now sits on the decodingInfo() side, then decodingInfo() actually checks renderer and display capabilities, not just decoding
... with that understanding, is decodingInfo() the right name
for the method?
chcunningham: History is that
this API used to be called mediaCapabilities.query(). I take
your point that decoding and rendering are not the same thing.
I'm not sure I'm at peace with encodingInfo API
... Happy to reconsider calling this renderingInfo(). Also okay
with pretending the word decoding means playback
vi-dot-cpp: is playback reasonable?
jernoble: I'm not sure why
casting is implemented differently. If things are run on a
Chromecast versus a normal UA, there shouldn't be different
code required
... what I'm wondering here is whether we can remove the
wording around casting devices checking display
markw: depends on the capabilities of the casting device. If the casting device needs to offload to the display, then we need to know what the display can support
jernoble: This is fallout from moving to a boolean for HDR capabilities because this doesn't give enough information about mapping from input to output types
markw: Having more details on the
window.screen side doesn't solve the problem either. You can
determine what the decoder and the display supports, but you
don't know what the renderer (or the piece in the middle)
supports
... by lumping renderer and decoder together, you solve that,
but then you still need to check display via the boolean to
make the decision between multiple streams
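[Editor's note] A minimal sketch of the two-step decision markw describes: decodingInfo() answers for decoder and renderer together, and a display boolean then decides between streams. The `screenIsHdr` parameter stands in for the proposed window.screen boolean, whose name the minutes do not give; `StreamCandidate` and `pickStream` are likewise hypothetical.

```typescript
// Result shape returned by decodingInfo() for one candidate stream.
interface DecodingResult {
  supported: boolean;
  smooth: boolean;
  powerEfficient: boolean;
}

interface StreamCandidate {
  name: string;
  isHdr: boolean;
  result: DecodingResult; // answer covering decoder and renderer together
}

// Pick an HDR stream only when the display boolean reports a non-SDR screen
// and the combined decoder/renderer query says the stream is supported.
// Otherwise fall back to the first supported SDR stream.
function pickStream(candidates: StreamCandidate[], screenIsHdr: boolean): StreamCandidate | undefined {
  const supported = candidates.filter((c) => c.result.supported);
  if (screenIsHdr) {
    for (const c of supported) {
      if (c.isHdr) return c;
    }
  }
  for (const c of supported) {
    if (!c.isHdr) return c;
  }
  return undefined;
}

const candidates: StreamCandidate[] = [
  { name: "sdr", isHdr: false, result: { supported: true, smooth: true, powerEfficient: true } },
  { name: "hdr10", isHdr: true, result: { supported: true, smooth: true, powerEfficient: false } },
];

const choice = pickStream(candidates, true);
console.log(choice ? choice.name : "none"); // prints "hdr10"
```

The sketch deliberately ignores `smooth` and `powerEfficient` when ranking; a real player would likely weigh them too.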
chcunningham: My intention was not to call out these devices separately in the spec, but instead to motivate an API shape that could support them both
gurpreetvirdi: It sounds like we're heading towards a consensus of three enums on decodingInfo() and a boolean on window.screen
... any other concerns?
Jon: Some detailed comments on
some of the values. If you look at colorGamut, p3 and rec2020
aren't really different
... p3 is a subset of rec2020
... nobody does more than p3 (difficult to hear)
chcunningham: This comes from the CSS Media Queries world
mounir: Correct. I don't expect any screen to return rec2020. That being said, I don't think we should change what CSS is doing
<gregwhitworth> here's the CSS spec for that: https://drafts.csswg.org/css-color-4/#predefined
jernoble: The current CSS spec is hand-wavy when it comes to defining what it means to support "p3". I don't think we should remove anything and instead be forward looking here
markw: Here aren't we asking whether the system actually supports the rec2020 primaries?
jernoble: Those are kind of conflated. It's really easy to understand primaries in 2020, but this isn't quite how CSS works. CSS is more about whether the screen can render. I'd hesitate to deviate from these/the web's understanding of color
markw: We're asking whether the device can support content in a certain format. This is easy to define
Jon: I agree with markw. "Do you support it?" is a different question from "how much of it do you support?"
jernoble: Is there ever going to be a TV that can only display sRGB but advertises that it can understand the full 2020?
... we can shortcut to say it's not whether you understand
2020, but whether you're capable of rendering a specific color
gamut
... I was thinking about this on screen, not on
decodingInfo().
chcunningham: Jon brought up a
good point around whether this enum makes sense in terms of the
colors you can understand versus display
... the AV1 and VP9 codec strings describe transferFunction, but also color primaries and color matrix, which is a different structure from what we're seeing here
... how do we backport this to older codecs that don't have these things? Should color primaries and color matrix serve as the units here rather than gamut?
... rec2020 defines a set of primaries explicitly, but I'm not sure if p3 does
... there is a spec that covers these three things that is separate from any codec spec. There is a file in Chrome that has the enums defined for these
Jon: <inaudible>
<chcunningham> the spec I'm referring to - ISO 23001-8:2016
<jya> nit: only AV1 defines in the bitstream things like the transfer function, primaries etc, for vp9 you find this in the container
jernoble: In the end, do TVs actually render in 709? Or do you convert to sRGB?
Jon: <inaudible>
<chcunningham> @jya - the new vp9 string has additional bits
<chcunningham> https://www.webmproject.org/vp9/mp4/
<jya> @chcunningham I'm referring to the bytestream.
<chcunningham> ah, apologies - read too fast
Jon: <mostly inaudible comment on metadataType>
<markw> https://www.webmproject.org/vp9/mp4/
markw: Question was raised whether we should use gamut or primaries and matrix for color
<markw> https://standards.iso.org/ittf/PubliclyAvailableStandards/c069661_ISO_IEC_23001-8_2016.zip
markw: difference is that when we
say color primaries here, it's still an enum. Just has more
values
... matrix coefficients sometimes they're there and sometimes
they're not. Not certain whether we need to call these out
explicitly or not
jernoble: If this is going to be
a decoding thing, can we say "if you say you support HDR, you
must support reading all three color gamuts"?
... is that an issue for set top boxes?
markw: Not as far as I know
... we would be overloading the 2020 value to mean we want you to support 2020 as well as the most common luma/chroma profiles used
gurpreetvirdi: Lots to ponder
over. We'll publish a new proposal on the GitHub repository. I
wanted to touch on spatial audio. What's left?
... what's the process on getting sign off from key folks on
this?
<chcunningham> @jer - can you write up that proposal on the github?
jernoble: Should there be a
separation between the act of decoding and the act of
rendering? We don't have a capability to choose a rendering
device yet
... if we want to design the API to be future proof, we may
want to move that to a future API that lets you choose which
device you want to pick
... we should look at WebRTC community for this as they're
doing some work here
... it might be best to remove the language about "can you
render spatialized audio" to "can you decode it"
... if you make it hand-wavy enough, then UAs can implement it
as both decode and render
... then we can move spatial audio queries to device picker
later
chcunningham: If we do that, is
this question already answered?
... for Dolby, there's Atmos and DD Plus. I believe the codec
strings are unique.
Isuru: Correction, there are different codec strings. Atmos can be embedded into multiple codecs so there's no specific codec string that can be used to detect this
jernoble: Ah. So Atmos is metadata. So you can have a file of the same codec with spatialization and without?
Isuru: Correct
jernoble: So we will still need the boolean. Does that answer your question chcunningham?
chcunningham: Yes. So even if there is a future where you have a device picker, we'd still need this boolean?
jernoble: Correct
Isuru: There's also DTS:X
jernoble: Great. Then this
boolean makes sense
... We might want to say something about "default destination"
to future proof the API
mounir: WebRTC has the idea of a deviceId. Could we imagine the spec returning a deviceId that will match the device that matches the specified configuration?
jernoble: We may want to reach out to WebRTC to see if we can solve this problem together. Let's make the language hand-wavy enough to say that if the "current device" is able to decode spatial audio, then we're good
<scribe> ACTION: vi-dot-cpp to make the proposed changes to spatial audio issue
gurpreetvirdi: Assuming we edit this text, what is the process for sign off?
mounir: In general, as long as folks are in agreement, either chcunningham or I can merge
gurpreetvirdi: Thanks again everyone for an ad hoc meeting. Regrets that I can't make it to TPAC, but thanks for meeting!
<scribe> ACTION: jernoble to review vi-dot-cpp's PR and provide feedback that vi-dot-cpp can incorporate