W3C

– DRAFT –
Media WG Call

09 July 2024

Attendees

Present
Bernard_Aboba, Eugene_Zemtsov, Francois_Daoust, Greg_Freedman, Jean-Yves_Avenard, Jianjun_Zhu, Tommy_Steimel
Regrets
-
Chair
Chris
Scribe
cpn, tidoust

Meeting minutes

cpn: [reviewing agenda items, Media Session, Media Capabilities, and TPAC planning]

eugene: Some proposal to add some stuff to VideoFrameMetadata

Media Session

Add voiceactivity action

jianjun: When we have conferencing activity, we propose to add a new action to Media Session to do voice activity detection.
… This event was discussed in the Media Capture Extensions spec, but I was encouraged to move it to the Media Session API because of the setMicrophoneActive method.
… With the method, we can easily enable a microphone after the event is received.
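A minimal sketch of how a page might use the proposed action, assuming it is exposed as a "voiceactivity" Media Session action (the name is taken from the PR title and the final shape may differ). The SessionLike interface and the unmuteTrack callback are hypothetical stand-ins, used so the snippet does not depend on browser typings.

```typescript
// Hypothetical stand-in for the parts of navigator.mediaSession used here.
// setActionHandler and setMicrophoneActive exist in the Media Session API;
// the "voiceactivity" action is the proposal under discussion, not a shipped feature.
interface SessionLike {
  setActionHandler(action: string, handler: (() => void) | null): void;
  setMicrophoneActive(active: boolean): void;
}

// Registers a handler that runs when the user agent reports voice activity.
// unmuteTrack is a hypothetical app callback, e.g. one that re-enables an audio track.
function unmuteOnVoiceActivity(session: SessionLike, unmuteTrack: () => void): void {
  session.setActionHandler("voiceactivity", () => {
    unmuteTrack();                     // app-level unmute
    session.setMicrophoneActive(true); // tell the user agent the page now considers the mic active
  });
}
```

In a browser, the first argument would be navigator.mediaSession. An application could just as well show a "you are muted" hint in the handler instead of unmuting directly.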

Tommy: It seems reasonable to me. I would like to run it through [missed] as well, but he's out today.
… I'm ok with this.

Jean-Yves: There was a discussion on Media Session vs. MediaStreamTrack in GitHub

cpn: Yes, and I thought the argument was reasonable.

Jean-Yves: We support this too from a WebKit perspective.

Jianjun: It looks like there is OS support in macOS and iOS.

Jean-Yves: Yes, there is a user interface showing the information currently.
… I'm showing up on behalf of Youenn to voice support.

cpn: Tommy, do you want some more time to think about it?

Tommy: I'm comfortable with this, I can approve the PR today.

cpn: I may have some minor editorial things to suggest on the PR.

Jianjun: If you have any detailed comment about this new API, please raise it on the PR.

cpn: OK, it seems we all agree on the direction this is going, that's good news.

Media Capabilities

cpn: Marcos and I were doing a review of the spec last week, and dug into a couple of issues that we'd like to discuss.

Issue #44 - powerEfficient

cpn: We looked at the issue and, based on the discussion that we've had in the group over time, we tried to add a more precise definition of what we mean by power efficient. The text builds on a suggestion Mounir made back in the day.
… Based on power draw being optimal, not restricted to hardware/software.
… Also, the return that you get does not take into account the current power source of the device, unless that has some side effects such as enabling/disabling encoding/decoding hardware.
… The intent is to clarify what we mean and make it more precise. There's a PR #221 for that.

Jean-Yves: It may be good to look at what implementations have done. I've worked on implementations for Gecko and WebKit. There is no check of the power source indeed.
… The basic check is really hardware-accelerated or not.
… For software, anything below a certain resolution is considered power efficient.
… The thought is "ok to use on a mobile device".
… I don't believe implementations actually test on power usage.
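The heuristic described here can be sketched as follows. This only illustrates the logic and is not code from Gecko or WebKit; the DecodeConfig shape and the maxSoftwarePixels cutoff are hypothetical, and real implementations choose their own boundaries.

```typescript
// Hypothetical description of a decode configuration as seen by the browser.
interface DecodeConfig {
  hardwareAccelerated: boolean; // would this configuration use a hardware decoder?
  width: number;
  height: number;
}

// Sketch of the heuristic: hardware decoding counts as power efficient, and
// software decoding counts too when the frame area is below a cutoff.
// maxSoftwarePixels is an assumed, implementation-chosen threshold.
function isPowerEfficient(config: DecodeConfig, maxSoftwarePixels: number): boolean {
  if (config.hardwareAccelerated) {
    return true;
  }
  return config.width * config.height < maxSoftwarePixels;
}
```

Under this sketch a small software-decoded stream still reports as power efficient, so the result is not simply a "hardware or not" flag.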

Eugene: For Chromium, a long time ago, Chromium measured in the lab whether to send things to the GPU or in software.
… That's how the resolution boundaries were determined, based on actual power usage. That's a heuristic, for sure.
… The browser cannot measure power draw on each and every device.
… It's not very precise, but the intent was there.
… Anyway, it is in a way motivated by power consumption, just proxied by heuristics.

Jean-Yves: It was motivated by AV1 at some point.

Eugene: I don't think it was tested for particular codecs. More about copying stuff to GPU and back.

Jean-Yves: Right. Knowing the context on how it was implemented could provide more input on what "power efficient" means.
… It is vague by definition.
… I don't think that we want a crisp definition in any case due to fingerprinting issues.

cpn: Right. What we have here does not commit us to anything. It's really about clarifying things. We're trying not to make that a query for "are you using hardware decoding or not?"
… My question is: does the rephrasing make an improvement to the spec, or should we go further?

Bernard: I think it explains what we see in practice.
… And in particular the fact that it does not equal hardware-accelerated.

cpn: That was the goal, yes.
… The PR is up. Please do review and comment.

Eugene: I approved it!

Jean-Yves: No more substantive comment from me.

cpn: OK, I'll take this as approved then.

Interoperability of MIME type handling

cpn: The next issue we looked at was #69, which started from a comment from Anne about not using "valid MIME type".
… Looking at implementations, Chrome works per spec, rejecting an invalid MIME type
… But both Firefox and WebKit accept the invalid MIME type.
… This is reflected in a test in WPT.
… We need alignment.
… Two ways: make Firefox and Safari implementations more strict.
… Or relax the implementation in Chrome.
… It's not about the codec. Chrome would accept the "audio/mpeg;" string if it did not have the trailing ";".

Jean-Yves: adding a WPT is fine and it's up to every implementer to align.

cpn: I'm not sure a new test is needed.
… There's follow up to be done there.
… We started a PR. There was a previous PR a while ago, to turn this into an algorithmic kind of step.

Bernard: We're in the process of closing the RTP payload registry that the spec references.
… The reference will need to be updated.
… There was confusion between two sources. We're merging things up to avoid it.

cpn: The MIME type parsing is dependent on the codec for parameters. I think we want to point at this registry to avoid going into details.
… This is all invoked as part of an algorithm that validates an audio/video configuration.
… You might be tempted to say little about the MIME type.
… At the other extreme, you may be more strict.
… We're trying to be somewhere in between.
… That's my reading of what we have at the moment.

Francois: My understanding from Anne is that "valid media mime type" is a grammar check. I don't know whether browsers enforce the grammar, but other constructs might be valid per the grammar.
… mimesniff defines how to parse a mime type, so by that algorithm parsing with an added semicolon still gives you a mime type
… So we'll need to use the mimesniff algorithm, but do we add the grammar check?
… We should check the mimesniff spec

Chris: And check what implementations do

https://mimesniff.spec.whatwg.org/#ref-for-valid-mime-type%E2%91%A1

Francois: There's a similar example in the mimesniff spec for this particular case
… So I'm wondering if implementations validate the structures or just parse per mimesniff, which is what Firefox and Safari do, but Chrome seems to do something extra

Jean-Yves: You pass in the string and it gives you a structure, check whether the codecs value is there or not. The semicolon may not make a difference
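A simplified sketch of the difference being discussed, assuming a lenient parser in the spirit of the mimesniff parsing algorithm and a deliberately strict check. Neither function is the normative algorithm or any browser's code: token validation is omitted, quoted values containing ";" are not handled, and parseMimeLenient and isStrictlyFormed are hypothetical names.

```typescript
interface ParsedMime {
  type: string;
  subtype: string;
  parameters: Map<string, string>;
}

// Lenient parse: segments after ";" that are empty or lack "name=value" are
// skipped, so a trailing semicolon does not cause a failure.
// Returns null when the type or subtype is missing.
function parseMimeLenient(input: string): ParsedMime | null {
  const trimmed = input.trim();
  const semi = trimmed.indexOf(";");
  const essence = semi === -1 ? trimmed : trimmed.slice(0, semi);
  const rest: string[] = semi === -1 ? [] : trimmed.slice(semi + 1).split(";");
  const slash = essence.indexOf("/");
  if (slash <= 0) return null;
  const type = essence.slice(0, slash).toLowerCase();
  const subtype = essence.slice(slash + 1).trim().toLowerCase();
  if (subtype === "") return null;
  const parameters = new Map<string, string>();
  for (const segment of rest) {
    const eq = segment.indexOf("=");
    if (eq <= 0) continue; // empty segment or no parameter name
    const name = segment.slice(0, eq).trim().toLowerCase();
    const value = segment.slice(eq + 1).trim().replace(/^"(.*)"$/, "$1");
    if (name !== "" && value !== "" && !parameters.has(name)) {
      parameters.set(name, value); // first occurrence wins
    }
  }
  return { type, subtype, parameters };
}

// Deliberately strict shape check (an assumption, not the spec grammar):
// every ";" must be followed by a name=value pair.
function isStrictlyFormed(input: string): boolean {
  return /^[^\s;/]+\/[^\s;/]+(\s*;\s*[^\s;=]+=[^\s;]+)*$/.test(input);
}
```

With these two functions, "audio/mpeg;" parses leniently to audio/mpeg with no parameters but fails the strict check, while "audio/mpeg" passes both. A caller that only needs to know whether a codecs value was supplied can test parameters.has("codecs") on the parsed result.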

cpn: This may be something we want to pick up next time we talk with Marcos.
… It's not clear which way we should go.
… Or whether we think these differences are insignificant.
… If both are correct interpretations of the spec, then perhaps the tests are being too strict.

Other ongoing PRs

cpn: We have a bunch of open pull requests against Media Capabilities, some from Bernard and me. I'd like to get these merged in and then we can sort of continue with those other changes.
… If you could have a quick look through these PRs, that would be great.

WebCodecs - VideoFrameMetadata

PR #813

cpn: Agreement that this is useful stuff.

Bernard: The PR just defines these things but it does not include any codec behavior.
… The last piece that Youenn suggests is defining where this metadata is created.
… He suggests putting it in the source.
… These specs don't talk about VideoFrame at all.
… The question is where we might want to add this.

cpn: Perhaps requestVideoFrameCallback is not the right thing to reference.

Eugene: You may not always have receiveTime.
… When we discussed VideoFrameMetadata, there are a number of specs that want to put stuff into VideoFrame. It's expected that the WebCodecs API will not change because of this.
… At this point, as I see it, we just put extensions that are useful for other types of media capture.
… This is the extension point where other APIs add metadata. Segmentation, Face detection, etc.
… receiveTime will be available in some implementations when the video frame comes from WebRTC.
… All of them will be defined in their own specs.
… I would just put it there and leave it for other APIs to use.

Bernard: If it doesn't affect WebCodecs, why leave it there in WebCodecs and not use the registry?

Eugene: We had that discussion before. The consensus in the WG was that it would be great to advertise the metadata.

Bernard: But that was the registry.

Eugene: But this is a registry entry, right?

Bernard: Ah, ok.

cpn: Do we plan to add these times to WebCodecs at some point?

Eugene: That's the sort of can of worms that I would prefer not to open.
… This is going to be very separate discussions.
… Let people copy whatever metadata they want.
… It shouldn't be hard to copy timestamps around.
… That's what Google Meet does. Capture timestamps are used for A/V sync. They copy it over through side channels.
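A sketch of what copying timestamps around could look like, assuming metadata is a plain dictionary attached to each frame. The FrameMetadata shape is hypothetical: receiveTime is the member discussed above, captureTime is an assumed name for the capture timestamp, and neither is defined by WebCodecs itself.

```typescript
// Hypothetical metadata dictionary. All members are optional because a source
// may not provide them (for example, receiveTime only makes sense for frames
// that arrived over the network).
interface FrameMetadata {
  captureTime?: number;
  receiveTime?: number;
}

// Returns a copy of the target metadata with the timing members that are
// present on the source carried over; absent members are left as they were.
function carryTimingMetadata(source: FrameMetadata, target: FrameMetadata): FrameMetadata {
  const result: FrameMetadata = { ...target };
  if (source.captureTime !== undefined) result.captureTime = source.captureTime;
  if (source.receiveTime !== undefined) result.receiveTime = source.receiveTime;
  return result;
}
```

An application processing frames could apply this to the metadata of the input frame when constructing the output frame, so that timing information survives the processing step.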

Bernard: The PR makes sense in that it modifies the registry without touching WebCodecs.

cpn: If Media Capture is where these get surfaced.

Bernard: Media Capture Transform would be the source for these. We can change that.

cpn: Could requestVideoFrameCallback then refer back to those?

Eugene: I don't know.
… Feedback would be that we would love to see PRs against the relevant specs that promise to emit these timestamps.

cpn: So not merging this as is, only with references.
… Having these things defined at the source where they are emitted. That's aligned with Youenn's feedback, I think.

TPAC 2024

cpn: I'm preparing the agenda for TPAC 2024. We have a number of slots planned. Bernard, should we coordinate on what the WebRTC joint meeting should look like?

Bernard: Yes, we need to get human beings signed up to do the slides.

cpn: That's really what I want to get for the other time slots.
… What are the main things that you'd like to talk about? What is the best use of our time?
… Please label issues. If there are higher level discussions to have, that's great as well.

Bernard: Some discussion on references to WebCodecs, related to WebRTC, from [missed]. He'll be there in person at TPAC.

Jean-Yves: For MSE, the main issue for me is merging the tests. I haven't looked at PRs and issues in the repository for now.

cpn: Thank you all!

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).
