W3C

Media WG - TPAC 2023

11 September 2023

Attendees

Present
Alastor Wu, Andreas Tai, Andy Estes, Bernard Aboba, Chris Lorenzo, Chris Needham, Christopher Cameron, Eric Carlson, Eugene Zemtsov, Francois Daoust, Greg Freedman, Harald Alvestrand, Jean-Yves Avenard, John Riviello, Marcos Caceres, Mark Watson, Nigel Megitt, Patrick Griffis (observer), Paul Adenot, Peter Thatcher, Riju Bhaumik, Tatsuya Igarashi, Tommy Steimel, Will Law, Wolfgang Schildbach (observer), Xiaohan Wang, Youenn Fablet, Zoltan Kis
Regrets
-
Chair
-
Scribe
cpn, tidoust

Meeting minutes

Agenda

Slideset: https://lists.w3.org/Archives/Public/www-archive/2023Sep/att-0008/Media_WG_Meeting_11_Sep_2023.pdf

cpn: [goes through reminders]

[Slide 5]

cpn: I don't think that we'll have time to go through all of the specifications developed in the group. We'll prioritize.

Brief introduction of the Media WG

[Slide 6]

cpn: Media foundations on the web have created a powerful platform for media. Our group is looking into consolidating those and opening up new use cases.
… We just rechartered until end of 2025.
… New co-chair, Marcos, who has a conflict right now but will join later on today.

WebCodecs

[Slide 7]

cpn: Spec is currently a Working Draft. We have done some of the horizontal reviews. Accessibility and internationalization reviews still need to be completed, even though we believe that WebCodecs would not impact these areas.
… For each of the specs, we need to look into doing self-reviews. If anyone is interested in doing one for any of the APIs, please reach out.
… Required by W3C process.
… Not necessarily time critical.

padenot: In the Audio WG, this has happened through knowledgeable people.
… And we found small things here and there related to internationalization, so that's useful.
… No real things to fix in our case fortunately, but still very useful.

cpn: Self-review first then request to horizontal groups. That's what I propose we do.

nigel: What about the other ones?

cpn: The other ones were done already for WebCodecs.
… It does raise the question of the stage at which we're ready to enter CR.
… Scope? Implementation status?

padenot: A number of products are using it in production. It's been shown working. I'd like to see a bit more diversity in the type of applications. Some industries use features a bit more.
… There are a number of things that we don't do yet and that we will do, but not as part of the first iteration.
… E.g. related to sample rate, HDR.
… I also would like to get feedback on heavy scenarios, e.g., transcoding while doing other things at the same time. Right now, applications do WebCodecs alone.
… Of course, another implementation shipping would be useful to validate the API design.
… We're working on it in Mozilla, and also saw some code in WebKit.
… So more implementation, and more feedback on the spec.

WebCodecs Container Format

cpn: The first, more technical item is the WebCodecs Container Format

Slideset: https://lists.w3.org/Archives/Public/www-archive/2023Sep/att-0019/slides-117-moq-webcodes-container-00.pdf

[Slide 1]

Peter: We're looking into ways to point WebCodecs at a document, so that it uses the right metadata linked to the WebCodecs registry.

[Slide 2]

[Slide 3]

Bernard: CMAF is a bit too high-level; we're looking for something simpler, with less complexity

[Slide 4]

Bernard: Format for sending encoded chunks. We have a registry for video frame metadata, but not for attributes of encoded chunks.
… It would be desirable to have entries for these as well.
… to avoid duplicating an IANA registry

Peter: Third bullet point is the main one. We want to refer to the WebCodecs Codec registry.
… If this is a more general problem where people want to serialize what comes out of WebCodecs, it might be nice to have a registry specifically on metadata, but I don't know if that's worth it now.

[Slide 5]

Bernard: We have a VideoFrame metadata registry but we don't have an encoded chunk registry

cpn: Would this be in the existing registry or in a new registry?

Bernard: I think you also need a registry for that.

cpn: Would that work for the MOQ group?

Wilaw: Two formats that we support: CMAF and this specific LOC format.
… Some overlap between the two, with attributes different in different places.
… For the MOQ side, a common reference for both would be useful. Otherwise we have a very fragmented spec and test suite.

Bernard: I'm looking at the EncodedVideoChunk interface. There isn't a lot there. Type, timestamp, duration.

Peter: I'm looking at video decoder config.
… The format itself defines things that are very well-known but people may want to add metadata over time.

Bernard: So there's not going to be a need for any more EncodedVideoChunk attributes?

Peter: It's possible over time if anything gets added to the API.
… but I don't know how many will arise.
… Maybe the registry that was created recently would be the right place for this.

cpn: I think that if we can avoid maintaining a new registry, that's good.

Peter: The work on this LOC format is in early stages. It may be that we continue work in IETF and come back later on.

Bernard: To the extent that we can do the work in W3C, it's cleaner, otherwise we may end up with a number of extensions to WebCodecs defined in IETF.
… I do have a concern starting to take WebCodecs into other SDOs.

Peter: If we were to define "here is how to serialize what's coming out of WebCodecs", would this be of interest to the Media WG?

cpn: Two things: whether this is the direction we're taking, whether we need to change something in the charter. It feels like the next thing would be to capture this in an issue.
… Until we're ready to start defining these things.
… The Open Screen model gives us examples of how this could work.
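
[For context, a hypothetical sketch of pulling the serializable fields out of an EncodedVideoChunk, i.e. the data a container format like LOC would need to carry. The field set is from the WebCodecs interface; the record shape is illustrative only, the actual LOC wire format is being defined in the IETF:]

    // Gather the fields of an EncodedVideoChunk for serialization.
    function chunkToRecord(chunk: EncodedVideoChunk) {
      const data = new ArrayBuffer(chunk.byteLength);
      chunk.copyTo(data);                // copy out the encoded payload
      return {
        type: chunk.type,                // "key" or "delta"
        timestamp: chunk.timestamp,      // microseconds
        duration: chunk.duration,        // microseconds, or null
        data,
      };
    }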

API for conversion between pixel formats

Slideset: https://lists.w3.org/Archives/Public/www-archive/2023Sep/att-0008/Media_WG_Meeting_11_Sep_2023.pdf

[Slide 8]

eugene: Goal is to take any kind of VideoFrame and not have to do any kind of pixel format conversion before it gets fed into libraries such as TensorFlow.

[Slide 9]

eugene: Currently, you can copyTo() and then convert the pixel format in some technology, but that requires copies and is sort of complicated.
… or use a Canvas, but that's synchronous.
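
[For reference, a minimal sketch of the canvas-based workaround mentioned above; these are shipped APIs, but note the synchronous getImageData() readback:]

    // Status quo: draw the frame to a canvas, then read the pixels
    // back as RGBA via the synchronous getImageData().
    function frameToRGBA(frame: VideoFrame): ImageData {
      const canvas = new OffscreenCanvas(frame.displayWidth, frame.displayHeight);
      const ctx = canvas.getContext('2d');
      if (!ctx) throw new Error('2d context unavailable');
      ctx.drawImage(frame, 0, 0);
      return ctx.getImageData(0, 0, canvas.width, canvas.height);
    }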

[Slide 10]

eugene: Proposal is to extend copyTo to also do pixel format conversion as needed. Currently, this would support only RGBA and RGBX, and the original VideoFrame format.
… That's really equivalent to calling getImageData() on a canvas, which all browsers support.

[Slide 11]

eugene: [Going through code example]
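
[A sketch in the spirit of that example, assuming the proposed format option on copyTo(); the option name follows the proposal under discussion, not a shipped API:]

    // Proposed: copyTo() converts to RGBA while copying, avoiding a
    // separate conversion pass over the pixel data.
    async function getRGBAPixels(frame: VideoFrame) {
      const options = { format: 'RGBA' };     // proposed option
      const size = frame.allocationSize(options);
      const buffer = new Uint8Array(size);
      await frame.copyTo(buffer, options);    // conversion happens here
      return buffer;
    }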

[Slide 12]

eugene: We tried to do it in a more generic way, handling all possible formats and conversions. We still want to leave the door open. But just thoughts for the future, not the current proposal.
… We just want to make sure that this can be extended in the future.
… Would this address issues that people have faced?

youenn: If the format is not specified, no change?

eugene: Same as what we have right now.

youenn: And NotSupportedError for unsupported formats and color spaces?

eugene: That's a good question. Ignoring could be an option. Feature detection would still be possible.

youenn: Why not raise an error? This would seem easier.

eugene: If a user agent does not know about this extension, it cannot throw at all.

youenn: You can add a getter that would throw. That's how you can do feature detection in this case. There's a way to do it, whether you reject or not.
… For an exotic value, would you ignore it or reject?

eugene: We would like to reject.
… But I assume that browsers will simply know about these values.

youenn: OK, just going into small use cases here, which we can review at PR time.

Harald: When you set an enum, the application cannot detect that it has been passed the value at all.
… Which dictionary fields are stripped out?

youenn: Inner value throws.
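
[A sketch of the dictionary-member detection trick being referred to; this variant observes whether the new member is read rather than throwing, and the option name is the proposed one:]

    // If the UA's IDL includes the new 'format' member, dictionary
    // conversion reads it and the getter runs; older UAs never do.
    function supportsCopyToFormat(frame: VideoFrame): boolean {
      let read = false;
      const options = {
        get format() { read = true; return 'RGBA'; },
      };
      frame.allocationSize(options);
      return read;
    }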

ccameron: Was there any consideration of separating this into "convert" and "copyTo"?

eugene: We're trying to minimize copies already.
… We can imagine that we have a virtual VideoFrame that references another video frame. That's all possible but this sounds more complex.

youenn: My understanding is that you want the pixel data into some format for processing. Not because you want to construct a new VideoFrame.

eugene: Correct.

nigel: Confusing for someone who's not used to it. RGBA is 4 values. When you say pixel format, does it say something about alpha transparency?

eugene: That's really the layout of data in the memory.
… Coupled with the color space, you put all the information you need for processing.

nigel: But if you use RGBA to convert from something that uses another set of primaries, you won't get RGBA.
… I wonder if there's a better term to use than RGBA?

ccameron: I think you're imagining something like YUVA. I don't think that's a common format.

eugene: Changing the name would not change much.

jean-yves: RGBA is a format like the other ones.

nigel: I suspect that I'm not going to be the only one confused here.

padenot: It's called RGB because it has R, G, B. Common to all systems. And then you can add a color space on top of that. That's a family of color spaces, RGB.

eugene: OK, I understand what you mean. It's a name collision. Here, RGB is a memory layout thing, not a color space.

padenot: For instance, you also have BGRA, where the name tells you the ordering of the components. RGBX, with padding for transparency that is not being used.
… We did a survey of ffmpeg, Windows APIs, Linux APIs; they all use the same terms.

nigel: OK, still disturbing but a good answer.

padenot: Design is good otherwise. It needs to be possible, easy, and allow skipping copies.
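
[To illustrate: the names describe per-pixel byte order in memory, separate from color space. A small sketch, assuming a tightly-packed buffer produced by copyTo():]

    // Packed-format byte order (WebCodecs VideoPixelFormat names):
    //   'RGBA' -> [R, G, B, A]
    //   'BGRA' -> [B, G, R, A]
    //   'RGBX' -> [R, G, B, X]   // X = unused padding byte
    function pixelAt(rgba: Uint8Array, width: number, x: number, y: number) {
      const i = (y * width + x) * 4;   // 4 bytes per pixel
      return { r: rgba[i], g: rgba[i + 1], b: rgba[i + 2], a: rgba[i + 3] };
    }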

cpn: So this would also solve the sub-issues mentioned in the agenda.

ccameron: With support for non-SDR formats, is copyTo the only place where it would be observable?

padenot: Right now, yes.

ccameron: Is the expectation that there be no extra quantization when feeding into WebGPU?

padenot: Generally, this is at the discretion of the browser. Rendering is in the spec.

ccameron: Is it expected that the VideoFrame object not have lost any data?

padenot: You can tell. We had a request from someone working in Hollywood not to make any conversion. It's envisioned that that should work.

Add human face metadata to VideoFrameMetadata registry

cpn: We have a registry, which is currently empty. I'm wondering about the status.

youenn: Metadata defined by Media Capture extensions.
… We need this document to point to the Media Capture extension. I think it's ready for a PR.
… Maybe we need to rethink our extension model, but I don't think this is blocking.

Riju: Adding more segmentation types would be useful.

cpn: OK, we can follow up either during the joint meeting on Friday, or later on.

Media Source Extensions

[Slide 13]

cpn: We identified a collection of issues earlier on as v2 features. Workers (specified), changeType (specified), a number of worker-related issues that need to be tackled.
… Might be worth talking about next steps for ManagedMediaSource.
… We got a very long discussion thread

jean-yves: In the latest version of the proposal, I removed most points of contentious, such as failure for append.

<padenot> nigel (IRC): following up, and as examples, here are `ffmpeg`'s names: https://ffmpeg.org/doxygen/trunk/pixfmt_8h.html#a9a8e335cf3be472042bc9f0cf80cd4c5, and e.g. Windows: https://learn.microsoft.com/en-us/windows/win32/wic/-wic-codec-native-pixel-formats#rgbbgr-pixel-formats, etc.

jean-yves: The state of the current text is following these comments.

w3c/media-source#320

cpn: There was a question about folding into the normal MediaSource object.

jean-yves: Jer answered in the thread. I cannot recall why it's separate. It could be folded as part of the normal video element.
… It's an operation all video players will be doing.
… That particular event was added from feedback from video player folks.

jean-yves: I don't think that this particular event type should delay the rest.
… The issue is that, on the media element, for normal playback of a video, it serves no purpose.
… It could help to recalculate the "buffered" data that you may show at the bottom, but there are so many ways to do that.
… More useful when playback happens through JavaScript code.

cpn: Next step would be a pull request, then?

jean-yves: Yes, I've been asked about that.

Marcos: Do we have general consensus in the group to proceed? Nobody objects, right?

Wilaw: Any time I see ManagedMediaSource, my concern is that there is a component somewhere that has access to some information that the application does not know about.
… Then we will get to a stage where there will be some new input that the component won't understand.

jean-yves: First, ManagedMediaSource is a term that was coined some time ago, to discuss SourceBuffers that could evict content when needed and not only when appendBuffer is called.
… That's the managed part.
… That's the maximum level of management that it has.

<padenot> nigel (IRC): e.g. https://ffmpeg.org/doxygen/trunk/pixfmt_8h.html#a9a8e335cf3be472042bc9f0cf80cd4c5a7ffbd399b88a48196e3cba416eec4dac, but they have a `0` standing in for the `X`: https://ffmpeg.org/doxygen/trunk/pixfmt_8h.html#a9a8e335cf3be472042bc9f0cf80cd4c5a7ffbd399b88a48196e3cba416eec4dac

jean-yves: The other part in terms of startstreaming is to have something that lowers the power usage.
… We found that, if you're doing streaming, you will kill your battery very quickly.

<padenot> nigel (IRC): the `X` comes from Android iirc: https://developer.android.com/reference/android/graphics/PixelFormat#RGBX_8888

jean-yves: That part is purely a hint. All it tells the applications is that you're going to have a hit on the battery.
… I wouldn't focus on the name itself.

<philn> padenot: the GStreamer raw video formats are documented there fwiw :) https://gstreamer.freedesktop.org/documentation/additional/design/mediatype-video-raw.html?gi-language=c

jean-yves: SelfAutoEvictingSourceBuffer should have been the name.

eugene: Would a user agent be compliant with this new managed media source if it fires startstreaming at the very beginning and then never worries about it again?

jean-yves: yes, that's up to the user agent. Internally, we fire startstreaming, until enough buffering has happened, and then issue stopstreaming.
… The idea was to get closer to how HLS players operate. We don't want a power surge when switching from HLS to MSE-based streaming.
… Initially, we thought we would reject appends, but what we found is that switching 5G off was enough to get back to normal battery usage levels. We removed the rejection.
… These are hints. If you follow these hints on iOS, you'll get access to 5G, so quicker fetch times. If you don't follow the guidance, you're restricted to 4G and quality will drop.

eugene: So an application can ignore these events and proceed as before.

jean-yves: If you do live stream, you won't get access to 5G. Same thing as if you're using HLS for live stream, actually.
… The longer the battery lasts, the better for end users; that's the driving incentive.

cpn: Does that address your concern, Eugene?

eugene: Yes.

padenot: I'm largely positive as well.

cpn: Seems that we can proceed then.
… I think it's worth for us to have a closer look at this and get TV manufacturers input as well.

Wilaw: Also good to hear that iOS will support MSE.

jean-yves: Only ManagedMediaSource for now.

Francois: What would happen on iOS for regular MSE code?

jean-yves: No start. Will throw.

ericc: Also note user preference linked to data plan.

jean-yves: There used to be a quality attribute in the initial proposal, linked to that. I moved that out of the proposal for now based on feedback.

Wilaw: I might want to use 5G radio at the beginning of streaming to start as soon as possible. But then ok to switch to 4G.

jean-yves: Yes, in the implementation, more buffering allowed to start with. At the start, 5G will be available.

ericc: That's also part of how the user configures them.

jean-yves: Pretty much, startstreaming/endstreaming assumes that any XHR request that you're going to make is media-related, and will tag any network request this way.
… And then hands over to the underlying OS network stack.
… It's about tagging the current application transaction.

ericc: We just give the lower-level network a hint about the request that we're going to make, and it's up to it to decide what radio it's going to fire.
… We could lie, but there are other benefits for the user.
… That's a low-level policy decision.

jean-yves: We've done some experiments. Lowered the battery usage by ~50% for Big Buck Bunny. Big impact.

cpn: Matt Wolenetz left some comments on the issue.

jean-yves: Yes, did not lead to specific changes.

cpn: Good, so next step, a Pull Request!
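
[Based on the behavior described above, a minimal usage sketch of the proposed ManagedMediaSource; interface and event names follow the proposal and may change in the PR, and an HTMLVideoElement named video is assumed:]

    // The UA fires 'startstreaming' when it wants media fetched and
    // 'endstreaming' once enough is buffered; both are hints.
    const source = new ManagedMediaSource();   // proposed interface
    video.src = URL.createObjectURL(source);
    source.addEventListener('startstreaming', () => {
      // resume segment fetches; the radio (e.g. 5G) is made available
    });
    source.addEventListener('endstreaming', () => {
      // pause fetches so the radio can power down
    });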

Media Capabilities

[Slide 14]

cpn: Working Draft, i18n and security reviews to do. A number of issues that have been around for a while. I'd like to get an understanding of our current thoughts on them.
… Joint meeting with WebRTC WG on Friday.
… First one is on transition() ergonomics.
… Transition from one encoding to another encoding. In some cases, that requires re-initialization of the hardware.
… Is this something that we're still interested in solving as a group?
… Where does it sit in terms of priorities?

Wilaw: Very useful feature. DASH needs this. Some hints about whether you can decode smoothly.
… Today, you fail. You have a glitch on the video and/or audio.

cpn: I think that there was a question about the meaning of seamless.

Francois: Only spec editor is Jean-Yves so far, right?

cpn: Johannes is joining from Chrome team.

cpn: There is a particular pull request where this is being proposed.

#165

cpn: What might be more useful to look at is a couple of WebCodecs integration points.
… First one from Dale, suggesting that we align on WebCodecs terms.

#205 Replace ColorGamut and TransferFunction with VideoColorSpace?

cpn: Proposal to use a common set across WebCodecs and Media Capabilities

eugene: This is mostly renaming. No substantive change. Just aligning specs that are meant to work together.
… Nobody has shipped it yet.
… I think Dale can prepare a pull request.

Marcos: That seems good, then!

ccameron: The color gamut media query, that's implemented. The transfer function media query, I'm curious what that does.
… I wonder what API uses this.
… I don't see any value in the existence of that. Color gamut is sufficient, I think.

cpn: In media, we make the distinction. It's not only about rendering.

padenot: It's also about whether it can be made efficient. Different levels of support.
… No energy used for instance. That would tell you that.
… Not that fine-grained, but still different levels.

ccameron: To identify a video format that would be particularly efficient?

padenot: Yes.

padenot: You can unplug your GPU, be in low-battery state. Lots of variants. Not stable. Browser policy may be to fallback to software if hardware decoder crashes multiple times.

ccameron: I can also see that being useful in some mobile contexts where tone mapping is done a certain way.
… In terms of the names, I like the simple ones. With respect to media capabilities, it's nice to know whether a path will be efficient, especially if conditions change.

padenot: No, but still vastly more efficient than wild guess.

GregFreedman: This shipped in Safari some time ago. We've been using these. I'm happy to change. But noting that this shipped already.

GregFreedman: We use a combination of values for testing HDR support.

cpn: It sounds like we're ok doing this if there's no implementation issue.

eugene: Given the new evidence, we need to go back and do our due diligence.

#202 What is the interaction of media capabilities with WebCodecs

youenn: From WebRTC to WebTransport + WebCodecs, probably the encoders will be very similar. Maybe if the application wanted to know whether encoding would be smooth, it would use Media Capabilities, but can it use media capabilities for WebCodecs?
… Different modes.
… the capabilities are flagged as WebRTC specific.
… That's a question as API developer, but perhaps no need to worry about it.

padenot: The types that we have internally match. It may just be a naming issue.
… Has anybody shipped this?

youenn: Yes.

padenot: Ah, we also did :)
… We ship it as transmission.

eugene: Using WebRTC is fine. I don't expect that user agents will behave dramatically differently. As for adding new entries, we could add a generic WebCodecs entry.
… But the video frame rate can be different. Would it make such a big difference with current "webrtc" entry?
… I feel that people who care about this kind of stuff can just use "webrtc".

youenn: OK, just changing the definition to say that it's about peer-to-peer transmission.

cpn: Seems like a reasonable outcome.
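
[For reference, a minimal sketch of querying the existing "webrtc" entry, whose definition the outcome above broadens to peer-to-peer transmission generally; the API is shipped, the exact contentType string is illustrative:]

    // Ask whether real-time VP8 encoding at 720p30 would be smooth
    // and power-efficient on this device.
    const info = await navigator.mediaCapabilities.encodingInfo({
      type: 'webrtc',
      video: {
        contentType: 'video/VP8',   // illustrative codec choice
        width: 1280,
        height: 720,
        bitrate: 2_000_000,
        framerate: 30,
      },
    });
    console.log(info.supported, info.smooth, info.powerEfficient);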

#176 General approach to capability negotiation

cpn: This issue opens up the whole rationale for how we came to create Media Capabilities in the first place.
… I don't know how to answer it better than "for historical reasons, we ended up where we are".

cpn: Then there are a couple of more WebRTC-specific issues that perhaps we can pick up on Friday.

[break]

Encrypted Media Extensions

[Slide 16]

GregFreedman: Progress towards FPWD has been slow. I merged a few PRs over the last months.
… I have something to send the other co-editors. Then two issues to look at.

cpn: We have some issues tagged as v2.

cpn: It seems to me that, once we have a FPWD, we can progress the spec towards CR relatively quickly.
… Goal is to progress relatively soon on the Recommendation track, gather horizontal reviews.
… Anything that we can do to help that move along?

GregFreedman: Any discussion on HDCP would be helpful. Otherwise, I don't think any of the other issues requires group discussion.

Xiaohan: Possible to discuss new features?

cpn: We're scoped to specific features. If there are issues that could be labeled as v3, that's something that the group could do as well, although not work on them.
… Proposal is to get to FPWD, and then once we're there, we can review open issues and take a decision on all of them.

Xiaohan: Regarding HDCP, implemented in Chrome, what about horizontal review?
… It does have privacy implications.

Media Session

[Slide 18]

cpn: Far too many issues on Media Session.

youenn: Plan is to go through v1 issues.

cpn: A number of requests for new actions.

youenn: Maybe we should have a registry.

cpn: Among these, is there a sense of how we prioritize? Nothing prevents us from moving forward in theory.

[Slide 19]

cpn: There is a question on the scope of this. How would you prioritize?

youenn: Breaking changes, integration with feature policy.
… For the freezing, there's a solution where we basically keep the current implementations happy.
… enterPictureInPicture is new; better to add it sooner rather than later if we add it.

youenn: There might be a need for a use case for an action. I'm not sure we would implement them very quickly but a decision now would be good. v1 or v2, I don't know.

Tommy: We'd still be interested in this for v1.

cpn: There was a question of whether a single boolean was enough to express things.
… is a boolean "automatic" enough, or do we need an enumeration?

youenn: Or do we need automatic at all?

Tommy: We don't currently have a use case for it so can't argue too strongly, but that seems a natural thing to have.

youenn: If we're not sure that it will be used, it seems appropriate to add an action without it, then we can revisit later when there's more established demand.

Tommy: Sounds reasonable.
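
[A sketch of what registering the proposed action could look like, mirroring existing Media Session actions; per the discussion, no "automatic" option for now, and an HTMLVideoElement named video is assumed:]

    // Hypothetical: the proposed enterpictureinpicture action lets the
    // site respond when the UA asks to enter picture-in-picture.
    navigator.mediaSession.setActionHandler('enterpictureinpicture', async () => {
      await video.requestPictureInPicture();
    });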

[browsing open issues]

cpn: Is the principle here that whatever the media element does, Media Session should do the same thing?

youenn: Yes, but I don't know how HTMLMediaElement.duration handles 0.
… [looking at HTML spec]

cpn: Question was specifically about setting the value to null.

youenn: It will be changed to 0, it will never be null.

alwu: Issue was filed a long time ago. From the discussion that we have so far, the approach sounds reasonable to me.

youenn: We could throw.

padenot: It throws right now because it's a restricted number. We need it to be unrestricted.

youenn: Unrestricted changes the behavior for infinity. That's good. But it also changes the semantics for NaN. It seems fine to throw, as for negative numbers.

cpn: So we accept 0, positive, positive infinity but throw on negative and NaN.

alwu: Sounds good!
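
[The resolution, sketched in API terms; duration becomes an unrestricted double so that positive Infinity can mark a live stream:]

    // Accepted: 0, positive values, and +Infinity.
    // Rejected with a TypeError: negative values and NaN.
    navigator.mediaSession.setPositionState({
      duration: Infinity,   // live stream with no known end
      playbackRate: 1,
      position: 0,
    });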

#221 Integration with Feature Policy

youenn: Main issue is whether it is still web compatible or not to go through Feature Policy. We try to enforce deny by default.
… I don't know how it's being used today.

cpn: You mention the artwork.

youenn: It's for all the APIs. The artwork, clearly, because it comes with an origin.

Tommy: I think that makes sense. The thing I'm worried about is backwards incompatibility. Things we may be breaking.
… I don't think I have metrics on how it is being used.
… but I can try to add some.

cpn: In terms of specification, this goes into which spec?

Marcos: I think you define this in your own spec and then reference that in the registry

cpn: Confused about Feature Policy vs. Permission Policy

Marcos: We're going to merge them into a single one.

cpn: OK, it seems that we could prepare a spec change then.
… Interaction with alternate sources?

#261 mediaSession and stream srcObject?

youenn: My understanding is that this is an implementation issue more than a spec issue. Chrome does not support live streaming, for instance. I think WebKit does support it.
… Would it be fine to say that it's an implementation issue and clarify in the spec that Media Session is for all media sources?
… We should write tests.

Tommy: I think it makes sense. I'll need to think this through.

cpn: OK, let us know if there's a type of srcObject where it doesn't make sense.

audio-session#11 Does media session work with WebAudio?

cpn: This got moved to Audio Session, but seems more related to Media Session.
… Interaction with Web Audio, Audio Session, and then something else playing in the background, and how that gets exposed into a media session.
… I don't think you would want a media session for transient audio

youenn: Exactly.
… Can we agree on when something becomes a media session? That's the question.
… To me, it's not really part of v1. I think that we have time there.
… Do we want to define somehow how a media session becomes active given use of Web Audio? Or do we want to leave that up to user agents?

Tommy: I don't know.

cpn: One way forward is to wait until we have an Audio Session draft and reconsider once we're at that point.

[Slide 21]

cpn: We have some bigger issues, no time to look into it now though.

Audio Session

[Slide 22]

alwu: Question is how do we define the MVP?
… Android and iOS platforms being the first targets. But I also wonder about Windows and Linux where the feature could be slightly different? Do we want to focus on mobile platforms to start with?

youenn: My recommendation would be to include these platforms. Linux, Windows, macOS. It's true that the API related to interruption will not happen there by default.
… But other features will still be useful.
… I would concentrate on which API we want.
… Safari is already shipping part of this API on iOS, but also on macOS I think.

Francois: I note that we generally try not to target specific platforms

youenn: My understanding is that, if we focus on type and state, that's the scope. Still lots of edge cases to dig into.
… Only one default audio session.
… Describe the algorithm as steps in a way that is precise enough so that different implementations will behave the same on the same platform for instance.

alwu: I think for MVP discussion, current spec is already enough and I can leave input for future discussions.

cpn: In terms of filling out the spec?

youenn: Alastor and I need to sync up. Paul mentioned a partial implementation in Firefox. We have one in Safari. Bringing things together will be beneficial.
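
[For reference, a minimal sketch of the type-and-state surface discussed above; attribute and event names follow the current Audio Session draft and are still subject to change:]

    // Declare what kind of audio the page produces, then observe
    // platform interruptions (e.g. an incoming call) via state changes.
    navigator.audioSession.type = 'playback';
    navigator.audioSession.addEventListener('statechange', () => {
      // state is one of 'inactive', 'active', 'interrupted'
      console.log(navigator.audioSession.state);
    });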

Web Video Filter API proposal

Slideset: https://lists.w3.org/Archives/Public/www-archive/2023Sep/att-0018/TPAC_2023_VideoFilters_WebCodecs.pdf

[Slide 1]

Riju: This is an experiment to see if the idea makes sense

[Slide 2]

Riju: WebCodecs did a great job of bringing encoders and decoders to web apps
… The creator market is growing, so we're looking into filters
… Memory copies are a constant source of pain, so what if we can directly manipulate the video frames
… For visual feedback, you sometimes need to render to a canvas or WebGL, memory issues there
… I also was thinking of filters, like you can do on command line

[Slide 3]

Riju: Francois and Dom did a great job of comparing the various video processing options today
… and how processing can be included in the pipeline
… Using WASM with threads to convert OpenCV to OpenCV.js, performance was close to native CPU
… WebGL is popular for post-processing, widespread usage
… DX issue with writing shaders
… WebGPU is modern option, we worked with Google to reduce memory copy issues importing external textures
… WebNN in future can use not just the GPU to do processing
… The main issue, for all these options there's still a DX issue of writing the shaders yourself or a lot of scaffold code
… Couldn't find a way to solve that without importing library dependencies

[Slide 4]

Riju: In WebRTC, one-liner with getUserMedia - no complex scaffolding
… Capture pipeline in the native platform is invoked
… All platforms have sophisticated AI models, so why not elevate some to the web?
… Background blur
… Hooks to the native platform are a good option to check out
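
[The one-liner being referenced; getUserMedia() is a shipped API, while the backgroundBlur constraint shown is from the Media Capture extensions and not universally available:]

    // One line invokes the platform capture pipeline, including any
    // platform-provided effects exposed as constraints.
    const stream = await navigator.mediaDevices.getUserMedia({
      video: { backgroundBlur: true },   // proposed constraint
    });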

[Slide 5]

Riju: Native platforms: MS APIs for brightness, contrast, hue, saturation. Most are quite basic, low computation, memory copy as main bottleneck
… With new AI stuff coming out, complex filters are coming to the platform. We can also combine simple filters to create something more complex

[Slide 6]

Riju: Want to gauge interest here. Who are the stakeholders, can we learn their pain points
… Photoshop uses WASM and threads, maybe it makes sense to have something in their main pipeline

[Slide 7]

Riju: Strawman API proposal is a copy of the WebCodecs interface. Call into native filters
… In my PoC I use the encoder interface mainly

[Slide 8]

Riju: The VideoFilterConfig is similar to the decoder config, can treat as a base class
… Different filters have different knobs, e.g., a denoising filter with a blur factor
… If you write a denoising filter in GLSL, it'll be 200 LOC
… When there's something optimised in the native platform, why not use it directly?
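
[A hypothetical rendering of the strawman as described, mirroring the WebCodecs decoder/encoder shape; every name here is illustrative, from the proposal, not a shipped API, and inputFrame stands in for a VideoFrame from elsewhere:]

    // Strawman: a VideoFilter that calls into a native platform filter.
    const filter = new VideoFilter({
      output: (frame) => {
        // consume the filtered VideoFrame, then frame.close()
      },
      error: (e) => console.error(e),
    });
    filter.configure({ filter: 'denoise', blurFactor: 0.5 });  // knobs vary per filter
    filter.filter(inputFrame);   // enqueue a VideoFrame for processing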

[Slide 9]

Riju: Here are some examples, forking the WebRTC samples and adding filter code

[Slide 10]

Riju: Most OSs expose a filter pipeline that processes video, with a chain of branches
… Combine two video streams into one, e.g., picture in picture
… Break one video into a main output and thumbnail
… Passing one filter to another can be more efficient
… Looked at gstreamer's plugin pipeline, create plugin elements

[Slide 11]

Riju: Here's a code example leveraging a streams API, breakout insertable streams
… Pipeline exposes one or more readableStreams, allows data to pass in and out
… Keep streams as data agnostic
… This is just a thought, questions like how to query a set of filters, data formats, passing metadata, signalling, buffering
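
[A sketch of the pipeline idea using today's insertable-streams primitives; MediaStreamTrackProcessor/Generator ship in Chromium, while cameraTrack and applyNativeFilter() stand in for the hypothetical native-filter hook:]

    // Breakout-box style pipeline: camera track -> filter -> output track.
    const processor = new MediaStreamTrackProcessor({ track: cameraTrack });
    const generator = new MediaStreamTrackGenerator({ kind: 'video' });
    const denoise = new TransformStream({
      async transform(frame, controller) {
        const filtered = await applyNativeFilter(frame);  // hypothetical
        frame.close();
        controller.enqueue(filtered);
      },
    });
    processor.readable.pipeThrough(denoise).pipeTo(generator.writable);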

[Slide 12]

Riju: What next? I'd like some feedback, is this part useful at all?

Eugene: You're trying to compete with very heavy machinery. HTML canvas has this kind of feature
… It's interesting to learn how this API is different from creating a canvas and drawing video frames into it
… Canvas features exist. And on pixel format conversions: if you're trying to leverage OS features, there'll be forced pixel format conversions

Riju: Another point: apart from the OS features in the list, vendors have their own media SDKs. When GPU people bring out new features, a filter API is a way to try them out

Eugene: We'll be a couple years behind OS features

cpn: My organization was inspired by the Web Audio API to write a library for video. This was doing the exact same kind of things, but not using any natively provided features. It worked, but with poor performance.
… We haven't tried to rewrite it with WebCodecs.
… I don't know whether that proposal solves our use cases. In principle, it's something that we'd be interested to look at.
… If there's a set of operations that are accelerated and that we could leverage from the operating system, that would be great.

Francois: Wondering about fallback if there's no support for a specific feature? Could be a pain to maintain code that handles both cases

Paul: Also, performance can make or break the use case. If you don't have hardware support you may not meet the frame rate. Sometimes it either works or it doesn't; no fallback is possible

Media Session Coordinator

Slideset: https://lists.w3.org/Archives/Public/www-archive/2023Sep/att-0020/media-session-coordinator-tpac-2023.pdf

[Slide 1]

Andy: I'll cover the concepts of shared media, Apple's implementation, how we expose it to the web, and how it can be improved

[Slide 4]

Andy: Features like SharePlay involve coordination of playback across users. Not frame-for-frame synchronised, like for a video wall, but users are at the same point in the playback
… and any user can play or pause or seek, and these are reflected to all users in the shared session
… There's also a shared communication channel, text or video
… Other platforms, Google Duo supports live sharing. Many streaming platforms have watch party features, Hulu, etc

[Slide 6]

Andy: SharePlay is Apple's platform implementation, based on group FaceTime
… there's a public framework in Apple's SDK. Supports watch together, listen together, screen sharing, and collaborative editing using the group messaging primitive
… Communication side channel of the Group FaceTime call or shared text messages

[Slide 8]

Andy: On the Web, can have a native app and web app, so we've exposed this capability as MediaSessionCoordinator

[Slide 9]

Andy: It's a new property on MediaSession. One thing you can do is join or leave a session. The API exposes state whether a session is available to join

[Slide 10]

Andy: Listen for changes and react, by joining for instance
… We have shared playback controls

[Slide 11]

Andy: How it looks in an implementation, listen for a coordinatorstatechange event, provide an affordance to allow the user to join, show they're joined, or leave the session
… In order to support coordination, instead of playing the media directly you need to ask the coordinator to start playback
… Browser controls don't allow you to override the actions, so needs custom controls. We want to improve that in future

[Slide 12]

Andy: This ties into MediaSession, when you ask the coordinator to play it makes sure all participants are ready, e.g., by loading the media, and play simultaneously
… Feedback to the app via the normal MediaSession action handlers, e.g., by dispatching a 'play' action
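
[A sketch of the surface as described in the slides; names follow WebKit's MediaSessionCoordinator proposal, which is not broadly shipped, and video and joinButton are assumed page elements:]

    // Join a shared session when one becomes available, and route
    // playback through the coordinator instead of the media element.
    const coordinator = navigator.mediaSession.coordinator;
    navigator.mediaSession.addEventListener('coordinatorstatechange', async () => {
      if (coordinator.state === 'waiting') {
        await coordinator.join();          // user chose to join
      }
    });
    joinButton.onclick = () => coordinator.play();  // all participants start together
    navigator.mediaSession.setActionHandler('play', () => video.play());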

[Slide 14]

Andy: Improvements we can make: session initiation, the API lets you join an existing session, apps can initiate the session so websites should be able to also
… Support for standard controls. A more limited API surface could associate a media element with a coordinator, then the controls work with the coordinator under the hood
… Having an interoperable implementation interests us. We'd be happy to see other platform technologies supported by an API
… Any questions?

cpn: At the BBC, we built a watch party feature using our own sync server. It has the functionalities you describe.
… We exchange messages for the synchronization.
… How would the Media Session Coordinator integrate with something that we build ourselves?
… What advantages are there for moving from the application level to the browser level?

Andy: Those details are somewhat out of scope; the sync engine we use is not part of it, the spec includes just the commands
… It would be possible to polyfill an implementation using another sync engine
… There could be room for a companion spec that describes the sync side of the problem, rather than playback and control
… So BBC could provide an implementation behind a standard API
… But we saw that as out of scope

Eric: Chris, was the solution you described app-based or web-based?

Chris: Only on web

Eric: We decided to make this proposal, as we had the platform framework to do the group sync, so this is about bringing it to the web ... so people with apps can join sessions with people on web
… So you could write an app or web client. You'd also have to handle all the server logic

Xiaohan: As for who's going to do the coordination, the actual sync has to happen somewhere. If there's no general support in the OS, I'm not sure how it can be easily implemented

Andy: That's a fair concern. The landscape of these technologies is that there is no general multi-platform solution. It could be tied to a particular platform, FaceTime or iMessage
… Not sure if MediaSessionCoordinator needs to invent a general solution to be successful or simply provide the existing platform capabilities for users who've chosen to use that feature
… E.g., a group of users on Hulu who've decided to use it, just need the technology to enable it.

Xiaohan: WebRTC allows real time communication, could be the vehicle for the communication. Is there any possibility to integrate that, to have an OS independent web based solution?

Eric: I can imagine doing that with DataChannel, it could work
… We're in a similar position with the Presentation API, a browser API but all the detail below that for how you discover and communicate with devices is currently platform specific
… I agree it's not a great place to be, you want solutions that are more platform agnostic
… It's not a problem unique to this proposal

Chris: The Second Screen WG designed the Open Screen Protocol for that

Chris: Next steps - would be interesting to see a polyfill, also are other implementers potentially interested

Chris: My sense is that the WG is interested, but needs implementation interest from others to progress

[adjourned]

Minutes manually created (not a transcript), formatted by scribe.perl version 222 (Sat Jul 22 21:57:07 2023 UTC).