W3C

– DRAFT –
WebML WG Teleconference – 27 June 2024

27 June 2024

Attendees

Present
Anssi_Kostiainen, Austin_Sullivan, Bryan_Bernhart, Deepti_Gandluri, Dominique_Hazael-Massieux, Dwayne_Robinson, Joshua_Bell, Joshua_Lochner, Michael_McCool, Rob_Kochman, Zoltan_Kis
Regrets
Ningxin_Hu, Rafael_Cintron
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

anssik: please welcome our most recent new participants Natasha Gaitonde and Andrew Nolan to the WebML WG, both representing Microsoft

Announcements

WebML WG F2F at TPAC 2024

W3C TPAC 2024 main site

WebML WG F2F at TPAC 2024 issue for questions, comments, early agenda ideation

<gb> Issue 23 WebML WG Hybrid Meeting at TPAC 2024 (by reillyeon)

anssik: W3C TPAC 2024 takes place in Anaheim, CA, USA at the Hilton Anaheim on 23–27 September 2024
… The WebML WG will meet F2F on Monday, 23 September 2024, 09:00–18:00 PDT. We can split this time flexibly to cater for possible remote participants, please let me know any special requirements timing-wise
… e.g. preference for mornings vs evenings for particular topics
… WebML WG meets on the first day of the TPAC week

<jsbell> I'll be present in person. Conflicts with Web Apps on the 23rd, so may end up bouncing back and forth as specific agenda firms up.

anssik: I'd like to hear from you how you'd like to use our F2F time together and start ideating on the broad strokes of our F2F agenda
… good TPAC topics are higher-level topics that benefit from sync discussion
… the expectation is not to do line-by-line PR reviews at the meeting, those are better done in an async manner
… TPAC is of course an opportunity to network and resolve bigger technical and also social issues, including across the groups
… I've seen many long-running debates being settled at F2F meetings
… the W3C-level expectation is we'll announce the detailed F2F agenda at the latest two weeks in advance, but we can definitely start drafting already now, it helps people who may want to join but don't follow our group's work closely

dom: agenda sent in advance helps if people are participating remotely, they can ask to adjust timing

anssik: if this is your first TPAC, please check the schedule for other meetings of interest that take place during the TPAC week and feel free to request to join them as an observer

TPAC 2024 calendar

dom: on Wednesday there's a full day of breakout sessions

TPAC 2024 - Breakout sessions

<jsbell> +1

dom: this is when new ideas and proposals are discussed, if you've never been at TPAC, I strongly encourage you to be there on Wednesday at least

anssik: usually, the event hotel sells out fast so please check the hotel booking instructions, I've heard there are other events at the same time in the city

Hotel booking w/ special rate

anssik: instructions on how to register with a special rate are available at the TPAC site:

Registration for TPAC 2024

anssik: the most urgent things to do are hotel, possible flight bookings, and registration

anssik: hotel cancellation policy is 72 hours prior to arrival, gives you flexibility should your plans change

anssik: I propose we use the existing TPAC meeting issue to solicit proposals for meeting agenda:

WebML WG F2F at TPAC 2024 issue

<gb> Issue 23 WebML WG Hybrid Meeting at TPAC 2024 (by reillyeon)

anssik: in August I plan to produce a more fully baked agenda to be shared with the entire W3C community
… any questions, comments at this stage, higher-level topics you'd like to see on our F2F agenda or things you'd be interested to drive?
… we can also provide slots for a few presentations on relevant topics to prime discussions, but traditionally TPAC F2F meetings have been interactive discussions on interesting topics rather than one-way presentations

Summer break

anssik: we'll take a break in July and will resume our WG telcons 8 August 2024
… as usual GH remains open 24/7 and welcomes your contributions while the meetings are on a pause
… non-controversial changes that receive appropriate review are OK to merge while our meetings are on a pause
… we'll do a debrief of PRs merged in our Aug telcon to keep everyone on top

Transformers.js update

Slideset: https://lists.w3.org/Archives/Public/www-archive/2024Jun/att-0004/Transformers.js-update.pdf

[Slide 1]

[Slide 2]

[Slide 3]

[Slide 4]

[Slide 5]

[Slide 6]

[Slide 7]

<dom> Joshua: was able to achieve a 16x speedup on my Windows 10 machine with an RTX GPU

[Slide 8]

<dom> Joshua: with fp16, got 100x speedup (when available)

[Slide 9]

[Slide 10]

[Slide 11]

<dom> Whisper WebGPU demo

[Slide 12]

[Slide 13]

[Slide 14]

[Slide 15]

<dom> Joshua: with Depth Anything v2, we get real time depth estimation in the browser

[Slide 16]

<dom> Joshua: WebGPU brought 50x speed up on background removal

[Slide 17]

[Slide 18]

anssik: Thanks Joshua for this exciting update! Questions?

McCool: are you aware of the MLBuffer proposal? Is there a plan for a WebNN backend?

Joshua_Lochner: have been thinking of alternative WebGPU ML inference library backends; currently relying a lot on the ONNX Runtime Web implementation, so not closely following MLBuffer developments
… WebNN support to be handled by ONNX RT
… working with ONNX RT contributors on the WebNN backend for ONNX RT

jsbell: what is needed from WebNN? it sounds like ONNX RT Web support?

Joshua_Lochner: correct

ningxin: some models require dynamic shapes, but WebNN is static-shape only; is this a gap we should address?

Joshua_Lochner: one useful benefit of dynamic shapes is for vision models, letting the user select the image size they want to process; otherwise you precompile the ONNX model with static shapes
… for decoder models that pass key values, dynamic shape is important if you want to use a dynamic cache
… maybe there are ways to export models so users export a model, or define it to take small images; in my case for Depth Anything I could select what range to use; there are other ways to define the model with static shapes that work for your use case

dom: you described perf improvements, compared to Wasm?

Joshua_Lochner: correct

dom: any guesses how much perf boost you'd get from WebNN?

Joshua_Lochner: you'd get low-level access so perf would be a lot better, but haven't been able to experiment with it yet; it depends on ONNX RT implementation optimization, will collaborate with the ONNX RT team on benchmarks

dom: as part of TPAC another effort is to collect recorded videos of demos of technologies groups are working on, but if we are able to have some early benchmarks showing performance improvements applied to Transformers.js, that'd be great

Joshua_Lochner: we can talk with ONNX RT team about this opportunity

Device selection

anssik: on our last call we agreed to evolve MLContextOptions and other API controls for device selection informed by further implementation experience and new use cases from the wider web community
… the spec has a Device selection section and defines MLContextOptions API surface for hints:

Device selection

MLContextOptions

anssik: per our earlier decision, we concluded MLContextOptions is under active development and added an explicit inline issue to the spec to solicit further feedback

"MLContextOptions is under active development" spec issue
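As a minimal sketch of the MLContextOptions hints under discussion: `buildContextOptions` is a hypothetical helper added for illustration; the `deviceType` and `powerPreference` option names follow the spec at the time, but their behavior is exactly what this workstream may change:

```javascript
// Hypothetical helper producing an MLContextOptions dictionary.
// deviceType ('cpu' | 'gpu' | 'npu') and powerPreference are hints,
// not guarantees, per the current spec text.
function buildContextOptions(deviceType = 'gpu', powerPreference = 'default') {
  return { deviceType, powerPreference };
}

// In a WebNN-capable browser:
// const context = await navigator.ml.createContext(buildContextOptions('npu'));
```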

anssik: I'd like to now kick off a new workstream to further explore future-proof device selection mechanisms and assess early design proposals
… (there's a PR #712 to add "device selection" GH label to help us keep track of proposals)

<gb> Pull Request 712 Process: Add "device selection" label (by anssiko)

anssik: as discussed, a number of early proposals and ideas are already documented in issue #623

<gb> Issue 623 WebNN should support NPU and QDQ operations (by wchao1115) [v2] [opset] [feature request]

anssik: additional proposals are welcome in that issue, or in new issues if they're self-contained
… for example, MikeW provided feedback in PR #712 that should be its own "device selection" related issue
… MikeW please open a separate issue for your feedback so the group can discuss it carefully, this comment:

https://github.com/webmachinelearning/webnn/pull/696#pullrequestreview-2123677923

<gb> MERGED Pull Request 696 Add MLDeviceType npu (by fdwr)

anssik: the proposals shared with the group for consideration in issue #623 include:
… - a fallback device
… - multiple devices in a preferred order
… - an exclusion of a specific device
… also discussed in this context were:
… - error handling
… - ultimate fallback
… - quantized operators
… I'd like to ask interested group participants to evaluate these proposals and explore them further so we can discuss them in the group to drive toward consensus
… I'd also want to encourage participants to identify and share additional use cases for device selection
… Ningxin shared the following device selection use cases earlier:
… - compute offloading: a game engine may want to run ML tasks on CPU to avoid interfering with the GPU time budget. See WebNN API in gaming scenarios discussion of WG 2022/04/07.

WebML WG 2022-04-07 telcon: WebNN API in gaming scenarios
… - op fallback: an ML framework may want to create a CPU context with a fallback-to-Wasm option, which would avoid expensive cross-device tensor data copies between WebNN graph inference and Wasm operator execution. Custom operations #6 and Support CPU - WebAssembly scenario of the op level execution use case #156

<gb> Issue 6 Custom operations (by dsmilkov) [v2]

<gb> CLOSED Issue 156 Support CPU - WebAssembly scenario of the op level execution use case (by huningxin)
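The two use cases above can be sketched as a device-type choice at context creation. `chooseDeviceType` is a hypothetical helper for illustration only; WebNN itself only takes the hint via MLContextOptions, and any fallback to Wasm is the framework's responsibility, not the API's:

```javascript
// Hypothetical mapping from scenario to MLContextOptions deviceType hint.
function chooseDeviceType(scenario) {
  switch (scenario) {
    case 'game-engine': // compute offloading: keep ML off the GPU time budget
      return 'cpu';
    case 'op-fallback': // CPU context avoids cross-device tensor copies;
      return 'cpu';     // the framework falls back to Wasm for unsupported ops
    default:
      return 'gpu';
  }
}

// In a WebNN-capable browser:
// const context = await navigator.ml.createContext({
//   deviceType: chooseDeviceType('game-engine') });
```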

RafaelCintron: in Chromium, CLs have landed for the "npu" device type; we're testing with independent HW vendors on how it works across models

ningxin: WebNN samples have landed the "npu" device type for image classification with fp16 models for MobileNet, SqueezeNet, ResNet
… you can test with those samples already
… object detection, part of WebNN samples, also has fp16 model for NPU device type testing

<ningxin> https://webmachinelearning.github.io/webnn-samples/image_classification/index.html has npu device type and fp16 models support

MLBuffer

Direct buffer sharing proposal

anssik: issue #688

<gb> Issue 688 [MLBuffer] Support interop with WebGPU (by bbernhar) [webgpu interop]

anssik: I believe our next step is still to talk to the WebGPU group
… Rafael, Bryan, MikeW to be the messengers toward WebGPU group
… anything specific to discuss for the direct buffer sharing proposal since the last discussion?

Bryan: correct, another member of my team has been working on the implementation proposal

<ningxin> https://webmachinelearning.github.io/webnn-samples/object_detection/ also has npu device type and fp16 model (SSD MobileNet) for testing

MLGraphBuilder and MLBuffer construction alignment proposal

anssik: issue #697

<gb> Issue 697 Inconsistency between MLGraphBuilder and MLBuffer construction (by reillyeon) [webgpu interop]

anssik: Bryan has made a proposal on the MLGraphBuilder and MLBuffer construction alignment issue:
… createBuffer() returns a promise
… It would:
… 1. Make MLBuffer construction consistent.
… 2. Offer a better OOM debugging experience.
… feedback or comments? Can we move forward with this proposal? Ningxin gave a thumbs up.
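A sketch of why a promise-returning createBuffer() helps OOM debugging: allocation failures surface as rejections at the call site instead of as deferred errors. `mockContext`, its descriptor shape, and the size limit below are purely illustrative; the real API is still under discussion in issue #697:

```javascript
// Mock of a context whose createBuffer() returns a promise, as proposed.
const mockContext = {
  async createBuffer(descriptor) {
    // Simulate an out-of-memory failure at construction time.
    if (descriptor.size > 2 ** 30) {
      throw new Error('QuotaExceededError: buffer allocation failed');
    }
    return { size: descriptor.size };
  },
};

// Usage: await mockContext.createBuffer({ size: 1024 }) resolves;
// an oversized request rejects, giving a clear OOM signal to debug.
```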

Bryan: feedback would be great

TAG feedback: resource contention considerations for GPU/CPU/accelerators (last call revised)

TAG review feedback issue

<gb> CLOSED Issue 933 Updated review of WebNN API (by dontcallmedom) [Priority: urgent] [Progress: review complete] [Review type: small delta] [Review type: horizontal review] [Venue: WebML CG] [Resolution: satisfied with concerns] [Mode: breakout]

<gb> … [Topic: Machine Learning] [Focus: Web architecture (pending)] [Focus: Security (pending)] [Focus: Privacy (pending)]

anssik: TAG was happy to see our group was responsive to their feedback
… special thanks to Rafael and Reilly for chiming in on that TAG review issue
… TAG review is now resolved as "satisfied with concerns"

TAG review resolution "satisfied with concerns"

anssik: quoting the TAG response for the concerns part:

"Our main concern is: has this API considered the full range of hardware that it might need to run on? We see this running on CPUs without neural processing extensions, GPUs without extensions, CPUs with extensions, GPUs with extensions, and dedicated ML hardware. What steps have you taken to ensure that this runs across all of these targets, considering the range of hardware that exists and might exist?"

"Our second and related concern is about multi-implementer support. If this is going to be ubiquitous as an approach to "do NN on the web" then it really needs to be implemented across different platforms and different hardware."

anssik: I believe the first concern we can address by demonstrating wide implementation experience across hardware, we're doing this already and should keep on diversifying our implementations
… for the second concern, we are well positioned as a group, we have participants from nearly all major browsers (except Mozilla), all major OS vendors, and nearly all hardware vendors who have products in this space, and the group is open for new participants from the ecosystem who wish to join

<RafaelCintron> +1 to what Anssi said

anssik: concretely, to satisfy TAG review feedback, it looks like we should add the following to the WebNN spec:
… - implementation considerations for GPU but also for CPU and ML accelerators based on Rafael's and Reilly's contributions
… - a security considerations section for denial-of-service attacks, similar to WebGPU
… another thing we should continue to do is further diversify our implementation experience and our participation -- we're doing the right things on this front so we just have to keep up the momentum
… any comments on our TAG review response? Whoever is first to craft a PR to address the TAG review feedback will surely get a virtual TAG Thank You Acknowledgement :-)

dom: quick reaction to the first concern, maybe there's something deeper to look into, what we do from an architecture perspective to make sure WebNN will adapt as new hardware emerges
… extending to NPU, other future purpose-built hardware
… how well does WebNN fall back if the requested hardware does not exist?
… what is the impact of WebNN on low-end devices that don't have fancy hardware acceleration capabilities?

anssik: should our response be in the explainer or in the spec?

dom: depends, good to first write it up and then see where to land it

Open issue and PRs

Debrief on PRs merged recently

Recently merged PRs

anssik: issue n/a fixed by PR #708

<gb> MERGED Pull Request 708 Link fix: Link to "is empty" not "empty" when an adjective is intended (by inexorabletash)

anssik: issue n/a fixed by PR #709

<gb> MERGED Pull Request 709 Write example decompositions as complete functions (by inexorabletash)

anssik: issue n/a fixed by PR #710

<gb> MERGED Pull Request 710 Style: Consistently use backticks for expressions in prose (by inexorabletash)

anssik: issue n/a fixed by PR #711

<gb> MERGED Pull Request 711 Remove unnecessary subsections, make order consistent (by inexorabletash)

anssik: thanks to Josh for these PRs and Dwayne, Ningxin, Zoltan, others for timely review

jsbell: nothing big, no need to discuss these PRs today

See you in August

anssik: Thank You all for all the great work done in the first half of 2024!
… it's been a super busy time for our group and we've accomplished a lot, e.g.:
… the group delivered a major spec update in a new Candidate Recommendation with transformers support
… the group participation continued to grow and diversify
… the group participants advanced and diversified implementations significantly
… the group's work got acknowledged outside this WG in major events and keynotes, quite a compliment and very humbling
… the second half looks super exciting, and we'll have a F2F coming up in September!
… it is my privilege to chair this group of dedicated, professional and ambitious people who like to solve hard problems and deliver
… thank you all; to those of you taking time off during this period, have a relaxing summer
… See you all in August!

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).

