Meeting minutes
Repository: webmachinelearning/webnn
<tarek> anssik I will have to leave 10 minutes earlier today, sorry
anssik: Please welcome Elena Zhelezina from ARM Limited and Matthew Atkinson & Winston Chen from Samsung Electronics to the WebML WG!
… also, please welcome Laszlo Gombos, Matthew Atkinson, Winston Chen and Raghavendra Ghatage from Samsung Electronics, and Jared Parr, a Google Cloud Partner, to the WebML CG!
Incubations summary
anssik: no summary today, our next WebML CG meeting is next Monday, 31 March 2025, 07:00–08:00 UTC:
https://
anssik: This is the EU and APAC timezone friendly WebML CG Teleconference option
… We alternate between the AMER-APAC (Tue/Wed 00:00-01:00 UTC) and this EU-APAC (Mon 07:00-08:00 UTC) option to cater to our geographically diversified participants
… Feel free to join the option that fits your schedule better
W3C Breakouts Day 2025 summary
anssik: W3C Breakouts Day 2025 took place 26 March 2025
List of all proposed sessions
… a few of us attended the "How would AI Agents change the Web platform?" session proposed by Dom
<gb> Issue 7 How would AI Agents change the Web platform? (by dontcallmedom) [session]
anssik: we had an active discussion, early exploration
… if this group is interested, I can invite Dom to have a brainstorming session with us
<Zakim> jsbell, you wanted to discuss when we start (brief announcement re: blinkon)
jsbell: questions about AI agents, someone asked, if we want to discuss this, what is the proper venue?
McCool: Anssi proposed we have an initial discussion in this group
anssik: another session relevant to this group was about "Web Platform Documentation"
Web platform documentation session
<gb> Issue 10 Web platform documentation (by Elchi3) [session] [track: browsers]
anssik: this is an opportunity to collaborate with this proposed new group to improve dev docs for WebNN API and built-in AI APIs
… we're already contributing in this space as a group, via mdn/browser-compat-data, caniuse.com, webnn-samples, test frameworks, web-platform-tests, etc.
… I'm talking with Florian Scholz who leads this effort and will connect him with interested people
… any other updates from the breakouts day?
<jsbell> https://
jsbell: wanted to share that the Chromium community has the BlinkOn conference, schedule is WIP, an AI track is proposed with WebNN and adjacent things
… allows remote attendance
McCool: is anyone working on enabling WebNN in Deno?
WebNN wide review update
anssik: WebNN API is on the W3C Recommendation Track and that comes with an expectation we request review from horizontal groups for substantive changes, typically annually
… I submitted a new round of review requests to all horizontal groups last week
… please take a look at the review requests, fine-tuning is possible and welcome because it takes a while for the reviewers to pick up
… we expect review feedback to be delivered within the next three months
… the work on the API spec continues as usual, these reviews are non-blocking
<gb> Issue 105 Web Neural Network API 2025-03-20 > 2025-06-20 (by anssiko) [CR] [REVIEW REQUESTED] [pending] [agenda+] [s:webnn]
<gb> Issue 1072 Updated review of Web Neural Network API (by anssiko) [Progress: untriaged]
<gb> Issue 258 Web Neural Network API 2025-03-20 > 2025-06-20 (by anssiko) [REVIEW REQUESTED] [CR]
<gb> Issue 156 Web Neural Network API 2025-03-20 > 2025-06-20 (by anssiko) [CR] [pending] [REVIEW REQUESTED]
<gb> Issue 85 Web Neural Network API 2025-03-20 > 2025-06-20 (by anssiko) [REVIEW REQUESTED] [pending] [CR]
anssik: TAG usually takes longer due to broad scope and full pipeline
… as discussed, when we deliver u/int4 spec update, we can append that issue/PR to the TAG review request
… any comments?
… general comments and questions also welcome via the wide review tracker #239
<gb> Issue 239 Wide review tracker (by anssiko) [process]
Caching mechanism for MLGraph
anssik: issue #807
<gb> Issue 807 Caching mechanism for MLGraph (by anssiko) [question] [feature request]
anssik: wanted to poll your interest in initiating work on an explainer, prototyping intent, and identifying any blockers
… it is important to keep us grounded in reality and focus on designs that map to the frameworks and backends in use
… three design considerations discussed in the issue for now:
… - reusability across contexts
… OpenVINO was given as an example of a device-specific compiled model, while TFLite and Core ML compiled models are device-agnostic, i.e. the same compiled model package works for all XPUs
… - implicit vs explicit API
… only an explicit API can avoid the redundant graph building step, as it allows the developer to call listGraphs() without needing to build the graph first
… - composing a supergraph from multiple cached subgraphs
… it was noted this is not implementable with current frameworks
… is there interest to explore the caching mechanism feature in the near term?
jsbell: ack McCool
McCool: we should define our goals clearly: download time, compilation time, whether we want to extend to x-site caching in the future, and the expected behaviour
… explainer posted tries to document these things, proposal for explicit API makes sense
… the default implementation should do nothing, a null cache; that makes it easy to implement at least; backend issues are relevant
jsbell: Michael thanks for the summary
… we have ideas how to do explicit cache for one site, x-site cache was a big deal, at BlinkOn we are discussing that topic
… do we want to block any work on an explicit same-site caching API on solving or addressing x-site issue?
<zkis> Working on the suggested explicit API is a good start.
McCool: prefer extensible for x-site in the future
RafaelCintron: wanted to answer Josh, I don't think we should block one on the other; since some models are large, I don't think we can rely on automatically caching models
… if in the future we can extend to x-origin we can explore that separately; inclined to say the key should be a friendly name string
… the developer is in charge of naming
ningxin: want to add one more point, Reilly mentioned last time that web frameworks like ONNX Runtime Web could use the MLGraph cache, we need to consider that and invite folks working on e.g. ORT Web for input
… native frameworks are already doing model caching; ONNX Runtime EP Context has design docs
… we could see the connection to MLGraph
<ningxin> https://
anssik: anyone planning to do prototyping?
McCool: would be interested
<ningxin> +1 for prototyping
McCool: can work on the explainer further
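[A sketch of the explicit-caching shape discussed above: a key/value cache keyed by a developer-chosen name string, with a "null cache" default that stores nothing. All class and method names here are hypothetical illustrations, not proposed WebNN API.]

```javascript
// Hypothetical explicit graph cache, sketched from the discussion:
// a null-cache default (easy to implement, always misses) and a
// Map-backed variant keyed by a developer-chosen friendly name.
class NullGraphCache {
  async save(_key, _graph) { /* default behaviour: do nothing */ }
  async load(_key) { return null; }      // always a cache miss
  async listGraphs() { return []; }
}

class MapGraphCache {
  #store = new Map();
  async save(key, graph) { this.#store.set(key, graph); }
  async load(key) { return this.#store.get(key) ?? null; }
  // Listing cached keys lets an app skip the expensive build step on a hit.
  async listGraphs() { return [...this.#store.keys()]; }
}
```

[The null cache illustrates McCool's point that a do-nothing default keeps the feature easy to implement while leaving room for real backends.]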
Remove pool2d MLRoundingType
anssik: issue #324 (#374) and PR #770
<gb> Issue 374 Simplify `MLPool2dOptions` by removing the `outputSizes` option (by huningxin) [operator specific]
<gb> Pull Request 770 Remove pool2d MLRoundingType - Simplify the operand layout support of conv2d and pooling 2d operations (by fdwr)
<gb> Issue 324 Simplify the operand layout support of conv2d and pooling 2d operations (by huningxin) [feature request] [operator specific] [interop]
anssik: this issue proposed to remove pooling's rounding direction for the output dimensions
… this PR has been open for a while, and Josh notes this needs a redo
… do we have open questions for this change that'd be helpful to discuss today?
DwayneR: no open design questions
jsbell: one of the issues with the update is the padding that is passed in, it seems redundant?
DwayneR: the other option is to only use padding, not pass output sizes
… concerned about having two different ways to do the same thing
jsbell: no strong preference
ningxin: my previous comment was about validating the output size; with Josh's work we already have that, so we can leverage it
Behavior when there are NaNs in argmin/max inputs
anssik: issue #811
<gb> Issue 811 Behavior when there are NaNs in argmin/max inputs (by philloooo) [interop]
anssik: question from Phillis on how backends handle NaNs in the inputs
… Dwayne framed this decision as "performant but implementation defined" vs "slower but consistent"
… Dwayne's proposal shared in the issue:
… - leave NaN behavior implementation defined, and update spec to indicate that
… - add to the spec a mitigation code snippet to sanitize inputs before calling argMin/argMax
… - add an isNaN operator
… Ningxin's feedback was requested, Phillis gave thumbs up
Dwayne: we just got Ningxin's +1
ningxin: +1
DwayneR: with the proposed actions nothing to do for Phillis
… if you have NaNs we should have a way to have deterministic behaviour, should satisfy all concerns
jsbell: WFM; did we have anything explicit in the spec saying implementations can do maths "their own way" but should not crash? we mention clamping in the scatter and gather ops
Dwayne: we don't have a lot of verbiage about NaNs currently
anssik: OK to advance to a PR with this issue
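[A plain-array sketch of the "sanitize before argMin/argMax" mitigation discussed above; this emulates the idea with ordinary JS numbers rather than the WebNN builder API, and the function names are illustrative only.]

```javascript
// Replace NaN with -Infinity so a NaN entry can never win the max
// comparison, making argMax deterministic regardless of backend.
// (For argMin, the analogous sanitization would use +Infinity.)
function sanitizeForArgMax(data) {
  return data.map((v) => (Number.isNaN(v) ? -Infinity : v));
}

// Reference argMax over a plain array: index of the largest value.
function argMax(data) {
  let best = 0;
  for (let i = 1; i < data.length; i++) {
    if (data[i] > data[best]) best = i;
  }
  return best;
}

// argMax(sanitizeForArgMax([1, NaN, 3, 2])) → 2, deterministically.
```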
(de)quantization behaviors on Core ML
anssik: issue #822
<gb> Issue 822 (de)quantization behaviors on CoreML (by philloooo) [interop]
anssik: Phillis shared quantize/dequantize Core ML implementation feedback, would like to discuss it with the group
… Dwayne responded and it looks like (u)int4 being unsupported on Core ML for non-constant inputs is the remaining open question
… is it possible to work around this limitation using Dwayne's pseudocode?
… what is the performance penalty, is it reasonable?
… I also see Dwayne's suggestions for future Core ML improvements:
… - Add blockwise support to dequantize and quantize
… - Add u/int4 support to dequantize and quantize (or emulation path via u/int4 cast & bitwise ops)
… - Allow the scale and zero point to also be dynamic too (not just constant)
Dwayne: int4 was the only uncertain one, anything else can be emulated; int4 can also be emulated I think, it is just not pretty
… Phillis gave thumbs up, not sure if that's confirmation
jsbell: can't speak to that
DwayneR: will wait for Phillis
Mike_Wyrzykowski: I will take a look after the meeting
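[A small sketch of what int4 dequantization emulation could look like, assuming two signed 4-bit values packed per byte (low nibble first) and the usual linear dequantize formula real = (quantized - zeroPoint) * scale. This is illustrative plain JS, not Core ML or WebNN API, and the packing convention is an assumption.]

```javascript
// Unpack one byte into two signed 4-bit (two's complement) values.
function unpackInt4(byte) {
  const lo = byte & 0x0f;
  const hi = (byte >> 4) & 0x0f;
  // Sign-extend a 4-bit value: 0x8..0xF map to -8..-1.
  const ext = (n) => (n & 0x08 ? n - 16 : n);
  return [ext(lo), ext(hi)];
}

// Linear dequantize: real = (quantized - zeroPoint) * scale.
function dequantizeInt4(packedBytes, scale, zeroPoint) {
  const out = [];
  for (const byte of packedBytes) {
    for (const q of unpackInt4(byte)) {
      out.push((q - zeroPoint) * scale);
    }
  }
  return out;
}
```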
Query mechanism for supported devices
anssik: issue #815
<gb> Issue 815 Query mechanism for supported devices (by anssiko) [device selection]
anssik: recent updates:
… - a question from Mike about the "capacity" concept use cases
… - a new use case for the "capacity" concept from the Google Meet team via Markus
… the question was "could you provide some insight as to why the capacity aspect should be exposed from the UA or what use case we are trying to support?"
… also note "The web app / website only knows its own workloads while the UA has access to OS system calls and other apps running on the system."
anssik: Markus responded with a use case, "ML receive side audio processing is currently running on the GPU. Then the app would likely want to avoid launching a large non-realtime LLM task there to avoid unacceptable glitching."
… another use case: "There is also great value ensuring the CPU doesn't get selected [due to] the very large performance difference between "fallback"/"cpu" and "accelerated"/"gpu-like"/"npu""
… I think both are right, local (own workload) and global (all workloads on the OS) optimizations can be done and are not mutually exclusive
anssik: I wonder whether we can design an API that supports both capacity and capabilities?
<zkis> Or separate APIs for them?
Mike_Wyrzykowski: I think there could be some value in the website being able to add a hint
RafaelCintron: asking Mike, at TPAC you said high-performance hint would be OK?
Mike_Wyrzykowski: high-performance / low-power would be OK
… don't want to add hints that could be useless
… Core ML has some hints
https://
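[A sketch of using the existing power-preference hint mentioned above. WebNN contexts accept a powerPreference option ("default", "low-power", "high-performance"); the feature-detection wrapper below is a hypothetical helper so the snippet degrades gracefully outside WebNN-enabled browsers.]

```javascript
// Request an ML context with an existing WebNN power hint.
// navigator.ml is only present in WebNN-enabled browsers, so we
// feature-detect and return null elsewhere (e.g. Node.js).
async function createPreferredContext(powerPreference = "low-power") {
  if (typeof navigator === "undefined" || !("ml" in navigator)) {
    return null; // WebNN not available
  }
  return navigator.ml.createContext({ powerPreference });
}
```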
zkis: I wanted to ask Mike about Reilly's suggested "gpu-like" and "cpu-like" load categories; does that fit well for you, not explicit CPU or GPU, but along the lines of Reilly's definitions?
Mike_Wyrzykowski: on Apple's platforms implementers would use Core ML, and that category is not implementable on top of it; we could drop down to the frameworks below, but that would make the browser implementation more complicated, because the ANE is only reachable via Core ML, with BNNS and MPS for CPU and GPU
zkis: looking at the processing unit configs, there's a possible mapping; the question is whether it's useful
Mike_Wyrzykowski: it seems high-performance / low-power options overlap with those
zkis: I've made a few attempts to satisfy both sides; I think it's Google's turn to check whether there's any way to satisfy the use cases using the existing hints, some kind of mapping
… or, if we need an explicit query API, can we design it so it can be implemented on Apple platforms in a conformant way
… capacity and capabilities together might be hard, we might want to separate the concerns
… can we have separate API surfaces for capacity and capabilities?
Mike_Wyrzykowski: it'd be great to see a use case where the website has a better understanding of capacity than the browser engine
… once we add this to the spec we can't remove that easily
zkis: how to implement the use case from Markus? webmachinelearning/
<gb> Issue 815 Query mechanism for supported devices (by anssiko) [device selection]
anssik: any learnings from WebGPU capabilities?
Mike_Wyrzykowski: some capabilities are interdependent
… not always obvious when writing tests
<Mike_Wyrzykowski> gpuweb/
<gb> Issue 4025 Consider simplifying limits with overlapping meaning (by kainino0x) [limits] [api]