WebML WG Teleconference – 19 December 2024

Meeting minutes

anssik: Welcome to our new participants Yang Gu, Jiajia Qin and Shaobo Yan from Microsoft
… also warm welcome to Yuichiro Tachibana from Hugging Face joining as a guest!

Yuichiro: I met W3C members at a conference discussing Transformers.js, thanks for inviting me, interested in client-side ML execution

WebML Working Group Charter update

Repository: w3c/machine-learning-charter

anssik: Working Group charter is up for renewal in 2025
… today we'll review the current charter, discuss and solicit input on proposed changes, triage open charter issues, and confirm the timeline

anssik: the key principle for WG rechartering is "if it ain't broken don't change it"
… this concerns the scope expansion specifically
… if the scope expands, existing participants need to re-join

Current WG Charter

WG Charter 2023-2024

anssik: Motivation and Background is informational
… Scope is an important section and is future-proofed with "well-known model architectures" and "major platform APIs"
… Out of Scope section includes training, hardware features and hardware algorithms
… Deliverables are WebNN API, Model Loader API as tentative, Ethical Principles WG Note
… Success Criteria is the standard one, two independent interoperable implementations
… Coordination enumerates groups we usually work with, but does not prevent us from working with other groups or projects outside this list
… the rest is standard charter boilerplate

Dom: good overview Anssi, Scope and Deliverables are the most important
… think no need to change Scope or add new Deliverables
anssik: Dom submitted a PR #39 to align with the latest charter template, thanks!

<gb> Pull Request 39 Align with latest charter template (by dontcallmedom)

Diff
… any questions or comments about the current charter or Dom's proposed changes?

anssik: next we'll look at open charter issues

WG Charter open issues

Model Loader API, keep as tentative or remove from scope?

anssik: issue #38

<gb> Issue 38 Model Loader API, keep as tentative or remove from scope? (by anssiko)

anssik: Model Loader API incubation has been paused since Feb 2023 and its known implementation was removed from Chromium after the experimentation phase
… unless there's interest, I'd propose to remove this from the WG Charter and revive this work in the WebML CG as appropriate

jsbell: not currently being pursued, supportive of removing

Core operator set, scope and coordination

anssik: issue #37

<gb> Issue 37 Core operator set, scope and coordination (by anssiko)

anssik: the current WG charter scope section is abstract enough to not warrant a revision to allow work on core op set to happen
… we could update the informative list of major platform APIs if there are changes:

"The APIs in scope of this group are not tied to any particular platform and are implementable on top of existing major platform APIs, such as Android Neural Networks API, Windows DirectML, and macOS/iOS Metal Performance Shaders and Basic Neural Network Subroutines."
… this is an open ended list, not being on this list does not exclude any platform APIs

anssik: related, in Coordination we note StableHLO, while we also consider MLIR Linalg, PyTorch Prims IR, TOSA for our compositional fundamentals research
… we probably don't want to enumerate all possible projects (MLIR, PyTorch, TOSA), so perhaps it is more balanced to remove the StableHLO reference?

Dom: being open-ended and informative, the only thing is this can be useful as a reminder of communities from whom to seek wide review
… perhaps under that lens, consider removing

Christian: seems good to remove, that seems a quite specific one

jsbell: we normally list WHATWG in charters, most specs have dependencies on WebIDL, Infra and other specs and standards

Dom: we usually do that when there's a specific requirement
… e.g. new types for WebIDL required, any browser WG has a WHATWG relationship, would not put that as a hard requirement, can put it in if wanted

Task-based APIs and Prompt API

anssik: issue #36

<gb> Issue 36 Task-based APIs and Prompt API (by anssiko)

anssik: Task-based APIs and Prompt API were adopted into the WebML Community Group earlier this month
… these APIs are now incubated in the WebML CG
… proposal is to check for readiness for WebML WG adoption from time to time considering implementation experience, end user feedback

Dom: we could list them as tentative if we feel they are a likely target for adoption, concern is this space is very active with interesting IP questions
… bringing into charter might create friction in the AC review and require everyone to re-join
… proposal is to wait and see the traction

Christian: personally, would support adding the APIs, but understand and appreciate Dom's perspective
… compared to Model Loader API these APIs seem compatible

Dom: we are free to charter again 6 months from now when the incubations have made more progress
… Model Loader API has different IPR scope from task-based APIs

jsbell: given what Dom said, it has a very specific API called out, maybe in 6 months from now we would be in a better position to see what would make sense to capture as Tentative then

Dom: as part of our communication to AC, I will make sure we mention these APIs have been adopted in the CG and are under consideration for a future WG charter revision

christianliebel: what Dom says makes sense, having publicly visible commitment we are looking at these new APIs, it makes sense to not add to the WG charter right now to not create friction

Speech Synthesis

anssik: issue #31

<gb> Issue 31 Speech synthesis and machine learning (by r12a) [deferred]

anssik: suggestion to mention Speech Synthesis for symmetry with Speech Recognition in informative Motivation and Background
… I propose we update the informative text e.g. as follows:

"Speech Recognition and Speech Synthesis enable computers to recognize and translate spoken language into text and vice versa."

On-device training

anssik: no feedback that'd suggest we should bring on-device training in scope for the WG
… issue #27

<gb> Issue 27 On-device training (by anssiko) [deferred]

Charter development timeline

anssik: I'll prepare an updated charter with Dom early new year for your review
… then complete horizontal review, around mid-Feb we will initiate the AC review
… new charter start date would be 2025-05-01

Device selection abstractions update

Repository: webmachinelearning/webnn

anssik: PR #784

<gb> Pull Request 784 Add device selection explainer (by zolkis)

anssik: Zoltan has updated the explainer proposal, added new use cases, known implementation limitations, added MVP solution to remove explicit deviceType to make contexts device agnostic, allow multiple devices per context

Updated explainer (preview)

Zoltan: folks will need some time to digest the space, we have documented intro, history, key use cases and requirements, considered alternatives
… recently added Minimum Viable Solution for your review
… one pain point is we were tied to device per context, while some platforms can execute on multiple devices
… also our device selection mechanism did not map well to platform APIs
… proposal is to:
… - Remove MLDeviceType as explicit context option.
… - Update MLContext so that it becomes device agnostic, or default/generic context. Allow supporting multiple devices with one context.
… - Add notes to implementations on how to map power preference to devices.
… - Improve the device selection hints in context options and define their implementation mappings.
… - Check if requesting a certain device type or combination of devices is still a use case.
… please review and chime in the PR if this direction has issues, otherwise I will proceed with a spec PR per this design

jsbell: explainer looks great, support getting the explainer merged
… where to capture feedback to support the explainer, is there an issue?
… Google team is on a vacation for the next few weeks

Zoltan: I can wait over the holiday for feedback

RafaelCintron: thanks for putting this together, couple of questions about the explainer
… at the end it says "allow supporting multiple devices in one context"
… does "other devices" mean a new type?

Zoltan: this would be abstracted away, defining a context as a combination of devices should also be possible

Rafael: low-latency, what kind of device would be low latency?

<jsbell> +1 that I had the same question as Rafael at first, so clarifying the text would be great.

Zoltan: low latency, if you want to optimize e.g. LLM throughput, tell that to the implementation and it'll do its best to satisfy that, consider it a hint
… just a suggestion in which direction to extend the context options, I found low-latency hint in OpenVINO, we need to validate the use cases and craft hints based on that

<zkis> https://blog.openvino.ai/blog-posts/automatic-device-selection-and-configuration

Rafael: high-performance and low-power I know how to deal with, they exist on Windows, low-latency I'm not familiar with, CPU could be low-latency

Zoltan: I'd spec these as hints that if they cannot be fulfilled they're not errors
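
As an illustration of the hint semantics discussed above, here is a minimal sketch, not the WebNN API surface. The names `Device`, `ContextHints`, `PREFERENCE_ORDER` and `selectDevices` are hypothetical, and the mapping from power preference to device order is an assumption made up for the example; an implementation would choose its own. It shows a device-agnostic selection that may return several devices and that falls back rather than failing when a hint cannot be fulfilled.

```typescript
// Hypothetical names throughout; this is a sketch, not the WebNN API.
type Device = "cpu" | "gpu" | "npu";
type PowerPreference = "default" | "high-performance" | "low-power";

interface ContextHints {
  powerPreference?: PowerPreference;
}

// Assumed mapping from hint to preferred device order (illustrative only).
const PREFERENCE_ORDER: Record<PowerPreference, Device[]> = {
  "default": ["gpu", "npu", "cpu"],
  "high-performance": ["gpu", "npu", "cpu"],
  "low-power": ["npu", "cpu", "gpu"],
};

// Returns the available devices in the order suggested by the hint.
// The hint only influences ordering; it never causes an error.
function selectDevices(
  hints: ContextHints,
  available: ReadonlySet<Device>,
): Device[] {
  const order = PREFERENCE_ORDER[hints.powerPreference ?? "default"];
  const usable = order.filter((d) => available.has(d));
  // Assumption: a CPU fallback always exists if nothing else is reported.
  return usable.length > 0 ? usable : ["cpu"];
}

// A "low-power" hint on a GPU-only system still yields a usable context.
const gpuOnly = selectDevices(
  { powerPreference: "low-power" },
  new Set<Device>(["gpu"]),
);
console.log(gpuOnly);
```

In this sketch the caller never names a device type directly, which mirrors the direction of removing the explicit device type from the context options.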

Zoltan: PR #784 would be the perfect place for feedback

<gb> Pull Request 784 Add device selection explainer (by zolkis)

<ningxin> +1 to provide feedback on PR

WebNN Operator Update Wave 3

anssik: Dwayne has a WIP PR for op set Wave 3, thanks Dwayne!
… OK to push WIP PR to upstream repo and mark it as a Draft
… this enables PR Preview and CI checkers that can be helpful for development
… and folks can help contribute

Dwayne: Austin has a few questions on uint4 on CoreML, does not block the spec PR

Core op set: MLIR Linalg findings revisited

anssik: we discussed Dwayne's extensive mapping table a month ago, Linalg specifically

Mapping table
… I wanted to bump this topic to get feedback on the 6 primitive ops proposed for inclusion into WebNN API informed by this research, they are:

1-D convolution with no channels

3-D convolution with no channels conv_3d

Fill output with random numbers fill_rng_2d

Sum pooling pooling_nchw_sum

Min pooling pooling_nhwc_min

Round(x) elementwise round

Dwayne: this set of 6 is implementable across backends
… no direct mapping for all of them, most directly implementable
… I can fill in the table with all the ops, take a subset of three backends Chromium uses and show how they map to these
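
To make the "no direct mapping" case concrete, the following is a minimal sketch of how a backend without a direct sum or min pooling primitive might emulate them, assuming average and max pooling are available. It is not taken from the mapping table. All function names are hypothetical, and the example is restricted to a single 2-D plane with stride 1 and no padding.

```typescript
type Matrix = number[][];

// Generic 2-D window reduction (stride 1, no padding).
// Assumes a non-empty rectangular input.
function poolWindows(
  input: Matrix,
  kh: number,
  kw: number,
  reduce: (w: number[]) => number,
): Matrix {
  const outH = input.length - kh + 1;
  const outW = input[0].length - kw + 1;
  const out: Matrix = [];
  for (let y = 0; y < outH; y++) {
    const row: number[] = [];
    for (let x = 0; x < outW; x++) {
      const w: number[] = [];
      for (let dy = 0; dy < kh; dy++) {
        for (let dx = 0; dx < kw; dx++) {
          w.push(input[y + dy][x + dx]);
        }
      }
      row.push(reduce(w));
    }
    out.push(row);
  }
  return out;
}

const mapCells = (m: Matrix, f: (v: number) => number): Matrix =>
  m.map((row) => row.map(f));

// Stand-ins for pooling ops a backend is assumed to provide.
function avgPool(m: Matrix, kh: number, kw: number): Matrix {
  return poolWindows(m, kh, kw, (w) => w.reduce((a, b) => a + b, 0) / w.length);
}
function maxPool(m: Matrix, kh: number, kw: number): Matrix {
  return poolWindows(m, kh, kw, (w) => Math.max(...w));
}

// Sum pooling emulated as average pooling scaled by the window size.
function sumPool(m: Matrix, kh: number, kw: number): Matrix {
  return mapCells(avgPool(m, kh, kw), (v) => v * kh * kw);
}

// Min pooling emulated as negated max pooling of the negated input.
function minPool(m: Matrix, kh: number, kw: number): Matrix {
  return mapCells(maxPool(mapCells(m, (v) => -v), kh, kw), (v) => -v);
}

const x: Matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]];
console.log(sumPool(x, 2, 2)); // [[12, 16], [24, 28]]
console.log(minPool(x, 2, 2)); // [[1, 2], [4, 5]]
```

Emulating sum pooling through an average can introduce floating-point rounding when the window size is not a power of two, which is one reason a direct primitive may still be preferable where a backend has one.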

jsbell: following the usual process, if implementable across backends, with use cases, sounds good

Get to know Task-specific APIs and Prompt API

Incubations landing page

anssik: companion WebML Community Group now incubates selected task-specific APIs to enable reuse of the built-in models that are distributed as part of the browser or the underlying software platform
… as an observation, it seems Prompt API as a more general purpose and flexible API has received the most feedback
… at our last meeting we discussed how to version the models, and it was suggested as a topic to be discussed in this group, along with adapters and LoRAs, model management

Dom: open-ended question, one topic I'd expect to be brought up is questions on ethical considerations for ML
… would a model that is used by these APIs be documented in a way developers can learn how they're trained, bias, other qualities, a la Model Cards style info
… as we think about bringing specific models as part of API surface in browsers, thinking about ethical aspects is important

Christian: good questions, tough to answer, I have given a dozen presentations on these APIs, can confirm Prompt API has the most interest
… restricting it to extensions only is a problem for developers
… ethical part, how to make sure the answers are safe?
… is there filtering, can I query?
… overall I'm happy we have these APIs here, and look forward to seeing more
… for specific issues, output constraining is important, our customers use function calling, multi-modal, Prompt API represents what LLMs were a year ago

jsbell: +1 to what Christian said, we want to work through the Prompt API issues, also Dom thanks for ethical considerations

<dom> +1 that WebNN delegates that question rather than solving it :)

jsbell: in WebNN API there's transparency, but it pushes complexity to developer on how "responsible" the model is

jsbell: this is not the first time we have ML-backed APIs in the browser, we have Web Speech API

McCool: STT and TTS, use cases for Prompt API?

jsbell: we'll explore multimodal in 2025, current APIs are only text-to-text

Happy Holidays!

anssik: Thank You everyone for your significant contributions during 2024!
… our Working Group accomplished a lot this year
… a few highlights:
… WebNN API evolved driven by research into popular more advanced models, more diverse implementations
… WebNN API Candidate Rec Snapshot milestone was met in 2Q 2024
… the WG made ~100 spec publications in total and merged 180+ PRs this year
… many new active contributors joined, Dwayne as a co-editor, the group grew and diversified further
… we converged on new API abstractions for tensors, device selection, defined op set principles
… we improved the spec quality significantly with expert advice
… we organized our first F2F in Anaheim and it was a blast
… we made strong progress on the implementations across 3 backends, XPUs, multiple OSes
… we witnessed positive buzz in the tech industry around WebNN, made a few keynote appearances
… a lot of exciting demos and samples were published, wpt test coverage improved
… and much more!
… we're entering an exciting phase of development in 2025
… the WebNN API is expected to get in the hands of more developers for large-scale trials, and more
… feedback from developers and users will help guide our priorities
… Happy Holidays everyone -- please relax, disconnect, and recharge
… see you on our next call 16 Jan 2025!


– DRAFT –

Attendees