WebML WG Teleconference – 29 June 2023

Meeting minutes

Repository: webmachinelearning/webnn

Announcements

Implementation Status of WebNN Operations

anssik: we've launched a web developer-focused implementation status page to provide better visibility into implementations of WebNN API
… The total number of WebNN v1 ops is 60.
… this table currently lists ops that are fully implemented or WIP by multiple backends.
… that's why you see percentages lower than 100% below the table.
… the table is maintained on GitHub, PRs welcome

webnn-status.md source
… questions, comments?

WebNN v2: text-to-image and text-to-text use cases and requirements

anssik: today we will discuss prototyping findings from a transformer-based generative AI use case exploration
… the goal is to solicit input on new ops and data types required to support important models informed by our prototyping efforts
… at our last meeting Joshua shared his experiences with Transformers.js, please check out the great presentation if you missed it

Transformers.js presentation
… today I'm happy to welcome Dwayne from Microsoft to share his explorations and findings in this space
… Dwayne has experimented with a WebNN DirectML prototype fdwr/chromium-src-webnn-dml#1 to inform us with running code
… as a reminder, we have a meta issue #375 for transformer-based models and use cases discussion

<ghurlbot> Issue 375 Support for transformers (dontcallmedom) v2

anssik: without further ado, let me welcome Dwayne to present what he has been up to lately

Slideset: https://lists.w3.org/Archives/Public/www-archive/2023Jun/att-0005/2023-06-29_WebNN_and_Transformers_Progress_W3C.pdf

[Slide 1]

[Slide 2]

dwayner: Chromium fork with DML backend that is fairly mature

[Slide 3]

dwayner: ONNXRuntime for Web, in master branch
… v1 ops half implemented, "v2" 14/20 implemented

[Slide 4]

dwayner: SegmentAnything, Stable Diffusion model targets

[Slide 5]

dwayner: Segment Anything is a pretty small model, 39 ops, works in ORT for Web via WebNN EP, currently in the fork but not in Chromium master
… caveat, static shapes only

[Slide 6]

dwayner: Stable Diffusion is much bigger, f16 is 2.3 GB
… 38 ops, 5 models coordinated

[Slide 7]

dwayner: native implementations of ORT+DML, WebNN proto needs a bit more work, 2 ops in Chromium and 5 in ORTW
… unique challenges in fitting in 4GB limit in Wasm32
… browsers will also evict cache so need to download again sometimes

[Slide 8]

(small text warning)

[Slide 9]

dwayner: v2 ops enumerated on this slide
… notably, we need conversion between datatypes, every model has at least one cast op
… squeeze and unsqueeze needed

[Slide 10]

dwayner: missing MLOperandDataTypes: bool8, int64
… ORT Web could map these, can be emulated when backends don't support these types
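As a minimal sketch of the kind of emulation mentioned above, a framework could narrow int64 tensor data to int32 before handing it to a backend without int64 support, and fail loudly when a value does not fit. The helper name `narrowInt64ToInt32` and the little-endian raw-buffer layout are assumptions for illustration; this is not an API of WebNN or ORT Web.

```typescript
// Hypothetical helper: narrow little-endian int64 tensor data to int32.
// Each int64 is read as two 32-bit words (low, high), so no BigInt is needed.
function narrowInt64ToInt32(int64Bytes: ArrayBuffer): Int32Array {
  if (int64Bytes.byteLength % 8 !== 0) {
    throw new RangeError("buffer length must be a multiple of 8 bytes");
  }
  const view = new DataView(int64Bytes);
  const count = int64Bytes.byteLength / 8;
  const out = new Int32Array(count);
  for (let i = 0; i < count; i++) {
    const low = view.getInt32(i * 8, true);      // low word, signed
    const high = view.getInt32(i * 8 + 4, true); // high word, signed
    // The value fits in int32 only if the high word is the sign extension of the low word.
    const expectedHigh = low < 0 ? -1 : 0;
    if (high !== expectedHigh) {
      throw new RangeError("int64 value at index " + i + " does not fit in int32");
    }
    out[i] = low;
  }
  return out;
}
```

Throwing on overflow rather than silently truncating keeps the emulation honest: indices and shapes usually fit in 32 bits, and the rare model that needs the full range is reported instead of producing wrong results.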

[Slide 11]

dwayner: missing concepts, 0D scalars that everyone supports
… another gap, retrieving the built shape

[Slide 12]

dwayner: reshaping ops, WebNN has squeeze but no unsqueeze, flattenTo2d needed
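The reshaping ops named above are all shape computations that could feed a generic reshape. The sketch below shows one possible reading of them; the function names are hypothetical, and `flattenTo2dShape` assumes ONNX Flatten semantics (dimensions before `axis` become rows, the rest become columns), which the slide may or may not intend.

```typescript
// Shape helpers: each returns the new shape to pass to a generic reshape op.

// Remove the size-1 dimensions listed in `axes`.
function squeezeShape(shape: number[], axes: number[]): number[] {
  for (const axis of axes) {
    if (shape[axis] !== 1) {
      throw new RangeError("cannot squeeze axis " + axis + " of size " + shape[axis]);
    }
  }
  return shape.filter((_, i) => axes.indexOf(i) === -1);
}

// Insert size-1 dimensions so they end up at the positions in `axes` of the output shape.
function unsqueezeShape(shape: number[], axes: number[]): number[] {
  const out = shape.slice();
  const sorted = axes.slice().sort((a, b) => a - b);
  for (const axis of sorted) {
    if (axis < 0 || axis > out.length) {
      throw new RangeError("axis " + axis + " is out of range");
    }
    out.splice(axis, 0, 1);
  }
  return out;
}

// Collapse dimensions before `axis` into rows and the remaining ones into columns.
function flattenTo2dShape(shape: number[], axis: number): [number, number] {
  const product = (dims: number[]): number => dims.reduce((a, b) => a * b, 1);
  return [product(shape.slice(0, axis)), product(shape.slice(axis))];
}
```

For example, `unsqueezeShape([3, 4], [0, 3])` yields a rank-4 shape with the new unit dimensions first and last, and `flattenTo2dShape([2, 3, 4, 5], 2)` groups the first two and last two dimensions.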

[Slide 13]

dwayner: MVN, many ways to arrange axes, spatial normalization, spatial+batch, spatial+channel, spatial+grouped channel normalization
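To illustrate how one op with an axes parameter can cover several of these variants, here is a minimal sketch of mean-variance normalization over arbitrary axes. It assumes an NCHW layout, a row-major flat array, population variance, and no scale or bias; the function name and signature are hypothetical, not a proposed WebNN signature.

```typescript
// Mean-variance normalization over the given axes of a row-major tensor.
// `axes` lists the dimensions that are reduced together; all other
// dimensions index independent normalization groups.
function meanVarianceNormalize(
  data: number[],
  shape: number[],
  axes: number[],
  epsilon: number = 1e-5
): number[] {
  const rank = shape.length;

  // Row-major strides: strides[d] is the flat-index step along dimension d.
  const strides: number[] = [];
  let step = 1;
  for (let d = rank - 1; d >= 0; d--) {
    strides[d] = step;
    step *= shape[d];
  }

  // Number of independent groups = product of the non-reduced dimensions.
  let groupCount = 1;
  for (let d = 0; d < rank; d++) {
    if (axes.indexOf(d) === -1) groupCount *= shape[d];
  }
  const groupSize = data.length / groupCount;

  // Map a flat element index to the id of its normalization group.
  const groupOf = (flat: number): number => {
    let group = 0;
    for (let d = 0; d < rank; d++) {
      if (axes.indexOf(d) !== -1) continue;
      group = group * shape[d] + (Math.floor(flat / strides[d]) % shape[d]);
    }
    return group;
  };

  // First pass accumulates the mean, second pass the (population) variance.
  const mean = new Float64Array(groupCount);
  const variance = new Float64Array(groupCount);
  for (let i = 0; i < data.length; i++) {
    mean[groupOf(i)] += data[i] / groupSize;
  }
  for (let i = 0; i < data.length; i++) {
    const diff = data[i] - mean[groupOf(i)];
    variance[groupOf(i)] += (diff * diff) / groupSize;
  }
  return data.map((x, i) => (x - mean[groupOf(i)]) / Math.sqrt(variance[groupOf(i)] + epsilon));
}

// NCHW tensor with shape [1, 2, 1, 2]: two channels of two pixels each.
const input = [1, 3, 10, 30];
const perChannel = meanVarianceNormalize(input, [1, 2, 1, 2], [2, 3]);    // spatial
const perSample = meanVarianceNormalize(input, [1, 2, 1, 2], [1, 2, 3]);  // spatial + channel
```

Under these assumptions, axes `[2, 3]` corresponds to spatial normalization, `[0, 2, 3]` to spatial+batch, and `[1, 2, 3]` to spatial+channel; the grouped-channel variant would need a reshape that splits the channel dimension first.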

[Slide 14]

dwayner: dedicated ops for efficiency and semantics
… Pow(x, 0.5) and Div(1/x) as examples, implementations have dedicated instructions
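Without dedicated ops, a framework or backend has to recognize the Pow(x, 0.5) and 1/x patterns itself before it can use the faster instruction. A minimal sketch of such pattern matching follows; the `LoweredOp` type and both function names are hypothetical and do not mirror any real framework's IR.

```typescript
// Hypothetical lowered-op descriptors, for illustration only.
type LoweredOp =
  | { kind: "sqrt" }
  | { kind: "reciprocal" }
  | { kind: "pow"; exponent: number }
  | { kind: "div"; numerator: number };

// pow(x, 0.5) can be lowered to a dedicated square-root op.
function lowerPow(exponent: number): LoweredOp {
  return exponent === 0.5 ? { kind: "sqrt" } : { kind: "pow", exponent: exponent };
}

// numerator / x with a constant numerator of 1 can be lowered to a dedicated reciprocal op.
function lowerConstantOverX(numerator: number): LoweredOp {
  return numerator === 1 ? { kind: "reciprocal" } : { kind: "div", numerator: numerator };
}
```

A dedicated op in the API removes the need for this detection step and states the model's intent directly.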

[Slide 15]

dwayner: optimized operators, Olive optimizations rearrange the model, fuse new ops, e.g. MultiHeadAttention
… many variations, prefer the community to settle on a blessed approach

[Slide 16]

dwayner: a bunch of links for references

anssik: when do you plan to share running demos?

dwayner: anyone can now build and experiment with SegmentAnything demo
… I could package a file with components

Joshua_Lochner: nice to meet you!
… amazing presentation, it is very interesting to see how low-level things are implemented

<Joshua_Lochner> mlc-ai/web-stable-diffusion

Joshua_Lochner: Stable Diffusion, have you seen the recent demo mlc-ai/web-stable-diffusion
… WebGPU backend to run SD in the browser, quite large LLMs
… 4 GB Wasm32 limit I've also run into myself
… they're doing some swapping to avoid hitting the 4 GB limit, interested in what optimizations are done there and whether we can learn from them

<Joshua_Lochner> huggingface/optimum#1078

Joshua_Lochner: SegmentAnything, how are you splitting encoder and decoder, precomputing embeddings, use mask decoder ...?

dwayner: compute embeddings offline

Joshua_Lochner: about separation, have you experimented with it to reuse embedding in the decode pass?

dwayner: no

Joshua_Lochner: too heavy operation?

dwayner: it could be possible, not tried that yet

ningxin_hu: thanks for sharing this exploration!
… on the v2 ops, re square root, I can share an update: we are implementing this for the XNNPACK backend; a dedicated op is a good proposal, since it lets us handle this special case
… helps align with the implementation in Chromium
… MVN is a good one too, want to look into this more and see how it can be handled by other backends
… in general good list, we can open GH issues for all these to investigate

anssik: tracking issue and specific issues for specific v2 ops sounds good

chai: thanks Dwayne! the v2 list we need to look over and see what makes sense to add to WebNN API
… when we first defined this WebNN spec one assumption has been that ops that are more facilitator-style, not requiring major compute should stay in the framework
… the WebNN purpose is to accelerate compute
… has been the case for ORT when we consider different backends that implement it
… some ops mentioned here need to be looked into more closely
… the number of ops we may want to add to WebNN eventually may not be as big as this list
… if it is something that facilitates graph execution we should think about whether it can be done in the framework instead, maintaining a huge number of ops is a burden
… what we add are the ones that are really needed
… because we talk about v2 we should have an API that supports versioning so the framework can detect what the browser supports
… we need to discuss on an approach for versioning / feature detection so we don't break content
… a topic for v2 op discussion
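The group did not settle on a mechanism in this discussion. As one possible illustration only, the common web-platform pattern of checking for method presence could let a framework find out which ops a builder exposes. The helper `missingOps` and the stand-in `fakeBuilder` object are hypothetical; in a browser the object would be a graph builder instance.

```typescript
// Returns the names from `required` that are not callable methods on `builder`.
function missingOps(builder: Record<string, unknown>, required: string[]): string[] {
  return required.filter((opName) => typeof builder[opName] !== "function");
}

// Stand-in for a builder that exposes only two ops.
const fakeBuilder = { conv2d: () => {}, relu: () => {} };
const missing = missingOps(fakeBuilder, ["conv2d", "relu", "unsqueeze"]);
// A framework could run the missing ops itself or pick another backend
// when `missing` is non-empty.
```

Method presence does not reveal differences in accepted parameters or data types, so a check like this would cover only part of what versioning or feature detection needs to address.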

chai: implementation specific feedback we need for the spec
… what are the columns in https://webmachinelearning.github.io/webnn-status/ and how are they implemented in TFLite

ningxin_hu: earlier we sent RFC to TF.js community to propose WebNN delegate for TFLite, it was approved by TF team in their SIG
… this column in the table represents the prototype implementation of that TFLite delegate
… this is working code and works with Chromium Canary
… we are discussing with TF community to integrate this TF delegate
… WebNN EP in ORT, this is a counterpart to that for TF
… TFLite column represents WebNN delegate implementation status

chai: second column is the browser implementation of CPU backend?

ningxin_hu: yes, in master

chai: I'd adjust the table to include all WebNN ops
… that'd be more understandable way to present this data
… caveat, XNNPACK column should highlight this is the impl status in the browser, and the last two columns are framework statuses

<ningxin_hu> +1

chai: this is great information to have
… would also add DML implementation status, not in Chromium master, so can add a footnote for that
… people coming to this page want to know what is in flight and coming up

WebIDL and Infra standard conventions

anssik: The zk-conventions-integration integration branch is now ready for review.
… These changes align the entire WebNN API with modern spec conventions and add stylistic improvements on top that make navigating this specification a more delightful experience.
… The following resources are made available to the group to assist in this review task:

Preview of zk-conventions-integration

Diff between main and zk-conventions-integration

Issues addressed and PRs merged to zk-conventions-integration

Commits ahead of main

anssik: Thanks Zoltan for this significant effort!

anssik: Next step is to review the entire thing and apply it as an atomic change to main.
… it is up to the editors whether to squash all of it to one commit or land this with its history.
… Zoltan, anything you'd like to say about this work

Zoltan: plan to use the integration branch for comments, can comment on PRs even if they are merged and closed
… will create another PR for the atomic change
… one more issue to discuss with Chai re sync/async PR: whether to land it in main or in the integration branch
… my staging area contains those changes so prefer to clear that

chai: thanks Zoltan!
… once approved we have everything merged, then no need for this branch?

Zoltan: once all changes are merged we have a single commit on main; if we want to keep the history we can merge from the branch

chai: there are multiple PRs over time that caused the issue in the queue, because reviewers did not know in which order to review, so atomicity preferred
… the second issue is the ongoing updates to align with WebIDL conventions, important for the overall pipeline, also important for other upcoming changes
… want to have these changes settled in the doc once and for all, because subsequent changes need to align with these conventions
… once this is merged, is this all settled for conventions and for style?
… once WebIDL conventions updates are done, any subsequent change will be more specific, correct?

Zoltan: correct
… that is why we stay a bit longer on the integration branch to ensure we meet the editorial quality expectations wrt conventions

chai: once done we abandon the integration branch?

Zoltan: correct

chai: sounds good

Zoltan: this does not stop main branch from advancing, I will track main

chai: we should try to apply the new conventions asap

Zoltan: only open is external review, otherwise we're good

chai: would like to review the content in the integration branch and merge it into main as a single squashed commit

<ningxin_hu> +1

ningxin_hu: +1 to what chai said

webmachinelearning/webnn#210 (comment)

Summer break

anssik: we'll take a break in July and will resume our WG calls on 10 August.
… GH remains open 24/7 and welcomes your contributions while the meetings are paused.
… Thanks for the amazing first half of 2023! We've achieved a lot, published a CR, grew our group, advanced implementations. And the second half looks even more exciting!
