Meeting minutes
Repository: webmachinelearning/webnn
Announcements
Implementation Status of WebNN Operations
anssik: we've launched a web developer-focused implementation status page to provide better visibility into implementations of the WebNN API
… The total number of WebNN v1 ops is 60.
… this table currently lists ops that are fully implemented or WIP by multiple backends.
… that's why you see percentages lower than 100% below the table.
… the table is maintained on GitHub, PRs welcome
webnn-status.md source
… questions, comments?
WebNN v2: text-to-image and text-to-text use cases and requirements
anssik: today we will discuss prototyping findings from a transformer-based generative AI use case exploration
… the goal is to solicit input on new ops and data types required to support important models informed by our prototyping efforts
… at our last meeting Joshua shared his experiences with Transformers.js, please check out the great presentation if you missed it
Transformers.js presentation
… today I'm happy to welcome Dwayne from Microsoft to share his explorations and findings in this space
… Dwayne has experimented with a WebNN DirectML prototype fdwr/
… as a reminder, we have a meta issue #375 for transformer-based models and use cases discussion
<ghurlbot> Issue 375 Support for transformers (dontcallmedom) v2
anssik: without further ado, let me welcome Dwayne to present what he has been up to lately
dwayner: Chromium fork with DML backend that is fairly mature
dwayner: ONNXRuntime for Web, in master branch
… v1 ops half implemented, "v2" 14/20 implemented
dwayner: SegmentAnything, Stable Diffusion model targets
dwayner: Segment Anything is a pretty small model, 39 ops, works in ORT for Web via WebNN EP, currently in the fork but not in Chromium master
… caveat, static shapes only
dwayner: Stable Diffusion is much bigger, f16 is 2.3 GB
… 38 ops, 5 models coordinated
dwayner: native implementations of ORT+DML, WebNN proto needs a bit more work, 2 ops in Chromium and 5 in ORTW
… unique challenges in fitting in 4GB limit in Wasm32
… browsers will also evict cache so need to download again sometimes
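A rough back-of-the-envelope check of the Wasm32 limit mentioned above, as a sketch only. It assumes the 2.3 GB figure is decimal gigabytes and consists entirely of f16 weights (2 bytes per element); the helper name `estimate_bytes` is hypothetical and activations, runtime overhead, and duplicate copies are ignored.

```python
# 32-bit Wasm pointers can address at most 2**32 bytes (4 GiB).
WASM32_LIMIT = 2**32

# "f16 is 2.3 GB" from the minutes; assumed to be decimal GB of pure weights.
F16_MODEL_BYTES = 2.3e9


def estimate_bytes(f16_bytes, bytes_per_element):
    """Estimate model size at another element width from its f16 size."""
    element_count = f16_bytes / 2  # f16 uses 2 bytes per element
    return element_count * bytes_per_element


f32_bytes = estimate_bytes(F16_MODEL_BYTES, 4)

print(f"f16: {F16_MODEL_BYTES / WASM32_LIMIT:.0%} of the Wasm32 address space")
print(f"f32: {f32_bytes / WASM32_LIMIT:.0%} of the Wasm32 address space")
```

Under these assumptions the f16 weights alone take up more than half of the address space, and the same weights in f32 would not fit at all.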
(small text warning)
dwayner: v2 ops enumerated on this slide
… notably, we need conversion between datatypes, every model has at least one cast op
… squeeze and unsqueeze needed
dwayner: missing MLOperandDataTypes: bool8, int64
… ORT Web could map these, can be emulated when backends don't support these types
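One way a framework could emulate the missing types is sketched below. This is an assumption about how such a mapping might look, not how ORT Web does it: int64 values are narrowed to int32 only when they all fit, and bool8 values are stored as uint8 zeros and ones. All function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def fits_int32(values):
    """Return True if every int64 value is representable as int32."""
    return all(INT32_MIN <= v <= INT32_MAX for v in values)


def narrow_int64_to_int32(values):
    """Narrow int64 data to int32, refusing to silently truncate."""
    if not fits_int32(values):
        raise OverflowError("int64 value out of int32 range; cannot emulate")
    return list(values)


def bool8_to_uint8(values):
    """Represent booleans as uint8 zeros and ones."""
    return [1 if v else 0 for v in values]
```

Index and shape tensors are the typical int64 inputs, and their values are usually small, which is why a checked narrowing is often workable in practice.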
dwayner: missing concepts, 0D scalars that everyone supports
… another gap, retrieving the built shape
dwayner: reshaping ops, WebNN has squeeze but no unsqueeze, flattenTo2d needed
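The reshaping ops listed here only change the shape, so their effect can be sketched as pure shape arithmetic. The function names and signatures below are hypothetical; `flatten_to_2d` is assumed to follow the ONNX Flatten convention (dimensions before `axis` collapse into the first output dimension, the rest into the second).

```python
from math import prod


def unsqueeze(shape, axes):
    """Insert size-1 dimensions at the given positions of the output shape."""
    rank = len(shape) + len(axes)
    out = list(shape)
    for a in sorted(a % rank for a in axes):
        out.insert(a, 1)
    return out


def squeeze(shape, axes=None):
    """Remove size-1 dimensions, either all of them or only those in `axes`."""
    if axes is None:
        return [d for d in shape if d != 1]
    axes = {a % len(shape) for a in axes}
    for a in axes:
        if shape[a] != 1:
            raise ValueError(f"cannot squeeze axis {a} of size {shape[a]}")
    return [d for i, d in enumerate(shape) if i not in axes]


def flatten_to_2d(shape, axis):
    """Collapse a shape into [prod(shape[:axis]), prod(shape[axis:])]."""
    return [prod(shape[:axis]), prod(shape[axis:])]
```

Note that `unsqueeze([], [0])` turns a 0D scalar shape into `[1]`, which is where this discussion meets the 0D scalar gap noted earlier.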
dwayner: MVN, many ways to arrange axes, spatial normalization, spatial+batch, spatial+channel, spatial+grouped channel normalization
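The axis arrangements can be illustrated with a single mean-variance normalization routine that takes the axes to reduce over. This is a minimal sketch assuming an NCHW layout and a flat row-major buffer; the `mvn` signature and the `NCHW_AXES` table are hypothetical, not a proposed WebNN API.

```python
from itertools import product
from math import prod, sqrt

# Axes to normalize over for an NCHW tensor (assumed layout).
# Grouped-channel normalization would first reshape to [N, G, C/G, H, W]
# and then use axes [2, 3, 4].
NCHW_AXES = {
    "spatial": [2, 3],
    "spatial+batch": [0, 2, 3],
    "spatial+channel": [1, 2, 3],
}


def mvn(data, shape, axes, eps=1e-5):
    """Normalize flat row-major `data` to zero mean, unit variance over `axes`."""
    rank = len(shape)
    strides = [prod(shape[i + 1:]) for i in range(rank)]
    kept = [i for i in range(rank) if i not in axes]
    out = [0.0] * len(data)
    # One normalization group per combination of the non-reduced coordinates.
    for kept_idx in product(*(range(shape[i]) for i in kept)):
        offsets = []
        for red_idx in product(*(range(shape[i]) for i in axes)):
            coord = dict(zip(kept, kept_idx))
            coord.update(zip(axes, red_idx))
            offsets.append(sum(coord[i] * strides[i] for i in range(rank)))
        values = [data[o] for o in offsets]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        for o in offsets:
            out[o] = (data[o] - mean) / sqrt(var + eps)
    return out


# N=1, C=2, H=1, W=2: each channel is normalized on its own for "spatial".
per_channel = mvn([1.0, 3.0, 10.0, 30.0], [1, 2, 1, 2], NCHW_AXES["spatial"])
whole_sample = mvn([1.0, 3.0, 10.0, 30.0], [1, 2, 1, 2], NCHW_AXES["spatial+channel"])
```

The same routine covers the variants by changing only the `axes` argument, which is the argument for one general op over several specialized ones.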
dwayner: dedicated ops for efficiency and semantics
… Pow(x, 0.5) and Div(1, x) as examples, implementations have dedicated instructions for these
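Without dedicated ops, a backend has to pattern-match the general form to reach its faster instruction. A minimal sketch of such a rewrite follows; the dictionary-based node representation and the `specialize` helper are hypothetical and only illustrate the idea for the square root and reciprocal cases.

```python
def specialize(node):
    """Rewrite general ops with special-case constants into dedicated ops."""
    op, inputs = node["op"], node["inputs"]
    if op == "pow" and inputs[1] == 0.5:
        # pow(x, 0.5) is a square root
        return {"op": "sqrt", "inputs": [inputs[0]]}
    if op == "div" and inputs[0] == 1.0:
        # div(1, x) is a reciprocal
        return {"op": "reciprocal", "inputs": [inputs[1]]}
    return node


graph = [
    {"op": "pow", "inputs": ["x", 0.5]},
    {"op": "div", "inputs": [1.0, "y"]},
    {"op": "pow", "inputs": ["x", 2.0]},
]
rewritten = [specialize(n) for n in graph]
```

Exposing `sqrt` and `reciprocal` directly would let the model state its intent and spare every backend from repeating this matching.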
dwayner: optimized operators, Olive optimizations rearrange the model, fuse ops into new ones, e.g. MultiHeadAttention
… many variations, prefer the community to settle on blessed approach
dwayner: a bunch of links for references
anssik: when do you plan to share running demos?
dwayner: anyone can now build and experiment with SegmentAnything demo
… I could package a file with components
Joshua_Lochner: nice to meet you!
… amazing presentation, it is very interesting to see how low-level things are implemented
<Joshua_Lochner> mlc-ai/
Joshua_Lochner: Stable Diffusion, have you seen the recent demo mlc-ai/
… WebGPU backend to run SD in the browser, quite large LLMs
… 4 GB Wasm32 limit I've also run into myself
… they're doing some swapping to avoid hitting the 4 GB limit, interested in what optimizations are done there and whether we can learn from them
<Joshua_Lochner> huggingface/
Joshua_Lochner: SegmentAnything, how are you splitting encoder and decoder, precomputing embeddings, use mask decoder ...?
dwayner: compute embeddings offline
Joshua_Lochner: about separation, have you experimented with it to reuse embedding in the decode pass?
dwayner: no
Joshua_Lochner: too heavy operation?
dwayner: it could be possible, not tried that yet
ningxin_hu: thanks for sharing this exploration!
… re v2 ops and square root, I can share an update: we are implementing this for the XNNPACK backend. This is a good proposal; with a dedicated op we can handle this special case
… helps align with the implementation in Chromium
… MVN is a good one too, want to look into this more and see how it can be handled by other backends
… in general good list, we can open GH issues for all these to investigate
anssik: tracking issue and specific issues for specific v2 ops sounds good
chai: thanks Dwayne! the v2 list we need to look over and see what makes sense to add to WebNN API
… when we first defined this WebNN spec one assumption has been that ops that are more facilitator-style, not requiring major compute should stay in the framework
… the WebNN purpose is to accelerate compute
… has been the case for ORT when we consider different backends that implement it
… some ops mentioned here need to be looked into more closely
… the number of ops we may want to add to WebNN eventually may not be as big as this list
… if it is something that facilitates graph execution we should think about whether it can be handled in the framework instead, maintaining a huge number of ops is a burden
… what we add are the ones that are really needed
… because we talk about v2 we should have an API that supports versioning so the framework can detect what the browser supports
… we need to discuss an approach for versioning / feature detection so we don't break content
… a topic for v2 op discussion
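The feature-detection idea can be sketched from the framework's side: given the set of ops the browser reports as supported, the framework decides which ops to hand to WebNN and which to handle itself. How that set would be obtained is exactly the open question, so `supported_ops` below is a hypothetical input and `partition_ops` is an illustrative helper, not an existing API.

```python
def partition_ops(model_ops, supported_ops):
    """Split a model's ops into browser-supported ops and framework fallbacks."""
    native = [op for op in model_ops if op in supported_ops]
    fallback = [op for op in model_ops if op not in supported_ops]
    return native, fallback


# Hypothetical example: a browser that has conv2d and matmul but not cast.
supported_ops = {"conv2d", "matmul", "relu"}
model_ops = ["conv2d", "relu", "cast", "matmul", "unsqueeze"]

native, fallback = partition_ops(model_ops, supported_ops)
```

With this split, content written against a newer op set keeps working on an older browser as long as the framework can cover the fallback list.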
chai: implementation specific feedback we need for the spec
… what are the columns in https://
ningxin_hu: earlier we sent RFC to TF.js community to propose WebNN delegate for TFLite, it was approved by TF team in their SIG
… this column in the table represents the prototype implementation of that TFLite delegate
… this is working code and works with Chromium Canary
… we are discussing with TF community to integrate this TF delegate
… WebNN EP in ORT, this is a counterpart to that for TF
… TFLite column represents WebNN delegate implementation status
chai: second column is the browser implementation of CPU backend?
ningxin_hu: yes, in master
chai: I'd adjust the table to include all WebNN ops
… that'd be more understandable way to present this data
… caveat, XNNPACK column should highlight this is the impl status in the browser, and the last two columns are framework statuses
<ningxin_hu> +1
chai: this is great information to have
… would also add DML implementation status, not in Chromium master, so can add a footnote for that
… people coming to this page want to know what is in flight and coming up
WebIDL and Infra standard conventions
anssik: The zk-conventions-integration branch is now ready for review.
… These changes align the entire WebNN API with modern spec conventions and add stylistic improvements on top that make navigating this specification a more delightful experience.
… The following resources are made available to the group to assist in this review task:
Preview of zk-conventions-integration
Diff between main and zk-conventions-integration
Issues addressed and PRs merged to zk-conventions-integration
anssik: Thanks Zoltan for this significant effort!
anssik: Next step is to review the entire thing and apply it as an atomic change to main.
… it is up to the editors whether to squash all of it to one commit or land this with its history.
… Zoltan, anything you'd like to say about this work?
Zoltan: plan to use the integration branch for comments, can comment on PRs even if they are merged and closed
… will create another PR for the atomic change
… one more issue to discuss with Chai re sync/async PR, whether to land it in main or the integration branch
… my staging area contains those changes so prefer to clear that
chai: thanks Zoltan!
… once approved we have everything merged, then no need for this branch?
Zoltan: once all changes are merged we have a single commit on main, if we want to keep the history we can merge from the branch
chai: there are multiple PRs over time that caused the issue in the queue, because reviewers did not know in which order to review, so atomicity preferred
… the second issue is the ongoing updates to align with WebIDL conventions, important for the overall pipeline, also important for other upcoming changes
… want to have these changes settled in the doc once and for all, because subsequent changes need to align with these conventions
… once this is merged, is this all settled for conventions and for style?
… once WebIDL conventions updates are done, any subsequent change will be more specific, correct?
Zoltan: correct
… that is why we stay a bit longer on the integration branch to ensure we meet the editorial quality expectations wrt conventions
chai: once done we abandon the integration branch?
Zoltan: correct
chai: sounds good
Zoltan: this does not stop main branch from advancing, I will track main
chai: we should try to apply the new conventions asap
Zoltan: only open is external review, otherwise we're good
chai: would like to review the content in the integration branch and merge it into main as one commit, with everything squashed
<ningxin_hu> +1
ningxin_hu: +1 to what chai said
Summer break
anssik: we'll take a break in July and will resume our WG calls on 10 August.
… GH remains open 24/7 and welcomes your contributions while the meetings are paused.
… Thanks for the amazing first half of 2023! We've achieved a lot, published a CR, grew our group, advanced implementations. And the second half looks even more exciting!