Meeting minutes
Ethical Web Machine Learning
james: we've published a 2nd version of the WG Note on ethical considerations for ML https://
… we ran 2 brainstorm sessions for feedback & input on risks & mitigations this week
… both to generate content and set up a process for continuous concrete input on these topics
… we had 8 people on Monday, 10-12 on Tuesday
… they went really well and generated lots of useful input and positive feedback
… the next step will be to integrate the feedback into the document
… and add content to the operationalization section, including a white-label version of that workshop as a tool to continue thinking on these topics
… we plan to bring the doc to the WG for approval for publication as a Note - I'll be pausing my involvement for a few months, but hope the work can continue moving forward
Proposed new use cases
WebNN API in gaming scenarios
Slideset: https://
Anssi: introducing Tracy, senior engineer from Unity, interested in bringing WebNN into their Web backend
Tracy: I was at Microsoft, most recently working on DirectML
Tracy: moved to Unity in January to work on Barracuda, a lightweight runtime for inference
… running on phones, handheld consoles, games consoles, PCs
… it also supports a WebGL-based Web player
Tracy: It is used for a variety of use cases internally: reinforcement learning on objects to bring more intelligent behaviors
… lots of R&D with neural rendering
… public API for Unity customers to use for ML use cases
… in some of our use cases, Unity is both the editor and the runtime player
Tracy: the content tools are done with web pages - this allows distributing them more quickly to end users
… some of the neural rendering techniques are using WebGL, which can't keep up with what would be expected from a desktop GPU
… WebGPU would already help with improvements, but Chai mentioned WebNN could provide another target to achieve best performance while operating in Web pages
… we're exploring the topic, may be doing prototyping and benchmarking performance
Ningxin: very exciting use case!
… do you have any special requirements for the ML functionality and the rendering component to work together?
… for example, real-time super-resolution in games
Tracy: operating within a game engine is a constrained environment, even in non-Web cases
… the frame budget is restricted
… many of the models in the game cases (e.g. smart agent) end up running on CPU to avoid interfering with the GPU time budget
… this needs careful management
… in terms of the editor experience (where we're using the Web today), we don't have the same constraints
ningxin: one of the targets for WebNN is to serve as a backend for frameworks such as tf.js or ONNX
… it would be interesting to collaborate on experiments
dom: any particular constraints on memory & storage in the editor context on desktop? is that a constraint you're hitting?
tracy: not that I know of atm
ningxin: we are discussing having an async API vs a synchronous API available only within a Web Worker
… based on our prototypes with frameworks transpiled from C++ into WASM, we see a requirement to call the API synchronously
… but a sync API can have issues, e.g. causing jank if run on the main thread
… in your framework, is there any requirement for sync / async calls? main thread vs worker?
Tracy: everything is asynchronous in our API
… nothing should block the rendering of the game thread
ningxin: with the work done in the worker - is the code running the inference sync or async?
tracy: this would be flexible, depending on the backend
ningxin: ok, so no restriction on sync vs async
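The sync-vs-async trade-off discussed above can be sketched in plain JavaScript; `computeSync` and `computeAsync` are hypothetical stand-ins for a framework's inference call, not WebNN API names:

```javascript
// Stand-in for a synchronous inference call, as a C++-to-WASM framework
// would expect to invoke it (hypothetical, for illustration only).
function computeSync(inputs) {
  return inputs.map(x => x * 2); // pretend this is the model
}

// Async wrapper: yields to the event loop before running the sync kernel,
// so a long inference does not block rendering work queued ahead of it.
// Running computeSync inside a Web Worker avoids main-thread jank entirely.
async function computeAsync(inputs) {
  await Promise.resolve(); // yield once before doing the heavy work
  return computeSync(inputs);
}

computeAsync([1, 2, 3]).then(out => console.log(out)); // [2, 4, 6]
```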
anssi: the WebNN API started with a bunch of use cases; are we getting new requirements from this scenario? or does the API satisfy its requirements?
tracy: I'll review the spec to give a more informed answer on potential gaps
RafaelCintron: in your usage of ML in Unity - is the expectation that models will be pre-set, or will the graph be updated at runtime?
Tracy: more of the former; some of the weights might change, but mostly baked
Context-based graph execution methods for different threading models
Rafael: I won't be on the call in two weeks - the crux of the question is how much dependency we want to have on WebGPU
ningxin_hu: we need a clear boundary between WebNN & WebGPU
Context-based graph execution methods for different threading models PR #257
ningxin_hu: we should design WebNN around typical use; if WebNN is expected to be used mostly in standalone mode, the WebGPU integration should be hidden
… MLContext.compute & MLContext.asyncCompute would be the primary usage
… for WebGPU integration, I would support the typical WebGPU usage
… with a common buffer and queue that a WebGPU developer would be familiar with to make WebNN more friendly to them
… MLCommandBuffer or MLCommandEncoder would be closer to that approach
… we haven't investigated integration with the WebGL pipeline much - more work may be needed there
… we should target typical usages of the API
… this will help orient the developers toward the right path
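The two usage modes discussed above can be mocked in a few lines of JavaScript. This is an illustrative sketch only - `compute`, `asyncCompute`, and the command-encoder shape echo names from the PR #257 discussion, but none of the bodies reflect the real WebNN API:

```javascript
// Mock of the two usage modes: standalone compute vs WebGPU-style
// command encoding. Purely illustrative; a "graph" here is just a function.
class MockMLContext {
  // standalone mode, sync: intended for workers only
  compute(graph, inputs) { return graph(inputs); }
  // standalone mode, async: safe to call from the main thread
  async asyncCompute(graph, inputs) { return graph(inputs); }
  // interop mode: record work, then submit it in one batch, the way a
  // WebGPU developer would use a command encoder and queue
  createCommandEncoder() {
    const ops = [];
    return {
      dispatch(graph, inputs) { ops.push(() => graph(inputs)); },
      finish() { return ops.map(run => run()); }, // stands in for queue submit
    };
  }
}

const ctx = new MockMLContext();
const double = xs => xs.map(x => x * 2);
const enc = ctx.createCommandEncoder();
enc.dispatch(double, [1, 2]);
console.log(enc.finish()); // [[2, 4]]
```

The design question in the minutes is which of these shapes the spec should foreground: the standalone `compute`/`asyncCompute` pair for typical use, or the encoder/queue shape for WebGPU developers.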
RafaelCintron: this is a useful classification; it remains to be seen how much this needs to be reflected in different API shapes
… there is a discussion in the WebGPU WG on similar topics
… on async to avoid jank
… with, similarly, people porting from native wanting a sync version
… e.g. a Snap funny hat implementation
… the group is leaning towards sync only in worker; but struggling with very similar discussions indeed
Candidate Recommendation readiness
anssi: in terms of use cases - do we feel that we have captured the use cases for this API? any major gap?
… e.g. anything emerging from the background blur discussion that should be reflected in the use cases?
<RafaelCintron> WebGPU Sync vs Async Github Issue: mapSync on Workers - and possibly on the main thread (https://
Integration with real-time video processing
anssi: please review use cases for potential gaps
dom: the best demonstration will be having frameworks that run with a WebNN backend, with associated demonstrated performance benefits
ningxin_hu: the video processing prototype has a dependency on WebGPU integration - right now it relies on the WebGPU API to import the video frame and pre-process it with a WebGPU shader into a WebGPU buffer on which the inference runs
… but there is also an alternative where VideoFrame becomes a directly usable primitive for WebNN, as Chai suggested in the discussion
… There may be value in supporting other models beyond the one we're using for segmentation
ningxin_hu: we may want to revisit the model list to integrate such support
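The pre-processing step of the pipeline described above can be sketched in plain JavaScript; `normalizeFrame` is a hypothetical helper (the real prototype does this on the GPU with a WebGPU shader), shown only to illustrate what the pre-processing computes:

```javascript
// Hypothetical CPU version of the pre-processing step: convert 8-bit RGBA
// pixel data from a video frame into float32 values in [0, 1], the kind of
// input tensor a segmentation model typically expects (assumption).
function normalizeFrame(pixels) {
  const out = new Float32Array(pixels.length);
  for (let i = 0; i < pixels.length; i++) {
    out[i] = pixels[i] / 255; // 0..255 -> 0.0..1.0
  }
  return out;
}

const frame = Uint8ClampedArray.from([0, 127, 255, 255]); // one RGBA pixel
console.log(normalizeFrame(frame)); // Float32Array [0, ~0.498, 1, 1]
```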
WebNN should support int8 quantized models #128
ningxin_hu: For Data Types, there are open issues about quantized and int8 models - we need to investigate this in more depth
… Typical AI accelerators in GPU hardware optimize for quantized data types
… Performance benefits are key to the work
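As a concrete illustration of what int8 support involves, here is a minimal sketch of linear quantization in JavaScript; the scale/zero-point convention is a common one used by several ML frameworks, assumed here rather than taken from the WebNN spec or issue #128:

```javascript
// Linear (affine) int8 quantization sketch: map float values onto the
// int8 range [-128, 127] with a scale and zero point (common convention,
// not the spec's definition).
function quantizeInt8(values) {
  const min = Math.min(...values, 0); // include 0 so it stays representable
  const max = Math.max(...values, 0);
  const scale = (max - min) / 255 || 1; // avoid zero scale for all-zero input
  const zeroPoint = Math.round(-128 - min / scale);
  const q = Int8Array.from(values, v =>
    Math.max(-128, Math.min(127, Math.round(v / scale + zeroPoint))));
  return { q, scale, zeroPoint };
}

// Inverse mapping, used to check reconstruction error.
function dequantizeInt8({ q, scale, zeroPoint }) {
  return Array.from(q, v => (v - zeroPoint) * scale);
}
```

Accelerators gain from int8 because each value is 4x smaller than float32 and integer math is cheaper; the cost is the small reconstruction error that `dequantizeInt8` exposes.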
<ningxin_hu> +1
dom: maybe useful to derive semi-formal requirements from the use cases, incl performance
anssi: I've assumed documenting them as issues was enough, but would not oppose documenting them formally if people find it useful
… maybe we can label issues as requirements
anssi: we'll need to cover ethical considerations
… and document implementation experience
… supported by Web Platform Tests, the webnn-baseline and a browser implementation (targeting Chromium atm)
dom: note that implementation experience is not a requirement to get into CR
… (although double implementation experience is definitely needed to get out of CR)
… in terms of wide review, we had TAG review, first security review (now tagged for horizontal review)
… we'll do another privacy review, and will have to launch accessibility and internationalization reviews
… and then WebGPU integration review
… we're on good track