Meeting minutes
Repository: webmachinelearning/webnn
TPAC F2F agenda published
<gb> Issue 25 WebML WG - TPAC 2024 agenda (by anssiko)
anssik: TPAC F2F agenda has been published. We'll do fine-tuning and gardening: confirm our topic leads, timeboxes, discuss group contributions that'd be welcome by F2F.
anssik: current F2F plan is as follows
… Start meeting 9:00 local / 16:00 UTC
… 15-sec WG participant intros: name, affiliation, top interest [10 mins]
… currently 20 group participants and 17 observers registered in person
… Agenda bashing for last-minute drops/adds, not much wiggle room to add things [5 mins]
… Charter orientation mainly to inform observers and folks from other WGs [5 mins]
Ethics
anssik: - 9:30-10:00 local / 16:30-17:00 UTC in-browser explainability libraries and viz tools (Jay Wang, OpenAI / Georgia Tech) [30 mins]
… for a sneak-peek, see Jay's PhD defense:
Democratizing Human-Centered AI with Visual Explanation and Interactive Guidance
Spec orientation
anssik: - Triage pass through open issues: breaking changes, priorities, next steps for the issue (All)
… to prep for this triage exercise, please bring with you your favorite issues and prepare to share your suggestions for next steps
… focus on issues w/o agenda time already allocated
… first come, first served
New features
anssik: - A refreshed analysis of popular models, operator & data type gaps
… Dwayne would you be able to present a summary that fits into 20 mins?
Dwayne: 15 minutes is good
… - Quantization and dequantization (QDQ), a bag of multiple issues
Austin: someone from Google can present this
anssik: - Platform capability detection
… spec definition and prototype done!
… discuss impl experience, deferred rankRange feature
Austin: someone from Google side can talk to this, 5 min, TBD if needs discussion
anssik: … - Future-proof device selection abstractions
… MikeW, can you prepare a concrete proposal that'd work for WebKit for the group to discuss?
Mike: of course, can talk to this issue and provide a proposal, 5 minutes should be fine
Customer feedback & collaborations
anssik: 13:30-14:00 local - WebLLM and MLC
… Tianqi Chen to share with us the latest from the WebLLM, MLC, Apache TVM
… he was interested in WebNN-WebGPU interop progress, so good to share that with him
… - Transformers.js WebNN backend
JoshuaL: will attend virtually, will showcase Transformers.js WebNN demo, EP selection via ONNX Runtime, demo in the works with up-to-date visual models
… 10-15 mins user-centric demonstration, not too low-level on the details
… - ONNX Runtime Web & WebNN EP
JoshuaL: can ask Gunther if he could present on this topic
… 10-15 mins
anssik: - Google Chrome Feedback revisited #453
<gb> Issue 453 Google Chrome Feedback on WebNN: aiming for broad device coverage and maintainability (by vsekhar) [process] [opset] [use case]
Austin: a lot of the points in this are overlapping with other issues, so we will check if we should keep this on the agenda
… do not need a lot of time regardless
Interop and cross-group coordination
Austin: - Interop issues across different backends
… Ningxin, how much time to allocate for this?
<ningxin> https://
ningxin: I want to work through issues labeled with "interop", some are covered in "device selection", ConstantOperand also has a dedicated issue
… int64 is one interesting issue with MikeW feedback
… zero-sized dimensions
… want to triage all "interop" issues for discussion
anssik: - Core operator set #573
<gb> Issue 573 Core operator set (by philloooo) [question] [opset]
Austin: related to custom op incubation, we can merge this with custom op
ningxin: we may reveal other options in incubation section, core op set is a good foundation
Austin: we can have a few minutes for this
anssik: - MLTensor #754
<gb> Pull Request 754 Add MLTensor explainer (by a-sully) [webgpu interop]
anssik: Austin, possibly Corentin from WebGPU joining
… - MLConstantOperand #668
<gb> Issue 668 Do we need an `MLConstantOperand`? (by a-sully) [question] [interop]
Implementation plans and trials
anssik: Next steps for implementations: Origin Trial or equivalent, and aligning with framework developer feedback
ningxin: we're at CR "call for implementations", use this as an opportunity to get signals from implementers
Advancement on the W3C Rec Track
anssik: - Wide review status, close on TAG review feedback
… - W3C “living standards” expectations and plan with Dom
Dom: personally would suggest to not go to living CR model
Incubation
anssik: - Custom ops
… Ningxin, 20 mins
… - Built-in APIs for translation and prompting
… Domenic, ~20 mins
… - Model management
… Michael McCool, ~10-15 mins
McCool: will have an additional breakout session
Wrap up & Dinner
anssik: our F2F group is quite big so we're not able to book one restaurant
<dom> AI-related TPAC breakouts
anssik: one option would be to go to Anaheim Packing District 2.5 miles from the meeting venue
… "There are several ways to enjoy the culinary artistry from our small businesses, including takeout, delivery, indoor dining and outdoor dining."
… a wide range of options so could split by food preference
… if some Anaheim local foodie has recommendations, please share!
Device selection abstractions
anssik: We agreed to evolve MLContextOptions and other API controls for device selection informed by further implementation experience and new use cases from the wider web community.
… I wanted to discuss how frameworks and backends approach configuring inference sessions, understand framework-level use cases and requirements. Distill learnings to help evolve MLContextOptions.
… ONNX Runtime Web does provide some knobs, which vary by backend
anssik: for example:
… executionMode?: "sequential" | "parallel" - Controls whether multiple operators in the graph across nodes run sequentially or in parallel
… graphOptimizationLevel?: "basic" | "all" | "disabled" | "extended" - graph-level transformations, ranging from small graph simplifications and node eliminations to more complex node fusions and layout optimizations
… there's also WebNN EP Options
anssik: WebNNExecutionProviderOption: WebNNOptionsWithoutMLContext | WebNNOptionsWithMLContext | WebNNOptionsWebGpu
… interface WebNNOptionsWithMLContext extends MLContextOptions with numThreads
… I wanted to understand whether all these knobs are purely ONNX implementation details or whether some of these could be WebNN MLContextOptions hints.
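[A non-normative sketch, for the minutes, of how the knobs discussed above surface in ONNX Runtime Web. Field names follow the ORT Web session options; the exact values and the WebNN EP fields shown are illustrative, not a recommendation.]

```javascript
// Sketch (assumed shape, not normative): ONNX Runtime Web session options,
// combining the session-level knobs with WebNN EP options that mirror
// MLContextOptions.
const sessionOptions = {
  executionMode: 'sequential',       // or 'parallel'
  graphOptimizationLevel: 'all',     // 'disabled' | 'basic' | 'extended' | 'all'
  executionProviders: [{
    name: 'webnn',                   // WebNN EP
    deviceType: 'gpu',               // maps to MLContextOptions.deviceType
    powerPreference: 'default',      // maps to MLContextOptions.powerPreference
    numThreads: 1,                   // WebNN EP-specific extension
  }],
};

// In a browser with onnxruntime-web loaded, one would then create the session:
// const session = await ort.InferenceSession.create('model.onnx', sessionOptions);
```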
JoshuaL: would like to have Gunther involved, he'd be an expert in this
… number of threads used for the Wasm EP, proxied via Web Workers
… I talked with Gunther that ORT docs are not up to date
… for Transformers.js we set some good defaults, set num of threads
anssik: I also wanted to remind us of CoreML limitations and on what is implementable on Apple platforms, see issue #749 where MikeW provided information about CoreML MLComputeUnits configuration limitations
<gb> Issue 749 MLContextOptions.deviceType seems unnecessary outside of conformance testing (by mwyrzykowski) [device selection]
https://
<Joshua_Lochner> Here's the env: https://
anssik: the current WebNN API device selection mechanism we declared subject to change is:
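[A minimal sketch of the MLContextOptions-based mechanism referred to above, assuming the CR-era dictionary shape with deviceType and powerPreference; this is the part the group declared subject to change.]

```javascript
// Sketch of WebNN device selection via MLContextOptions (subject to change,
// per the discussion above).
const contextOptions = {
  deviceType: 'gpu',            // 'cpu' | 'gpu' | 'npu'
  powerPreference: 'default',   // 'default' | 'high-performance' | 'low-power'
};

// In a browser with WebNN available:
// const context = await navigator.ml.createContext(contextOptions);
```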
MikeW: not closely following framework configuration options, but will check this ahead of TPAC discussion
MLTensor
anssik: MLTensor explainer PR #754 received a lot of great feedback and review, thank you!
<gb> Pull Request 754 Add MLTensor explainer (by a-sully) [webgpu interop]
Austin: thanks for all the feedback!
… re Corentin's feedback, MLTensor assumes that MLTensorOptions, i.e. whether the tensor is readable from script or not, is sufficient to determine where to allocate the tensor on the system
… open questions if this is true, if device selection is tied to context
… Corentin pointed out there's issues with making GPU wait on a fence from another place in the system
… there's still some open questions; until there's more clarity on device selection, not sure how much progress we can make
anssik: it looks like the GH feedback from Rafael, Dwayne and Ningxin has been addressed
… GH feedback from Bryan, Phillis and Corentin is still being worked on
anssik: merging an explainer is lower bar than merging a spec PR
Austin: will add notes for Corentin's feedback in the explainer
Dom: what would be the next step before getting this into the spec prose and IDL?
… is it device selection clarification?
Austin: we need to specify timelines
… especially for WebGPU interop
… WebGPU has its own concept of timelines and need to define how WebNN interacts with that timeline
… when WebNN rents an MLTensor to WebGPU, the WebNN timeline needs to wait until the tensor is returned
… WebNN timeline relationship with WebGPU timeline
… for context, Chai had an earlier PR that said WebNN's timeline is the same as the WebGPU CommandBuffer
… that matches DirectML model, but not CoreML, TFLite
… from spec perspective timelines must be a bit more decoupled
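[To make the rent/return relationship above concrete, a toy promise-based simulation (entirely hypothetical names, not the real API): work queued on the simulated WebNN timeline must wait until WebGPU has returned the rented tensor.]

```javascript
// Toy model of decoupled timelines: each timeline is a serialized task queue.
class Timeline {
  constructor() { this.tail = Promise.resolve(); }
  enqueue(task) {
    this.tail = this.tail.then(task);
    return this.tail;
  }
}

async function demo() {
  const webnn = new Timeline();
  const webgpu = new Timeline();
  const log = [];

  // WebGPU "rents" the tensor; calling returnTensor() hands it back.
  let returnTensor;
  const rented = new Promise((resolve) => { returnTensor = resolve; });

  webgpu.enqueue(() => log.push('webgpu: use tensor'));
  webgpu.enqueue(() => { log.push('webgpu: return tensor'); returnTensor(); });

  // A WebNN dispatch on the same tensor must wait for the return.
  await webnn.enqueue(async () => {
    await rented;
    log.push('webnn: dispatch');
  });
  return log;
}
```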
McCool: I was planning to prototype caching based on MLBuffer/MLTensor, weights from a model that can be restored
Austin: MLTensor is used for inputs and outputs; the challenge is that sharing weights across models is not something DirectML supports
See you at TPAC F2F!
anssik: see you at TPAC F2F on 23 September 2024 for a day of exciting discussions and more!
… I'll cancel the bi-weekly telcons that have close proximity with the TPAC week:
… - cancel WebML WG Teleconference – 19 September 2024
… - cancel WebML WG Teleconference – 3 October 2024
… Our next call after TPAC will be 17 October 2024