Meeting minutes
Repository: webmachinelearning/webnn
TPAC F2F agenda published
<gb> Issue 25 WebML WG - TPAC 2024 agenda (by anssiko)
anssik: TPAC F2F agenda has been published. We'll do fine-tuning and gardening: confirm our topic leads, timeboxes, discuss group contributions that'd be welcome by F2F.
anssik: current F2F plan is as follows
… Start meeting 9:00 local / 16:00 UTC
… 15-sec WG participant intros: name, affiliation, top interest [10 mins]
… currently 20 group participants and 17 observers registered in person
… Agenda bashing for last-minute drops/adds, not much wiggle room to add things [5 mins]
… Charter orientation mainly to inform observers and folks from other WGs [5 mins]
Ethics
anssik: - 9:30-10:00 local / 16:30-17:00 UTC in-browser explainability libraries and viz tools (Jay Wang, OpenAI / Georgia Tech) [30 mins]
… for a sneak-peek, see Jay's PhD defense:
Democratizing Human-Centered AI with Visual Explanation and Interactive Guidance
Spec orientation
anssik: - Triage pass through open issues: breaking changes, priorities, next steps for the issue (All)
… to prep for this triage exercise, please bring with you your favorite issues and prepare to share your suggestions for next steps
… focus on issues w/o agenda time already allocated
… first come, first served
New features
anssik: - A refreshed analysis of popular models, operator & data type gaps
… Dwayne would you be able to present a summary that fits into 20 mins?
Dwayne: 15 minutes is good
… - Quantization and dequantization (QDQ), a bag of multiple issues
Austin: someone from Google can present this
anssik: - Platform capability detection
… spec definition and prototype done!
… discuss impl experience, deferred rankRange feature
Austin: someone from Google side can talk to this, 5 min, TBD if needs discussion
anssik: … - Future-proof device selection abstractions
… MikeW, can you prepare a concrete proposal that'd work for WebKit for the group to discuss?
Mike: of course, can talk to this issue and provide a proposal, 5 minutes should be fine
Customer feedback & collaborations
anssik: 13:30-14:00 local - WebLLM and MLC
… Tianqi Chen to share with us the latest from the WebLLM, MLC, Apache TVM
… he was interested in WebNN-WebGPU interop progress, so good to share that with him
… - Transformers.js WebNN backend
JoshuaL: will attend virtually, will showcase Transformers.js WebNN demo, EP selection via ONNX Runtime, demo in the works with up-to-date visual models
… 10-15 mins user-centric demonstration, not too low-level on the details
… - ONNX Runtime Web & WebNN EP
JoshuaL: can ask Gunther if he could present on this topic
… 10-15 mins
anssik: - Google Chrome Feedback revisited #453
<gb> Issue 453 Google Chrome Feedback on WebNN: aiming for broad device coverage and maintainability (by vsekhar) [process] [opset] [use case]
Austin: a lot of the points in this are overlapping with other issues, so we will check if we should keep this on the agenda
… do not need a lot of time regardless
Interop and cross-group coordination
Austin: - Interop issues across different backends
… Ningxin, how much time to allocate for this?
<ningxin> https://
ningxin: I want to work through issues labeled with "interop", some are covered in "device selection", ConstantOperand also has a dedicated issue
… int64 is one interesting issue with MikeW feedback
… zero-sized dimensions
… want to triage all "interop" issues for discussion
anssik: - Core operator set #573
<gb> Issue 573 Core operator set (by philloooo) [question] [opset]
Austin: related to custom op incubation, we can merge this with custom op
ningxin: we may reveal other options in incubation section, core op set is a good foundation
Austin: we can have a few minutes for this
anssik: - MLTensor #754
<gb> Pull Request 754 Add MLTensor explainer (by a-sully) [webgpu interop]
anssik: Austin, possibly Corentin from WebGPU joining
… - MLConstantOperand #668
<gb> Issue 668 Do we need an `MLConstantOperand`? (by a-sully) [question] [interop]
Implementation plans and trials
anssik: Next steps for implementations: Origin Trial or equivalent, and aligning with framework developer feedback
ningxin: we're at CR "call for implementations", use this as an opportunity to get signals from implementers
Advancement on the W3C Rec Track
anssik: - Wide review status, close on TAG review feedback
… - W3C “living standards” expectations and plan with Dom
Dom: personally would suggest to not go to living CR model
Incubation
anssik: - Custom ops
… Ningxin, 20 mins
… - Built-in APIs for translation and prompting
… Domenic, ~20 mins
… - Model management
… Michael McCool, ~10-15 mins
McCool: will have an additional breakout session
Wrap up & Dinner
anssik: our F2F group is quite big so we're not able to book one restaurant
<dom> AI-related TPAC breakouts
anssik: one option would be to go to Anaheim Packing District 2.5 miles from the meeting venue
… "There are several ways to enjoy the culinary artistry from our small businesses, including takeout, delivery, indoor dining and outdoor dining."
… a wide range of options so could split by food preference
… if some Anaheim local foodie has recommendations, please share!
Device selection abstractions
anssik: We agreed to evolve MLContextOptions and other API controls for device selection informed by further implementation experience and new use cases from the wider web community.
… I wanted to discuss how frameworks and backends approach configuring inference sessions, understand framework-level use cases and requirements. Distill learnings to help evolve MLContextOptions.
… ONNX Runtime Web does provide some knobs, which vary by backend
anssik: for example:
… executionMode?: "sequential" | "parallel" - Controls whether multiple operators in the graph across nodes run sequentially or in parallel
… graphOptimizationLevel?: "basic" | "all" | "disabled" | "extended" - graph-level transformations, ranging from small graph simplifications and node eliminations to more complex node fusions and layout optimizations
… there's also WebNN EP Options
anssik: WebNNExecutionProviderOption: WebNNOptionsWithoutMLContext | WebNNOptionsWithMLContext | WebNNOptionsWebGpu
… interface WebNNOptionsWithMLContext extends MLContextOptions with numThreads
… I wanted to understand whether all these knobs are purely ONNX implementation details or whether some of these could be WebNN MLContextOptions hints.
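[A non-normative sketch, for the minutes, of how the knobs discussed above surface in ONNX Runtime Web. Field names follow the ORT Web session options; the exact values and the WebNN EP fields shown are illustrative, not a recommendation.]

```javascript
// Sketch (assumed shape, not normative): ONNX Runtime Web session options,
// combining the session-level knobs with WebNN EP options that mirror
// MLContextOptions.
const sessionOptions = {
  executionMode: 'sequential',       // or 'parallel'
  graphOptimizationLevel: 'all',     // 'disabled' | 'basic' | 'extended' | 'all'
  executionProviders: [{
    name: 'webnn',                   // WebNN EP
    deviceType: 'gpu',               // maps to MLContextOptions.deviceType
    powerPreference: 'default',      // maps to MLContextOptions.powerPreference
    numThreads: 1,                   // WebNN EP-specific extension
  }],
};

// In a browser with onnxruntime-web loaded, one would then create the session:
// const session = await ort.InferenceSession.create('model.onnx', sessionOptions);
```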
JoshuaL: would like to have Gunther involved, he'd be an expert in this
… number of threads used for the Wasm EP, proxied via Web Workers
… I talked with Gunther that ORT docs are not up to date
… for Transformers.js we set some good defaults, set num of threads
anssik: I also wanted to remind us of CoreML limitations and on what is implementable on Apple platforms, see issue #749 where MikeW provided information about CoreML MLComputeUnits configuration limitations
<gb> Issue 749 MLContextOptions.deviceType seems unnecessary outside of conformance testing (by mwyrzykowski) [device selection]
https://
<Joshua_Lochner> Here's the env: https://
anssik: the current WebNN API device selection mechanism we declared subject to change is:
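[A minimal sketch of the MLContextOptions-based mechanism referred to above, assuming the CR-era dictionary shape with deviceType and powerPreference; this is the part the group declared subject to change.]

```javascript
// Sketch of WebNN device selection via MLContextOptions (subject to change,
// per the discussion above).
const contextOptions = {
  deviceType: 'gpu',            // 'cpu' | 'gpu' | 'npu'
  powerPreference: 'default',   // 'default' | 'high-performance' | 'low-power'
};

// In a browser with WebNN available:
// const context = await navigator.ml.createContext(contextOptions);
```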
MikeW: not closely following framework configuration options, but will check this ahead of TPAC discussion
MLTensor
anssik: MLTensor explainer PR #754 received a lot of great feedback and review, thank you!
<gb> Pull Request 754 Add MLTensor explainer (by a-sully) [webgpu interop]
Austin: thanks for all the feedback!
… re Corentin's feedback, MLTensor assumes that MLTensorOptions, i.e. whether the tensor is readable from script or not, is sufficient to determine where to allocate the tensor on the system
… open questions if this is true, if device selection is tied to context
… Corentin pointed out there's issues with making GPU wait on a fence from another place in the system
… there's still some open questions; until there's more clarity on device selection, not sure how much progress we can make
anssik: it looks like the GH feedback from Rafael, Dwayne and Ningxin has been addressed
… GH feedback from Bryan, Phillis and Corentin is still being worked on
anssik: merging an explainer is lower bar than merging a spec PR
Austin: will add notes for Corentin's feedback in the explainer
Dom: what would be the next step before getting this into the spec prose and IDL?
… is it device selection clarification?
Austin: we need to specify timelines
… especially for WebGPU interop
… WebGPU has its own concept of timelines and need to define how WebNN interacts with that timeline
… when WebNN rents an MLTensor to WebGPU, the WebNN timeline needs to wait until the tensor is returned
… WebNN timeline relationship with WebGPU timeline
… for context, Chai had an earlier PR that said WebNN's timeline is the same as the WebGPU CommandBuffer
… that matches DirectML model, but not CoreML, TFLite
… from spec perspective timelines must be a bit more decoupled
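[To make the rent/return relationship above concrete, a toy promise-based simulation (entirely hypothetical names, not the real API): work queued on the simulated WebNN timeline must wait until WebGPU has returned the rented tensor.]

```javascript
// Toy model of decoupled timelines: each timeline is a serialized task queue.
class Timeline {
  constructor() { this.tail = Promise.resolve(); }
  enqueue(task) {
    this.tail = this.tail.then(task);
    return this.tail;
  }
}

async function demo() {
  const webnn = new Timeline();
  const webgpu = new Timeline();
  const log = [];

  // WebGPU "rents" the tensor; calling returnTensor() hands it back.
  let returnTensor;
  const rented = new Promise((resolve) => { returnTensor = resolve; });

  webgpu.enqueue(() => log.push('webgpu: use tensor'));
  webgpu.enqueue(() => { log.push('webgpu: return tensor'); returnTensor(); });

  // A WebNN dispatch on the same tensor must wait for the return.
  await webnn.enqueue(async () => {
    await rented;
    log.push('webnn: dispatch');
  });
  return log;
}
```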
McCool: I was planning to prototype caching based on MLBuffer/MLTensor, weights from a model that can be restored
Austin: MLTensor is used for inputs and outputs; the challenge is that sharing weights across models is not something DirectML supports
See you at TPAC F2F!
anssik: see you at TPAC F2F on 23 September 2024 for a day of exciting discussions and more!
… I'll cancel the bi-weekly telcons that have close proximity with the TPAC week:
… - cancel WebML WG Teleconference – 19 September 2024
… - cancel WebML WG Teleconference – 3 October 2024
… Our next call after TPAC will be 17 October 2024