14:47:17 Meeting: WebML WG Teleconference – 7 March 2024
14:47:21 Chair: Anssi
14:47:26 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2024-03-07-wg-agenda.md
14:48:01 Present+ Anssi_Kostiainen
14:48:05 Regrets+ Ningxin_Hu
14:51:47 grgustaf has joined #webmachinelearning
14:54:24 Present+ Geoff_Gustafson
14:56:10 Present+ Michael_McCool
14:56:43 Present+ Etienne_Noel
14:58:59 jsbell has joined #webmachinelearning
14:59:14 Present+ Joshua_Bell
14:59:26 Present+ Rachel_Yager
14:59:31 Rach has joined #webmachinelearning
15:00:17 Present+ Joshua_Lochner
15:00:30 Joshua_Lochner has joined #webmachinelearning
15:00:40 McCool has joined #webmachinelearning
15:01:03 etiennenoel has joined #webmachinelearning
15:01:17 Present+ Zoltan_Kis
15:01:19 asully has joined #webmachinelearning
15:01:36 Present+ Sudeep_Divakaran
15:01:44 Present+ Dwayne_Robinson
15:01:53 Present+ Ilya_Rezvov
15:01:59 Present+ Bryan_Bernhart
15:03:04 anssik: Welcome to a new participant, Ilya Rezvov from Google
15:03:52 Ilya: I work for Google on Wasm mostly, recently looking at ML-related efforts, fp16 in Wasm specifically and want to extend my interests to other areas of ML on the web
15:04:29 Topic: Call for Consensus: WebNN API CR Snapshot
15:04:45 anssik: On 28 Feb 2024 we issued a Call for Consensus (CfC) to publish the Web Neural Network API as a new Candidate Recommendation Snapshot (CRS).
15:04:55 -> CfC to publish WebNN API Candidate Recommendation Snapshot - review by 7 Mar 2024
15:04:55 https://lists.w3.org/Archives/Public/public-webmachinelearning-wg/2024Feb/0006.html
15:04:59 anssik: the WG has received explicit support for this CfC and no concerns have been raised
15:05:08 anssik: our CR readiness tracker #240 shows all green except for the WIP TAG delta review
15:05:08 https://github.com/webmachinelearning/webnn/issues/240 -> Issue 240 Candidate Recommendation readiness tracker (by anssiko) [process]
15:05:31 anssik: the TAG delta review in flight may raise some questions at the transition time so we should be prepared for that
15:05:57 anssik: that said, I hope we can address this in flight by noting this publication in fact explicitly addresses earlier TAG review feedback by removing support for synchronous execution
15:06:21 anssik: I will handle CRS transition logistics with Dom and will ask the WG for further information as needed
15:06:29 Deepti has joined #webmachinelearning
15:06:41 anssik: we can still merge the currently open PRs before branching for this release, it is important that after each commit the spec remains in a cohesive state
15:07:03 anssik: transition request processing is expected to take a week, so earliest publication date would be on the week 18-22 March 2024
15:07:22 Present+ Deepti_Gandluri
15:07:55 Topic: Hybrid AI exploration
15:08:25 anssik: As you recall, we have a sister WebML Community Group responsible for incubating new ideas for future work, the CG works closely with this WG, sharing many participants
15:08:32 anssik: the WebML CG has received a new proposal called "Hybrid AI exploration":
15:08:35 -> https://github.com/webmachinelearning/proposals/issues/5
15:08:36 https://github.com/webmachinelearning/proposals/issues/5 -> Issue 5 Hybrid AI Exploration (by grgustaf)
15:08:59 anssik: so I wanted to invite WebML CG participants Michael, Geoff, Sudeep to present this proposal to solicit input from this WG and inform the direction this exploration should take
15:09:09 anssik: the timebox is ~20 minutes including Q&A
15:09:23 anssik: any concrete proposals from this exploration that may impact Web APIs are expected to be incubated in an applicable Community Group first
15:09:30 Slideset: https://lists.w3.org/Archives/Public/www-archive/2024Mar/att-0000/WebML.Discussion.-.Hybrid.AI.for.the.Web.-.Slides.pdf
15:09:53 [slide 1]
15:10:23 Michael: Proposal is titled "Hybrid AI for the Web", probably a bit mistitled, we're looking at general model management too
15:10:51 Michael: started work to improve the fit of WebNN on the client and want to make sure we look at the right problems, we're not proposing solutions at this stage
15:10:57 [slide 2]
15:11:42 Michael: first going through the general status as we understand it, specific issues, goals and requirements we see, prioritization, closing with questions for this group
15:11:56 zkis has joined #webmachinelearning
15:11:57 [slide 3]
15:12:11 Michael: looked at WebNN use cases, client AI execution clearly in focus
15:12:39 AramZS has joined #webmachinelearning
15:12:45 Michael: we found some problems, e.g. language translation requires large models, long download times, need to figure out client capabilities
15:12:53 Michael: startup time may be significant
15:13:16 Michael: if we have two different web sites using a model we need to download the model twice
15:14:03 Michael: clients vary in capabilities, vary in time, clients grow rapidly in performance and want to avoid least common denominator approach
15:14:12 [slide 4]
15:14:16 Michael: Specific issues, three broader categories:
15:14:30 Michael: 1) Model Management
15:14:52 Michael: - Large models cannot be reused across origins
15:15:04 Michael: - Model storage and management opaque to the user
15:15:27 Michael: - Cache eviction may not match user preferences
15:15:47 Michael: 2) Elasticity through Hybrid AI
15:16:13 Michael: - Distributing work between client and server
15:16:32 Michael: - Difficult to predict performance on a client
15:17:04 Michael: - Sharing detailed client capabilities a privacy risk
15:17:27 Michael: (noting possible overlap with PWA caching mechanisms)
15:17:33 Michael: 3) User Experience
15:17:36 Michael: - Privacy behaviour unclear, may not match user preferences
15:17:40 Michael: - Managing latency of model downloads
15:17:44 [slide 5]
15:17:49 Michael: Goals and Requirements
15:17:59 Michael: Maximize ease of use for the end user
15:18:19 Michael: - minimize load times and meet latency targets
15:18:32 Michael: Portability and elasticity
15:19:09 Michael: - minimize costs, support varying client capabilities, adapt based on resource availability
15:21:16 Michael: Data privacy
15:21:29 Michael: - Personal and biz data, support user choice and control
15:22:10 Michael: last but not least, developer ease of use and consistency
15:22:30 [slide 6]
15:22:35 Michael: Questions for Discussion
15:22:43 Michael: How to:
15:22:48 Michael: handle model download latency and storage?
15:22:55 Michael: match model reqs to client capabilities?
15:22:59 Michael: choose among model fidelity levels?
15:23:03 Michael: support progressive transmission of models?
15:23:08 Michael: partition single models, support separate models, both?
15:23:20 Michael: Questions to the group:
15:23:25 Michael: - what should be the priorities?
15:23:28 Michael: - Specific use cases for Hybrid AI?
15:23:32 [slide 7]
15:23:35 Michael: Proposed Next Steps
15:23:42 Michael: 1) Make sure we solve the right problem
15:23:47 Michael: We welcome your feedback via the GH issue submitted to the proposals repo:
15:23:50 -> https://github.com/webmachinelearning/proposals/issues/5
15:23:51 https://github.com/webmachinelearning/proposals/issues/5 -> Issue 5 Hybrid AI Exploration (by grgustaf)
15:23:57 Michael: 2) Build a prototype implementation
15:24:02 Michael: e.g. using the Model Loader API from the CG as a basis, we have some ideas to test
15:24:06 -> Model Loader API https://webmachinelearning.github.io/model-loader/
15:24:20 Michael: 3) Bring results back to the group to discuss further
15:24:57 RafaelCintron has joined #webmachinelearning
15:25:06 q+
15:25:10 Present+ Rafael_Cintron
15:25:26 ack RafaelCintron
15:25:52 RafaelCintron: is there a solution in mind?
15:26:31 Michael: caching strategy on the computational graph, negotiating model requirements and client capabilities are a few ideas
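One way to picture the idea of negotiating model requirements against client capabilities is a selection step that runs on the client, so that detailed capabilities need not be shared with a server. The sketch below is illustrative only: `ModelVariant`, `ClientBudget`, `pickVariant` and all the numbers are hypothetical and are not part of WebNN or of the proposal.

```typescript
// Hypothetical description of one fidelity level of a model.
interface ModelVariant {
  name: string;
  sizeMB: number;       // download size
  minMemoryMB: number;  // memory needed to run on the client
  quality: number;      // higher means higher fidelity
}

// Hypothetical summary of what the client is willing and able to spend.
interface ClientBudget {
  availableMemoryMB: number;
  maxDownloadMB: number;
}

// Pick the highest-fidelity variant that fits the client budget.
// Returns "server" when nothing fits, i.e. the work is offloaded.
function pickVariant(
  variants: ModelVariant[],
  budget: ClientBudget
): ModelVariant | "server" {
  const fitting = variants.filter(
    (v) =>
      v.minMemoryMB <= budget.availableMemoryMB &&
      v.sizeMB <= budget.maxDownloadMB
  );
  if (fitting.length === 0) return "server";
  return fitting.reduce((best, v) => (v.quality > best.quality ? v : best));
}
```

Because the decision is made locally, only the name of the chosen variant (or the fallback to the server) would be visible to the site in this sketch.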
15:27:33 q+
15:27:35 anssik: we have discussed Storage APIs for caching large models in this group earlier
15:27:37 ack Joshua_Lochner
15:27:51 anssik: I recall Joshua Lochner has prototyped solutions to cross-site sharing of models with a browser extension
15:28:31 Joshua_Lochner: from Transformers.js perspective we're fortunate that the browser caching API is pretty performant, can do 1.5B param model and refresh the page and it loads from the cache
15:29:09 Joshua_Lochner: however, the issue emerges when I go to another web site on a different origin, my extension idea works but it requires the user to download a random extension, extra effort for the user, not a standards-based feature
15:29:51 Joshua_Lochner: for the size issue, I'm focused on smaller models that can perform in Wasm environment, soon WebGPU
15:30:09 Joshua_Lochner: storage and issues related to exceptionally big models have not been the main focus, 50M to 250M parameters has been the focus of Transformers.js, the sweet spot
15:30:14 Joshua_Lochner: due to the cache issues
15:30:40 Joshua_Lochner: the main API is Web Cache API, models loaded as HTTP request-response pair from HuggingFace Hub
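The caching behaviour described above (fast reloads on the same site, a second download on a different origin) can be sketched as a cache-first loader. This is a minimal illustration, not Transformers.js code: `ModelStore` is a hypothetical stand-in for the browser Cache API (`caches.open()`, `cache.match()`, `cache.put()`), reduced to raw bytes so that it runs outside a browser, and `OriginPartitionedStore`, `loadModel` and the example URLs are assumptions.

```typescript
// Hypothetical, byte-oriented stand-in for a Cache API cache.
interface ModelStore {
  match(url: string): Promise<Uint8Array | undefined>;
  put(url: string, bytes: Uint8Array): Promise<void>;
}

// Hypothetical downloader, e.g. a wrapper around fetch() in a browser.
type Downloader = (url: string) => Promise<Uint8Array>;

// In-memory storage keyed by origin, mimicking per-origin partitioning.
class OriginPartitionedStore {
  private entries = new Map<string, Uint8Array>();

  forOrigin(origin: string): ModelStore {
    const key = (url: string): string => `${origin}|${url}`;
    return {
      match: async (url) => this.entries.get(key(url)),
      put: async (url, bytes) => {
        this.entries.set(key(url), bytes);
      },
    };
  }
}

// Cache-first load: reuse stored bytes, otherwise download and store them.
async function loadModel(
  store: ModelStore,
  url: string,
  download: Downloader
): Promise<Uint8Array> {
  const cached = await store.match(url);
  if (cached !== undefined) return cached;
  const bytes = await download(url);
  await store.put(url, bytes);
  return bytes;
}
```

With two stores obtained from `forOrigin("https://a.example")` and `forOrigin("https://b.example")`, loading the same model URL triggers one download per origin, which is the cross-site duplication raised in the discussion.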
15:30:41 q?
15:30:53 Michael: are you caching serializations of the models, ONNX files?
15:31:03 Michael: any ideas re adapters?
15:31:32 Joshua_Lochner: caching serializations, single .onnx files, in the future separate graph and weight in separate files
15:31:44 Joshua_Lochner: for adapters, haven't thought about it yet
15:32:11 Joshua_Lochner: ONNX RT is rather limited in this sense, if we want to use an adapter we need to export the whole model
15:32:18 Joshua_Lochner: MMS text-to-speech model was one example where an adapter is at the end
15:32:58 Joshua_Lochner: what we have been able to do is split up the models, only works if the backend is identical, e.g. text-gen model, chop off the head
15:33:16 Joshua_Lochner: that's one way to share weights, not exactly the way you're proposing
15:33:48 Michael: the topology is the smaller chunk, we're concerned with weight caching at this stage
15:34:22 Michael: progressive transmission creates questions regarding the API if it's read-only, the best now is to create zero nodes and add weights later
15:34:34 Michael: to progressively enhance the model we need to build it from scratch
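The "create zero nodes and add weights later" approach can be pictured as a zero-initialized weight buffer that is filled as chunks arrive. The sketch below only models the buffer: `ProgressiveWeights` and its members are hypothetical names, chunks are assumed not to overlap, and how a graph would be rebuilt from the updated buffer is left open, as it was in the discussion.

```typescript
// Hypothetical holder for one weight tensor that arrives in pieces.
class ProgressiveWeights {
  readonly data: Float32Array;
  private received = 0;

  constructor(length: number) {
    if (!Number.isInteger(length) || length <= 0) {
      throw new RangeError("length must be a positive integer");
    }
    // Typed arrays start zero-filled, so the tensor is usable
    // (as all zeros) before any chunk has arrived.
    this.data = new Float32Array(length);
  }

  // Copy a received chunk into place. Assumes chunks do not overlap.
  applyChunk(offset: number, chunk: Float32Array): void {
    if (offset < 0 || offset + chunk.length > this.data.length) {
      throw new RangeError("chunk does not fit in the weight buffer");
    }
    this.data.set(chunk, offset);
    this.received += chunk.length;
  }

  // Fraction of the weights received so far, between 0 and 1.
  get fractionLoaded(): number {
    return this.received / this.data.length;
  }
}
```

With a read-only graph API, each update to `data` would still require building a new graph from the current buffer contents, which is the cost noted above.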
15:34:35 q?
15:34:48 q?
15:35:48 Michael: the prototype will likely be similar to HuggingFace solution, for good UX may need some standardization work later in the future
15:35:50 q?
15:36:24 Michael: what use cases can make use of different levels of fidelity?
15:36:48 Michael: big-little mapping to server-client, any specific use cases for Hybrid AI?
15:37:16 q+
15:37:19 q?
15:37:22 ack Joshua_Lochner
15:38:01 Joshua_Lochner: I guess some form of a personalization model that learns things over time, continuous training, a model where you can update the weights over time, private personalization preference learning model
15:38:20 Joshua_Lochner: e.g. you're on Twitter and want to block certain things
15:39:00 Joshua_Lochner: you're probably referring to much larger things, multiple LoRAs on top of Llama
15:39:42 Michael: fine-tuning on the client, split out the LoRAs so we can select them, a lot of these optimizations are relevant to big models mainly, smaller models are faster to download as is
15:39:43 q?
15:40:58 Joshua_Lochner: another use case, some form of underlying embeddings adapting, speech-to-text, text-to-speech, base model stays the same
15:41:03 q?
15:41:16 https://github.com/webmachinelearning/proposals/issues/5
15:41:17 https://github.com/webmachinelearning/proposals/issues/5 -> Issue 5 Hybrid AI Exploration (by grgustaf)
15:42:03 Topic: Open issues and PRs
15:42:07 anssik: as usual, let's discuss open issues and review PRs based on your feedback
15:42:11 -> All open issues https://github.com/webmachinelearning/webnn/issues
15:42:16 -> All pull requests https://github (by philloooo) [feature request]
15:47:37 jsbell: difference between sync and async errors, in the initial implementation a lot of validation happens synchronously in the renderer process because XNNPACK is also running in that process
15:48:09 dwayner has joined #webmachinelearning
15:48:36 ... sync errors can be moved earlier, async errors are hard for developers and also frameworks to handle
15:49:06 ... thinking how to report those errors, with promise rejection, how to know which node is responsible for the error?
15:49:12 ... proposed solution to follow WebGPU’s practice and define an MLObjectBase with a label field for MLOperand to extend
15:49:16 -> WebGPUObjectBase.label https://www.w3.org/TR/webgpu/#gpuobjectbase
15:49:41 jsbell: when an async error is raised then the developer has useful information about the reason
15:50:05 ... more interesting if decomp or fusion is done
15:50:22 ... Zoltan proposed we could auto-gen these labels
15:50:34 ... we're interested in any feedback
15:50:36 q+
15:51:11 zkis: I just agree with the latest comment from Josh
15:51:14 ack RafaelCintron
15:51:31 RafaelCintron: wanted to say I'm in favour of labels and sync
15:51:40 ... I was also a proponent of .label in WebGPU
15:52:13 anssik: jsbell, are Google folks interested in implementing this?
15:52:21 jsbell: yes
15:52:50 ... what happens if developers call build and async build is happening and code modifies the label of an operand, does the build step snapshot all the labels?
15:52:58 q?
15:53:26 Dwayne: for debugging this is very helpful
15:54:06 ... what is the format of labels, it would be helpful to raise the errors sooner rather than later, I wonder if you can know all the backend capabilities early, can do pre-validation, overall like the idea
15:54:08 q?
15:54:29 q?
15:54:31 q+
15:54:34 ack zkis
15:54:55 zkis: we could keep generated labels separate from user-provided labels
15:55:02 ... to be discussed in the issue
15:55:03 q?
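The label idea discussed above can be sketched as follows. This is not the WebNN API: `LabeledObject`, `Operand`, `LabeledBuildError` and `build` are hypothetical names, the label format is invented, and the sketch simply picks one possible answer to the open question by snapshotting labels when `build` is called. Generated labels are kept in a separate field from user-provided ones, as suggested in the discussion.

```typescript
// Hypothetical base with a mutable, user-provided label.
class LabeledObject {
  label = "";
}

// Hypothetical operand with a separate auto-generated label.
class Operand extends LabeledObject {
  readonly op: string;
  readonly generatedLabel: string;

  constructor(op: string, index: number) {
    super();
    this.op = op;
    this.generatedLabel = `${op}#${index}`; // invented format
  }
}

// Error that records which node was responsible.
class LabeledBuildError extends Error {
  readonly label: string;

  constructor(message: string, label: string) {
    super(`[${label}] ${message}`);
    this.label = label;
  }
}

// Asynchronous build that rejects with the label of the offending node.
async function build(
  operands: Operand[],
  backendOps: Set<string>
): Promise<number> {
  // Snapshot labels synchronously, before any asynchronous work, so that
  // later changes to operand.label do not affect error reports.
  const snapshot = operands.map((o) => ({
    op: o.op,
    label: o.label || o.generatedLabel,
  }));
  await Promise.resolve(); // stand-in for asynchronous backend work
  for (const node of snapshot) {
    if (!backendOps.has(node.op)) {
      throw new LabeledBuildError(
        `operator "${node.op}" is not supported by the backend`,
        node.label
      );
    }
  }
  return snapshot.length;
}
```

A caller that awaits `build()` and catches the rejection can read `label` to find the responsible node, falling back to the generated label when the developer did not set one.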
15:55:46 Subtopic: Rename inputSize variables as inputRank in algorithms
15:55:50 anssik: issue #588
15:55:51 https://github.com/webmachinelearning/webnn/issues/588 -> Issue 588 Rename inputSize variables as inputRank in algorithms (by inexorabletash) [conventions]
15:56:07 jsbell: this is very simple, comments welcome
15:56:12 Subtopic: Consider alternate styling/linking for method argument definitions
15:56:17 anssik: issue #574
15:56:18 https://github.com/webmachinelearning/webnn/issues/574 -> Issue 574 Consider alternate styling/linking for method argument definitions (by inexorabletash) [question] [editorial] [conventions]
15:56:26 ... question regarding styling method arguments with three alternatives:
15:56:31 ... Alternative 1: Make args into definitions
15:56:34 ... Alternative 2: Style as definition list
15:56:39 ... Alternative 3: Auto-generated table
15:56:41 q?
15:57:13 q?
15:57:32 RRSAgent, draft minutes
15:57:33 I have made the request to generate https://www.w3.org/2024/03/07-webmachinelearning-minutes.html anssik
16:01:47 s/works is/works if
16:02:22 s/chunck/chunk
16:07:19 s/… the WG/anssik: the WG
16:07:19 RRSAgent, draft minutes
16:07:20 I have made the request to generate https://www.w3.org/2024/03/07-webmachinelearning-minutes.html anssik
16:09:07 s/the WG has/anssik: the WG has
16:09:09 RRSAgent, draft minutes
16:09:10 I have made the request to generate https://www.w3.org/2024/03/07-webmachinelearning-minutes.html anssik
18:01:34 Zakim has left #webmachinelearning