14:47:12 RRSAgent has joined #webmachinelearning
14:47:16 logging to https://www.w3.org/2024/03/07-webmachinelearning-irc
14:47:16 RRSAgent, make logs Public
14:47:17 please title this meeting ("meeting: ..."), anssik
14:47:17 Meeting: WebML WG Teleconference – 7 March 2024
14:47:21 Chair: Anssi
14:47:26 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2024-03-07-wg-agenda.md
14:47:45 Scribe: Anssi
14:47:49 scribeNick: anssik
14:47:56 gb, this is webmachinelearning/webnn
14:47:56 anssik, OK.
14:48:01 Present+ Anssi_Kostiainen
14:48:05 Regrets+ Ningxin_Hu
14:48:15 RRSAgent, draft minutes
14:48:16 I have made the request to generate https://www.w3.org/2024/03/07-webmachinelearning-minutes.html anssik
14:51:47 grgustaf has joined #webmachinelearning
14:54:24 Present+ Geoff_Gustafson
14:56:10 Present+ Michael_McCool
14:56:43 Present+ Etienne_Noel
14:58:59 jsbell has joined #webmachinelearning
14:59:14 Present+ Joshua_Bell
14:59:26 Present+ Rachel_Yager
14:59:31 Rach has joined #webmachinelearning
15:00:17 Present+ Joshua_Lochner
15:00:30 Joshua_Lochner has joined #webmachinelearning
15:00:40 McCool has joined #webmachinelearning
15:01:03 etiennenoel has joined #webmachinelearning
15:01:17 Present+ Zoltan_Kis
15:01:19 asully has joined #webmachinelearning
15:01:36 Present+ Sudeep_Divakaran
15:01:44 Present+ Dwayne_Robinson
15:01:53 Present+ Ilya_Rezvov
15:01:59 Present+ Bryan_Bernhart
15:02:27 RRSAgent, draft minutes
15:02:28 I have made the request to generate https://www.w3.org/2024/03/07-webmachinelearning-minutes.html anssik
15:03:04 anssik: Welcome to a new participant, Ilya Rezvov from Google
15:03:52 Ilya: I work for Google on Wasm mostly, recently looking at ML-related efforts, fp16 in Wasm specifically, and want to extend my interests to other areas of ML on the web
15:04:29 Topic: Call for Consensus: WebNN API CR Snapshot
15:04:45 anssik: On 28 Feb 2024 we issued a Call for Consensus (CfC) to publish the Web Neural Network API as a new Candidate Recommendation Snapshot (CRS).
15:04:55 -> CfC to publish WebNN API Candidate Recommendation Snapshot - review by 7 Mar 2024
15:04:55 https://lists.w3.org/Archives/Public/public-webmachinelearning-wg/2024Feb/0006.html
15:04:59 ... the WG has received explicit support for this CfC and no concerns have been raised
15:05:08 ... our CR readiness tracker #240 shows all green except for the WIP TAG delta review
15:05:08 https://github.com/webmachinelearning/webnn/issues/240 -> Issue 240 Candidate Recommendation readiness tracker (by anssiko) [process]
15:05:31 ... the TAG delta review in flight may raise some questions at transition time, so we should be prepared for that
15:05:57 ... that said, I hope we can address this in flight by noting this publication in fact explicitly addresses earlier TAG review feedback by removing support for synchronous execution
15:06:21 ... I will handle CRS transition logistics with Dom and will ask the WG for further information as needed
15:06:29 Deepti has joined #webmachinelearning
15:06:41 ... we can still merge the currently open PRs before branching for this release; it is important that after each commit the spec remains in a cohesive state
15:07:03 ... transition request processing is expected to take a week, so the earliest publication date would be in the week of 18-22 March 2024
15:07:22 Present+ Deepti_Gandluri
15:07:55 Topic: Hybrid AI exploration
15:08:25 anssik: As you recall, we have a sister WebML Community Group responsible for incubating new ideas for future work; the CG works closely with this WG, sharing many participants
15:08:32 ... the WebML CG has received a new proposal called "Hybrid AI exploration":
15:08:35 -> https://github.com/webmachinelearning/proposals/issues/5
15:08:36 https://github.com/webmachinelearning/proposals/issues/5 -> Issue 5 Hybrid AI Exploration (by grgustaf)
15:08:59 ... so I wanted to invite WebML CG participants Michael, Geoff, and Sudeep to present this proposal to solicit input from this WG and inform the direction this exploration should take
15:09:09 ... the timebox is ~20 minutes including Q&A
15:09:23 ... any concrete proposals from this exploration that may impact Web APIs are expected to be incubated in an applicable Community Group first
15:09:30 Slideset: https://lists.w3.org/Archives/Public/www-archive/2024Mar/att-0000/WebML.Discussion.-.Hybrid.AI.for.the.Web.-.Slides.pdf
15:09:53 [slide 1]
15:10:23 Michael: Proposal is titled "Hybrid AI for the Web", probably a bit mistitled, we're looking at general model management too
15:10:51 ... started work to improve the fit of WebNN on the client and want to make sure we look at the right problems; we're not proposing solutions at this stage
15:10:57 [slide 2]
15:11:42 Michael: first going through the general status as we understand it, specific issues, goals and requirements we see, prioritization, closing with questions for this group
15:11:56 zkis has joined #webmachinelearning
15:11:57 [slide 3]
15:12:11 Michael: looked at WebNN use cases, client AI execution clearly in focus
15:12:39 AramZS has joined #webmachinelearning
15:12:45 ... we found some problems, e.g. language translation requires large models, long download times, need to figure out client capabilities
15:12:53 ... startup time may be significant
15:13:16 ... if we have two different web sites using a model, we need to download the model twice
15:14:03 ... clients vary in capabilities, vary over time; clients grow rapidly in performance and we want to avoid a least-common-denominator approach
15:14:12 [slide 4]
15:14:16 Michael: Specific issues, three broader categories:
15:14:30 ... 1) Model Management
15:14:52 ... - Large models cannot be reused across origins
15:15:04 ... - Model storage and management opaque to the user
15:15:27 ... - Cache eviction may not match user preferences
15:15:47 ... 2) Elasticity through Hybrid AI
15:16:13 ... - Distributing work between client and server
15:16:32 ... - Difficult to predict performance on a client
15:17:04 ... - Sharing detailed client capabilities a privacy risk
15:17:27 ... (noting possible overlap with PWA caching mechanisms)
15:17:33 ... 3) User Experience
15:17:36 ... - Privacy behaviour unclear, may not match user preferences
15:17:40 ... - Managing latency of model downloads
15:17:44 [slide 5]
15:17:49 Michael: Goals and Requirements
15:17:59 ... Maximize ease of use for the end user
15:18:19 ... - minimize load times and meet latency targets
15:18:32 ... Portability and elasticity
15:19:09 ... - minimize costs, support varying client capabilities, adapt based on resource availability
15:21:16 ... Data privacy
15:21:29 ... - Personal and business data, support user choice and control
15:22:10 ... last but not least, developer ease of use and consistency
15:22:30 [slide 6]
15:22:35 Michael: Questions for Discussion
15:22:43 ... How to:
15:22:48 ... handle model download latency and storage?
15:22:55 ... match model reqs to client capabilities?
15:22:59 ... choose among model fidelity levels?
15:23:03 ... support progressive transmission of models?
15:23:08 ... partition single models, support separate models, both?
15:23:20 ... Questions to the group:
15:23:25 ... - what should be the priorities?
15:23:28 ... - Specific use cases for Hybrid AI?
15:23:32 [slide 7]
15:23:35 Michael: Proposed Next Steps
15:23:42 ... 1) Make sure we solve the right problem
15:23:47 ... We welcome your feedback via the GH issue submitted to the proposals repo:
15:23:50 -> https://github.com/webmachinelearning/proposals/issues/5
15:23:51 https://github.com/webmachinelearning/proposals/issues/5 -> Issue 5 Hybrid AI Exploration (by grgustaf)
15:23:57 Michael: 2) Build a prototype implementation
15:24:02 ... e.g. using the Model Loader API from the CG as a basis, we have some ideas to test
15:24:06 -> Model Loader API https://webmachinelearning.github.io/model-loader/
15:24:20 Michael: 3) Bring results back to the group to discuss further
15:24:33 RRSAgent, draft minutes
15:24:34 I have made the request to generate https://www.w3.org/2024/03/07-webmachinelearning-minutes.html anssik
15:24:57 RafaelCintron has joined #webmachinelearning
15:25:06 q+
15:25:10 Present+ Rafael_Cintron
15:25:26 ack RafaelCintron
15:25:52 RafaelCintron: is there a solution in mind?
15:26:31 Michael: a caching strategy on the computational graph, and negotiating model requirements against client capabilities, are a few ideas
15:27:33 q+
15:27:35 anssik: we have discussed Storage APIs for caching large models in this group earlier
15:27:37 ack Joshua_Lochner
15:27:51 anssik: I recall Joshua Lochner has prototyped solutions to cross-site sharing of models with a browser extension
15:28:31 Joshua_Lochner: from the Transformers.js perspective we're fortunate that the browser caching API is pretty performant; we can do a 1.5B-parameter model, refresh the page, and it loads from the cache
15:29:09 ... however, the issue emerges when I go to another web site on a different origin; my extension idea works but it requires the user to download a random extension, extra effort for the user, not a standards-based feature
15:29:51 ... for the size issue, I'm focused on smaller models that can perform in a Wasm environment, soon WebGPU
15:30:09 ... storage and issues related to exceptionally big models have not been the main focus; 50M to 250M parameters has been the focus of Transformers.js, the sweet spot
15:30:14 ... due to the cache issues
15:30:40 ... the main API is the Web Cache API; models are loaded as an HTTP request-response pair from the HuggingFace Hub
15:30:41 q?
15:30:53 Michael: are you caching serializations of the models, ONNX files?
15:31:03 ... any ideas re adapters?
15:31:32 Joshua_Lochner: caching serializations, single .onnx files; in the future, graph and weights in separate files
15:31:44 ... for adapters, haven't thought about it yet
15:32:11 ... ONNX RT is rather limited in this sense; if we want to use an adapter we need to export the whole model
15:32:18 ... the MMS text-to-speech model was one example where an adapter is at the end
15:32:58 ... what we have been able to do is split up the models; it only works if the backend is identical, e.g. a text-gen model, chop off the head
15:33:16 ... that's one way to share weights, not exactly the way you're proposing
15:33:48 Michael: the topology is the smaller chunk; we're concerned with weight caching at this stage
15:34:22 ... progressive transmission creates questions regarding the API if it's read-only; the best option now is to create zero nodes and add weights later
15:34:34 ... to progressively enhance the model we need to build it from scratch
15:34:35 q?
15:34:48 q?
15:35:48 Michael: the prototype will likely be similar to the HuggingFace solution; for good UX it may need some standardization work later
15:35:50 q?
15:36:24 Michael: what use cases can make use of different levels of fidelity?
15:36:48 ... big-little mapping to server-client; any specific use cases for Hybrid AI?
15:37:16 q+
15:37:19 q?
15:37:22 ack Joshua_Lochner
15:38:01 Joshua_Lochner: I guess some form of a personalization model that learns things over time, continuous training, a model where you can update the weights over time, a private personalization preference learning model
15:38:20 ... e.g. you're on Twitter and want to block certain things
15:39:00 ... you're probably referring to much larger things, multiple LoRAs on top of llama
15:39:42 Michael: fine-tuning on the client, split out the LoRAs so we can select them; a lot of these optimizations are relevant to big models mainly, smaller models are fast to download as is
15:39:43 q?
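[Editor's note: the per-origin model caching Joshua_Lochner describes for Transformers.js can be sketched with the standard browser Cache API. This is a sketch, not Transformers.js's actual code; `fetchModelCached` and its parameters are illustrative, and the injectable `cache`/`fetchFn` arguments exist only so the pattern can be exercised outside a browser.]

```javascript
// Sketch only: store a model file as an HTTP request/response pair so a
// page refresh loads it from the cache instead of re-downloading it.
// In a browser you would pass `await caches.open("models")` as `cache`
// and omit `fetchFn`; neither name comes from any spec or the minutes.
async function fetchModelCached(url, cache, fetchFn = fetch) {
  const cached = await cache.match(url);
  if (cached) return cached; // cache hit: no network traffic
  const response = await fetchFn(url);
  // Store a clone so the caller can still consume the returned body.
  await cache.put(url, response.clone());
  return response;
}
```

[Note this cache is per-origin, which is exactly the limitation raised above: a second site on a different origin must download the same model again.]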
15:40:58 Joshua_Lochner: another use case, some form of underlying embeddings adapting, speech-to-text, text-to-speech, the base model stays the same
15:41:03 q?
15:41:16 https://github.com/webmachinelearning/proposals/issues/5
15:41:17 https://github.com/webmachinelearning/proposals/issues/5 -> Issue 5 Hybrid AI Exploration (by grgustaf)
15:42:03 Topic: Open issues and PRs
15:42:07 anssik: as usual, let's discuss open issues and review PRs based on your feedback
15:42:11 -> All open issues https://github.com/webmachinelearning/webnn/issues
15:42:16 -> All pull requests https://github.com/webmachinelearning/webnn/pulls
15:42:23 -> Triage guidance https://github.com/webmachinelearning/webnn/blob/main/docs/IssueTriage.md
15:43:28 Subtopic: Core operator set
15:43:50 https://github.com/webmachinelearning/webnn/issues/573
15:43:51 https://github.com/webmachinelearning/webnn/issues/573 -> Issue 573 Core operator set (by philloooo) [question] [opset]
15:43:53 anssik: issue #573
15:43:53 https://github.com/webmachinelearning/webnn/issues/573 -> Issue 573 Core operator set (by philloooo) [question] [opset]
15:43:55 jsbell: we're working on an additional contribution
15:44:38 ... in principle, if there are emerging standards from the ecosystem, we should look at them as well to inform our work, describe them separately
15:46:28 Subtopic: Consider using label to allow better error handling for async errors
15:46:34 anssik: issue #585
15:46:35 https://github.com/webmachinelearning/webnn/issues/585 -> Issue 585 Consider using `label` to allow better error handling for async errors. (by philloooo) [feature request]
15:47:37 jsbell: there's a difference between sync and async errors; in the initial implementation a lot of validation happens synchronously in the renderer process because XNNPACK is also running in that process
15:48:09 dwayner has joined #webmachinelearning
15:48:36 ... sync errors can be moved earlier; async errors are hard for developers and also frameworks to handle
15:49:06 ... thinking how to report those errors, with promise rejection, how to know which node is responsible for the error?
15:49:12 ... proposed solution is to follow WebGPU's practice: define an MLObjectBase with a label field for MLOperand to extend from
15:49:16 -> GPUObjectBase.label https://www.w3.org/TR/webgpu/#gpuobjectbase
15:49:41 jsbell: when an async error is raised, the developer then has useful information about the reason
15:50:05 ... more interesting if decomposition or fusion is done
15:50:22 ... Zoltan proposed we could auto-generate these labels
15:50:34 ... we're interested in any feedback
15:50:36 q+
15:51:11 zkis: I just agree with the latest comment from Josh
15:51:14 ack RafaelCintron
15:51:31 RafaelCintron: wanted to say I'm in favour of labels and sync
15:51:40 ... I was also a proponent of .label in WebGPU
15:52:13 anssik: jsbell, are Google folks interested in implementing this?
15:52:21 jsbell: yes
15:52:50 ... what happens if developers call build, and while the async build is happening, code modifies the label of an operand; does the build step snapshot all the labels?
15:52:58 q?
15:53:26 Dwayne: for debugging this is very helpful
15:54:06 ... what is the format of labels? it would be helpful to raise the errors sooner rather than later; I wonder if you can know all the backend capabilities early and do pre-validation; overall I like the idea
15:54:08 q?
15:54:29 q?
15:54:31 q+
15:54:34 ack zkis
15:54:55 zkis: we could keep generated labels separate from user-provided labels
15:55:02 ... to be discussed in the issue
15:55:03 q?
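[Editor's note: the pattern discussed in issue #585 can be illustrated with a sketch modeled on WebGPU's GPUObjectBase.label. The names `LabeledOperand` and `buildGraph` below are hypothetical, not actual WebNN API surface; the point is only that an async build failure echoes back the developer-settable label of the offending node.]

```javascript
// Hypothetical sketch of issue #585: operands carry a label, and an async
// build error names the failing node via that label. Not real WebNN IDL.
class LabeledOperand {
  constructor(label = "") {
    this.label = label; // analogous to GPUObjectBase.label in WebGPU
  }
}

// Rejects with a message containing the offending operand's label, so the
// developer can map a promise rejection back to a specific graph node.
async function buildGraph(operands, validate) {
  for (const op of operands) {
    const problem = validate(op);
    if (problem) {
      throw new Error(`build failed at "${op.label || "<unlabeled>"}": ${problem}`);
    }
  }
  return { status: "built" };
}
```

[This also makes concrete the open questions above: whether labels are snapshotted when build is called, and whether auto-generated labels should be kept distinct from user-provided ones.]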
15:55:46 Subtopic: Rename inputSize variables as inputRank in algorithms
15:55:50 anssik: issue #588
15:55:51 https://github.com/webmachinelearning/webnn/issues/588 -> Issue 588 Rename inputSize variables as inputRank in algorithms (by inexorabletash) [conventions]
15:56:07 jsbell: this is very simple, comments welcome
15:56:12 Subtopic: Consider alternate styling/linking for method argument definitions
15:56:17 anssik: issue #574
15:56:18 https://github.com/webmachinelearning/webnn/issues/574 -> Issue 574 Consider alternate styling/linking for method argument definitions (by inexorabletash) [question] [editorial] [conventions]
15:56:26 ... question regarding styling method arguments with three alternatives:
15:56:31 ... Alternative 1: Make args into definitions
15:56:34 ... Alternative 2: Style as definition list
15:56:39 ... Alternative 3: Auto-generated table
15:56:41 q?
15:57:13 q?
15:57:32 RRSAgent, draft minutes
15:57:33 I have made the request to generate https://www.w3.org/2024/03/07-webmachinelearning-minutes.html anssik
16:01:47 s/works is/works if
16:02:22 s/chunck/chunk
16:07:19 s/… the WG/anssik: the WG
16:07:19 RRSAgent, draft minutes
16:07:20 I have made the request to generate https://www.w3.org/2024/03/07-webmachinelearning-minutes.html anssik
16:09:07 s/the WG has/anssik: the WG has
16:09:09 RRSAgent, draft minutes
16:09:10 I have made the request to generate https://www.w3.org/2024/03/07-webmachinelearning-minutes.html anssik
18:01:34 Zakim has left #webmachinelearning