14:56:41 RRSAgent has joined #webmachinelearning
14:56:45 logging to https://www.w3.org/2024/01/11-webmachinelearning-irc
14:56:46 inviting RRSAgent
14:56:46 Meeting: WebML WG Teleconference – 11 January 2024
14:56:46 RRSAgent, make logs Public
14:56:47 please title this meeting ("meeting: ..."), anssik
14:56:51 Chair: Anssi
14:56:57 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2024-01-11-wg-agenda.md
14:57:01 Scribe: Anssi
14:57:09 scribeNick: anssik
14:57:14 gb, this is webmachinelearning/webnn
14:57:14 anssik, OK.
14:57:18 Present+ Anssi_Kostiainen
14:58:30 zkis has joined #webmachinelearning
14:58:54 Present+ Zoltan_Kis
15:00:12 jsbell has joined #webmachinelearning
15:00:18 Present+ Joshua_Lochner
15:00:23 Present+ Joshua_Bell
15:00:27 Present+ Dominique_Hazael-Massieux
15:00:32 Ningxin_Hu has joined #webmachinelearning
15:00:59 gdti has joined #webmachinelearning
15:01:03 Present+ Deepti_Gandluri
15:01:09 Present+ Ningxin_Hu
15:01:26 Present+ Rupak
15:01:34 Present+ Chai_Chaoweeraprasit
15:01:41 Present+ Bryan_Bernhart
15:01:50 chai has joined #webmachinelearning
15:01:51 Present+ Rafael_Cintron
15:01:55 Joshua_Lochner has joined #webmachinelearning
15:02:14 Present+ Christos_Bacharakis
15:02:32 Present+ Tianqi_Chen
15:03:10 RafaelCintron has joined #webmachinelearning
15:05:11 Topic: Web LLM project contributions
15:05:11 anssik: Please welcome Tianqi Chen (tq) to the WebML WG!
15:05:29 ... Tianqi is a Professor at CMU, creator of Web LLM, and Chief Technologist at OctoML
15:06:06 ... I'm super excited to share that Tianqi joined this group as an Invited Expert, and today he will share with the group some learnings from Web LLM, a hugely exciting and boundary-pushing project
15:06:26 ... this is to initiate a discussion on new use cases and features for the WebNN API informed by the Web LLM project experience
15:06:35 ... I should also note Web LLM has a sister project, Web Stable Diffusion, which is also relevant
15:07:00 ... and that both Web LLM and Web Stable Diffusion build upon a Machine Learning Compilation (MLC) solution that transforms and optimizes ML execution from its development form to its deployment form
15:07:09 ... there's a CMU course of the same name, a great resource
15:07:14 ... before unleashing Tianqi, I want to clarify that this discussion is a kick-off and introduction to the topic, and I expect the group to discuss and advance specific topics of shared interest going forward
15:07:30 ... as a logistical note, we take questions through the IRC queuing system and will have a Q&A after the intro
15:07:34 ... there's also WebNN GH issue #480 for follow-up questions and comments
15:07:34 https://github.com/webmachinelearning/webnn/issues/480 -> Issue 480 Proposed use cases from Web LLM project (by anssiko) [use case]
15:07:39 -> Web LLM project https://webllm.mlc.ai/
15:07:42 -> Web Stable Diffusion https://websd.mlc.ai/
15:07:45 -> Machine Learning Compilation https://mlc.ai/
15:07:50 -> Machine Learning Compilation CMU Course material https://mlc.ai/summer22/schedule
15:07:53 ... Tianqi, the stage is yours!
15:08:12 TQ has joined #webmachinelearning
15:08:34 Rupak has joined #webmachinelearning
15:08:39 etiennenoel has joined #webmachinelearning
15:08:54 tq: let me tell you a bit about what we've been doing; there are many synergies with the MLC project
15:09:14 ... we build an ML compilation stack to help take existing ML models and port them to different environments
15:09:36 ... the current challenges are how to port models to different environments and optimize them on each
15:11:03 ... key concepts in MLC:
15:11:16 ... composable abstraction for ML programs
15:12:23 ... Python-first development, an API for productive and accessible development through all stages of the stack
15:13:50 ... universal deployment: an existing module can run everywhere, mapping an IRModule function on the left-hand side to the right-hand side (Python, torch.compile, C++, JS)
15:14:46 ... development pattern: normal development vs. normal compiler development vs. incremental compilation development
15:15:01 Present+ Etienne_Noel
15:15:44 tq: in short, we provide universal deployment of LLMs
15:17:08 ... unifying libraries and compilation, bringing library-based offloading and native compilation together
15:17:16 q?
15:18:11 tq: a question for this WG: how to integrate WebGPU with the WebNN execution environment?
15:18:43 ... a challenge is that the web is async in nature, even in the WebGPU environment
15:19:22 ... a thought experiment: you run a relu op with WebGPU, and after relu you run a WebNN op, e.g. matmul, and need to get them to work together
15:19:53 ... with TensorRT and CUDA this is relatively easy, following the same command queue
15:20:46 q+
15:21:05 ... in a runtime environment a lot depends on Wasm (or Emscripten in TVM); synchronization of execution queues between the WebGPU queue and the Wasm queue is a challenge
15:21:12 q?
15:21:16 https://github.com/dmlc/dlpack/blob/main/include/dlpack/dlpack.h
15:21:53 tq: in order for WebGPU and WebNN to exchange memory we need a standardized way to do so
15:22:20 ... would like to have a JSON schema for WebNN that compilers can produce
15:22:52 q?
15:22:55 ack gdti
15:23:08 gdti: thanks for laying out what's needed
15:23:37 ... re the async boundary between Wasm and WebGPU, have you experimented with JSPI?
15:24:03 q+
15:24:04 https://v8.dev/blog/jspi
15:24:05 tq: haven't looked at it; please provide a reference so we can look into it
15:24:33 gdti: native code is not async, so we are trying to bridge that gap with JSPI
15:25:08 tq: right now the way Web LLM works is that most of the runtime is written as a Wasm component, which is synchronous
15:25:35 ... this works well for our current case, but we would like the Wasm part and Emscripten part to be able to say "I'm doing a sequence of function calls that are sync"
15:25:52 ... but at times declare that we "await" something
15:26:14 ... when structured in a C++ fashion
15:26:36 q?
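[Editor's note: the "mostly sync, with declared await points" pattern tq describes can be sketched in plain JavaScript. This is a minimal illustration only, not the JSPI API (JSPI automates this kind of suspension for Wasm imports); all names below are invented for the example.]

```javascript
// A mostly-synchronous sequence of calls (as Wasm exports would be),
// where a step may declare an explicit "await" point -- e.g. waiting
// on a GPU queue -- by returning a Promise. Not the real JSPI API,
// just an illustration of the shape of the problem.
async function runSequence(steps) {
  const results = [];
  for (const step of steps) {
    const out = step();
    // A returned Promise marks a declared await point in an
    // otherwise synchronous call sequence.
    results.push(out instanceof Promise ? await out : out);
  }
  return results;
}

// Usage: two sync "kernel" calls with a simulated GPU wait between them.
runSequence([
  () => "relu-done",                                    // sync call
  () => new Promise(r => setTimeout(r, 0, "gpu-sync")), // await point
  () => "matmul-done",                                  // sync again
]).then(r => console.log(r.join(","))); // relu-done,gpu-sync,matmul-done
```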
15:27:06 gdti: when you talk about something like async/await being available in Wasm, would that help your project?
15:28:14 tq: Web LLM runs ops ABCD queued in the WebGPU queue; there could be a point where we want JS calls into Wasm, but within Wasm we have a graph interpreter that must wait for GPU memory before continuing
15:28:33 ... that synchronization happens within Wasm
15:29:16 ... if there's a way to specify "here's a Wasm buffer, here's the Wasm queue", that is another way to solve the problem
15:29:17 q?
15:29:24 ack RafaelCintron
15:29:39 RafaelCintron: thank you for joining the work and all the good work!
15:29:54 ... there are multiple places in the stack where async happens; the Wasm and JS boundary is one
15:30:29 ... another is between the process that runs JS and the one that queues things to talk to CUDA and others; that must be async
15:30:41 ... a third is between the CPU and others like the GPU
15:31:11 ... what we're prototyping in Chromium is interop between WebGPU and WebNN so you can have a shared queue
15:31:23 ... we hope it will solve these WebNN-WebGPU interop issues
15:31:36 tq: that'd definitely be helpful
15:31:58 RafaelCintron: as for serialization formats, WebNN has tried to stay away from them
15:32:20 ... there are too many serialization formats
15:32:50 ... web developers can create their own serialization format and convert it to WebNN and back; this is done today for ONNX and others
15:33:17 tq: we don't need a standard format necessarily; it is a cost-of-implementation issue
15:33:35 ... most Wasm-based implementations need to implement their own WebNN backend
15:34:02 ... e.g. TVM will need to implement its own JSON format, and the compiler will need to generate it
15:34:09 ... most frameworks will have to do the same
15:34:17 ... a common library to do that would be useful
15:34:52 RafaelCintron: you can use ONNX Runtime Web, and others are also available; maybe we as a group can write a helper library that can evolve independently of the browser
15:34:54 q?
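[Editor's note: the "common helper library" idea discussed above -- a compiler emits a small JSON graph description and a shared library replays it against a WebNN graph builder -- could look roughly like the sketch below. The JSON schema and all names are invented for illustration; nothing here is part of the WebNN spec. A mock builder stands in for MLGraphBuilder.]

```javascript
// Replay a hypothetical JSON graph description against an
// MLGraphBuilder-like object. Each op names its input nodes and the
// node its result is bound to.
function buildFromJSON(builder, graph) {
  const nodes = new Map();
  for (const [name, desc] of Object.entries(graph.inputs)) {
    nodes.set(name, builder.input(name, desc)); // declare graph inputs
  }
  for (const op of graph.ops) {
    const args = op.inputs.map(n => nodes.get(n));
    nodes.set(op.output, builder[op.type](...args)); // e.g. builder.relu(x)
  }
  return graph.outputs.map(n => nodes.get(n));
}

// Usage with a mock builder that just records the call sequence.
const calls = [];
const mockBuilder = new Proxy({}, {
  get: (_, type) => (...args) => { calls.push(type); return { type, args }; }
});
buildFromJSON(mockBuilder, {
  inputs: { x: { dataType: "float32", dimensions: [2, 2] } },
  ops: [
    { type: "relu", inputs: ["x"], output: "y" },
    { type: "matmul", inputs: ["y", "y"], output: "z" },
  ],
  outputs: ["z"],
});
console.log(calls.join(",")); // input,relu,matmul
```

A library like this could live outside the browser and evolve independently, as RafaelCintron suggests, with each compiler (TVM, ONNX converters, ...) targeting the shared schema instead of its own.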
15:35:09 q+
15:35:13 ack Ningxin_Hu
15:35:26 Ningxin_Hu: thanks tq for an awesome introduction!
15:35:47 ... re serialization, we discussed how to ship a model; we are shipping JS code that carries weights
15:36:11 ... is there any way to adapt to other environments that generate the code? With WebNN you can generate the code
15:36:22 tq: I think instead of generating JSON you generate JS instead
15:36:32 ... that could be possible and we could do that, a good idea!
15:36:46 ... we didn't think about that
15:36:47 q?
15:37:46 anssik: what are the top things for the WG to do, tq?
15:38:06 tq: 1) Experiment with building a WebNN TVM backend as a JS file
15:38:29 ... for a pure WebNN solution we might be able to get something running; it still depends on the sync component Rafael mentioned
15:38:40 ... want to understand how the TVM infra can help us
15:39:03 ... because of TVM's cross-platform nature, it can take an existing model and port it
15:40:27 q?
15:41:08 Chai: thank you tq for an awesome presentation!
15:41:37 q?
15:42:05 Present+ Dwayne_Robinson
15:43:03 Joshua_Lochner: great meeting you, your work is fantastic!
15:43:44 ... we're pushing the boundaries with ONNX Runtime, which Transformers.js uses as a backend; ~1B params is the Wasm execution provider limitation, and the WebGPU backend is a WIP
15:44:03 ... maybe you can comment on how you've pushed the boundary of protobuf memory limits
15:44:23 tq: the TVM compiler works in such a way that weights are not embedded in the model
15:45:11 ... another thing is for LLMs to support dynamic shapes; TVM compilation compiles everything into a Wasm file, WebGPU does not have limitations as strict as Wasm's, and the TVM runtime does not suffer from that limitation
15:45:49 ... another challenge: there are 4-bit quantizations and 2-bit quantizations; we want to keep up with that evolution and use codegen, so you can take a subset of functions and codegen the execution part
15:46:03 ... that particular subset will execute in a more efficient fashion and can help
15:46:33 ... I'm a big fan of Transformers.js too! Looking forward to collaborating with you and making TVM a helpful building block!
15:46:36 q+
15:46:39 Joshua_Lochner: thank you!
15:46:40 q?
15:46:52 ack gdti
15:47:25 gdti: re protobuf limitations, there's active work ongoing to address this; will share details on the latest work
15:47:26 q?
15:48:25 RRSAgent, draft minutes
15:48:27 I have made the request to generate https://www.w3.org/2024/01/11-webmachinelearning-minutes.html anssik
15:48:42 Topic: Efficient WebGPU integration discussion
15:49:19 anssik: During 2023 we sought implementation experience and user feedback for the MLCommandEncoder interface that proposes to enable more efficient WebGPU integration
15:49:39 q+
15:49:40 ... To start the year, I wanted to check for any feedback, suggestions, or implementation insights related to efficient WebGPU integration
15:49:42 q?
15:49:44 ack chai
15:50:16 chai: WebGPU interop is an important topic; what is in the spec is what we came up with last year, and with the new work on MLBuffer I expect some changes in this area
15:50:47 ... we can allow more efficient interop in the platform with that effort
15:51:19 ... although this may sound like shuffling data back and forth, WebNN is defined to be more than GPU, because NPU architectures are coming too
15:51:58 ... so we're also considering a broader set of adapters, with additional requirements consistent with the way the platform moves information across different devices
15:52:22 ... to summarize, this is an ongoing topic and we intend to make progress on it to have a fully interoperable spec
15:52:23 q?
15:52:42 q?
15:53:15 #322
15:53:16 https://github.com/webmachinelearning/webnn/issues/322 -> Pull Request 322 Simplify MLContext creation (by wchao1115)
15:54:12 Chai: #322 is about how to define MLContext; related but not the same, an active discussion that needs to be finalized prior to integration
15:54:29 ... it asks whether WebNN should have its own GPU device to manage
15:55:04 ... the proposal was to use WebGPUDevice instead
15:55:41 anssik: would prefer to get closure on this prior to the next CR refresh
15:55:42 q?
15:55:53 q?
15:55:56 Topic: NPU support discussion
15:56:03 anssik: Another broader and exciting 2024 topic for the group is NPU
15:56:31 q+
15:56:34 ... I wanted to reinvigorate discussion on any new features, enhancements, or possible API surface changes beneficial for NPU support. Also wanted to see if we could nudge forward some of the issues that touch NPU.
15:56:35 q?
15:56:38 ack chai
15:57:14 chai: NPU is brand new; right now we have commercial products with it hitting the shelves
15:57:40 ... we need to think about how to embrace it the right way; the platform has just started to support it
15:57:42 q?
15:58:01 q+
15:58:05 ack RafaelCintron
15:58:41 RafaelCintron: just quickly mentioning that our colleagues are prototyping the MLBuffer work, which solves WebGPU interop and is a way to keep things in the correct process
15:58:55 ... every inference call takes a JS array and returns one
15:59:26 ... the roundtrip to the JS process slows us down; MLBuffer keeps things in-process so we don't suffer the roundtrip cost
15:59:31 q?
15:59:40 q+
15:59:47 ack Ningxin_Hu
16:00:11 Ningxin_Hu: a quick heads up, colleagues are working on NPU support with Intel hardware
16:00:22 ... this will help inform NPU-related spec work
16:00:23 q?
16:00:56 q?
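[Editor's note: the roundtrip cost RafaelCintron describes -- a JS array copied across the process boundary on every inference call, versus an MLBuffer-style device-resident buffer -- can be made concrete with a toy cost model. These are mock objects with invented names, not the real MLContext/MLBuffer API, which was still being prototyped at the time of this meeting.]

```javascript
// Toy cost model: count cross-process copies per inference strategy.
class MockContext {
  constructor() { this.crossProcessCopies = 0; }
  // Array style: every inference uploads the input array and downloads
  // the output array across the process boundary.
  inferWithArrays(runs) {
    for (let i = 0; i < runs; i++) this.crossProcessCopies += 2;
  }
  // Buffer style: upload once into a device-resident buffer, run all
  // iterations in-process, download the result once at the end.
  inferWithBuffer(runs) { this.crossProcessCopies += 2; }
}

const arrays = new MockContext();
arrays.inferWithArrays(100);
const buffer = new MockContext();
buffer.inferWithBuffer(100);
console.log(arrays.crossProcessCopies, buffer.crossProcessCopies); // 200 2
```

The copy count grows linearly with the number of inference calls in the array style, but stays constant in the buffer style, which is the motivation for keeping tensors in-process.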
16:02:05 RRSAgent, draft minutes
16:02:07 I have made the request to generate https://www.w3.org/2024/01/11-webmachinelearning-minutes.html anssik
18:30:59 Zakim has left #webmachinelearning