14:56:41 RRSAgent has joined #webmachinelearning
14:56:45 logging to https://www.w3.org/2024/01/11-webmachinelearning-irc
14:56:46 inviting RRSAgent
14:56:46 Meeting: WebML WG Teleconference – 11 January 2024
14:56:46 RRSAgent, make logs Public
14:56:47 please title this meeting ("meeting: ..."), anssik
14:56:51 Chair: Anssi
14:56:57 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2024-01-11-wg-agenda.md
14:57:01 Scribe: Anssi
14:57:09 scribeNick: anssik
14:57:14 gb, this is webmachinelearning/webnn
14:57:14 anssik, OK.
14:57:18 Present+ Anssi_Kostiainen
14:58:30 zkis has joined #webmachinelearning
14:58:54 Present+ Zoltan_Kis
15:00:12 jsbell has joined #webmachinelearning
15:00:18 Present+ Joshua_Lochner
15:00:23 Present+ Joshua_Bell
15:00:27 Present+ Dominique_Hazael-Massieux
15:00:32 Ningxin_Hu has joined #webmachinelearning
15:00:59 gdti has joined #webmachinelearning
15:01:03 Present+ Deepti_Gandluri
15:01:09 Present+ Ningxin_Hu
15:01:26 Present+ Rupak
15:01:34 Present+ Chai_Chaoweeraprasit
15:01:41 Present+ Bryan_Bernhart
15:01:50 chai has joined #webmachinelearning
15:01:51 Present+ Rafael_Cintron
15:01:55 Joshua_Lochner has joined #webmachinelearning
15:02:14 Present+ Christos_Bacharakis
15:02:32 Present+ Tianqi_Chen
15:03:10 RafaelCintron has joined #webmachinelearning
15:05:11 Topic: Web LLM project contributions
15:05:11 anssik: Please welcome Tianqi Chen (tq) to the WebML WG!
15:05:29 ... Tianqi is a Professor at CMU, creator of Web LLM, and Chief Technologist at OctoML
15:06:06 ... I'm super excited to share that Tianqi joined this group as an Invited Expert, and today he will share with the group some learnings from Web LLM, a hugely exciting and boundary-pushing project
15:06:26 ... this is to initiate a discussion on new use cases and features for the WebNN API informed by the Web LLM project experience
15:06:35 ... I should also note Web LLM has a sister project, Web Stable Diffusion, which is also relevant
15:07:00 ... and that both Web LLM and Web Stable Diffusion build upon a Machine Learning Compilation (MLC) solution that transforms and optimizes ML execution from its development form to its deployment form
15:07:09 ... there's a CMU course of the same name, a great resource
15:07:14 ... before unleashing Tianqi, I want to clarify that this discussion is a kick-off and introduction to the topic, and I expect the group to discuss and advance specific topics of shared interest going forward
15:07:30 ... as a logistical note, we take questions through the IRC queuing system and will have a Q&A after the intro
15:07:34 ... there's also WebNN GH issue #480 for follow-up questions and comments
15:07:34 https://github.com/webmachinelearning/webnn/issues/480 -> Issue 480 Proposed use cases from Web LLM project (by anssiko) [use case]
15:07:39 -> Web LLM project https://webllm.mlc.ai/
15:07:42 -> Web Stable Diffusion https://websd.mlc.ai/
15:07:45 -> Machine Learning Compilation https://mlc.ai/
15:07:50 -> Machine Learning Compilation CMU Course material https://mlc.ai/summer22/schedule
15:07:53 ... Tianqi, the stage is yours!
15:08:12 TQ has joined #webmachinelearning
15:08:34 Rupak has joined #webmachinelearning
15:08:39 etiennenoel has joined #webmachinelearning
15:08:54 tq: let me tell you a bit about what we've been doing; there are many synergies with the MLC project
15:09:14 ... we build an ML compilation stack to help take existing ML models and port them to different environments
15:09:36 ... the current challenges are how to port models to different environments and optimize them on each
15:11:03 ... key concepts in MLC:
15:11:16 ... composable abstraction for ML programs
15:12:23 ... Python-first development, an API for productive and accessible development through all stages of the stack
15:13:50 ... universal deployment: an existing module can run everywhere, mapping an IRModule function on the left-hand side to the right-hand side (Python, torch.compile, C++, JS)
15:14:46 ... development pattern: normal development vs. normal compiler development vs. incremental compilation development
15:15:01 Present+ Etienne_Noel
15:15:44 tq: in short, we provide universal deployment of LLMs
15:17:08 ... unifying libraries and compilation, bringing library-based offloading and native compilation together
15:17:16 q?
15:18:11 tq: a question for this WG: how to integrate WebGPU with the WebNN execution environment?
15:18:43 ... a challenge is that the web is async in nature, even in the WebGPU environment
15:19:22 ... a thought experiment: you run a relu op with WebGPU, and after relu you run a WebNN op, e.g. matmul, and need to get them to work together
15:19:53 ... with TensorRT and CUDA this is relatively easy, following the same command queue
15:20:46 q+
15:21:05 ... in a runtime environment a lot depends on Wasm (or Emscripten in TVM); synchronization of execution queues between the WebGPU queue and the Wasm queue is a challenge
15:21:12 q?
15:21:16 https://github.com/dmlc/dlpack/blob/main/include/dlpack/dlpack.h
15:21:53 tq: in order for WebGPU and WebNN to exchange memory we need a standardized way to do so
15:22:20 ... would like to have a JSON schema for WebNN that compilers can produce
15:22:52 q?
15:22:55 ack gdti
15:23:08 gdti: thanks for laying out what's needed
15:23:37 ... re the async boundary between Wasm and WebGPU, have you experimented with JSPI?
15:24:03 q+
15:24:04 https://v8.dev/blog/jspi
15:24:05 tq: haven't looked at it; please provide a reference so we can look into it
15:24:33 gdti: native code is not async, so we are trying to bridge that gap with JSPI
15:25:08 tq: right now the way Web LLM works is that most of the runtime is written as a Wasm component, which is synchronous
15:25:35 ... this works well for our current case, but we would like the Wasm part and Emscripten part to be able to say "I'm doing a sequence of function calls that are sync"
15:25:52 ... but at times declare that we "await" something
15:26:14 ... when structured in a C++ fashion
15:26:36 q?
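[Editor's note: the "mostly sync, with declared await points" pattern tq describes can be sketched in plain JavaScript. This is a minimal illustration only, not the JSPI API (JSPI automates this kind of suspension for Wasm imports); all names below are invented for the example.]

```javascript
// A mostly-synchronous sequence of calls (as Wasm exports would be),
// where a step may declare an explicit "await" point -- e.g. waiting
// on a GPU queue -- by returning a Promise. Not the real JSPI API,
// just an illustration of the shape of the problem.
async function runSequence(steps) {
  const results = [];
  for (const step of steps) {
    const out = step();
    // A returned Promise marks a declared await point in an
    // otherwise synchronous call sequence.
    results.push(out instanceof Promise ? await out : out);
  }
  return results;
}

// Usage: two sync "kernel" calls with a simulated GPU wait between them.
runSequence([
  () => "relu-done",                                    // sync call
  () => new Promise(r => setTimeout(r, 0, "gpu-sync")), // await point
  () => "matmul-done",                                  // sync again
]).then(r => console.log(r.join(","))); // relu-done,gpu-sync,matmul-done
```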
15:27:06 gdti: when you talk about something like async/await being available in Wasm, would that help your project?
15:28:14 tq: Web LLM runs ops ABCD queued in the WebGPU queue; there could be a point where we want JS calls into Wasm, but within Wasm we have a graph interpreter that must wait for GPU memory before continuing
15:28:33 ... that synchronization happens within Wasm
15:29:16 ... if there's a way to specify "here's a Wasm buffer, here's the Wasm queue", that is another way to solve the problem
15:29:17 q?
15:29:24 ack RafaelCintron
15:29:39 RafaelCintron: thank you for joining the work and all the good work!
15:29:54 ... there are multiple places in the stack where async happens; the Wasm and JS boundary is one
15:30:29 ... another is between the process that runs JS and the one that queues things to talk to CUDA and others; that must be async
15:30:41 ... a third is between the CPU and others like the GPU
15:31:11 ... what we're prototyping in Chromium is interop between WebGPU and WebNN so you can have a shared queue
15:31:23 ... we hope it will solve these WebNN-WebGPU interop issues
15:31:36 tq: that'd definitely be helpful
15:31:58 RafaelCintron: as for serialization formats, WebNN has tried to stay away from them
15:32:20 ... there are too many serialization formats
15:32:50 ... web developers can create their own serialization format and convert it to WebNN and back; this is done today for ONNX and others
15:33:17 tq: we don't need a standard format necessarily; it is a cost-of-implementation issue
15:33:35 ... most Wasm-based implementations need to implement their own WebNN backend
15:34:02 ... e.g. TVM will need to implement its own JSON format, and the compiler will need to generate it
15:34:09 ... most frameworks will have to do the same
15:34:17 ... a common library to do that would be useful
15:34:52 RafaelCintron: you can use ONNX Runtime Web, and others are also available; maybe we as a group can write a helper library that can evolve independently of the browser
15:34:54 q?
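[Editor's note: the "common helper library" idea discussed above -- a compiler emits a small JSON graph description and a shared library replays it against a WebNN graph builder -- could look roughly like the sketch below. The JSON schema and all names are invented for illustration; nothing here is part of the WebNN spec. A mock builder stands in for MLGraphBuilder.]

```javascript
// Replay a hypothetical JSON graph description against an
// MLGraphBuilder-like object. Each op names its input nodes and the
// node its result is bound to.
function buildFromJSON(builder, graph) {
  const nodes = new Map();
  for (const [name, desc] of Object.entries(graph.inputs)) {
    nodes.set(name, builder.input(name, desc)); // declare graph inputs
  }
  for (const op of graph.ops) {
    const args = op.inputs.map(n => nodes.get(n));
    nodes.set(op.output, builder[op.type](...args)); // e.g. builder.relu(x)
  }
  return graph.outputs.map(n => nodes.get(n));
}

// Usage with a mock builder that just records the call sequence.
const calls = [];
const mockBuilder = new Proxy({}, {
  get: (_, type) => (...args) => { calls.push(type); return { type, args }; }
});
buildFromJSON(mockBuilder, {
  inputs: { x: { dataType: "float32", dimensions: [2, 2] } },
  ops: [
    { type: "relu", inputs: ["x"], output: "y" },
    { type: "matmul", inputs: ["y", "y"], output: "z" },
  ],
  outputs: ["z"],
});
console.log(calls.join(",")); // input,relu,matmul
```

A library like this could live outside the browser and evolve independently, as RafaelCintron suggests, with each compiler (TVM, ONNX converters, ...) targeting the shared schema instead of its own.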
15:35:09 q+
15:35:13 ack Ningxin_Hu
15:35:26 Ningxin_Hu: thanks tq for an awesome introduction!
15:35:47 ... re serialization, we discussed how to ship a model; we are shipping JS code that carries weights
15:36:11 ... is there any way to adapt to other environments that generate the code? With WebNN you can generate the code
15:36:22 tq: I think instead of generating JSON you generate JS instead
15:36:32 ... that could be possible and we could do that, a good idea!
15:36:46 ... we didn't think about that
15:36:47 q?
15:37:46 anssik: what are the top things for the WG to do, tq?
15:38:06 tq: 1) Experiment with building a WebNN TVM backend as a JS file
15:38:29 ... for a pure WebNN solution we might be able to get something running; it still depends on the sync component Rafael mentioned
15:38:40 ... want to understand how the TVM infra can help us
15:39:03 ... because of TVM's cross-platform nature, it can take an existing model and port it
15:40:27 q?
15:41:08 Chai: thank you tq for an awesome presentation!
15:41:37 q?
15:42:05 Present+ Dwayne_Robinson
15:43:03 Joshua_Lochner: great meeting you, your work is fantastic!
15:43:44 ... we're pushing the boundaries with ONNX Runtime, which Transformers.js uses as a backend; ~1B params is the Wasm execution provider limitation, and the WebGPU backend is a WIP
15:44:03 ... maybe you can comment on how you've pushed the boundary of protobuf memory limits
15:44:23 tq: the TVM compiler works in such a way that weights are not embedded in the model
15:45:11 ... another thing is for LLMs to support dynamic shapes; TVM compilation compiles everything into a Wasm file, WebGPU does not have limitations as strict as Wasm's, and the TVM runtime does not suffer from that limitation
15:45:49 ... another challenge: there are 4-bit quantizations and 2-bit quantizations; we want to keep up with that evolution and use codegen, so you can take a subset of functions and codegen the execution part
15:46:03 ... that particular subset will execute in a more efficient fashion and can help
15:46:33 ... I'm a big fan of Transformers.js too! Looking forward to collaborating with you and making TVM a helpful building block!
15:46:36 q+
15:46:39 Joshua_Lochner: thank you!
15:46:40 q?
15:46:52 ack gdti
15:47:25 gdti: re protobuf limitations, there's active work ongoing to address this; will share details on the latest work
15:47:26 q?
15:48:25 RRSAgent, draft minutes
15:48:27 I have made the request to generate https://www.w3.org/2024/01/11-webmachinelearning-minutes.html anssik
15:48:42 Topic: Efficient WebGPU integration discussion
15:49:19 anssik: During 2023 we sought implementation experience and user feedback for the MLCommandEncoder interface that proposes to enable more efficient WebGPU integration
15:49:39 q+
15:49:40 ... To start the year, I wanted to check for any feedback, suggestions, or implementation insights related to efficient WebGPU integration
15:49:42 q?
15:49:44 ack chai
15:50:16 chai: WebGPU interop is an important topic; what is in the spec is what we came up with last year, and with the new work on MLBuffer I expect some changes in this area
15:50:47 ... we can allow more efficient interop in the platform with that effort
15:51:19 ... although this may sound like shuffling data back and forth, WebNN is defined to be more than GPU, because NPU architectures are coming too
15:51:58 ... so we're also considering a broader set of adapters, with additional requirements consistent with the way the platform moves information across different devices
15:52:22 ... to summarize, this is an ongoing topic and we intend to make progress on it to have a fully interoperable spec
15:52:23 q?
15:52:42 q?
15:53:15 #322
15:53:16 https://github.com/webmachinelearning/webnn/issues/322 -> Pull Request 322 Simplify MLContext creation (by wchao1115)
15:54:12 Chai: #322 is about how to define MLContext; related but not the same, an active discussion that needs to be finalized prior to integration
15:54:29 ... it asks whether WebNN should have its own GPU device to manage
15:55:04 ... the proposal was to use WebGPUDevice instead
15:55:41 anssik: would prefer to get closure on this prior to the next CR refresh
15:55:42 q?
15:55:53 q?
15:55:56 Topic: NPU support discussion
15:56:03 anssik: Another broader and exciting 2024 topic for the group is NPU
15:56:31 q+
15:56:34 ... I wanted to reinvigorate discussion on any new features, enhancements, or possible API surface changes beneficial for NPU support. Also wanted to see if we could nudge forward some of the issues that touch NPU.
15:56:35 q?
15:56:38 ack chai
15:57:14 chai: NPU is brand new; right now we have commercial products with it hitting the shelves
15:57:40 ... we need to think about how to embrace it the right way; the platform has just started to support it
15:57:42 q?
15:58:01 q+
15:58:05 ack RafaelCintron
15:58:41 RafaelCintron: just quickly mentioning that our colleagues are prototyping the MLBuffer work, which solves WebGPU interop and is a way to keep things in the correct process
15:58:55 ... every inference call takes a JS array and returns one
15:59:26 ... the roundtrip to the JS process slows us down; MLBuffer keeps things in-process so we don't suffer the roundtrip cost
15:59:31 q?
15:59:40 q+
15:59:47 ack Ningxin_Hu
16:00:11 Ningxin_Hu: a quick heads up, colleagues are working on NPU support with Intel hardware
16:00:22 ... this will help inform NPU-related spec work
16:00:23 q?
16:00:56 q?
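[Editor's note: the roundtrip cost RafaelCintron describes -- a JS array copied across the process boundary on every inference call, versus an MLBuffer-style device-resident buffer -- can be made concrete with a toy cost model. These are mock objects with invented names, not the real MLContext/MLBuffer API, which was still being prototyped at the time of this meeting.]

```javascript
// Toy cost model: count cross-process copies per inference strategy.
class MockContext {
  constructor() { this.crossProcessCopies = 0; }
  // Array style: every inference uploads the input array and downloads
  // the output array across the process boundary.
  inferWithArrays(runs) {
    for (let i = 0; i < runs; i++) this.crossProcessCopies += 2;
  }
  // Buffer style: upload once into a device-resident buffer, run all
  // iterations in-process, download the result once at the end.
  inferWithBuffer(runs) { this.crossProcessCopies += 2; }
}

const arrays = new MockContext();
arrays.inferWithArrays(100);
const buffer = new MockContext();
buffer.inferWithBuffer(100);
console.log(arrays.crossProcessCopies, buffer.crossProcessCopies); // 200 2
```

The copy count grows linearly with the number of inference calls in the array style, but stays constant in the buffer style, which is the motivation for keeping tensors in-process.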
16:02:05 RRSAgent, draft minutes
16:02:07 I have made the request to generate https://www.w3.org/2024/01/11-webmachinelearning-minutes.html anssik
18:30:59 Zakim has left #webmachinelearning