04:56:18 RRSAgent has joined #webmachinelearning
04:56:18 logging to https://www.w3.org/2022/02/09-webmachinelearning-irc
04:56:18 Meeting: WebML CG Teleconference – 9 Feb 2022
04:56:21 RRSAgent, make logs Public
04:56:22 please title this meeting ("meeting: ..."), anssik
04:56:23 Chair: Anssi
04:56:29 Agenda: https://github.com/webmachinelearning/meetings/blob/master/telcons/2022-02-09-cg-agenda.md
04:56:33 Scribe: Anssi
04:56:37 scribeNick: anssik
04:56:45 Honglin has joined #webmachinelearning
04:56:50 Present+ Anssi_Kostiainen
04:59:46 RafaelCintron has joined #webmachinelearning
05:00:00 Present+ Honglin_Yu
05:00:05 ningxin_hu has joined #webmachinelearning
05:00:31 Present+ Jonathan_Bingham
05:00:44 Present+ Rafael_Cintron
05:01:25 Present+ Chai_Chaoweeraprasit
05:01:38 Present+ Jon_Napper
05:01:42 rama has joined #webmachinelearning
05:01:46 Jonathan has joined #webmachinelearning
05:01:48 chai has joined #webmachinelearning
05:01:49 Jon_Napper has joined #webmachinelearning
05:01:52 Present+ Tom_Shafron
05:02:18 Present+ Jim_Pollock
05:02:28 Present+ Ningxin_Hu
05:02:30 Present+ Ganesan_Ramalingam
05:02:42 jimpollock has joined #webmachinelearning
05:02:56 jack has joined #webmachinelearning
05:03:29 Present+ Sam_Bhattacharyya
05:03:32 qjw has joined #webmachinelearning
05:04:02 Present+ tyuvr
05:04:21 RRSAgent, draft minutes
05:04:21 I have made the request to generate https://www.w3.org/2022/02/09-webmachinelearning-minutes.html anssik
05:04:25 present+
05:04:30 skyostil has joined #webmachinelearning
05:04:38 scribe+ Jonathan_Bingham
05:04:41 SamBhattacharyya has joined #webmachinelearning
05:05:16 Present+ Sami_Kyostila
05:05:46 Hi, nice to meet you all! Dialing in from Google
05:06:02 Topic: Model Loader API
05:06:21 anssik: Welcome to the second WebML CG call of 2022! We adjusted the agenda a bit so as not to overlap with Australia Day.
05:06:28 ... We agreed to move to a monthly cadence after this call.
05:06:48 anssik: At our previous meeting on 12 Jan, Honglin delivered an update on the Model Loader API.
05:06:54 -> https://www.w3.org/2022/01/12-webmachinelearning-minutes.html WebML CG Teleconference – 12 Jan 2022 - meeting minutes
05:07:10 ... That presentation included a list of open questions. The plan for today is to continue discussing those.
05:08:34 Honglin: Updates since the previous meeting: thanks to the reviewers of the prototype, we've tried to accelerate IPC by having the renderer talk to the backend directly.
05:09:17 ... After discussion with Google's security teams, shared buffers are security sensitive and their early review raised some concerns.
05:09:38 ... Their guidance is to avoid shared buffers and use direct IPC.
05:09:53 ... The delay is 1.3 ms; we're moving forward with direct IPC.
05:10:05 ... Most of the open questions still apply, so it's good to discuss them today.
05:10:43 ... Some float16 issues were identified too; we need input from the WebNN folks.
05:11:07 ... Some resource restrictions were also discussed, i.e. what resources the ML service should be able to use.
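For context, the JS-facing shape sketched in the Model Loader API explainer looks roughly like the block below. Every identifier here (MLModelLoader, load, compute, devicePreference) is an assumption taken from the explainer as it stood at the time, not settled IDL, and the exact surface is one of the open questions under discussion.

```ts
// Rough sketch only: names follow the Model Loader API explainer as of early 2022
// and may not match the eventual spec; MLModelLoader is declared loosely here.
declare const MLModelLoader: any;

async function classify(): Promise<void> {
  // Create a context, hinting at the preferred device (option name assumed).
  const context = await (navigator as any).ml.createContext({ devicePreference: 'cpu' });

  // Fetch a model in a backend-specific format (e.g. a TF Lite flatbuffer) and load it.
  const modelBuffer = await (await fetch('mobilenet_v2.tflite')).arrayBuffer();
  const loader = new MLModelLoader(context);
  const model = await loader.load(modelBuffer);

  // Run inference; input/output names and tensor shapes come from the model itself.
  const outputs = await model.compute({ input: new Float32Array(1 * 224 * 224 * 3) });
  console.log(outputs);
}
```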
05:11:28 q+
05:11:52 ack RafaelCintron
05:12:21 RafaelCintron: Question: what did the security people say about using a separate process?
05:12:36 ... Why not use another thread in the same process?
05:12:44 Honglin: No concerns with process isolation.
05:13:14 Honglin: Security didn't request it; we decided to. If we run in the same process, we have to harden TF Lite even more.
05:13:26 Honglin: The backend process can be safer than the renderer; we can disable more system calls.
05:13:49 ... We think a separate process is a reasonable starting point.
05:14:27 RafaelCintron: Does the ML service have the same security constraints as the renderer?
05:15:02 Honglin: We have one process per model. Even if a hacker takes over, it's isolated.
05:15:16 ... We could put everything in the renderer. It might be more work.
05:15:29 RafaelCintron: On Windows, the renderer is the most secure process.
05:15:41 ... There's overhead to having a separate process.
05:15:53 Honglin: It's an open question.
05:16:02 ... We have a mature ML service for TF Lite.
05:16:11 Present+ qjw
05:16:19 ... It's maybe less work to use the existing service.
05:16:25 Present+ Saurabh
05:16:28 RafaelCintron: What if two websites load the same model?
05:16:35 Honglin: There are still two processes.
05:17:03 RafaelCintron: If you keep everything from the same domain in one process, it should be sufficient. A bad actor can't load 100 models and create 100 processes.
05:17:10 RRSAgent, draft minutes
05:17:10 I have made the request to generate https://www.w3.org/2022/02/09-webmachinelearning-minutes.html anssik
05:17:33 Honglin: We can restrict the number of processes, RAM, and CPUs.
05:17:35 scribe+ Jonathan
05:18:44 anssik: If you can move more of the security discussion into the public, more people can look at the comments as evidence. Discuss in GitHub or in public if possible.
05:18:48 Subtopic: Benchmark discussion
05:19:28 anssik: Let's discuss and identify next steps.
05:19:47 Honglin: Not much to share. It has improved a lot.
05:20:08 ... Model Loader performance is very similar to the raw backend, and much faster than the TF Lite WASM runner, even when using the same XNNPACK backend
05:20:14 ... for MobileNet image classification.
05:20:24 q+
05:20:49 ack RafaelCintron
05:21:11 RafaelCintron: General question: since everything runs on the CPU, why not do it all with WASM? What's missing from WASM?
05:21:25 ... Is it just a matter of more or wider SIMD instructions?
05:21:35 ... Or is there something preventing WASM from doing better?
05:21:44 RRSAgent, draft minutes
05:21:44 I have made the request to generate https://www.w3.org/2022/02/09-webmachinelearning-minutes.html anssik
05:22:02 Honglin: We talked to the WASM team and haven't figured it out for sure. There are CPU instructions that can't be supported in WASM. There are also compilation issues.
05:22:17 ... When WASM is going to run, it has a JIT step, even if it's precompiled.
05:22:38 q+
05:22:40 ... That's the best guess. We're not sure of the exact reason.
05:23:22 ack ningxin_hu
05:23:28 Ningxin: Honglin's assessment is good. WASM SIMD instructions support 128 bits. Honglin's platform tests support 256-bit widths, which may explain the difference.
05:23:41 RafaelCintron: If we had 256 bits in WASM, would that close the gap?
05:24:03 Ningxin: Flexible SIMD instructions in WASM will support wider instructions. We can check with them with the early prototype for their performance numbers.
05:24:07 q?
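As a point of reference for the benchmark comparison above, a minimal latency-measurement sketch is shown below. This is not the group's actual harness; the `model.compute` call in the usage comment reuses the hypothetical names from the Model Loader sketch earlier.

```ts
// Minimal timing sketch (not the prototype's benchmark harness): run the same
// inference N times and report the median wall-clock latency in milliseconds.
async function medianLatencyMs(run: () => Promise<unknown>, iterations = 50): Promise<number> {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await run(); // e.g. () => model.compute({ input })
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}
```

The same wrapper can time the Model Loader path and a TF Lite WASM runner on the same MobileNet model, which is the comparison Honglin describes.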
05:24:43 Subtopic: Hardware acceleration
05:25:28 anssik: What will the interface look like when hardware acceleration is supported? What if data is on a GPU or TPU? Should users know what accelerators are available? It's a performance vs. privacy tradeoff.
05:25:37 ... WebNN is thinking about the same issues. We can share learnings.
05:25:38 q?
05:26:04 Honglin: Not too many updates. We have platform experts joining this group. Maybe they have updates.
05:27:11 jimpollock: I work with Sami and Honglin on hardware acceleration, mostly on deployment: taking Web ML code and plumbing it through to hardware acceleration.
05:27:30 ... We use the WebNN API on the backend. We're targeting media devices with a GPU, starting with the CPU to get the flow working.
05:27:41 ... How do we expose acceleration to the developer?
05:28:08 ... There are a few options: do we allow partitioned execution, where the accelerator runs what it can and delegates the rest to the CPU? That could be less optimal than doing everything on the CPU.
05:28:29 ... Or do we give more info to the user, like what accelerators are supported, or even what ops are supported? But there are fingerprinting concerns with that.
05:28:37 q?
05:28:40 ... We're thinking through design choices and working on plumbing to get benchmarks.
05:29:08 anssik: WebNN is also discussing similar issues.
05:29:30 Chai: Is WebNN also available in ChromeOS?
05:30:01 jimpollock: Yes. Rather than coming up with a new paradigm, we decided to use WebNN for some stability against breaking changes in TF Lite.
05:30:03 RRSAgent, draft minutes
05:30:03 I have made the request to generate https://www.w3.org/2022/02/09-webmachinelearning-minutes.html anssik
05:30:13 ... It's been ported from Android. We disabled some features like hardware buffers. A full port.
05:30:32 Chai: For the purposes of this prototype, you're using the CPU backend for the API, but are thinking of the GPU?
05:30:53 q+
05:30:55 jimpollock: There's no driver for the GPU. We're just using the CPU to prove the integration. Then we will run on accelerators with a full end-to-end solution.
05:30:55 Minor correction: Jim is referring to NNAPI (i.e., the Android ML hardware adaptation layer) instead of WebNN
05:31:23 ... We'll run with the same benchmarks as Honglin and see.
05:31:33 Honglin: We can partition the graph with heuristics.
05:31:55 ... But there may be fingerprinting issues if we let the developer or web apps decide how.
05:32:06 ... Transparent vs. explicit is the tradeoff.
05:32:49 q?
05:32:53 ack RafaelCintron
05:33:07 RafaelCintron: For the accelerators you're targeting, Jim, how does data transfer work?
05:33:31 ... Say you're getting 2 MB. Can you give a pointer to CPU memory? Do you have to copy it before you can use it? Do you have to copy back before looking at it?
05:33:47 jimpollock: It varies from accelerator to accelerator.
05:33:49 q+
05:34:16 ... It runs in the TF runtime in the ML Service binary executable process. The NNAPI delegate walks the graph and calls all the methods for the NNAPI representation, including tensors.
05:34:37 ... Then it crosses the boundary to the vendor implementation. It depends; it can be a memory copy or a shared representation.
05:34:49 ... Some have onboard memory. There are examples of both. Vendors keep some details secret.
05:34:56 q?
05:35:03 ... There's not a single answer. It could happen in any permutation depending on the vendor driver implementation.
05:35:21 RafaelCintron: You give them a pointer to memory and it's up to them to operate directly or copy.
05:35:31 jimpollock: Yes. We have some of the same security issues Honglin mentioned.
05:35:44 ... We need to make sure we won't have shared memory concerns.
05:36:00 ... We may still need memory copies depending on what the security team says.
05:36:20 Chai: I'm trying to understand how we achieve hardware portability across different accelerators.
05:36:35 ... You're testing with NNAPI. The drivers are CPU. How does that work across different hardware?
05:37:05 ... For WebNN, the whole thing is about defining the interface that anyone can implement on any accelerator in the platform.
05:37:08 ack chai
05:37:20 ... If you run the same model and the same content, regardless of hardware you run on, you can accelerate.
05:37:42 ... Is it because we want to understand through the benchmark before taking the next step? How does it connect with hardware portability on different systems?
05:37:49 jimpollock: For Chrome OS, or more generally?
05:38:12 Chai: I'm looking for education. That's why I'm asking about NNAPI on Chrome OS. The driver might be different. Does it come with Chrome OS?
05:38:26 ... Obviously different PCs have different hardware. How does it target different vendor GPUs?
05:38:53 jimpollock: Consider different platforms. On a simple system with no accelerators, NNAPI wouldn't be enabled. There'd be no delegation.
05:39:03 ... If a system has an accelerator and the vendor has provided a driver,
05:39:17 ... we'd enable NNAPI, and when the model is processed by TF Lite or the inferencing system,
05:39:30 ... it queries the delegate, which reports which parts of the model it can process.
05:39:45 ... Then partitioning comes up. Do we support only full models, or partitioned delegation?
05:40:21 ... Portability is the combination of hardware availability and an installed driver; the inferencing engine takes that into account when setting up the interpreter and delegates as directed.
05:40:27 ... It gets more complex with multiple accelerators.
05:40:50 ... That gets specific to NNAPI. It can be done; you can partition across multiple accelerators. In practice we don't see it often; it usually gives less performance.
05:41:00 ... Marshaling data back and forth between kernels is slow.
05:41:18 ... It's a dynamic system: it looks at what's available and installed, and decides what to process where based on the caller's constraints for partitioned execution.
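To make the partitioning option concrete, here is an illustrative sketch of the decision Jim describes: ask a delegate which ops it supports, then split the graph into accelerator-backed and CPU-only segments. This is pseudo-API only, not NNAPI's or TF Lite's real interface, and it simplifies the model to a linear sequence of ops.

```ts
// Illustrative pseudo-API; real systems (e.g. the TF Lite NNAPI delegate) do this in
// C++ with much richer per-op constraints and a true graph rather than a linear list.
interface Op { name: string }
interface Delegate { supports(op: Op): boolean }

// Partition a linear op sequence into maximal runs of delegate-supported vs CPU-only ops.
function partition(graph: Op[], delegate: Delegate): { onAccelerator: boolean; ops: Op[] }[] {
  const segments: { onAccelerator: boolean; ops: Op[] }[] = [];
  for (const op of graph) {
    const onAccelerator = delegate.supports(op);
    const last = segments[segments.length - 1];
    if (last && last.onAccelerator === onAccelerator) last.ops.push(op);
    else segments.push({ onAccelerator, ops: [op] });
  }
  return segments;
}
```

The fingerprinting question raised above is whether a page can observe this split (or the op-support list behind it) directly, or only indirectly through timing.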
05:41:43 Chai: Makes sense. The solution is specific to Chrome OS, not to a PC, where you can't determine what accelerator is running.
05:42:14 ... For instance, a PC running a Chrome browser. That's a common environment. Most PCs have an accelerator: a GPU.
05:42:40 ... If the solution is specific to Chrome OS, vendors may put together different drivers.
05:43:12 jimpollock: I'm giving the specifics for Chrome OS because I know it. There's an abstraction layer, like OpenGL: your inferencing engine speaks OpenGL and the vendor library applies it.
05:43:18 ... NNAPI plays that role in Chrome OS.
05:43:20 q+
05:43:40 ... NNAPI is like OpenGL, a common interface. There are various extensions. They can be used if available.
05:43:41 q?
05:44:12 ... I don't know the Microsoft equivalent; I expect it's similar.
05:44:26 Chai: The purpose of WebNN as a spec is to define an abstraction for ML that's higher than OpenGL.
05:44:44 ... That's why the spec was written that way. Then we can accelerate ML differently than regular shaders.
05:45:01 ... The abstraction for the NNAPI solution is what I'm trying to understand.
05:45:23 ... To build something that works for Chromium, not only on Chrome OS, the abstraction would look very much like WebNN, not NNAPI.
05:45:33 ... In a sense they're the same: an operator-based abstraction.
05:45:51 ... You've answered the question. NNAPI is used to understand how to layer.
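As a reminder of what that operator-based abstraction looks like in practice, below is a small graph built with WebNN's MLGraphBuilder. The shape follows the WebNN draft of the time; specific descriptor and option names should be treated as approximate rather than authoritative.

```ts
// Sketch of the operator-based abstraction: build a tiny conv2d + relu graph with
// WebNN's MLGraphBuilder (names per the then-current spec draft; details approximate).
declare const MLGraphBuilder: any;

async function buildTinyGraph(): Promise<void> {
  const context = await (navigator as any).ml.createContext();
  const builder = new MLGraphBuilder(context);

  // One conv2d followed by relu, with the filter weights supplied as a constant.
  const input = builder.input('input', { type: 'float32', dimensions: [1, 3, 224, 224] });
  const filter = builder.constant(
    { type: 'float32', dimensions: [32, 3, 3, 3] },
    new Float32Array(32 * 3 * 3 * 3)
  );
  const output = builder.relu(builder.conv2d(input, filter));

  // Compile once; execution then binds named input/output buffers to the compiled graph.
  const graph = await builder.build({ output });
  console.log(graph);
}
```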
05:46:19 jimpollock: I agree NNAPI doesn't exist everywhere. It's not on Windows. DirectML might be the equivalent arbitrator.
05:46:29 ... Each platform would have its equivalent.
05:46:44 Chai: The way we think about DirectML (I run the DirectML group at Microsoft):
05:47:09 ... our goal is to have an abstraction that's not specific to Windows. Model Loader could sit on top of WebNN, so any OS could sit underneath.
05:47:34 Honglin: If we were only using the GPU, that could be done. We want to support more accelerators, which is harder.
05:47:39 ... We have to start somewhere.
05:47:50 ... NNAPI is a good starting point.
05:48:01 ack RafaelCintron
05:48:28 RafaelCintron: Yes, it's true WebNN accepts WebGL or WebGPU. That doesn't mean it has to be GPU only.
05:48:34 ... We could add accelerator adaptors later.
05:48:55 q+
05:49:06 ... Then you can instantiate, and some features might not be available. We can add overloads that support different kinds of buffers. They can be accelerator specific.
05:49:12 q+
05:49:17 ... There's still a lot of value in looking at it.
05:49:22 q?
05:49:32 Honglin: A lot of PCs have powerful GPUs. On mobile, the ML accelerator may be more valuable.
05:49:46 ... If the ML accelerator buffers can be used by WebNN, Model Loader can also use them.
05:49:59 ... What do you think?
05:50:22 RafaelCintron: We could do it a couple of ways: the same overloads, or new ones for other kinds of buffers. Accelerator buffers, or array buffers.
05:50:35 ... Run in a special process instead of on the CPU. It could use NNAPI under the covers.
05:51:03 ... When Ningxin and Chai specified the operators, they looked at other platforms, not only Windows. They made sure the ops could be implemented, including on NNAPI.
05:51:07 ... If not, we can revise them.
05:51:08 ack chai
05:51:47 Chai: When we started WebNN, the goal was to define the abstraction for ML accelerators. We're not setting out to build something specific to the CPU or GPU.
05:51:55 ... It's a portable abstraction across hardware.
05:52:12 ... We acknowledge that the web stack is sitting on an ecosystem with a lot of different hardware.
05:52:35 ... macOS and Windows are moving all the time. We want to avoid defining something only valid for today's hardware.
05:52:52 ... DirectML is an implementation at one point in time on Windows. We want to support more accelerators in the ecosystem.
05:53:06 ... The abstraction is separate from the implementation. The abstraction is what we aim for.
05:53:13 q?
05:53:37 ack ningxin_hu
05:53:41 Honglin: Great point. For Model Loader, we leave portability to the backend; it's the backend's responsibility. TF has already put a lot of effort into making it portable.
05:53:59 ... What I mean is, when an accelerator is available to WebNN, it's the same one that's available to Model Loader.
05:54:31 ... We're facing the same issue. The TF and ONNX formats are very similar to what the WebNN API is doing: defining the computational graph and the tensors.
05:54:54 Chai: Not exactly. On the surface, WebNN has convolution, and ONNX and TF do too.
05:55:09 ... But WebNN is designed to be the backend, not the frontend interface.
05:55:41 ... If NNAPI is an implementation that's on the move, that's fine. NNAPI drivers are not available everywhere.
05:55:50 ... For testing and proving the API, it's fine.
05:56:07 ... As we do more, let's keep in mind that we want an abstraction like WebNN.
05:56:21 Honglin: ML accelerators are in rapid development. They may change a lot in the future.
05:56:37 ... It's hard to cover all cases. GPUs are more mature than ML accelerators.
05:56:54 ... We're aware NNAPI is not available everywhere. That's where we're starting for Chrome OS.
05:57:09 Ningxin: Regarding NNAPI, I agree.
05:57:37 ... For the WebNN spec, in the Working Group, we took NNAPI as an implementation backend option when checking each operator's compatibility.
05:57:48 ... We test against DirectML, and against NNAPI on Android or ChromeOS.
05:58:14 ... I have an ask for help. In some issue discussions, we have questions about how ops can be supported by NNAPI.
05:58:15 RRSAgent, draft minutes
05:58:15 I have made the request to generate https://www.w3.org/2022/02/09-webmachinelearning-minutes.html anssik
05:58:25 ... We'd like to be connected to the right people.
05:58:43 ... Can you help?
05:59:08 jimpollock: You're looking for an understanding of whether a WebNN op spec would be supported by NNAPI? Yes, we can help review and see what the mapping is like.
05:59:22 ... A lot of ops have constraints. There may be certain modes that are supported, and others that are not.
05:59:40 Ningxin: We also have the WebNN-native project, to evaluate with.
05:59:49 -> https://github.com/webmachinelearning/webnn-native/ WebNN-native project
05:59:55 ... We have a PR to add an NNAPI backend. We have questions about implementability.
06:00:05 ... We can summarize and ask for your input.
06:00:48 webnn-native nnapi backend PR: https://github.com/webmachinelearning/webnn-native/pull/178
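The compatibility review Ningxin asks for amounts to a per-op mapping table. A hypothetical fragment is sketched below: the ANEURALNETWORKS_* names are real NNAPI operation constants from NeuralNetworks.h, but the pairings and constraint notes are placeholders for illustration, not the outcome of an actual review.

```ts
// Hypothetical fragment of a WebNN-op -> NNAPI-operation compatibility table.
// The notes stand in for the constraints (ranks, dtypes, layouts, fused activations)
// a real review would record per op.
const webnnToNnapi: Record<string, { nnapiOp: string; notes: string }> = {
  conv2d: { nnapiOp: 'ANEURALNETWORKS_CONV_2D', notes: 'check layout, dilation, fused activation support' },
  add:    { nnapiOp: 'ANEURALNETWORKS_ADD', notes: 'check broadcasting rules' },
  relu:   { nnapiOp: 'ANEURALNETWORKS_RELU', notes: 'also available as a fused activation on many ops' },
  matmul: { nnapiOp: 'ANEURALNETWORKS_FULLY_CONNECTED', notes: 'placeholder: higher-rank cases may need decomposition' },
};
```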
06:02:28 anssik: HUGE thanks to Jonathan for scribing, great job!
06:02:49 ... We'll defer the versioning, streaming input, and model format discussion topics to our next call.
06:03:50 ... We agreed to have our next call one month from now, but Jonathan and Honglin, please share if you'd like to have the next call sooner, two weeks from now.
06:04:00 RRSAgent, draft minutes
06:04:00 I have made the request to generate https://www.w3.org/2022/02/09-webmachinelearning-minutes.html anssik
12:00:42 RRSAgent has joined #webmachinelearning
12:00:42 logging to https://www.w3.org/2022/02/09-webmachinelearning-irc
12:00:56 i/Topic: Model Loader API/scribe+ Jonathan
12:00:58 RRSAgent, draft minutes
12:00:58 I have made the request to generate https://www.w3.org/2022/02/09-webmachinelearning-minutes.html dom
12:01:22 RRSAgent, bye