IRC log of webmachinelearning on 2022-02-09

Timestamps are in UTC.

04:56:18 [RRSAgent]
RRSAgent has joined #webmachinelearning
04:56:18 [RRSAgent]
logging to https://www.w3.org/2022/02/09-webmachinelearning-irc
04:56:18 [anssik]
Meeting: WebML CG Teleconference – 9 Feb 2022
04:56:21 [Zakim]
RRSAgent, make logs Public
04:56:22 [Zakim]
please title this meeting ("meeting: ..."), anssik
04:56:23 [anssik]
Chair: Anssi
04:56:29 [anssik]
Agenda: https://github.com/webmachinelearning/meetings/blob/master/telcons/2022-02-09-cg-agenda.md
04:56:33 [anssik]
Scribe: Anssi
04:56:37 [anssik]
scribeNick: anssik
04:56:45 [Honglin]
Honglin has joined #webmachinelearning
04:56:50 [anssik]
Present+ Anssi_Kostiainen
04:59:46 [RafaelCintron]
RafaelCintron has joined #webmachinelearning
05:00:00 [anssik]
Present+ Honglin_Yu
05:00:05 [ningxin_hu]
ningxin_hu has joined #webmachinelearning
05:00:31 [anssik]
Present+ Jonathan_Bingham
05:00:44 [anssik]
Present+ Rafael_Cintron
05:01:25 [anssik]
Present+ Chai_Chaoweeraprasit
05:01:38 [anssik]
Present+ Jon_Napper
05:01:42 [rama]
rama has joined #webmachinelearning
05:01:46 [Jonathan]
Jonathan has joined #webmachinelearning
05:01:48 [chai]
chai has joined #webmachinelearning
05:01:49 [Jon_Napper]
Jon_Napper has joined #webmachinelearning
05:01:52 [anssik]
Present+ Tom_Shafron
05:02:18 [anssik]
Present+ Jim_Pollock
05:02:28 [ningxin_hu]
Present+ Ningxin_Hu
05:02:30 [anssik]
Present+ Ganesan_Ramalingam
05:02:42 [jimpollock]
jimpollock has joined #webmachinelearning
05:02:56 [jack]
jack has joined #webmachinelearning
05:03:29 [anssik]
Present+ Sam_Bhattacharyya
05:03:32 [qjw]
qjw has joined #webmachinelearning
05:04:02 [anssik]
Present+ tyuvr
05:04:21 [anssik]
RRSAgent, draft minutes
05:04:21 [RRSAgent]
I have made the request to generate https://www.w3.org/2022/02/09-webmachinelearning-minutes.html anssik
05:04:25 [chai]
present+
05:04:30 [skyostil]
skyostil has joined #webmachinelearning
05:04:38 [anssik]
scribe+ Jonathan_Bingham
05:04:41 [SamBhattacharyya]
SamBhattacharyya has joined #webmachinelearning
05:05:16 [anssik]
Present+ Sami_Kyostila
05:05:46 [skyostil]
Hi, nice to meet you all! Dialing in from Google
05:06:02 [anssik]
Topic: Model Loader API
05:06:21 [anssik]
anssik: welcome to the second WebML CG call of 2022! We adjusted the agenda a bit to not overlap with Australia Day.
05:06:28 [anssik]
... We agreed to move to a monthly cadence after this call.
05:06:48 [anssik]
anssik: At our previous 12 Jan meeting Honglin delivered an update on the Model Loader API
05:06:54 [anssik]
-> https://www.w3.org/2022/01/12-webmachinelearning-minutes.html WebML CG Teleconference – 12 Jan 2022 - meeting minutes
05:07:10 [anssik]
... this presentation included a list of open questions. The plan for today is to continue discussing those
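For context, the Model Loader API under discussion lets a page hand a pre-trained model (for example a TF Lite flatbuffer) to the browser and run inference on it. A minimal usage sketch, assuming the general shape from the explainer; the exact names and options (createContext, MLModelLoader, compute, the accepted model format) are illustrative and are themselves among the open questions, not an agreed spec:

    // Hypothetical usage sketch; names are illustrative, not the agreed API.
    const context = await navigator.ml.createContext({ devicePreference: 'auto' });
    const loader = new MLModelLoader(context);

    // Fetch a pre-trained model; the supported model format is itself an open question.
    const modelBuffer = await (await fetch('mobilenet_v2.tflite')).arrayBuffer();
    const model = await loader.load(modelBuffer);

    // Run inference on a MobileNet-style 224x224x3 float32 input.
    const input = { data: new Float32Array(1 * 224 * 224 * 3), dimensions: [1, 224, 224, 3] };
    const outputs = await model.compute({ input });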
05:08:34 [anssik]
Honglin: updates since the previous meeting: thanks to reviewers of the prototype, we've tried to accelerate IPC with the renderer talking to the backend directly
05:09:17 [anssik]
... after discussion with Google's security teams, shared buffer is security sensitive and there are some concerns with it per their early review
05:09:38 [anssik]
... guidance is to avoid using shared buffer and use direct IPC
05:09:53 [anssik]
... 1.3 ms delay, moving forward with direct IPC
05:10:05 [anssik]
... most open questions still apply so good to discuss them today
05:10:43 [anssik]
... some float16 issues identified too, need WebNN folks' input
05:11:07 [anssik]
... some resource restrictions also discussed, i.e. what resources the ML service should be able to use
05:11:28 [RafaelCintron]
q+
05:11:52 [anssik]
ack RafaelCintron
05:12:21 [anssik]
RafaelCintron: question, what did the security people say about using a separate process?
05:12:36 [Jonathan]
... Why not use another thread in the same process?
05:12:44 [anssik]
Honglin: no concerns with process isolation
05:13:14 [Jonathan]
Honglin: Security didn't request it. We decided to. If we run in the same process, we have to harden TF Lite even more.
05:13:26 [anssik]
Honglin: the backend process can be safer than the renderer, we can disable more system calls
05:13:49 [anssik]
... we think separate process is a reasonable starting point
05:14:27 [Jonathan]
RafaelCintron: Does the ML service have the same security constraints as the renderer?
05:15:02 [Jonathan]
Honglin: We have one process per model. Even if a hacker takes over, it's isolated.
05:15:16 [Jonathan]
... We could put everything in the renderer. It might be more work.
05:15:29 [Jonathan]
RafaelCintron: In Windows, the renderer is the most secure
05:15:41 [Jonathan]
... There's overhead to having a separate process.
05:15:53 [Jonathan]
Honglin: It's an open question.
05:16:02 [Jonathan]
... We have a mature ML service for TF Lite.
05:16:11 [anssik]
Present+ qjw
05:16:19 [Jonathan]
... It's maybe less work to use the existing service
05:16:25 [anssik]
Present+ Saurabh
05:16:28 [Jonathan]
RafaelCintron: What if 2 websites load the same model?
05:16:35 [Jonathan]
Honglin: There are still 2 processes.
05:17:03 [Jonathan]
RafaelCintron: If you keep everything from the same domain in one process, it should be sufficient. A bad actor can't load 100 models and create 100 processes.
05:17:10 [anssik]
RRSAgent, draft minutes
05:17:10 [RRSAgent]
I have made the request to generate https://www.w3.org/2022/02/09-webmachinelearning-minutes.html anssik
05:17:33 [Jonathan]
Honglin: We can restrict the number of processes, RAM, CPUs
05:17:35 [anssik]
scribe+ Jonathan
05:18:44 [Jonathan]
Anssik: If you can move more of the security discussion into the public, more people can look at the comments as evidence. Discuss on GitHub or in public if possible.
05:18:48 [anssik]
Subtopic: Benchmark discussion
05:19:28 [Jonathan]
Anssik: Let's discuss and identify next steps.
05:19:47 [Jonathan]
Honglin: Not much to share. Improved a lot.
05:20:08 [Jonathan]
... Model Loader performance is very similar to the raw backend, and much faster than the TF Lite WASM runner, even using the same XNNPACK backend
05:20:14 [Jonathan]
... for MobileNet image classification
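As an illustration of the kind of comparison being reported, a sketch of a simple latency-measurement loop; runInference stands in for either the Model Loader path or the TF Lite WASM path, and this is not the actual benchmark harness used:

    // Hypothetical measurement sketch, not the actual benchmark harness.
    async function measureMedianLatencyMs(
      runInference: (input: Float32Array) => Promise<Float32Array>,
      iterations = 100
    ): Promise<number> {
      const input = new Float32Array(224 * 224 * 3); // MobileNet-style image input
      await runInference(input); // warm-up so one-time compilation cost is excluded
      const samples: number[] = [];
      for (let i = 0; i < iterations; i++) {
        const start = performance.now();
        await runInference(input);
        samples.push(performance.now() - start);
      }
      samples.sort((a, b) => a - b);
      return samples[Math.floor(samples.length / 2)];
    }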
05:20:24 [RafaelCintron]
q+
05:20:49 [anssik]
ack RafaelCintron
05:21:11 [Jonathan]
RafaelCintron: General question: since everything runs on the CPU, why not all with WASM? What's missing from WASM?
05:21:25 [Jonathan]
... Is it just a matter of more or wider SIMD instructions?
05:21:35 [Jonathan]
... Or is there something preventing WASM from doing better?
05:21:44 [anssik]
RRSAgent, draft minutes
05:21:44 [RRSAgent]
I have made the request to generate https://www.w3.org/2022/02/09-webmachinelearning-minutes.html anssik
05:22:02 [Jonathan]
Honglin: We talked to the WASM team and haven't figured it out for sure. There are CPU instructions that can't be supported in WASM. There are also compilation issues.
05:22:17 [Jonathan]
... When WASM is going to run, it has a JIT step, even if it's precompiled.
05:22:38 [ningxin_hu]
q+
05:22:40 [Jonathan]
... That's the best guess. Not sure of exact reason
05:23:22 [anssik]
ack ningxin_hu
05:23:28 [Jonathan]
Ningxin: Honglin's assessment is good. WASM SIMD instructions support 128-bit widths. The platform Honglin tested on supports 256-bit widths, which may explain the difference.
05:23:41 [Jonathan]
RafaelCintron: If we have 256 in WASM would that close the gap?
05:24:03 [Jonathan]
Ningxin: Flexible SIMD instructions in WASM will support wider widths. We can check with them, using the early prototype, for performance numbers.
05:24:07 [anssik]
q?
05:24:43 [anssik]
Subtopic: Hardware acceleration
05:25:28 [Jonathan]
Anssi: What will the interface look like when hardware acceleration is supported? What if data is on a GPU or TPU? Should users know what accelerators are available? Performance vs privacy tradeoff.
05:25:37 [Jonathan]
... Web NN is thinking of the same issues. We can share learnings.
05:25:38 [anssik]
q?
05:26:04 [Jonathan]
Honglin: Not too many updates. We have platform experts joining this group. Maybe they have updates.
05:27:11 [Jonathan]
jimpollock: I work with Sami and Honglin on hardware acceleration. Mostly on deployment, taking WebML code and plumbing it through to hardware acceleration.
05:27:30 [Jonathan]
... Use Web NN API on the backend. Targeting media devices with GPU. Starting with CPU to get the flow working.
05:27:41 [Jonathan]
... How do we expose acceleration to the developer?
05:28:08 [Jonathan]
... A few options: do we allow partitioned execution, where the accelerator runs what it can and delegates the rest to the CPU? That could be less optimal than doing it all on the CPU.
05:28:29 [Jonathan]
... Or do we give more info to the user, like what accelerators are supported, or even what ops are supported? But there are fingerprinting concerns with that.
05:28:37 [anssik]
q?
05:28:40 [Jonathan]
... Thinking through design choices. Working on plumbing to get benchmarks.
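To make the two options concrete, a sketch of what each could look like at the API surface; the option names, enum values, and the getCapabilities call are all hypothetical and are exactly the design choices still being weighed (reusing the loader and modelBuffer from the earlier sketch):

    // Option A (hypothetical): the developer states a preference and the implementation
    // decides transparently whether to partition between accelerator and CPU.
    const modelA = await loader.load(modelBuffer, {
      devicePreference: 'auto',        // e.g. 'auto' | 'cpu' | 'gpu' | 'npu'
      allowPartitionedExecution: true, // unsupported ops fall back to the CPU
    });

    // Option B (hypothetical): expose what the platform supports and let the developer
    // choose explicitly -- more control, but a fingerprinting concern.
    const caps = await navigator.ml.getCapabilities(); // hypothetical call
    if (caps.accelerators.includes('npu')) {
      // load with an explicit accelerator target
    }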
05:29:08 [Jonathan]
Anssi: WebNN is also discussing similar issues.
05:29:30 [Jonathan]
Chai: Is WebNN also available in ChromeOS?
05:30:01 [Jonathan]
jimpollock: Yes. Rather than coming up with a new paradigm, we decided to use WebNN for some stability against breaking changes in TF Lite.
05:30:03 [anssik]
RRSAgent, draft minutes
05:30:03 [RRSAgent]
I have made the request to generate https://www.w3.org/2022/02/09-webmachinelearning-minutes.html anssik
05:30:13 [Jonathan]
... It's been ported from Android. We disabled some features like hardware buffers. A full port.
05:30:32 [Jonathan]
Chai: For the purposes of this prototype, you're using the CPU backend for the API, but are thinking of GPU?
05:30:53 [RafaelCintron]
q+
05:30:55 [Jonathan]
jimpollock: There's no driver for the GPU. Just using CPU to prove the integration. Then will run on accelerators with full end-to-end solution.
05:30:55 [skyostil]
Minor correction: Jim is referring to NNAPI (i.e., the Android ML hardware adaptation layer) instead of WebNN
05:31:23 [Jonathan]
... Run with the same benchmarks as Honglin and see.
05:31:33 [Jonathan]
Honglin: We can partition the graph with heuristics.
05:31:55 [Jonathan]
... But there may be fingerprinting issues if we let the developer or web apps decide how.
05:32:06 [Jonathan]
... Transparent vs explicit is the tradeoff.
05:32:49 [anssik]
q?
05:32:53 [anssik]
ack RafaelCintron
05:33:07 [Jonathan]
RafaelCintron: For accelerators you're targeting, Jim, how does data transfer work?
05:33:31 [Jonathan]
... Say you're getting 2 MB. Can you give a pointer to CPU memory? Do you have to copy it and then you can use it? Do you have to copy back before looking at it?
05:33:47 [Jonathan]
jimpollock: It varies from accelerator to accelerator.
05:33:49 [chai]
q+
05:34:16 [Jonathan]
... It runs in the TF runtime in the ML Service binary executable process. The NNAPI delegate walks the graph and calls all the methods to build the NNAPI representation, including tensors.
05:34:37 [Jonathan]
... Then it crosses the boundary to the vendor implementation. Depends. Can be memory copy or shared representation.
05:34:49 [Jonathan]
... Some have onboard memory. Examples of both. Vendors keep some details secret.
05:34:56 [anssik]
q?
05:35:03 [Jonathan]
... Not a single answer. Could happen in any permutation depending on vendor driver implementation.
05:35:21 [Jonathan]
RafaelCintron: You give them a pointer to memory and it's up to them to operate directly or copy.
05:35:31 [Jonathan]
jimpollock: Yes. We have some of the same security issues Honglin mentioned.
05:35:44 [Jonathan]
... We need to make sure we won't have shared memory concerns.
05:36:00 [Jonathan]
... May still need mem copies depending on what the security team says.
05:36:20 [Jonathan]
Chai: Trying to understand how we achieve hardware portability across different accelerators.
05:36:35 [Jonathan]
... You're testing with NNAPI. Drivers are CPU. How does that work across different hardware?
05:37:05 [Jonathan]
... For WebNN, the whole thing is about defining the interface that anyone can implement for any accelerator in the platform
05:37:08 [anssik]
ack chai
05:37:20 [Jonathan]
... If you run the same model and the same content, regardless of hardware you run on, you can accelerate
05:37:42 [Jonathan]
... Is it because we want to understand through the benchmark before taking the next step? How does it connect with hardware portability on different systems?
05:37:49 [Jonathan]
jimpollock: For Chrome OS or more general?
05:38:12 [Jonathan]
Chai: Looking for education. That's why asking about NNAPI on Chrome OS. The driver might be different. Does it come with Chrome OS?
05:38:26 [Jonathan]
... Obviously different PCs have different hardware. How does it target different vendor GPUs?
05:38:53 [Jonathan]
jimpollock: Consider different platforms. On a simple system with no accelerators, NNAPI wouldn't be enabled. There'd be no delegation.
05:39:03 [Jonathan]
... If a system has an accelerator and the vendor has provided a driver.
05:39:17 [Jonathan]
... We'd enable NNAPI, and when the model is processed by TF Lite or the inferencing system...
05:39:30 [Jonathan]
... It queries the delegate, which reports which parts of the model it can process.
05:39:45 [Jonathan]
... Then partitioning comes up. Support full models, or partitioned delegation?
05:40:21 [Jonathan]
... How portability works is the combination of hardware availability and an installed driver; the inferencing engine takes this into account when setting up the interpreter, and delegates as directed.
05:40:27 [Jonathan]
... More complex: multiple accelerators.
05:40:50 [Jonathan]
... Gets specific to NNAPI. Can be done. Can partition across multiple. In practice, don't see it often. Usually gives less performance.
05:41:00 [Jonathan]
... Marshaling data back and forth between kernels is slow.
05:41:18 [Jonathan]
... It's a dynamic system: it looks at what's available and installed, and decides what to process where based on the caller's constraints for partitioned execution.
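A simplified pseudocode sketch of the flow Jim describes: check whether an accelerator and driver are present, ask the delegate which ops it supports, and either run everything on one device or partition. All names here are made up for illustration; the real logic lives inside TF Lite and the NNAPI delegate:

    // Illustrative pseudocode only; not TF Lite / NNAPI delegate code.
    interface Op { name: string }
    interface Backend {
      available(): boolean;                 // accelerator present and vendor driver installed?
      supportedOps(graph: Op[]): Set<Op>;   // which parts of the model it can run
      run(ops: Op[], input: Float32Array): Float32Array;
    }

    function runGraph(graph: Op[], input: Float32Array, accel: Backend, cpu: Backend,
                      allowPartitioning: boolean): Float32Array {
      if (!accel.available()) return cpu.run(graph, input);
      const supported = accel.supportedOps(graph);
      if (supported.size === graph.length) return accel.run(graph, input);
      if (!allowPartitioning) return cpu.run(graph, input);
      // Partitioned execution: the accelerator runs what it can, the CPU runs the rest.
      // Marshalling intermediate tensors across this boundary is the cost mentioned above.
      let activation = input;
      for (const op of graph) {
        activation = supported.has(op) ? accel.run([op], activation) : cpu.run([op], activation);
      }
      return activation;
    }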
05:41:43 [Jonathan]
Chai: Makes sense. Solution is specific to Chrome OS. Not PC, where you can't determine what accelerator is running.
05:42:14 [Jonathan]
... For instance, a PC running a Chrome browser. That's a common environment. Most PCs have accelerators, a GPU.
05:42:40 [Jonathan]
... If the solution is specific to Chrome OS, vendors may put together different drivers.
05:43:12 [Jonathan]
jimpollock: I'm giving the specifics for Chrome OS because I know it. There's an abstraction layer like OpenGL: your inferencing engine speaks OpenGL and the vendor library applies it.
05:43:18 [Jonathan]
... NNAPI plays that role in Chrome OS
05:43:20 [RafaelCintron]
q+
05:43:40 [Jonathan]
... NNAPI is like OpenGL, a common interface. There are various extensions. They can be used if available.
05:43:41 [anssik]
q?
05:44:12 [Jonathan]
... I don't know the Microsoft equivalent; I expect it's similar.
05:44:26 [Jonathan]
Chai: The purpose of WebNN as a spec is to define the abstraction for ML that's higher than OpenGL.
05:44:44 [Jonathan]
... That's why the spec was written that way. Then we can accelerate ML differently than regular shaders.
05:45:01 [Jonathan]
... The abstraction for the NNAPI solution is what I'm trying to understand
05:45:23 [Jonathan]
... To build something that works for Chromium, not only on Chrome OS, the abstraction would look very much like Web NN, not NNAPI.
05:45:33 [Jonathan]
... In a sense they're the same. Operator-based abstraction.
05:45:51 [Jonathan]
... You've answered the question. NNAPI is used to understand how to layer.
05:46:19 [Jonathan]
jimpollock: I agree NNAPI doesn't exist everywhere. Not on Windows. DirectML might be the equivalent arbitrator.
05:46:29 [Jonathan]
... Each platform would have its equivalent.
05:46:44 [Jonathan]
Chai: The way we think about DirectML (I run the DirectML group at Microsoft)
05:47:09 [Jonathan]
... Our goal is to have an abstraction that's not specific to Windows. Model Loader could sit on top of WebNN, so an OS implementation could sit underneath.
05:47:34 [Jonathan]
Honglin: If only using GPU, that can be done. We want to support more accelerators, which is harder.
05:47:39 [Jonathan]
... We have to start somewhere.
05:47:50 [Jonathan]
... NNAPI is a good starting point.
05:48:01 [anssik]
ack RafaelCintron
05:48:28 [Jonathan]
RafaelCintron: Yes, it's true WebNN accepts WebGL or WebGPU. That doesn't mean it has to be only GPU.
05:48:34 [Jonathan]
... We could add accelerator adaptors later
05:48:55 [chai]
q+
05:49:06 [Jonathan]
... Then you can instantiate and some features might not be available. We can add overloads that support different kinds of buffers. They can be accelerator specific.
05:49:12 [ningxin_hu]
q+
05:49:17 [Jonathan]
... There's still a lot of value in looking at it.
05:49:22 [anssik]
q?
05:49:32 [Jonathan]
Honglin: A lot of PCs have powerful GPUs. On mobile, the ML accelerator may be more valuable.
05:49:46 [Jonathan]
... If the ML accelerator buffers can be used by WebNN, Model Loader can also use them
05:49:59 [Jonathan]
... What do you think?
05:50:22 [Jonathan]
RafaelCintron: We could do it a couple ways. The same overloads, or new ones for other kinds of buffers. Accelerator buffers. Or array buffers.
05:50:35 [Jonathan]
... Run in a special process instead of CPU. Could use NNAPI under the covers.
05:51:03 [Jonathan]
... When Ningxin and Chai specified the operators, they looked at other platforms, not only Windows. They made sure the ops could be implemented, including on NNAPI.
05:51:07 [Jonathan]
... If not, we can revise them.
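To make the overload idea concrete: WebNN context creation already takes either a device preference or a WebGPU device, and Rafael's suggestion amounts to adding further overloads for accelerator-specific devices and buffers. A hedged sketch; the commented-out overloads do not exist in the spec:

    // Roughly the shape of WebNN context creation at the time of this discussion;
    // details may differ from the current spec text.
    const cpuContext = navigator.ml.createContext({ devicePreference: 'cpu' });
    const adapter = await navigator.gpu.requestAdapter();
    const gpuContext = navigator.ml.createContext(await adapter.requestDevice());

    // Hypothetical future overloads (not in the spec): an accelerator-backed context and
    // accelerator-resident buffers that Model Loader could share with WebNN.
    // const npuContext = navigator.ml.createContext(someAcceleratorDevice);
    // const output = await model.compute({ input: someAcceleratorBuffer });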
05:51:08 [anssik]
ack chai
05:51:47 [Jonathan]
Chai: When we started WebNN, the goal was to define the abstraction for ML accelerators. We're not setting out to build something specific to CPU or GPU.
05:51:55 [Jonathan]
... It's a portable abstraction across hardware.
05:52:12 [Jonathan]
... We acknowledge that the web stack is sitting on an ecosystem with a lot of different hardware.
05:52:35 [Jonathan]
... MacOS and Windows are moving all the time. We want to avoid defining something only valid for today's hardware.
05:52:52 [Jonathan]
... DirectML is an implementation at one point in time on Windows. We want to support more accelerators in the ecosystem.
05:53:06 [Jonathan]
... The abstraction is separate from the implementation. The abstraction is what we aim for.
05:53:13 [anssik]
q?
05:53:37 [anssik]
ack ningxin_hu
05:53:41 [Jonathan]
Honglin: Great point. For Model Loader, we delegate portability to the backend; it's the responsibility of the backend. TF has already put a lot of effort into making it portable.
05:53:59 [Jonathan]
... What I mean is, when the accelerator is available to WebNN, it's the same that's available to Model Loader.
05:54:31 [Jonathan]
... We're facing the same issue. The TF and ONNX formats are very similar to what the WebNN API is doing: defining the computational graph and the tensors.
05:54:54 [Jonathan]
Chai: Not exactly. On the surface, WebNN has convolution, and so do ONNX and TF.
05:55:09 [Jonathan]
... WebNN is designed to be the backend, and not the frontend interface.
05:55:41 [Jonathan]
... If NNAPI is an implementation that's on the move, that's fine. NNAPI drivers are not available everywhere.
05:55:50 [Jonathan]
... For testing and proving the API, it's fine
05:56:07 [Jonathan]
... As we do more, let's keep in mind that we want an abstraction like WebNN.
05:56:21 [Jonathan]
Honglin: ML accelerators are in rapid development. They may change a lot in the future.
05:56:37 [Jonathan]
... It's hard to cover all cases. GPUs are more mature than ML accelerators.
05:56:54 [Jonathan]
... We're aware NNAPI is not available everywhere. That's where we're starting for Chrome OS.
05:57:09 [Jonathan]
Ningxin: Regarding NNAPI, I agree.
05:57:37 [Jonathan]
... For the WebNN spec, in the working group, we took NNAPI as an implementation backend option when checking each operator's compatibility.
05:57:48 [Jonathan]
... We test against DirectML and NNAPI on Android or ChromeOS.
05:58:14 [Jonathan]
... I have an ask for help. For some issue discussions, we have questions about how ops can be supported by NNAPI.
05:58:15 [anssik]
RRSAgent, draft minutes
05:58:15 [RRSAgent]
I have made the request to generate https://www.w3.org/2022/02/09-webmachinelearning-minutes.html anssik
05:58:25 [Jonathan]
... We'd like to be connected to the right people.
05:58:43 [Jonathan]
... Can you help?
05:59:08 [Jonathan]
jimpollock: You're looking for understanding of whether a WebNN op spec would be supported by NNAPI? Yes, we can help review and see what the mapping is like.
05:59:22 [Jonathan]
... A lot of ops have constraints. There may be certain modes that are supported, and others not.
05:59:40 [Jonathan]
Ningxin: We also have the WebNN-native project, to evaluate with.
05:59:49 [anssik]
-> https://github.com/webmachinelearning/webnn-native/ WebNN-native project
05:59:55 [Jonathan]
... We have a PR to add NNAPI backend. We have questions about implementability.
06:00:05 [Jonathan]
... We can summarize and ask for your input.
06:00:48 [ningxin_hu]
webnn-native nnapi backend PR: https://github.com/webmachinelearning/webnn-native/pull/178
06:02:28 [anssik]
anssik: HUGE thanks to Jonathan for scribing, great job!
06:02:49 [anssik]
... we'll defer versioning, streaming input and model format discussion topics to our next call
06:03:50 [anssik]
... we agreed to have our next call one month from now, but Jonathan and Honglin please share if you'd like to have our next call sooner, two weeks from now
06:04:00 [anssik]
RRSAgent, draft minutes
06:04:00 [RRSAgent]
I have made the request to generate https://www.w3.org/2022/02/09-webmachinelearning-minutes.html anssik
12:00:42 [RRSAgent]
RRSAgent has joined #webmachinelearning
12:00:42 [RRSAgent]
logging to https://www.w3.org/2022/02/09-webmachinelearning-irc
12:00:56 [dom]
i/Topic: Model Loader API/scribe+ Jonathan
12:00:58 [dom]
RRSAgent, draft minutes
12:00:58 [RRSAgent]
I have made the request to generate https://www.w3.org/2022/02/09-webmachinelearning-minutes.html dom
12:01:22 [dom]
RRSAgent, bye