13:54:13 RRSAgent has joined #webmachinelearning
13:54:17 logging to https://www.w3.org/2023/10/05-webmachinelearning-irc
13:54:17 RRSAgent, make logs Public
13:54:18 please title this meeting ("meeting: ..."), anssik
13:54:18 Meeting: WebML WG Teleconference – 5 October 2023
13:54:23 Chair: Anssi
13:54:34 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2023-10-05-wg-agenda.md
13:54:34 Scribe: Anssi
13:54:36 scribeNick: anssik
13:54:42 gb, this is webmachinelearning/webnn
13:54:42 anssik, OK.
13:54:48 Present+ Anssi_Kostiainen
13:54:51 Regrets+ Dominique_Hazael-Massieux
13:55:06 RRSAgent, draft minutes
13:55:07 I have made the request to generate https://www.w3.org/2023/10/05-webmachinelearning-minutes.html anssik
13:57:49 Present+ Rafael_Cintron
13:59:49 RafaelCintron has joined #webmachinelearning
14:00:15 Present+ Joshua_Lochner
14:00:23 Present+ Deepti_Gandluri
14:00:41 Deepti has joined #webmachinelearning
14:00:48 jsbell has joined #webmachinelearning
14:01:09 Present+ Wanming_Lin
14:01:14 Present+ Joshua_Bell
14:01:55 RRSAgent, draft minutes
14:01:56 I have made the request to generate https://www.w3.org/2023/10/05-webmachinelearning-minutes.html anssik
14:02:57 anssik: please welcome Deepti Gandluri from Google to the WG! She has worked on the WebAssembly implementation in V8.
14:03:32 Deepti: I also co-chair the W3C Wasm CG
14:03:50 Thank you!
14:04:15 ... please also welcome Phillis Tang, also from Google, to the WG! She has shipped PWA desktop capabilities in Chrome and is one of Google's W3C WebApps WG reps
14:04:42 Topic: WebNN v2: Review op breakdown for proposed model targets
14:04:55 anssik: The WG has identified the following as its v2 model targets:
14:05:03 ... Text-to-image: Stable Diffusion unet/VAE/text encoder
14:05:12 ... Image segmentation: Segment Everything decoder
14:05:13 Joshua_Lochner has joined #webmachinelearning
14:05:16 ... Speech-to-text: Whisper Tiny
14:05:24 ... Text-to-text generation (encoder-decoder): t5 and m2m100
14:05:29 ... Text-generation (decoder-only): llama
14:05:58 Vivek has joined #webmachinelearning
14:05:58 ... as discussed on our last call, we want to do an op breakdown to better understand what is common across these architectures to inform WebNN API v2 op priorities.
14:06:25 ... Ningxin indicated he was working with Wanming on such an op breakdown; Wanming, would you like to share an update on this investigation?
14:06:47 Wanming has joined #webmachinelearning
14:07:18 anssik: issue #375
14:07:19 https://github.com/webmachinelearning/webnn/issues/375 -> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]
14:07:26 -> https://github.com/webmachinelearning/webnn/issues/375#issuecomment-1732976561
14:07:37 -> https://user-images.githubusercontent.com/3271201/270237911-3a204653-c8d4-4243-b2cd-6e4443240bf3.jpg
14:08:45 Rachel has joined #webmachinelearning
14:08:57 present+
14:09:10 Wanming: transformer models contain dynamic input shapes; ONNX RT Web enabled freeDimensionOverrides
14:09:20 ... users are able to run dynamic shape models in the WebNN EP
14:10:01 ... ONNX RT supports constant folding, fusion, node elimination etc.
14:10:08 ... in addition to graph optimization
14:10:43 ... the optimized model is a static model; these optimizations are applied when the inference session is initialized, and the WebNN EP runs the optimized model
14:11:03 ... comparing the optimized model with the dynamic shape model, a number of ops are eliminated and fused
14:11:15 ... the table provides a summary of this process
14:11:52 ... in this table you see the op list and counts before and after optimization
14:12:17 ... for each model it indicates how many ops there are; if 0, the ops are totally eliminated
14:12:31 ... e.g. constant and shape could be 100% eliminated
14:13:13 ... in "op usage count of optimized models", zero means it is not used at all
14:13:24 q?
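The before/after tally Wanming describes can be sketched in a few lines. This is an illustrative sketch only (function and variable names are invented, not from the minutes or any ONNX RT API): count op usage in a model's node list before and after graph optimization to see which ops are eliminated or fused away.

```javascript
// Illustrative sketch (names invented): tally op usage before and
// after graph optimization, as in the op breakdown table discussed.
function opBreakdown(beforeOps, afterOps) {
  const tally = (ops) => {
    const counts = new Map();
    for (const op of ops) counts.set(op, (counts.get(op) ?? 0) + 1);
    return counts;
  };
  const before = tally(beforeOps);
  const after = tally(afterOps);
  // One row per op seen before optimization; after === 0 means the
  // optimizer removed every instance (e.g. Constant/Shape folding).
  return [...before].map(([op, n]) => ({
    op,
    before: n,
    after: after.get(op) ?? 0,
  }));
}
```

For example, an op such as Shape that appears before optimization but never after comes out with `after: 0`, matching the "100% eliminated" rows in the table.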
14:13:37 Present+ Ningxin_Hu
14:13:40 q+
14:13:44 ack anssik
14:13:49 ningxin_hu has joined #webmachinelearning
14:14:35 Present+ Rachel_Yager
14:14:37 q+
14:14:49 Present+ Vivek_Sekhar
14:14:53 ack Vivek
14:15:13 Vivek: thanks for this investigation, super useful, we've discussed this internally too
14:15:30 ... one of the earlier comments in the issue was regarding op comparison to TOSA and StableHLO
14:15:59 ... could we look into how we align with those two?
14:16:58 anssik: maybe Googlers could contribute that data if we make the spreadsheet public?
14:18:03 q?
14:18:21 q+
14:18:21 q+
14:18:27 q?
14:18:32 ack Deepti
14:19:45 Deepti: are there any docs on what v1 and v2 mean in this context?
14:20:51 thanks
14:20:57 q?
14:21:00 ack ningxin_hu
14:21:16 anssik: just internal constructs for ourselves to reason about our work
14:21:40 ningxin_hu: have you mapped ONNX ops to what Jiawei proposed in the thread for transformer support?
14:21:50 ... three transformer models noted there, did you map to those?
14:22:24 Wanming: not yet, some of those ops are implemented in terms of other ops
14:22:46 ningxin_hu: this could be a nice next step: have the breakdown first, then see how the ops map into those proposals from Jiawei
14:23:29 at bottom of https://github.com/webmachinelearning/webnn/issues/375#issuecomment-1674224992
14:23:51 https://github.com/webmachinelearning/webnn/issues/375#issuecomment-1674224992
14:23:58 ... for the TOSA and StableHLO mapping, there is a table by Jiawei with elements for mapping to these
14:24:03 at bottom of a very long comment
14:24:44 https://github.com/webmachinelearning/webnn/issues/375#:~:text=TOSA/StableHLO%20Mappings
14:25:11 q?
14:25:24 anssik: we could integrate that data from Dwayne into the table by Wanming
14:26:11 ... and make the table collaboratively editable
14:26:18 ... I propose those as our next steps, agreed?
14:26:29 [silence means agreement]
14:26:59 Topic: Security considerations
14:27:13 Subtopic: Computation control-flow attack based on weights / constants change
14:27:23 anssik: issue #443
14:27:29 https://github.com/webmachinelearning/webnn/issues/443 -> Issue 443 Add security consideration for computation control-flow attack based on weights / constants change (by huningxin)
14:27:39 ... a question was raised whether a computation control-flow attack would be possible using weights and constants change
14:27:45 ... from the comments:
14:27:51 ... - WebNN currently only accepts static graphs
14:28:03 ... - WebNN does not support control flow operations (a difference from Wasm/WGSL)
14:28:19 ... there are implementation-specific concerns about how a memory region is shared between:
14:28:23 ... - the more privileged process calling the graph processing functions
14:28:31 ... - the untrusted renderer process exposing the WebNN API
14:28:51 ... this suggests a compromised renderer process is the concern, not abuse of the WebNN API via JS
14:29:05 ... a reasonable course of action would be to expand the Security Considerations:
14:29:14 -> Document operations susceptible to out-of-bounds access as a guidance to implementers https://www.w3.org/TR/webnn/#issue-9e2aaedc
14:29:43 ... Ningxin, what is your latest thinking?
14:29:52 q?
14:30:14 ningxin_hu: Alex asked this question during the Chromium code review, linked from #443
14:30:56 ... this is a valid question, related to implementation; any multi-process implementation will use shared memory between the privileged and unprivileged processes for transferring weights
14:31:39 ... the argument is that a WebNN implementation could allow a compromised renderer process to change weights during computation, affecting control-flow behaviour
14:32:29 q+
14:32:31 ... how weights are shared between processes, and the relationship between graph compilation and build, are not defined in the WebNN API spec and are currently considered implementation details
14:32:46 ... normative language would help to define how to mitigate this
14:32:47 q?
14:32:51 ack RafaelCintron
14:33:25 RafaelCintron: adding to Ningxin, all CLs that go into Chromium and Mojo get the Security team's review
14:35:37 ... in the Chromium CL it was asked: can we reduce the number of copies, i.e. the GPU operating on the same memory as the renderer process, and what are the security implications of that? WebGPU is also exploring this problem space.
14:35:47 anssik: how far along are the WebGPU folks on this exploration?
14:35:53 RafaelCintron: will check and report back
14:36:07 q?
14:36:51 anssik: are these CLs landed?
14:36:58 RafaelCintron: landed, the CLs make a copy so no problem
14:37:34 q+
14:38:09 RafaelCintron: with larger models copying memory might become an issue
14:38:13 ack jsbell
14:38:32 jsbell: we shouldn't let iteration on the implementation hold us back from adding security considerations to the spec
14:38:52 ... you describe the problem and list mitigations as you identify them, e.g. sandboxing
14:39:55 q?
14:40:00 q+
14:40:03 ack Deepti
14:40:34 Deepti: I understand memory copies in general are not ideal; when copying between these processes, is there a measurable perf impact?
14:40:57 ... isolation between the renderer and the process touching shared memory, is this just a performance issue?
14:41:01 q?
14:41:23 RafaelCintron: the main reason for getting rid of copies is performance
14:42:11 Deepti: how transferable is this across ML apps? E.g. in Wasm, when we removed copies, we found that with other things web apps do, removing copies did not improve performance that much
14:42:39 RafaelCintron: with enough ops and large models we can validate the performance implications
14:42:40 q?
14:43:08 q?
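The copy-based mitigation Rafael describes (the landed CLs "make a copy so no problem") can be sketched as follows. This is an illustrative sketch only, not WebNN API or Chromium code; all names are invented: snapshot weight buffers when the graph is built, so later writes by a compromised renderer cannot alter the weights the compiled graph uses.

```javascript
// Illustrative sketch (names invented): copy each weight buffer at
// graph-build time so subsequent writes to renderer-owned buffers
// cannot change the weights already handed to the privileged process
// (a time-of-check/time-of-use style mitigation).
function snapshotWeights(namedWeights) {
  const snapshot = {};
  for (const [name, view] of Object.entries(namedWeights)) {
    snapshot[name] = view.slice(); // slice() copies into a new TypedArray
  }
  return snapshot;
}
```

The trade-off discussed on the call is exactly this copy: it removes the shared-memory attack surface but, for large models, the extra copies may have a measurable performance cost.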
14:43:15 RRSAgent, draft minutes
14:43:16 I have made the request to generate https://www.w3.org/2023/10/05-webmachinelearning-minutes.html anssik
14:43:28 Topic: New features
14:43:39 Subtopic: Allow checking whether operators/types are supported for a backend before creating a graph
14:43:46 anssik: issue #463
14:43:49 https://github.com/webmachinelearning/webnn/issues/463 -> Issue 463 Allow checking whether operators/types are supported for a backend before creating a graph (by huningxin)
14:44:04 ... currently the WebNN API throws errors when calling MLGraphBuilder.build() with unsupported operators in the graph
14:44:29 ... Rafael notes that at build time the weights may already have been downloaded, so failing at build time is too late to avoid downloading a large model if the backend does not support some operators
14:44:49 ... the proposal is to add APIs to help determine if build would fail given a set of ops BEFORE actually creating the graph
14:45:11 ... Jiawei proposed a solution for the weight-loading step inspired by popular models, e.g. Stable Diffusion
14:45:29 ... Jiawei also shared pseudocode for the API changes on how to match constant nodes with weight buffers
14:45:40 ... Ningxin modified it to use MLNamedArrayBufferViews
14:46:01 ... proposed as a v2 feature, i.e. to be addressed together with our v2 ops work
14:46:01 ... thoughts?
14:46:01 q?
14:46:31 RafaelCintron: this came up in Chromium CL reviews
14:47:07 ... for example, some op takes only a certain data type, which is suboptimal for web developers
14:47:21 ... or it can do int, but not float
14:47:28 ... two approaches to solve this:
14:47:55 ... - the number of nodes makes the graph big to download; a solution proposed by Jiawei was to build the graph separately as a second step
14:48:16 ... - an API similar to WebGPU's that tells you what ops are supported by the implementation
14:49:55 q?
14:50:06 q+
14:50:10 q?
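The second approach (a capability-query API) could be used roughly as follows. This is an assumption-laden sketch, not a proposed spec shape; the function and op names are invented: before fetching large weight files, check which of a model's required ops the backend reports as supported, and fall back early if any are missing.

```javascript
// Hypothetical sketch (all names invented; not the WebNN API):
// compare a model's required op list against the set of ops a backend
// reports as supported, so an app can skip downloading large weights
// when MLGraphBuilder.build() would fail anyway.
function unsupportedOps(requiredOps, supportedOps) {
  const supported = new Set(supportedOps);
  return requiredOps.filter((op) => !supported.has(op));
}
```

If the returned list is non-empty, the app could pick another backend or model variant before spending bandwidth on weights; this early-exit is the downloading-cost argument Rafael raises, though as jsbell notes such a query surface also carries fingerprinting concerns.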
14:50:11 ack ningxin_hu
14:50:25 ningxin_hu: I have a question for users of frameworks
14:50:46 ... can frameworks do this separation of topology from weights into two different files or resources to download?
14:51:17 ... Wanming, when you looked at transformer models, do they combine these two resources together or are they served in separate files?
14:51:18 +1 to that - want feedback from frameworks that would use WebNN backends
14:51:28 Wanming: not sure, but I'll take a look later
14:51:32 q?
14:52:17 jsbell: this seems like an ergonomics issue; providing capabilities similar to WebGPU has fingerprinting concerns
14:52:30 q+
14:52:32 ... for developers this is hidden by the frameworks they'll primarily be interfacing with
14:52:32 q?
14:52:37 ack Joshua_Lochner
14:53:23 Joshua_Lochner: from a Transformers.js perspective, the .onnx file contains the graph topology and weights
14:54:05 ... ONNX RT Node supports the external data format; ONNX RT Web has a feature request to separate the graph and weights into separate files
14:54:36 q?
14:55:51 RafaelCintron: I think more investigation is required on which API shape is a better solution for this
14:56:10 https://github.com/microsoft/onnxruntime/issues/17151
14:56:37 and the PR to fix: https://github.com/microsoft/onnxruntime/pull/17155
14:57:38 q?
14:58:26 Topic: Enhancements
14:58:38 Subtopic: Type of parameters should match the input data type
14:58:42 anssik: issue #442
14:58:44 https://github.com/webmachinelearning/webnn/issues/442 -> Issue 442 Type of some parameters should match the input data type (by Honry)
14:59:04 ... Wanming notes MLPadOptions.value, MLClampOptions.minValue and MLClampOptions.maxValue should use a union type
14:59:07 ... a few proposed v2 ops too
14:59:11 ... Dwayne seems to agree
14:59:21 ... I believe we could address this with a union type; it needs to be checked whether that's applicable to dictionary members
14:59:26 -> https://webidl.spec.whatwg.org/#idl-dictionaries
14:59:47 looking...
:)
14:59:53 I'll add it to my TODO list
15:00:04 (same for https://github.com/webmachinelearning/webnn/pull/464)
15:00:43 Wanming: no additional information to add at this time
15:00:45 q?
15:01:59 RRSAgent, draft minutes
15:02:00 I have made the request to generate https://www.w3.org/2023/10/05-webmachinelearning-minutes.html anssik
15:13:03 dom has joined #webmachinelearning
15:24:51 AramZS has joined #webmachinelearning
16:10:16 AramZS_ has joined #webmachinelearning
16:30:35 AramZS has joined #webmachinelearning
17:11:52 Zakim has left #webmachinelearning