13:52:16 RRSAgent has joined #webmachinelearning
13:52:20 logging to https://www.w3.org/2024/04/04-webmachinelearning-irc
13:52:20 RRSAgent, make logs Public
13:52:21 please title this meeting ("meeting: ..."), anssik
13:52:23 Meeting: WebML WG Teleconference – 4 April 2024
13:52:28 Chair: Anssi
13:52:34 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2024-04-04-wg-agenda.md
13:52:39 Scribe: Anssi
13:52:49 scribeNick: anssik
13:53:06 gb, this is webmachinelearning/webnn
13:53:06 anssik, OK.
13:53:14 Present+ Anssi_Kostiainen
13:53:21 Regrets+ Reilly_Grant
13:53:27 RRSAgent, draft minutes
13:53:28 I have made the request to generate https://www.w3.org/2024/04/04-webmachinelearning-minutes.html anssik
13:57:11 AramZS has joined #webmachinelearning
13:57:31 Present+ Michael_McCool
13:58:13 jsbell has joined #webmachinelearning
13:59:09 Present+ Joshua_Bell
13:59:25 McCool has joined #webmachinelearning
14:00:45 Ningxin_Hu has joined #webmachinelearning
14:01:01 Present+ Dwayne_Robinson
14:01:06 Present+ Ningxin_Hu
14:01:11 Present+ Bryan_Bernhart
14:01:19 Present+ Joshua_Lochner
14:01:28 phillis has joined #webmachinelearning
14:01:31 Present+ Phillis_Tang
14:01:39 Present+ Rafael_Cintron
14:01:40 geoff has joined #webmachinelearning
14:01:52 Present+ Geoff_Gustafson
14:02:10 Joshua_Lochner has joined #webmachinelearning
14:02:13 Present+ Ilya_Rezvov
14:02:22 RRSAgent, draft minutes
14:02:24 I have made the request to generate https://www.w3.org/2024/04/04-webmachinelearning-minutes.html anssik
14:02:31 RafaelCintron has joined #webmachinelearning
14:02:39 Present+ Zoltan_Kis
14:02:40 cbacharakis has joined #webmachinelearning
14:02:40 zkis has joined #webmachinelearning
14:02:40 Present+ Austin_Sullivan
14:02:57 Present+ Christo_Bacharakis
14:03:38 asully has joined #webmachinelearning
14:03:54 RRSAgent, draft minutes
14:03:55 I have made the request to generate https://www.w3.org/2024/04/04-webmachinelearning-minutes.html anssik
14:04:03 anssik: please join me in welcoming our new WG participants:
14:04:14 ... - Vasil Dedejski from Netcetera
14:04:21 ... - Juan Deng from Alibaba Group
14:04:35 ... - also Michael McCool from Intel joins in an official WG participant capacity
14:05:01 Topic: NPU support discussion
14:05:07 anssik: issue #623
14:05:08 https://github.com/webmachinelearning/webnn/issues/623 -> Issue 623 WebNN should support NPU and QDQ operations (by wchao1115) [v2] [opset] [feature request]
14:05:46 anssik: The goal of this discussion is to revisit the problem space for NPU support in WebNN, review the high-level proposal for the key elements of the design, and gather feedback and signals from the group regarding interest, scope, and priority.
14:06:05 ... an NPU device type and support for quantized models have been explored in the group before and have been awaiting implementation experience.
14:06:14 ... earlier discussion in issues #128 and #302
14:06:15 https://github.com/webmachinelearning/webnn/issues/128 -> Issue 128 WebNN should support int8 quantized models (by wchao1115) [v2] [opset] [feature request]
14:06:15 https://github.com/webmachinelearning/webnn/issues/302 -> Issue 302 API simplification: context types, context options, createContext() (by zolkis) [v2]
14:06:29 ... thanks to Chai for the proposal and Yajing for comments
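For orientation, a minimal sketch of context creation: the 'cpu' and 'gpu' device types and powerPreference are in the current spec, while the 'npu' value is the hypothetical addition proposed in issue #623, shown for illustration only.

    // In the spec today: deviceType is 'cpu' (default) or 'gpu'.
    const context = await navigator.ml.createContext({
      deviceType: 'gpu',
      powerPreference: 'low-power',
    });

    // Proposed in issue #623 (not in the spec): an NPU device type,
    // possibly paired with a fallback or priority mechanism.
    const npuContext = await navigator.ml.createContext({
      deviceType: 'npu', // hypothetical value
    });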
14:07:04 Dwayne: NPUs are probably familiar to most; they are not necessarily faster than GPUs, but they are more efficient
14:07:19 ... in the spec we have GPU and CPU device types and power preferences; data types are also in the spec
14:07:32 q+
14:07:42 ... we're missing an NPU device type and the bare minimum operations, quantization and dequantization
14:07:50 (when Dwayne is done)
14:08:07 ... linear 8-bit is typical; even more aggressive schemes exist, down to 1-bit, but we think we should start with the most common ones
14:08:33 ... the device type and the quantization ops are the key elements; moving forward we need to enumerate what the options would look like
14:09:07 ... bikeshedding areas: how to express this in the context options, CPU, GPU, NPU? a primary device type and a fallback device type? a priority order?
14:09:07 q?
14:09:29 q?
14:09:35 ack McCool
14:09:58 Michael: I wanted to point out that quantization is also important for CPU and GPU, due to memory bandwidth issues
14:10:13 ... the quantization representation should perhaps be separated from the target device
14:10:25 ... we're looking at impacts on caching, whether a model is quantized or not
14:10:26 q?
14:11:27 Phillis: I provided feedback in the comments; the current proposal is to add NPU and fallbacks, and it assumes NPU has a smaller set of ops while CPU and GPU have full coverage
14:11:48 ... based on implementation experience from the TFLite and CoreML backends, GPU op set coverage is also smaller compared to CPU
14:12:01 ... so if we want to signal fallback, we need to signal it for other device types too
14:12:19 ... CoreML is opaque in terms of which compute unit the workload executes on
14:12:52 ... sometimes the workload is executed on the CPU if the tensor is small enough, a blackbox-style design
14:12:54 q?
14:13:04 q?
14:13:35 q+
14:13:41 Dwayne: I know Mingming submitted a patch to Chromium for the NPU device type to experiment with that
14:13:48 ... we don't have a concept of a fallback yet
14:13:49 q?
14:13:52 ack zkis
14:14:18 Zoltan: Phillis, can we say CPU is always the fallback device? Should we separate fallback from the hints?
14:14:20 jsbell has joined #webmachinelearning
14:14:39 Phillis: I think hints are more accurate and match the underlying behavior better, a soft signal to the underlying backend that NPU is preferred
14:14:50 jsbell has joined #webmachinelearning
14:15:21 Zoltan: CPU as a fallback device? can we spec it such that it always works?
14:15:44 Dwayne: we could spec this as a preference, and give a signal back to the developer if it doesn't work
14:15:50 q?
14:16:31 Dwayne: CPU as a fallback has a challenge: it needs to do graph partitioning
14:16:47 Phillis: CoreML itself does graph partitioning
14:17:00 Dwayne: similarly for DML too in the future, it happens transparently
14:17:10 Phillis: no need to worry about subgraph partitioning then
14:17:11 q?
14:17:49 q+
14:17:56 ack Ningxin_Hu
14:18:08 Ningxin_Hu: I can speak for Mingming and the implementation
14:18:12 -> Chromium implementation https://chromium-review.googlesource.com/c/chromium/src/+/5330647
14:18:27 Ningxin_Hu: we've added the NPU device type to the DML path for testing
14:18:52 ... there is no fallback in the current implementation; we are collaborating with the DML team and the Intel driver team on a fallback experiment
14:19:27 ... by introducing the NPU device type we can run a small set of models, using that as a starting point
14:19:47 ... later with fallback to CPU and GPU
14:19:48 q?
14:19:58 q?
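A sketch of the quantize/dequantize ("QDQ") building blocks discussed above, given an MLContext `context`. The ops quantizeLinear and dequantizeLinear are hypothetical, with names borrowed from ONNX; only the input/constant plumbing is in the spec today.

    const builder = new MLGraphBuilder(context);
    const x = builder.input('x', { dataType: 'float32', dimensions: [1, 1024] });
    const scale = builder.constant(
      { dataType: 'float32', dimensions: [1] }, new Float32Array([0.02]));
    const zeroPoint = builder.constant(
      { dataType: 'int8', dimensions: [1] }, new Int8Array([0]));
    // Linear 8-bit: q = clamp(round(x / scale) + zeroPoint, -128, 127)
    const q = builder.quantizeLinear(x, scale, zeroPoint);    // hypothetical op
    // ... int8 compute would go here ...
    const y = builder.dequantizeLinear(q, scale, zeroPoint);  // hypothetical op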
14:20:15 Topic: Hybrid AI exploration update
14:20:32 anssik: I've asked the project team to give a brief update on the proof of concept informed by the group's feedback. Thanks everyone for your insightful feedback shared to date!
14:20:37 -> https://github.com/webmachinelearning/proposals/issues/5
14:20:38 https://github.com/webmachinelearning/proposals/issues/5 -> Issue 5 Hybrid AI Exploration (by grgustaf)
14:20:43 Michael: presenting a slide with a summary of feedback received from the group
14:21:09 ... Models/Use Cases:
14:21:17 ... 1) MMS – ASR
14:21:31 ... 2) SeamlessM4T - general speech tasks
14:21:42 ... 3) Others? LLMs? e.g. mistral-7b
14:21:49 ... Comments
14:22:06 ... Two forms of hybrid AI
14:22:13 ... - Server or Client
14:22:17 ... - Split models
14:22:42 ... Meaning of "hybrid"? Can also mean "heterogeneous hardware"
14:22:53 ... Pain points, priority
14:23:02 ... Generally saving space/download latency
14:23:15 ... 1. Sharing/reusing large models across sites
14:23:24 ... 2. Same resources at different URLs
14:23:33 ... 3. Same resources in different formats
14:23:54 ... 4. Want to expose "built-in" models
14:24:29 ... 5. Generalizing models
14:24:39 ... 6. Need a solution that can handle adapters
14:24:41 q?
14:25:22 q+
14:25:26 ack jsbell
14:25:39 jsbell: wanted to ask if you're engaged with privacy groups?
14:26:16 Michael: have thought about shared caches and privacy considerations, and ways to mitigate that
14:27:35 jsbell: wanted to make sure privacy in implementations is considered; if a small number of sites use a specific model, then that one bit of information is significant
14:28:34 anssik: privacy considerations are important; proposing those are documented along with the proposal
14:28:34 q?
14:29:06 Topic: Open issues and PRs
14:29:15 Subtopic: Debrief on PRs merged recently
14:29:27 anssik: This topic is for the editors and PR authors to debrief the group on substantive PRs that got merged in the last few weeks and answer questions from the group.
14:29:32 -> Recently merged PRs https://github.com/webmachinelearning/webnn/pulls?q=is%3Apr+is%3Amerged
14:29:36 ... PRs merged by issue type since last meeting:
14:29:42 ... conventions: #618 #621 #627
14:29:42 https://github.com/webmachinelearning/webnn/pull/618 -> MERGED Pull Request 618 Conventions: Add and apply a few more spec coding conventions (by inexorabletash) [conventions]
14:29:43 https://github.com/webmachinelearning/webnn/pull/627 -> MERGED Pull Request 627 Conventions: Use "rank" for variable names, when appropriate (by inexorabletash) [conventions]
14:29:43 https://github.com/webmachinelearning/webnn/pull/621 -> MERGED Pull Request 621 Conventions: Ensure all dict members have definitions (by inexorabletash)
14:29:48 ... bug fix: #616 #619 #620
14:29:48 https://github.com/webmachinelearning/webnn/pull/619 -> MERGED Pull Request 619 Bugfix: Drop "re-throw" in MLActivation creation steps (by inexorabletash)
14:29:48 https://github.com/webmachinelearning/webnn/pull/616 -> MERGED Pull Request 616 Bugfix: Unbalanced autolink brackets (by inexorabletash)
14:29:50 https://github.com/webmachinelearning/webnn/pull/620 -> MERGED Pull Request 620 Bug fix: Drop "... have been checked..." notes (by inexorabletash)
14:29:53 ... question: #617 #632
14:29:54 https://github.com/webmachinelearning/webnn/pull/632 -> MERGED Pull Request 632 add a note for empty input (by philloooo)
14:29:57 https://github.com/webmachinelearning/webnn/pull/617 -> MERGED Pull Request 617 Update NSNet2 reference (by anssiko)
14:30:22 anssik: anything the editors or PR authors feel is important to highlight for the broader group who may not follow day-to-day GH activity?
14:30:34 q+
14:30:39 ack jsbell
14:31:21 ningxin has joined #webmachinelearning
14:31:24 jsbell: not specific to any of these PRs, but FYI, I'm working on a local tool written in Node.js to enforce conventions; I will share it with the group when it's better baked; it runs querySelectorAll checks against the DOM and greps the source
14:31:41 Dwayne: in CI or local?
14:31:46 jsbell: to be determined
14:32:10 anssik: many thanks Josh for your work on this tool
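As a rough illustration of such a tool (hypothetical; not Josh's actual script), a Node.js checker could combine DOM queries over the generated spec with greps over the Bikeshed source; the file names index.html and index.bs are assumptions.

    // Hypothetical conventions-checker sketch; file names are assumptions.
    const { JSDOM } = require('jsdom');
    const fs = require('node:fs');

    const { document } = new JSDOM(fs.readFileSync('index.html', 'utf8')).window;
    // DOM check: every definition should carry an id so it can be linked.
    for (const dfn of document.querySelectorAll('dfn:not([id])')) {
      console.warn('dfn without id:', dfn.textContent);
    }

    // Source grep: flag tab characters, as an example convention.
    fs.readFileSync('index.bs', 'utf8').split('\n').forEach((line, i) => {
      if (line.includes('\t')) console.warn(`index.bs:${i + 1}: tab character`);
    });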
14:34:07 Subtopic: [bug] Need clarify scale factor for resample2d
14:34:23 anssik: issue #610 is about overflow handling convention preferences
14:34:25 https://github.com/webmachinelearning/webnn/issues/610 -> Issue 610 Need clarify scale factor for resample2d (by BruceDai) [bug]
14:34:28 ... Josh reports part 1 of this issue was addressed by https://github.com/webmachinelearning/webnn/commit/24edf7f775ccb5105b3503449c714e2b7af563e6
14:34:34 ... part 2 is not yet addressed in the spec IIUC:
14:34:38 ... "clarify the limitations for the product of scale factor and spatial dimension's size"
14:34:59 ... it looks like the validation steps of resample2d were fixed in the implementation
14:35:11 ... are the remaining spec changes clear? anything to discuss?
14:35:20 q?
14:36:06 jsbell: part 2 is about validation and the clamping that may be required
14:37:00 Dwayne: it would be a lot of noise if we added that to all ops; can we spec it centrally?
14:37:02 q?
14:37:22 +1 to handle overflow centrally
14:37:45 q?
14:37:47 jsbell has joined #webmachinelearning
14:38:06 q?
14:38:20 Subtopic: [bug] Synchronously validate input operands/activations
14:38:23 anssik: issue #572
14:38:24 https://github.com/webmachinelearning/webnn/issues/572 -> Issue 572 Synchronously validate input operands/activations (by inexorabletash) [bug] [question]
14:38:32 anssik: merged PRs #591 and #605 addressed parts of this issue
14:38:32 https://github.com/webmachinelearning/webnn/pull/591 -> MERGED Pull Request 591 Content: Define operand concept, simplify graph connection steps (by inexorabletash)
14:38:34 https://github.com/webmachinelearning/webnn/pull/605 -> MERGED Pull Request 605 Synchronously validate input operands/validations (by inexorabletash)
14:38:39 ... Josh identified the following as remaining work:
14:38:46 - Standard phrasing for "be an operator for ..."
14:38:46 - Introducing a "Validate arguments" section for each method
14:38:46 - Introducing a "Calculate output shape" section for each method (maybe "output descriptor" is better?)
14:39:03 jsbell: any feedback on those is welcome
14:39:21 ... the first is asking how we should phrase that; it's an easy PR for someone who wants to say "let's do it this way!"
14:39:47 ... others have come up in this issue, feedback wanted on those too; I don't want to add too many PRs that just add text
14:39:48 q?
14:40:15 Subtopic: [question] Allow no-op graphs?
14:40:20 anssik: issue #614
14:40:21 https://github.com/webmachinelearning/webnn/issues/614 -> Issue 614 Allow no-op graphs? (by inexorabletash) [question]
14:40:49 ... Josh reports that in PR #603 a step was added to build() to match the Chromium implementation, which errors out if an input operand or a constant operand is specified as an output.
14:40:50 https://github.com/webmachinelearning/webnn/pull/603 -> MERGED Pull Request 603 Content: Define build() steps more rigorously (by inexorabletash)
14:41:11 ... Dwayne notes this would rule out a no-op graph, but also notes that a caller can always insert a dummy identity node to satisfy this constraint
14:41:21 ... Ningxin notes a constant-only graph seems useful, especially for GPU or NPU devices
14:41:27 ... also a good discussion for "constant MLBuffer"
14:41:41 ... proposed by Bryan as "after builder.constant(mlBuffer), it becomes read-only (ex. no writeBuffer() or dispatch() allowed)."
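A minimal sketch of the dummy-identity workaround noted above, given an MLContext `context`: with the PR #603 step, build() rejects an output that is directly an input or constant operand, but routing it through identity() makes the graph valid.

    const builder = new MLGraphBuilder(context);
    const x = builder.input('x', { dataType: 'float32', dimensions: [2, 2] });
    // await builder.build({ y: x });     // rejected: output is an input operand
    const y = builder.identity(x);        // dummy identity node
    const graph = await builder.build({ y }); // an effectively no-op graph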
14:42:13 q+
14:42:52 Dwayne: I don't think this is a big deal
14:42:54 ack ningxin
14:43:26 ningxin: I'd like to make it clear that my comment is about the constant-only graph as the use case, a model with two decoder graphs
14:43:55 ... later, in discussion with Austin and Bryan, my use case was satisfied by "constant MLBuffer", so I wanted to clarify this
14:44:20 ... another comment: I want to understand if there's a use case for a no-op graph?
14:44:21 q?
14:44:58 Dwayne: chaining models together and getting flexibility along that chain, if there's a concept that just takes input and passes it along; but a dummy identity would still satisfy that
14:45:09 ningxin: WebNN is supposed to be a backend API, so I'm not sure if that is in scope
14:45:10 q?
14:45:27 Dwayne: no reservations about resolving this with an empty graph
14:45:29 q?
14:46:27 ningxin: because Bryan mentioned this was added to a TODO, we can track this as part of the MLBuffer proposal
14:46:30 q?
14:46:49 Subtopic: [question] Graph with no input
14:46:54 anssik: issue #615 and PR #632
14:46:54 https://github.com/webmachinelearning/webnn/pull/632 -> MERGED Pull Request 632 add a note for empty input (by philloooo)
14:46:55 https://github.com/webmachinelearning/webnn/issues/615 -> CLOSED Issue 615 Graph with no input (by philloooo) [question]
14:46:59 ... this one was fixed, thanks!
14:47:58 Phillis: I added a note that says WebNN does support that; if the backend does not support this, implementations can work around it.
14:48:03 q?
14:48:19 Subtopic: [question] Can an MLGraphBuilder be reused?
14:48:27 anssik: issue #567
14:48:28 https://github.com/webmachinelearning/webnn/issues/567 -> Issue 567 Can an MLGraphBuilder be reused? (by reillyeon) [question]
14:48:38 ... Reilly asks "Are there any known use cases for MLGraphBuilder reuse?"
14:48:43 ... Ningxin mentions one use case:
14:48:49 ... "Whisper, because WebNN doesn't support the If operator, there will be two WebNN sub-graphs being built. One with past Key Value (KV) cache ("with_past") and another one without past KV cache ("no_past"). The inference code will run the "no_past" sub-graph for the first iteration and run the "with_past" sub-graph for the following iterations. The two sub-graphs actually share some common weights. It would be useful if the same weights built by builder.constant() could be taken by the operators of the two sub-graphs."
14:48:59 jsbell has joined #webmachinelearning
14:49:03 q+
14:50:05 ningxin: this is related to the previous topic: a merged model in ONNX terms has two subgraphs with an If operator; because WebNN does not support the If op, in the backend the If op will fall back to Wasm, one branch with past KV cache and another without past KV cache
14:50:53 ... after the first iteration a value can be reused in the KV cache, and later on the cache is reused until the sequence length is reached
14:51:03 ... used in transformer models, including Whisper
14:51:46 ... for ONNX RT Web, weights are shared between subgraphs: we create a constant for the weights in the same GraphBuilder and reuse it, so in this use case we can reuse the MLGraphBuilder
14:52:26 ... later, if we talk about MLBuffer, if we can make MLBuffer hold the weights and persist them in device memory, that'd be even better; two graphs could use the constants with MLBuffer without any duplication of data upload or memory copy
14:52:37 ... it also avoids duplicate copies in device memory
14:52:50 ... I think the two things, constant and MLGraphBuilder reuse, are related
14:53:06 q?
14:53:11 ack jsbell
14:53:22 From Reilly Grant: I'm satisfied that there are good use cases for constructing multiple MLGraphs from an MLGraphBuilder, but we need example code and implementation experience before we can decide specifically how it will work. In particular, I'm still concerned by the constraints it puts on implementations if build() can be called an arbitrary number of times, and so I'd like the group to consider a multibuild() variant which allows compiling multiple graphs simultaneously, which is likely to be more efficient for implementations while hopefully giving the same power to developers.
14:53:53 q?
14:54:12 q?
14:54:15 +1 to explore more
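A sketch of the reuse pattern described above, given an MLContext `context`: one builder, a weight constant uploaded once, and two graphs that share it. Whether build() may be called repeatedly like this is exactly what issue #567 is deciding; shapes and names are made up for illustration.

    const builder = new MLGraphBuilder(context);
    const weights = builder.constant(
      { dataType: 'float32', dimensions: [512, 512] },
      new Float32Array(512 * 512)); // shared weights, uploaded once

    const x1 = builder.input('x_no_past', { dataType: 'float32', dimensions: [1, 512] });
    const noPast = await builder.build({ out: builder.matmul(x1, weights) });

    const x2 = builder.input('x_with_past', { dataType: 'float32', dimensions: [1, 512] });
    const withPast = await builder.build({ out: builder.matmul(x2, weights) });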
14:54:47 Subtopic: [question] Consider adopting new broadcasting rules
14:54:52 anssik: issue #590
14:54:52 https://github.com/webmachinelearning/webnn/issues/590 -> Issue 590 Consider adopting new broadcasting rules (by a-sully) [question]
14:54:57 ... discussed on our 21 March telcon
14:55:02 ... to recap, Austin sees three options:
14:55:14 ... Option 1: Adopt NumPy's broadcasting rules
14:55:07 ... Option 2: Adopt XLA's broadcasting rules
14:55:18 ... Option 3: Keep the status quo
14:55:23 ... Dwayne offered one more option:
14:55:30 ... Option 4: Dwayne's proposal
14:55:37 ... - keep unidirectional broadcasting for the rare cases (expand and GEMM)
14:55:41 ... - research more backends to potentially add the restriction that inputs must have the same rank
14:56:07 q?
14:56:17 anssik: I'm hearing we let this issue sit
14:56:30 q+
14:56:30 Subtopic: [question] Is "validate graph resources" backwards?
14:56:35 anssik: issue #602 and PR #622 (thanks Josh!)
14:56:35 https://github.com/webmachinelearning/webnn/pull/622 -> Pull Request 622 Bug fix: Make "validate graph resources" test reflexive (by inexorabletash)
14:56:35 https://github.com/webmachinelearning/webnn/issues/602 -> Issue 602 Is "validate graph resources" backwards? (by inexorabletash) [question]
14:57:03 jsbell: put up the PR for the reflexive test
14:57:26 ... came up with a new behaviour after the telcon, reflected in the PR
14:57:40 ... validation of input and output is asymmetric, for better developer ergonomics
14:58:11 Subtopic: [question] Need clarify the usage of axes=[0,1] for resample2d
14:58:17 anssik: issue #624
14:58:17 https://github.com/webmachinelearning/webnn/issues/624 -> Issue 624 Need clarify the usage of axes=[0,1] for resample2d (by BruceDai) [question] [operator specific]
14:58:22 ... this question, informed by implementation experience, has details in the issue
14:58:58 Dwayne: this came up in code review; resample2d supports a number of different axes
14:59:33 ... does anyone know the history of this design?
14:59:38 q?
14:59:44 q-
15:00:02 q?
15:00:21 Subtopic: [feature-request] Gaussian error linear unit (GELU) activation
15:00:25 anssik: issue #626
15:00:26 https://github.com/webmachinelearning/webnn/issues/626 -> Issue 626 Need Gelu operation (by mingmingtasd) [feature request] [operator specific]
15:00:31 ... a proposed new op passes our initial tests:
15:00:36 ... sample models: Whisper base, Stable Diffusion U-Net, Segment Anything decoder
15:00:43 ... cross-framework support: ONNX, TF, PyTorch
15:00:50 ... cross-platform implementability: CoreML, DirectML, OpenVINO
15:00:55 ... this seems like a reasonable addition, any comments?
15:00:59 RRSAgent, draft minutes
15:01:00 I have made the request to generate https://www.w3.org/2024/04/04-webmachinelearning-minutes.html anssik
15:01:07 ningxin has joined #webmachinelearning
15:01:07 q?
15:01:18 q+
15:01:24 RafaelCintron has joined #webmachinelearning
15:01:24 Seems reasonable
15:01:24 ack ningxin
15:01:40 ningxin: Mingming mentioned this gives a perf gain if supported
15:01:59 ... in the experimental implementation on NPU we observed a perf benefit from a native Gelu over the emulation path
15:02:06 McCool has joined #webmachinelearning
15:02:21 ... with this activation we can fuse with matmul for even better performance
15:02:22 q?
15:02:33 q?
15:02:53 ... PR #628 is out already
15:02:56 https://github.com/webmachinelearning/webnn/pull/628 -> Pull Request 628 Define Gelu operation (by mingmingtasd)
15:03:23 q?
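For reference, gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))). A sketch of the proposed op next to the kind of emulation path mentioned above, built from ops already in the spec; gelu() itself is only proposed in PR #628, and `context` is an assumed MLContext.

    const builder = new MLGraphBuilder(context);
    const x = builder.input('x', { dataType: 'float32', dimensions: [1, 1024] });

    // Proposed (PR #628): const y = builder.gelu(x);

    // Emulation with existing ops, relying on broadcasting of scalar-like constants:
    const c = (v) => builder.constant(
      { dataType: 'float32', dimensions: [1] }, new Float32Array([v]));
    const yEmulated = builder.mul(
      builder.mul(x, c(0.5)),
      builder.add(c(1), builder.erf(builder.mul(x, c(Math.SQRT1_2)))));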
15:04:09 RRSAgent, draft minutes
15:04:10 I have made the request to generate https://www.w3.org/2024/04/04-webmachinelearning-minutes.html anssik
15:17:06 RRSAgent, draft minutes
15:17:07 I have made the request to generate https://www.w3.org/2024/04/04-webmachinelearning-minutes.html anssik
15:17:26 Regrets+ Dominique_Hazael-Massieux
15:17:57 RRSAgent, draft minutes
15:17:59 I have made the request to generate https://www.w3.org/2024/04/04-webmachinelearning-minutes.html anssik
15:28:25 RRSAgent, draft minutes
15:28:26 I have made the request to generate https://www.w3.org/2024/04/04-webmachinelearning-minutes.html anssik
17:25:23 Zakim has left #webmachinelearning