I did a small update to our agenda an hour ago: removed issues #492 #625 that had been addressed (thanks!), removed #475 whose accompanying PR #650 received a draft label; I also added one quick naming-related issue #669
-> agenda diff https://github.com/webmachinelearning/meetings/commit/771dc249fcc03a2c33c7f33e4b1d14b1871545a7
Present+ Austin_Sullivan
Topic: New incubation of possible interest: Web Translation API
Joshua_Lochner has joined #webmachinelearning
anssik: I want to bring to this WG's attention an early proposal by the Chrome translate API team, Web Translation API, for performing real-time translations
Present+ Joshua_Lochner
-> Explainer for the Web Translation API https://github.com/explainers-by-googlers/translation-api
-> TAG review for the Web Translation API https://github.com/w3ctag/design-reviews/issues/948
anssik: to be clear, this API is not in scope of the WG currently but with adequate support and incubation this API has potential to become a new deliverable of this WG when we charter next time in 2025
... please let me know if you have any feedback, now or later
anssik: this API intersects with our recent discussion on Hybrid AI, quoting the explainer: "Allow a variety of implementation strategies, including on-device vs. cloud-based translation, while keeping these details abstracted from developers." I believe a hybrid approach could also be feasible for this API
Present+ Phillis_Tang
Topic: NPU support
anssik: issue #623
anssik: given this is a new feature, I want to slot this topic in again to ensure all feedback is captured
... thanks Dwayne, Phillis, Zoltan for sharing your proposals and feedback in the GH issue as well as all the fine folks working on the prototype
... I want to gauge consensus on the MLDeviceType design specifically: Dwayne, with contributions from Phillis, has put together a pros/cons summary of the 4 options for extending MLDeviceType to document the WG's current thinking:
-> 4 options for extending MLDeviceType https://github.com/webmachinelearning/webnn/issues/623#issuecomment-2063954107
anssik: the 4 options proposed with (+pros -cons):
... 1) deviceType: "npu" (+++ -)
... 2) deviceType: "npu" fallbackDeviceType: "gpu" (+ --)
... 3) deviceTypes: ["npu", "gpu"] (+ --)
... 4) deviceType: "npu", excludesDeviceTypes: ["gpu"]
... is the WG ready to let the editors start a PR to formalize option 1 in the spec reflecting what is prototyped in Chromium?
Dwayne: not a lot of comments, the easier approach would be option 1
... and if no additional ops are proposed this could become de facto standard, it is aligned with the prototype currently
... any further implementation experience to be shared from running models on NPUs completely using the prototype is welcome at any time
Dwayne: working on a few models, MobileNet, SqueezeNet, ResNet, one transformer model
jsbell: +1 to Phillis' recommendation for option 1
... we may grow to option 3 in the future, but good to start with 1
+1 to start with option 1
Present+ Ningxin_Hu
anssik: editors can start formalizing the NPU design starting with option 1
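A minimal TypeScript sketch of how the option 1 and option 3 shapes from the list above relate. The interface names and the `toDeviceList` helper are hypothetical and only for illustration; treating option 3 as an ordered preference list is an assumption, not something the group decided.

```typescript
// Hypothetical shapes for two of the options discussed in issue #623.
type DeviceType = "cpu" | "gpu" | "npu";

// Option 1: a single device type.
interface SingleDeviceOptions {
  deviceType: DeviceType;
}

// Option 3: a list of device types (assumed here to be in preference order).
interface DeviceListOptions {
  deviceTypes: DeviceType[];
}

// Illustrative helper, not part of WebNN: express either shape as a list.
function toDeviceList(options: SingleDeviceOptions | DeviceListOptions): DeviceType[] {
  return "deviceTypes" in options ? [...options.deviceTypes] : [options.deviceType];
}

const fromOption1 = toDeviceList({ deviceType: "npu" });
const fromOption3 = toDeviceList({ deviceTypes: ["npu", "gpu"] });
console.log(fromOption1, fromOption3);
```

Under these assumptions, an option 1 request is a one-element case of option 3, which is one way to read the comment that the design could grow to option 3 later.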
Topic: MLBuffer
anssik: issue #542
... over the last few weeks Austin has investigated supporting MLBuffer on CoreML and shared his suggestions in the issue (thanks!) that discusses creating and representing MLBuffer on XPU devices
... Austin's investigation spurred an active discussion, thanks everyone!
... I'll hand the mic to Austin to explain what he has found out and to summarize the current state of things to the broader group
asully: a good summary in the table in https://github.com/webmachinelearning/webnn/issues/542#issuecomment-2067555410
asully: zero-copy on CoreML only with fp16, Mike to confirm
... also data type and shape must be known; MLBuffer is a platform-specific tensor and you need to know the shape and data type
... to pass MLBuffers as input and output, both must have shape and data type
... some backends have issues with zero-copy, so the argument is that MLBuffer is a tensor and its data type and shape should be known up front
asully: thanks Mike for confirming CoreML support status; TL;DR: if we don't know data type and shape ahead of time, we cannot do zero-copy
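To illustrate why a byte size alone is not enough to describe a tensor, here is a small TypeScript sketch. The `BufferDescriptor` type and `byteLength` helper are hypothetical names for illustration and are not the MLBuffer API; the element sizes are the usual ones for these data types.

```typescript
// Hypothetical descriptor: what a backend would need to know up front.
type BufferDataType = "float16" | "float32" | "int32" | "int64";

const BYTES_PER_ELEMENT: Record<BufferDataType, number> = {
  float16: 2,
  float32: 4,
  int32: 4,
  int64: 8,
};

interface BufferDescriptor {
  dataType: BufferDataType;
  shape: number[];
}

// Total size in bytes: product of the dimensions times the element size.
function byteLength(descriptor: BufferDescriptor): number {
  const elementCount = descriptor.shape.reduce((count, dim) => count * dim, 1);
  return elementCount * BYTES_PER_ELEMENT[descriptor.dataType];
}

// Two different tensors that happen to need the same number of bytes.
const halfPrecision = byteLength({ dataType: "float16", shape: [1, 3, 224, 224] });
const singlePrecision = byteLength({ dataType: "float32", shape: [1, 3, 112, 224] });
console.log(halfPrecision, singlePrecision);
```

Both descriptors need the same number of bytes, so an allocator that only receives a size cannot tell them apart.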
Present+ Deepti_Gandluri
asully: Bryan from Intel has a prototype for MLBuffer, once Bryan's change is landed we can add the Mac prototype on top
... on some platforms you can allocate an opaque buffer but that is not the case on CoreML
asully: I looked at ONNX Runtime and CoreML backend while doing this prototyping
ningxin: ONNX Runtime IO Binding allocator interface only passes size when allocating a device buffer
... if we don't change the interface we can allocate this information, pass shape and data type with help from the framework side, so probably want to pull in framework people
asully: if you want to implement these buffers on Mac need to make the same changes (to frameworks) as is proposed to WebNN
ningxin: did you file an ONNX RT issue?
asully: will do that
ningxin: another thing, working with Bryan we prototype dispatch pass, optimize Whisper decoder, KV cache
... we see good performance improvement of up to 50% running on GPU
... through the prototype we found we may need another operator to update the KV cache from the previous iteration to the next one; WebNN is a static-shaped design, so the KV cache needs to be allocated as an MLBuffer
... with adequate size to accommodate that
... need another tensor to append from the previous to the next, probably want to propose a scatterND operator, counterpart to gather; can share this observation in another issue to inform the group
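A simplified TypeScript sketch of the idea of writing a new entry into a statically sized cache with a scatter-style update. `scatterRows` is a hypothetical helper that only scatters whole rows along the first axis; it is not the proposed operator, and the cache layout is made up for illustration.

```typescript
// Copy `cache`, then overwrite the rows named in `indices` with the rows in `updates`.
function scatterRows(cache: number[][], indices: number[], updates: number[][]): number[][] {
  if (indices.length !== updates.length) {
    throw new Error("indices and updates must have the same length");
  }
  const result = cache.map((row) => [...row]);
  updates.forEach((update, i) => {
    const rowIndex = indices[i];
    if (rowIndex === undefined || rowIndex < 0 || rowIndex >= result.length) {
      throw new Error(`row index out of range at position ${i}`);
    }
    result[rowIndex] = [...update];
  });
  return result;
}

// Cache preallocated for 4 steps with 2 features each; steps 0 and 1 are already filled.
const kvCache = [[1, 1], [2, 2], [0, 0], [0, 0]];
// The next decoder iteration writes step 2 without changing the cache shape.
const nextCache = scatterRows(kvCache, [2], [[3, 3]]);
console.log(nextCache);
```

The shape of the cache stays fixed across iterations; only the position being written changes.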
Topic: Open issues and PRs
Subtopic: Debrief on PRs merged recently
anssik: JoshuaB has again submitted a bunch of PRs that fixed a number of issues.
... issue #465 fixed by PR #648
... issue #645 fixed by PR #651
... issue (n/a) fixed by PR #655
... issue #466 fixed by PR #649
... issue #283 fixed by PR #646 (with follow-ups to be discussed later)
... issue #492 fixed by PR #656
... issue #660 fixed by PR #661
... issue #614 fixed by PR #665
... issue #625 fixed by PR #644
... issue (n/a) fixed by PR #659
jsbell: issue #283 and PR #646 added data type constraints, thanks to reviewers!
... a follow-up for table-driven design PR #657
... if folks want to look at it much appreciated
... definitely want feedback on how this works for implementers and people writing tests
Dwayne: many PRs are goodness, some have potential to break existing implementation, can we add a label to PRs that may break existing implementations?
jsbell: I can look into best practices on labeling PRs
asully: some mark things as editorial that do not make normative changes that impact implementations
Subtopic: [bug] ArgMax/Min selectLastIndex is not supported on CoreML
anssik: issue #652
... a proposal from Phillis to consider removing selectLastIndex parameter due to CoreML compatibility
... Phillis asked if CoreML "argmax is guaranteed to return smaller index?"
... Mike responded "is not guaranteed and may change"
... Dwayne proposes some options in the issue I'd like to discuss with the group
... and hear if someone has other solutions in mind?
I'm speaking with CoreML on the first one
Dwayne: currently not guaranteed this won't change, so can ask Apple to codify one behavior
... another possibility, there's multiple Apple APIs, we're focused on CoreML, not sure of technical considerations for choosing it over BNNS and MPS for example
BNNS and MPS will not run on the NPU
... even if undocumented, could also add more ops to address this
... we could also keep the flag and then some backends would diverge, increases testing complexity
... last possibility to remove the flag entirely
... I see thumbs up from Ningxin and Austin, do you want to comment?
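For readers unfamiliar with the flag, a minimal TypeScript sketch of the tie-breaking behavior under discussion. This is a plain reference function over a 1-D array written for illustration, not the WebNN operator or any backend's implementation.

```typescript
// Index of the maximum value; on ties, returns the first index unless selectLastIndex is set.
function argMax(values: number[], selectLastIndex = false): number {
  if (values.length === 0) {
    throw new Error("argMax of an empty array is undefined");
  }
  let bestIndex = -1;
  let bestValue = -Infinity;
  values.forEach((value, i) => {
    const isTie = value === bestValue;
    if (bestIndex === -1 || value > bestValue || (selectLastIndex && isTie)) {
      bestIndex = i;
      bestValue = value;
    }
  });
  return bestIndex;
}

// The maximum value 3 appears at indices 1 and 2.
const firstIndex = argMax([1, 3, 3, 2]);
const lastIndex = argMax([1, 3, 3, 2], true);
console.log(firstIndex, lastIndex);
```

The two calls differ only when the maximum is repeated, which is the case where a backend without a documented tie-breaking rule can diverge.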
Dwayne: for some history, I believe we thought argmax/min to return int32 before we tried some models, and found it complicated things we don't support int64
... I added it because it was so widely used
... if we pass an output type for argmax would be amenable
Phillis: if we pass that we could probe it
Dwayne: another issue on data type validation
Phillis: I think we run into these interop issues and need to probe, can you comment on why int64 was important for ONNX models so that I have that reference
Dwayne: all indices are int64 in tensors and passed as such
Dwayne: indices specifically, if you output argmax tensor to the next one the next op needs them
Phillis: gather needs this, right?
Dwayne: will do research after the meeting and provide information in GH issue
Subtopic: [bug] Consider dropping the support of uint64/int64 data type for some operators
anssik: issue #654
... informed by implementation experience, certain op + data type combinations can't be supported reliably on certain Windows + DirectML version combinations
... Lisa provided a list (thanks!) of such ops in the issue
... Austin notes CoreML does not natively support 64-bit types
... suggests an emulation path is required on platforms that don't support these data types
... Joshua asked (an implementation-related) question on how to bundle DML with the browser distribution
jsbell: possible solution, ask the context if the data type is supported?
... could entertain that idea, it is OK if WebNN running on different machines supports different ops
... one path to explore
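A hedged TypeScript sketch of what "ask the context if the data type is supported" could look like from a developer's point of view. Everything here is hypothetical: `ContextCapabilities`, `supportsDataType` and `chooseIndexType` are invented for illustration, and the capability sets are example values rather than statements about any backend.

```typescript
type OperandDataType = "float16" | "float32" | "int32" | "int64";

// Hypothetical capability report for a context; not an existing WebNN interface.
interface ContextCapabilities {
  supportedDataTypes: ReadonlySet<OperandDataType>;
}

function supportsDataType(caps: ContextCapabilities, dataType: OperandDataType): boolean {
  return caps.supportedDataTypes.has(dataType);
}

// Prefer int64 indices when available, otherwise fall back to int32.
function chooseIndexType(caps: ContextCapabilities): OperandDataType {
  if (supportsDataType(caps, "int64")) return "int64";
  if (supportsDataType(caps, "int32")) return "int32";
  throw new Error("no supported integer type for indices");
}

// Example capability sets, made up for this sketch.
const with64Bit: ContextCapabilities = {
  supportedDataTypes: new Set<OperandDataType>(["float16", "float32", "int32", "int64"]),
};
const without64Bit: ContextCapabilities = {
  supportedDataTypes: new Set<OperandDataType>(["float16", "float32", "int32"]),
};

const wideChoice = chooseIndexType(with64Bit);
const narrowChoice = chooseIndexType(without64Bit);
console.log(wideChoice, narrowChoice);
```

With a query like this, the decision to narrow 64-bit indices would sit with the framework or application rather than being hidden inside the backend.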
asully: looking at CoreML, the Neural Engine prefers fp16
... if this can be expressed somehow developers might use fp16 more for better performance
Dwayne: is there support for preference vs supports only?
Phillis: only supports
asully: CoreML will fallback to CPU internally, another layer of fallback is what UA decides to do
... the stance has been to push things lower, but the question is if you want to support int64 on CoreML you'd need to do that in userspace
Mike: echoing what Phillis said, running on ANE is fp16, can fallback to GPU or CPU
I guess #463 was the generic issue re: checking for support
-> Issue 463 Allow checking whether operators/types are supported for a backend before creating a graph https://github.com/webmachinelearning/webnn/issues/463
Subtopic: [question] Do we need an MLConstantOperand?
anssik: issue #668
-> Issue 668 Do we need an `MLConstantOperand`? https://github.com/webmachinelearning/webnn/issues/668
asully: TL;DR: CoreML requires some tensors to be constant, a concept of a constant tensor is not something other frameworks such as TFLite have
... the other frameworks do not have this concept, some parameters are passed as tensors seemingly solely because they're big (think of them as weights), LSTM weights
... they're used as constant and passed to this as a parameter
... on CoreML these are required to be known at compile time
... to support these ops on CoreML these tensors must be created as constant
... in WebNN currently there's no distinction to know if this is a constant or opaque operand
... asking if you can create constant, can it give back a parameter
... some make sense to pass as constants only, if not constants you can't plumb direct to CoreML
... the question is, no constant tensor concept in other frameworks, does it make sense to pass anything other than constant, introduce constant operand instead?
Dwayne: looking at this issue and fields in there, all are constant in all the models I've seen I believe?
... I could look at my ~700 models and see if these fields are linked to non-constant tensors?
asully: hypothesis is these are not always marked as such, because frameworks don't have such a concept, if framework has no way to express this, upstreaming this concept to WebNN would make sense
Dwayne: if output is from another op then what do we do?
asully: we need to discuss
... easier to relax restrictions and add them later on
... preference would be to try this and if something breaks we discuss that
... otherwise CoreML support requires a constant tensor to be passed
RRSAgent, draft minutes
I have made the request to generate https://www.w3.org/2024/05/02-webmachinelearning-minutes.html anssik