Meeting minutes
Repository: webmachinelearning/webnn
anssik: please join me in welcoming our latest WG participant!
… - Yunchao He representing ByteDance
<jsbell> zkis: If you can visit chrome://
anssik: I did a small update to our agenda an hour ago: removed issues #492 #625 that had been addressed (thanks!), removed #475 whose accompanying PR #650 received a draft label; I also added one quick naming-related issue #669
<gb> Pull Request 650 Swap parameters of scalar constant() operand method (by inexorabletash)
<gb> Issue 669 Rename `MLOperandDescriptor`'s `dimensions` to `shape` (by a-sully) [feature request]
<gb> Issue 475 Remove `builder.constant(value, type)` variant (by huningxin) [operator specific]
agenda diff
… with that let's kick off today's meeting with a new incubation that might be of interest to this group to warm up the engine
New incubation of possible interest: Web Translation API
anssik: I want to bring to this WG's attention an early proposal by the Chrome translate API team, Web Translation API, for performing real-time translations
Explainer for the Web Translation API
TAG review for the Web Translation API
<gb> Issue 948 Web Translation API (by domenic) [Progress: untriaged]
anssik: to be clear, this API is not in scope of the WG currently but with adequate support and incubation this API has potential to become a new deliverable of this WG when we charter next time in 2025
… please let me know if you have any feedback, now or later
anssik: this API intersects with our recent discussion on Hybrid AI, quoting the explainer: "Allow a variety of implementation strategies, including on-device vs. cloud-based translation, while keeping these details abstracted from developers."
… I believe a hybrid approach could also be feasible for this API
NPU support
anssik: issue #623
<gb> Issue 623 WebNN should support NPU and QDQ operations (by wchao1115) [v2] [opset] [feature request]
anssik: given this is a new feature, I want to slot this topic in again to ensure all feedback is captured
… thanks Dwayne, Phillis, Zoltan for sharing your proposals and feedback in the GH issue, as well as all the fine folks working on the prototype
… I want to gauge consensus on the MLDeviceType design specifically:
… Dwayne, with contributions from Phillis, has put together a pros/cons summary of the 4 options for extending MLDeviceType to document the WG's current thinking:
4 options for extending MLDeviceType
anssik: the 4 options proposed, with pros (+) and cons (-):
… 1) deviceType: "npu" (+++ -)
… 2) deviceType: "npu" fallbackDeviceType: "gpu" (+ --)
… 3) deviceTypes: ["npu", "gpu"] (+ --)
… 4) deviceType: "npu", excludesDeviceTypes: ["gpu"]
… is the WG ready to let the editors start a PR to formalize option 1 in the spec reflecting what is prototyped in Chromium?
<gb> Issue 623 WebNN should support NPU and QDQ operations (by wchao1115) [v2] [opset] [feature request]
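To make option 1 concrete, here is a hedged sketch. The `"npu"` value and the `deviceType` context option reflect the Chromium prototype under discussion, not the published spec; the browser call is shown only in a comment, with a plain-JS stand-in below it:

```javascript
// Hypothetical usage under option 1, where "npu" joins the existing
// MLDeviceType values ("cpu", "gpu"). In a browser this would look like:
//   const context = await navigator.ml.createContext({ deviceType: "npu" });
// Plain-JS stand-in for the extended enum, for illustration only:
const MLDeviceType = ["cpu", "gpu", "npu"];
console.log(MLDeviceType.includes("npu")); // true
```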
Dwayne: not a lot of comments; the easier approach would be option 1
… and if no additional ops are proposed this could become the de facto standard; it is aligned with the current prototype
anssik: implementation experience from running models completely on NPUs using the prototype is welcome to be shared at any time
Dwayne: working on a few models, MobileNet, SqueezeNet, ResNet, one transformer model
jsbell: +1 to Phillis' recommendation for option 1
… we may grow to option 3 in the future, but good to start with 1
<Mike3> start with 1, move to 3 as needed seems reasonable
<ningxin> +1 to start with option 1
anssik: editors can start formalizing the NPU design with option 1
MLBuffer
anssik: issue #542
<gb> Issue 542 [MLBuffer] Creation and representing MLBuffer on a XPU devices (by bbernhar) [webgpu interop]
anssik: over the last few weeks Austin has investigated supporting MLBuffer on CoreML and shared his suggestions (thanks!) in the issue that discusses creating and representing MLBuffer on XPU devices
… Austin's investigation spurred an active discussion, thanks everyone!
… I'll hand the mic to Austin to explain what he has found out and to summarize the current state of things to the broader group
asully: a good summary in the table in webmachinelearning/
<gb> Issue 542 [MLBuffer] Creation and representing MLBuffer on a XPU devices (by bbernhar) [webgpu interop]
asully: zero-copy on CoreML only with fp16, Mike to confirm
… also data type and shape must be known
<Mike3> yes that's right
asully: MLBuffer is a platform-specific tensor, and you need to know its shape and data type
… to pass MLBuffers as inputs and outputs, both the shape and data type must be known
… some backends have issues with zero-copy, so the argument is that MLBuffer is a tensor whose data type and shape should be known up front
asully: thanks Mike for confirming the CoreML support status
… TL;DR: if we don't know the data type and shape ahead of time, we cannot do zero-copy
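A hedged sketch of the descriptor-based creation being argued for. `createBuffer` and its descriptor shape are illustrative of the discussion, not the current spec; the plain-JS helper below only shows what such a descriptor pins down (enough information to size the allocation at creation time):

```javascript
// Hypothetical browser-side shape (not the current spec):
//   const buf = await context.createBuffer({ dataType: "float16",
//                                            dimensions: [1, 3, 224, 224] });
// Plain-JS stand-in: with dataType and dimensions known up front, the
// implementation can compute the exact byte length at creation time,
// which is what enables zero-copy on backends like CoreML.
function byteLength(desc) {
  const bytesPerElement = { float16: 2, float32: 4, int32: 4, int64: 8 };
  return desc.dimensions.reduce((a, b) => a * b, 1) * bytesPerElement[desc.dataType];
}
console.log(byteLength({ dataType: "float16", dimensions: [1, 3, 224, 224] })); // 301056
```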
asully: Bryan from Intel has a prototype for MLBuffer; once Bryan's change has landed we can add the Mac prototype on top
… on some platforms you can allocate an opaque buffer, but that is not the case on CoreML
<ningxin> +1
asully: I looked at ONNX Runtime and CoreML backend while doing this prototyping
ningxin: the ONNX Runtime IO Binding allocator interface only passes a size when allocating a device buffer
… if we don't change the interface, we could pass this information, shape and data type, with help from the framework side, so we probably want to pull in framework people
asully: if you want to implement these buffers on Mac need to make the same changes (to frameworks) as is proposed to WebNN
ningxin: did you file an ONNX RT issue?
asully: will do that
ningxin: another thing: working with Bryan we prototyped the dispatch path, optimizing the Whisper decoder's KV cache
… we see a good performance improvement, up to 50%, running on GPU
… through the prototype we found we may need another operator to update the KV cache from the previous iteration to the next one
… WebNN has a static-shape design, so the KV cache needs to be allocated as an MLBuffer
… with adequate size to accommodate that
… we need another tensor op to append from the previous iteration to the next; we probably want to propose a scatterND operator, the counterpart to gather; I can share this observation in another issue to inform the group
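The KV-cache pattern described above can be sketched in plain JS (illustrative only, not the WebNN API; function and variable names are hypothetical). The cache is preallocated at its maximum size, per WebNN's static-shape design, and each decoding step writes one slot, which is the update a scatterND-style operator would express on-graph:

```javascript
// Preallocated, fixed-size cache (stand-in for an MLBuffer sized for the
// maximum sequence length); unwritten slots hold a fill value.
function kvCacheAppend(cache, step, newEntry) {
  // Scatter-style in-place update: write this iteration's entry at `step`
  // without changing the cache's (static) shape.
  cache[step] = newEntry;
  return cache;
}

const maxSeqLen = 4;
const cache = new Array(maxSeqLen).fill(0);
kvCacheAppend(cache, 0, 11); // first decoding iteration
kvCacheAppend(cache, 1, 22); // second decoding iteration
console.log(cache); // [ 11, 22, 0, 0 ]
```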
Open issues and PRs
Debrief on PRs merged recently
anssik: JoshuaB has again submitted a bunch of PRs that fixed a number of issues. Thanks again!
… issue #465 fixed by PR #648
<gb> CLOSED Issue 465 `keepDimensions` of reduction ops (by lisa0314) [operator specific]
anssik: issue #645 fixed by PR #651
<gb> MERGED Pull Request 651 Remove softplus() steepness option (by a-sully)
anssik: issue (n/a) fixed by PR #655
anssik: issue #466 fixed by PR #649
<gb> CLOSED Issue 466 Softmax axis absent (by fdwr) [operator specific]
<gb> MERGED Pull Request 649 Add axis argument to softmax() (by inexorabletash)
anssik: issue #283 fixed by PR #646 (with follow-ups to be discussed later)
<gb> MERGED Pull Request 646 Specify the operand data type constraints of operations (by inexorabletash)
<gb> CLOSED Issue 283 Specify the operand data type constraints of operation (by huningxin) [question]
anssik: issue #492 fixed by PR #656
<gb> MERGED Pull Request 656 Remove sequential filling overload of constant() (by a-sully)
anssik: issue #660 fixed by PR #661
<gb> MERGED Pull Request 661 Bugfix: prelu should validate slope operand (by huningxin)
anssik: issue #614 fixed by PR #665
<gb> MERGED Pull Request 665 Add note about no-op graphs (by inexorabletash)
<gb> CLOSED Issue 614 Allow no-op graphs? (by inexorabletash) [question]
anssik: issue #625 fixed by PR #644
<gb> MERGED Pull Request 644 Validate the hidden size of GRU and LSTM operators (by inexorabletash)
anssik: issue (n/a) fixed by PR #659
jsbell: issue #283 and PR #646 added data type constraints, thanks to the reviewers!
<jsbell> #657
<gb> Pull Request 657 Make operand data type and rank validation table-driven (by inexorabletash)
jsbell: a follow-up for the table-driven design is PR #657
… if folks want to look at it, that would be much appreciated
… we definitely want feedback on how this works for implementers and people writing tests
Dwayne: many PRs are goodness, but some have the potential to break existing implementations; can we add a label to PRs that may break existing implementations?
jsbell: I can look into best practices on labeling PRs
asully: some projects mark changes as editorial when they do not make normative changes that impact implementations
[bug] ArgMax/Min selectLastIndex is not supported on CoreML
anssik: issue #652
<gb> Issue 652 ArgMax/Min `selectLastIndex` is not supported on CoreML (by philloooo) [bug] [operator specific]
anssik: a proposal from Phillis to consider removing selectLastIndex parameter due to CoreML compatibility
… Phillis asked if CoreML "argmax is guaranteed to return smaller index?"
… Mike responded "is not guaranteed and may change"
… Dwayne proposes some options in the issue I'd like to discuss with the group
… and hear if someone has other solutions in mind?
<Mike3> I'm speaking with CoreML on the first one
Dwayne: the current behavior is not guaranteed and may change, so we could ask Apple to codify one behavior
… another possibility, there's multiple Apple APIs, we're focused on CoreML, not sure of technical considerations for choosing it over BNNS and MPS for example
<Mike3> BNNS and MPS will not run on the NPU
Dwayne: even if undocumented, could also add more ops to address this
… we could also keep the flag and then some backends would diverge, increases testing complexity
… last possibility to remove the flag entirely
… PT and TF decide ties in different way, ONNX has a similar flag
Mike: only issue with BNNS and MPS is that they don't run on NPU
… working with engineers on what is possible and will get back to you on the issue
Phillis: option 1 or option 4 are the practical options
… removing is the simplest way to go
Dwayne: the main reason for the flag is that both TF and PT can be supported, because they have different defaults
… not sure of any models where this matters
<jsbell> op-specific: 654 653 652 629 487 481 444 396 392 383 324 180
<jsbell> broader: 668 658 623 590 489 456 391
jsbell: tangentially, there's a cluster of issues like this one related to interop; is there interest in introducing an [interop] label for these?
… op-specific: #654 #653 #652 #629 #487 #481 #444 #396 #392 #383 #324 #180
… broader: #668 #658 #623 #590 #489 #456 #391
<gb> Issue 652 ArgMax/Min `selectLastIndex` is not supported on CoreML (by philloooo) [bug] [operator specific]
<gb> Issue 654 Consider dropping the support of uint64/int64 data type for some operators (by lisa0314) [bug] [operator specific]
<gb> Issue 653 Consider changing output type of ArgMax/Argmin to int32, or allow passing output_type (by philloooo) [bug] [operator specific]
<gb> Issue 444 Zero dilations needs spec clarification (by fdwr) [operator specific]
<gb> Issue 392 `split` into sizes are not widely supported (by huningxin) [operator specific]
<gb> Issue 383 Need to restrict the value of alpha to be positive for elu operation (by lisa0314) [operator specific]
<gb> Issue 481 Should `scale` and `bias` be required inputs for `batchNormalization` op? (by huningxin) [operator specific]
<gb> Issue 629 argMax/Min only support scalar axis in TFLite runtime (by fujunwei) [operator specific]
<gb> Issue 396 Clarify the restriction for `minValue` and `maxValue` of `MLClampOptions` (by huningxin) [operator specific]
<gb> Issue 456 Define the maximum number of operand dimensions (maximum rank) (by huningxin)
<gb> Issue 180 Should dilated pooling be supported (by fujunwei) [operator specific]
<gb> Issue 391 Allow 0 size dimensions (by huningxin) [feature request]
<gb> Issue 487 Should `axes` be required parameter for layerNormalization build method? (by huningxin) [operator specific]
<gb> Issue 590 Consider adopting new broadcasting rules (by a-sully) [question]
<gb> Issue 324 Simplify the operand layout support of conv2d and pooling 2d operations (by huningxin) [feature request] [operator specific]
anssik: would this [interop] label be helpful for the group, thoughts?
<gb> Issue 489 Clarify the casting behavior from floating-point / signed integers <-> unsigned integers (by huningxin)
<gb> Issue 623 WebNN should support NPU and QDQ operations (by wchao1115) [v2] [opset] [feature request]
<gb> Issue 668 Do we need an `MLConstantOperand`? (by a-sully) [question]
<gb> Issue 658 Consider removing `MLActivation` parameters used for op fusion (by a-sully) [question]
[bug] Consider changing output type of ArgMax/Argmin to int32, or allow passing output_type
anssik: issue #653
… Phillis reports:
… Currently WebNN specifies ArgMax/Min returns int64.
… - CoreML supported return type is int32 or uint16.
… - tflite supports int32, int64.
… - dml supports int32, int64, uint32, uint64.
… Returning int64 can't be emulated on CoreML (with details in the issue)
… Phillis proposes two possible options:
… - a. Update argMin/Max to always output int32
… - b. On the spec level, allow passing a param of output_type. And allow probing the supported data types for this param.
… I see thumbs up from Ningxin and Austin, do you want to comment?
Dwayne: for some history, I believe we had argMax/Min return int32 before we tried some models and found it complicated things that we don't support int64
… I added int64 because it was so widely used
… if we pass an output type for argmax would be amenable
Phillis: if we pass that we could probe it
Dwayne: another issue on data type validation
Phillis: I think we run into these interop issues and need to probe, can you comment on why int64 was important for ONNX models so that I have that reference
Dwayne: all indices are int64 in tensors and passed as such
Dwayne: indices specifically, if you output argmax tensor to the next one the next op needs them
Phillis: gather needs this, right?
Dwayne: will do research after the meeting and provide information in GH issue
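A hedged sketch of option (b) from the issue. The option name `outputDataType` and the builder call are illustrative, not the current spec; the plain-JS check below notes why int32 indices usually suffice:

```javascript
// Hypothetical shape of option (b), an explicit output type for argMin/Max:
//   builder.argMax(x, { axes: [1], outputDataType: "int32" });
// int32 indices only overflow when a reduced dimension exceeds 2**31 - 1
// elements; example dimension sizes are all far below that:
const INT32_MAX = 2 ** 31 - 1;
const exampleDims = [1000, 224 * 224, 4096];
console.log(exampleDims.every(d => d <= INT32_MAX)); // true
```

The remaining interop question (per Dwayne's point above) is whether downstream ops like gather that consume the indices require int64 in existing model formats.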
[bug] Consider dropping the support of uint64/int64 data type for some operators
anssik: issue #654
<gb> Issue 654 Consider dropping the support of uint64/int64 data type for some operators (by lisa0314) [bug] [operator specific]
anssik: informed by implementation experience, certain op + data type combinations can't be supported reliably on certain Windows + DirectML version combinations
… Lisa provided a list (thanks!) of such ops in the issue
… Austin notes CoreML does not natively support 64-bit types
… suggests an emulation path is required on platforms that don't support these data types
… Joshua asked (an implementation-related) question on how to bundle DML with the browser distribution
jsbell: a possible solution: ask the context if the data type is supported?
… we could entertain that idea; it is OK for WebNN running on different machines to support different ops
… one path to explore
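jsbell's idea can be sketched as follows. The support list and the picker function are hypothetical; no such WebNN probing API existed at the time of these minutes (see also issue #463 below). The point is only that a caller, given a per-context support answer, can choose a native type or fall back:

```javascript
// Hypothetical: `supported` stands in for whatever a future
// "ask the context" API would return for this machine/backend.
function pickIndexDataType(supported) {
  // Prefer int64 when the backend supports it natively;
  // otherwise fall back to int32 (emulating or downcasting as needed).
  return supported.includes("int64") ? "int64" : "int32";
}

console.log(pickIndexDataType(["int32"]));          // "int32" (e.g. CoreML-like)
console.log(pickIndexDataType(["int32", "int64"])); // "int64" (e.g. DML-like)
```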
asully: looking at CoreML, the Neural Engine prefers fp16
… if this can be expressed somehow developers might use fp16 more for better performance
Dwayne: is there support for preference vs supports only?
Phillis: only supports
asully: CoreML will fallback to CPU internally, another layer of fallback is what UA decides to do
… the stance has been to push things lower, but the question is if you want to support int64 on CoreML you'd need to do that in userspace
Mike: echoing what Phillis said, running on ANE is fp16, can fallback to GPU or CPU
<jsbell> I guess #463 was the generic issue re: checking for support
<gb> Issue 463 Allow checking whether operators/types are supported for a backend before creating a graph (by huningxin) [feature request]
<asully> webmachinelearning/
<gb> Issue 668 Do we need an `MLConstantOperand`? (by a-sully) [question]
[question] Do we need an MLConstantOperand?
anssik: issue #668
<gb> Issue 668 Do we need an `MLConstantOperand`? (by a-sully) [question]
asully: TL;DR: CoreML requires some tensors to be constant; a concept of a constant tensor is not something other frameworks such as TFLite have
… the other frameworks do not have this concept, some parameters passed as tensors seems solely because they're big (think of them as weights), LSTM weights
… they're used as constant and passed to this as a parameter
… on CoreML these are required to be known at compile time
… to support these ops on CoreML these tensors must be created as constant
… in WebNN currently there's no distinction to tell whether something is a constant or an opaque operand
… asking whether constant() could give back a distinct operand type
… some parameters make sense to pass as constants only; if they're not constants you can't plumb them directly to CoreML
… the question is: given there's no constant tensor concept in other frameworks, does it make sense to pass anything other than a constant, or should we introduce a constant operand instead?
Dwayne: looking at this issue and the fields in there, all are constant in all the models I've seen, I believe?
… I could look at my ~700 models and see if these fields are ever linked to non-constant tensors
asully: hypothesis is these are not always marked as such, because frameworks don't have such a concept, if framework has no way to express this, upstreaming this concept to WebNN would make sense
Dwayne: if the output is from another op, then what do we do?
asully: we need to discuss
… easier to relax restrictions and add them later on
… preference would be to try this and if something breaks we discuss that
… otherwise CoreML support requires a constant tensor to be passed
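The MLConstantOperand idea can be sketched in plain JS (all names here are illustrative, not the WebNN API): operands created via constant() carry a marker, so a CoreML-style backend can require compile-time constants for particular parameters such as LSTM weights.

```javascript
// Hypothetical operand constructors: constant() marks its result,
// input() does not. These model the distinction issue #668 asks for.
const constant = (data) => ({ data, isConstant: true });
const input = (name) => ({ name, isConstant: false });

// A CoreML-style backend check: certain parameters must be constant
// so their values are known at graph compile time.
function requireConstant(operand) {
  if (!operand.isConstant) {
    throw new TypeError("this parameter must be a constant operand");
  }
  return operand;
}

console.log(requireConstant(constant([0.1, 0.2])).isConstant); // true
// requireConstant(input("weights")) would throw at build time,
// rather than failing later inside the CoreML backend.
```

This matches the relax-later preference voiced above: start with the restriction that such parameters be constants, and loosen it if real models break.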