W3C

– DRAFT –
WebML WG Teleconference – 2 May 2024

02 May 2024

Attendees

Present
Anssi_Kostiainen, Austin_Sullivan, Deepti_Gandluri, Dwayne_Robinson, Ilya_Rezvov, Joshua_Bell, Joshua_Lochner, Michael_McCool, Mike_Wyrzykowski, Ningxin_Hu, Phillis_Tang, Zoltan_Kis
Regrets
Ningxin_Hu, Rafael_Cintron
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

anssik: please join me in welcoming our latest WG participant!
… - Yunchao He representing ByteDance

<jsbell> zkis: If you can visit chrome://crashes/ and email me any crash IDs I'll follow up

anssik: I did a small update to our agenda an hour ago: removed issues #492 #625 that had been addressed (thanks!), removed #475 whose accompanying PR #650 received a draft label; I also added one quick naming-related issue #669

<gb> Pull Request 650 Swap parameters of scalar constant() operand method (by inexorabletash)

<gb> Issue 669 Rename `MLOperandDescriptor`'s `dimensions` to `shape` (by a-sully) [feature request]

<gb> CLOSED Issue 625 Need clarify the maximum limit of the hidden size of the Gru operator (by miaobin) [operator specific]

<gb> Issue 475 Remove `builder.constant(value, type)` variant (by huningxin) [operator specific]

<gb> CLOSED Issue 492 Constant sequential filling operation needs output shape parameter (by fdwr) [feature request] [operator specific]

agenda diff
… with that let's kick off today's meeting with a new incubation that might be of interest to this group to warm up the engine

New incubation of possible interest: Web Translation API

anssik: I want to bring to this WG's attention an early proposal by the Chrome translate API team, Web Translation API, for performing real-time translations

Explainer for the Web Translation API

TAG review for the Web Translation API

<gb> Issue 948 Web Translation API (by domenic) [Progress: untriaged]

anssik: to be clear, this API is not currently in scope for the WG, but with adequate support and incubation it has the potential to become a new deliverable of this WG when we charter next time in 2025
… please let me know if you have any feedback, now or later

anssik: this API intersects with our recent discussion on Hybrid AI, quoting the explainer: "Allow a variety of implementation strategies, including on-device vs. cloud-based translation, while keeping these details abstracted from developers."
… I believe a hybrid approach could also be feasible for this API

NPU support

anssik: issue #623

<gb> Issue 623 WebNN should support NPU and QDQ operations (by wchao1115) [v2] [opset] [feature request]

anssik: given this is a new feature, I want to slot this topic in again to ensure all feedback is captured
… thanks to Dwayne, Phillis, and Zoltan for sharing your proposals and feedback in the GH issue, as well as to all the fine folks working on the prototype
… I want to gauge consensus on the MLDeviceType design specifically:
… Dwayne, with contributions from Phillis, has put together a pros/cons summary of the 4 options for extending MLDeviceType to document the WG's current thinking:

4 options for extending MLDeviceType

anssik: the 4 options proposed, with pros (+) and cons (-):
… 1) deviceType: "npu" (+++ -)
… 2) deviceType: "npu" fallbackDeviceType: "gpu" (+ --)
… 3) deviceTypes: ["npu", "gpu"] (+ --)
… 4) deviceType: "npu", excludesDeviceTypes: ["gpu"]
… is the WG ready to let the editors start a PR to formalize option 1 in the spec reflecting what is prototyped in Chromium?
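
For illustration, a minimal sketch of what option 1 could look like from script, assuming "npu" is simply added as an MLDeviceType value as prototyped in Chromium; any fallback behavior would be an implementation detail:

  // Option 1 sketch: request an NPU-backed context the same way "cpu" and
  // "gpu" are requested today; the UA decides what happens if no NPU exists.
  const context = await navigator.ml.createContext({ deviceType: 'npu' });
  const builder = new MLGraphBuilder(context);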

<gb> Issue 623 WebNN should support NPU and QDQ operations (by wchao1115) [v2] [opset] [feature request]

Dwayne: not a lot of comments, the easier approach would be option 1
… and if no additional ops are proposed this could become the de facto standard; it is aligned with the current prototype

anssik: implementation experience from running models entirely on NPUs using the prototype is welcome at any time

Dwayne: working on a few models, MobileNet, SqueezeNet, ResNet, one transformer model

jsbell: +1 to Phillis' recommendation for option 1
… we may grow to option 3 in the future, but good to start with 1

<Mike3> start with 1, move to 3 as needed seems reasonable

<ningxin> +1 to start with option 1

anssik: the editors can start formalizing the NPU design with option 1

MLBuffer

anssik: issue #542

<gb> Issue 542 [MLBuffer] Creation and representing MLBuffer on a XPU devices (by bbernhar) [webgpu interop]

anssik: over the last few weeks Austin has investigated supporting MLBuffer on CoreML and shared his suggestions in the issue (thanks!) that discusses creating and representing MLBuffer on XPU devices
… Austin's investigation spurred an active discussion, thanks everyone!
… I'll hand the mic to Austin to explain what he has found out and to summarize the current state of things to the broader group

asully: a good summary in the table in webmachinelearning/webnn#542 (comment)

<gb> Issue 542 [MLBuffer] Creation and representing MLBuffer on a XPU devices (by bbernhar) [webgpu interop]

asully: zero-copy on CoreML is possible only with fp16, Mike to confirm
… also, the data type and shape must be known

<Mike3> yes that's right

asully: MLBuffer is a platform-specific tensor, so you need to know its shape and data type
… to pass MLBuffers as inputs and outputs, both must have a shape and data type
… some backends have issues with zero-copy otherwise, so the argument is that MLBuffer is a tensor whose data type and shape should be known up front

asully: thanks Mike for confirming CoreML support status
… TL;DR: if we don't know the data type and shape ahead of time, we cannot do zero-copy
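
A rough sketch of the direction discussed, assuming a descriptor-style buffer creation; the method name and descriptor fields are illustrative, not agreed API:

  // Hypothetical: create an MLBuffer with its data type and shape known up
  // front, which is what zero-copy on the CoreML backend would require.
  const outputBuffer = context.createBuffer({
    dataType: 'float16',
    dimensions: [1, 1000]
  });
  // The buffer could then be bound directly as a graph output at dispatch time.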

asully: Bryan from Intel has a prototype for MLBuffer; once Bryan's change has landed we can add the Mac prototype on top
… on some platforms you can allocate an opaque buffer but that is not the case on CoreML

<ningxin> +1

asully: I looked at ONNX Runtime and CoreML backend while doing this prototyping

ningxin: the ONNX Runtime IO Binding allocator interface only passes the size when allocating a device buffer
… if we don't change the interface, shape and data type could still be passed with help from the framework side, so we probably want to pull in framework people

asully: if you want to implement these buffers on Mac, you need to make the same changes (to frameworks) as are proposed for WebNN

ningxin: did you file an ONNX RT issue?

asully: will do that

ningxin: another thing, working with Bryan we prototyped the dispatch path to optimize the Whisper decoder's KV cache
… we see a good performance improvement, up to 50%, running on GPU
… through the prototype we found we may need another operator to update the KV cache from the previous iteration to the next one
… WebNN has a static-shape design, so the KV cache needs to be allocated as an MLBuffer
… with adequate size to accommodate it
… we need to append another tensor from the previous iteration to the next, so we probably want to propose a scatterND operator, a counterpart to gather; I can share this observation in another issue to inform the group
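
A purely conceptual sketch of that idea; scatterND is not in the spec, and every name below is a placeholder for the operation described:

  // Hypothetical: write this iteration's key/value into a statically shaped,
  // pre-allocated KV cache instead of re-allocating it each step.
  // kvCache: [maxSeqLen, numHeads, headDim]; newKV: [1, numHeads, headDim].
  const indices = builder.constant(
    { dataType: 'int32', dimensions: [1, 1] },
    new Int32Array([currentStep]));
  const updatedCache = builder.scatterND(kvCache, indices, newKV);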

Open issues and PRs

Debrief on PRs merged recently

anssik: JoshuaB has again submitted a bunch of PRs that fixed a number of issues. Thanks again!
… issue #465 fixed by PR #648

<gb> MERGED Pull Request 648 Editorial: Add note about keepDimensions of reduction ops (by inexorabletash)

<gb> CLOSED Issue 465 `keepDimensions` of reduction ops (by lisa0314) [operator specific]

anssik: issue #645 fixed by PR #651

<gb> MERGED Pull Request 651 Remove softplus() steepness option (by a-sully)

<gb> CLOSED Issue 645 Consider removing `steepness` parameter of softplus (by shiyi9801) [operator specific]

anssik: issue (n/a) fixed by PR #655

<gb> MERGED Pull Request 655 gru/lstm: Fix missing/incorrect variables and options use (by inexorabletash)

anssik: issue #466 fixed by PR #649

<gb> CLOSED Issue 466 Softmax axis absent (by fdwr) [operator specific]

<gb> MERGED Pull Request 649 Add axis argument to softmax() (by inexorabletash)

anssik: issue #283 fixed by PR #646 (with follow-ups to be discussed later)

<gb> MERGED Pull Request 646 Specify the operand data type constraints of operations (by inexorabletash)

<gb> CLOSED Issue 283 Specify the operand data type constraints of operation (by huningxin) [question]

anssik: issue #492 fixed by PR #656

<gb> MERGED Pull Request 656 Remove sequential filling overload of constant() (by a-sully)

<gb> CLOSED Issue 492 Constant sequential filling operation needs output shape parameter (by fdwr) [feature request] [operator specific]

anssik: issue #660 fixed by PR #661

<gb> MERGED Pull Request 661 Bugfix: prelu should validate slope operand (by huningxin)

<gb> CLOSED Issue 660 prelu steps should validate the data type of slope is equal to the data type of input (by huningxin) [bug]

anssik: issue #614 fixed by PR #665

<gb> MERGED Pull Request 665 Add note about no-op graphs (by inexorabletash)

<gb> CLOSED Issue 614 Allow no-op graphs? (by inexorabletash) [question]

anssik: issue #625 fixed by PR #644

<gb> MERGED Pull Request 644 Validate the hidden size of GRU and LSTM operators (by inexorabletash)

<gb> CLOSED Issue 625 Need clarify the maximum limit of the hidden size of the Gru operator (by miaobin) [operator specific]

anssik: issue (n/a) fixed by PR #659

<gb> MERGED Pull Request 659 Simplify, correct, and add validation for GRU/LSTM and friends (by inexorabletash)

jsbell: issue #283 and PR #646 added data type constraints, thanks to the reviewers!

<jsbell> #657

<gb> Pull Request 657 Make operand data type and rank validation table-driven (by inexorabletash)

jsbell: a follow-up for table-driven design PR #657
… if folks want to look at it, that would be much appreciated
… we definitely want feedback on how this works for implementers and people writing tests

Dwayne: many PRs bring goodness, but some have the potential to break existing implementations; can we add a label to PRs that may break existing implementations?

jsbell: I can look into best practices on labeling PRs

asully: we mark things as editorial when they do not make normative changes that impact implementations

[bug] ArgMax/Min selectLastIndex is not supported on CoreML

anssik: issue #652

<gb> Issue 652 ArgMax/Min `selectLastIndex` is not supported on CoreML (by philloooo) [bug] [operator specific]

anssik: a proposal from Phillis to consider removing the selectLastIndex parameter due to CoreML compatibility
… Phillis asked if CoreML argMax "is guaranteed to return the smaller index?"
… Mike responded that it "is not guaranteed and may change"
… Dwayne proposes some options in the issue I'd like to discuss with the group
… and hear if someone has other solutions in mind

<Mike3> I'm speaking with CoreML on the first one

Dwayne: currently it's not guaranteed this won't change, so we could ask Apple to codify one behavior
… another possibility: there are multiple Apple APIs and we're focused on CoreML, not sure of the technical considerations for choosing it over BNNS and MPS, for example

<Mike3> BNNS and MPS will not run on the NPU

Dwayne: even if undocumented, we could also add more ops to address this
… we could also keep the flag and let some backends diverge, which increases testing complexity
… the last possibility is to remove the flag entirely
… PyTorch and TensorFlow decide ties in different ways; ONNX has a similar flag

Mike: the only issue with BNNS and MPS is that they don't run on the NPU
… working with engineers on what is possible and will get back to you on the issue

Phillis: option 1 or option 4 are the practical options
… removing is the simplest way to go

Dwayne: the main reason for the flag is so that TF and PT can both be supported, because they have different defaults
… not sure of any models where this matters

<jsbell> op-specific: 654 653 652 629 487 481 444 396 392 383 324 180

<jsbell> broader: 668 658 623 590 489 456 391

jsbell: tangentially, there's a cluster of issues like this one related to interop; is there interest in introducing an [interop] label for these?
… op-specific: #654 #653 #652 #629 #487 #481 #444 #396 #392 #383 #324 #180
… broader: #668 #658 #623 #590 #489 #456 #391

<gb> Issue 652 ArgMax/Min `selectLastIndex` is not supported on CoreML (by philloooo) [bug] [operator specific]

<gb> Issue 654 Consider dropping the support of uint64/int64 data type for some operators (by lisa0314) [bug] [operator specific]

<gb> Issue 653 Consider changing output type of ArgMax/Argmin to int32, or allow passing output_type (by philloooo) [bug] [operator specific]

<gb> Issue 444 Zero dilations needs spec clarification (by fdwr) [operator specific]

<gb> Issue 392 `split` into sizes are not widely supported (by huningxin) [operator specific]

<gb> Issue 383 Need to restrict the value of alpha to be positive for elu operation (by lisa0314) [operator specific]

<gb> Issue 481 Should `scale` and `bias` be required inputs for `batchNormalization` op? (by huningxin) [operator specific]

<gb> Issue 629 argMax/Min only support scalar axis in TFLite runtime (by fujunwei) [operator specific]

<gb> Issue 396 Clarify the restriction for `minValue` and `maxValue` of `MLClampOptions` (by huningxin) [operator specific]

<gb> Issue 456 Define the maximum number of operand dimensions (maximum rank) (by huningxin)

<gb> Issue 180 Should dilated pooling be supported (by fujunwei) [operator specific]

<gb> Issue 391 Allow 0 size dimensions (by huningxin) [feature request]

<gb> Issue 487 Should `axes` be required parameter for layerNormalization build method? (by huningxin) [operator specific]

<gb> Issue 590 Consider adopting new broadcasting rules (by a-sully) [question]

<gb> Issue 324 Simplify the operand layout support of conv2d and pooling 2d operations (by huningxin) [feature request] [operator specific]

anssik: would this [interop] label be helpful for the group, thoughts?

<gb> Issue 489 Clarify the casting behavior from floating-point / signed integers <-> unsigned integers (by huningxin)

<gb> Issue 623 WebNN should support NPU and QDQ operations (by wchao1115) [v2] [opset] [feature request]

<gb> Issue 668 Do we need an `MLConstantOperand`? (by a-sully) [question]

<gb> Issue 658 Consider removing `MLActivation` parameters used for op fusion (by a-sully) [question]

[bug] Consider changing output type of ArgMax/Argmin to int32, or allow passing output_type

anssik: issue #653
… Phillis reports:
… Currently WebNN specifies ArgMax/Min returns int64.
… - CoreML's supported return type is int32 or uint16.
… - TFLite supports int32 and int64.
… - DML supports int32, int64, uint32, and uint64.
… Returning int64 can't be emulated on CoreML (with details in the issue)
… Phillis proposes two possible options:
… - a. Update argMin/Max to always output int32
… - b. On the spec level, allow passing a param of output_type. And allow probing the supported data types for this param.
… I see thumbs up from Ningxin and Austin, do you want to comment?
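
A rough sketch of option (b); the option name and its value here are assumptions, not agreed API:

  // Hypothetical: an explicit output data type on argMax so backends without
  // int64 support (e.g. CoreML) can be targeted without emulation.
  const indices = builder.argMax(input, { axes: [1], outputDataType: 'int32' });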

Dwayne: for some history, I believe we considered having argMax/Min return int32, but when we tried some models we found that not supporting int64 complicated things
… I added int64 because it was so widely used
… I would be amenable to passing an output type for argMax

Phillis: if we pass that we could probe it

Dwayne: another issue on data type validation

Phillis: I think we run into these interop issues and need to probe; can you comment on why int64 was important for ONNX models, so that I have that reference?

Dwayne: all indices are int64 in tensors and passed as such

Dwayne: indices specifically; if you output the argMax tensor to the next op, that op needs them

Phillis: gather needs this, right?

Dwayne: I will do research after the meeting and provide information in the GH issue

[bug] Consider dropping the support of uint64/int64 data type for some operators

anssik: issue #654

<gb> Issue 654 Consider dropping the support of uint64/int64 data type for some operators (by lisa0314) [bug] [operator specific]

anssik: informed by implementation experience, certain op + data type combinations can't be supported reliably on certain Windows + DirectML version combinations
… Lisa provided a list (thanks!) of such ops in the issue
… Austin notes CoreML does not natively support 64-bit types
… and suggests an emulation path is required on platforms that don't support these data types
… Joshua asked an (implementation-related) question about how to bundle DML with the browser distribution

jsbell: a possible solution: ask the context if the data type is supported?
… we could entertain that idea; it is OK for WebNN running on different machines to support different ops
… one path to explore
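
One possible shape for such a query, echoing issue #463; the method name and return structure below are assumptions, not agreed API:

  // Hypothetical: probe the context for per-operator data type support before
  // building the graph, and fall back to int32 (with emulation) if needed.
  const limits = context.opSupportLimits?.();
  const useInt64 = !!limits?.gather?.indices?.dataTypes?.includes('int64');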

asully: looking at CoreML, the Neural Engine prefers fp16
… if this can be expressed somehow, developers might use fp16 more for better performance

Dwayne: is there support for preference vs supports only?

Phillis: only supports

asully: CoreML will fall back to CPU internally; another layer of fallback is what the UA decides to do
… the stance has been to push things lower, but the question is, if you want to support int64 on CoreML, you'd need to do that in userspace

Mike: echoing what Phillis said, running on the ANE means fp16; it can fall back to GPU or CPU

<jsbell> I guess #463 was the generic issue re: checking for support

<gb> Issue 463 Allow checking whether operators/types are supported for a backend before creating a graph (by huningxin) [feature request]

<asully> webmachinelearning/webnn#668

<gb> Issue 668 Do we need an `MLConstantOperand`? (by a-sully) [question]

[question] Do we need an MLConstantOperand?

anssik: issue #668

<gb> Issue 668 Do we need an `MLConstantOperand`? (by a-sully) [question]

asully: TL;DR: CoreML requires some tensors to be constant; a constant tensor is not a concept other frameworks such as TFLite have
… the other frameworks do not have this concept; some parameters are passed as tensors seemingly solely because they're big (think of them as weights), e.g. LSTM weights
… they're used as constants and passed to the op as parameters
… on CoreML these are required to be known at compile time
… to support these ops on CoreML these tensors must be created as constants
… in WebNN there's currently no way to tell whether an operand is a constant or an opaque operand
… the question is whether creating a constant should give back a distinct kind of operand
… some parameters make sense to pass only as constants; if they're not constants, you can't plumb them directly to CoreML
… so the question is: given there's no constant tensor concept in other frameworks, does it make sense to pass anything other than a constant, or should we introduce a constant operand instead?
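
An illustrative sketch of the idea in issue #668, assuming builder.constant() would return a (hypothetical) MLConstantOperand that ops such as lstm() could then require for their weight parameters:

  // Hypothetical: weights built via constant() are compile-time constants the
  // CoreML backend can consume directly; sizes are arbitrary, and `input` is
  // assumed to be an existing MLOperand of shape [steps, batchSize, inputSize].
  const inputSize = 64, hiddenSize = 128, steps = 16, numDirections = 1;
  const weight = builder.constant(
    { dataType: 'float32', dimensions: [numDirections, 4 * hiddenSize, inputSize] },
    new Float32Array(numDirections * 4 * hiddenSize * inputSize));
  const recurrentWeight = builder.constant(
    { dataType: 'float32', dimensions: [numDirections, 4 * hiddenSize, hiddenSize] },
    new Float32Array(numDirections * 4 * hiddenSize * hiddenSize));
  // In this proposal, passing a non-constant operand for weight or
  // recurrentWeight would be a build-time error.
  const outputs = builder.lstm(input, weight, recurrentWeight, steps, hiddenSize);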

Dwayne: looking at this issue and the fields in there, all are constant in all the models I've seen, I believe?
… I could look at my ~700 models and see if these fields are linked to non-constant tensors

asully: the hypothesis is these are not always marked as such because frameworks don't have such a concept; if a framework has no way to express this, upstreaming the concept to WebNN would make sense

Dwayne: if the value is the output of another op, then what do we do?

asully: we need to discuss
… it's easier to relax restrictions later than to add them later on
… preference would be to try this and if something breaks we discuss that
… otherwise CoreML support requires a constant tensor to be passed

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).

Diagnostics

Succeeded: s/any further/anssik:

Succeeded: s/obsercation/observation

Succeeded: s/ on on/ on

Succeeded: s/ to to/ to

Succeeded: s/should to know/should be known

Succeeded: s/IO Banding/IO Binding

Succeeded: s/found we/we found we

Succeeded: s/bunch of PR/bunch of PRs

Succeeded: s/be helpful/label be helpful

Succeeded: s/ballback/fallback

Succeeded: s/constant make sense/constant

Maybe present: anssik, asully, Dwayne, jsbell, Mike, ningxin, Phillis

All speakers: anssik, asully, Dwayne, jsbell, Mike, ningxin, Phillis

Active on IRC: anssik, asully, jsbell, Mike3, ningxin