Meeting minutes
Repository: webmachinelearning/webnn
WebNN - WebIDL and Infra standard conventions
The constant() method steps
anssik: meta issue #210 and PR #365
<ghurlbot> Issue 210 Use modern WebIDL and Infra standard conventions (anssiko) enhancement, Editorial
<ghurlbot> Pull Request 365 Add the constant() method steps. (zolkis)
anssik: this PR now has the required approvals, ready to merge
… thanks Zoltan for iterating on this PR carefully, and Chai and Ningxin for review and comments!
Zoltan: it looks mergeable; this is the first PR, a dependency for the rest
ningxin_hu: thanks Zoltan, do we want to squash the commits?
anssik: editors to decide whether to squash the commits on merge
Sync and async algorithms
anssik: issue #316 and PR #329
<ghurlbot> Pull Request 329 Rework the sync async algorithms based on #323 (zolkis)
<ghurlbot> Issue 316 Review sync vs async compute differences (zolkis) Editorial
anssik: I proposed in the PR to remove the "During asynchronous execution ..." note
… because RFC 2119 terminology is not to be used in an informative note.
… Zoltan updated the PR accordingly
… it looks like that was the remaining open conversation on this PR?
Zoltan: yes, that was the only open one
anssik: ready to merge after Chai's final review
The builder.input() method steps
anssik: meta issue #210 and PR #364
<ghurlbot> Pull Request 364 Add the builder.input() method steps (zolkis)
anssik: it looks like this PR has the required approvals, ready to merge
… thanks again Zoltan for this PR and Chai and Ningxin for your review
Zoltan: this needs rebase before merge
… I will do the rebase
… editors are free to merge after that
Review the rest of the standard conventions PRs
anssik: my summary of the status of the remaining standard conventions PRs
… #337 MLOperand and MLActivation internal slots - 1 change requested by Chai
<ghurlbot> Pull Request 337 Add internal slots to MLOperand and MLActivation (zolkis)
anssik: #348 clamp() algorithm - 1 change requested by Chai & addressed by Zoltan
<ghurlbot> Pull Request 348 Add the clamp() algorithm (zolkis)
anssik: #364 builder.input() method steps - 2 approvals, ready to merge
… #366 concat() algorithm - 1 change requested by Chai & addressed by Zoltan
<ghurlbot> Pull Request 366 Add the concat algorithm (zolkis)
anssik: #339 batchnorm() algorithm - re-review requested from Chai and Ningxin
<ghurlbot> Pull Request 339 Fix #334: Improve the batch norm algorithm (zolkis)
Zoltan: parts of #337 are being taken over by other PRs because it is the dependency PR, so I'll leave it for last
… the rest can be merged after review
Zoltan: after merging these open PRs we do the stylistic PR; these PRs introduce building blocks for it
… as soon as reviews are cleared I'll make the stylistic change PR, and then we can merge Chai's outstanding PR #322
<ghurlbot> Pull Request 322 Simplify MLContext creation (wchao1115)
WebNN - enhancements, editorials, questions
float16 support
anssik: I wanted to discuss prototype findings to inform v2 feature work
… I invited Wanming to join us for this call; working with Zesong, he added float16 support to ONNX Runtime Web
PR for ONNXRuntime float16 support
anssik: we discussed earlier that ECMA TC39 has a Float16Array proposal, see issue #373
<ghurlbot> Issue 373 heads up re: proposal to add Float16Array to JavaScript (bakkot) v2
Float16Array proposal (ECMA Stage 2 ~proposal)
anssik: we don't yet have Float16Array in the official JS standard, so one workaround is to pass the raw bits via Uint16Array, a typed array that is an ECMA standard
Uint16Array (ECMA Stage 4 ~ formal standard)
anssik: Wanming, thanks for your work on this, I see the PR has been merged
… did you yet get a chance to try float16 support with some of the models that benefit from float16? Any other insights from your implementation work?
Wanming: this implements translation of float16 to Uint16Array
<wm> 8% better than float32 model
ningxin_hu: thanks Wanming, this PR is not merged into the official ONNX Runtime repo yet; it is in Wanming's fork, used as a staging repo for the experiment
… open issue in ONNX Runtime about float16 support, various options discussed, this workaround is one of them
… Wanming demonstrated this workaround works
… even if in a downstream fork
… for the WebNN spec we have a mapping table where we note this is not yet natively supported but there's a workaround; this work demonstrates that the workaround is doable
… we can share this prototype with the audience
… performance improvement is good, memory footprint is reduced by ~half
… Btw. I submitted a small PR #386 to add a provisional reference to the upcoming TC39 Float16Array type to the MLOperandType and ArrayBufferView compatibility table
<ghurlbot> Pull Request 386 Add float16 to MLOperandType and ArrayBufferView compatibility table (anssiko)
anssik: thanks Wanming for this work!
ningxin_hu: related announcement: ONNX Runtime Web's WebNN Execution Provider was merged into the official repo this week; float16 is a downstream experiment
<ningxin_hu> microsoft/
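The Uint16Array workaround discussed above can be sketched as follows. This is a minimal illustration, not the ONNX Runtime code; `float32ToFloat16Bits` and `packFloat16` are hypothetical names, and the mantissa is truncated rather than round-to-nearest:

```javascript
// Encode a JS number as an IEEE 754 binary16 bit pattern, so float16 tensor
// data can travel through a Uint16Array until Float16Array lands in the
// JS standard. Hypothetical helper, truncating conversion.
function float32ToFloat16Bits(value) {
  const f32 = new Float32Array(1);
  const u32 = new Uint32Array(f32.buffer);
  f32[0] = value;
  const bits = u32[0];
  const sign = (bits >>> 16) & 0x8000;
  let exp = (bits >>> 23) & 0xff;
  const mant = bits & 0x7fffff;
  if (exp === 0xff) return sign | 0x7c00 | (mant ? 0x200 : 0); // Inf / NaN
  exp = exp - 127 + 15;                  // re-bias exponent: f32 (127) -> f16 (15)
  if (exp >= 0x1f) return sign | 0x7c00;                       // overflow -> Inf
  if (exp <= 0) {
    if (exp < -10) return sign;                                // underflow -> 0
    return sign | ((mant | 0x800000) >> (14 - exp));           // subnormal
  }
  return sign | (exp << 10) | (mant >> 13);                    // normal (truncated)
}

// Pack a float32 array into float16 bit patterns for upload.
const packFloat16 = (data) => Uint16Array.from(data, float32ToFloat16Bits);
```

For example, `packFloat16([1, -2])` yields the bit patterns `0x3c00` and `0xc000`, which a backend that understands float16 can reinterpret directly.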
Subclass MLGraph based on the context that creates it
anssik: issue #344
<ghurlbot> Issue 344 Subclass MLGraph based on the context that creates it (huningxin) question
anssik: we deferred this from our previous call; I was asking whether this is mainly an API ergonomics improvement
… does the WG feel like pursuing this proposal further?
ningxin_hu: no update at this time
Error handling of MLNamedArrayBufferViews transfer algorithm
anssik: issue #351
<ghurlbot> Issue 351 Need to define error handling of MLNamedArrayBufferViews transfer algorithm (huningxin)
anssik: Ningxin has a very good description of this in the issue comment, not repeating it here
… the proposal is for WebNN spec to define how to handle this exception and what's the impact to the MLNamedArrayBufferViews
… this issue was identified as part of implementation review by Jiawei
ningxin_hu: we addressed the previous related issue and introduced this one
… the main and worker threads access the input and output ArrayBufferViews, which are transferred in a loop
… possible that an error happens when transferring a buffer in the middle
… in a detached state, this is what the Chromium impl does today; Jiawei feels this is not ideal so we brought this to the WG for input
… what is the ideal way to handle this error?
… should the implementation do something with the already detached ArrayBufferViews?
anssik: is this a regression or clarification?
ningxin_hu: before the previous PR there was no transfer defined, no such issue existed before
<ningxin_hu> webmachinelearning/
ningxin_hu: this was introduced in PR #318
<ghurlbot> Issue 318 [closed] The input and output resources race condition issue of asynchronous execution (huningxin)
anssik: this was a TODO in the implementation?
ningxin_hu: I will need to check that
<ningxin_hu> TODO in the implementation: https://
RafaelCintron: is the problem that when you detach an ArrayBuffer on the foreground thread it says sure, then in the worker when we open it we see problems and we don't know how to surface the problem to the main thread?
ningxin_hu: not exactly, there's a loop that iterates one ArrayBuffer at a time
… the issue happens when we iterate to the middle of the loop and find an ArrayBuffer that is not detachable, e.g. Wasm heap memory or memory mapped to a WebGPU buffer
… the loop breaks out and reports an error
… several earlier ArrayBuffers are already detached and cannot be used in the main thread, and the current implementation has no recovery mechanism
RafaelCintron: could we loop through everything before detaching, to make sure everything can be detached?
… note for each one "this cannot be detached", and only when we know everything can be detached, detach everything
ningxin_hu: a validation step beforehand is a good idea; that would make it an atomic operation, thanks for mentioning that approach, we can put that in spec language
… we want to hear input from implementers on this issue about what makes for the best design
… the two loops would make it an atomic operation; I want to confirm that is possible and doable, and we can probably address this issue that way
RafaelCintron: probably that is good, could also check with Domenic for feedback, see if there are other APIs that have a similar problem
ningxin_hu: I will summarize what RafaelCintron suggested and put that in the issue
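The validate-then-transfer idea suggested above can be sketched as follows. This is only a sketch: `transferNamedViews` and the `canTransfer` predicate are hypothetical names, and a real implementation would use the engine's internal detachability check (e.g. rejecting Wasm-heap or WebGPU-mapped buffers) rather than a caller-supplied callback:

```javascript
// Two-pass "validate, then transfer" sketch for named ArrayBufferViews.
// Pass 1 only inspects; pass 2 only runs once every buffer has passed,
// so a failure can never leave some buffers detached and others not.
function transferNamedViews(views, canTransfer) {
  // Pass 1: validate every buffer before touching any of them.
  for (const [name, view] of Object.entries(views)) {
    if (!canTransfer(view.buffer)) {
      throw new TypeError(`buffer for "${name}" is not transferable`);
    }
  }
  // Pass 2: detach all buffers; structuredClone with a transfer list
  // moves each underlying ArrayBuffer and detaches the original.
  const transferred = {};
  for (const [name, view] of Object.entries(views)) {
    transferred[name] = structuredClone(view, { transfer: [view.buffer] });
  }
  return transferred;
}
```

With this shape, a validation failure throws before any buffer is detached, so the caller's views remain usable; only a fully successful call detaches anything.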
Support depth_multiplier > 1 for a depthwise conv2d op
anssik: issue #353
<ghurlbot> Issue 353 Support `depth_multiplier > 1` for a depthwise conv2d op (Honry)
anssik: an issue from Wanming
… suggests a fix to a note in conv2d op
anssik: note now reads "A depthwise conv2d operation is a variant of grouped convolution, used in models like the MobileNet, where the options.groups = input_channels = output_channels"
… suggested fix: "options.groups = input_channels = output_channels / depth_multiplier"
Wanming: we found this issue when implementing TFLite WebNN delegate
ningxin_hu: this is probably a framework compatibility issue, Wanming works on TFLite delegate
… for WebNN we try to cover depthwise conv2d op, we can emulate other variants
… another aspect, in implementation of XNNPACK there's depthwise conv kernel
… we could have a note in the spec, this issue is motivated by frameworks
ningxin_hu: there's a real model that requires depth_multiplier; my open question is that we need a survey on the implementability side, i.e. how to implement this with native ML APIs
Wanming: I can do this investigation
anssik: Thanks!
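The suggested fix can be illustrated with a small hypothetical helper (`depthwiseGroups` is not spec text, just an illustration of the relation): each of the input channels forms its own group and produces `depth_multiplier` output channels, so `groups = input_channels = output_channels / depth_multiplier`.

```javascript
// Illustrative helper for the corrected depthwise conv2d relation:
// output_channels = input_channels * depth_multiplier, and
// groups = output_channels / depth_multiplier = input_channels.
function depthwiseGroups(inputChannels, depthMultiplier) {
  if (!Number.isInteger(depthMultiplier) || depthMultiplier < 1) {
    throw new RangeError("depth_multiplier must be a positive integer");
  }
  const outputChannels = inputChannels * depthMultiplier;
  const groups = outputChannels / depthMultiplier; // === inputChannels
  return { groups, outputChannels };
}
```

For example, a depthwise layer with 32 input channels and depth_multiplier 2 has 64 output channels and 32 groups; the current note's `groups = output_channels` only holds in the depth_multiplier = 1 case.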
Clarify interpolation algorithm for resample2d
anssik: issue #358 and related issue #270
<ghurlbot> Issue 358 Please clarify interpolation algorithm for resample2d (BruceDai) enhancement
<ghurlbot> Issue 270 Support coordinate transformation modes for Resample2d (Honry)
https://
anssik: resample2d() resamples the tensor values from the source to the destination spatial dimensions according to the scaling factors
… resample2d() supports two interpolation algorithms to fill the output tensor values:
enum MLInterpolationMode {
"nearest-neighbor",
"linear"
};
anssik: the issue asks for clarifications to these modes on how to properly implement them
… it appears there are various interpretations of these algorithms depending on the domain
… Dwayne gives an example for nearest neighbor sampling
… the banking industry uses "round halves up"
… in graphics "round to nearest with X.5 halves toward negative infinity"
… thoughts?
[ no comments at this time ]
anssik: input on GH welcome
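To illustrate why the rounding convention matters, here is a hypothetical 1-D nearest-neighbor sketch. `resampleNearest1d` and the simple `outIdx / scale` coordinate transform are illustrative assumptions, not spec text; the choice of coordinate transform itself is what issue #270 covers:

```javascript
// 1-D nearest-neighbor resampling, parameterized on the rounding convention.
// Uses the simple inIdx = outIdx / scale coordinate mapping for illustration.
function resampleNearest1d(input, scale, round) {
  const outLen = Math.floor(input.length * scale);
  const out = new Array(outLen);
  for (let i = 0; i < outLen; i++) {
    // Clamp so the rounded index stays within the input.
    const j = Math.min(input.length - 1, Math.max(0, round(i / scale)));
    out[i] = input[j];
  }
  return out;
}

// "Round halves up": 0.5 -> 1, 1.5 -> 2.
const roundHalfUp = (x) => Math.floor(x + 0.5);
// "Round halves toward negative infinity": 0.5 -> 0, 1.5 -> 1.
const roundHalfToNegInf = (x) => Math.ceil(x - 0.5);
```

Upsampling `[10, 20, 30]` by 2 gives `[10, 20, 20, 30, 30, 30]` with halves rounded up but `[10, 10, 20, 20, 30, 30]` with halves rounded toward negative infinity, so the two conventions mentioned in the discussion produce observably different tensors.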
Support for transformers
anssik: Continue discussing transformers and related requirements and gaps. Test our improved contribution guidelines with these new ops: explore key use cases, sample models, cross-framework support, cross-platform implementability.
… Status: The WG decided to start exploring support for transformers in WebNN.
… Next step: Identify use cases. Contributions welcome via issue #375.
<ghurlbot> Issue 375 Mention transformer in use cases (dontcallmedom) v2
anssik: for example, is there interest to look at the gaps in Stable Diffusion?
RafaelCintron: I have no objection to exploring this space of transformers, they are the upcoming cool thing in the industry
ningxin_hu: Stable Diffusion would be good one
… Transformers.js is another good one; it supports many HuggingFace models
… Segment Anything and segmentation use cases we can also check out