Meeting minutes
Repository: webmachinelearning/webnn
WebNN - WebIDL and Infra standard conventions
The constant() method steps
anssik: meta issue #210 and PR #365
<ghurlbot> Issue 210 Use modern WebIDL and Infra standard conventions (anssiko) enhancement, Editorial
<ghurlbot> Pull Request 365 Add the constant() method steps. (zolkis)
anssik: this PR now has the required approvals, ready to merge
… thanks Zoltan for iterating on this PR carefully, and Chai and Ningxin for review and comments!
Zoltan: it looks mergeable; this is the first PR, a dependency for the rest
ningxin_hu: thanks Zoltan, do we want to squash the commits?
anssik: editors to decide whether to squash the commits on merge
Sync and async algorithms
anssik: issue #316 and PR #329
<ghurlbot> Pull Request 329 Rework the sync async algorithms based on #323 (zolkis)
<ghurlbot> Issue 316 Review sync vs async compute differences (zolkis) Editorial
anssik: I proposed in the PR to remove the "During asynchronous execution ..." note
… because RFC 2119 terminology is not to be used in an informative note.
… Zoltan updated the PR accordingly
… it looks like that was the remaining open conversation on this PR?
Zoltan: yes, that was the only open one
anssik: ready to merge after Chai's final review
The builder.input() method steps
anssik: meta issue #210 and PR #364
<ghurlbot> Pull Request 364 Add the builder.input() method steps (zolkis)
anssik: it looks like this PR has the required approvals, ready to merge
… thanks again Zoltan for this PR and Chai and Ningxin for your review
Zoltan: this needs rebase before merge
… I will do the rebase
… editors are free to merge after that
Review the rest of the standard conventions PRs
anssik: my summary of the status of the remaining standard conventions PRs
… #337 MLOperand and MLActivation internal slots - 1 change requested by Chai
<ghurlbot> Pull Request 337 Add internal slots to MLOperand and MLActivation (zolkis)
anssik: #348 clamp() algorithm - 1 change requested by Chai & addressed by Zoltan
<ghurlbot> Pull Request 348 Add the clamp() algorithm (zolkis)
anssik: #364 builder.input() method steps - 2 approvals, ready to merge
… #366 concat() algorithm - 1 change requested by Chai & addressed by Zoltan
<ghurlbot> Pull Request 366 Add the concat algorithm (zolkis)
anssik: #339 batchnorm() algorithm - re-review requested from Chai and Ningxin
<ghurlbot> Pull Request 339 Fix #334: Improve the batch norm algorithm (zolkis)
Zoltan: parts of #337 are being taken over by other PRs because it is the dependency PR, so I'll leave it for last
… the rest can be merged after review
Zoltan: after merging these open PRs we do the stylistic PR; these PRs introduce building blocks for it
… as soon as reviews are cleared I'll make the stylistic change PR, and then we can merge Chai's outstanding PR #322
<ghurlbot> Pull Request 322 Simplify MLContext creation (wchao1115)
WebNN - enhancements, editorials, questions
float16 support
anssik: I wanted to discuss prototype findings to inform v2 feature work
… I invited Wanming to join us for this call; working with Zesong, he added float16 support to ONNX Runtime Web
PR for ONNXRuntime float16 support
anssik: we discussed earlier that ECMA TC39 has a Float16Array proposal, see issue #373
<ghurlbot> Issue 373 heads up re: proposal to add Float16Array to JavaScript (bakkot) v2
Float16Array proposal (ECMA Stage 2 ~proposal)
anssik: we don't yet have Float16Array in the official JS standard, so one workaround is to pass the raw bits via Uint16Array, a typed array that is an ECMA standard
Uint16Array (ECMA Stage 4 ~ formal standard)
anssik: Wanming, thanks for your work on this, I see the PR has been merged
… did you yet get a chance to try float16 support with some of the models that benefit from float16? Any other insights from your implementation work?
Wanming: this implements translation of float16 to Uint16Array
<wm> 8% better than float32 model
ningxin_hu: thanks Wanming, this PR is not merged into the official ONNX Runtime repo yet; it is in Wanming's fork, used as a staging repo for the experiment
… open issue in ONNX Runtime about float16 support, various options discussed, this workaround is one of them
… Wanming demonstrated this workaround works
… even if in a downstream fork
… for the WebNN spec we have a mapping table where we note this is not yet natively supported but there's a workaround; this work demonstrates that the workaround is doable
… we can share this prototype with the audience
… performance improvement is good, memory footprint is reduced by ~half
… Btw. I submitted a small PR #386 to add a provisional reference to the upcoming TC39 Float16Array type to the MLOperandType and ArrayBufferView compatibility table
<ghurlbot> Pull Request 386 Add float16 to MLOperandType and ArrayBufferView compatibility table (anssiko)
anssik: thanks Wanming for this work!
ningxin_hu: related announcement: ONNX Runtime Web's WebNN Execution Provider was merged into the official repo this week; float16 is a downstream experiment
<ningxin_hu> microsoft/
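The Uint16Array workaround discussed above can be sketched as follows. This is a minimal illustration, not the ONNX Runtime code; `float32ToFloat16Bits` and `packFloat16` are hypothetical names, and the mantissa is truncated rather than round-to-nearest:

```javascript
// Encode a JS number as an IEEE 754 binary16 bit pattern, so float16 tensor
// data can travel through a Uint16Array until Float16Array lands in the
// JS standard. Hypothetical helper, truncating conversion.
function float32ToFloat16Bits(value) {
  const f32 = new Float32Array(1);
  const u32 = new Uint32Array(f32.buffer);
  f32[0] = value;
  const bits = u32[0];
  const sign = (bits >>> 16) & 0x8000;
  let exp = (bits >>> 23) & 0xff;
  const mant = bits & 0x7fffff;
  if (exp === 0xff) return sign | 0x7c00 | (mant ? 0x200 : 0); // Inf / NaN
  exp = exp - 127 + 15;                  // re-bias exponent: f32 (127) -> f16 (15)
  if (exp >= 0x1f) return sign | 0x7c00;                       // overflow -> Inf
  if (exp <= 0) {
    if (exp < -10) return sign;                                // underflow -> 0
    return sign | ((mant | 0x800000) >> (14 - exp));           // subnormal
  }
  return sign | (exp << 10) | (mant >> 13);                    // normal (truncated)
}

// Pack a float32 array into float16 bit patterns for upload.
const packFloat16 = (data) => Uint16Array.from(data, float32ToFloat16Bits);
```

For example, `packFloat16([1, -2])` yields the bit patterns `0x3c00` and `0xc000`, which a backend that understands float16 can reinterpret directly.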
Subclass MLGraph based on the context that creates it
anssik: issue #344
<ghurlbot> Issue 344 Subclass MLGraph based on the context that creates it (huningxin) question
anssik: we deferred this from our previous call; I was asking whether this is mainly an API ergonomics improvement
… does the WG feel like pursuing this proposal further?
ningxin_hu: no update at this time
Error handling of MLNamedArrayBufferViews transfer algorithm
anssik: issue #351
<ghurlbot> Issue 351 Need to define error handling of MLNamedArrayBufferViews transfer algorithm (huningxin)
anssik: Ningxin has a very good description of this in the issue comment, not repeating it here
… the proposal is for WebNN spec to define how to handle this exception and what's the impact to the MLNamedArrayBufferViews
… this issue was identified as part of implementation review by Jiawei
ningxin_hu: we addressed the previous related issue and introduced this one
… the main and worker threads access the input and output ArrayBufferViews, which are transferred in a loop
… possible that an error happens when transferring a buffer in the middle
… in a detached state, this is what the Chromium impl does today; Jiawei feels this is not ideal so we brought this to the WG for input
… what is the ideal way to handle this error?
… should the implementation do something with the already detached ArrayBufferViews?
anssik: is this a regression or clarification?
ningxin_hu: before the previous PR there was no transfer defined, no such issue existed before
<ningxin_hu> webmachinelearning/
ningxin_hu: this was introduced in PR #318
<ghurlbot> Issue 318 [closed] The input and output resources race condition issue of asynchronous execution (huningxin)
anssik: this was a TODO in the implementation?
ningxin_hu: I will need to check that
<ningxin_hu> TODO in the implementation: https://
RafaelCintron: is the problem that when you detach an ArrayBuffer on the foreground thread it says sure, then in the worker when we open it we see problems and we don't know how to surface the problem to the main thread?
ningxin_hu: not exactly, there's a loop that iterates one ArrayBuffer at a time
… the issue happens when we iterate to the middle of the loop and find an ArrayBuffer that is not detachable, e.g. Wasm heap memory or memory mapped to a WebGPU buffer
… the loop breaks out and reports an error
… several earlier ArrayBuffers are already detached and cannot be used in the main thread, and the current implementation has no recovery mechanism
RafaelCintron: could we loop through everything before detaching, to make sure everything can be detached?
… note for each one "this cannot be detached", and only when we know everything can be detached, detach everything
ningxin_hu: a validation step beforehand is a good idea; that would make it an atomic operation, thanks for mentioning that approach, we can put that in spec language
… we want to hear input from implementers on this issue about what makes for the best design
… the two loops would make it an atomic operation; I want to confirm that is possible and doable, and we can probably address this issue that way
RafaelCintron: probably that is good, could also check with Domenic for feedback, see if there are other APIs that have a similar problem
ningxin_hu: I will summarize what RafaelCintron suggested and put that in the issue
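The validate-then-transfer idea suggested above can be sketched as follows. This is only a sketch: `transferNamedViews` and the `canTransfer` predicate are hypothetical names, and a real implementation would use the engine's internal detachability check (e.g. rejecting Wasm-heap or WebGPU-mapped buffers) rather than a caller-supplied callback:

```javascript
// Two-pass "validate, then transfer" sketch for named ArrayBufferViews.
// Pass 1 only inspects; pass 2 only runs once every buffer has passed,
// so a failure can never leave some buffers detached and others not.
function transferNamedViews(views, canTransfer) {
  // Pass 1: validate every buffer before touching any of them.
  for (const [name, view] of Object.entries(views)) {
    if (!canTransfer(view.buffer)) {
      throw new TypeError(`buffer for "${name}" is not transferable`);
    }
  }
  // Pass 2: detach all buffers; structuredClone with a transfer list
  // moves each underlying ArrayBuffer and detaches the original.
  const transferred = {};
  for (const [name, view] of Object.entries(views)) {
    transferred[name] = structuredClone(view, { transfer: [view.buffer] });
  }
  return transferred;
}
```

With this shape, a validation failure throws before any buffer is detached, so the caller's views remain usable; only a fully successful call detaches anything.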
Support depth_multiplier > 1 for a depthwise conv2d op
anssik: issue #353
<ghurlbot> Issue 353 Support `depth_multiplier > 1` for a depthwise conv2d op (Honry)
anssik: an issue from Wanming
… suggests a fix to a note in conv2d op
anssik: note now reads "A depthwise conv2d operation is a variant of grouped convolution, used in models like the MobileNet, where the options.groups = input_channels = output_channels"
… suggested fix: "options.groups = input_channels = output_channels / depth_multiplier"
Wanming: we found this issue when implementing TFLite WebNN delegate
ningxin_hu: this is probably a framework compatibility issue, Wanming works on TFLite delegate
… for WebNN we try to cover depthwise conv2d op, we can emulate other variants
… another aspect, in implementation of XNNPACK there's depthwise conv kernel
… we could have a note in the spec, this issue is motivated by frameworks
ningxin_hu: there's a real model that requires depth_multiplier; my open question is that we need a survey on the implementability side, i.e. how to implement this with native ML APIs
Wanming: I can do this investigation
anssik: Thanks!
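The suggested fix can be illustrated with a small hypothetical helper (`depthwiseGroups` is not spec text, just an illustration of the relation): each of the input channels forms its own group and produces `depth_multiplier` output channels, so `groups = input_channels = output_channels / depth_multiplier`.

```javascript
// Illustrative helper for the corrected depthwise conv2d relation:
// output_channels = input_channels * depth_multiplier, and
// groups = output_channels / depth_multiplier = input_channels.
function depthwiseGroups(inputChannels, depthMultiplier) {
  if (!Number.isInteger(depthMultiplier) || depthMultiplier < 1) {
    throw new RangeError("depth_multiplier must be a positive integer");
  }
  const outputChannels = inputChannels * depthMultiplier;
  const groups = outputChannels / depthMultiplier; // === inputChannels
  return { groups, outputChannels };
}
```

For example, a depthwise layer with 32 input channels and depth_multiplier 2 has 64 output channels and 32 groups; the current note's `groups = output_channels` only holds in the depth_multiplier = 1 case.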
Clarify interpolation algorithm for resample2d
anssik: issue #358 and related issue #270
<ghurlbot> Issue 358 Please clarify interpolation algorithm for resample2d (BruceDai) enhancement
<ghurlbot> Issue 270 Support coordinate transformation modes for Resample2d (Honry)
https://
anssik: resample2d() resamples the tensor values from the source to the destination spatial dimensions according to the scaling factors
… resample2d() supports two interpolation algorithms to fill the output tensor values:
enum MLInterpolationMode {
"nearest-neighbor",
"linear"
};
anssik: the issue asks for clarifications to these modes on how to properly implement them
… it appears there are various interpretations of these algorithms depending on the domain
… Dwayne gives an example for nearest neighbor sampling
… the banking industry uses "round halves up"
… in graphics "round to nearest with X.5 halves toward negative infinity"
… thoughts?
[ no comments at this time ]
anssik: input on GH welcome
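To illustrate why the rounding convention matters, here is a hypothetical 1-D nearest-neighbor sketch. `resampleNearest1d` and the simple `outIdx / scale` coordinate transform are illustrative assumptions, not spec text; the choice of coordinate transform itself is what issue #270 covers:

```javascript
// 1-D nearest-neighbor resampling, parameterized on the rounding convention.
// Uses the simple inIdx = outIdx / scale coordinate mapping for illustration.
function resampleNearest1d(input, scale, round) {
  const outLen = Math.floor(input.length * scale);
  const out = new Array(outLen);
  for (let i = 0; i < outLen; i++) {
    // Clamp so the rounded index stays within the input.
    const j = Math.min(input.length - 1, Math.max(0, round(i / scale)));
    out[i] = input[j];
  }
  return out;
}

// "Round halves up": 0.5 -> 1, 1.5 -> 2.
const roundHalfUp = (x) => Math.floor(x + 0.5);
// "Round halves toward negative infinity": 0.5 -> 0, 1.5 -> 1.
const roundHalfToNegInf = (x) => Math.ceil(x - 0.5);
```

Upsampling `[10, 20, 30]` by 2 gives `[10, 20, 20, 30, 30, 30]` with halves rounded up but `[10, 10, 20, 20, 30, 30]` with halves rounded toward negative infinity, so the two conventions mentioned in the discussion produce observably different tensors.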
Support for transformers
anssik: Continue discussing transformers and related requirements and gaps. Test our improved contribution guidelines with these new ops: explore key use cases, sample models, cross-framework support, cross-platform implementability.
… Status: The WG decided to start exploring support for transformers in WebNN.
… Next step: Identify use cases. Contributions welcome via issue #375.
<ghurlbot> Issue 375 Mention transformer in use cases (dontcallmedom) v2
anssik: for example, is there interest to look at the gaps in Stable Diffusion?
RafaelCintron: I have no objection to exploring this space of transformers, they are the upcoming cool thing in the industry
ningxin_hu: Stable Diffusion would be good one
… Transformers.js is another good one; it supports many HuggingFace models
… Segment Anything and segmentation use cases we can also check out