Meeting minutes
conv2d op definition
PR reviewed and ready to land
PROPOSED RESOLUTION: Merge PR#43 to add conv2d definition to the spec
ningxin_hu: I see Daniel and Nikhil online, want to check that the signature looks good
HTML preview of conv2d: https://pr-preview.s3.amazonaws.com/huningxin/webnn/pull/43.html#api-neuralnetworkcontext-conv2d
Nikhil: we haven't had time yet to look at this
anssik: ok to merge and evolve in subsequent PRs?
Nikhil: sounds good
Daniel: the same, looks good, will review the details later
RESOLUTION: Merge PR#43 to add conv2d definition to the spec
matMul op definition
[op compatibility] matMul (issue #27)
ningxin_hu: the compat table for the matMul op is in GitHub in the familiar location: https://github.com/webmachinelearning/webnn/blob/master/op_compatibility/matmul.md
… Chai helped with DirectML mapping
… NNAPI does not have matMul defined, so I looked at TF Lite matMul; references can be found in the doc
[discussing details of dimensionality]
ningxin_hu: open question for NNAPI: for N-D output when N > 2, is it correct to map the TF Lite Pack op to the NNAPI ANEURALNETWORKS_RESHAPE and ANEURALNETWORKS_CONCATENATION ops?
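[Editor's note: a minimal sketch of the numpy-style dimensionality rules under discussion, for context only; the function name and signature below are illustrative, not part of the spec or the compat table. The last two dimensions are multiplied as matrices and any leading dimensions are treated as broadcastable batch dimensions, which is exactly the case an NNAPI lowering would have to reassemble from 2-D primitives.]

```typescript
// Illustrative sketch: output shape of an N-D matMul with numpy-style
// broadcasting of the leading (batch) dimensions. Not a spec definition.
function matMulOutputShape(a: number[], b: number[]): number[] {
  if (a.length < 2 || b.length < 2) {
    throw new Error("1-D inputs omitted here for brevity");
  }
  const [m, kA] = a.slice(-2);
  const [kB, n] = b.slice(-2);
  if (kA !== kB) {
    throw new Error(`inner dimensions do not match: ${kA} vs ${kB}`);
  }
  // Broadcast the leading (batch) dimensions, right-aligned.
  const batchA = a.slice(0, -2);
  const batchB = b.slice(0, -2);
  const batch: number[] = [];
  const rank = Math.max(batchA.length, batchB.length);
  for (let i = 0; i < rank; i++) {
    const dA = batchA[batchA.length - 1 - i] ?? 1;
    const dB = batchB[batchB.length - 1 - i] ?? 1;
    if (dA !== dB && dA !== 1 && dB !== 1) {
      throw new Error(`batch dimensions do not broadcast: ${dA} vs ${dB}`);
    }
    batch.unshift(Math.max(dA, dB));
  }
  return [...batch, m, n];
}

// e.g. matMulOutputShape([2, 3, 4], [4, 5]) -> [2, 3, 5]
```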
anssik: can Nikhil find someone from Google to help with NNAPI specifics?
Nikhil: can look for a contact
Chai: Ningxin starts from numpy, but I'm not sure if that maps nicely to all these APIs
… to multiply matrices, DNNL takes a multiply-add approach
… question to all: should this be considered instead, or in addition?
… also, does this group want a single matMul that can handle N-dim tensors, or a generic matMul and a generic gemm?
ningxin_hu: Chai's right re DNNL, but want to hear Nikhil's feedback
Daniel: the reason for not fusing things at the op level is that we want to reduce the number of ops in the spec
… if we allow fusion capability we might have more ops than we want
… we want ops to do fewer things, and let the compiler fuse things under the hood
Daniel: if we allow fusing on a logical level, we're leaking hw optimizations
… should keep bias as its own op
Chai: agree with Daniel's philosophy, a great topic
… the context in which the compiler is able to do auto fusion is, if one is not careful, hard to discover
… say you have a graph that has multiply-add as two different ops; most hw is able to fuse this, piggybacking the bias on the multiply
… the question is how the sw level knows this context can be fused
… one idea that came out of this struggle in ONNX is called an ONNX function
… in that you define the whole op as a multiply-add function(?); then if the hw is not able to support the fused multiply-add, it gets broken down into the two ops it is able to support
… just wanted to make a data point; a great topic, everyone is trying to address this problem at the native level
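[Editor's note: a sketch of the decomposition pattern discussed above, assuming a hypothetical graph-builder interface; the names GraphBuilder, matmul, add, and gemm are illustrative, not the WebNN API. The composite multiply-add is defined in terms of the two smaller ops, so a backend with a fused kernel can pattern-match and fuse it, while any other backend simply runs the two ops it already supports.]

```typescript
// Hypothetical builder interface, for illustration only.
interface Operand {}
interface GraphBuilder {
  matmul(a: Operand, b: Operand): Operand;
  add(a: Operand, b: Operand): Operand;
}

// In the spirit of an ONNX "function": the composite is written once in
// terms of the smaller ops; fusing (or not) is left to the compiler/driver.
function gemm(builder: GraphBuilder, a: Operand, b: Operand, bias: Operand): Operand {
  const product = builder.matmul(a, b); // op 1: matrix multiply
  return builder.add(product, bias);    // op 2: bias addition
}
```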
Nikhil: MLIR takes the function approach, pushing it entirely to the compiler
… when downleveling, there are two components, folding weights and doing fusing; it happens entirely in the compiler
… I want to also throw another thing out there
… something we've been thinking about at Google
… op-level is the CG's current approach, another project we should pay attention to is XLA
… lower level than ops, higher than hardware
… designed for GPUs, has around 80 ops
… people on the XLA team say they can run every NN with this op set
… a lot of backends for XLA
… currently a Google project: https://www.tensorflow.org/xla/architecture
<daniel_smilkov> https://www.tensorflow.org/xla/architecture
<daniel_smilkov> https://www.tensorflow.org/xla/operation_semantics
Daniel: the primary reason we're starting to be interested in XLA HLO is that it tries to remove the reliance on custom ops
… something around that level can likely stand the test of time
… these ~80 ops haven't grown
… much more stable than ONNX or TF Lite ops that grow faster
Nikhil: also want to bring up some caveats of XLA HLO
… the binary format is not backwards compatible
… this is clearly an issue for the Web
Chai: I believe issue #41 has an extensive discussion on this topic; XLA is also discussed in it, see https://github.com/webmachinelearning/webnn/issues/41
ningxin: back to compat study findings
… remaining open questions, in addition to NNAPI, are for MPS and BNNS
… in MPS, matMul can be supported through FullyConnected
… also MPSNDArrayMatrixMultiplication can handle matMul on MPS
… help from Apple requested
anssik: any other comments or feedback?
[none]
Handling unsupported OperandType
Handling unsupported OperandType (issue #36)
anssik: Discuss how to surface "datatype not supported by underlying platform" errors through the API
ningxin_hu: the initial idea is to throw an exception at compile time
Chai: I have another proposal; I believe in this case it should be an error code instead of an exception
… feedback provided on the issue
… essentially, you don't want everything to be an exception, only unrecoverable errors
… all other errors should be signaled using error codes
… the caller should fall back to compiling the model with different options; that is a recoverable error
… do we need an error code, or simply return null?
… no preference there, but prefer not to have it as an exception
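[Editor's note: a sketch of the recoverable-error pattern Chai describes, using hypothetical names; Model, compile, and CompilationOptions below are illustrative, not the WebNN API. An unsupported OperandType surfaces as a null result (or error code) that the caller can recover from by retrying with different options, rather than as a thrown exception.]

```typescript
// Hypothetical types, for illustration only.
interface CompilationOptions { operandType: "float32" | "float16"; }
interface Compilation {}
interface Model {
  // Resolves to null when the requested OperandType is not supported by
  // the underlying platform (a recoverable error), rather than throwing.
  compile(options: CompilationOptions): Promise<Compilation | null>;
}

async function compileWithFallback(model: Model): Promise<Compilation> {
  // Try the preferred type first, then fall back to a more widely
  // supported one; only throw when no option works (unrecoverable).
  let compilation = await model.compile({ operandType: "float16" });
  if (compilation === null) {
    compilation = await model.compile({ operandType: "float32" });
  }
  if (compilation === null) {
    throw new Error("model could not be compiled with any supported OperandType");
  }
  return compilation;
}
```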
Rafael: I agree with Chai, exceptions should be thrown when the developer has done something totally wrong, and that is not the case here
anssik: any prior examples from WebGL or WebGPU?
Rafael: yes, there's a bunch of those
<Rafael> https://www.khronos.org/registry/webgl/specs/latest/1.0/#5.2.1. You need to scroll down a little bit before you get to the failIfMajorPerformanceCaveat context attribute.
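[Editor's note: the WebGL precedent Rafael points to, shown concretely. With failIfMajorPerformanceCaveat set, context creation fails by returning null rather than throwing, and the caller recovers by retrying with relaxed options.]

```typescript
// Real WebGL behavior (see the link above): getContext() returns null
// instead of throwing when only a slow (e.g. software) implementation
// would be available; the caller treats this as a recoverable error.
const canvas = document.createElement("canvas");
let gl = canvas.getContext("webgl", { failIfMajorPerformanceCaveat: true });
if (gl === null) {
  // Recoverable: fall back to any available implementation.
  gl = canvas.getContext("webgl");
}
```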
Chai: I'll look at WebGL examples and see how to adopt those patterns in WebNN
Inference API to load and run a model
Revisit inference API to load and run a model (issue #41)
https://github.com/jbingham/web-ml-inference
anssik: Final call for feedback prior to adding this API into the group's scope
Rafael: do we need to change the charter to take on this work?
anssik: not necessarily, I'll look into this and try to see if we can add this new proposal in scope without changing the charter
anssik: does someone think this new proposal should NOT be incubated further?
[hearing no opposition]
Rafael: Microsoft would like to see load model API incubated in this CG
Jonathan: Google is in support of this proposal (obviously :-))
Jonathan: about the charter: if we want to update it, then based on the comments from Microsoft we should make a draft and not submit it before it has been reviewed
Rafael: that's how other CGs do it