Meeting minutes
conv2d op definition
PR reviewed and ready to land
PROPOSED RESOLUTION: Merge PR#43 to add conv2d definition to the spec
ningxin_hu: I see Daniel and Nikhil online, want to check that the signature looks good
HTML preview of conv2d: https://pr-preview.s3.amazonaws.com/huningxin/webnn/pull/43.html#api-neuralnetworkcontext-conv2d
Nikhil: we haven't had time yet to look at this
anssik: ok to merge and evolve in subsequent PRs?
Nikhil: sounds good
Daniel: the same, looks good, will review the details later
RESOLUTION: Merge PR#43 to add conv2d definition to the spec
matMul op definition
[op compatibility] matMul (issue #27)
ningxin_hu: the compat table for the matMul op is in GitHub in the familiar location: https://github.com/webmachinelearning/webnn/blob/master/op_compatibility/matmul.md
… Chai helped with DirectML mapping
… NNAPI does not have matMul defined, so I looked at TF Lite matMul; references can be found in the doc
[discussing details of dimensionality]
ningxin_hu: open question for NNAPI: for N-D output when N > 2, is it correct to map the TF Lite Pack op to the NNAPI ANEURALNETWORKS_RESHAPE and ANEURALNETWORKS_CONCATENATION ops?
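[Editor's note: a minimal sketch of the numpy-style dimensionality rules under discussion, for context only; the function name and signature below are illustrative, not part of the spec or the compat table. The last two dimensions are multiplied as matrices and any leading dimensions are treated as broadcastable batch dimensions, which is exactly the case an NNAPI lowering would have to reassemble from 2-D primitives.]

```typescript
// Illustrative sketch: output shape of an N-D matMul with numpy-style
// broadcasting of the leading (batch) dimensions. Not a spec definition.
function matMulOutputShape(a: number[], b: number[]): number[] {
  if (a.length < 2 || b.length < 2) {
    throw new Error("1-D inputs omitted here for brevity");
  }
  const [m, kA] = a.slice(-2);
  const [kB, n] = b.slice(-2);
  if (kA !== kB) {
    throw new Error(`inner dimensions do not match: ${kA} vs ${kB}`);
  }
  // Broadcast the leading (batch) dimensions, right-aligned.
  const batchA = a.slice(0, -2);
  const batchB = b.slice(0, -2);
  const batch: number[] = [];
  const rank = Math.max(batchA.length, batchB.length);
  for (let i = 0; i < rank; i++) {
    const dA = batchA[batchA.length - 1 - i] ?? 1;
    const dB = batchB[batchB.length - 1 - i] ?? 1;
    if (dA !== dB && dA !== 1 && dB !== 1) {
      throw new Error(`batch dimensions do not broadcast: ${dA} vs ${dB}`);
    }
    batch.unshift(Math.max(dA, dB));
  }
  return [...batch, m, n];
}

// e.g. matMulOutputShape([2, 3, 4], [4, 5]) -> [2, 3, 5]
```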
anssik: can Nikhil find someone from Google to help with NNAPI specifics?
Nikhil: can look for a contact
Chai: Ningxin starts from numpy, but I'm not sure if that maps nicely to all these APIs
… to multiply matrices, DNNL takes a multiply-add approach
… question to all: should this be considered instead, or in addition?
… also, does this group want a single matMul that can handle N-dim tensors, or a generic matMul and a generic gemm?
ningxin_hu: Chai's right re DNNL, but want to hear Nikhil's feedback
Daniel: the reason for not fusing things at the op level is that we want to reduce the number of ops in the spec
… if we allow fusion capability we might have more ops than we want
… we want ops to do fewer things, and let the compiler fuse things under the hood
Daniel: if we allow fusing on a logical level, we're leaking hw optimizations
… should keep bias as its own op
Chai: agree with Daniel's philosophy, a great topic
… the context in which the compiler is able to do auto fusion is, if one is not careful, hard to discover
… say you have a graph that has multiply-add as two different ops; most hw is able to fuse this, piggybacking the bias on the multiply
… the question is how the sw level knows this context can be fused
… one idea that came out of this struggle in ONNX is called an ONNX function
… in that you define the whole op as a multiply-add function(?); then if the hw is not able to support the fused multiply-add, it gets broken down into the two ops it is able to support
… just wanted to make a data point; a great topic, everyone is trying to address this problem at the native level
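[Editor's note: a sketch of the decomposition pattern discussed above, assuming a hypothetical graph-builder interface; the names GraphBuilder, matmul, add, and gemm are illustrative, not the WebNN API. The composite multiply-add is defined in terms of the two smaller ops, so a backend with a fused kernel can pattern-match and fuse it, while any other backend simply runs the two ops it already supports.]

```typescript
// Hypothetical builder interface, for illustration only.
interface Operand {}
interface GraphBuilder {
  matmul(a: Operand, b: Operand): Operand;
  add(a: Operand, b: Operand): Operand;
}

// In the spirit of an ONNX "function": the composite is written once in
// terms of the smaller ops; fusing (or not) is left to the compiler/driver.
function gemm(builder: GraphBuilder, a: Operand, b: Operand, bias: Operand): Operand {
  const product = builder.matmul(a, b); // op 1: matrix multiply
  return builder.add(product, bias);    // op 2: bias addition
}
```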
Nikhil: MLIR takes the function approach, pushing it entirely to the compiler
… when downleveling, there are two components, folding weights and doing fusing; it happens entirely in the compiler
… I want to also throw another thing out there
… something we've been thinking about at Google
… op-level is the CG's current approach, another project we should pay attention to is XLA
… lower level than ops, higher than hardware
… designed for GPUs, has around 80 ops
… people on the XLA team say they can run every NN with this op set
… a lot of backends for XLA
… currently a Google project: https://www.tensorflow.org/xla/architecture
<daniel_smilkov> https://www.tensorflow.org/xla/architecture
<daniel_smilkov> https://www.tensorflow.org/xla/operation_semantics
Daniel: the primary reason we're starting to be interested in XLA HLO is that it tries to remove the reliance on custom ops
… something around that level can likely stand the test of time
… these ~80 ops haven't grown
… much more stable than ONNX or TF Lite ops that grow faster
Nikhil: also want to bring up some caveats of XLA HLO
… the binary format is not backwards compatible
… this is clearly an issue for the Web
Chai: I believe issue #41 has an extensive discussion on this topic; XLA is also discussed in it, see https://github.com/webmachinelearning/webnn/issues/41
ningxin: back to compat study findings
… remaining open questions, in addition to NNAPI, are for MPS and BNNS
… in MPS, matMul can be supported through FullyConnected
… also MPSNDArrayMatrixMultiplication can handle matMul on MPS
… help from Apple requested
anssik: any other comments or feedback?
[none]
Handling unsupported OperandType
Handling unsupported OperandType (issue #36)
anssik: Discuss how to surface "datatype not supported by underlying platform" errors through the API
ningxin_hu: the initial idea is to throw an exception at compile time
Chai: I have another proposal; I believe in this case it should be an error code instead of an exception
… feedback provided on the issue
… essentially, you don't want everything to be an exception, only unrecoverable errors
… all other errors should be signaled using error codes
… the caller should fall back to compiling the model with different options; that is a recoverable error
… do we need an error code, or simply return null?
… no preference there, but prefer not to have it as an exception
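[Editor's note: a sketch of the recoverable-error pattern Chai describes, using hypothetical names; Model, compile, and CompilationOptions below are illustrative, not the WebNN API. An unsupported OperandType surfaces as a null result (or error code) that the caller can recover from by retrying with different options, rather than as a thrown exception.]

```typescript
// Hypothetical types, for illustration only.
interface CompilationOptions { operandType: "float32" | "float16"; }
interface Compilation {}
interface Model {
  // Resolves to null when the requested OperandType is not supported by
  // the underlying platform (a recoverable error), rather than throwing.
  compile(options: CompilationOptions): Promise<Compilation | null>;
}

async function compileWithFallback(model: Model): Promise<Compilation> {
  // Try the preferred type first, then fall back to a more widely
  // supported one; only throw when no option works (unrecoverable).
  let compilation = await model.compile({ operandType: "float16" });
  if (compilation === null) {
    compilation = await model.compile({ operandType: "float32" });
  }
  if (compilation === null) {
    throw new Error("model could not be compiled with any supported OperandType");
  }
  return compilation;
}
```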
Rafael: I agree with Chai, exceptions should be thrown when the developer has done something totally wrong, and that is not the case here
anssik: any prior examples from WebGL or WebGPU?
Rafael: yes, there's a bunch of those
<Rafael> https://www.khronos.org/registry/webgl/specs/latest/1.0/#5.2.1. You need to scroll down a little bit before you get to the failIfMajorPerformanceCaveat context attribute.
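[Editor's note: the WebGL precedent Rafael points to, shown concretely. With failIfMajorPerformanceCaveat set, context creation fails by returning null rather than throwing, and the caller recovers by retrying with relaxed options.]

```typescript
// Real WebGL behavior (see the link above): getContext() returns null
// instead of throwing when only a slow (e.g. software) implementation
// would be available; the caller treats this as a recoverable error.
const canvas = document.createElement("canvas");
let gl = canvas.getContext("webgl", { failIfMajorPerformanceCaveat: true });
if (gl === null) {
  // Recoverable: fall back to any available implementation.
  gl = canvas.getContext("webgl");
}
```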
Chai: I'll look at WebGL examples and see how to adopt those patterns in WebNN
Inference API to load and run a model
Revisit inference API to load and run a model (issue #41)
https://github.com/jbingham/web-ml-inference
anssik: Final call for feedback prior to adding this API into the group's scope
Rafael: do we need to change the charter to take on this work?
anssik: not necessarily, I'll look into this and try to see if we can add this new proposal in scope without changing the charter
anssik: does someone think this new proposal should NOT be incubated further?
[hearing no opposition]
Rafael: Microsoft would like to see load model API incubated in this CG
Jonathan: Google is in support of this proposal (obviously :-))
Jonathan: about the charter: if we want to update it, then based on the comments from Microsoft we should make a draft and not submit it before it has been reviewed
Rafael: that's how other CGs do it