Meeting minutes
New ops for WebNN API
anssik: as you know, we have added conv2d and matmul to the API
anssik: now is the time to discuss which ops we think should be added next
anssik: the proposal is to prioritize ops that address identified use cases
<anssik> WebNN API use cases
anssik: the space of use cases is expanding rapidly; we should also maintain these use cases that inform the API priorities, to ensure we target the future and not the past
anssik: currently the use cases informatively reference papers from publicly available locations
anssik: we should continue to bring to the group's attention the latest developments in this space
anssik: with that background, let's hear proposals for new ops to add to the WebNN API
anssik: the floor is open
<Zakim> anssik, you wanted to ask
ningxin_hu: I like the approach of identifying the models and use cases, and then adding the operators that are used in those models
<ningxin_hu> https://github.com/huningxin/webnn/wiki/WebNN-first-wave-models-and-ops
ningxin_hu: with that, I recommend we do some homework and make a table of the operators used by those models
ningxin_hu: each row lists an operator and the models that use it
ningxin_hu: this does not fully cover all use cases, but it does cover a lot of them
ningxin_hu: the table includes add, multiply, and some activations, as well as pooling ops (e.g. average pooling)
ningxin_hu: there are some that we already support, like convolution
ningxin_hu: I left some questions open in the table as well. That's my initial attempt to identify the ops
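[Illustrative sketch: how the first-wave ops in ningxin_hu's table might compose in a WebNN-style graph builder. The builder shape and op names (add, relu, averagePool2d) are assumptions for illustration, not settled API.]

    // Hypothetical WebNN-style graph builder; names are illustrative only.
    interface MLOperand {}
    interface MLGraphBuilder {
      conv2d(input: MLOperand, filter: MLOperand): MLOperand; // already in the spec
      add(a: MLOperand, b: MLOperand): MLOperand;             // proposed element-wise op
      relu(x: MLOperand): MLOperand;                          // proposed activation
      averagePool2d(x: MLOperand): MLOperand;                 // proposed pooling op
    }

    // A conv + bias + activation + pooling block built from first-wave ops:
    function block(b: MLGraphBuilder, input: MLOperand,
                   filter: MLOperand, bias: MLOperand): MLOperand {
      return b.averagePool2d(b.relu(b.add(b.conv2d(input, filter), bias)));
    }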
anssik: thank you ningxin_hu, this is very helpful
anssik: I think we could add this content into the spec as informative content
Action: ningxin_hu to submit a PR for the model-ops table into the spec or wiki
Nikhil: this is a bigger question; we've been spending quite a bit of time thinking about what's going on with TF ops
Nikhil: is there any other standard that is going to make its way into an open body?
Nikhil: I'm slowly coming to think we should target a lower level of abstraction
Nikhil: we'll be chasing the ops forever and they'll be iterating at a high rate. I know this is kind of a left turn, but this is at about the level of XLA
Nikhil: the 80 or so ops within XLA cover the majority of neural nets they've seen
Nikhil: I'd challenge us to consider a lower level of abstraction
Nikhil: there are only about 80, whereas in TF there are about 1,000
Nikhil: there are Tensor Compute Primitives and XLA, another Google one...
Nikhil: something at the same level as XLA
Nikhil: that's my current thinking. To answer your question regarding ops: I have intuitions, but I think we should look at real models and consider the next steps from there
Nikhil: interested in hearing people's thoughts
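[Illustrative sketch of the level-of-abstraction argument: a high-level op such as softmax needs no primitive of its own, because it decomposes into a handful of XLA-HLO-style primitives. Primitive names below are loosely modeled on XLA HLO; the implementation is an assumption for illustration.]

    // Why ~80 stable primitives can cover many high-level ops.
    type Tensor = number[];

    const exp = (x: Tensor): Tensor => x.map(Math.exp);
    const sub = (x: Tensor, s: number): Tensor => x.map((v) => v - s);
    const div = (x: Tensor, s: number): Tensor => x.map((v) => v / s);
    const reduceMax = (x: Tensor): number => Math.max(...x);
    const reduceSum = (x: Tensor): number => x.reduce((a, b) => a + b, 0);

    // softmax is exp/subtract/reduce/divide composed; no new primitive needed.
    function softmax(x: Tensor): Tensor {
      const e = exp(sub(x, reduceMax(x))); // subtract max for numerical stability
      return div(e, reduceSum(e));
    }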
RafaelCintron: I acknowledge what Nikhil is saying; my worry is that if we get too low level, we in the browser can't make the ops performant. But we'd have to see what they are. You say matmul and conv2d are included; what would the other ones be?
Nikhil: that's a great point. I'm not saying this is 100% what we should do, but I think it will be easier if we have a smaller opset
Nikhil: we could ride the wave of another standard that is looking at ops
Nikhil: the other approach is that you would never go upwards: you'd ship the lower-level ops and they would map directly to x86, ARM, etc. Whether we'd have to call something higher level than that is an interesting question
RafaelCintron: looking at the ops that ningxin_hu proposed, are those too high level?
paul_mcdaniel_msft: Nikhil, one thing that may help with low level vs. high level: in some scenarios we've hit in the past, if you go too low you hit real issues with optimizations
paul_mcdaniel_msft: there is an LSTM sitting in there, but there was no way of knowing this or inferring the structure. Keras added the information to the TF model to allow for optimization
Nikhil: yeah, there are a lot of problems here and that's a valid point. There is no silver bullet. That said, it is valid to have a lower-level, smaller opset that is pretty solid and not moving
paul_mcdaniel_msft: what the ONNX group has been trying to do is functions, which allow you to go down to these lower-level ops while retaining the higher-level concept of what the op is
paul_mcdaniel_msft: ningxin_hu is talking about activations, but you don't want them reduced to a mul and an add, because the GPU can do things when it knows it's an activation
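[Illustrative sketch of the fusion point: the same activation expressed as generic arithmetic vs. as a named op. Names are illustrative, not WebNN API.]

    // Decomposed: the backend sees only generic mul/add/clamp and cannot
    // recognize (and fuse) the pattern as an activation.
    function hardSigmoidDecomposed(x: number[]): number[] {
      return x
        .map((v) => v * 0.2)                      // mul
        .map((v) => v + 0.5)                      // add
        .map((v) => Math.min(1, Math.max(0, v))); // clamp
    }

    // Named op: a backend that knows "hardSigmoid" can dispatch one fused
    // kernel, or fuse it into the preceding convolution. Same semantics:
    function hardSigmoid(x: number[]): number[] {
      return x.map((v) => Math.min(1, Math.max(0, v * 0.2 + 0.5)));
    }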
daniel_smilkov: just wanted to add something to Nikhil's suggestion
daniel_smilkov: I don't think we want to go extremely low level; we want to find an intermediate level so that we don't have to add ops every 3 months
daniel_smilkov: XLA, for example, has LSTM information to allow for optimization on GPU/TPU, etc.
<paul_mcdaniel_msft> FYI: how the Linux Foundation is starting to craft "how to propose new ops"
<paul_mcdaniel_msft> https://github.com/onnx/onnx/blob/master/docs/AddNewOp.md
daniel_smilkov: the issue with high level is that we'll need to keep adding ops. Whatever we standardize today, a year from now there will be a new architecture with more ops that we'll need to add. It's just how the field grows
daniel_smilkov: my worry is about us keeping up
daniel_smilkov: if we don't keep up we'll have to add custom ops, at which point we've lost the hardware benefit
paul_mcdaniel_msft: that's a good point
paul_mcdaniel_msft: in the ONNX foundation we were worried that we'd have unchecked op growth, but it's slowed down and we've started to deprecate ops
paul_mcdaniel_msft: I don't know if we're at steady state, but we were worried about that as well and I don't think we've seen it
<paul_mcdaniel_msft> onnx model architectures
<paul_mcdaniel_msft> https://github.com/onnx/models
daniel_smilkov: the issue is we're trying to guess it
paul_mcdaniel_msft: I'd put it one step stronger: we're not guessing in ONNX
paul_mcdaniel_msft: we have over two dozen model architectures that have fed the opsets in ONNX
paul_mcdaniel_msft: it's worth looking at as a case study
daniel_smilkov: it would be a good exercise to go back a year and a half to understand what models wouldn't run due to missing ops in ONNX
anssik: is that something you all can look at? Or folks in ONNX
daniel_smilkov: yep we can look at that
Action: daniel_smilkov to look at the opsets from a year ago to see what couldn't be run on ONNX opsets
Nikhil: the goal wasn't to derail this, just to propose this thought, because others at Google are considering it
<Zakim> anssik, you wanted to re-ask this question from RafaelCintron: looking at the ops that ningxin_hu proposed are those too high level?
<anssik> https://github.com/huningxin/webnn/wiki/WebNN-first-wave-models-and-ops
Action: Nikhil and daniel_smilkov to compare ningxin_hu's ops and XLA to see which, if any, are too high level
<Nikhil> XLA HLO: https://www.tensorflow.org/xla/operation_semantics
ningxin_hu: I want to mention that, for the operator list, I did an initial investigation into XLA. I reviewed it and there are some overlaps, for example convolution and matmul
ningxin_hu: there are some other things such as batch normalization, concat, element-wise ops, etc.
<paul_mcdaniel_msft> Nikhil, do you have a link to that other lower-level op set you mentioned? was it "TCP"? thx!
ningxin_hu: we have something in common, so my initial take is to grow that set slowly, doing our due diligence by studying the models
ningxin_hu: so we can bring in other element-wise ops
ningxin_hu: I will make a PR that Nikhil and daniel_smilkov can comment on
RafaelCintron: back to the question from earlier: ONNX is not in the thousands but in the hundreds, and XLA seems to be similar. Maybe we can look at the intersection of those
RafaelCintron: an opset we can agree on that allows for hardware optimizations, but is not so high level that we're adding numerous ops every year
Rama: I feel it is actually helpful to look at these as tensor and scalar ops
Rama: you can identify key categories
Rama: you end up covering a large fraction of the ops
Rama: one way to limit it is to identify the op categories
Rama: I think if we do that exercise we'll find the commonalities across XLA and ONNX
Rama: the question that arises is: if you express ops this way, can you still reference them at a higher level?
Rama: there is value in doing that because of the existing implementations
Rama: while there is innovation in MLIR, hand-crafted implementations are normally better
Rama: so there is value in defining higher level ops
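[Illustrative sketch of Rama's categorization idea; the categories and member ops are assumptions, not an agreed taxonomy.]

    // Grouping candidate ops by category makes the commonalities across
    // XLA and ONNX easier to see; membership here is illustrative only.
    const opCategories: Record<string, string[]> = {
      elementWise: ["add", "mul", "exp", "max"],
      reductions: ["reduceSum", "reduceMax", "reduceMean"],
      contractions: ["matmul", "conv2d"],
      dataMovement: ["reshape", "concat", "slice", "broadcast"],
      normalization: ["batchNormalization"],
    };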
Action: paul_mcdaniel_msft to do an intersection between XLA & ONNX
Inference API to load and run a model
anssik: to quickly summarize where we are
<anssik> https://github.com/webmachinelearning/model-loader
anssik: we reached consensus on incubating this idea in this group
anssik: the next step is to start drafting the actual spec text
anssik: the use cases are already covered in the existing explainer
anssik: can anyone speak to prototyping? Possibly folks on the call
RafaelCintron: we can help with the mission of the load model API
RafaelCintron: I think the group needs to define a canonical format; we don't want fragmentation, which is not good for web developers
RafaelCintron: that obviously will bring us full circle to the ops
anssik: yeah, that's a good long-term goal, but the API shape doesn't require us to have that
anssik: our current charter blocks us from declaring a model format
RafaelCintron: really? we can't define a format, but we can define an opset?
<anssik> https://webmachinelearning.github.io/charter/#out-of-scope
<anssik> https://github.com/webmachinelearning/charter/issues
gregwhitworth: I recommend we do start the charter changes, but yes it shouldn't block the API
Action: anssik to create a strawman spec for others to iterate on
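[Illustrative sketch of the load-and-run API shape under discussion; the entry point and method names (loadModel, compute) are assumptions for illustration, not the explainer's settled design.]

    // Hypothetical load-and-run inference API.
    interface MLModel {
      compute(inputs: Record<string, Float32Array>): Promise<Record<string, Float32Array>>;
    }
    interface ML {
      loadModel(url: string): Promise<MLModel>;
    }

    async function classify(ml: ML, image: Float32Array) {
      // The format of the model behind this URL is exactly the open
      // question raised above: a canonical format/opset is undefined.
      const model = await ml.loadModel("/models/mobilenet");
      return model.compute({ input: image });
    }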
Virtual workshop agenda building
<Jonathan> Sorry, my internet is down. Catching up on meeting notes. @Anssi
<anssik> Virtual workshop agenda
anssik: I'd like this group to help figure out what to discuss on the virtual workshop
Jonathan: no worries :)
<Jonathan> Hard to wfh with no internet and poor cell phone reception
anssik: Nikhil, it would be good to have a talk about the different abstraction levels, such as XLA, MLIR, etc.
anssik: we would have short offline talks that allow for async consumption
anssik: Nikhil & daniel_smilkov do you all think that's a good topic?
Action: Nikhil and daniel_smilkov to put together abstraction talk proposal
anssik: I'm trying to figure out which of the topics would make good material for a virtual workshop
anssik: I'd expect us to try to cover 30-50% of the content. We would have four 90-minute sessions to discuss these topics
anssik: for example, have folks from Google and XLA answer questions and have a discussion around these topics
anssik: if you can, just vote for which talks you think should be created for offline consumption
anssik: for now just add a comment of support, if necessary I'll create a more formal voting process
anssik: we have settled on Zoom + Slack for infra
<anssik> Zoom @ W3C
<anssik> anssik: On that page there are tips on how to mitigate some of the known security/privacy issues raised recently
<anssik> anssik: Slack has been used in past workshops with success; it is more accessible to people than IRC
<anssik> ... I will set up our next CG call with Zoom so we can dogfood the system and iron out any kinks
anssik: we're trending for a time in June
Adjourn
<anssik> anssik: thanks Greg for scribing!!