W3C

– DRAFT –
WebML CG Teleconference – 17 September 2020


Attendees

Present
Anssi_Kostiainen, Baul_Eun, Chai_Chaoweeraprasit, Ganesan_Ramalingam, Ningxin_Hu, Paul_McDaniel, Ping_Yu, Rafael_Cintron, Zoltan_Kis
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Model Execution API

anssik: Discuss proposed improvements to the existing execution API

Model Execution API #87

ningxin_hu: current discussion is around Ping's first comment

today's execution API requires the user to provide an output buffer at execution time, so they need to know its shape and allocate a buffer of the right size
… Ping identified that as an ergonomic issue
… also in some models shape is dynamic, not known beforehand, as identified by Rama
… need to address these two issues in the current execution API
… also Chai proposed how to simplify the execution interface, under discussion currently
… Chai other perspectives?
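To make the ergonomic issue concrete, here is a minimal sketch (plain JS, no real WebNN calls; the conv2d shapes and layout are assumed for illustration) of the shape bookkeeping the caller currently has to do before execution:

```javascript
// Sketch of the ergonomic issue: with the current execution API the caller
// must derive the output shape themselves and pre-allocate the buffer.

// Compute the element count of a tensor from its dimensions.
function sizeOf(dims) {
  return dims.reduce((acc, d) => acc * d, 1);
}

// Hypothetical output-shape derivation for a conv2d with 'valid' padding
// and stride 1, assuming [N, H, W, C] input and [H, W, inC, outC] filter.
function conv2dOutputShape(inputDims, filterDims) {
  const [n, h, w] = inputDims;
  const [fh, fw, , outC] = filterDims;
  return [n, h - fh + 1, w - fw + 1, outC];
}

const inputDims = [1, 28, 28, 1];   // assumed input shape
const filterDims = [5, 5, 1, 6];    // assumed filter shape
const outDims = conv2dOutputShape(inputDims, filterDims); // [1, 24, 24, 6]
// The caller allocates the right-sized buffer before calling compute().
const outputBuffer = new Float32Array(sizeOf(outDims));
```

When the output shape is dynamic this derivation cannot even be done ahead of time, which is the second issue noted above.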

Chai: this is a long thread, but I'm glad we had this discussion
… to add to what Ningxin described, the original ask is very specific, but in the discussion later on we touched on other related topics
… this is becoming an API discussion, all interlinked
… supporting dynamic input shapes and output buffers is reasonable, we should support that
… we're almost in agreement with respect to these points
… if we look at the last replies in this issue, we can conclude we're on the same page
… related to the topic of how to simplify, I think Kenneth has raised many good points around why we need the compilation step
… raises good discussion points, it is good for perf to have a compilation step, the app has more control
… but we should be able to collapse compilation and execution into one
… another related topic, whether we want in the future support eager execution
… Ningxin told me at one point we had that discussion in this group, I wasn't in this group at that time
… if we want to do that, it should be natural
… to support eager we shouldn't need to change everything
… simplifying compilation and execution will help make the API more amenable to eager

anssik: can we split out other issues from this one? e.g. for eager we do have a past issue

Chai: good to have all the discussion in context in this issue

ningxin_hu: not sure about quantized support?

Chai: we have one issue for that, let's lean on it, there's no issue specifically for adding quantization support to the API
… can open a new issue or piggyback on the existing one?

ningxin_hu: today's spec has some kinds of quantization support, I suggest we comment on that issue
… see if we remove quant from Operand

Chai: fine either way

ningxin_hu: float32 and scalar are represented by Operand, that might be confusing

Chai: that'd be a separate issue

Ping: I'll get back to issue #87 and review the feedback

Rama: I just looked at #87, it seems execution of a subgraph has not been discussed yet?

anssik: has subgraph execution been discussed yet?

Chai: I need more information to understand this, in the new API compilation is immutable
… if you want to compile a subgraph, it creates a separate compilation
… compiling only part of the graph should already be solved, as a byproduct of where we arrive now

<ningxin_hu> Here are my comments regarding subgraph execution in the polyfill PR review: https://‌github.com/‌webmachinelearning/‌webnn-polyfill/‌pull/‌1#issuecomment-689939624

https://‌github.com/‌webmachinelearning/‌webnn/‌issues/‌87

Ping: current compilation is static, cannot be changed, my question is what if people want to execute a subgraph, extract the feature vector, execute somewhere in the middle before your softmax
… either the user has to execute the whole graph, or you allow people to execute the subgraph
… I think this is normal to have this situation
… or sometimes, you repeatedly feed a layer
… how can the API be able to handle this type of use cases?

Chai: thanks, this is more clear now
… can you explain the use case a bit more?
… are you thinking of transfer learning?

Ping: let's say MobileNet is not good for classification as is, but people use transfer learning
… you need to be able to execute toward a node that's not the output of the model
… is that clear or no?

Chai: I understand this now

ningxin_hu: actually, you raised this issue in the polyfill PR review, I had some comments there

<ningxin_hu> https://‌github.com/‌webmachinelearning/‌webnn-polyfill/‌pull/‌1#issuecomment-689939624

ningxin_hu: to my understanding you can create a subgraph directly
… e.g. in the LeNet example, you can create the graph up to before the matmul layers
… that's the feature extractor you can create, and reuse the weights there
… perhaps that satisfies the requirement?
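Ningxin's point can be sketched with a minimal mock graph builder (illustrative names only, not the WebNN API): because the developer holds the full topology, designating any intermediate operand as the output yields exactly the subgraph behind it.

```javascript
// Toy operand in a dataflow graph; each operand remembers its inputs.
class Operand {
  constructor(name, inputs = []) {
    this.name = name;
    this.inputs = inputs;
  }
}

// Collect all operands reachable from a chosen output operand:
// that reachable set is the subgraph you would compile.
function subgraphOf(output) {
  const seen = new Set();
  const visit = (op) => {
    if (seen.has(op.name)) return;
    seen.add(op.name);
    op.inputs.forEach(visit);
  };
  visit(output);
  return seen;
}

// A toy LeNet-like chain: input -> conv -> pool -> matmul -> softmax.
const input = new Operand('input');
const conv = new Operand('conv', [input]);
const pool = new Operand('pool', [conv]);
const matmul = new Operand('matmul', [pool]);
const softmax = new Operand('softmax', [matmul]);

// Picking 'pool' as the output gives only the feature-extractor part;
// the matmul and softmax layers are simply never reached.
const features = subgraphOf(pool);
```

The open question in the discussion is whether this build-it-yourself path is enough, or whether the execution API itself should let callers name intermediate operands of an already-compiled graph.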

Ping: that could be the resolution?
… many users of the models do not have an ability to create another model, they just take a pre-trained model and use it as is, fine-tune its feature vector
… this scenario should be solved by execution API itself, not by creating a new model
… if we can support subgraph execution, there are other cases where people want to execute just one layer
… fine-tuning is a widely used scenario

Chai: I think this is the same request as eager execution
… the requirement is very similar to making the API support eager mode

Ping: to me they are different
… in eager I'm executing much faster, I'm not dynamically creating a graph, I have a pre-trained model, I want to compile it to make it faster
… that's part of the origin model

ningxin_hu: my understanding, as illustrated in the example, is that the developer has the flexibility to create the graph as they want
… the whole topology is available, it is up to the developer to create any model and compile and execute it
… do you want to pick some intermediate nodes to get their outputs?

Ping: like you described, but more flexible way to define input and output nodes
… WebNN does not necessarily need that since we are use case-driven in the API design
… as a JS dev, I'd want to execute a part of a pre-trained model

Chai: you want the ability to execute only part of the pre-trained model? If so, I think the latest changes discussed in this issue address this

Rama: the interface is sufficient, but can you specify intermediate values as inputs and outputs?

Chai: at some point we discussed optional output argument on the execute method

Rama: the API signature does not change, but can we assume the outputs can be intermediate? that impacts implementations

Rama: the underlying implementation must do something smart if it needs to stop somewhere in the middle
… e.g. remove unnecessary nodes before execution

Ping: one question regarding output shape
… now compilation is done prior to execution, without knowing the subgraph how can it satisfy the new execution plan(?)

ningxin_hu: the compute method can return the result with an output dictionary that has dimensions and buffer, you can specify the input dimensions and shape

<Chai> afk

Ping: is it true that every time I ask, you compile?

ningxin_hu: in today's spec, an Operand can have a negative value in one dim to say it's not specified
… when you compute, you specify the input dimensions in detail with a concrete shape, and compute will infer the output shape and return it to you
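The dynamic-dimension convention described here can be sketched in plain JS (function name and error handling are illustrative, not part of the spec): a negative dimension in the declared shape is a placeholder resolved against the concrete input shape at compute time.

```javascript
// Resolve a declared shape (negative value = unspecified dimension)
// against the concrete shape supplied at compute time.
function resolveShape(declared, concrete) {
  if (declared.length !== concrete.length) {
    throw new Error('rank mismatch');
  }
  return declared.map((d, i) => {
    // A specified dimension must match the concrete input exactly.
    if (d >= 0 && d !== concrete[i]) {
      throw new Error(`dimension ${i} mismatch: ${d} vs ${concrete[i]}`);
    }
    // An unspecified (-1) dimension takes the concrete value.
    return concrete[i];
  });
}

// Declared at graph-build time with an unknown batch size.
const declared = [-1, 224, 224, 3];
// At compute time the input arrives with a concrete batch of 4.
const resolved = resolveShape(declared, [4, 224, 224, 3]); // [4, 224, 224, 3]
```

Output shapes would then be inferred from the resolved input shape and returned with the result, so the caller never has to pre-compute them.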

Ping: how we usually handle that is that compilation is part of the execution
… compilation is cached
… also the shape does not need to be set prior
… the main concern was that there are a lot of pre-steps needed prior to execution, which may not be known, and you also need to find out the output shape
… because the shape is dynamic and it's tedious for the user to do that

ningxin_hu: exactly, you're discussing today's execution API and we came up with a solution to that issue
… no need to know the shape of the output beforehand

Packing operations for gemm / matmul

anssik: Discuss optional packing ops and related optimization opportunities

Packing operations for gemm / matmul #86

Rama: the use of constant operands to operations like GEMM, matmul, where there is an opportunity to transform the layout as an optimization step
… question was, should this be explicitly exposed through the API?

<Chai> back now

Rama: my position is better left as an implementation detail

<ningxin_hu> +1

Chai: I agree with Rama
… most of the issues re packing have been addressed
… process of packing is very hardware specific

Chai: I'll respond on the issue to comment on the group's position
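As a toy illustration of why packing can stay an implementation detail (this is a generic sketch, not tied to any backend or to WebNN itself): a backend can pre-transpose a constant weight matrix so the matmul inner loop walks both operands contiguously, without changing the result the caller observes.

```javascript
// Pack a row-major (rows x cols) matrix as its transpose.
function transpose(b, rows, cols) {
  const t = new Float32Array(rows * cols);
  for (let i = 0; i < rows; i++)
    for (let j = 0; j < cols; j++)
      t[j * rows + i] = b[i * cols + j];
  return t;
}

// Matmul of (m x k) A with (k x n) B, where B was packed as B^T:
// the inner loop reads both A's row and the packed B's row contiguously,
// which is the cache-friendly access pattern packing buys.
function matmulPacked(a, bT, m, k, n) {
  const c = new Float32Array(m * n);
  for (let i = 0; i < m; i++)
    for (let j = 0; j < n; j++) {
      let sum = 0;
      for (let p = 0; p < k; p++)
        sum += a[i * k + p] * bT[j * k + p];
      c[i * n + j] = sum;
    }
  return c;
}

const A = Float32Array.from([1, 2, 3, 4]); // 2x2
const B = Float32Array.from([5, 6, 7, 8]); // 2x2
const C = matmulPacked(A, transpose(B, 2, 2), 2, 2, 2);
// C = [19, 22, 43, 50], identical to an unpacked matmul
```

The packing happens behind the API boundary, which is the group's position above: the observable result is unchanged, and the optimal packed layout is hardware specific.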

Fingerprinting

anssik: Discuss possible fingerprinting vectors and mitigations

Fingerprinting via matmul #85

"an efficient matmul implementation can be fingerprinted to determine hardware capabilities."

ningxin_hu: some Intel hw was mentioned, so I can follow up from that perspective
… another comment, Kenneth mentions 8bit multiplication, this is related to our quant design
… related to our quantization operator design, as discussed in packing, we can hide this in implementation
… let me follow up from Intel hardware perspective

anssik: any other comments on the fingerprinting issue?

[none heard]

WebNN polyfill and samples

anssik: Continue discussing review feedback and suggestions for the foundational implementation and LeNet sample

Add the foundation implementation #1

Add LeNet example #1

ningxin_hu: addressed the comments raised by Ping for the WebNN polyfill
… also created separate issues for a couple of them
… Node.js support was a topic in the workshop, last week enabled the polyfill for Node.js, running Mocha tests
… updated the LeNet example, added a table of the LeNet topology

TAG review

TAG review #89

https://‌github.com/‌webmachinelearning/‌webnn/‌blob/‌master/‌explainer.md

anssik: TAG review would depend on a more complete explainer https://‌github.com/‌webmachinelearning/‌webnn/‌blob/‌master/‌explainer.md

Chai: the explainer would likely need some code snippets to explain the API, but given we're changing the API a little, it's better to land those API changes first and then update the explainer

anssik: Sangwhan is a good person to review our explainer PRs

ningxin_hu: proposal to not block the polyfill and example PRs on spec changes
… then revise them based on the new API design

<Chai> that works

anssik: any concerns with that proposal from Ningxin?

<Chai> i can help with the explainer

[no concerns]

PROPOSED RESOLUTION: Land WebNN polyfill and samples PRs when existing review comments have been addressed, do not block on in-flight spec PRs and API design discussion

Resolution: Land WebNN polyfill and samples PRs when existing review comments have been addressed, do not block on in-flight spec PRs and API design discussion

Adjourn

Summary of resolutions

  1. Land WebNN polyfill and samples PRs when existing review comments have been addressed, do not block on in-flight spec PRs and API design discussion
Minutes manually created (not a transcript), formatted by scribe.perl version 123 (Tue Sep 1 21:19:13 2020 UTC).
