13:58:11 RRSAgent has joined #webmachinelearning
13:58:11 logging to https://www.w3.org/2020/09/17-webmachinelearning-irc
13:58:16 Zakim has joined #webmachinelearning
13:58:24 RRSAgent, make logs public
13:58:27 zkis has joined #webmachinelearning
13:58:34 Meeting: WebML CG Teleconference – 17 September 2020
13:58:41 Chair: Anssi
13:58:46 Agenda: https://github.com/webmachinelearning/meetings/blob/master/telcons/2020-09-17-agenda.md
13:58:51 Scribe: Anssi
13:58:56 scribeNick: anssik
13:59:04 Present+ Anssi_Kostiainen
13:59:08 RRSAgent, draft minutes v2
13:59:08 I have made the request to generate https://www.w3.org/2020/09/17-webmachinelearning-minutes.html anssik
13:59:38 ningxin_hu has joined #webmachinelearning
13:59:54 Present+ Ningxin_Hu
14:00:43 RRSAgent, draft minutes v2
14:00:43 I have made the request to generate https://www.w3.org/2020/09/17-webmachinelearning-minutes.html anssik
14:00:45 baul_eun has joined #webmachinelearning
14:01:42 Present+ Ganesan_Ramalingam
14:01:50 Present+ Baul_Eun
14:02:00 rama has joined #webmachinelearning
14:02:22 Present+ Paul_McDaniel
14:02:55 Present+ Rafael_Cintron
14:05:02 Present+ Chai_Chaoweeraprasit
14:05:30 TOPIC: Model Execution API
14:05:58 anssik: Discuss proposed improvements to the existing execution API
14:06:04 -> https://github.com/webmachinelearning/webnn/issues/87 Model Execution API #87
14:06:19 Chai has joined #webmachinelearning
14:06:30 present+ Zoltan_Kis
14:06:59 ningxin_hu: the current discussion is around Ping's first comment
14:07:25 ... today's execution API requires the user to provide an output buffer at execution time, so they need to know its shape and allocate a buffer of the right size
14:07:38 ... Ping identified that as an ergonomic issue
14:07:58 ... also, in some models the shape is dynamic and not known beforehand, as identified by Rama
14:08:12 ... we need to address these two issues in the current execution API
14:08:36 ...
also Chai proposed how to simplify the execution interface, under discussion currently
14:08:41 ... Chai, other perspectives?
14:09:10 Chai: this is a long thread, but I'm glad we had this discussion
14:09:35 ... to add to what Ningxin described, the original ask is very specific, but later in the discussion we touch on other related topics
14:09:46 ... this is becoming an API discussion, all interlinked
14:10:05 ... supporting dynamic input shapes and output buffers is reasonable, we should support that
14:10:16 ... we're almost in agreement with respect to these points
14:10:40 ... if we look at the last replies in this issue, we can conclude we're on the same page
14:11:07 ... related to the topic of how to simplify, I think Kenneth has raised many good points around why we need the compilation step
14:11:26 ... raises good discussion points; it is good for perf to have a compilation step, the app has more control
14:11:41 ... but we should be able to collapse compilation and execution into one
14:11:59 ... another related topic is whether we want to support eager execution in the future
14:12:14 ... Ningxin told me we had that discussion in this group at one point; I wasn't in the group at that time
14:12:21 ... if we want to do that, it should be natural
14:12:34 ... to support eager we shouldn't need to change everything
14:13:00 ... simplifying compilation and execution will make the API more amenable to eager
14:14:14 anssik: can we split other issues out of this one? e.g. for eager we have a past issue
14:14:32 Chai: good to have all the discussion in context in this issue
14:14:48 ningxin_hu: not sure about quantized support?
14:15:24 Chai: we have one issue for that, let's lean on it; there isn't specifically an issue for adding quantization support to the API
14:15:35 ... can we open a new issue or piggyback on the existing one?
14:15:57 ningxin_hu: today's spec has some kinds of quantization support, I suggest we comment on that issue
14:16:09 ...
see if we remove quant from Operand
14:16:13 Chai: fine either way
14:16:46 ningxin_hu: float32 and scalar are represented by Operand, which might be confusing
14:16:53 Chai: that'd be a separate issue
14:17:40 Present+ Ping_Yu
14:18:52 Ping: I'll get back to issue #87 and review the feedback
14:19:09 ping_yu has joined #webmachinelearning
14:19:18 Rama: I just looked at #87; it seems execution of a subgraph has not been discussed yet?
14:19:36 q+
14:20:08 anssik: has subgraph execution been discussed yet?
14:20:35 Chai: I need more information to understand this; in the new API compilation is immutable
14:20:51 ... if you want to compile a subgraph, it creates a separate compilation
14:21:14 ... compiling only part of the graph should already be solved, as a byproduct of where we arrive now
14:21:16 Here are my comments regarding subgraph execution in the polyfill PR review: https://github.com/webmachinelearning/webnn-polyfill/pull/1#issuecomment-689939624
14:21:33 q?
14:21:37 ack Chai
14:21:45 https://github.com/webmachinelearning/webnn/issues/87
14:21:55 RafaelCintron has joined #webmachinelearning
14:22:48 Ping: the current compilation is static, it cannot be changed; my question is what if people want to execute a subgraph, extract the feature vector, execute somewhere in the middle before your softmax
14:23:08 ... either the user has to execute the whole graph, or you allow people to execute the subgraph
14:23:15 ... I think it is normal to have this situation
14:23:28 ... or sometimes, you repeatedly feed a layer
14:23:44 q+
14:23:48 ... how can the API handle this type of use case?
14:23:55 Chai: thanks, this is clearer now
14:24:04 ... can you explain the use case a bit more?
14:24:12 ... are you thinking of transfer learning?
14:24:45 Ping: let's say MobileNet is not good for classification as is, but people use transfer learning
14:24:59 ... you need to be able to execute toward a node that's not the output of the model
14:25:03 ... is that clear or not?
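[To make the subgraph use case concrete: a minimal illustrative sketch in JavaScript of compiling only the part of a graph needed to reach an intermediate node. This is a toy graph representation, not the WebNN API; all names such as `compileSubgraph` are hypothetical.]

```javascript
// Toy graph: nodes are { name, op, inputs }. compileSubgraph()
// keeps only the nodes reachable from the requested output,
// mirroring "execute toward a node that's not the output".
function compileSubgraph(nodes, outputName) {
  const byName = new Map(nodes.map((n) => [n.name, n]));
  const needed = new Set();
  (function visit(name) {
    if (needed.has(name)) return;
    needed.add(name);
    const node = byName.get(name);
    if (node) node.inputs.forEach(visit);
  })(outputName);
  // Unneeded nodes (e.g. the softmax head) are pruned away.
  return nodes.filter((n) => needed.has(n.name));
}

// A MobileNet-like chain: a feature extractor feeding a classifier.
const model = [
  { name: 'input',    op: 'input',   inputs: [] },
  { name: 'features', op: 'conv',    inputs: ['input'] },
  { name: 'logits',   op: 'matmul',  inputs: ['features'] },
  { name: 'probs',    op: 'softmax', inputs: ['logits'] },
];

// Executing only up to the feature vector:
const sub = compileSubgraph(model, 'features');
console.log(sub.map((n) => n.name)); // [ 'input', 'features' ]
```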
14:25:08 Chai: I understand this now
14:25:25 ningxin_hu: actually, you raised this issue in the polyfill PR review, I had some comments there
14:25:25 https://github.com/webmachinelearning/webnn-polyfill/pull/1#issuecomment-689939624
14:25:44 ... to my understanding you can create a subgraph directly
14:26:07 ... e.g. in the LeNet example, you can create the graph before the matmul layers
14:26:19 ... that's the feature extractor you can create, and reuse the weights there
14:26:29 ... perhaps that satisfies the requirement?
14:26:36 Ping: that could be the resolution?
14:27:05 ... many users of the models do not have the ability to create another model; they just take a pre-trained model and use it as is, fine-tuning its feature vector
14:27:24 ... this scenario should be solved by the execution API itself, not by creating a new model
14:27:47 ... if we can support subgraph execution, there are other cases where people want to execute just one layer
14:28:02 ... fine-tuning is a widely used scenario
14:28:16 Chai: I think this is the same request as eager execution
14:28:42 ... the requirement is very similar to making the API support eager mode
14:28:53 Ping: to me they are different
14:29:20 ... in eager I'm executing much faster, I'm not dynamically creating a graph; I have a pre-trained model, I want to compile it to make it faster
14:29:32 ... that's part of the original model
14:30:01 ningxin_hu: my understanding, as illustrated in the example, is that the developer has the flexibility to create the graph as they want
14:30:30 ... the whole topology is available; it is up to the developer to create any model and compile and execute it
14:30:49 ... do you want to pick some intermediate nodes to get their outputs?
14:31:10 Ping: like you described, but a more flexible way to define input and output nodes
14:31:35 ... WebNN does not necessarily need that, since we are use case-driven in the API design
14:32:19 ...
as a JS dev, I'd want to execute a part of a pre-trained model
14:32:55 Chai: you want the ability to execute only part of the pre-trained model? If so, I think the latest changes discussed in this issue address this
14:33:56 Rama: the interface is sufficient, but can you specify intermediate values as inputs and outputs?
14:34:14 Chai: at some point we discussed an optional output argument on the execute method
14:34:35 Rama: the API signature does not change, but can we assume the outputs can be intermediate? that impacts implementations
14:35:02 Rama: the underlying implementation must do something smart if it needs to stop somewhere in the middle
14:35:20 ... e.g. remove unnecessary nodes before execution
14:35:36 Ping: one question regarding output shape
14:36:31 ... now compilation is done prior to execution; without knowing the subgraph, how do we satisfy the new execution plan(?)
14:36:57 ningxin_hu: the compute method can return the result with an output dictionary that has dimensions and buffer; you can specify the input dimensions and shape
14:37:18 afk
14:37:21 Ping: is it true, every time I ask you to compile
14:37:22 q?
14:37:38 ack ningxin_hu
14:38:17 ningxin_hu: in today's spec, an Operand can have a negative value in one dim to indicate it is not specified
14:38:44 ... when you compute, you specify the dimensions of the input in a concrete shape, and compute will infer the output shape and return it to you
14:38:59 Ping: how we usually handle that is that compilation is part of the execution
14:39:04 ... the compilation is cached
14:39:46 ... also the shape does not need to be set beforehand
14:40:21 ... the main concern was that there are a lot of pre-steps needed prior to execution that may not be known; you also need to find out what the output shape is
14:40:35 ... because the shape is dynamic and it's tedious for the user to do that
14:41:06 ningxin_hu: exactly, you're describing today's execution API, and we came up with a solution to that issue
14:41:15 ...
no need to know the shape of the output beforehand
14:42:03 TOPIC: Packing operations for gemm / matmul
14:42:16 anssik: Discuss optional packing ops and related optimization opportunities
14:42:22 -> https://github.com/webmachinelearning/webnn/issues/86 Packing operations for gemm / matmul #86
14:43:31 Rama: the use of constant operands to operations like GEMM and matmul presents an opportunity to transform the layout as an optimization step
14:43:48 ... the question was, should this be explicitly exposed through the API?
14:43:59 back now
14:43:59 ... my position is that it's better left as an implementation detail
14:44:16 +1
14:44:42 Chai: I agree with Rama
14:44:58 ... most of the issues re packing have been addressed
14:45:08 ... the process of packing is very hardware specific
14:46:22 Chai: I'll respond on the issue to comment on the group's position
14:46:41 TOPIC: Fingerprinting
14:46:54 anssik: Discuss possible fingerprinting vectors and mitigations
14:46:59 -> https://github.com/webmachinelearning/webnn/issues/85 Fingerprinting via matmul #85
14:47:29 "an efficient matmul implementation can be fingerprinted to determine hardware capabilities."
14:48:35 q+
14:49:02 ack ningxin_hu
14:49:29 ningxin_hu: some Intel hardware was mentioned, so I can follow up from that perspective
14:49:53 ... another comment: Kenneth mentions 8-bit multiplication, which is related to our quant design
14:50:52 ... related to our quantization operator design, as discussed for packing, we can hide this in the implementation
14:51:02 ... let me follow up from the Intel hardware perspective
14:51:23 anssik: any other comments on the fingerprinting issue?
14:51:27 [none heard]
14:51:32 TOPIC: WebNN polyfill and samples
14:51:46 anssik: Continue discussing review feedback and suggestions for the foundational implementation and LeNet sample
14:51:51 -> https://github.com/webmachinelearning/webnn-polyfill/pull/1 Add the foundation implementation #1
14:51:56 -> https://github.com/webmachinelearning/webnn-samples/pull/1 Add LeNet example #1
14:52:25 ningxin_hu: addressed the comments raised by Ping for the WebNN polyfill
14:52:32 ... also created separate issues for a couple of them
14:53:52 ... Node.js support was a topic in the workshop; last week I enabled the polyfill for Node.js, running Mocha tests
14:55:15 q?
14:55:38 ... updated the LeNet example, added a table of the LeNet topology
14:56:42 TOPIC: TAG review
14:56:55 -> https://github.com/webmachinelearning/webnn/issues/89 TAG review #89
14:57:59 https://github.com/webmachinelearning/webnn/blob/master/explainer.md
14:59:05 anssik: TAG review would depend on a more complete explainer https://github.com/webmachinelearning/webnn/blob/master/explainer.md
15:00:28 Chai: the explainer would likely need some code snippets to explain the API, but given we're changing the API a little, it's better to land those API changes first and then update the explainer
15:00:32 q+
15:00:52 ack ningxin_hu
15:01:05 anssik: Sangwhan is a good person to review our explainer PRs
15:01:22 ningxin_hu: proposal to not block the polyfill and example PRs on spec changes
15:01:36 ... then revise them based on the new API design
15:02:33 that works
15:02:40 anssik: any concerns with that proposal from Ningxin?
15:02:46 i can help with the explainer
15:02:48 [no concerns]
15:03:29 PROPOSED RESOLUTION: Land WebNN polyfill and samples when existing review comments have been addressed, do not block on in-flight spec PRs and API design discussion
15:03:43 PROPOSED RESOLUTION: Land WebNN polyfill and samples PRs when existing review comments have been addressed, do not block on in-flight spec PRs and API design discussion
15:04:10 RESOLUTION: Land WebNN polyfill and samples PRs when existing review comments have been addressed, do not block on in-flight spec PRs and API design discussion
15:04:59 TOPIC: Adjourn
15:05:04 RRSAgent, draft minutes v2
15:05:04 I have made the request to generate https://www.w3.org/2020/09/17-webmachinelearning-minutes.html anssik
17:27:36 Zakim has left #webmachinelearning
17:38:38 zkis has joined #webmachinelearning
19:01:11 Alan has joined #webmachinelearning
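[A minimal sketch of the output-shape inference discussed under the Model Execution API topic: a graph declares a dynamic dimension (negative value, per the current draft spec), and a compute call infers the concrete output shape and returns it with the result, so the caller need not preallocate a buffer. Illustrative JavaScript only, not the WebNN API; `inferMatmulShape`, `resolveShape`, and `compute` are hypothetical names.]

```javascript
// matmul output shape: [m, k] x [k, n] -> [m, n]
function inferMatmulShape(aShape, bShape) {
  if (aShape[1] !== bShape[0]) {
    throw new Error(`incompatible shapes: ${aShape} x ${bShape}`);
  }
  return [aShape[0], bShape[1]];
}

// A declared shape may use -1 for a dimension resolved at compute time.
function resolveShape(declaredShape, actualShape) {
  return declaredShape.map((d, i) => (d === -1 ? actualShape[i] : d));
}

function compute(declaredInputShape, actualInputShape, weightShape) {
  const inputShape = resolveShape(declaredInputShape, actualInputShape);
  const dimensions = inferMatmulShape(inputShape, weightShape);
  // The result carries its own inferred dimensions: no preallocation.
  const buffer = new Float32Array(dimensions[0] * dimensions[1]);
  return { dimensions, buffer };
}

// Batch size unknown (-1) at graph-build time, resolved per call:
const out = compute([-1, 4], [3, 4], [4, 10]);
console.log(out.dimensions); // [ 3, 10 ]
```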