13:58:07 RRSAgent has joined #webmachinelearning
13:58:12 logging to https://www.w3.org/2023/09/07-webmachinelearning-irc
13:58:12 RRSAgent, make logs Public
13:58:13 please title this meeting ("meeting: ..."), anssik
13:58:14 Meeting: WebML WG Teleconference – 7 September 2023
13:58:23 Chair: Anssi
13:58:23 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2023-09-07-wg-agenda.md
13:58:29 Scribe: Anssi
13:58:33 scribeNick: anssik
13:58:45 gb, this is webmachinelearning/webnn
13:58:46 anssik, OK.
13:58:56 Present+ Anssi_Kostiainen
13:58:56 Present+ Rafael_Cintron
13:58:56 Present+ Zoltan_Kis
13:59:08 RRSAgent, draft minutes
13:59:10 I have made the request to generate https://www.w3.org/2023/09/07-webmachinelearning-minutes.html anssik
13:59:47 jsbell has joined #webmachinelearning
14:00:09 Present+ Joshua_Bell
14:00:21 RafaelCintron has joined #webmachinelearning
14:00:29 zkis has joined #webmachinelearning
14:01:10 Present+ Chai_Chaoweeraprasit
14:01:11 ningxin_hu has joined #webmachinelearning
14:01:16 Present+ Ningxin_Hu
14:01:59 AramZS has joined #webmachinelearning
14:03:15 RRSAgent, draft minutes
14:03:16 I have made the request to generate https://www.w3.org/2023/09/07-webmachinelearning-minutes.html anssik
14:03:38 Topic: WebNN v2: Proposed text-to-text and text-generation model targets
14:04:12 anssik: at our last meeting we agreed to use the following models as our v2 targets:
14:04:17 chai has joined #webmachinelearning
14:04:19 Text-to-image: Stable Diffusion unet/VAE/text encoder
14:04:26 Speech-to-text: Whisper Tiny
14:04:34 Present+ Joshua_Lochner
14:05:24 Joshua_Lochner has joined #webmachinelearning
14:05:29 anssik: we identified text-to-text and text-generation models as additional targets worth looking into and welcomed proposals
14:05:48 ... Joshua Lochner contributed super helpful data on trending text-to-text and text-generation models from the Hugging Face Hub, thanks!
14:05:54 -> Text-to-text and text-gen v2 models proposals by Joshua L https://github.com/webmachinelearning/webnn/issues/375#issuecomment-1693955191
14:06:21 Subtopic: Text-to-text generation (encoder-decoder)
14:06:27 -> Trending Text-to-text generation (encoder-decoder) https://huggingface.co/models?pipeline_tag=text2text-generation&sort=trending
14:06:52 anssik: majority t5-based, newer models use the t5 text encoder, non-t5 architectures include m2m100 and bart
14:07:02 ... Transformers.js demonstrates feasibility of these architectures
14:07:19 ... possible use cases/tasks: translation, summarization, instruction-finetuned text-to-text
14:07:44 ... Joshua? What do you think would make for a good target text-to-text gen model(s) for browser usage?
14:07:45 q?
14:08:26 Joshua_Lochner: I did try to split this into encoder-decoder and decoder-only
14:09:11 ... getting weights as small as possible has been challenging, in some archs a lot of time was spent on this, the HF team's optimization efforts were able to reduce the size of t5-based archs by 20-30%
14:09:31 ... ONNX exports do not de-dupe weights, so it is not an easy task
14:10:00 ... these archs are quite different, decoder-only models have to have a "caching branch" as in Transformers.js
14:10:23 ... reusing keys and values was a major issue earlier on
14:10:58 ... in encoder-decoder models OTOH, need to send decoder attentions
14:11:52 ... t5 does token classification, text classification, summarization, arbitrary text2text problems, if all ops are supported at this level it can get almost any popular model running
14:12:08 ... t5 is a great baseline to build on top of and likely a good target for us
14:12:40 ... out of non-t5 archs, m2m100 is the backbone of "No Language Left Behind" and other such models
14:13:15 ... great for translation, t5 and m2m100 are the best bets for us I think
14:13:41 q?
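[Editor's note: the "caching branch" mentioned above refers to reusing previously computed attention keys/values across decoding steps instead of recomputing them for the whole sequence. A minimal illustrative sketch follows; it is not Transformers.js code, and the names `attend`, `decode`, and `projectKV` are invented for this example (real caches hold per-layer tensors, not scalars).]

```javascript
// Toy sketch of key/value caching in autoregressive decoding.
// Each step projects only the NEW token to a key/value pair and
// appends it to the cache; attention then runs over the full cache.

function attend(query, keys, values) {
  // Softmax-weighted attention over scalar "features" (illustrative only).
  const scores = keys.map((k) => query * k);
  const m = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - m));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.reduce((acc, w, i) => acc + (w / total) * values[i], 0);
}

function decode(numSteps, projectKV, firstToken = 1.0) {
  const cacheK = [];
  const cacheV = [];
  let token = firstToken;
  const outputs = [];
  for (let step = 0; step < numSteps; step++) {
    const [k, v] = projectKV(token); // only the new token is projected
    cacheK.push(k);                  // past keys/values come from the cache
    cacheV.push(v);
    token = attend(token, cacheK, cacheV);
    outputs.push(token);
  }
  return outputs;
}

const out = decode(3, (t) => [t, 2 * t]);
```

Without the cache, step N would recompute N key/value projections, giving quadratic work over the sequence; with it, each step does constant projection work, which is why the cache was "a major issue earlier on" for decoder-only exports.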
14:14:05 q+
14:14:10 ack chai
14:14:51 chai: that's a great place to start
14:14:54 #375
14:14:54 https://github.com/webmachinelearning/webnn/issues/375 -> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]
14:15:34 chai: these are important models, let's continue discussion in issue #375
14:15:56 q?
14:16:13 Subtopic: Text-generation (decoder-only)
14:16:19 -> Trending Text-generation (decoder-only) https://huggingface.co/models?pipeline_tag=text-generation&sort=trending
14:16:37 anssik: trending are llama base models and their finetuned versions
14:17:04 ... in the web context a 7B-parameter model demonstrably runs in the browser vs. the Open LLM Leaderboard's 60B+
14:17:13 ... the Web LLM JS library runs 7B params with 4-bit quantization on WebGPU
14:17:21 -> Web LLM https://github.com/mlc-ai/web-llm
14:17:25 -> Open LLM Leaderboard https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
14:17:40 ... more browser-friendly smaller models: gpt2, bloom, gpt-neo
14:17:45 ... Joshua's favorite use case: code completion
14:18:04 q?
14:18:21 Joshua_Lochner: Anssi you hit the nail on the head, llama is trending
14:18:54 Vivek has joined #webmachinelearning
14:18:58 ... if you get those llama models running in the browser as experimented already, it'd be amazing, 15-20 tokens/s, 4-bit quant, it is 3-4 gigs
14:20:28 ... dropping the encoder and doing decoder-only is the hot topic currently, conversational LLMs
14:20:41 ... domain-specific use cases such as code completion are quite useful
14:21:09 ... just released: Code Llama, instruction-finetuned code completion, and plain code completion without instruction finetuning
14:21:20 ... those are useful use cases
14:21:48 ... older archs gpt2 and gpt-neo, neox, there is still interest in these models
14:22:34 ... but if you can start and get llama running the others will fall into place, if all ops required by llama are supported, the others will largely work too
14:23:15 ... the proposed target would be to get llama running
14:24:05 ...
in open source LLMs, the latest is falcon, the architecture is interesting, they also make smaller versions of the model
14:24:44 q?
14:25:36 #375
14:25:36 https://github.com/webmachinelearning/webnn/issues/375 -> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]
14:25:46 q?
14:26:25 q+
14:26:32 ack chai
14:27:23 chai: transformers is a big topic, a moving area, so I have a few thoughts on how to proceed potentially
14:27:57 ... we should maybe approach it similarly to the CNN approach: pick the starting models and, based on an op breakdown, analyze which ops are common across the models
14:28:13 https://github.com/webmachinelearning/webnn/blob/main/op_compatibility/first_wave_models.md
14:28:34 chai: at some point the set of ops starts to become more and more common, with less novelty
14:29:02 ... getting the starting point right is important, we probably want to accept contributions from the people on the call
14:29:41 ... we want to think about how to start getting this into the spec and what else we need to support, should start from the spec standpoint and pick some of the models to start with
14:30:02 ... implementation-wise, there are a few big roadblocks that need to be solved to be able to run big models in the browser
14:30:19 ... not happening overnight, progress on the spec side will accelerate development of the browser implementations
14:30:49 ... we want to work at the same pace on spec and implementation, and are doing that
14:31:31 Joshua_Lochner_ has joined #webmachinelearning
14:32:41 q?
14:32:42 brb
14:33:39 q?
14:33:52 Subtopic: Proposed new v2 ops and data types
14:34:07 Present+ Dwayne_Robinson
14:34:15 #375
14:34:15 https://github.com/webmachinelearning/webnn/issues/375 -> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]
14:34:44 https://github.com/webmachinelearning/webnn/issues/375#issuecomment-1674224992
14:35:04 anssik: I've only seen +1s and no concerns raised for this proposed list of new v2 ops and data types
14:35:27 ...
I acknowledge Google Chrome's feedback on WebNN regarding a streamlined API surface, but I also see Google would prefer the group to target these models: Segment Anything, Stable Diffusion, Whisper Tiny
14:35:36 ... I wonder if Google folks have any feedback on the proposed v2 ops and data types?
14:35:44 q?
14:35:54 Present+ Vivek_Sekhar
14:36:08 Joshua_Bell: we'll discuss internally and come back with feedback
14:36:32 Dwayne: a reduced instruction set like MLIR TOSA, or a general desire for a reduced set?
14:36:53 Vivek: no specific instruction set in mind at this point, TOSA is a great example of a very low-level, minimal instruction set
14:37:22 ... StableHLO is another example, drawing from those existing examples is what we're doing, relates to the earlier question on v2 ops
14:37:41 q+
14:37:55 ... curious how high-level ops can be decomposed to low-level ops, would love to explore that, can we do some of this work in userspace
14:38:20 Dwayne: the spec gives guidance for each high-level op on how it can be implemented in terms of low-level ops
14:38:41 Vivek: linking to this decomposition would be very helpful for implementers, what is the minimal baseline
14:39:26 q?
14:39:40 Topic: Spec writing with Bikeshed using the latest Web IDL and Infra standard conventions
14:40:02 anssik: a big PR was merged and established a modernized foundation to define our new features on. Thank you all for your contributions!
14:40:25 ... I asked Joshua Bell to give an update on spec writing using the latest conventions in Web IDL and Infra and share his Bikeshed tips, for the benefit of all contributors, especially editors
14:40:29 -> Writing Procedural Specs https://garykac.github.io/procspec/
14:40:32 -> Writing Specifications with Bikeshed https://dlaliberte.github.io/bikeshed-intro/
14:40:36 -> Bikeshed Docs https://speced.github.io/bikeshed/
14:41:32 Joshua_Bell: want to share what we've learned writing specs
14:41:43 ...
earlier specs described only inputs and outputs and did not talk about details
14:41:56 ... with multiple implementations that becomes a problem
14:42:07 ... we should not just expect certain behaviour
14:42:16 ... that lack of precision makes it hard to evolve implementations
14:42:29 ... the audience should be the browser implementer, not the web developer
14:42:45 ... if you state anything in the spec in many places, things will get out of sync
14:42:57 ... optimize for precision
14:43:11 ... non-normative sections should become more common
14:43:49 ... they describe the API geared toward web developers, clearly separated from normative content
14:44:11 ... Web IDL is the interface between your spec and JS
14:44:33 ... in normative sections you should never talk about JS types, nor mention ECMAScript in 99.9999% of cases
14:44:54 ... it is really hard to get right, when you get it wrong security bugs creep in easily
14:45:07 ... use the basic types defined in the Infra spec
14:45:32 ... the Infra spec is not perfect, you can go beyond it, it is evolving, e.g. internal slots are evolving
14:45:42 ... if it is not in Infra, check its open GH issues
14:45:51 ... as a last fallback, open a new Infra issue
14:46:20 ... Bikeshed is a compiler for your spec
14:46:28 ... the less you write, the less likely you are to have a bug
14:46:51 ... use as few words as possible for a better spec
14:47:04 ... use Infra types, Bikeshed will catch link errors
14:47:11 ... use scope as in programming languages
14:47:30 ... in algorithm blocks, Bikeshed can do error checking if you reference a variable outside its scope
14:47:37 ... keep your builds clean
14:47:54 ... if you use plain HTML there is no validation by Bikeshed
14:48:14 ... Bikeshed is not bug-free, e.g. I found 3 bugs when working on the latest WebNN conventions update
14:49:01 ... the generated Bikeshed indexes, the terms defined by it, help you find whether you have things defined that you don't want
14:49:28 ... also look at the normative references list, are they really the dependencies your spec should depend on
14:49:50 ...
closing thoughts, the IDL index that gets generated can be used to automatically generate WPT API shape tests
14:50:21 ... speaking of WPT, every normative change should come with a WPT test
14:52:11 nickname on github: @InexorableTash
14:52:11 https://github.com/InexorableTash -> @InexorableTash
14:52:39 very helpful info, thanks much Joshua!
14:52:40 q?
14:52:42 q+
14:53:01 ack chai
14:53:10 ack ningxin_hu
14:53:21 ningxin_hu: thanks for the introduction!
14:53:59 ... a question: in WebNN we use algorithms that interact with native APIs, we refer to that as a platform object, is there guidance for this type of interaction?
14:54:29 Joshua_Bell: I want to work with you on that
14:56:12 ... when referencing platform-specific behaviour, it would be optimal if the spec could be implemented in JS (even if slowly)
14:56:55 ... some algorithms link to papers that are behind paywalls, these should have public references
14:57:20 ... referring to a platform object is OK, but don't talk about its behaviour, I'll get back to you
14:58:22 https://github.com/webmachinelearning/meetings/blob/main/telcons/2023-09-07-wg-agenda.md
14:58:36 Topic: New features
14:58:42 Subtopic: Allow to hint number of threads for CPU MLContext
14:58:45 anssik: issue #436
14:58:46 https://github.com/webmachinelearning/webnn/issues/436 -> Issue 436 Allow to hint number of threads for CPU MLContext (by huningxin)
14:58:50 ... use case: ML frameworks commonly parallelize operator computation when inferring a model
14:59:50 ningxin_hu: the number of threads (degree of parallelism) configuration depends on usage scenarios: e.g. single-threaded is better for small models due to task-scheduling overhead
15:00:23 ... the proposal is to allow the framework to set the number of threads
15:00:32 ... ML frameworks allow controlling the number of threads, examples: ONNX Runtime, TF Lite
15:02:00 anssik: the Model Loader API experimented with MLContextOptions.numThreads
15:02:00 q?
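[Editor's note: a minimal sketch of how the proposed thread-count hint might behave. `numThreads` is not part of the WebNN spec; it was only explored in the Model Loader API experiment, and the `resolveThreadCount` helper below is invented for illustration. The key point from the discussion is that the value is a hint the browser may clamp or ignore, not a command.]

```javascript
// Hypothetical resolution of a numThreads hint (issue #436 sketch).
// The browser stays free to pick its own default and to clamp the
// request so one tab cannot monopolize the user's machine.

function resolveThreadCount(options, hardwareConcurrency) {
  const requested = options.numThreads;
  if (requested === undefined || requested === 0) {
    // No hint: an implementation-chosen heuristic, e.g. half the cores.
    return Math.max(1, Math.floor(hardwareConcurrency / 2));
  }
  // Hint given: clamp to [1, hardwareConcurrency] rather than obey blindly.
  return Math.min(Math.max(1, requested), hardwareConcurrency);
}

// Small model: a single thread avoids task-scheduling overhead.
const single = resolveThreadCount({ numThreads: 1 }, 8);
// Oversized request is clamped to the machine's concurrency.
const clamped = resolveThreadCount({ numThreads: 64 }, 8);
// No hint: the implementation picks its own default.
const fallback = resolveThreadCount({}, 8);
```

This clamping shape mirrors how frameworks such as ONNX Runtime and TF Lite treat their thread-count options as configurable upper bounds, and matches Rafael's point below that the value should be a hint.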
15:02:13 q+
15:02:14 ack RafaelCintron
15:02:52 RafaelCintron: most people have multiple tabs open and threads doing things, should this be considered a hint?
15:03:13 ... we should go easy on users' machines
15:03:40 ningxin_hu: this would be considered a hint
15:04:10 RafaelCintron: are there scenarios where the browser chooses the number of threads poorly?
15:05:07 ningxin_hu: for a small model the context-switching overhead is significant compared to the compute cost
15:05:24 ... in this case the hint would be to use only a single core for such a model
15:05:42 q?
15:06:11 q?
15:06:22 q?
15:06:47 Topic: AOB
15:06:52 anssik: W3C TPAC 2023 is next week, no official WebML meeting there but I will talk to folks and bring any feedback to the WG
15:06:59 ... expect our next agenda to arrive a bit later due to TPAC, we will meet at the usual time on 21 September regardless
15:07:04 RRSAgent, draft minutes
15:07:06 I have made the request to generate https://www.w3.org/2023/09/07-webmachinelearning-minutes.html anssik
17:27:32 Zakim has left #webmachinelearning