13:58:07 RRSAgent has joined #webmachinelearning
13:58:12 logging to https://www.w3.org/2023/09/07-webmachinelearning-irc
13:58:12 RRSAgent, make logs Public
13:58:13 please title this meeting ("meeting: ..."), anssik
13:58:14 Meeting: WebML WG Teleconference – 7 September 2023
13:58:23 Chair: Anssi
13:58:23 Agenda: https://github.com/webmachinelearning/meetings/blob/main/telcons/2023-09-07-wg-agenda.md
13:58:29 Scribe: Anssi
13:58:33 scribeNick: anssik
13:58:45 gb, this is webmachinelearning/webnn
13:58:46 anssik, OK.
13:58:56 Present+ Anssi_Kostiainen
13:58:56 Present+ Rafael_Cintron
13:58:56 Present+ Zoltan_Kis
13:59:08 RRSAgent, draft minutes
13:59:10 I have made the request to generate https://www.w3.org/2023/09/07-webmachinelearning-minutes.html anssik
13:59:47 jsbell has joined #webmachinelearning
14:00:09 Present+ Joshua_Bell
14:00:21 RafaelCintron has joined #webmachinelearning
14:00:29 zkis has joined #webmachinelearning
14:01:10 Present+ Chai_Chaoweeraprasit
14:01:11 ningxin_hu has joined #webmachinelearning
14:01:16 Present+ Ningxin_Hu
14:01:59 AramZS has joined #webmachinelearning
14:03:15 RRSAgent, draft minutes
14:03:16 I have made the request to generate https://www.w3.org/2023/09/07-webmachinelearning-minutes.html anssik
14:03:38 Topic: WebNN v2: Proposed text-to-text and text-generation model targets
14:04:12 anssik: at our last meeting we agreed to use the following models as our v2 targets:
14:04:17 chai has joined #webmachinelearning
14:04:19 Text-to-image: Stable Diffusion unet/VAE/text encoder
14:04:26 Speech-to-text: Whisper Tiny
14:04:34 Present+ Joshua_Lochner
14:05:24 Joshua_Lochner has joined #webmachinelearning
14:05:29 anssik: we identified text-to-text and text-generation models as additional targets worth looking into and welcomed proposals
14:05:48 ... Joshua Lochner contributed super helpful data on trending text-to-text and text-generation models from the Hugging Face Hub, thanks!
14:05:54 -> Text-to-text and text-gen v2 models proposals by Joshua L https://github.com/webmachinelearning/webnn/issues/375#issuecomment-1693955191
14:06:21 Subtopic: Text-to-text generation (encoder-decoder)
14:06:27 -> Trending Text-to-text generation (encoder-decoder) https://huggingface.co/models?pipeline_tag=text2text-generation&sort=trending
14:06:52 anssik: majority t5-based, newer models use the t5 text encoder, non-t5 architectures include m2m100 and bart
14:07:02 ... Transformers.js demonstrates feasibility of these architectures
14:07:19 ... possible use cases/tasks: translation, summarization, instruction-finetuned text-to-text
14:07:44 ... Joshua? What do you think would make for a good target text-to-text gen model(s) for browser usage?
14:07:45 q?
14:08:26 Joshua_Lochner: I did try to split this into encoder-decoder and decoder-only
14:09:11 ... getting weights as small as possible has been challenging, in some archs a lot of time was spent on this, the HF team's optimization efforts were able to reduce the size of t5-based archs by 20-30%
14:09:31 ... ONNX exports do not de-dupe weights, so it is not an easy task
14:10:00 ... these archs are quite different, decoder-only models have to have a "caching branch" as in Transformers.js
14:10:23 ... reusing keys and values was a major issue earlier on
14:10:58 ... in encoder-decoder models OTOH, need to send decoder attentions
14:11:52 ... t5 does token classification, text classification, summarization, arbitrary text2text problems, if all ops are supported at this level it can get almost any popular model running
14:12:08 ... t5 is a great baseline to build on top of and likely a good target for us
14:12:40 ... out of non-t5 archs, m2m100 is the backbone of "No Language Left Behind" and other such models
14:13:15 ... great for translation, t5 and m2m100 are the best bets for us I think
14:13:41 q?
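[Editor's note: the "caching branch" mentioned above refers to reusing previously computed attention keys/values across decoding steps instead of recomputing them for the whole sequence. A minimal illustrative sketch follows; it is not Transformers.js code, and the names `attend`, `decode`, and `projectKV` are invented for this example (real caches hold per-layer tensors, not scalars).]

```javascript
// Toy sketch of key/value caching in autoregressive decoding.
// Each step projects only the NEW token to a key/value pair and
// appends it to the cache; attention then runs over the full cache.

function attend(query, keys, values) {
  // Softmax-weighted attention over scalar "features" (illustrative only).
  const scores = keys.map((k) => query * k);
  const m = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - m));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.reduce((acc, w, i) => acc + (w / total) * values[i], 0);
}

function decode(numSteps, projectKV, firstToken = 1.0) {
  const cacheK = [];
  const cacheV = [];
  let token = firstToken;
  const outputs = [];
  for (let step = 0; step < numSteps; step++) {
    const [k, v] = projectKV(token); // only the new token is projected
    cacheK.push(k);                  // past keys/values come from the cache
    cacheV.push(v);
    token = attend(token, cacheK, cacheV);
    outputs.push(token);
  }
  return outputs;
}

const out = decode(3, (t) => [t, 2 * t]);
```

Without the cache, step N would recompute N key/value projections, giving quadratic work over the sequence; with it, each step does constant projection work, which is why the cache was "a major issue earlier on" for decoder-only exports.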
14:14:05 q+
14:14:10 ack chai
14:14:51 chai: that's a great place to start
14:14:54 #375
14:14:54 https://github.com/webmachinelearning/webnn/issues/375 -> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]
14:15:34 chai: these are important models, let's continue discussion in issue #375
14:15:56 q?
14:16:13 Subtopic: Text-generation (decoder-only)
14:16:19 -> Trending Text-generation (decoder-only) https://huggingface.co/models?pipeline_tag=text-generation&sort=trending
14:16:37 anssik: trending are llama base models and their finetuned versions
14:17:04 ... in the web context a 7B-parameter model demonstrably runs in the browser vs. the Open LLM Leaderboard's 60B+
14:17:13 ... the Web LLM JS library runs 7B params with 4-bit quantization on WebGPU
14:17:21 -> Web LLM https://github.com/mlc-ai/web-llm
14:17:25 -> Open LLM Leaderboard https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
14:17:40 ... more browser-friendly smaller models: gpt2, bloom, gpt-neo
14:17:45 ... Joshua's favorite use case: code completion
14:18:04 q?
14:18:21 Joshua_Lochner: Anssi you hit the nail on the head, llama is trending
14:18:54 Vivek has joined #webmachinelearning
14:18:58 ... if you get those llama models running in the browser as experimented already, it'd be amazing, 15-20 tokens/s, 4-bit quant, it is 3-4 gigs
14:20:28 ... dropping the encoder and doing decoder-only is the hot topic currently, conversational LLMs
14:20:41 ... domain-specific use cases such as code completion are quite useful
14:21:09 ... just released: Code Llama, instruction-finetuned code completion, and plain code completion without instruction finetuning
14:21:20 ... those are useful use cases
14:21:48 ... older archs gpt2 and gpt-neo, neox, there is still interest in these models
14:22:34 ... but if you can start and get llama running the others will fall into place, if all ops required by llama are supported, the others will largely work too
14:23:15 ... the proposed target would be to get llama running
14:24:05 ...
in open source LLMs, the latest is falcon, the architecture is interesting, they also make smaller versions of the model
14:24:44 q?
14:25:36 #375
14:25:36 https://github.com/webmachinelearning/webnn/issues/375 -> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]
14:25:46 q?
14:26:25 q+
14:26:32 ack chai
14:27:23 chai: transformers is a big topic, a moving area, so I have a few thoughts on how to proceed potentially
14:27:57 ... we should maybe approach it similarly to the CNN approach: pick the starting models and, based on an op breakdown, analyze which ops are common across the models
14:28:13 https://github.com/webmachinelearning/webnn/blob/main/op_compatibility/first_wave_models.md
14:28:34 chai: at some point the set of ops starts to become more and more common, with less novelty
14:29:02 ... getting the starting point right is important, we probably want to accept contributions from the people on the call
14:29:41 ... we want to think about how to start getting this into the spec and what else we need to support, should start from the spec standpoint and pick some of the models to start with
14:30:02 ... implementation-wise, there are a few big roadblocks that need to be solved to be able to run big models in the browser
14:30:19 ... not happening overnight, progress on the spec side will accelerate development of the browser implementations
14:30:49 ... we want to work at the same pace on spec and implementation, and are doing that
14:31:31 Joshua_Lochner_ has joined #webmachinelearning
14:32:41 q?
14:32:42 brb
14:33:39 q?
14:33:52 Subtopic: Proposed new v2 ops and data types
14:34:07 Present+ Dwayne_Robinson
14:34:15 #375
14:34:15 https://github.com/webmachinelearning/webnn/issues/375 -> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]
14:34:44 https://github.com/webmachinelearning/webnn/issues/375#issuecomment-1674224992
14:35:04 anssik: I've only seen +1s and no concerns raised for this proposed list of new v2 ops and data types
14:35:27 ...
I acknowledge Google Chrome's feedback on WebNN regarding a streamlined API surface, but I also see Google would prefer the group to target these models: Segment Anything, Stable Diffusion, Whisper Tiny
14:35:36 ... I wonder if Google folks have any feedback on the proposed v2 ops and data types?
14:35:44 q?
14:35:54 Present+ Vivek_Sekhar
14:36:08 Joshua_Bell: we'll discuss internally and come back with feedback
14:36:32 Dwayne: a reduced instruction set like MLIR TOSA, or a general desire for a reduced set?
14:36:53 Vivek: no specific instruction set in mind at this point, TOSA is a great example of a very low-level, minimal instruction set
14:37:22 ... StableHLO is another example, drawing from those existing examples is what we're doing, relates to the earlier question on v2 ops
14:37:41 q+
14:37:55 ... curious how high-level ops can be decomposed to low-level ops, would love to explore that, can we do some of this work in userspace
14:38:20 Dwayne: the spec gives guidance for each high-level op on how it can be implemented in terms of low-level ops
14:38:41 Vivek: linking to this decomposition would be very helpful for implementers, what is the minimal baseline
14:39:26 q?
14:39:40 Topic: Spec writing with Bikeshed using the latest Web IDL and Infra standard conventions
14:40:02 anssik: a big PR was merged and established a modernized foundation to define our new features on. Thank you all for your contributions!
14:40:25 ... I asked Joshua Bell to give an update on spec writing using the latest conventions in Web IDL and Infra and share his Bikeshed tips, for the benefit of all contributors, especially editors
14:40:29 -> Writing Procedural Specs https://garykac.github.io/procspec/
14:40:32 -> Writing Specifications with Bikeshed https://dlaliberte.github.io/bikeshed-intro/
14:40:36 -> Bikeshed Docs https://speced.github.io/bikeshed/
14:41:32 Joshua_Bell: want to share what we've learned writing specs
14:41:43 ...
earlier specs described only inputs and outputs and did not talk about details
14:41:56 ... with multiple implementations that becomes a problem
14:42:07 ... we should not just expect certain behaviour
14:42:16 ... that lack of precision makes it hard to evolve implementations
14:42:29 ... the audience should be the browser implementer, not the web developer
14:42:45 ... if you state anything in the spec in many places, things will get out of sync
14:42:57 ... optimize for precision
14:43:11 ... non-normative sections should become more common
14:43:49 ... they describe the API geared toward web developers, clearly separated from normative content
14:44:11 ... Web IDL is the interface between your spec and JS
14:44:33 ... in normative sections you should never talk about JS types, nor mention ECMAScript in 99.9999% of cases
14:44:54 ... it is really hard to get right, when you get it wrong security bugs creep in easily
14:45:07 ... use the basic types defined in the Infra spec
14:45:32 ... the Infra spec is not perfect, you can go beyond it, it is evolving, e.g. internal slots are evolving
14:45:42 ... if it is not in Infra, check its open GH issues
14:45:51 ... as a last fallback, open a new Infra issue
14:46:20 ... Bikeshed is a compiler for your spec
14:46:28 ... the less you write, the less likely you are to have a bug
14:46:51 ... use as few words as possible for a better spec
14:47:04 ... use Infra types, Bikeshed will catch link errors
14:47:11 ... use scope as in programming languages
14:47:30 ... in algorithm blocks, Bikeshed can do error checking if you reference a variable outside its scope
14:47:37 ... keep your builds clean
14:47:54 ... if you use plain HTML there is no validation by Bikeshed
14:48:14 ... Bikeshed is not bug-free, e.g. I found 3 bugs when working on the latest WebNN conventions update
14:49:01 ... the generated Bikeshed indexes, the terms defined by it, help you find whether you have things defined that you don't want
14:49:28 ... also look at the normative references list, are they really the dependencies your spec should depend on
14:49:50 ...
closing thoughts, the IDL index that gets generated can be used to automatically generate WPT API shape tests
14:50:21 ... speaking of WPT, every normative change should come with a WPT test
14:52:11 nickname on github: @InexorableTash
14:52:11 https://github.com/InexorableTash -> @InexorableTash
14:52:39 very helpful info, thanks much Joshua!
14:52:40 q?
14:52:42 q+
14:53:01 ack chai
14:53:10 ack ningxin_hu
14:53:21 ningxin_hu: thanks for the introduction!
14:53:59 ... a question: in WebNN we use algorithms that interact with native APIs, we refer to that as a platform object, is there guidance for this type of interaction?
14:54:29 Joshua_Bell: I want to work with you on that
14:56:12 ... when referencing platform-specific behaviour, it would be optimal if the spec could be implemented in JS (even if slowly)
14:56:55 ... some algorithms link to papers that are behind paywalls, these should have public references
14:57:20 ... referring to a platform object is OK, but don't talk about its behaviour, I'll get back to you
14:58:22 https://github.com/webmachinelearning/meetings/blob/main/telcons/2023-09-07-wg-agenda.md
14:58:36 Topic: New features
14:58:42 Subtopic: Allow to hint number of threads for CPU MLContext
14:58:45 anssik: issue #436
14:58:46 https://github.com/webmachinelearning/webnn/issues/436 -> Issue 436 Allow to hint number of threads for CPU MLContext (by huningxin)
14:58:50 ... use case: ML frameworks commonly parallelize operator computation when inferring a model
14:59:50 ningxin_hu: the number of threads (degree of parallelism) configuration depends on usage scenarios: e.g. single-threaded is better for small models due to task-scheduling overhead
15:00:23 ... the proposal is to allow the framework to set the number of threads
15:00:32 ... ML frameworks allow controlling the number of threads, examples: ONNX Runtime, TF Lite
15:02:00 anssik: the Model Loader API experimented with MLContextOptions.numThreads
15:02:00 q?
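[Editor's note: a minimal sketch of how the proposed thread-count hint might behave. `numThreads` is not part of the WebNN spec; it was only explored in the Model Loader API experiment, and the `resolveThreadCount` helper below is invented for illustration. The key point from the discussion is that the value is a hint the browser may clamp or ignore, not a command.]

```javascript
// Hypothetical resolution of a numThreads hint (issue #436 sketch).
// The browser stays free to pick its own default and to clamp the
// request so one tab cannot monopolize the user's machine.

function resolveThreadCount(options, hardwareConcurrency) {
  const requested = options.numThreads;
  if (requested === undefined || requested === 0) {
    // No hint: an implementation-chosen heuristic, e.g. half the cores.
    return Math.max(1, Math.floor(hardwareConcurrency / 2));
  }
  // Hint given: clamp to [1, hardwareConcurrency] rather than obey blindly.
  return Math.min(Math.max(1, requested), hardwareConcurrency);
}

// Small model: a single thread avoids task-scheduling overhead.
const single = resolveThreadCount({ numThreads: 1 }, 8);
// Oversized request is clamped to the machine's concurrency.
const clamped = resolveThreadCount({ numThreads: 64 }, 8);
// No hint: the implementation picks its own default.
const fallback = resolveThreadCount({}, 8);
```

This clamping shape mirrors how frameworks such as ONNX Runtime and TF Lite treat their thread-count options as configurable upper bounds, and matches Rafael's point below that the value should be a hint.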
15:02:13 q+
15:02:14 ack RafaelCintron
15:02:52 RafaelCintron: most people have multiple tabs open and threads doing things, should this be considered a hint?
15:03:13 ... we should go easy on users' machines
15:03:40 ningxin_hu: this would be considered a hint
15:04:10 RafaelCintron: are there scenarios where the browser chooses the number of threads poorly?
15:05:07 ningxin_hu: for a small model the context-switching overhead is significant compared to the compute cost
15:05:24 ... in this case the hint would be to use only a single core for such a model
15:05:42 q?
15:06:11 q?
15:06:22 q?
15:06:47 Topic: AOB
15:06:52 anssik: W3C TPAC 2023 is next week, no official WebML meeting there but I will talk to folks and bring any feedback to the WG
15:06:59 ... expect our next agenda to arrive a bit later due to TPAC, we will meet at the usual time on 21 September regardless
15:07:04 RRSAgent, draft minutes
15:07:06 I have made the request to generate https://www.w3.org/2023/09/07-webmachinelearning-minutes.html anssik
17:27:32 Zakim has left #webmachinelearning