Meeting minutes
<Jonathan_Bingham> anssik: does the agenda look good?
... [silence means yes ;-)]
Model Loader API
Versioning
minutes https://www.w3.org/2022/01/12-webmachinelearning-minutes.html
anssi: how to version the API for supported ops and formats was an open question from last time
honglin: we may need to support different versions of the backend
… currently we're thinking of using the TF Lite version for the prototype
… we're concerned that, in the future, there may be powerful new ops that have to be disabled, so the version number alone may not be enough
… it may not be possible to fully support the backend, but what is supported will need to be standardized for clients
anssi: can you open a new github issue for this for async feedback?
… not all of the stakeholders are here today
… any rough sketch of a proposal?
… it's briefly mentioned in the Explainer
Jonathan_Bingham: I think this topic is related to another agenda item, question on model formats
… in this prototype implementation, TFLite FlatBuffer is the format and the TF runner is an implementation detail; the API abstracts over this backend
… perhaps the most natural way to do versioning is in terms of model format
… then there's the question of which op set a particular model supports; is that enough?
anssi: agree, let's connect the topics in the github issue
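To make the format-and-version idea concrete, a minimal sketch of what feature detection could look like from script. Every name below (the querySupportedFormats query and the descriptor fields) is an illustrative assumption, not part of the current explainer; only the load flow loosely follows the prototype.

```ts
// Hypothetical feature detection for the Model Loader API (names assumed).
const context = await navigator.ml.createContext({ devicePreference: 'gpu' });

// A version could be expressed per model format: the container format plus the
// op-set version it carries, so a page can check support before downloading.
const wanted = { format: 'tflite-flatbuffer', opSetVersion: 3 };
const supported = await context.querySupportedFormats?.();  // assumed query

if (supported?.some(f => f.format === wanted.format &&
                         f.opSetVersion >= wanted.opSetVersion)) {
  const modelBuffer = await (await fetch('model.tflite')).arrayBuffer();
  const loader = new MLModelLoader(context);   // loading roughly as prototyped
  const model = await loader.load(modelBuffer);
}
```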
Streaming inputs
honglin: this is also beyond the current scope. for some APIs like translation or speech to text, the output might not be ready immediately
… for speech recognition, you keep sending input. when the model thinks it's a complete sentence, it will give outputs
… there may need to be a callback to the model, rather than the current interface
… for now, we're not considering it. an API extension may be required
jiewei: how is the speech recognition model implemented under the hood?
… maybe the event should be handled by javascript rather than the model loader
honglin: that might be inefficient. for speech recognition in ChromeOS, the result is sent from the model spontaneously. it's not a TF Lite model in this example. but in the future TF Lite could support it.
… maybe provide 2 functions: data feeding, and result accepting
… the model will call it actively, by itself
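A rough sketch of the two-function shape described above, a data-feeding call plus a result-accepting callback. Streaming is out of scope today, so every name here is hypothetical.

```ts
// Hypothetical streaming extension. 'model' is assumed to be a model already
// loaded through MLModelLoader; none of these members exist in the current API.
const session = await model.createStreamingSession({
  onResult: (sentence) => {                 // "result accepting" function:
    console.log('recognized:', sentence);   // the model calls this by itself
  },                                        // when it decides output is ready
});

for await (const chunk of microphoneChunks()) {  // assumed async audio source
  await session.feed(chunk);                     // "data feeding" function
}
await session.close();                           // flush any remaining results
```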
sami: it will be good to have concrete models to look at. it's not a blocker currently
(or was that Jiewei again?)
anssi: let's gather feedback in github again, outside of this call
Model format
anssi: my perspective: thanks to Honglin and others' work, we're at the prototyping phase
… my expectation is the implementation experience will help inform the standard model format discussion
"The Model Loader API needs a standard format supported across browsers and devices for broad interoperability."
anssi: adoption of the model loader spec as a working group deliverable, or official standard, is conditional to agreeing on a standard format
WebML WG Charter: Tentative Deliverables
anssi: TF Lite format is the choice for experimentation. that's great. we have to start somewhere.
… let's discuss the next steps to move toward a standard format.
… what is the canonical reference for the TF Lite format used in the prototype?
… is there a spec, or is the implementation the de facto description?
Jonathan_Bingham: can provide a link to the TFLite FlatBuffer format as a reference
… that can probably contribute toward a standard
… I raised my hand for queue because I think Raphael from Msft would be a great person to summarize position from their side
… developers need a single format to work against, with confidence that it works across all devices and browsers; it's a developer ergonomics issue
… we at Google think the same
… then how we move forward from this prototype to something that could work for the wider community is a good topic; would love for Ningxin and Msft folks to chime in on this
… proposal: let's look at Ningxin's tests for WebNN; he has looked at a few formats and tried to get them running on top of WebNN
… can we find a common intersection? a starting point would be to explore what exists now
… I'm not suggesting Model Loader understands all those formats, but that we can translate those formats into a format that Model Loader understands
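One way to read the translation idea in practice: conversion happens offline in tooling, and the browser only ever sees the single format the Model Loader backend understands. A sketch under that assumption; the conversion step itself is left out here.

```ts
// Assumed workflow: a model authored in another format (e.g. ONNX) is converted
// offline to the backend's format, and only the converted artifact is shipped.
const modelBuffer = await (await fetch('model-converted.tflite')).arrayBuffer();

// The browser-side loader then only has to understand one format.
const loader = new MLModelLoader(await navigator.ml.createContext());
const model = await loader.load(modelBuffer);
```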
anssi: that's a pragmatic approach. i agree it's about developer ergonomics.
… licensing and other concerns we can figure out.
… the purpose of standards is to have interoperability.
… probably we'll never get to a world where there is one format only. we'll need tools to work between formats.
… as with the previous topics, let's document in a github issue and get input from Ningxin and others.
… we can follow up with Rafael and Ningxin
jonathan: i'll file the github issue
anssi: there's mention in the Explainer, but it will be easier to get discussion in an Issue
<dom> TFLite FlatBuffers format
Support for non-IEEE 754 floating point types
Open question: support for non-IEEE 754 floating point types #23
jiewei: neither WebNN nor Model Loader explains how to support these types in its API
… what's the plan?
WebNN API: Support for non-IEEE 754 floating point types #252
Web & ML workshop discussion: Support for Float16 in JS & WASM environments
anssi: there was an earlier question about bfloat16, which is mostly used in training accelerators
… it's non standard currently and some concerns were raised
… is this type also for training? does it improve inference performance too?
jiewei: that's an experiment we can do
… i'm pretty sure it's not just for training, though it does improve training speed
… it applies to inference, with dedicated accelerators. the speedup would be kind of the same.
… it's worth experimenting whether the speedup is user-perceptible
anssi: how big of an effort would it be to generate and share this data?
jiewei: we might be able to do experiments with nvidia graphics cards
honglin: float16 can be easily supported, since the model loader accepts binary buffers
… we want to have an idea of how webnn would support it, and then we can make it consistent with model loader
anssi: jiewei pointed out that the discussion is different depending on the API
… for WebNN, the discussion is how to use the types when building the graph
… for model loader, it's about how to use the types during IO
honglin: in model loader, execution is internal. correct.
jiewei: if we want to interoperate between model loader and webnn, how do we pass the types between the two?
… do we convert to a common format? or use a standard struct?
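A sketch of where float16 would surface in each API, assuming WebNN eventually accepts a 'float16' operand type and the Model Loader keeps treating tensors as raw buffers (JavaScript has no Float16Array, so fp16 values would travel as 16-bit patterns in a Uint16Array). The compute call and option names are assumptions.

```ts
// WebNN side: the type is declared while building the graph (assumed support).
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);
const x = builder.input('x', { type: 'float16', dimensions: [1, 224, 224, 3] });

// Model Loader side: the type only shows up at input/output time.
// 'model' is assumed to be a previously loaded model; fp16 bits ride in a
// Uint16Array and any accelerator-specific conversion happens internally.
const fp16Input = new Uint16Array(1 * 224 * 224 * 3);   // packed fp16 bits
const results = await model.compute({ x: fp16Input });  // 'compute' assumed
```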
anssi: you proposed earlier that this could happen transparently, and auto-convert based on the accelerator. not sure if you'd get the performance benefit
jiewei: this applies to webnn; model loader can internally auto-convert
… for webnn, the non-standard format would be for just part of the graph
anssi: you had another proposal about the API specifying acceptable quantization levels
jiewei: i can give an example of the improvement. if we quantize images with fp16, the power consumption can be reduced by 50% with performance the same
… none of the APIs provide a way for developers to specify that
… for some accelerators, it may provide a speedup. depends on the chips we're executing on.
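A sketch of the kind of hint being described: letting the page state which quantization levels it can tolerate so the implementation may pick a lower-power path on capable hardware. No such option exists in either API; the name is purely hypothetical.

```ts
// Hypothetical load-time quantization hint (option name assumed).
const loader = new MLModelLoader(await navigator.ml.createContext());
const modelBuffer = await (await fetch('model.tflite')).arrayBuffer();
const model = await loader.load(modelBuffer, {
  acceptableQuantization: ['float32', 'float16'],
});
```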
anssi: what's the state of the prototype with respect to these types?
jiewei: it doesn't provide this option
anssi: we can validate the expected performance improvement, and that would motivate people on webnn to find a solution.
… you can use the existing issues, and share data in the issue
… i can nudge the issue in the Working Group to get reactions
Jonathan_Bingham: ChromeOS runs on lower end devices where low power consumption is important; we probably don't have specific apps that are blocked on these types
anssik: would the extra types pose a privacy risk?
… e.g., distinguishing between Apple's Neural Engine and an Nvidia GPU
… the first priority is to share data showing that the types would improve performance. the WebGPU group is discussing similar things.
WebNN API status update
honglin: what's the current state of webnn? our sydney team can't join the working group meeting
anssik: i'll share links.
Intent to Prototype: Web Neural Network API (WebNN)
WebNN implementation in Chromium
anssik: the group has been focused on the Candidate Recommendation spec. it's basically the feature complete state.
… the target is to achieve it this year.
… other requirements to get to CR: we need a cross-browser test suite with web platform tests (WPT), which Chromium runs in continuous integration
… the test suite confirms that the implementation conforms to the spec.
… that's a major deliverable
… the group has also started work on a pure javascript implementation without 3rd party dependencies
… another project is webnn native
… if you're familiar with WebGPU implementations, it's like the WebNN version of Dawn
… there's parallel work on scoping, polishing the implementation, wider review
WebNN API spec Candidate Recommendation readiness tracker
Add WebNN baseline implementation for first-wave ops #1
honglin: because we're going to submit some CLs in a couple of weeks, and we share some data structs with WebNN, we may need to coordinate on the directory structure in Blink
… that's why i was asking about the status of webnn, because we want to coordinate
… if they're going to submit later, we may just go ahead first
anssik: the aim is to have something behind a flag this year
… that means it can still change
Jonathan_Bingham: are you going to start with a developer trial or go straight to an origin trial?
anssik: developer trial comes before the origin trial in the chromium launch process
… we'll start with that, with a feature flag
… the origin trial is for real production use with less tech-savvy developers
… origin trials are not supposed to enable a feature for all users; if too many are using it, the trial will be disabled
… developer trial is the goal for this year
Jonathan_Bingham: for model loader, dev trial is also the goal, starting very limited, with Chrome OS support only
… for a wider dev trial, we'll want to get help from Chrome or TensorFlow
… likely it will be late this year at the earliest
honglin: the chrome OS team has worked on the handwriting API before and has experience
anssik: our first origin trial for new sensor APIs was one of the first by non-Google people
… this isn't our first attempt
anssik: huge thanks Jonathan for scribing!