Meeting minutes
<Jonathan_Bingham> anssik: does the agenda look good?
... [silence means yes ;-)]
Model Loader API
Versioning
minutes https://www.w3.org/2022/01/12-webmachinelearning-minutes.html
anssi: how to version the API for supported ops and formats was an open question from last time
honglin: we may need to support different versions of the backend
… currently we're thinking of using the TF Lite version for the prototype
… we're concerned that, in the future, there may be powerful new ops that have to be disabled, so the version number alone may not be enough
… it may not be possible to fully support the backend, but what is supported will need to be standardized for clients
anssi: can you open a new github issue for this for async feedback?
… not all of the stakeholders are here today
… any rough sketch of a proposal?
… it's briefly mentioned in the Explainer
Jonathan_Bingham: I think this topic is related to another agenda item, question on model formats
… in this prototype implementation, TFLite FlatBuffer is the format and the TF runner is an implementation detail; the API abstracts over this backend
… perhaps the most natural way to do versioning is in terms of model format
… then there's the question of which op set a particular model supports; is that enough?
anssi: agree, let's connect the topics in the github issue
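To make the format-and-version idea concrete, a minimal sketch of what feature detection could look like from script. Every name below (the querySupportedFormats query and the descriptor fields) is an illustrative assumption, not part of the current explainer; only the load flow loosely follows the prototype.

```ts
// Hypothetical feature detection for the Model Loader API (names assumed).
const context = await navigator.ml.createContext({ devicePreference: 'gpu' });

// A version could be expressed per model format: the container format plus the
// op-set version it carries, so a page can check support before downloading.
const wanted = { format: 'tflite-flatbuffer', opSetVersion: 3 };
const supported = await context.querySupportedFormats?.();  // assumed query

if (supported?.some(f => f.format === wanted.format &&
                         f.opSetVersion >= wanted.opSetVersion)) {
  const modelBuffer = await (await fetch('model.tflite')).arrayBuffer();
  const loader = new MLModelLoader(context);   // loading roughly as prototyped
  const model = await loader.load(modelBuffer);
}
```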
Streaming inputs
honglin: this is also beyond the current scope. for some APIs like translation or speech to text, the output might not be ready immediately
… for speech recognition, you keep sending input. when the model thinks it's a complete sentence, it will give outputs
… there may need to be a callback to the model, rather than the current interface
… for now, we're not considering it. an API extension may be required
jiewei: how is the speech recognition model implemented under the hood?
… maybe the event should be handled by javascript rather than the model loader
honglin: that might be inefficient. for speech recognition in ChromeOS, the result is sent from the model spontaneously. it's not a TF Lite model in this example. but in the future TF Lite could support it.
… maybe provide 2 functions: data feeding, and result accepting
… the model will call it actively, by itself
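A rough sketch of the two-function shape described above, a data-feeding call plus a result-accepting callback. Streaming is out of scope today, so every name here is hypothetical.

```ts
// Hypothetical streaming extension. 'model' is assumed to be a model already
// loaded through MLModelLoader; none of these members exist in the current API.
const session = await model.createStreamingSession({
  onResult: (sentence) => {                 // "result accepting" function:
    console.log('recognized:', sentence);   // the model calls this by itself
  },                                        // when it decides output is ready
});

for await (const chunk of microphoneChunks()) {  // assumed async audio source
  await session.feed(chunk);                     // "data feeding" function
}
await session.close();                           // flush any remaining results
```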
sami: it will be good to have concrete models to look at. it's not a blocker currently
(or was that Jiewei again?)
anssi: let's gather feedback in github again, outside of this call
Model format
anssi: my perspective: thanks to Honglin and others' work, we're at the prototyping phase
… my expectation is the implementation experience will help inform the standard model format discussion
"The Model Loader API needs a standard format supported across browsers and devices for broad interoperability."
anssi: adoption of the model loader spec as a working group deliverable, or official standard, is conditional to agreeing on a standard format
WebML WG Charter: Tentative Deliverables
anssi: TF Lite format is the choice for experimentation. that's great. we have to start somewhere.
… let's discuss the next steps to move toward a standard format.
… what is the canonical reference for the TF Lite format used in the prototype?
… is there a spec, or is the implementation the de facto description?
Jonathan_Bingham: can provide a link to the TFLite FlatBuffer format as a reference
… that can probably contribute toward a standard
… I raised my hand for queue because I think Raphael from Msft would be a great person to summarize position from their side
… developers need a single format to work against, with confidence that it works across all devices and browsers; it's a developer ergonomics issue
… we at Google think the same
… then how we move forward from this prototype to something that could work for the wider community is a good topic; would love for Ningxin and Msft folks to chime in on this
… proposal: let's look at Ningxin's tests for WebNN; he has looked at a few formats and tried to get them running on top of WebNN
… can we find a common intersection? a starting point would be to explore what exists now
… I'm not suggesting Model Loader understands all those formats, but that we can translate those formats into a format that Model Loader understands
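One way to read the translation idea in practice: conversion happens offline in tooling, and the browser only ever sees the single format the Model Loader backend understands. A sketch under that assumption; the conversion step itself is left out here.

```ts
// Assumed workflow: a model authored in another format (e.g. ONNX) is converted
// offline to the backend's format, and only the converted artifact is shipped.
const modelBuffer = await (await fetch('model-converted.tflite')).arrayBuffer();

// The browser-side loader then only has to understand one format.
const loader = new MLModelLoader(await navigator.ml.createContext());
const model = await loader.load(modelBuffer);
```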
anssi: that's a pragmatic approach. i agree it's about developer ergonomics.
… licensing and other concerns we can figure out.
… the purpose of standards is to have interoperability.
… probably we'll never get to a world where there is one format only. we'll need tools to work between formats.
… as with the previous topics, let's document in a github issue and get input from Ningxin and others.
… we can follow up with Rafael and Ningxin
jonathan: i'll file the github issue
anssi: there's mention in the Explainer, but it will be easier to get discussion in an Issue
<dom> TFLite FlatBuffers format
Support for non-IEEE 754 floating point types
Open question: support for non-IEEE 754 floating point types #23
jiewei: neither WebNN nor Model Loader explains how to support these types in its API
… what's the plan?
WebNN API: Support for non-IEEE 754 floating point types #252
Web & ML workshop discussion: Support for Float16 in JS & WASM environments
anssi: there was an earlier question about bfloat16, which is mostly used in training accelerators
… it's non standard currently and some concerns were raised
… is this type also for training? does it improve inference performance too?
jiewei: that's an experiment we can do
… i'm pretty sure it's not just for training, though it does improve training speed
… it applies to inference, with dedicated accelerators. the speedup would be kind of the same.
… it's worth experimenting whether the speedup is user-perceptible
anssi: how big of an effort would it be to generate and share this data?
jiewei: we might be able to do experiments with nvidia graphics cards
honglin: float16 can be easily supported, since the model loader accepts binary buffers
… we want to have an idea of how webnn would support it, and then we can make it consistent with model loader
anssi: jiewei pointed out that the discussion is different depending on the API
… for WebNN, the discussion is how to use the types when building the graph
… for model loader, it's about how to use the types during IO
honglin: in model loader, execution is internal. correct.
jiewei: if we want to interoperate between model loader and webnn, how do we pass the types between the two?
… do we convert to a common format? or use a standard struct?
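A sketch of where float16 would surface in each API, assuming WebNN eventually accepts a 'float16' operand type and the Model Loader keeps treating tensors as raw buffers (JavaScript has no Float16Array, so fp16 values would travel as 16-bit patterns in a Uint16Array). The compute call and option names are assumptions.

```ts
// WebNN side: the type is declared while building the graph (assumed support).
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);
const x = builder.input('x', { type: 'float16', dimensions: [1, 224, 224, 3] });

// Model Loader side: the type only shows up at input/output time.
// 'model' is assumed to be a previously loaded model; fp16 bits ride in a
// Uint16Array and any accelerator-specific conversion happens internally.
const fp16Input = new Uint16Array(1 * 224 * 224 * 3);   // packed fp16 bits
const results = await model.compute({ x: fp16Input });  // 'compute' assumed
```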
anssi: you proposed earlier that this could happen transparently, and auto-convert based on the accelerator. not sure if you'd get the performance benefit
jiewei: this applies to webnn; model loader can internally auto-convert
… for webnn, the non-standard format would be for just part of the graph
anssi: you had another proposal about the API specifying acceptable quantization levels
jiewei: i can give an example of the improvement. if we quantize images with fp16, the power consumption can be reduced by 50% with performance the same
… none of the APIs provide a way for developers to specify that
… for some accelerators, it may provide a speedup. depends on the chips we're executing on.
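A sketch of the kind of hint being described: letting the page state which quantization levels it can tolerate so the implementation may pick a lower-power path on capable hardware. No such option exists in either API; the name is purely hypothetical.

```ts
// Hypothetical load-time quantization hint (option name assumed).
const loader = new MLModelLoader(await navigator.ml.createContext());
const modelBuffer = await (await fetch('model.tflite')).arrayBuffer();
const model = await loader.load(modelBuffer, {
  acceptableQuantization: ['float32', 'float16'],
});
```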
anssi: what's the state of the prototype with respect to these types?
jiewei: it doesn't provide this option
anssi: we can validate the expected performance improvement, and that would motivate people on webnn to find a solution.
… you can use the existing issues, and share data in the issue
… i can nudge the issue in the Working Group to get reactions
Jonathan_Bingham: ChromeOS runs on lower end devices where low power consumption is important; we probably don't have specific apps that are blocked on these types
anssik: would the extra types pose a privacy risk?
… e.g., distinguishing between Apple's Neural Engine and an Nvidia GPU
… the first priority is to share data showing that the types would improve performance. the WebGPU group is discussing similar things.
WebNN API status update
honglin: what's the current state of webnn? our sydney team can't join the working group meeting
anssik: i'll share links.
Intent to Prototype: Web Neural Network API (WebNN)
WebNN implementation in Chromium
anssik: the group has been focused on the Candidate Recommendation spec. it's basically the feature complete state.
… the target is to achieve it this year.
… other requirements to get to CR: we need a cross-browser test suite with web platform tests (WPT), which Chromium runs in continuous integration
… the test suite confirms that the implementation conforms to the spec.
… that's a major deliverable
… the group has also started work on a pure javascript implementation without 3rd party dependencies
… another project is webnn native
… if you're familiar with WebGPU implementations, it's like the WebNN version of Dawn
… there's parallel work on scoping, polishing the implementation, wider review
WebNN API spec Candidate Recommendation readiness tracker
Add WebNN baseline implementation for first-wave ops #1
honglin: because we're going to submit some CLs in a couple of weeks, and we share some data structs with WebNN, we may need to coordinate on the directory structure in Blink
… that's why i was asking about the status of webnn, because we want to coordinate
… if they're going to submit later, we may just go ahead first
anssik: the aim is to have something behind a flag this year
… that means it can still change
Jonathan_Bingham: are you going to start with a developer trial or go straight to an origin trial?
anssik: developer trial comes before the origin trial in the chromium launch process
… we'll start with that, with a feature flag
… the origin trial is for real production use with less tech-savvy developers
… origin trials are not supposed to enable a feature for all users; if too many are using it, the trial will be disabled
… developer trial is the goal for this year
Jonathan_Bingham: for model loader, dev trial is also the goal, starting very limited, with Chrome OS support only
… for a wider dev trial, we'll want to get help from Chrome or TensorFlow
… likely it will be late this year at the earliest
honglin: the chrome OS team has worked on the handwriting API before and has experience
anssik: our first origin trial for new sensor APIs was one of the first by non-Google people
… this isn't our first attempt
anssik: huge thanks Jonathan for scribing!