Meeting minutes
<jonathan> special guest: Eugene Burmako, who will co-present with Stella
XLA, OpenXLA Project, StableHLO
Slideset: https://
<Zakim> anssik, you wanted to ask about StableHLO adoption by compiler projects
anssik: StableHLO adoption in XLA compiler and IREE, which compiler is the leading adopter?
Eugene: re main target for StableHLO, we provide equal support for these two compilers
… both take StableHLO input
… StableHLO supports all the functionality of XLA; I cannot speak for IREE personally, but we're working closely with that team on interesting new applications
… Stella could confirm we can cover ~95% of use cases
anssik: do we still have an experimental compiler that interacts with StableHLO?
Eugene: I wish Stella could speak to this topic; IREE is addressing some of the most important emerging ML use cases
… as for consumers of StableHLO, there's also TFLite
… our aspirations go beyond support in Google-initiated compiler projects
Chai: we are aware of XLA, we did some work in the TF codebase
… on the Microsoft side we haven't leveraged XLA HLO yet, but we are familiar with it; moving to open governance is a great thing
… in the process of defining the WebNN op set we also spent time looking at HLO, so that we can translate WebNN to HLO
… when we start with a set of ops, it must also make sense elsewhere
… looking at the HLO op set is very helpful for understanding how a reduced op set maps to actual models
… thanks for your work on this
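As a purely illustrative aside, the kind of op-level correspondence Chai describes might look like the following Python sketch; the entries are hypothetical examples for illustration, not an official WebNN-to-StableHLO mapping from either group:

    # Hypothetical sketch of a WebNN-to-StableHLO op correspondence;
    # the pairings below are illustrative, not an official mapping.
    WEBNN_TO_STABLEHLO = {
        "add":    "stablehlo.add",          # elementwise add maps directly
        "matmul": "stablehlo.dot_general",  # matmul is a special case of dot_general
        "conv2d": "stablehlo.convolution",  # 2-D convolution
        "relu":   "stablehlo.maximum",      # relu(x) decomposes to max(x, 0)
    }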
Chai: one question re interop with GPU
… you mentioned XLA is used a lot inside Google, what is the model to interop with the GPU stack? E.g. WebGPU with Project Dawn, how is that supported?
Eugene: currently HLO is focused on datacenter usages
… this is the most tested path, we have aspirations to expand beyond servers but cannot speak for details there
Chai: take Computer Vision running on the server as a usage; this will eventually touch the graphics stack, how does that work with HLO?
Eugene: within Alphabet we support multiple frameworks: TF, JAX, PyTorch, etc. To use the XLA compiler, a framework has to produce HLO to feed into the compiler; JAX's mapping to HLO is fairly straightforward.
… TF has a graph IR that is stored and loaded; then we have the TF-XLA bridge that translates the graphs and compiles to XLA
… a bunch of TF ops cannot be compiled to XLA
… PyTorch uses a lazy-tensor, tracing-based mechanism to transform Python programs into HLO
… within the compiler we have target-independent optimizations, ranging from simple things to advanced optimizations
… the GPU compiler is publicly available; it is in TF now but will be a separate project soon
… that's the XLA GPU architecture at a high level; we are also rewriting this architecture to use MLIR more and more
… MLIR is very influential tech that we love here at Google
… to recap, we support many frameworks with the same interface
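To illustrate the framework-to-compiler flow Eugene describes, here is a minimal sketch, assuming a recent JAX version, of how JAX lowers a traced Python function to the HLO-level module that feeds the XLA compiler:

    import jax
    import jax.numpy as jnp

    def f(x):
        # An arbitrary example computation to be traced and lowered.
        return jnp.tanh(x) + 1.0

    # jax.jit traces f; lower() produces the compiler input for the given
    # argument shapes, and as_text() prints the HLO-level module (StableHLO/MHLO
    # in recent JAX versions) that is handed to XLA.
    lowered = jax.jit(f).lower(jnp.ones((2, 3), jnp.float32))
    print(lowered.as_text())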
anssik: StableHLO spec stability, timeline?
Eugene: targeting the first version of the spec EOY
… calling it v1.0 or v0.9
… feature complete EOY
… active work in progress, we have half of it specced and the rest in open issues
… not a perfect time to look at the spec right now; maybe the beginning of 2023 is a good time for the WebNN op set compatibility effort
Eugene: areas of major development after 1.0: finalizing sparsity, quantization including uniform quantization and beyond, extensibility
… aligning ops between WebNN and StableHLO is a great idea, happy to support there
ningxin_hu: thanks Eugene, great presentation!
… two groups looking to coordinate on the op sets makes a lot of sense
… question re op sets: you mentioned op set categories, and also usage on server and mobile
… my impression is some ops are not so useful on mobile, e.g. distribution across multiple nodes; do you plan to have profiles in StableHLO, e.g. for server and other device classes?
Eugene: we are interested in having those profiles, haven't done a full study yet on this topic
… we have this as a 2023 goal
… I agree distribution ops have limited or no usefulness on mobile
ningxin_hu: another question re compilers, you mentioned the XLA and IREE compilers; for web usage we want to understand if these compilers support JIT?
… on-device compilation would be an interesting feature for us
Eugene: in general non-server story currently is where we have less clarity, an area of active work
… starting with the server, the XLA compiler is predominantly a JIT compiler
… OTOH, IREE is ahead-of-time
… a limitation there has been no dynamic shapes, i.e. no unknown tensor sizes (bounded dynamic dimension sizes are supported); StableHLO addresses this, so AOT compilation with dynamic tensor sizes becomes feasible, and that is what IREE does
… many practical programs are dynamically sized
… on mobile we look at both AOT and JIT use cases
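As a hedged illustration of the JIT vs. ahead-of-time distinction Eugene draws, using JAX's public API rather than IREE, the same function can be compiled implicitly on first call or explicitly up front:

    import jax
    import jax.numpy as jnp

    def f(x):
        return x @ x.T  # arbitrary example computation

    x = jnp.ones((4, 8), jnp.float32)

    # JIT path: compilation happens implicitly on the first call,
    # specialized to the input shapes seen at run time.
    y_jit = jax.jit(f)(x)

    # AOT-style path: lower and compile explicitly ahead of the call,
    # then reuse the resulting executable.
    compiled = jax.jit(f).lower(x).compile()
    y_aot = compiled(x)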
ningxin_hu: question re TFLite: WebNN works with the TFLite Wasm version; in your slides you mentioned StableHLO could be consumed via a flatbuffer schema, and also that a TFLite delegate could leverage StableHLO
… is TFLite a consumer of StableHLO?
Eugene: what I shared on the slide is as much as I could share on behalf of the TFLite team; it's work in progress and a bit too early to share the details
Chai: Eugene, you said you want to push this for mobile too
… WebNN is the op set for the web, so I think there's some synergy around mapping StableHLO on top of WebNN for the mobile use case
… this is an area of collaboration where we can connect the dots
anssik: we may want to add OpenXLA to the coordination for this WG's charter
Eugene: sounds great
<ningxin_hu> +1
<burmako> Thank you everyone for the invitation to present! It was great to meet you all, and I'm looking forward to collaborating.
<burmako> For the meeting minutes, is there a way to propose some edits?
<burmako> And I'll also share the slides shortly.
anssik: thank you Eugene for the presentation!
<burmako> Hi Anssi! Not sure if my previous message reached the channel (got what seems to be an error message), but I emailed you the slides and proposed edits. Thanks again!
anssik: edits to the minutes proposed by Eugene have been incorporated (see Diagnostics log below)