RQTF meeting – 11 September 2024

Meeting minutes

XR accessibility.

*general group intros*

jasonjgw: Thanks Dave for your many contributions to web standards!

Dave: I have some slides; will share (and have/will share the link). I've been interested in extended reality and the web since the early '90s, back at CERN. Originally interested in VRML (the M started as Markup, and became Modelling).
… Hardware's got a lot more capable. There's been a lot of work on XR, but very low-level. There's not much being done at higher levels.
… Particularly with respect to accessibility.
… *shares slides*
… Many applications for XR in shopping, entertainment, industry, online meetings, and desktop replacement. You can see furniture, glasses, etc. before buying them. Games, smart glasses, online meetings, and a desktop replacement with unlimited virtual screens.
… Digital Twins: virtual embodiments of physical systems, processes, and people.

(digital twins is slide 3)

Dave: Digital Twins are associated with digital footprints, visible in XR. Semantics, metadata, and affordances can be added. Digital Twins could be used to provide repair instructions.
… (slide 4) The Immersive Web. Terms are still in flux. Slide defines AR, VR, XR, Metaverse, Omniverse and Immersive Web.
… (slide 5) Accessible Extended Reality. Enabling everyone to use applications. Based on open standards. Recent developments in lens/glasses hardware; rendering; networking; privacy-preserving accessibility, based on high-level models (a current gap in W3C work); taxonomies; scripting (enabling adaptation to device and accessibility requirements); building on top of established standards.
… The idea is not to compete with existing standards, but to complement them.
… (slide 6) Both client and server based rendering.
… (Slide compares protocols/resources used by each.)
… (slide 7) Taxonomies of Resources.
… Knowledge graph + links can be used to model resources at different levels of abstraction (from high-level, e.g. a building or an animal, to low-level, e.g. textures).
… Links to lower-level resources. Resources have URNs so they can be found in named libraries.
… Accessibility based on multiple levels of abstraction, and intent-based interactions.
… (slide 9) Behaviours.
… Intents describe _what_ you're doing, rather than _how_ - i.e. they're goals. They could be implicit or explicit. Scripts could resolve the goals.
… Behaviours would be described in taxonomies. APIs for invoking them would be available.
… Intents map to high-level behaviours, which can be mapped to lower levels (e.g. down to skeleton/model movements, informing rendering).
… (slide 10) Animating avatars.
… (Slide discusses how current/emerging web standards could be used in conjunction with established research to ensure the user's avatar reflects their facial expressions)
… (slide 11) Networking and Discovery
… Efficient communication between agents that are locally visible to each other (e.g. in the same VR room).
… (slide discusses current protocols, such as WebRTC, and techniques, such as environmental level of detail, that could be used - appropriate handling of privacy is raised)
… (slide 12) Cognitive Low-Code Control.
… (Slide discusses the use of low-code solutions to specify real-time behaviour)
… Cognitive AI CG has worked in this area.
… Also exploring this in EU projects.
… There's a mature open-source JS library. Extensions to distributed agents.
… Cognition - Sequential Rules Engine, inspired by John Anderson's ACT-R. This provides an architecture in which rules can be described in terms of intents - link from slide to example/prototype.
… (slide 13) Microsoft's metaverse
… There's a VR meeting environment called Microsoft Mesh (linked from slide). Has many features (described in slide). Avatars are waist-up. Proprietary.
… (slide 14) Meta's metaverse
… Similar to Microsoft Mesh. Also waist-up avatars. Also proprietary. Higher quality rendering and environments needed for mainstream adoption.
… (slide 15) Are metaverses ready for everyday use?
… Challenges such as rendering fatigue, and improved rendering needed.
… Various accessibility challenges (including subtitles, TTS, speech understanding) discussed.
… Full-body avatars needed.
… It'd be great to encourage open standards and open source, rather than proprietary apps.
… (slide 16) Developmental Challenges
… W3C is developing a bunch of standards (WebGPU, WebXR, WebRTC, WASM) but they're only building blocks, and further key building blocks are needed.
… E.g. a lightweight graph format for taxonomies and applications (Chunks may be good? Can layer on RDF).
… Slide discusses requirements around what needs to be possible technically, and in real-time.
… Slide discusses the importance of fostering a community of users, and enabling a viable ecosystem, with sound business models, and strong competition. Different funding models touched upon.
… The importance of being able to build it (rather than relying on what we're given).
… (slide 17) Facial Mesh Fitting
… (Slide gives an example of fitting a facial mesh, that can recognize expressions, in the browser, at >60fps - it can track the mouth, but not eyebrows)
… The code is very long (90k SLOC); need better/simpler libraries.
… (slide 18) Potential improvements
… (Slide discusses several, including: simpler wireframe meshes to reduce computational cost, and allowing more people in the scene; extracting textures from video; eyebrows handled via texture, rather than mesh, changes; hair and clothes being matched to latent models; and more.)
… (slide 19) Research Ideas.
… (slide includes: bootstrapping a simpler facial landmark model; improvements to generating training data based on existing datasets; upcoming standards such as WebNN and WebGPU; privacy-friendly machine learning based on federated approaches in the browser; and more)
… (slide 20) 3D Face Models
… (slide discusses using photogrammetry and privacy-friendly federated learning to avoid uploading the images being used to seed the modelling. Slide proposes 5 stages to solving this problem)
… (slide 21) Next Steps
… (slide discusses setting up a new CG at W3C that liaises with existing other groups, and open source projects. Immersive Web CG (WebXR) is very low-level; XR Access; APA WG.)
… (slide references a short, but broad, range of articles on immersive web, from 1994 up to 2024)
… People have learned that the metaverse isn't ready for prime time yet, but there's lots of work being done.
… The omniverse awaits. Thanks for your attention.
… GenAI has great potential for synthesising environments, but there's quite some way to go.
… To kickstart the accessible immersive web, we need volunteers, particularly to work on the immersive and more accessible avatars (more advanced than others available).
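
Scribe's illustration (not from the slides): the taxonomy idea from slide 7 could be sketched roughly as below. This is a minimal sketch that assumes a resource is a node with a URN, a text label, an abstraction level, and links to lower-level resources. The `Resource` class, the `LIBRARY` contents, the `urn:example:` names and the `describe` function are all hypothetical and are not taken from any existing library or specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Resource:
    urn: str    # hypothetical URN used to look the resource up in a named library
    label: str  # text description that an assistive technology could present
    level: int  # 0 = most abstract; larger numbers = lower-level detail
    parts: list[str] = field(default_factory=list)  # URNs of lower-level resources


# Hypothetical library: a building links to a room, which links to a texture.
LIBRARY = {
    r.urn: r
    for r in [
        Resource("urn:example:building:office", "Office building", 0,
                 ["urn:example:room:lobby"]),
        Resource("urn:example:room:lobby", "Lobby with a reception desk", 1,
                 ["urn:example:texture:marble"]),
        Resource("urn:example:texture:marble", "Marble floor texture", 2),
    ]
}


def describe(urn: str, max_level: int) -> list[str]:
    """Collect labels from `urn` downwards, stopping below `max_level`."""
    resource = LIBRARY[urn]
    if resource.level > max_level:
        return []
    labels = [resource.label]
    for part in resource.parts:
        labels.extend(describe(part, max_level))
    return labels
```

For example, `describe("urn:example:building:office", 1)` would give a screen reader user the building and its rooms while skipping texture-level detail; raising `max_level` exposes more of the graph.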

jasonjgw: Thank you for the interesting talk on how existing and emerging technologies could be combined to create more accessible XR environments.

jasonjgw: I'd like to note as TF co-facilitator that your overview intersects with a number of activities we're engaged in, and have been engaged in - some of which you mentioned earlier by citing XAUR. Also RAUR. Also we're working on AUR in relation to AI (very early at the moment).
… There's also documentation we've been working on relating to collaborative and meeting environments. As well as work on accessibility APIs, taxonomies, the accessibility object model (AOM), and, recently, work with people in the Metaverse Standards Forum (MSF)/XR Access. We are planning ongoing collaboration with MSF (Michael Cooper, formerly of W3C, is involved, and we're working with him and his team). The MSF will then take the work to the relevant standards bodies.

matatk: Valuable overview of the components

matatk: What are the biggest areas we should pay attention to?

Dave: Rethink a11y, not to be too low level, e.g. alt text won't work

Dave: Important to map high-level intents.
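
Scribe's illustration (not from the slides): mapping a high-level intent down to lower-level behaviours, as described for slide 9, could look something like the sketch below. It assumes the taxonomy is a simple lookup from a behaviour name to the lower-level steps it expands into. The `BEHAVIOURS` table, the behaviour names and the `resolve` function are hypothetical.

```python
# Hypothetical taxonomy: each behaviour expands into lower-level steps.
# Names that do not appear as keys are treated as primitive movements.
BEHAVIOURS = {
    "greet": ["face_user", "wave"],
    "face_user": ["rotate_torso"],
    "wave": ["raise_arm", "oscillate_hand"],
}


def resolve(intent: str) -> list[str]:
    """Expand an intent (the 'what') into primitive movements (the 'how')."""
    steps = BEHAVIOURS.get(intent)
    if steps is None:
        # Not in the taxonomy, so assume it is already a primitive movement.
        return [intent]
    movements = []
    for step in steps:
        movements.extend(resolve(step))
    return movements


print(resolve("greet"))  # ['rotate_torso', 'raise_arm', 'oscillate_hand']
```

The point of the design is that an application, or an assistive technology, only has to name the goal ("greet"); how it is carried out on a given device is resolved from the taxonomy.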

matatk: Reminds me of research on abstract UIs - affords lots of rendering flexibility.

janina: Lots of work will be needed on avatars, making them comprehensive, but also privacy-preserving (and work needed relating to decency). Like the idea of the expressions being realistically mapped through. Would we get further quicker if we started with something other than a representation of a person being a copy of them?

Dave: We can simplify application development. E.g. a building: we can model the building, and support the use of scripting to provide extra facets/behaviours.

Dave: Re avatars, one thing that's currently lacking: 'face-to-face' style interactions are very clumsy/artificial. We want to be able to convey the subtleties involved in water-cooler conversations. How people react, move their hands, as well as talking.

Dave: Trying to get closer to face-to-face interactions.

janina: Thank you, hope to meet at TPAC

– DRAFT –