1. Introduction
Hardware that enables Virtual Reality (VR) and Augmented Reality (AR) applications is now broadly available to consumers, offering an immersive computing platform with both new opportunities and challenges. The ability to interact directly with immersive hardware is critical to ensuring that the web is well equipped to operate as a first-class citizen in this environment.
Immersive computing introduces strict requirements for high-precision, low-latency communication in order to deliver an acceptable experience. It also brings unique security concerns for a platform like the web. The WebXR Device API provides the interfaces necessary to enable developers to build compelling, comfortable, and safe immersive applications on the web across a wide variety of hardware form factors.
Other web interfaces, such as the RelativeOrientationSensor and AbsoluteOrientationSensor, can be repurposed to surface input from some devices to polyfill the WebXR Device API in limited situations. These interfaces cannot support multiple features of high-end immersive experiences, however, such as 6DoF tracking, presentation to headset peripherals, or tracked input devices.
1.1. Terminology
This document uses the acronym XR throughout to refer to the spectrum of hardware, applications, and techniques used for Virtual Reality, Augmented Reality, and other related technologies. Examples include, but are not limited to:
- Head mounted displays, whether they are opaque, transparent, or utilize video passthrough
- Mobile devices with positional tracking
- Fixed displays with head tracking capabilities
The important commonality between them is that they all offer some degree of spatial tracking with which to simulate a view of virtual content.
Terms like "XR Device", "XR Application", etc. are generally understood to apply to any of the above. Portions of this document that only apply to a subset of these devices will indicate so as appropriate.
The terms 3DoF and 6DoF are used throughout this document to describe the tracking capabilities of XR devices.
- A 3DoF device, short for "Three Degrees of Freedom", is one that can only track rotational movement. This is common in devices which rely exclusively on accelerometer and gyroscope readings to provide tracking. 3DoF devices do not respond to translational movement from the user, though they may employ algorithms to estimate translational changes based on modeling of the neck or arms.
- A 6DoF device, short for "Six Degrees of Freedom", is one that can track both rotation and translation, enabling precise 1:1 tracking in space. This typically requires some level of understanding of the user’s environment. That environmental understanding may be achieved via inside-out tracking, where sensors on the tracked device itself (such as cameras or depth sensors) are used to determine the device’s position, or outside-in tracking, where external devices placed in the user’s environment (like a camera or light emitting device) provide a stable point of reference against which the XR device can determine its position.
1.2. Application flow
Most applications using the WebXR Device API will follow a similar usage pattern (a sketch of this flow follows the list):

1. Query navigator.xr.supportsSessionMode() to determine if the desired type of XR content is supported by the hardware and UA.
2. If so, advertise the XR content to the user.
3. Wait for the user to trigger a user activation event indicating they want to begin viewing XR content.
4. Request an XRSession within the user activation event with navigator.xr.requestSession().
5. If the XRSession request succeeds, use it to run a frame loop to respond to XR input and produce images to display on the XR device in response.
6. Continue running the frame loop until the UA ends the session or the user indicates they want to exit the XR content.
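A minimal sketch of this pattern, assuming hypothetical advertiseXR() and onDrawFrame() helpers and user activation wiring not shown here:

navigator.xr.supportsSessionMode('immersive-vr').then(() => {
  // XR content is supported; show an "Enter VR" button (hypothetical helper).
  advertiseXR();
});

// Called from a user activation event, such as a click on the button above.
function onEnterXRClick() {
  navigator.xr.requestSession({ mode: 'immersive-vr' }).then((session) => {
    // Run the frame loop until the session ends.
    session.requestAnimationFrame(function onXRFrame(time, frame) {
      session.requestAnimationFrame(onXRFrame);
      onDrawFrame(frame); // Hypothetical application rendering function.
    });
  });
}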
2. Initialization
2.1. XR
The xr object is the entry point to the API, used to query for XR features available to the user agent and initiate communication with XR hardware via the creation of XRSessions.
[SecureContext, Exposed=Window] interface XR : EventTarget {
  // Methods
  Promise<void> supportsSessionMode(XRSessionMode mode);
  Promise<XRSession> requestSession(optional XRSessionCreationOptions parameters);

  // Events
  attribute EventHandler ondevicechange;
};

[SecureContext] partial interface Navigator {
  [SameObject] readonly attribute XR xr;
};
The XR object has a list of XR devices, which MUST be initially empty, and an XR device, which MUST be initially null and represents the active device from the list of XR devices that API calls will interact with.
The user agent MUST be able to enumerate XR devices attached to the system, at which time each available device is placed in the list of XR devices. Subsequent algorithms requesting enumeration MAY reuse the cached list of XR devices. Enumerating the devices should not initialize device tracking. After the first enumeration the user agent SHOULD begin monitoring device connection and disconnection, adding connected devices to the list of XR devices and removing disconnected devices.
Each time the list of XR devices changes the user agent should select an XR device by running the following steps:

1. Let oldDevice be the current XR device.
2. If the list of XR devices is empty, set the XR device to null.
3. If the list of XR devices contains one device, set the XR device to that device.
4. If there are any active XRSessions and oldDevice is in the list of XR devices, set the XR device to oldDevice.
5. Else set the XR device to a device of the user agent’s choosing.
6. If this is the first time devices have been enumerated or oldDevice equals the XR device, abort these steps.
7. Set the XR compatible boolean of all WebGLRenderingContextBase instances to false.
8. Queue a task that fires a simple event named devicechange on the XR object.
NOTE: The user agent is allowed to use any criteria it wishes to select an XR device when the list of XR devices contains multiple devices. For example, the user agent may always select the first item in the list, or provide settings UI that allows users to manage device priority. Ideally the algorithm used to select the default device is stable and will result in the same device being selected across multiple browsing sessions.
Any time an XR device is needed by an algorithm it can ensure an XR device is selected by running the following steps:

1. If the XR device is not null, abort these steps.
The ondevicechange attribute is an Event handler IDL attribute for the devicechange event type.
Each XR device has a list of supported modes, which MUST contain all of the XRSessionMode types that the XR device can support, and MUST contain inline.
When the supportsSessionMode(mode) method is invoked, it MUST return a new Promise promise and run the following steps in parallel:

1. If the XR device is null, reject promise with a NotSupportedError and abort these steps.
2. If mode is not in the XR device's list of supported modes, reject promise with a NotSupportedError and abort these steps.
3. Else resolve promise.
Calling supportsSessionMode() MUST NOT trigger device-selection UI, as this would cause many sites to display XR-specific dialogs early in the document lifecycle without user activation.
The following code checks whether immersive-vr sessions are supported:

navigator.xr.supportsSessionMode('immersive-vr').then(() => {
  // 'immersive-vr' sessions are supported.
  // Page should advertise support to the user.
});
The XR object has a pending immersive session boolean, which MUST be initially false, an active immersive session, which MUST be initially null, and a list of inline sessions, which MUST be initially empty.
When the requestSession(options) method is invoked, the user agent MUST return a new Promise promise and run the following steps in parallel:

1. Let mode be the mode attribute of the options argument.
2. Let immersive be a boolean set to true if mode is immersive-vr or immersive-ar, and false otherwise.
3. If immersive is true:
   1. If pending immersive session is true or active immersive session is not null, reject promise with an InvalidStateError and abort these steps.
   2. Else set pending immersive session to true.
4. Else if mode is not in the XR device's list of supported modes, reject promise with a NotSupportedError.
5. Else if immersive is true and the algorithm is not triggered by user activation, reject promise with a SecurityError and abort these steps.
6. If promise was rejected and immersive is true, set pending immersive session to false.
7. If promise was rejected, abort these steps.
8. Let session be a new XRSession.
9. Initialize the session session with the session description given by options.
10. If immersive is true, set the active immersive session to session and set pending immersive session to false.
11. Else append session to the list of inline sessions.
12. Resolve promise with session.
The following code requests an immersive-vr XRSession:

let xrSession;

navigator.xr.requestSession({ mode: "immersive-vr" }).then((session) => {
  xrSession = session;
});
3. Session
3.1. XRSession
Any interaction with XR hardware is done via an XRSession object, which can only be retrieved by calling requestSession() on the XR object. Once a session has been successfully acquired it can be used to poll the device pose, query information about the user’s environment, and present imagery to the user.
The user agent, when possible, SHOULD NOT initialize device tracking or rendering capabilities until an XRSession has been acquired. This is to prevent unwanted side effects of engaging the XR systems when they’re not actively being used, such as increased battery usage or related utility applications appearing when a user first navigates to a page that only wants to test for the presence of XR hardware in order to advertise XR features. Not all XR platforms offer ways to detect the hardware’s presence without initializing tracking, however, so this is only a strong recommendation.
enum XREnvironmentBlendMode {
  "opaque",
  "additive",
  "alpha-blend",
};

[SecureContext, Exposed=Window] interface XRSession : EventTarget {
  // Attributes
  readonly attribute XRSessionMode mode;
  readonly attribute XRPresentationContext? outputContext;
  readonly attribute XREnvironmentBlendMode environmentBlendMode;
  readonly attribute XRRenderState renderState;

  // Methods
  void updateRenderState(optional XRRenderStateInit state);
  Promise<XRReferenceSpace> requestReferenceSpace(XRReferenceSpaceOptions options);
  FrozenArray<XRInputSource> getInputSources();
  long requestAnimationFrame(XRFrameRequestCallback callback);
  void cancelAnimationFrame(long handle);
  Promise<void> end();

  // Events
  attribute EventHandler onblur;
  attribute EventHandler onfocus;
  attribute EventHandler onend;
  attribute EventHandler onselect;
  attribute EventHandler oninputsourceschange;
  attribute EventHandler onselectstart;
  attribute EventHandler onselectend;
};
When an XRSession is created, the user agent MUST initialize the session by running the following steps:

1. Let session be the newly created XRSession object.
2. Let options be the XRSessionCreationOptions passed to requestSession().
3. Initialize session’s outputContext to options’ outputContext value.
4. If no other features of the user agent have done so already, perform the necessary platform-specific steps to initialize the device’s tracking and rendering capabilities.
A number of different circumstances may shut down the session; this is permanent and irreversible. Once a session has been shut down, the only way to access the XR device's tracking or rendering capabilities again is to request a new session. Each XRSession has an ended boolean, initially set to false, that indicates whether it has been shut down.
When an XRSession is shut down the following steps are run:

1. Let session be the target XRSession object.
2. Set session’s ended value to true.
3. If the active immersive session is equal to session, set the active immersive session to null.
4. Remove session from the list of inline sessions.
5. If no other features of the user agent are actively using them, perform the necessary platform-specific steps to shut down the device’s tracking and rendering capabilities.
The end() method provides a way to manually shut down a session. When invoked, it MUST return a new Promise promise and run the following steps in parallel:

Each XRSession has an active render state, which is a new XRRenderState, and a list of pending render states, which is initially empty.
The renderState attribute returns the XRSession's active render state.
When the updateRenderState(newState) method is invoked, the user agent MUST run the following steps:

1. Append newState to the target XRSession's list of pending render states.
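For example, an application might adjust the depth range used for composition between frames; a brief sketch, assuming an active xrSession:

// Request new clip plane values. The change is recorded as a pending render
// state and takes effect when the session next applies pending render states.
xrSession.updateRenderState({
  depthNear: 0.25,
  depthFar: 500.0
});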
When requested, the XRSession MUST apply pending render states by running the following steps:

1. Let session be the target XRSession.
2. Let activeState be session’s active render state.
3. Let pendingStates be session’s list of pending render states.
4. Set session’s list of pending render states to the empty list.
5. For each newState in pendingStates:
When the requestReferenceSpace(options) method is invoked, the user agent MUST return a new Promise promise and run the following steps in parallel:

1. Create a reference space, referenceSpace, as described by options.
2. If referenceSpace is null, reject promise with a NotSupportedError and abort these steps.
3. Resolve promise with referenceSpace.
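A brief usage sketch, assuming an active xrSession; the fallback behavior is application-defined:

xrSession.requestReferenceSpace({ type: 'stationary', subtype: 'eye-level' })
  .then((referenceSpace) => {
    xrReferenceSpace = referenceSpace;
  })
  .catch(() => {
    // The requested reference space type isn't supported by this session.
  });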
When the getInputSources() method is invoked, the user agent MUST run the following steps:

1. Return the current list of active input sources.
Each XRSession has an environment blending mode value, which is an enum which MUST be set to whichever of the following values best matches the behavior of imagery rendered by the session in relation to the user’s surrounding environment.

- A blend mode of opaque indicates that the user’s surrounding environment is not visible at all. Alpha values in the baseLayer will be ignored, with the compositor treating all alpha values as 1.0.
- A blend mode of additive indicates that the user’s surrounding environment is visible and the baseLayer will be shown additively against it. Alpha values in the baseLayer will be ignored, with the compositor treating all alpha values as 1.0. When this blend mode is in use black pixels will appear fully transparent, and there is no way to make a pixel appear fully opaque.
- A blend mode of alpha-blend indicates that the user’s surrounding environment is visible and the baseLayer will be blended with it according to the alpha values of each pixel. Pixels with an alpha value of 1.0 will be fully opaque and pixels with an alpha value of 0.0 will be fully transparent.
The environmentBlendMode attribute returns the XRSession's environment blending mode.

NOTE: Most Virtual Reality devices exhibit opaque blending behavior. Augmented Reality devices that use transparent optical elements frequently exhibit additive blending behavior, and Augmented Reality devices that use passthrough cameras frequently exhibit alpha-blend blending behavior.
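The blend mode matters when clearing each frame. A hedged sketch of one way an application might choose a clear color based on it (gl and session are assumed to already exist):

function clearForBlendMode(gl, session) {
  if (session.environmentBlendMode === 'opaque') {
    // VR-style device: the page is responsible for every pixel.
    gl.clearColor(0.1, 0.1, 0.1, 1.0);
  } else {
    // 'additive' or 'alpha-blend': black/transparent pixels let the user's
    // real environment show through.
    gl.clearColor(0.0, 0.0, 0.0, 0.0);
  }
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
}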
The onblur attribute is an Event handler IDL attribute for the blur event type.

The onfocus attribute is an Event handler IDL attribute for the focus event type.

The onend attribute is an Event handler IDL attribute for the end event type.

The oninputsourceschange attribute is an Event handler IDL attribute for the inputsourceschange event type.

The onselectstart attribute is an Event handler IDL attribute for the selectstart event type.

The onselectend attribute is an Event handler IDL attribute for the selectend event type.

The onselect attribute is an Event handler IDL attribute for the select event type.
Document what happens when we end the session.
Document effects when we blur the session.
Document how to poll the device pose.
Document how the list of active input sources is maintained.
3.2. XRSessionMode
The XRSessionMode enum defines the modes that an XRSession can operate in.

enum XRSessionMode {
  "inline",
  "immersive-vr",
  "immersive-ar"
};
- A session mode of inline indicates that the session’s output will be shown as an element in the HTML document. inline session content MAY be displayed in mono or stereo and MAY allow for viewer tracking. User agents MUST allow inline sessions to be created for any XR device.
- A session mode of immersive-vr indicates that the session’s output will be given exclusive access to the XR device display and that content is not intended to be integrated with the user’s environment. The environmentBlendMode for immersive-vr sessions is expected to be opaque when possible, but MAY be additive if the hardware requires it.
- A session mode of immersive-ar indicates that the session’s output will be given exclusive access to the XR device display and that content is intended to be integrated with the user’s environment. The environmentBlendMode MUST NOT be opaque for immersive-ar sessions.
An immersive session refers to either an immersive-vr or an immersive-ar session. Immersive sessions MUST provide some level of viewer tracking, and content MUST be shown at the proper scale relative to the user and/or the surrounding environment. Additionally, immersive sessions MUST be given exclusive access to the XR device, meaning that while the immersive session is not blurred the HTML document is not shown on the XR device's display, nor is content from other applications shown on the XR device's display.
NOTE: Examples of ways exclusive access may be presented include stereo content displayed on a virtual reality or augmented reality headset, or augmented reality content displayed fullscreen on a mobile device.
Document restrictions and capabilities of immersive sessions.
3.3. XRSessionCreationOptions
The XRSessionCreationOptions dictionary provides a session description, indicating the desired properties of a session to be returned from requestSession().

dictionary XRSessionCreationOptions {
  XRSessionMode mode = "inline";
  XRPresentationContext? outputContext = null;
};
3.4. XRRenderState
There are multiple values that developers can configure which affect how the session’s output is composited. These values are tracked by an XRRenderState object.

dictionary XRRenderStateInit {
  double depthNear;
  double depthFar;
  XRLayer? baseLayer;
};

[SecureContext, Exposed=Window] interface XRRenderState {
  readonly attribute double depthNear;
  readonly attribute double depthFar;
  readonly attribute XRLayer? baseLayer;
};
When an XRRenderState object is created, the user agent MUST initialize the render state by running the following steps:

1. Let state be the newly created XRRenderState object.
2. Initialize state’s depthNear to 0.1.
3. Initialize state’s depthFar to 1000.0.
4. Initialize state’s baseLayer to null.
3.5. Animation Frames
The primary way an XRSession provides information about the tracking state of the XR device is via callbacks scheduled by calling requestAnimationFrame() on the XRSession instance.

callback XRFrameRequestCallback = void (DOMHighResTimeStamp time, XRFrame frame);
Each XRFrameRequestCallback object has a cancelled boolean initially set to false.

Each XRSession has a list of animation frame callbacks, which is initially empty, and an animation frame callback identifier, which is a number initially set to zero.
When the requestAnimationFrame(callback) method is invoked, the user agent MUST run the following steps:

1. Let session be the target XRSession object.
2. Increment session’s animation frame callback identifier by one.
3. Append callback to session’s list of animation frame callbacks, associated with session’s animation frame callback identifier’s current value.
4. Return session’s animation frame callback identifier’s current value.
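In practice this is used to drive a continuous frame loop. A minimal sketch, assuming an active xrSession and a hypothetical drawScene() function:

function onXRFrame(time, frame) {
  // Schedule the next callback first; callbacks are one-shot.
  frame.session.requestAnimationFrame(onXRFrame);
  drawScene(frame); // Hypothetical application rendering function.
}

xrSession.requestAnimationFrame(onXRFrame);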
When the cancelAnimationFrame(handle) method is invoked, the user agent MUST run the following steps:

1. Let session be the target XRSession object.
2. Find the entry in session’s list of animation frame callbacks that is associated with the value handle.
3. If there is such an entry, set its cancelled boolean to true and remove it from session’s list of animation frame callbacks.
When an XRSession session receives updated viewer state from the XR device, it runs an XR animation frame with a timestamp now and an XRFrame frame, which MUST run the following steps regardless of whether the list of animation frame callbacks is empty:

1. Let callbacks be a list of the entries in session’s list of animation frame callbacks, in the order in which they were added to the list.
2. If session’s list of pending render states is not empty, apply pending render states.
3. Set session’s list of animation frame callbacks to the empty list.
4. Set frame’s active boolean to true.
5. For each entry in callbacks, in order:
   1. If the entry’s cancelled boolean is true, continue to the next entry.
   2. Invoke the Web IDL callback function, passing now and frame as the arguments.
   3. If an exception is thrown, report the exception.
6. Set frame’s active boolean to false.
3.6. The XR Compositor
This needs to be broken up a bit more and more clearly describe things such as the frame lifecycle.
The user agent MUST maintain an XR Compositor which handles presentation to the XR device and frame timing. The compositor MUST use an independent rendering context whose state is isolated from that of any WebGL contexts used as XRWebGLLayer sources, to prevent the page from corrupting the compositor state or reading back content from other pages. The compositor MUST also run in a separate thread or process to decouple performance of the page from the ability to present new imagery to the user at the appropriate framerate.
The XR Compositor has a list of layer images, which is initially empty.
4. Frame Loop
4.1. XRFrame
An XRFrame represents a snapshot of the state of all of the tracked objects for an XRSession. Applications can acquire an XRFrame by calling requestAnimationFrame() on an XRSession with an XRFrameRequestCallback. When the callback is called it will be passed an XRFrame. Events which need to communicate tracking state, such as the select event, will also provide an XRFrame.
[SecureContext, Exposed=Window] interface XRFrame {
  readonly attribute XRSession session;
  XRViewerPose? getViewerPose(optional XRReferenceSpace referenceSpace);
  XRInputPose? getInputPose(XRInputSource inputSource, optional XRReferenceSpace referenceSpace);
};
Each XRFrame has an active boolean which is initially set to false.

The session attribute returns the XRSession that produced the XRFrame.
When the getViewerPose(referenceSpace) method is invoked, the user agent MUST run the following steps:

1. If the XRFrame's active boolean is false, throw an InvalidStateError and abort these steps.
2. If referenceSpace’s session does not equal session, return null and abort these steps.
3. If the viewer's pose cannot be determined relative to referenceSpace, return null.
4. Return a new XRViewerPose describing the viewer's pose relative to the origin of referenceSpace at the timestamp of the XRFrame.
When the getInputPose(inputSource, referenceSpace) method is invoked, the user agent MUST run the following steps:

1. If the XRFrame's active boolean is false, throw an InvalidStateError and abort these steps.
2. If referenceSpace’s session does not equal session, return null and abort these steps.
3. If inputSource’s pose cannot be determined relative to referenceSpace, return null.
4. Return a new XRInputPose describing inputSource’s pose relative to the origin of referenceSpace.
Describe behavior for passing null XRReferenceSpaces to getViewerPose() and getInputPose().

The last two steps of the getViewerPose() and getInputPose() algorithms need to be expanded.
5. Spaces
5.1. XRSpace
An XRSpace describes an entity that is tracked by the XR device's tracking systems. XRSpaces are not guaranteed to have a fixed spatial relationship to one another or to any given XRReferenceSpace. The transform between two XRSpaces can be evaluated by calling the getTransformTo() method every XR animation frame.
[SecureContext, Exposed=Window] interface XRSpace : EventTarget {
  XRRigidTransform? getTransformTo(XRSpace other);
};
Each XRSpace has a session which is set to the XRSession that created the XRSpace.
When the getTransformTo(other) method is invoked, the user agent MUST run the following steps:

1. Let current be the target XRSpace object.
2. If a known transform exists from the space described by current to the space described by other, return it as an XRRigidTransform.
3. Else return null.
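Since the relationship between spaces can change every frame, applications should re-query the transform each XR animation frame; a brief sketch with hypothetical spaceA and spaceB:

// Evaluate the relationship between two spaces for the current frame.
let transform = spaceA.getTransformTo(spaceB);
if (transform) {
  // Use transform.position, transform.orientation, or transform.matrix.
} else {
  // The spaces' relationship is unknown this frame.
}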
5.2. XRReferenceSpace
An XRReferenceSpace describes an XRSpace that is generally expected to remain static for the duration of the XRSession, with the most common exception being mid-session reconfiguration by the user. Every XRReferenceSpace describes a coordinate system where the Y axis MUST be aligned with gravity, with +Y being "Up". -Z is considered "Forward", and +X is considered "Right".
enum XRReferenceSpaceType {
  "stationary",
  "bounded",
  "unbounded"
};

dictionary XRReferenceSpaceOptions {
  required XRReferenceSpaceType type;
};

[SecureContext, Exposed=Window] interface XRReferenceSpace : XRSpace {
  attribute XRRigidTransform originOffset;
  attribute EventHandler onreset;
};
An XRReferenceSpace is obtained by calling requestReferenceSpace(), which creates an instance of an interface extending XRReferenceSpace, determined by the type value of the XRReferenceSpaceOptions dictionary passed into the call:

- Passing a type of stationary creates an XRStationaryReferenceSpace instance.
- Passing a type of bounded creates an XRBoundedReferenceSpace instance if supported by the XR device and the XRSession.
- Passing a type of unbounded creates an XRUnboundedReferenceSpace instance if supported by the XR device and the XRSession.
The originOffset attribute is an XRRigidTransform that describes an additional translation and rotation to be applied to any poses queried using the XRReferenceSpace. It is initially set to an identity transform. Changes to the originOffset take effect immediately, and subsequent poses queried with the XRReferenceSpace will take into account the new transform.

Note: Changing the originOffset between pose queries in a single XR animation frame is not advised, since it will cause inconsistencies in the tracking data and rendered output.
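One way originOffset might be used is to implement an application-driven "teleport" by offsetting all future poses; a hedged sketch assuming an xrReferenceSpace obtained earlier (the exact direction semantics depend on the transform chosen):

// Apply an additional 5 meter translation along -Z to all subsequently
// queried poses. Set between frames, not between pose queries within one.
xrReferenceSpace.originOffset = new XRRigidTransform({ x: 0, y: 0, z: -5 });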
The onreset attribute is an Event handler IDL attribute for the reset event type.
When an XRReferenceSpace is requested, the user agent MUST create a reference space by running the following steps:

1. Let session be the XRSession object that requested creation of a reference space.
2. Let options be the XRReferenceSpaceOptions passed to requestReferenceSpace().
3. Let type be set to options’ type.
4. Let referenceSpace be set to null.
5. If type is stationary, let referenceSpace be a new XRStationaryReferenceSpace with a subtype of options’ subtype.
6. Else if type is bounded, let referenceSpace be a new XRBoundedReferenceSpace.
7. Else if type is unbounded, let referenceSpace be a new XRUnboundedReferenceSpace.
8. Return referenceSpace.
Describe circumstances when a reference space type might be rejected.
5.3. XRStationaryReferenceSpace
An XRStationaryReferenceSpace represents a tracking space that the user is not expected to move around within. Tracking in a stationary reference space is optimized for the assumption that the user will not move much beyond their starting point, if at all. For devices with 6DoF tracking, stationary reference spaces should emphasize keeping the origin stable relative to the user’s environment.
enum XRStationaryReferenceSpaceSubtype {
  "eye-level",
  "floor-level",
  "position-disabled"
};

dictionary XRStationaryReferenceSpaceOptions : XRReferenceSpaceOptions {
  required XRStationaryReferenceSpaceSubtype subtype;
};

[SecureContext, Exposed=Window] interface XRStationaryReferenceSpace : XRReferenceSpace {
  readonly attribute XRStationaryReferenceSpaceSubtype subtype;
};
There are several subtypes of XRStationaryReferenceSpace, determined by the subtype value of the XRStationaryReferenceSpaceOptions dictionary passed into the requestReferenceSpace() call:

- Passing a subtype of eye-level creates an XRStationaryReferenceSpace with its origin near the user’s head at the time of creation. The exact position and orientation will be initialized based on the conventions of the underlying platform.
- Passing a subtype of floor-level creates an XRStationaryReferenceSpace with its origin positioned at the floor in a safe position for the user to stand. The `y` axis equals `0` at floor level, with the `x` and `z` position and orientation initialized based on the conventions of the underlying platform. If the floor level isn’t known it will be estimated.
- Passing a subtype of position-disabled creates an XRStationaryReferenceSpace where orientation is tracked but the viewer’s position is always reported as being at the origin.
Note: The position-disabled subtype is primarily intended for use with pre-rendered media such as panoramic photos or videos. It should not be used for most other media types due to user discomfort associated with the lack of a neck model or full positional tracking.
The XRStationaryReferenceSpace's subtype attribute is the XRStationaryReferenceSpaceSubtype that the XRStationaryReferenceSpace was created with.
5.4. XRBoundedReferenceSpace
An XRBoundedReferenceSpace represents a floor-relative tracking space where the user is expected to move within a pre-established boundary. Tracking in a bounded reference space is optimized for keeping the reference space origin and bounds geometry stable relative to the user’s environment.
[SecureContext, Exposed=Window] interface XRBoundedReferenceSpace : XRReferenceSpace {
  readonly attribute FrozenArray<DOMPointReadOnly> boundsGeometry;
};
The origin of an XRBoundedReferenceSpace MUST be positioned at the floor, such that the `y` axis equals `0` at floor level. The `x` and `z` position and orientation are initialized based on the conventions of the underlying platform, typically expected to be near the center of the room facing in a logical forward direction.
Note: Other XR platforms sometimes refer to the type of tracking offered by a bounded reference space as "room scale" tracking. An XRBoundedReferenceSpace is not intended to describe multi-room spaces, areas with uneven floor levels, or very large open areas. Content that needs to handle those scenarios should use an XRUnboundedReferenceSpace.
The boundsGeometry attribute describes the border around the XRBoundedReferenceSpace, which the user can expect to safely move within.

The polygonal boundary is given as an array of DOMPointReadOnlys, which represents a loop of points at the edges of the safe space. The points describe offsets from the XRReferenceSpace origin in meters. Points MUST be given in a clockwise order as viewed from above, looking towards the negative end of the Y axis. The y value of each point MUST be 0 and the w value of each point MUST be 1. The bounds can be considered to originate at the floor and extend infinitely high. The shape the points describe is not guaranteed to be convex.
Note: Content should not require the user to move beyond the boundsGeometry. It is possible for the user to move beyond the bounds if their physical surroundings allow for it, resulting in position values outside of the polygon they describe. This is not an error condition and should be handled gracefully by page content.

Note: Content generally should not provide a visualization of the boundsGeometry, as it’s the user agent’s responsibility to ensure that safety critical information is provided to the user.
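A short sketch of how content might inspect the boundary without visualizing it, assuming a previously acquired xrBoundedReferenceSpace:

// Find the distance from the origin to the nearest boundary point, e.g. to
// decide whether interactive content fits comfortably within the safe area.
let minDistance = Infinity;
for (let point of xrBoundedReferenceSpace.boundsGeometry) {
  // Each point has y == 0 and w == 1; only x and z carry information.
  minDistance = Math.min(minDistance, Math.hypot(point.x, point.z));
}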
5.5. XRUnboundedReferenceSpace
An XRUnboundedReferenceSpace represents a tracking space where the user is expected to move freely around their environment, potentially even long distances from their starting point. Tracking in an unbounded reference space is optimized for stability around the user’s current position, and as such the tracking origin may drift over time.
[SecureContext, Exposed=Window] interface XRUnboundedReferenceSpace : XRReferenceSpace { };
6. Views
6.1. XRView
An XRView describes a single view into an XR scene. Each view corresponds to a display or portion of a display used by an XR device to present imagery to the user. They are used to retrieve all the information necessary to render content that is well aligned to the view's physical output properties, including the field of view, eye offset, and other optical properties. Views may cover overlapping regions of the user’s vision. No guarantee is made about the number of views any XR device uses or their order, nor is the number of views required to be constant for the duration of an XRSession.
NOTE: Many HMDs will request that content render two views, one for the left eye and one for the right, while most magic window devices will only request one view, but applications should never assume a specific view configuration. For example: A magic window device may request two views if it is capable of stereo output, but may revert to requesting a single view for performance reasons if the stereo output mode is turned off. Similarly, HMDs may request more than two views to facilitate a wide field of view or displays of different pixel density.
enum XREye {
  "left",
  "right"
};

[SecureContext, Exposed=Window] interface XRView {
  readonly attribute XREye eye;
  readonly attribute Float32Array projectionMatrix;
  readonly attribute Float32Array viewMatrix;
  readonly attribute XRRigidTransform transform;
};
The eye attribute describes which eye this view is expected to be shown to. This attribute’s primary purpose is to ensure that pre-rendered stereo content can present the correct portion of the content to the correct eye. If the view does not have an intrinsically associated eye (the display is monoscopic, for example) this attribute MUST be set to "left".

The projectionMatrix attribute provides a matrix describing the projection to be used when rendering the view. It is strongly recommended that applications use this matrix without modification. Failure to use the provided projection matrices when rendering may cause the presented frame to be distorted or badly aligned, resulting in varying degrees of user discomfort.

The viewMatrix attribute provides a matrix describing the view transform to be used when rendering the view. The matrix represents the inverse of the transform's matrix. It is strongly recommended that applications use this matrix without modification. Failure to use the provided view matrices when rendering may cause the presented frame to be distorted or badly aligned, resulting in varying degrees of user discomfort.

The transform attribute is the XRRigidTransform of the viewpoint.

NOTE: The transform can be used to position camera objects in many rendering libraries instead of using the viewMatrix directly if the library is more naturally set up to consume data in that format.
6.2. XRViewport
An XRViewport object describes a viewport, or rectangular region, of a graphics surface.

[SecureContext, Exposed=Window] interface XRViewport {
  readonly attribute long x;
  readonly attribute long y;
  readonly attribute long width;
  readonly attribute long height;
};
The x and y attributes define an offset from the surface origin and the width and height attributes define the rectangular dimensions of the viewport.

The exact interpretation of the viewport values depends on the conventions of the graphics API the viewport is associated with:

- When used with an XRWebGLLayer the x and y attributes specify the lower left corner of the viewport rectangle, in pixels, with the viewport rectangle extending width pixels to the right of x and height pixels above y. The values can be passed to the WebGL viewport function directly.
The following example iterates over the XRViews of an XRViewerPose, queries an XRViewport from an XRWebGLLayer for each, and uses them to set the appropriate WebGL viewports for rendering.

xrSession.requestAnimationFrame((time, xrFrame) => {
  let viewer = xrFrame.getViewerPose(xrReferenceSpace);

  gl.bindFramebuffer(gl.FRAMEBUFFER, xrWebGLLayer.framebuffer);

  for (let xrView of viewer.views) {
    let xrViewport = xrWebGLLayer.getViewport(xrView);
    gl.viewport(xrViewport.x, xrViewport.y, xrViewport.width, xrViewport.height);

    // WebGL draw calls will now be rendered into the appropriate viewport.
  }
});
7. Geometric Primitives
7.1. Matrices
WebXR provides various transforms in the form of matrices. WebXR matrices are always 4x4 and given as 16 element Float32Arrays in column-major order. They may be passed directly to WebGL’s uniformMatrix4fv function, used to create an equivalent DOMMatrix, or used with a variety of third party math libraries.
Translations specified by WebXR matrices are always given in meters.
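A small sketch of consuming these matrices directly in WebGL, assuming hypothetical uniform locations projectionLoc and viewLoc obtained from the application's shader program:

// Column-major Float32Arrays can be handed to WebGL without conversion.
gl.uniformMatrix4fv(projectionLoc, false, xrView.projectionMatrix);
gl.uniformMatrix4fv(viewLoc, false, xrView.viewMatrix);

// Or wrapped in a DOMMatrix for interop with other web APIs.
let viewDOMMatrix = new DOMMatrix(xrView.viewMatrix);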
7.2. XRRigidTransform
An XRRigidTransform is a transform described by a position and an orientation. When interpreting an XRRigidTransform the orientation is always applied prior to the position.

[SecureContext, Exposed=Window,
 Constructor(optional DOMPointInit position, optional DOMPointInit orientation)]
interface XRRigidTransform {
  readonly attribute DOMPointReadOnly position;
  readonly attribute DOMPointReadOnly orientation;
  readonly attribute Float32Array matrix;
};
The XRRigidTransform(position, orientation) constructor MUST perform the following steps when invoked:

1. Let transform be a new XRRigidTransform.
2. If position is not a DOMPointInit, initialize transform’s position to { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }.
3. Else initialize transform’s position’s x value to position’s x dictionary member, y value to position’s y dictionary member, z value to position’s z dictionary member, and w to 1.0.
4. If orientation is not a DOMPointInit, initialize transform’s orientation to { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }.
5. Else initialize transform’s orientation’s x value to orientation’s x dictionary member, y value to orientation’s y dictionary member, z value to orientation’s z dictionary member, and w value to orientation’s w dictionary member.
6. Normalize transform’s orientation.
7. Return transform.
The position attribute is a 3-dimensional point, given in meters, describing the translation component of the transform. The position's w attribute MUST be 1.0.

The orientation attribute is a quaternion describing the rotational component of the transform. The orientation MUST be normalized to have a length of 1.0.

The matrix attribute returns the transform described by the position and orientation attributes as a matrix.

An XRRigidTransform with a position of { x: 0, y: 0, z: 0, w: 1 } and an orientation of { x: 0, y: 0, z: 0, w: 1 } is known as an identity transform.
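A short usage sketch; the values shown are purely illustrative:

// A transform 1.6 meters up with a default (identity) orientation.
let transform = new XRRigidTransform(
    { x: 0, y: 1.6, z: 0 },        // position, in meters
    { x: 0, y: 0, z: 0, w: 1 });   // orientation quaternion

// The equivalent 4x4 column-major matrix.
let matrix = transform.matrix;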
7.3. XRRay
An XRRay is a geometric ray described by an origin point and direction vector.

[SecureContext, Exposed=Window,
 Constructor(optional DOMPointInit origin, optional DOMPointInit direction),
 Constructor(XRRigidTransform transform)]
interface XRRay {
  readonly attribute DOMPointReadOnly origin;
  readonly attribute DOMPointReadOnly direction;
  readonly attribute Float32Array matrix;
};
The XRRay(origin, direction) constructor MUST perform the following steps when invoked:

1. Let ray be a new XRRay.
2. If origin is not a DOMPointInit, initialize ray’s origin to { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }.
3. Else initialize ray’s origin’s x value to origin’s x dictionary member, y value to origin’s y dictionary member, z value to origin’s z dictionary member, and w to 1.0.
4. If direction is not a DOMPointInit, initialize ray’s direction to { x: 0.0, y: 0.0, z: -1.0, w: 0.0 }.
5. Else initialize ray’s direction’s x value to direction’s x dictionary member, y value to direction’s y dictionary member, z value to direction’s z dictionary member, and w value to 0.0.
6. Return ray.
The XRRay(transform) constructor MUST perform the following steps when invoked:

The origin attribute defines the 3-dimensional point in space that the ray originates from, given in meters. The origin's w attribute MUST be 1.0.

The direction attribute defines the ray’s 3-dimensional directional vector. The direction's w attribute MUST be 0.0 and the vector MUST be normalized to have a length of 1.0.

The matrix attribute is a matrix which represents the transform from a ray originating at [0, 0, 0] and extending down the negative Z axis to the ray described by the XRRay's origin and direction.
NOTE: The XRRay's matrix can be used to easily position graphical representations of the ray when rendering.
8. Pose
8.1. XRViewerPose
An XRViewerPose describes the state of a viewer of the XR scene as tracked by the XR device. A viewer may represent a tracked piece of hardware, the observed position of a user's head relative to the hardware, or some other means of computing a series of viewpoints into the XR scene. The XRViewerPose describes the position and orientation of the viewer relative to the XRReferenceSpace it was queried with, as well as an array of views, which include view and projection matrices. These matrices should be used by the application when rendering a frame of an XR scene.
[SecureContext, Exposed=Window] interface XRViewerPose {
  readonly attribute XRRigidTransform transform;
  readonly attribute FrozenArray<XRView> views;
};
The transform is the XRRigidTransform of the viewer relative to the origin of the XRReferenceSpace the XRViewerPose was queried with.

NOTE: The transform can be used to position graphical representations of the viewer for spectator views of the scene or multi-user interaction.

The views array is a sequence of XRViews describing the viewpoints of the XR scene, relative to the XRReferenceSpace the XRViewerPose was queried with. Every view of the XR scene in the array must be rendered in order to display correctly on the XR device. Each XRView includes view and projection matrices, and can be used to query XRViewports from layers when needed.
9. Input
9.1. XRInputSource
Need some intro text for XRInputSource
enum XRHandedness {
  "",
  "left",
  "right"
};

enum XRTargetRayMode {
  "gaze",
  "tracked-pointer",
  "screen"
};

[SecureContext, Exposed=Window] interface XRInputSource {
  readonly attribute XRHandedness handedness;
  readonly attribute XRTargetRayMode targetRayMode;
};
Each XRInputSource SHOULD define a primary action. The primary action is a platform-specific action that, when engaged, produces selectstart, selectend, and select events. Examples of possible primary actions are pressing a trigger, touchpad, or button, speaking a command, or making a hand gesture. If the platform guidelines define a recommended primary input then it should be used as the primary action, otherwise the user agent is free to select one.
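A hedged sketch of listening for the events produced by the primary action, assuming an active xrSession and a previously acquired xrReferenceSpace:

xrSession.addEventListener('select', (event) => {
  // event.frame corresponds to the time the action occurred.
  let inputPose = event.frame.getInputPose(event.inputSource, xrReferenceSpace);
  if (inputPose) {
    // Use inputPose.targetRay to hit test the scene (application-defined).
  }
});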
The handedness attribute describes which hand the input source is associated with, if any. Input sources with no natural handedness (such as headset-mounted controls or standard gamepads) or for which the handedness is not currently known MUST set this attribute to the empty string.

The targetRayMode attribute describes the method used to produce the target ray, and indicates how the application should present the target ray to the user if desired.

- gaze indicates the target ray will originate at the user’s head and follow the direction they are looking (this is commonly referred to as a "gaze input" device).
- tracked-pointer indicates that the target ray originates from either a handheld device or other hand-tracking mechanism and represents that the user is using their hands or the held device for pointing.
- screen indicates that the input source was an interaction with the canvas element associated with an inline session’s output context, such as a mouse click or touch event.
Note: Some input sources, like an XRInputSource with targetRayMode set to screen, will only be added to the session’s list of active input sources immediately before the selectstart event, and removed from the session’s list of active input sources immediately after the selectend event.
9.2. XRInputPose
Need some intro text for XRInputPose
[SecureContext, Exposed=Window] interface XRInputPose {
  readonly attribute boolean emulatedPosition;
  readonly attribute XRRay targetRay;
  readonly attribute XRRigidTransform? gripTransform;
};
The targetRay describes the preferred pointing ray of the XRInputSource, as defined by the targetRayMode.
The gripTransform MUST describe a transform to a space which, if the user were to hold a straight rod in their hand, places the origin at the centroid of their curled fingers and where the -Z axis points along the rod towards their thumb. The X axis is perpendicular to the back of the hand being described, with the back of the user’s right hand pointing towards +X and the back of the user’s left hand pointing towards -X. The Y axis is implied by the relationship between the X and Z axes, with +Y roughly pointing in the direction of the user’s arm.

The gripTransform MAY represent an emulated translation or rotation if the input source cannot supply full 6DoF tracking.

The gripTransform MUST be `null` if the input source isn’t trackable.
The emulatedPosition attribute indicates the accuracy of the origin of the targetRay and position of the gripTransform. emulatedPosition MUST be set to true if positional values are software estimations, such as those provided by a neck or arm model. emulatedPosition MUST be set to false if the positional values are based on sensor readings.
10. Layers
10.1. XRLayer
An XRLayer defines a source of bitmap images and a description of how the image is to be rendered to the XR device. Initially only one type of layer, the XRWebGLLayer, is defined, but future revisions of the spec may extend the available layer types.

[SecureContext, Exposed=Window] interface XRLayer {};
10.2. XRWebGLLayer
An XRWebGLLayer is a layer which provides a WebGL framebuffer to render into, enabling hardware accelerated rendering of 3D graphics to be presented on the XR device.

typedef (WebGLRenderingContext or WebGL2RenderingContext) XRWebGLRenderingContext;

dictionary XRWebGLLayerInit {
  boolean antialias = true;
  boolean depth = true;
  boolean stencil = false;
  boolean alpha = true;
  double framebufferScaleFactor = 1.0;
};

[SecureContext, Exposed=Window,
 Constructor(XRSession session, XRWebGLRenderingContext context, optional XRWebGLLayerInit layerInit)]
interface XRWebGLLayer : XRLayer {
  // Attributes
  readonly attribute XRWebGLRenderingContext context;
  readonly attribute boolean antialias;
  readonly attribute boolean depth;
  readonly attribute boolean stencil;
  readonly attribute boolean alpha;
  readonly attribute WebGLFramebuffer framebuffer;
  readonly attribute unsigned long framebufferWidth;
  readonly attribute unsigned long framebufferHeight;

  // Methods
  XRViewport? getViewport(XRView view);
  void requestViewportScaling(double viewportScaleFactor);

  // Static Methods
  static double getNativeFramebufferScaleFactor(XRSession session);
};
The XRWebGLLayer(session, context, layerInit) constructor MUST perform the following steps when invoked:

1. Let layer be a new XRWebGLLayer.
2. If session’s ended value is true, throw an InvalidStateError and abort these steps.
3. If context is lost, throw an InvalidStateError and abort these steps.
4. If context’s XR compatible boolean is false, throw an InvalidStateError and abort these steps.
5. Initialize layer’s context to context.
6. Initialize layer’s antialias to layerInit’s antialias value.
7. Initialize layer’s framebuffer to a new opaque framebuffer created with context.
8. Initialize the layer’s swap chain.
9. If layer’s swap chain was unable to be created for any reason, throw an OperationError and abort these steps.
10. Return layer.
The context attribute is the WebGLRenderingContext the XRWebGLLayer was created with.
The framebuffer attribute of an XRWebGLLayer is an instance of a WebGLFramebuffer which has been marked as opaque. An opaque framebuffer functions identically to a standard WebGLFramebuffer with the following changes that make it behave more like the default framebuffer:

- An opaque framebuffer MAY support antialiasing, even in WebGL 1.0.
- An opaque framebuffer's attachments cannot be inspected or changed. Calling framebufferTexture2D, framebufferRenderbuffer, getFramebufferAttachmentParameter, or getRenderbufferParameter with an opaque framebuffer MUST generate an INVALID_OPERATION error.
- An opaque framebuffer is considered incomplete outside of a requestAnimationFrame() callback. When not in a requestAnimationFrame() callback, calls to checkFramebufferStatus MUST generate a FRAMEBUFFER_UNSUPPORTED error, and attempts to clear, draw to, or read from the opaque framebuffer MUST generate an INVALID_FRAMEBUFFER_OPERATION error.
The framebufferWidth and framebufferHeight attributes return the width and height of the framebuffer's attachments, respectively.

The antialias attribute is true if the framebuffer supports antialiasing using a technique of the UA’s choosing, and false if no antialiasing will be performed.

The depth attribute is true if the framebuffer has a depth buffer attachment and false if no depth buffer is attached.

The stencil attribute is true if the framebuffer has a stencil buffer attachment and false if no stencil buffer is attached.

The alpha attribute is true if the framebuffer has an alpha buffer attachment and false if no alpha buffer is attached.
Each XRWebGLLayer MUST have a list of viewports which contains one WebGL viewport for each XRView the XRSession currently exposes. The viewports MUST NOT be overlapping. The XRWebGLLayer MUST also have a viewport scale factor, initially set to 1.0, and a minimum viewport scale factor set to a UA-determined value between 0 and 1.
getViewport() queries the XRViewport the given XRView should use when rendering to the layer.
The getViewport(view) method, when invoked, MUST run the following steps:

1. If layer was created with a different XRSession than the one that produced view, return null.
2. Let glViewport be the WebGL viewport from the list of viewports associated with view.
3. Let viewport be a new XRViewport instance.
4. Initialize viewport’s x to glViewport’s x component.
5. Initialize viewport’s y to glViewport’s y component.
6. Initialize viewport’s width to glViewport’s width component multiplied by the viewport scale factor.
7. Initialize viewport’s height to glViewport’s height component multiplied by the viewport scale factor.
8. Return viewport.
The framebuffer size cannot be adjusted by the developer after the XRWebGLLayer has been created, but it can be useful to adjust the resolution content is rendered at during runtime to aid application performance. To do so, developers can request that the size of the viewports in the list of viewports be changed using the requestViewportScaling() method.
The requestViewportScaling(scaleFactor) method, when invoked, MUST run the following steps:

1. If scaleFactor is greater than 1.0, set scaleFactor to 1.0.
2. If scaleFactor is less than the minimum viewport scale factor, set scaleFactor to the minimum viewport scale factor.
3. If the XR device places additional device-specific restrictions on viewport size, adjust scaleFactor accordingly.
4. Set the viewport scale factor to scaleFactor.
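For example, an application that detects it is missing framerate might ask for smaller viewports; a brief sketch:

// Render subsequent frames at (at most) half the default viewport size.
// The UA may clamp this to its minimum viewport scale factor.
xrWebGLLayer.requestViewportScaling(0.5);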
Viewport changes (and a lot of other changes) should not take place mid-frame.
Each XRSession MUST identify a native WebGL framebuffer resolution, which is the pixel resolution of a WebGL framebuffer required to match the physical pixel resolution of the XR device.

The native WebGL framebuffer resolution is determined by running the following steps:

1. Let session be the target XRSession.
2. If session’s mode value is not "inline", set the native WebGL framebuffer resolution to the resolution required to have a 1:1 ratio between the pixels of a framebuffer large enough to contain all of the session’s XRViews and the physical screen pixels in the area of the display under the highest magnification, then abort these steps. If no method exists to determine the native resolution as described, the recommended WebGL framebuffer resolution MAY be used.
3. If session’s mode value is "inline", set the native WebGL framebuffer resolution to the size of the session’s outputContext's canvas in physical display pixels and reevaluate these steps every time the size of the canvas changes.
Additionally, the XRSession MUST identify a recommended WebGL framebuffer resolution, which represents a best estimate of the WebGL framebuffer resolution large enough to contain all of the session’s XRViews that provides an average application a good balance between performance and quality. It MAY be smaller than, larger than, or equal to the native WebGL framebuffer resolution.

NOTE: The user agent is free to use any method of its choosing to estimate the recommended WebGL framebuffer resolution. If there are platform-specific methods for querying a recommended size, it is recommended, but not required, that they be used.
The getNativeFramebufferScaleFactor(session) method, when invoked, MUST run the following steps:

1. Let session be the target XRSession.
2. If session’s ended value is true, return 0.0 and abort these steps.
3. Return the value that the session’s recommended WebGL framebuffer resolution must be multiplied by to yield the session’s native WebGL framebuffer resolution.
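A sketch of creating a layer at the device's full native resolution, assuming an XR compatible context gl:

// Trade performance for sharpness by matching the native resolution.
let nativeScale = XRWebGLLayer.getNativeFramebufferScaleFactor(xrSession);
let layer = new XRWebGLLayer(xrSession, gl, {
  framebufferScaleFactor: nativeScale
});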
Document the creation of a swap chain.
10.3. WebGL Context Compatibility
In order for a WebGL context to be used as a source for XR imagery it must be created on a compatible graphics adapter for the XR device. What is considered a compatible graphics adapter is platform dependent, but is understood to mean that the graphics adapter can supply imagery to the XR device without undue latency. If a WebGL context was not already created on the compatible graphics adapter, it typically must be re-created on the adapter in question before it can be used with an XRWebGLLayer.

Note: On an XR platform with a single GPU, it can safely be assumed that the GPU is compatible with the XR devices advertised by the platform, and thus any hardware accelerated WebGL contexts are compatible as well. On PCs with both an integrated and a discrete GPU, the discrete GPU is often considered the compatible graphics adapter since it is generally the higher performance chip. On desktop PCs with multiple graphics adapters installed, the one with the XR device physically connected to it is likely to be considered the compatible graphics adapter.
partial dictionary WebGLContextAttributes {
  boolean xrCompatible = false;
};

partial interface mixin WebGLRenderingContextBase {
  Promise<void> makeXRCompatible();
};
When a user agent implements this specification it MUST set an XR compatible boolean, initially set to false, on every WebGLRenderingContextBase. Once the XR compatible boolean is set to true, the context can be used with layers for any XRSession requested from the current XR device.

The XR compatible boolean can be set either at context creation time or after context creation, potentially incurring a context loss. To set the XR compatible boolean at context creation time, the xrCompatible context creation attribute must be set to true when requesting a WebGL context.
When the HTMLCanvasElement's getContext() method is invoked with a WebGLContextAttributes dictionary with xrCompatible set to true, run the following steps:

1. Create the WebGL context as usual, ensuring it is created on a compatible graphics adapter for the XR device.
2. Let context be the newly created WebGL context.
3. Set context’s XR compatible boolean to true.
4. Return context.
The following example creates an XR compatible WebGL context at creation time and uses it to create an XRWebGLLayer.

function onXRSessionStarted(xrSession) {
  let glCanvas = document.createElement("canvas");
  let gl = glCanvas.getContext("webgl", { xrCompatible: true });

  loadWebGLResources();

  xrSession.updateRenderState({ baseLayer: new XRWebGLLayer(xrSession, gl) });
}
To set the XR compatible boolean after the context has been created, the makeXRCompatible() method is used.
When the makeXRCompatible() method is invoked, the user agent MUST return a new Promise promise and run the following steps in parallel:

1. Let context be the target WebGLRenderingContextBase object.
2. If context’s WebGL context lost flag is set, reject promise with an InvalidStateError and abort these steps.
3. If context’s XR compatible boolean is true, resolve promise and abort these steps.
4. If context was created on a compatible graphics adapter for the XR device:
   1. Set context’s XR compatible boolean to true.
   2. Resolve promise and abort these steps.
5. Queue a task to perform the following steps:
   1. Force context to be lost and handle the context loss as described by the WebGL specification.
   2. If the canceled flag of the "webglcontextlost" event fired in the previous step was not set, reject promise with an AbortError and abort these steps.
   3. Restore the context on a compatible graphics adapter for the XR device.
   4. Set context’s XR compatible boolean to true.
   5. Resolve promise.
Additionally, when any WebGL context is lost, run the following steps prior to firing the "webglcontextlost" event:

- Set the context's XR compatible boolean to false.
This example creates an XRWebGLLayer from a pre-existing WebGL context.
let glCanvas = document.createElement("canvas");
let gl = glCanvas.getContext("webgl");
loadWebGLResources();

glCanvas.addEventListener("webglcontextlost", (event) => {
  // Calling preventDefault() sets the event's canceled flag, indicating
  // that the WebGL context can be restored.
  event.preventDefault();
});

glCanvas.addEventListener("webglcontextrestored", (event) => {
  // WebGL resources need to be re-created after a context loss.
  loadWebGLResources();
});

function onXRSessionStarted(xrSession) {
  // Make sure the canvas context we want to use is compatible with the device.
  // May trigger a context loss.
  return gl.makeXRCompatible().then(() => {
    return xrSession.updateRenderState({ baseLayer: new XRWebGLLayer(xrSession, gl) });
  });
}
11. Canvas Rendering Context
11.1. XRPresentationContext
[SecureContext, Exposed=Window]
interface XRPresentationContext {
    readonly attribute HTMLCanvasElement canvas;
};

The canvas attribute indicates the HTMLCanvasElement from which this context was created.
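Note: The following non-normative sketch assumes an XRPresentationContext is obtained by passing the "xrpresent" context ID to getContext(), as in contemporaneous drafts of this API; the canvas attribute then refers back to the originating element.

let mirrorCanvas = document.createElement("canvas");
// Assumption: "xrpresent" is the context creation ID for XRPresentationContext.
let xrContext = mirrorCanvas.getContext("xrpresent");
// The context's canvas attribute points back at the element it was created from.
console.assert(xrContext.canvas === mirrorCanvas);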
12. Events
12.1. XRSessionEvent
XRSessionEvents are fired to indicate changes to the state of an XRSession.
[SecureContext, Exposed=Window,
 Constructor(DOMString type, XRSessionEventInit eventInitDict)]
interface XRSessionEvent : Event {
    readonly attribute XRSession session;
};

dictionary XRSessionEventInit : EventInit {
    required XRSession session;
};
The session attribute indicates the XRSession that generated the event.
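Note: A page will typically observe session events by listening on the session object itself. A non-normative sketch, where onSessionEnded is a hypothetical application callback:

xrSession.addEventListener("end", (event) => {
  // event.session is the XRSession that generated the event.
  onSessionEnded(event.session);
});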
12.2. XRInputSourceEvent
XRInputSourceEvents are fired to indicate changes to the state of an XRInputSource.
[SecureContext, Exposed=Window,
 Constructor(DOMString type, XRInputSourceEventInit eventInitDict)]
interface XRInputSourceEvent : Event {
    readonly attribute XRFrame frame;
    readonly attribute XRInputSource inputSource;
};

dictionary XRInputSourceEventInit : EventInit {
    required XRFrame frame;
    required XRInputSource inputSource;
};
The inputSource attribute indicates the XRInputSource that generated this event.

The frame attribute is an XRFrame that corresponds with the time that the event took place. It may represent historical data. Any XRViewerPose queried from the frame MUST have an empty views array.
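Note: Because the frame corresponds with the time the event took place, a select handler can query poses at the moment the primary action completed. A non-normative sketch, assuming XRFrame's getPose() method and the input source's targetRaySpace behave as in contemporaneous drafts; xrReferenceSpace and handleSelection are hypothetical application values:

xrSession.addEventListener("select", (event) => {
  // Query the target ray pose at the time the primary action completed.
  // Assumption: getPose() and targetRaySpace as in contemporaneous drafts.
  let pose = event.frame.getPose(event.inputSource.targetRaySpace, xrReferenceSpace);
  if (pose) {
    handleSelection(pose.transform);
  }
});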
When the user agent fires an XRInputSourceEvent it MUST run the following steps:
12.3. XRReferenceSpaceEvent
XRReferenceSpaceEvents are fired to indicate changes to the state of an XRReferenceSpace.
[SecureContext, Exposed=Window,
 Constructor(DOMString type, XRReferenceSpaceEventInit eventInitDict)]
interface XRReferenceSpaceEvent : Event {
    readonly attribute XRReferenceSpace referenceSpace;
    readonly attribute XRRigidTransform? transform;
};

dictionary XRReferenceSpaceEventInit : EventInit {
    required XRReferenceSpace referenceSpace;
    XRRigidTransform transform;
};
The referenceSpace attribute indicates the XRReferenceSpace that generated this event.

The transform attribute describes the transform the referenceSpace underwent during this event, if applicable.
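Note: A page can use these events to keep application content aligned after a discontinuity. A non-normative sketch, where recenterContent is a hypothetical application callback:

xrReferenceSpace.addEventListener("reset", (event) => {
  if (event.transform) {
    // The transform describes how the reference space's origin changed;
    // application content can be adjusted to compensate.
    recenterContent(event.transform);
  }
});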
12.4. Event Types
The user agent MUST provide the following new events. Registration for and firing of the events must follow the usual behavior of DOM4 Events.
The user agent MAY fire a devicechange event on the XR object to indicate that the availability of XR devices has been changed. The event MUST be of type Event.
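Note: A page might respond to devicechange by re-running its support checks. A non-normative sketch, where checkForXRSupport is a hypothetical application function:

navigator.xr.addEventListener("devicechange", () => {
  // Re-evaluate whether XR content can be advertised to the user.
  checkForXRSupport();
});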
A user agent MAY dispatch a blur event on an XRSession to indicate that presentation to the XRSession by the page has been suspended by the user agent, OS, or XR hardware. While an XRSession is blurred it remains active, but it may have its frame production throttled. This is to prevent tracking while the user interacts with potentially sensitive UI. For example: the user agent SHOULD blur the presenting application when the user is typing a URL into the browser with a virtual keyboard, otherwise the presenting page may be able to guess the URL the user is entering by tracking their head motions. The event MUST be of type XRSessionEvent.
A user agent MAY dispatch a focus event on an XRSession to indicate that presentation to the XRSession by the page has resumed after being suspended. The event MUST be of type XRSessionEvent.
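Note: Pages will typically want to quiet their output while blurred. A non-normative sketch, where pauseMedia and resumeMedia are hypothetical application functions:

xrSession.addEventListener("blur", () => {
  // Frame production may be throttled while blurred; pause anything time-sensitive.
  pauseMedia();
});

xrSession.addEventListener("focus", () => {
  resumeMedia();
});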
A user agent MUST dispatch an end event on an XRSession when the session ends, either by the application or the user agent. The event MUST be of type XRSessionEvent.
A user agent MUST dispatch an inputsourceschange event on an XRSession when the session's list of active input sources has changed. The event MUST be of type XRSessionEvent.
A user agent MUST dispatch a selectstart event on an XRSession when one of its XRInputSources begins its primary action. The event MUST be of type XRInputSourceEvent.
A user agent MUST dispatch a selectend event on an XRSession when one of its XRInputSources ends its primary action or when an XRInputSource that has begun a primary action is disconnected. The event MUST be of type XRInputSourceEvent.
A user agent MUST dispatch a select event on an XRSession when one of its XRInputSources has fully completed a primary action. The event MUST be of type XRInputSourceEvent.
A user agent MUST dispatch a reset event on an XRReferenceSpace when discontinuities of the origin occur. (That is, significant changes in the origin's position or orientation relative to the user's environment.) It also fires when the boundsGeometry changes for an XRBoundedReferenceSpace. The event MUST be of type XRReferenceSpaceEvent, and MUST be dispatched prior to the execution of any XR animation frames that make use of the new origin.
13. Security, Privacy, and Comfort Considerations
The WebXR Device API provides powerful new features which bring with them several unique privacy, security, and comfort risks that user agents must take steps to mitigate.
13.1. Gaze Tracking
While the API does not yet expose eye tracking capabilities, much can be inferred about where the user is looking by tracking the orientation of their head. This is especially true of XR devices that have limited input capabilities, such as Google Cardboard, which frequently require users to control a "gaze cursor" with their head orientation. This means that it may be possible for a malicious page to infer what a user is typing on a virtual keyboard or how they are interacting with a virtual UI based solely on monitoring their head movements. For example: if not prevented from doing so, a page could estimate what URL a user is entering into the user agent's URL bar.
To prevent this risk the user agent MUST blur all sessions when the user is interacting with sensitive, trusted UI such as URL bars or system dialogs. Additionally, to prevent a malicious page from being able to monitor input on other pages the user agent MUST blur all sessions on non-focused pages.
13.2. Trusted Environment
If the virtual environment does not consistently track the user's head motion with low latency and at a high frame rate, the user may become disoriented or physically ill. Since it is impossible to force pages to produce consistently performant and correct content, the user agent MUST provide a tracked, trusted environment and an XR Compositor which runs asynchronously from page content. The compositor is responsible for compositing the trusted and untrusted content. If content is not performant, does not submit frames, or terminates unexpectedly, the user agent should be able to continue presenting a responsive, trusted UI.
Additionally, page content has the ability to make users uncomfortable in ways not related to performance. Badly applied tracking, strobing colors, and content intended to offend, frighten, or intimidate are examples of content which may cause the user to want to quickly exit the XR experience. Removing the XR device in these cases may not always be a fast or practical option. To accommodate this the user agent SHOULD provide users with an action, such as pressing a reserved hardware button or performing a gesture, that escapes out of WebXR content and displays the user agent’s trusted UI.
When navigating between pages in XR, the user agent should display trusted UI elements informing the user of the security information of the site being navigated to, such as the URL and encryption status, that would normally be presented by the 2D UI.
13.3. Context Isolation
The trusted UI must be drawn by an independent rendering context whose state is isolated from any rendering contexts used by the page. (For example, any WebGL rendering contexts.) This is to prevent the page from corrupting the state of the trusted UI’s context, which may prevent it from properly rendering a tracked environment. It also prevents the possibility of the page being able to capture imagery from the trusted UI, which could lead to private information being leaked.
Also, to prevent CORS-related vulnerabilities each page will see a new instance of objects returned by the API, such as XRSession. Attributes such as the context set by one page must not be readable by another. Similarly, methods invoked on the API MUST NOT cause an observable state change on other pages. For example: no method will be exposed that enables a system-level orientation reset, as this could be called repeatedly by a malicious page to prevent other pages from tracking properly. The user agent MUST, however, respect system-level orientation resets triggered by a user gesture or system menu.
13.4. Fingerprinting
Given that the API describes hardware available to the user and its capabilities, it will inevitably provide additional surface area for fingerprinting. While it's impossible to completely avoid this, steps can be taken to mitigate the issue. This spec limits reporting of available hardware to only a single device at a time, which prevents the rare case of multiple connected headsets from being used as a fingerprinting signal. Also, the devices that are reported have no string identifiers and expose very little information about the device's capabilities until an XRSession is created, which may only be triggered via user activation in the most sensitive case.
Issue: Discuss use of sensor activity as a possible fingerprinting vector.
14. Integrations
14.1. Feature Policy
This specification defines a feature that controls whether the xr attribute is exposed on the Navigator object.

The feature name for this feature is "xr".

The default allowlist for this feature is ["self"].
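Note: As a non-normative illustration, an embedding page would typically delegate the feature to a cross-origin iframe via the allow attribute; the embedded URL below is hypothetical.

<!-- Delegating the "xr" feature to a cross-origin frame. -->
<iframe src="https://example.com/xr-app" allow="xr"></iframe>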
15. Acknowledgements
The following individuals have contributed to the design of the WebXR Device API specification:
- Sebastian Sylvan (Formerly Microsoft)
And a special thanks to Vladimir Vukicevic (Unity) for kick-starting this whole adventure!