Media Capabilities

W3C Working Draft, 7 October 2024

This version:
https://www.w3.org/TR/2024/WD-media-capabilities-20241007/
Latest published version:
https://www.w3.org/TR/media-capabilities/
Editor's Draft:
https://w3c.github.io/media-capabilities/
History:
https://www.w3.org/standards/history/media-capabilities/
Feedback:
GitHub
Inline In Spec
Editors:
Jean-Yves Avenard (Apple Inc.)
Mark Foltz (Google Inc.)
Former Editors:
Will Cassella (Google Inc.)
Mounir Lamouri (Google Inc.)
Chris Cunningham (Google Inc.)
Vi Nguyen (Microsoft Corporation)
Participate:
Git Repository.
File an issue.
Version History:
https://github.com/w3c/media-capabilities/commits

Abstract

This specification provides APIs that allow websites to make an optimal decision when picking media content for the user. The APIs expose information about the decoding and encoding capabilities for a given format, as well as output capabilities, to find the best match for the device’s display.

Status of this document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

Feedback and comments on this specification are welcome. GitHub Issues are preferred for discussion on this specification. Alternatively, you can send comments to the Media Working Group’s mailing-list, public-media-wg@w3.org (archives). This draft highlights some of the pending issues that are still to be discussed in the working group. No decision has been taken on the outcome of these issues including whether they are valid.

This document was published by the Media Working Group as a Working Draft using the Recommendation track. This document is intended to become a W3C Recommendation.

Publication as a Working Draft does not imply endorsement by W3C and its Members.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 03 November 2023 W3C Process Document.

1. Introduction

This section is non-normative.

This specification defines an API to query the user agent with regards to its audio and video decoding and encoding capabilities, based on information such as the codecs, profile, resolution, bitrates, etc., of the media. The API indicates if the configuration is supported and whether the playback is expected to be smooth and/or power efficient.

This specification focuses on encoding and decoding capabilities. It is expected to be used with other web APIs that provide information about the display properties, such as supported color gamut or dynamic range capabilities, which enable web applications to pick the right content for the display and to, for example, avoid providing HDR content to an SDR display.

2. Decoding and Encoding Capabilities

2.1. Media Configurations

2.1.1. MediaConfiguration

dictionary MediaConfiguration {
  VideoConfiguration video;
  AudioConfiguration audio;
};
dictionary MediaDecodingConfiguration : MediaConfiguration {
  required MediaDecodingType type;
  MediaCapabilitiesKeySystemConfiguration keySystemConfiguration;
};
dictionary MediaEncodingConfiguration : MediaConfiguration {
  required MediaEncodingType type;
};

The input to the decoding capabilities is represented by a MediaDecodingConfiguration dictionary and the input to the encoding capabilities by a MediaEncodingConfiguration dictionary.

For a MediaConfiguration to be a valid MediaConfiguration, all of the following conditions MUST be true:

  1. audio and/or video MUST exist.
  2. audio MUST be a valid audio configuration if it exists.
  3. video MUST be a valid video configuration if it exists.

For a MediaDecodingConfiguration to be a valid MediaDecodingConfiguration, all of the following conditions MUST be true:

  1. It MUST be a valid MediaConfiguration.
  2. If keySystemConfiguration exists:
    1. The type MUST be media-source or file.
    2. If keySystemConfiguration.audio exists, audio MUST also exist.
    3. If keySystemConfiguration.video exists, video MUST also exist.

For a MediaDecodingConfiguration to describe [ENCRYPTED-MEDIA], a keySystemConfiguration MUST exist.
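
For example (non-normative), the following sketch illustrates these validity rules; the content type, dimensions, and key system name are illustrative assumptions only:

<script>
  // Valid: both video and keySystemConfiguration.video exist, and the
  // type is 'media-source', satisfying all of the conditions above.
  // 'org.w3.clearkey' is used purely as an illustrative key system.
  const validDecodingConfiguration = {
    type: 'media-source',
    video: {
      contentType: 'video/mp4;codecs=avc1.640028',
      width: 1280,
      height: 720,
      bitrate: 3000000,
      framerate: 30
    },
    keySystemConfiguration: {
      keySystem: 'org.w3.clearkey',
      video: {}
    }
  };

  // Invalid: keySystemConfiguration.audio exists but audio does not,
  // violating condition 2.2 above; decodingInfo() rejects such a
  // configuration with a TypeError.
  const invalidDecodingConfiguration = {
    type: 'media-source',
    video: validDecodingConfiguration.video,
    keySystemConfiguration: {
      keySystem: 'org.w3.clearkey',
      audio: {}
    }
  };
</script>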

2.1.2. MediaDecodingType

enum MediaDecodingType {
  "file",
  "media-source",
  "webrtc"
};

A MediaDecodingConfiguration has three types:

  • file is used to represent a configuration that is meant to be used for playback of media obtained by means other than MediaSource (as defined in [media-source]) or RTCPeerConnection (as defined in [webrtc]).
  • media-source is used to represent a configuration that is meant to be used for playback of a MediaSource.
  • webrtc is used to represent a configuration that is meant to be received using RTCPeerConnection.
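
For example (non-normative), a query for plain file playback, such as a progressively downloaded MP4, uses the file type; the values below are illustrative:

<script>
  navigator.mediaCapabilities.decodingInfo({
    type: 'file',
    video: {
      contentType: 'video/mp4;codecs=avc1.640028',
      width: 1920,
      height: 1080,
      bitrate: 6000000,
      framerate: 24
    }
  }).then((result) => {
    console.log('File playback is' + (result.supported ? '' : ' NOT')
      + ' supported');
  });
</script>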

2.1.3. MediaEncodingType

enum MediaEncodingType {
  "record",
  "webrtc"
};

A MediaEncodingConfiguration can have one of two types:

  • record is used to represent a configuration that is meant to be used for recording of media, e.g., with MediaRecorder as defined in [mediastream-recording].
  • webrtc is used to represent a configuration that is meant to be transmitted using RTCPeerConnection.

2.1.4. MIME types

In the context of this specification, a MIME type is also called content type. A valid media MIME type is a string that is a valid MIME type per [mimesniff].

Note, the definition of MIME subtypes and parameters is context-dependent. For file, media-source, and record, the MIME types are specified as defined for HTTP, whereas for webrtc the MIME types are specified as defined for RTP.

A valid audio MIME type is a string that is a valid media MIME type and for which the type per [RFC9110] is either audio or application.

A valid video MIME type is a string that is a valid media MIME type and for which the type per [RFC9110] is either video or application.

2.1.4.1. HTTP

If the MIME type does not imply a codec, the string MUST also have one and only one parameter that is named codecs with a value describing a single media codec. Otherwise, it MUST contain no parameters.
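
For example (non-normative), the strings below illustrate this rule; the content types are illustrative assumptions:

<script>
  // Valid: 'video/webm' does not imply a codec, so exactly one codecs
  // parameter naming a single codec is present.
  const validExplicitCodec = 'video/webm;codecs=vp8';

  // Valid: 'audio/flac' implies the codec, so no parameters appear.
  const validImpliedCodec = 'audio/flac';

  // Invalid: the codecs parameter names two codecs rather than one.
  const invalidTwoCodecs = 'video/mp4;codecs="avc1.640028,mp4a.40.2"';

  // Invalid: a parameter other than codecs is present.
  const invalidExtraParameter = 'video/mp4;codecs=avc1.640028;profiles=isom';
</script>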

2.1.4.2. RTP

The MIME types used with RTP are defined in the specifications of the corresponding RTP payload formats [RFC4855] [RFC6838]. The codec name is typically specified as subtype and zero or more parameters may be present depending on the codec.

2.1.5. VideoConfiguration

dictionary VideoConfiguration {
  required DOMString contentType;
  required unsigned long width;
  required unsigned long height;
  required unsigned long long bitrate;
  required double framerate;
  boolean hasAlphaChannel;
  HdrMetadataType hdrMetadataType;
  ColorGamut colorGamut;
  TransferFunction transferFunction;
  DOMString scalabilityMode;
  boolean spatialScalability;
};

The contentType member represents the MIME type of the video track.

To check if a VideoConfiguration configuration is a valid video configuration, the following steps MUST be run:

  1. If configuration’s contentType is not a valid video MIME type, return false and abort these steps.
  2. If framerate is not finite or is not greater than 0, return false and abort these steps.
  3. If an optional member is specified for a MediaDecodingType or MediaEncodingType to which it’s not applicable, return false and abort these steps. See applicability rules in the member definitions below.
  4. Return true.

The width and height members represent respectively the visible horizontal and vertical encoded pixels in the encoded video frames.

The bitrate member represents the average bitrate of the video track, given in units of bits per second. In the case of a video stream encoded at a constant bit rate (CBR), this value should be accurate over a short-term window. For variable bit rate (VBR) encoding, this value should be usable to allocate any buffering and throughput capability necessary for uninterrupted decoding of the video stream over the long term, based on the indicated contentType.

The framerate member represents the framerate of the video track. The framerate is the number of frames used in one second (frames per second). It is represented as a double.

The hasAlphaChannel member represents whether the video track contains alpha channel information. If true, the encoded video stream can produce per-pixel alpha channel information when decoded. If false, the video stream cannot produce per-pixel alpha channel information when decoded. If undefined, the UA should determine whether the video stream encodes alpha channel information based on the indicated contentType, if possible. Otherwise, the UA should presume that the video stream cannot produce alpha channel information.

If present, the hdrMetadataType member represents that the video track includes the specified HDR metadata type, which the UA needs to be capable of interpreting for tone mapping the HDR content to a color volume and luminance of the output device. Valid inputs are defined by HdrMetadataType. hdrMetadataType is only applicable to MediaDecodingConfiguration for types media-source and file.

If present, the colorGamut member represents that the video track is delivered in the specified color gamut, which describes a set of colors in which the content is intended to be displayed. If the attached output device also supports the specified color, the UA needs to be able to cause the output device to render the appropriate color, or something close enough. If the attached output device does not support the specified color, the UA needs to be capable of mapping the specified color to a color supported by the output device. Valid inputs are defined by ColorGamut. colorGamut is only applicable to MediaDecodingConfiguration for types media-source and file.

If present, the transferFunction member represents that the video track requires the specified transfer function to be understood by the UA. Transfer function describes the electro-optical algorithm supported by the rendering capabilities of a user agent, independent of the display, to map the source colors in the decoded media into the colors to be displayed. Valid inputs are defined by TransferFunction. transferFunction is only applicable to MediaDecodingConfiguration for types media-source and file.

If present, the scalabilityMode member represents the scalability mode as defined in [webrtc-svc]. If absent, the implementer defined default mode for this contentType is assumed (i.e., the mode you get if you don’t specify one via setParameters()). scalabilityMode is only applicable to MediaEncodingConfiguration for type webrtc.

If present, the spatialScalability member represents the ability to do spatial prediction, that is, using frames of a resolution different than the current resolution as dependencies. If absent, spatialScalability will default to false. spatialScalability is closely coupled to scalabilityMode in the sense that streams encoded with modes using spatial scalability (e.g. "L2T1") can only be decoded if spatialScalability is supported. spatialScalability is only applicable to MediaDecodingConfiguration for types media-source, file, and webrtc.
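
For example (non-normative), the following sketch combines several of the optional members above in a media-source decoding query; the codec string and values are illustrative assumptions:

<script>
  // HDR query: hdrMetadataType, colorGamut and transferFunction are
  // only applicable to the 'media-source' and 'file' decoding types.
  navigator.mediaCapabilities.decodingInfo({
    type: 'media-source',
    video: {
      contentType: 'video/webm;codecs="vp09.02.51.10"', // VP9 Profile 2, 10-bit
      width: 3840,
      height: 2160,
      bitrate: 20000000,
      framerate: 60,
      hdrMetadataType: 'smpteSt2086',
      colorGamut: 'rec2020',
      transferFunction: 'pq'
    }
  }).then((result) => {
    console.log('HDR decoding is' + (result.supported ? '' : ' NOT')
      + ' supported and' + (result.powerEfficient ? '' : ' NOT')
      + ' power efficient');
  });
</script>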

2.1.6. HdrMetadataType

enum HdrMetadataType {
  "smpteSt2086",
  "smpteSt2094-10",
  "smpteSt2094-40"
};

If present, HdrMetadataType describes the capability to interpret HDR metadata of the specified type.

The VideoConfiguration may contain one of the following types:

  • smpteSt2086, representing the static metadata type defined by [SMPTE-ST-2086].
  • smpteSt2094-10, representing the dynamic metadata type defined by [SMPTE-ST-2094].
  • smpteSt2094-40, representing the dynamic metadata type defined by [SMPTE-ST-2094].

2.1.7. ColorGamut

enum ColorGamut {
  "srgb",
  "p3",
  "rec2020"
};

The VideoConfiguration may contain one of the following types:

  • srgb, representing the [sRGB] color gamut.
  • p3, representing the DCI P3 Color Space color gamut. This color gamut includes the srgb gamut.
  • rec2020, representing the ITU-R Recommendation BT.2020 color gamut. This color gamut includes the p3 gamut.

2.1.8. TransferFunction

enum TransferFunction {
  "srgb",
  "pq",
  "hlg"
};

The VideoConfiguration may contain one of the following types:

  • srgb, representing the transfer function defined by [sRGB].
  • pq, representing the "Perceptual Quantizer" transfer function defined by [SMPTE-ST-2084].
  • hlg, representing the "Hybrid Log Gamma" transfer function defined by BT.2100.

2.1.9. AudioConfiguration

dictionary AudioConfiguration {
  required DOMString contentType;
  DOMString channels;
  unsigned long long bitrate;
  unsigned long samplerate;
  boolean spatialRendering;
};

The contentType member represents the MIME type of the audio track.

To check if an AudioConfiguration configuration is a valid audio configuration, the following steps MUST be run:

  1. If configuration’s contentType is not a valid audio MIME type, return false and abort these steps.
  2. Return true.

The channels member represents the audio channels used by the audio track. channels is only applicable to the decoding types media-source, file, and webrtc and the encoding type webrtc.

The channels needs to be defined as a double (2.1, 4.1, 5.1, ...), an unsigned short (number of channels) or as an enum value. The current definition is a placeholder.

The bitrate member represents the average bitrate of the audio track. The bitrate is the number of bits used to encode a second of the audio track.

The samplerate member represents the sample rate of the audio track. The sample rate is the number of samples of audio carried per second. samplerate is only applicable to the decoding types media-source, file, and webrtc and the encoding type webrtc.

The samplerate is expressed in Hz (i.e., the number of samples of audio per second). The samplerate value is sometimes expressed in kHz, representing thousands of samples of audio per second; 44100 Hz is equivalent to 44.1 kHz.

The spatialRendering member indicates that the audio SHOULD be rendered spatially. The details of spatial rendering SHOULD be inferred from the contentType. If it does not exist, the UA MUST presume spatial rendering is not required. When true, the user agent SHOULD only report this configuration as supported if it can support spatial rendering for the current audio output device without falling back to a non-spatial mix of the stream. spatialRendering is only applicable to MediaDecodingConfiguration for types media-source and file.
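
For example (non-normative), an audio decoding query using these members might look as follows; the content type and values are illustrative assumptions:

<script>
  navigator.mediaCapabilities.decodingInfo({
    type: 'media-source',
    audio: {
      contentType: 'audio/mp4;codecs=mp4a.40.2', // AAC-LC
      channels: '2',        // see the open issue on the channels type
      bitrate: 132000,      // bits per second
      samplerate: 44100,    // in Hz, i.e. 44.1 kHz
      spatialRendering: false
    }
  }).then((result) => {
    console.log('Audio decoding is' + (result.supported ? '' : ' NOT')
      + ' supported');
  });
</script>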

2.1.10. MediaCapabilitiesKeySystemConfiguration

dictionary MediaCapabilitiesKeySystemConfiguration {
  required DOMString keySystem;
  DOMString initDataType = "";
  MediaKeysRequirement distinctiveIdentifier = "optional";
  MediaKeysRequirement persistentState = "optional";
  sequence<DOMString> sessionTypes;
  KeySystemTrackConfiguration audio;
  KeySystemTrackConfiguration video;
};

This dictionary refers to a number of types defined by [ENCRYPTED-MEDIA] (EME). Sequences of EME types are flattened to a single value whenever the intent of the sequence was to have requestMediaKeySystemAccess() choose a subset it supports. With MediaCapabilities, callers provide the sequence across multiple calls, ultimately letting the caller choose which configuration to use.
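
For example (non-normative), where requestMediaKeySystemAccess() would accept initDataTypes: ['cenc', 'keyids'] and choose one, a Media Capabilities caller instead issues one query per value; the key system and media values below are illustrative assumptions:

<script>
  const base = {
    type: 'media-source',
    video: {
      contentType: 'video/mp4;codecs=avc1.640028',
      width: 1280,
      height: 720,
      bitrate: 3000000,
      framerate: 30
    }
  };

  // One decodingInfo() call per candidate initDataType; the caller
  // then picks among the supported results.
  Promise.all(['cenc', 'keyids'].map((initDataType) =>
    navigator.mediaCapabilities.decodingInfo(Object.assign({}, base, {
      keySystemConfiguration: {
        keySystem: 'org.w3.clearkey',
        initDataType: initDataType,
        video: {}
      }
    }))
  )).then((results) => {
    results.forEach((result, i) => {
      console.log(['cenc', 'keyids'][i] + ' is'
        + (result.supported ? '' : ' NOT') + ' supported');
    });
  });
</script>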

The keySystem member represents a keySystem name as described in [ENCRYPTED-MEDIA].

The initDataType member represents a single value from the initDataTypes sequence described in [ENCRYPTED-MEDIA].

The distinctiveIdentifier member represents a distinctiveIdentifier requirement as described in [ENCRYPTED-MEDIA].

The persistentState member represents a persistentState requirement as described in [ENCRYPTED-MEDIA].

The sessionTypes member represents a sequence of required sessionTypes as described in [ENCRYPTED-MEDIA].

The audio member represents a KeySystemTrackConfiguration associated with the AudioConfiguration.

The video member represents a KeySystemTrackConfiguration associated with the VideoConfiguration.

2.1.11. KeySystemTrackConfiguration

dictionary KeySystemTrackConfiguration {
  DOMString robustness = "";
  DOMString? encryptionScheme = null;
};

The robustness member represents a robustness level as described in [ENCRYPTED-MEDIA].

The encryptionScheme member represents an encryptionScheme as described in [ENCRYPTED-MEDIA-DRAFT].
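
For example (non-normative), an encrypted decoding query combining the two dictionaries above might look as follows; the key system, encryption scheme, and media values are illustrative assumptions:

<script>
  const encryptedConfiguration = {
    type: 'media-source',
    video: {
      contentType: 'video/mp4;codecs=avc1.640028',
      width: 1280,
      height: 720,
      bitrate: 3000000,
      framerate: 30
    },
    keySystemConfiguration: {
      keySystem: 'org.w3.clearkey',   // illustrative key system name
      initDataType: 'cenc',
      distinctiveIdentifier: 'optional',
      persistentState: 'optional',
      video: {
        robustness: '',               // empty string: no specific level
        encryptionScheme: 'cenc'
      }
    }
  };
</script>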

2.2. Media Capabilities Information

dictionary MediaCapabilitiesInfo {
  required boolean supported;
  required boolean smooth;
  required boolean powerEfficient;
};
dictionary MediaCapabilitiesDecodingInfo : MediaCapabilitiesInfo {
  required MediaKeySystemAccess keySystemAccess;
  MediaDecodingConfiguration configuration;
};
dictionary MediaCapabilitiesEncodingInfo : MediaCapabilitiesInfo {
  MediaEncodingConfiguration configuration;
};

A MediaCapabilitiesInfo has associated supported, smooth, powerEfficient fields which are booleans.

Encoding or decoding is considered power efficient when the power draw is optimal. The definition of optimal power draw for encoding or decoding is left to the user agent. However, a common implementation strategy is to consider hardware usage as indicative of optimal power draw. User agents SHOULD NOT mark hardware encoding or decoding as power efficient by default, as non-hardware-accelerated codecs can be just as efficient, particularly with low-resolution video. User agents SHOULD NOT take the device’s power source into consideration when determining encoding power efficiency unless the device’s power source has side effects such as enabling different encoding or decoding modules.

A MediaCapabilitiesDecodingInfo has associated keySystemAccess which is a MediaKeySystemAccess or null as appropriate.

If the encrypted decoding configuration is supported, the resulting MediaCapabilitiesInfo will include a MediaKeySystemAccess. Authors may use this to create MediaKeys and set up encrypted playback.
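
For example (non-normative), continuing from an encrypted query such as the encryptedConfiguration sketched in §2.1.11 above:

<script>
  navigator.mediaCapabilities.decodingInfo(encryptedConfiguration)
    .then((result) => {
      if (!result.supported || !result.keySystemAccess) return null;
      // keySystemAccess behaves like the MediaKeySystemAccess returned
      // by requestMediaKeySystemAccess() in [ENCRYPTED-MEDIA].
      return result.keySystemAccess.createMediaKeys();
    })
    .then((mediaKeys) => {
      if (!mediaKeys) return;
      // Attach the MediaKeys to a media element to set up playback.
      return document.querySelector('video').setMediaKeys(mediaKeys);
    });
</script>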

A MediaCapabilitiesDecodingInfo has an associated configuration which is the decoding configuration properties used to generate the MediaCapabilitiesDecodingInfo.

A MediaCapabilitiesEncodingInfo has an associated configuration which is the encoding configuration properties used to generate the MediaCapabilitiesEncodingInfo.

2.3. Algorithms

2.3.1. Create a MediaCapabilitiesEncodingInfo

Given a MediaEncodingConfiguration configuration, this algorithm returns a MediaCapabilitiesEncodingInfo. The following steps are run:

  1. Let info be a new MediaCapabilitiesEncodingInfo instance. Unless stated otherwise, reading and writing apply to info for the next steps.
  2. Set info’s configuration to be a new MediaEncodingConfiguration. For every property in configuration, create a new property with the same name and value in info’s configuration.
  3. If the user agent is able to encode the media represented by configuration, set supported to true. Otherwise set it to false.
  4. If the user agent is able to encode the media represented by configuration at the indicated framerate, set smooth to true. Otherwise set it to false.
  5. If the user agent is able to encode the media represented by configuration in a power efficient manner, set powerEfficient to true. Otherwise set it to false.
  6. Return info.

2.3.2. Create a MediaCapabilitiesDecodingInfo

Given a MediaDecodingConfiguration configuration, this algorithm returns a MediaCapabilitiesDecodingInfo. The following steps are run:

  1. Let info be a new MediaCapabilitiesDecodingInfo instance. Unless stated otherwise, reading and writing apply to info for the next steps.
  2. Set info’s configuration to be a new MediaDecodingConfiguration. For every property in configuration, create a new property with the same name and value in info’s configuration.
  3. If configuration.keySystemConfiguration exists:
    1. Set keySystemAccess to the result of running the Check Encrypted Decoding Support algorithm with configuration.
    2. If keySystemAccess is not null set supported to true. Otherwise set it to false.
  4. Otherwise, run the following steps:
    1. Set keySystemAccess to null.
    2. If the user agent is able to decode the media represented by configuration, set supported to true.
    3. Otherwise, set it to false.
  5. If the user agent is able to decode the media represented by configuration at the indicated framerate without dropping frames, set smooth to true. Otherwise set it to false.
  6. If the user agent is able to decode the media represented by configuration in a power efficient manner, set powerEfficient to true. Otherwise set it to false.
  7. Return info.
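
As a non-normative illustration, the control flow of this algorithm can be sketched as follows; checkEncryptedDecodingSupport() and the uaCan*() helpers are hypothetical stand-ins for user-agent internals:

<script>
  function createMediaCapabilitiesDecodingInfo(configuration) {
    const info = {};                                            // step 1
    info.configuration = structuredClone(configuration);        // step 2
    if (configuration.keySystemConfiguration !== undefined) {   // step 3
      info.keySystemAccess = checkEncryptedDecodingSupport(configuration);
      info.supported = info.keySystemAccess !== null;
    } else {                                                    // step 4
      info.keySystemAccess = null;
      info.supported = uaCanDecode(configuration);
    }
    info.smooth = uaCanDecodeWithoutDroppingFrames(configuration);    // step 5
    info.powerEfficient = uaCanDecodePowerEfficiently(configuration); // step 6
    return info;                                                // step 7
  }
</script>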

2.3.3. Check Encrypted Decoding Support

Given a MediaDecodingConfiguration config where keySystemConfiguration exists, this algorithm returns a MediaKeySystemAccess or null as appropriate. The following steps are run:

  1. If the keySystem member of config.keySystemConfiguration is not one of the Key Systems supported by the user agent, return null. String comparison is case-sensitive.
  2. Let origin be the origin of the calling context’s Document.
  3. Let implementation be the implementation of config.keySystemConfiguration.keySystem.
  4. Let emeConfiguration be a new MediaKeySystemConfiguration, and initialize it as follows:
    1. Set the initDataTypes attribute to a sequence containing config.keySystemConfiguration.initDataType.
    2. Set the distinctiveIdentifier attribute to config.keySystemConfiguration.distinctiveIdentifier.
    3. Set the persistentState attribute to config.keySystemConfiguration.persistentState.
    4. Set the sessionTypes attribute to config.keySystemConfiguration.sessionTypes.
    5. If audio exists in config, set the audioCapabilities attribute to a sequence containing a single MediaKeySystemMediaCapability, initialized as follows:
      1. Set the contentType attribute to config.audio.contentType.
      2. If config.keySystemConfiguration.audio exists:
        1. Set the robustness attribute to config.keySystemConfiguration.audio.robustness.
        2. Set the encryptionScheme attribute to config.keySystemConfiguration.audio.encryptionScheme.
    6. If video exists in config, set the videoCapabilities attribute to a sequence containing a single MediaKeySystemMediaCapability, initialized as follows:
      1. Set the contentType attribute to config.video.contentType.
      2. If config.keySystemConfiguration.video exists:
        1. Set the robustness attribute to config.keySystemConfiguration.video.robustness.
        2. Set the encryptionScheme attribute to config.keySystemConfiguration.video.encryptionScheme.
  5. Let supported configuration be the result of executing the Get Supported Configuration algorithm on implementation, emeConfiguration, and origin.
  6. If supported configuration is NotSupported, return null and abort these steps.
  7. Let access be a new MediaKeySystemAccess object, and initialize it as follows:
    1. Set the keySystem attribute to emeConfiguration.keySystem.
    2. Let the configuration value be supported configuration.
    3. Let the cdm implementation value be implementation.
  8. Return access.

2.4. Navigator and WorkerNavigator Extensions

[Exposed=Window]
partial interface Navigator {
  [SameObject] readonly attribute MediaCapabilities mediaCapabilities;
};
[Exposed=Worker]
partial interface WorkerNavigator {
  [SameObject] readonly attribute MediaCapabilities mediaCapabilities;
};

2.5. Media Capabilities Interface

[Exposed=(Window, Worker)]
interface MediaCapabilities {
  [NewObject] Promise<MediaCapabilitiesDecodingInfo> decodingInfo(MediaDecodingConfiguration configuration);
  [NewObject] Promise<MediaCapabilitiesEncodingInfo> encodingInfo(MediaEncodingConfiguration configuration);
};

2.5.1. Media Capabilities Task Source

The task source for the tasks mentioned in this specification is the media capabilities task source.

When an algorithm queues a Media Capabilities task T, the user agent MUST queue a global task T on the media capabilities task source using the global object of the current realm record.

2.5.2. decodingInfo() Method

The decodingInfo() method MUST run the following steps:

  1. If configuration is not a valid MediaDecodingConfiguration, return a Promise rejected with a newly created TypeError.
  2. If configuration.keySystemConfiguration exists, run the following substeps:
    1. If the global object is of type WorkerGlobalScope, return a Promise rejected with a newly created DOMException whose name is InvalidStateError.
    2. If the global object’s relevant settings object is a non-secure context, return a Promise rejected with a newly created DOMException whose name is SecurityError.
  3. Let p be a new Promise.
  4. Run the following steps in parallel:
    1. Run the Create a MediaCapabilitiesDecodingInfo algorithm with configuration.
    2. Queue a Media Capabilities task to resolve p with its result.
  5. Return p.

Note, calling decodingInfo() with a keySystemConfiguration present may have user-visible effects, including requests for user consent. Such calls should only be made when the author intends to create and use a MediaKeys object with the provided configuration.
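
For example (non-normative), step 1 can be observed by passing a deliberately invalid configuration; here the framerate violates the valid video configuration check:

<script>
  navigator.mediaCapabilities.decodingInfo({
    type: 'media-source',
    video: {
      contentType: 'video/mp4;codecs=avc1.640028',
      width: 640,
      height: 360,
      bitrate: 2000,
      framerate: 0   // invalid: framerate must be finite and greater than 0
    }
  }).catch((err) => {
    console.log(err.name); // "TypeError"
  });
</script>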

2.5.3. encodingInfo() Method

The encodingInfo() method MUST run the following steps:

  1. If configuration is not a valid MediaConfiguration, return a Promise rejected with a newly created TypeError.
  2. Let p be a new Promise.
  3. Run the following steps in parallel:
    1. Run the Create a MediaCapabilitiesEncodingInfo algorithm with configuration.
    2. Queue a Media Capabilities task to resolve p with its result.
  4. Return p.

3. Security and Privacy Considerations

This specification does not introduce any security-sensitive information or APIs, but it provides easier access to some information that can be used to fingerprint users.

3.1. Decoding/Encoding and Fingerprinting

The information exposed by the decoding/encoding capabilities can already be discovered via experimentation with the exception that the API will likely provide more accurate and consistent information. This information is expected to have a high correlation with other information already available to web pages as a given class of device is expected to have very similar decoding/encoding capabilities. In other words, high end devices from a certain year are expected to decode some type of videos while older devices may not. Therefore, it is expected that the entropy added with this API isn’t going to be significant.

HDR detection is more nuanced. Adding colorGamut, transferFunction, and hdrMetadataType has the potential to add significant entropy. However, for UAs whose decoders are implemented in software and therefore whose capabilities are fixed across devices, this feature adds no effective entropy. Additionally, for many cases, devices tend to fall into large categories, within which capabilities are similar thus minimizing effective entropy.

If an implementation wishes to implement a fingerprint-proof version of this specification, it is recommended to fake a fixed set of capabilities (e.g., decode up to 1080p VP9) instead of always answering yes or always answering no, as the latter approaches could considerably degrade the user’s experience. Another mitigation is to limit these Web APIs to top-level browsing contexts. Yet another is to use a privacy budget that throttles and/or blocks calls to the API above a threshold.

4. Examples

4.1. Query playback capabilities with decodingInfo()

The following example shows how to use decodingInfo() to query media playback capabilities when using Media Source Extensions [media-source].

<script>
  const contentType = 'video/mp4;codecs=avc1.640028';

  const configuration = {
    type: 'media-source',
    video: {
      contentType: contentType,
      width: 640,
      height: 360,
      bitrate: 2000,
      framerate: 29.97
    }
  };

  navigator.mediaCapabilities.decodingInfo(configuration)
    .then((result) => {
      console.log('Decoding of ' + contentType + ' is'
        + (result.supported ? '' : ' NOT') + ' supported,'
        + (result.smooth ? '' : ' NOT') + ' smooth and'
        + (result.powerEfficient ? '' : ' NOT') + ' power efficient');
    })
    .catch((err) => {
      console.error(err, ' caused decodingInfo to reject');
    });
</script>

The following examples show how to use decodingInfo() to query WebRTC receive capabilities [webrtc].

<script>
  const contentType = 'video/VP8';

  const configuration = {
    type: 'webrtc',
    video: {
      contentType: contentType,
      width: 640,
      height: 360,
      bitrate: 2000,
      framerate: 25
    }
  };

  navigator.mediaCapabilities.decodingInfo(configuration)
    .then((result) => {
      console.log('Decoding of ' + contentType + ' is'
        + (result.supported ? '' : ' NOT') + ' supported,'
        + (result.smooth ? '' : ' NOT') + ' smooth and'
        + (result.powerEfficient ? '' : ' NOT') + ' power efficient');
    })
    .catch((err) => {
      console.error(err, ' caused decodingInfo to reject');
    });
</script>
<script>
  const contentType = 'video/H264;level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f';

  const configuration = {
    type: 'webrtc',
    video: {
      contentType: contentType,
      width: 640,
      height: 360,
      bitrate: 2000,
      framerate: 25
    }
  };

  navigator.mediaCapabilities.decodingInfo(configuration)
    .then((result) => {
      console.log('Decoding of ' + contentType + ' is'
        + (result.supported ? '' : ' NOT') + ' supported,'
        + (result.smooth ? '' : ' NOT') + ' smooth and'
        + (result.powerEfficient ? '' : ' NOT') + ' power efficient');
    })
    .catch((err) => {
      console.error(err, ' caused decodingInfo to reject');
    });
</script>

4.2. Query recording capabilities with encodingInfo()

The following example shows how to use encodingInfo() to query WebRTC send capabilities [webrtc], including the optional member scalabilityMode.

<script>
  const contentType = 'video/VP9';

  const configuration = {
    type: 'webrtc',
    video: {
      contentType: contentType,
      width: 640,
      height: 480,
      bitrate: 10000,
      framerate: 29.97,
      scalabilityMode: "L3T3_KEY"
    }
  };

  navigator.mediaCapabilities.encodingInfo(configuration)
    .then((result) => {
      console.log(contentType + ' is:'
        + (result.supported ? '' : ' NOT') + ' supported,'
        + (result.smooth ? '' : ' NOT') + ' smooth and'
        + (result.powerEfficient ? '' : ' NOT') + ' power efficient');
    })
    .catch((err) => {
      console.error(err, ' caused encodingInfo to reject');
    });
</script>

The following example shows how to use encodingInfo() to query recording capabilities [mediastream-recording].

<script>
  const contentType = 'video/webm;codecs=vp8';

  const configuration = {
    type: 'record',
    video: {
      contentType: contentType,
      width: 640,
      height: 480,
      bitrate: 10000,
      framerate: 29.97
    }
  };

  navigator.mediaCapabilities.encodingInfo(configuration)
    .then((result) => {
      console.log(contentType + ' is:'
        + (result.supported ? '' : ' NOT') + ' supported,'
        + (result.smooth ? '' : ' NOT') + ' smooth and'
        + (result.powerEfficient ? '' : ' NOT') + ' power efficient');
    })
    .catch((err) => {
      console.error(err, ' caused encodingInfo to reject');
    });
</script>

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

Index

Terms defined by this specification

Terms defined by reference

References

Normative References

[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[ECMASCRIPT]
ECMAScript Language Specification. URL: https://tc39.es/ecma262/multipage/
[ENCRYPTED-MEDIA]
Joey Parrish; Greg Freedman. Encrypted Media Extensions. 7 August 2024. WD. URL: https://www.w3.org/TR/encrypted-media-2/
[ENCRYPTED-MEDIA-DRAFT]
Encrypted Media Extensions. 13 December 2019. URL: https://w3c.github.io/encrypted-media
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[MEDIA-SOURCE]
Jean-Yves Avenard; Mark Watson. Media Source Extensions™. 4 July 2024. WD. URL: https://www.w3.org/TR/media-source-2/
[MIMESNIFF]
Gordon P. Hemsley. MIME Sniffing Standard. Living Standard. URL: https://mimesniff.spec.whatwg.org/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[SMPTE-ST-2084]
High Dynamic Range Electro-Optical Transfer Function of Mastering Reference Displays. 2014. URL: https://ieeexplore.ieee.org/document/7291452
[SMPTE-ST-2086]
Mastering Display Color Volume Metadata Supporting High Luminance and Wide Color Gamut Images. 2014. URL: https://ieeexplore.ieee.org/document/7291707
[SMPTE-ST-2094]
Dynamic Metadata for Color Volume Transform Core Components. 2016. URL: https://ieeexplore.ieee.org/document/7513361
[sRGB]
Multimedia systems and equipment - Colour measurement and management - Part 2-1: Colour management - Default RGB colour space - sRGB. URL: https://webstore.iec.ch/publication/6169
[WEBIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/
[WEBRTC]
Cullen Jennings; et al. WebRTC: Real-Time Communication in Browsers. 6 March 2023. REC. URL: https://www.w3.org/TR/webrtc/

Informative References

[MEDIASTREAM-RECORDING]
Miguel Casas-sanchez. MediaStream Recording. 11 May 2023. WD. URL: https://www.w3.org/TR/mediastream-recording/
[RFC4855]
S. Casner. Media Type Registration of RTP Payload Formats. February 2007. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc4855
[RFC6838]
N. Freed; J. Klensin; T. Hansen. Media Type Specifications and Registration Procedures. January 2013. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc6838
[RFC9110]
R. Fielding, Ed.; M. Nottingham, Ed.; J. Reschke, Ed.. HTTP Semantics. June 2022. Internet Standard. URL: https://httpwg.org/specs/rfc9110.html
[WEBRTC-SVC]
Bernard Aboba. Scalable Video Coding (SVC) Extension for WebRTC. 17 August 2024. WD. URL: https://www.w3.org/TR/webrtc-svc/

IDL Index

dictionary MediaConfiguration {
  VideoConfiguration video;
  AudioConfiguration audio;
};

dictionary MediaDecodingConfiguration : MediaConfiguration {
  required MediaDecodingType type;
  MediaCapabilitiesKeySystemConfiguration keySystemConfiguration;
};

dictionary MediaEncodingConfiguration : MediaConfiguration {
  required MediaEncodingType type;
};

enum MediaDecodingType {
  "file",
  "media-source",
  "webrtc"
};

enum MediaEncodingType {
  "record",
  "webrtc"
};

dictionary VideoConfiguration {
  required DOMString contentType;
  required unsigned long width;
  required unsigned long height;
  required unsigned long long bitrate;
  required double framerate;
  boolean hasAlphaChannel;
  HdrMetadataType hdrMetadataType;
  ColorGamut colorGamut;
  TransferFunction transferFunction;
  DOMString scalabilityMode;
  boolean spatialScalability;
};

enum HdrMetadataType {
  "smpteSt2086",
  "smpteSt2094-10",
  "smpteSt2094-40"
};

enum ColorGamut {
  "srgb",
  "p3",
  "rec2020"
};

enum TransferFunction {
  "srgb",
  "pq",
  "hlg"
};

dictionary AudioConfiguration {
  required DOMString contentType;
  DOMString channels;
  unsigned long long bitrate;
  unsigned long samplerate;
  boolean spatialRendering;
};

dictionary MediaCapabilitiesKeySystemConfiguration {
  required DOMString keySystem;
  DOMString initDataType = "";
  MediaKeysRequirement distinctiveIdentifier = "optional";
  MediaKeysRequirement persistentState = "optional";
  sequence<DOMString> sessionTypes;
  KeySystemTrackConfiguration audio;
  KeySystemTrackConfiguration video;
};

dictionary KeySystemTrackConfiguration {
  DOMString robustness = "";
  DOMString? encryptionScheme = null;
};

dictionary MediaCapabilitiesInfo {
  required boolean supported;
  required boolean smooth;
  required boolean powerEfficient;
};

dictionary MediaCapabilitiesDecodingInfo : MediaCapabilitiesInfo {
  required MediaKeySystemAccess keySystemAccess;
  MediaDecodingConfiguration configuration;
};

dictionary MediaCapabilitiesEncodingInfo : MediaCapabilitiesInfo {
  MediaEncodingConfiguration configuration;
};

[Exposed=Window]
partial interface Navigator {
  [SameObject] readonly attribute MediaCapabilities mediaCapabilities;
};

[Exposed=Worker]
partial interface WorkerNavigator {
  [SameObject] readonly attribute MediaCapabilities mediaCapabilities;
};

[Exposed=(Window, Worker)]
interface MediaCapabilities {
  [NewObject] Promise<MediaCapabilitiesDecodingInfo> decodingInfo(MediaDecodingConfiguration configuration);
  [NewObject] Promise<MediaCapabilitiesEncodingInfo> encodingInfo(MediaEncodingConfiguration configuration);
};

Issues Index

The channels needs to be defined as a double (2.1, 4.1, 5.1, ...), an unsigned short (number of channels) or as an enum value. The current definition is a placeholder.