Copyright © 2024 World Wide Web Consortium. W3C® liability, trademark and permissive document license rules apply.
This specification extends MediaStreamTrack to provide an optional hint about the user's preference on how the media should be treated when insufficient resources for perfect reproduction are available.
This optional hint permits MediaStreamTrack sinks such as RTCPeerConnection (defined in [webrtc]) or MediaRecorder (defined in [mediastream-recording]) that process a track's audio or video content to choose processing parameters that are appropriate to the user's preferences.
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document was published by the Web Real-Time Communications Working Group as a Working Draft using the Recommendation track.
Publication as a Working Draft does not imply endorsement by W3C and its Members.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 03 November 2023 W3C Process Document.
Algorithms used for processing speech and music differ greatly. Echo-cancellation algorithms developed for speech-type content might not work well on music, and noise-suppression algorithms might remove drum snares or other "noisy" content. While such processing makes speech more intelligible, it is less appropriate for music signals.
For video, webcam content often requires denoising and is often intelligible even when downscaled or encoded with high quantization levels. Screencast content of presentations or webpages with a lot of text content is completely unintelligible if the quantization levels are too high or if the content is downscaled or otherwise blurry.
Without automatic detection of media content, a MediaStreamTrack consumer can only make an educated guess. This guess may be based on assuming that screencast content, such as chrome.desktopCapture, contains text content and must use low quantization levels, and drop frames extensively to meet bitrate requirements. Another assumption is that regular USB video devices provide webcam video, and higher quantization levels and downscaling are acceptable.
While usually appropriate, this educated guess leads to sub-optimal settings when incorrect. Treating high-motion screencast content, such as a movie or a streamed video game, as text manifests as heavy frame dropping. Treating highly detailed content as regular webcam video, on the other hand, leads to content that is too blurry, because it is quantized or downscaled beyond readability to meet bitrate requirements. This mismatch may also happen when HDMI video-capture cards are seen as USB webcams but actually screencast webpage text.
In some cases the web application can make a more-educated guess or take user input to inform consumers of what kind of content is being encoded. A web application that streams video-game content would be able to preserve motion from desktop capture at the cost of individual frame detail. A music-studio application would be able to prevent noise suppression from removing snares from a music track.
These settings are not intended to replace encoder-level settings completely but rather complement them with a simpler hint that does not require broad knowledge of video encoders, audio-processing steps or more extensive tuning.
A separate section of this specification describes expected behavior for specific components that process a MediaStreamTrack.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key word MUST in this document is to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, it appears in all capitals, as shown here.
The terms RTCRtpSender and RTCRtpSendParameters are defined in [WEBRTC].
WebIDL
partial interface MediaStreamTrack {
  attribute DOMString contentHint;
};
This specification extends MediaStreamTrack and makes use of its kind attribute as defined in [GETUSERMEDIA].
Each MediaStreamTrack has an associated application-set content hint, which is initially "", signifying unset.
This application-set content hint corresponds to the contentHint attribute of MediaStreamTrack, which may be used by the web application to provide a hint of what type of content is contained within the track, to guide how it should be treated by MediaStreamTrack consumers.
Valid values for the application-set content hint are dependent on the kind of MediaStreamTrack contained. On setting contentHint to value, run the following steps:
1. If the MediaStreamTrack's kind attribute is "audio", and value is not one of "", "speech", "speech-recognition", or "music", abort these steps.
2. If the MediaStreamTrack's kind attribute is "video", and value is not one of "", "motion", "detail" or "text", abort these steps.
3. Set the MediaStreamTrack's application-set content hint to value.
4. Adapt consumers of the MediaStreamTrack according to the new value of its application-set content hint. This adaptation should happen as quickly as is reasonable, e.g. within the next couple of captured video frames or audio buffers.
On getting contentHint, return the MediaStreamTrack's application-set content hint.
Note that the initial value of the application-set content hint is "", indicating that no hint has been provided. It does not default to the implementation's best guess of the contained type of content.
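The setter and getter behaviour described above can be sketched as a small stand-alone model. This is an illustration only, not an implementation of a user agent: `ModelTrack` and `VALID_HINTS` are hypothetical names introduced here, and the adaptation of track consumers is deliberately left out.

```typescript
// Hypothetical model of the contentHint attribute; not a real MediaStreamTrack.
type TrackKind = "audio" | "video";

// Valid hint values per track kind, as listed in the text above.
const VALID_HINTS: Record<TrackKind, readonly string[]> = {
  audio: ["", "speech", "speech-recognition", "music"],
  video: ["", "motion", "detail", "text"],
};

class ModelTrack {
  readonly kind: TrackKind;
  // The application-set content hint, initially "" (unset).
  private applicationSetContentHint = "";

  constructor(kind: TrackKind) {
    this.kind = kind;
  }

  get contentHint(): string {
    return this.applicationSetContentHint;
  }

  set contentHint(value: string) {
    // A value that is not valid for this kind aborts the steps: the hint is unchanged.
    if (VALID_HINTS[this.kind].indexOf(value) === -1) return;
    this.applicationSetContentHint = value;
    // Adapting consumers to the new hint is not modeled in this sketch.
  }
}

const micTrack = new ModelTrack("audio");
micTrack.contentHint = "music";  // accepted: valid audio hint
micTrack.contentHint = "detail"; // ignored: "detail" is a video hint
```

In this model, assigning a value that is invalid for the track's kind is silently ignored, so reading `contentHint` back is one way for an application to check whether a hint was accepted.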
Audio content hints are only applicable when the MediaStreamTrack contains an audio track.

Audio content hints

Value | Description |
---|---|
"" | No hint has been provided; the implementation should make its best-informed guess on how to handle the contained audio data. This may be inferred from how the track was opened or by doing content analysis. |
"speech" | The track should be treated as if it contains speech data. When consuming this signal, it may be appropriate to apply noise suppression or boost intelligibility of the incoming signal. |
"speech-recognition" | The track should be treated as if it contains data for the purpose of speech recognition by a machine. When consuming this signal, it may be appropriate to boost intelligibility of the incoming signal for transcription and turn off audio-processing components that are used for human consumption. |
"music" | The track should be treated as if it contains music data. Generally this might imply tuning or turning off audio-processing components that are used to process speech data, to prevent the audio from being distorted. |
Video content hints are only applicable when the MediaStreamTrack contains a video track.

Video content hints

Value | Description |
---|---|
"" | No hint has been provided; the implementation should make its best-informed guess on how the contained video content should be treated. This can for example be inferred from how the track was opened or by doing content analysis. |
"motion" | The track should be treated as if it contains video where motion is important. This is normally webcam video, movies or video games. Quantization artefacts and downscaling are acceptable in order to preserve motion as well as possible while still retaining target bitrates. At low bitrates, when compromises have to be made, more effort is spent on preserving frame rate than edge quality and details. |
"detail" | The track should be treated as if video details are extra important. This is generally applicable to presentations or web pages with text content, painting or line art. This setting would normally optimize for detail in the resulting individual frames rather than smooth playback. Artefacts from quantization or downscaling that make small text or line art unintelligible should be avoided. |
"text" | The track should be treated as if video details are extra important, and that significant sharp edges and areas of consistent color can occur frequently. This is generally applicable to presentations or web pages with text content. This setting would normally optimize for detail in the resulting individual frames rather than smooth playback, and may take advantage of encoder tools that optimize for text rendering. Artefacts from quantization or downscaling that make small text or line art unintelligible should be avoided. |
When setting a contentHint value for a MediaStreamTrack, the UA MUST apply a default as follows:
- If the value is "music": for the constraints echoCancellation, autoGainControl and noiseSuppression, apply a default of "false".
- If the value is "speech": for the constraints echoCancellation and autoGainControl, apply a default of "true".
- If the value is "speech-recognition": for the constraints echoCancellation, autoGainControl and noiseSuppression, apply a default of "false".

To apply a default to the constraint c with value t, perform the following steps:
Whenever the "apply constraints" algorithm is subsequently run, the UA MUST choose the remembered value t if it is now a permitted value.
When setting a contentHint value of "", all remembered values of t are removed.
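The mapping from audio hint to constraint defaults can be written down as a lookup, sketched below. The function name `defaultsForAudioHint` and its types are hypothetical and introduced only for illustration. The sketch represents the defaults as booleans, and it covers only which defaults are requested for each hint. It does not model the steps for applying or remembering a default.

```typescript
// Hypothetical helper: which constraint defaults a given audio hint requests.
type AudioHint = "" | "speech" | "speech-recognition" | "music";
type ProcessingConstraint = "echoCancellation" | "autoGainControl" | "noiseSuppression";
type ConstraintDefaults = Partial<Record<ProcessingConstraint, boolean>>;

function defaultsForAudioHint(hint: AudioHint): ConstraintDefaults {
  switch (hint) {
    case "music":
    case "speech-recognition":
      // Both hints default all three processing constraints to false.
      return { echoCancellation: false, autoGainControl: false, noiseSuppression: false };
    case "speech":
      // Only echoCancellation and autoGainControl get a default for "speech".
      return { echoCancellation: true, autoGainControl: true };
    default:
      // "" requests no defaults (remembered values are removed).
      return {};
  }
}

const musicDefaults = defaultsForAudioHint("music");
```

Note that "speech" leaves noiseSuppression without a default in this sketch, because the text above lists only two constraints for that hint.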
When encoding video, and some constraint (bandwidth, CPU) prevents encoding at the configured framerate and resolution, the encoder must make a choice on how to modify the encoding parameters.
This section defines terms to describe the choice, and an enum that can be used to indicate that choice in an API.
WebIDL
enum RTCDegradationPreference {
  "maintain-framerate",
  "maintain-resolution",
  "balanced"
};
Enum value | Description |
---|---|
maintain-framerate | Degrade resolution in order to maintain framerate. |
maintain-resolution | Degrade framerate in order to maintain resolution. |
balanced | Degrade a balance of framerate and resolution. |
This specification extends RTCRtpSendParameters with a member that allows this choice to be explicitly indicated for an RTCRtpSender:
WebIDL
partial dictionary RTCRtpSendParameters {
  RTCDegradationPreference degradationPreference;
};
RTCRtpSendParameters New Members

degradationPreference, of type RTCDegradationPreference. When bandwidth is constrained and the RTCRtpSender needs to choose between degrading resolution or degrading framerate, degradationPreference indicates which is preferred.
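As a rough sketch of how an application might set this member, the helper below copies a parameters object and sets degradationPreference on the copy. `withDegradationPreference` and `SendParametersLike` are hypothetical names. The sketch assumes the parameters come from an RTCRtpSender's `getParameters()` and are passed back to `setParameters()`, which is not exercised here.

```typescript
// Hypothetical stand-ins for the real WebRTC types, so the sketch is self-contained.
type DegradationPreference = "maintain-framerate" | "maintain-resolution" | "balanced";

interface SendParametersLike {
  transactionId: string;
  degradationPreference?: DegradationPreference;
}

// Returns a shallow copy of the parameters with the preference set; the input is not mutated.
function withDegradationPreference<T extends SendParametersLike>(
  params: T,
  preference: DegradationPreference,
): T & { degradationPreference: DegradationPreference } {
  return { ...params, degradationPreference: preference };
}

// In a browser, usage might look like this (assumption, not executed here):
//   const params = sender.getParameters();
//   await sender.setParameters(withDegradationPreference(params, "maintain-framerate"));
const exampleParams = withDegradationPreference(
  { transactionId: "example", encodings: [] as unknown[] },
  "maintain-framerate",
);
```

Returning a copy keeps the helper free of side effects. Mutating the object returned by `getParameters()` directly would work equally well for this purpose.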
An RTCRtpSender transmitting a MediaStreamTrack for which a contentHint attribute has been set MUST use the following degradation preferences, unless an explicit degradationPreference attribute has been set in the sender's parameters:
- For a video track with the attribute value "motion", use "maintain-framerate".
- For a video track with the attribute value "detail", use "maintain-resolution".
- For a video track with the attribute value "text", use "maintain-resolution". In addition, if the encoding codec is AV1, activate encoding tools for "text" mode.

For a video track with the attribute value "text", if the encoding codec is AV1, activate encoding tools for "text" mode.
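The precedence between an explicit degradationPreference and the content hint can be sketched as a pure function. `effectiveDegradationPreference` is a hypothetical name. Returning `undefined` when neither an explicit preference nor a mapped hint is present is an assumption of this sketch, standing for "left to the implementation".

```typescript
// Hypothetical helper resolving which degradation preference a sender would use.
type VideoHint = "" | "motion" | "detail" | "text";
type DegradationChoice = "maintain-framerate" | "maintain-resolution" | "balanced";

function effectiveDegradationPreference(
  hint: VideoHint,
  explicit?: DegradationChoice,
): DegradationChoice | undefined {
  // An explicit degradationPreference in the sender's parameters takes precedence.
  if (explicit !== undefined) return explicit;
  switch (hint) {
    case "motion":
      return "maintain-framerate";
    case "detail":
    case "text":
      return "maintain-resolution";
    default:
      // No hint set: this sketch leaves the choice to the implementation.
      return undefined;
  }
}

const screenSharePreference = effectiveDegradationPreference("text");
```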
This specification adds an API where the user can send information to the user agent. This information is not communicated elsewhere, and is not stored in any permanent location.
By probing this API, the client may get some information on how the user agent implements this specification, which may give some insight into its configuration. Apart from this, no user agent information or data is exposed to the client through this API, and the API does not give any access to browser UI, influence over the user agent's security characteristics or the generation of temporary identifiers.
This API does not distinguish between first-party, third-party or incognito contexts; it behaves identically in all these cases.
No personally identifiable or otherwise sensitive information is handled via the use of this API.