Copyright © 2014 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document presents the accessibility requirements users with disabilities have with respect to audio and video on the web. It first provides an introduction to the needs of users with disabilities in relation to audio and video.
Then it explains what alternative content technologies have been developed to help such users gain access to the content of audio and video.
A third section explains how these content technologies fit into the larger picture of accessibility, both technically within a web user agent and from a production-process point of view.
This document is most explicitly not a collection of baseline user agent or authoring tool requirements. It is important to recognize that not all user agents (nor all authoring tools) will support all the features discussed in this document.
Rather, this document attempts to supply a comprehensive collection of user requirements needed to support media accessibility in the context of HTML5. As such, it should be expected that this document will continue to develop for some time.
Please also note this document is not an inventory of technology currently provided by, or missing from, HTML5 specification drafts. Technology is listed here because it is important for accommodating the alternative access needs of users with disabilities to web-based media. This document is an inventory of Media Accessibility User Requirements.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a Working Draft by the Protocols & Formats Working Group (PFWG) of the Web Accessibility Initiative. The document is reasonably stable, and represents a consensus within the Working Group. This draft addresses comments received since the publication of the previous working draft. A diff file identifying the resulting changes is available along with a commit history. The Working Group is looking for feedback prior to publication as a Working Group Note. In particular, the PFWG seeks input about substantive changes since the last publication of this document. Feedback on the requirements is essential to ongoing efforts to make media content practices and technologies accessible. The PFWG asks in particular:
Are the use cases for media accessibility clear and complete? Do the features to enhance media accessibility meet the use cases? Are the technical requirements for media accessibility complete and achievable?
To comment, send email to public-pfwg-comments@w3.org (comment archive). Comments are requested by 19 September 2014.
In-progress updates to the document may be viewed in the publicly visible editors' draft.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy . The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy .
This document is governed by the 1 August 2014 W3C Process Document .
The following User Requirements have also been distilled into a Media Accessibility Checklist . Developers and implementers may want to refer to this checklist when implementing audio and video content and features.
Editorial note: This section is a rough draft. It will be edited to align with How People with Disabilities Use the Web once that document is complete. This draft is included now to provide general background for sections 2 and 3 of this document.
Comprehension of media may be affected by loss of visual function, loss of audio function, cognitive issues, or a combination of all three. Cognitive disabilities may affect access to and/or comprehension of media. Physical disabilities such as dexterity impairment, loss of limbs, or loss of use of limbs may affect access to media. Once richer forms of media, such as virtual reality, become more commonplace, tactile issues may come into play.
Control of the media player can be an important issue, e.g., for people with physical disabilities; however, this is typically not addressed by the media formats themselves, but is a requirement of the technology used to build the player.
People who are blind cannot access information if it is presented only in the visual mode. They require information in an alternative representation, which typically means the audio mode, although information can also be presented as text. It is important to remember that not only the main video is inaccessible, but also any other visible ancillary information such as stock tickers, status indicators, or other on-screen graphics, as well as any visual controls needed to operate the content. Since people who are blind use a screen reader and/or refreshable braille display, these assistive technologies (ATs) need to work hand-in-hand with the access mechanism provided for the media content.
People with low vision can use some visual information. Depending on their visual ability they might have specific issues such as difficulty discriminating foreground information from background information, or discriminating colors. Glare caused by excessive scattering in the eye can be a significant challenge, especially for very bright content or surroundings. They may be unable to react quickly to transient information, and may have a narrow angle of view and so may not detect key information presented temporarily where they are not looking, or in text that is moving or scrolling. A person using a low-vision aid will likely use screen magnification software. This means that they will only be viewing a portion of the screen, and so must manage tracking media content via their AT. They may have difficulty reading when text is too small, has poor background contrast (too high or too low), or when outlined or other fancy font types or effects are used. If the font is an image, it is likely to appear grainy when magnified. They may be using an AT that adjusts all the colors of the screen, such as inverting the colors, so the media content must be viewable through the AT. Users with low vision will often benefit from the same text streams and instructions that are sometimes hidden or displayed off screen for users of screen readers or refreshable braille.
A significant percentage of the population has atypical color perception, and may not be able to discriminate between different colors, or may miss key information when coded with color only. They might have difficulty discriminating foreground information from background information, or discriminating colors. Such issues can be minimized when the user has the ability to customize the color and contrast of text content.
People who are deaf generally cannot use audio. Thus, an alternative representation is required, typically through synchronized captions and/or sign translation.
People who are hard of hearing may be able to use some audio material, but might not be able to discriminate certain types of sound, and may miss any information presented as audio only if it contains frequencies they can't hear, or is masked by background noise or distortion. They may miss audio which is too quiet, or of poor quality. Speech may be challenging if it is too fast and cannot be played back more slowly. Information presented using multichannel audio (e.g., stereo) may not be perceived by people who are deaf in one ear.
Individuals who are deaf-blind have a combination of conditions that may result in one of the following: blindness and deafness; blindness and difficulty in hearing; low vision and deafness; or low vision and difficulty in hearing. Depending on their combination of conditions, individuals who are deaf-blind may need captions that can be enlarged, changed to high-contrast colors, or otherwise styled; or they may need captions and/or described video that can be presented with AT (e.g., a refreshable braille display). They may need synchronized captions and/or described video, or they may need a non-time-based transcript which they can read at their own pace.
People with physical disabilities such as poor dexterity, loss of limbs, or loss of use of limbs may use the keyboard alone rather than the combination of a pointing device plus keyboard to interact with content and controls, or may use a switch with an on-screen keyboard, or other assistive technology. The player itself must be usable via the keyboard and pointing devices. The user must have full access to all player controls, including methods for selecting alternative content.
Cognitive and neurological disabilities include a wide range of conditions that may include intellectual disabilities (called learning disabilities in some regions), autism-spectrum disorders, memory impairments, mental-health disabilities, attention-deficit disorders, audio- and/or visual-perceptive disorders, dyslexia and dyscalculia (called learning disabilities in some regions), or seizure disorders. Necessary accessibility supports vary widely for these different conditions. Individuals with some conditions may process information aurally better than by reading text; therefore, information that is presented as text embedded in a video should also be available as audio descriptions. Individuals with other conditions may need to reduce distractions or flashing in presentations of video. Some conditions such as autism-spectrum disorders may have multi-system effects, and individuals may need a combination of different accommodations. Overall, the media experience for people on the autism spectrum should be customizable and well designed so as to not be overwhelming. Care must be taken to present a media experience that focuses on the purpose of the content and provides alternative content in a clear, concise manner.
A number of alternative content types have been developed to help users with sensory disabilities gain access to audio-visual content. This section lists them, explains generally what they are, and provides a number of requirements on each that need to be satisfied with technology developed in HTML5 around the media elements.
Described video contains descriptive narration of key visual elements designed to make visual media accessible to people who are blind or visually impaired. The descriptions include actions, costumes, gestures, scene changes, or any other important visual information that someone who cannot see the screen might ordinarily miss. Descriptions are traditionally audio recordings timed and recorded to fit into natural pauses in the program, although they may also briefly obscure the main audio track. (See the section on extended descriptions for an alternative approach.) The descriptions are usually read by a narrator with a voice that cannot be easily confused with other voices in the primary audio track. They are authored to convey objective information (e.g., a yellow flower) rather than subjective judgments (e.g., a beautiful flower).
As with captions, descriptions can be open or closed.
Described video provides benefits that reach beyond blind or visually impaired viewers; e.g., it can help students grappling with difficult materials or concepts. Descriptions can be used to give supplemental information about what is on screen: the structure of lengthy mathematical equations or the intricacies of a painting, for example.
Described video is available on some television programs and in many movie theaters in the U.S. and other countries. Regulations in the U.S. and Europe are increasingly focusing on description, especially for television, reflecting its priority with citizens who have visual impairments. The technology needed to deliver and render basic video descriptions is in fact relatively straightforward, being an extension of common audio-processing solutions. Playback products must support multi-audio channels required for description, and any product dealing with broadcast TV content must provide adequate support for descriptions. Descriptions can also provide text that can be indexed and searched.
Systems supporting described video, other than open descriptions, must:
Described video that uses text for the description source rather than a recorded voice creates specific requirements.
Text video descriptions (TVDs) are delivered to the client as text and rendered locally by assistive technology such as a screen reader or a braille device. This can have advantages for screen-reader users who want full control of the preferred voice and speaking rate, or other options to control the speech synthesis.
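For illustration, HTML5 already provides a declarative hook for this: a track element of kind "descriptions" whose cues a screen reader or script can voice. A minimal sketch (file names and labels are hypothetical):

    <video src="lecture.webm" controls>
      <track kind="descriptions" src="descriptions-en.vtt"
             srclang="en" label="English text video descriptions">
    </video>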
Text video descriptions are provided as text files containing start times for each description cue. Since the duration that a screen reader takes to read out a description cannot be determined during authoring of the cues, it is difficult to ensure they don't obscure the main audio or other description cues. This is likely to happen for at least three reasons:
People with low vision may also benefit from having access to text video descriptions.
Systems supporting text video descriptions must:
Video descriptions are usually provided as recorded speech, timed to play in the natural pauses in dialog or narration. In some types of material, however, there is not enough time to present sufficient descriptions. To meet such cases, the concept of extended description was developed. Extended descriptions work by pausing the video and program audio at key moments, playing a longer description than would normally be permitted, and then resuming playback when the description is finished playing. This will naturally extend the timeline of the entire presentation. This procedure has not been possible in broadcast television; however, hard-disk recording and on-demand Internet systems can make this a practical possibility.
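A minimal script sketch of this pause-and-resume behavior, assuming the descriptions arrive as a text track and using the browser's speech synthesis API (element wiring and track index are illustrative):

    const video = document.getElementById('video');
    const track = video.textTracks[0];      // assumed kind="descriptions" track
    track.mode = 'hidden';                  // fire cue events without rendering on screen

    track.addEventListener('cuechange', () => {
      const cue = track.activeCues[0];
      if (!cue) return;                     // a cue just ended; nothing to speak
      video.pause();                        // halt program audio and video
      const utterance = new SpeechSynthesisUtterance(cue.text);
      utterance.onend = () => video.play(); // resume once the description finishes
      speechSynthesis.speak(utterance);
    });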
Extended video description (EVD) has been reported to have benefits for people with cognitive disabilities; for example, it might benefit people with Asperger Syndrome and other Autistic Spectrum Disorders, in that it can make connections between cause and effect, point out what is important to look at, or explain moods that might otherwise be missed.
Systems supporting extended audio descriptions must:
Because the user is the ultimate arbiter of the rate at which TTS playback occurs, it is not feasible for an author to guarantee that any texted audio description can be played within the natural pauses in dialog or narration of the primary audio resource. Therefore, all texted descriptions must be treated as extended text descriptions, potentially requiring the pausing and resumption of primary resource playback.
A relatively recent development in television accessibility is the concept of clean audio, which takes advantage of the increased adoption of multichannel audio. This is primarily aimed at audiences who are hard of hearing, and consists of isolating the audio channel containing the spoken dialog and important non-speech information that can then be amplified or otherwise modified, while other channels containing music or ambient sounds are attenuated.
Using the isolated audio track may make it possible to apply more sophisticated audio processing such as pre-emphasis filters, pitch-shifting, and so on to tailor the audio to the user's needs, since hearing loss is typically frequency-dependent, and the user may have usable hearing in some bands yet none at all in others.
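As a sketch of such processing, assuming the dialog and ambience are delivered as two separate audio elements (IDs and parameter values are illustrative), a Web Audio graph can boost speech, apply a simple high-shelf pre-emphasis, and attenuate everything else:

    const ctx = new AudioContext();
    const dialog = ctx.createMediaElementSource(document.getElementById('dialog-track'));
    const ambience = ctx.createMediaElementSource(document.getElementById('ambience-track'));

    const dialogGain = ctx.createGain();
    dialogGain.gain.value = 1.5;            // amplify spoken dialog

    const emphasis = ctx.createBiquadFilter();
    emphasis.type = 'highshelf';            // crude pre-emphasis for high-frequency hearing loss
    emphasis.frequency.value = 2000;
    emphasis.gain.value = 6;

    const ambienceGain = ctx.createGain();
    ambienceGain.gain.value = 0.2;          // attenuate music and ambient sound

    dialog.connect(dialogGain);
    dialogGain.connect(emphasis);
    emphasis.connect(ctx.destination);
    ambience.connect(ambienceGain);
    ambienceGain.connect(ctx.destination);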
Systems supporting clean audio and multiple audio tracks must:
For people who are deaf or hard-of-hearing, captioning is a prime alternative representation of audio. Captions are in the same language as the main audio track and, in contrast to foreign-language subtitles, render a transcription of dialog or narration as well as important non-speech information, such as sound effects, music, and laughter. Historically, captions have been either closed or open. Closed captions have been transmitted as data along with the video but were not visible until the user elected to turn them on, usually by invoking an on-screen control or menu selection. Open captions have always been visible; they had been merged with the video track and could not be turned off.
Ideally, captions should be a verbatim representation of the audio; however, captions are sometimes edited for various reasons— for example, for reading speed or for language level. In general, consumers of captions have expressed that the text should represent exactly what is in the audio track. If edited captions are provided, then they should be clearly marked as such, and the full verbatim version should also be available as an option.
The timing of caption text can coincide with the mouth movement of the speaker (where visible), but this is not strictly necessary. For timing purposes, captions may sometimes precede or extend slightly after the audio they represent. Captioning should also use adequate means to distinguish between speakers as turn-taking occurs during conversation; this has in the past been done by positioning the text near the speaker, by associating different colors to different speakers, or by putting the name and a colon in front of the text line of a speaker.
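As one concrete illustration, the WebVTT format expresses speaker turns with voice tags and carries non-speech information in the cue text (speakers and timings here are invented):

    WEBVTT

    00:00:01.000 --> 00:00:04.000
    <v Anna>How did the recording session go?

    00:00:04.200 --> 00:00:07.500
    <v Ben>Better than we expected.
    [door slams]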
Captions are useful to a wide array of users in addition to their originally intended audiences. Gyms, bars, and restaurants regularly employ captions as a way for patrons to watch television while in those establishments. People learning to read, or learning the language of the country where they live as a second language, also benefit from captions: research has shown that captions help reinforce vocabulary and language. Captions can also provide a powerful search capability, allowing users and search engines to search the caption text to locate a specific video or an exact point in a video.
Formats for captions, subtitles or foreign-language subtitles must:
Most of the time, the main audio track would be the best candidate for the timebase. Where a video without audio, but with a text track, is available, the video track becomes the timebase master. Also, there may be situations where an explicit timing track is available.
This should be possible both within media resources and caption formats.
This means that caption cues should be able to either let the start time of the subsequent cue be determined by the duration of the cue or have the end time be implied by the start of the next cue. For overlapping captions, explicit start and end times are then required.
This means that the character encoding of a resource must be determinable, either by making the character encoding explicit or by enforcing a single default such as UTF-8.
The minimum requirement is a bounding box (with an optional background) into which text is flowed, and that probably needs to be pixel aligned. The absolute position of text within the bounding box is less critical, although it is important to be able to avoid bad word-breaks and have adequate white space around letters and so on. There is more on this in a separate requirement.
The caption format could provide a min-width/min-height for its bounding box, which typically is calculated from the bottom of the video viewport, but can be placed elsewhere by the web page, with the web page being able to make that box larger and scale the text relatively, too. The positions inside the box should probably be mapped to regions, such as top, right, bottom, left, and center.
This typically relates to multiple text cues that are defined over overlapping times. If the cues' rendering targets are mapped to different spatial regions, they can be displayed simultaneously.
Internationalization is important not just for subtitles, as captions can be used in all languages.
The user should have final control over rendering styles like color and fonts; e.g., through user preferences.
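Where captions are delivered as WebVTT in a browser, for example, a user style sheet can override the author's cue styling through the ::cue pseudo-element (the property values below are arbitrary):

    video::cue {
      color: yellow;
      background-color: rgba(0, 0, 0, 0.85);
      font-family: sans-serif;
      font-size: 120%;
    }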
It may be technically possible to have cues without text.
Similarly, in karaoke, individual characters are often "painted on".
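WebVTT, for instance, supports this paint-on effect with inline cue timestamps (timings invented):

    00:00:05.000 --> 00:00:08.000
    Row, <00:00:06.000>row, <00:00:07.000>row your boat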
Caption/subtitle files that are alternatives in different languages are probably best provided in different caption resources and are user selectable. Realistically, having no more than 2 languages present at the same time on screen is probably the limit.
Italics markup may be sufficient for a human user, but it is important to be able to mark up languages so that the text can be rendered correctly, since the same Unicode characters can be shared between languages and rendered differently in different contexts. This is mainly an internationalization/localization issue. It is also important for audio rendering, to get correct pronunciation.
Further, systems that support captions must:
It is desirable to expose the same API to both.
This requires a menu of some sort that displays the available tracks for activation/deactivation.
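A minimal sketch of populating such a menu from the declared text tracks, using the standard TextTrack API (addMenuItem is a hypothetical UI helper):

    const video = document.querySelector('video');
    const tracks = video.textTracks;

    for (let i = 0; i < tracks.length; i++) {
      const track = tracks[i];
      if (track.kind === 'captions' || track.kind === 'subtitles') {
        addMenuItem(track.label, track.language, () => {
          // Activate the chosen track and deactivate the others.
          for (let j = 0; j < tracks.length; j++) {
            tracks[j].mode = (j === i) ? 'showing' : 'disabled';
          }
        });
      }
    }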
Edited and verbatim captions can be provided in two different caption resources. There is a need to expose to the user how they differ, similar to how there can be caption tracks in different languages.
These different-language "tracks" can be provided in different resources.
Enhanced captions are timed text cues that have been enriched with further information. Examples are glossary definitions for acronyms and other initialisms, foreign terms (for example, Latin), jargon, or descriptions for other difficult language. They may be age-graded, so that multiple caption tracks are supplied, or the glossary function may be added dynamically through machine lookup.
Glossary information can be added in the normal time allotted for the cue (e.g., as a callout or other overlay), or it might take the form of a hyperlink that, when activated, pauses the main content and allows access to more complete explanatory material.
Such extensions can provide important additional information that will enable or improve the understanding of the main content for users of assistive technology. Enhanced text cues will be particularly useful for those with restricted reading skills, for subtitle users, and for caption users. Users may often come across keywords in text cues that lend themselves to further in-depth information or hyperlinks, such as an e-mail contact or phone number for a person, a strange term that needs a Wikipedia link to a definition, or an idiom that needs comments to explain it to a foreign-language speaker.
Systems that support enhanced captions must:
Such "metadata" markup can be realized through a @title attribute on a <span> of the text, or a hyperlink to another location where a term is explained, an <abbr> element, an <acronym> element, a <dfn> element, or through RDFa or microdata.
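For instance, a cue rendered as HTML might carry any of the following (the content is invented):

    <span title="National Aeronautics and Space Administration">NASA</span>
    <abbr title="as soon as possible">ASAP</abbr>
    <a href="https://en.wikipedia.org/wiki/Habeas_corpus">habeas corpus</a>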
This can be realized through inclusion of links or buttons in timed text cues, where additional overlays could be created or a different page loaded. One needs to deal here with the need to pause the media timeline for reading of the additional information.
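A sketch of that pausing behavior for a script-rendered caption overlay (element IDs are illustrative, and showGlossaryEntry is a hypothetical helper; native cue rendering does not currently expose such activation events):

    const video = document.getElementById('video');
    const overlay = document.getElementById('caption-overlay');

    overlay.addEventListener('click', (event) => {
      const link = event.target.closest('a');
      if (link) {
        event.preventDefault();   // stay on the media page
        video.pause();            // give the user time to read
        showGlossaryEntry(link.href, () => video.play());
      }
    });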
This feature is analogous to extended video descriptions - where timing for a text cue is longer than the available time for the cue, it may be necessary to halt the media to allow for more time to read back on the text and its additional material. In this case, the pause is dependent on the user's reading speed, so this may imply user control or timeouts.
This can be a setting in the UA, which will define user-interface behavior.
Sign language shares the same concept as captioning: it presents both speech and non-speech information in an alternative format. Note that due to the wide regional variation in signing systems (e.g., American Sign Language vs British Sign Language), sign translation may not be appropriate for content with a global audience unless localized variants can be made available.
Signing can be open, mixed with the video; offered as an entirely alternative stream; or closed (using some form of picture-in-picture or alpha-blending technology). It is possible to use quite low bit rates for much of the signing track, but it is important that facial, arm, hand, and other body gestures be delivered at sufficient resolution to support legibility. Animated avatars may not currently be sufficient as a substitute for human signers, although research continues in this area and it may become practical at some point in the future.
Acknowledging that not all devices will be capable of handling multiple video streams, this is a SHOULD requirement for browsers where hardware is capable of support. Strong authoring guidance for content creators will mitigate situations where user agents are unable to support multiple video streams (WCAG); for example, on mobile devices that cannot support multiple streams, authors should be encouraged to offer two versions of the media stream, including one with signed captions burned into the media.
Selecting from multiple tracks for different sign languages should be achieved in the same fashion that multiple caption/subtitle files are handled.
Systems supporting sign language must:
While synchronized captions are generally preferable for people with hearing impairments, for some users they are not viable: those who are deaf-blind, for example, or those with cognitive or reading impairments that make it impossible to follow synchronized captions. And even with ordinary captions, it is possible to miss some information, as the captions and the video require two separate loci of attention. The full transcript supports different user needs and is not a replacement for captioning. A transcript can be presented simultaneously with the media material, which can assist slower readers or those who need more time to reference context, but it should also be made available independently of the media.
A full text transcript should include information that would be in both the caption and video description, so that it is a complete representation of the material, as well as containing any interactive options.
Systems supporting transcripts must:
While all devices may not support the capability, a standard control API must support the ability to speed up or slow down content presentation without altering audio pitch.
While perhaps unfamiliar to some, this feature has been present on many devices, especially audiobook players, for some 20 years now.
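In HTML5 this maps onto the media element's playbackRate together with pitch preservation; a short sketch (note that support for preservesPitch varies, and older engines shipped vendor-prefixed forms):

    const video = document.querySelector('video');
    video.preservesPitch = true;  // keep audio pitch constant as the rate changes
    video.playbackRate = 0.75;    // 75% of normal speed; 1.0 is normal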
The user can adjust the playback rate of prerecorded time-based media content, such that all of the following are true (UAAG 2.0 2.11.4):
One of the biggest challenges to date has been the lack of a universal system for media access. In response to user requirements, various countries and groups have defined systems to provide accessibility, especially captioning for television. However, these systems are typically not compatible. In some cases the formats can be inter-converted, but some formats (for example, DVD sub-pictures) are image-based and are difficult to convert to text.
Caption formats are often geared towards delivery of the media, for example as part of a television broadcast. They are not well suited to the production phases of media creation. Media creators have developed their own internal formats which are more amenable to the editing phase, but to date there has been no common format that allows interchange of this data.
Any media-based solution should attempt to reduce, as far as possible, the layers of translation between production and delivery.
In general, captioners use a proprietary workstation to prepare caption files; these can often export to various standard broadcast ingest formats, but in general files are not inter-convertible. Most video editing suites are not set up to preserve captioning, and so this typically has to be added after the final edit is decided on; furthermore, since this work is often outsourced, the copyright holder may not hold the final editable version of the captions. Thus when programming is later re-purposed, e.g., a shorter edit is made or a 'director's cut' produced, the captioning may have to be redone in its entirety. Similarly, and particularly for news footage, parts of the media may go to the web before the final TV edit is made, and thus the captions that are produced for the final TV edit are not available for the web version.
It is important, when purchasing or commissioning media, that captioning and described video be taken into account and given equal priority, in terms of ownership, rights of use, etc., as the video and audio itself.
This is primarily an authoring requirement. It is understood that a common time-stamp format must be declared in HTML5, so that authoring tools can conform to a required output.
Systems supporting accessibility needs for media must:
As described above, individuals need a variety of media (alternative content) in order to perceive and understand the content. The author or some web mechanism provides the alternative content. This alternative content may be part of the original content, embedded within the media container as 'fallback content', or linked from the original content. The user is faced with discovering the availability of alternative content.
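One way authors support discovery is to declare the alternative content explicitly, as in this HTML5 sketch (file names are illustrative); the user agent can then expose the labeled tracks through its menus and APIs:

    <video controls>
      <source src="movie.webm" type="video/webm">
      <track kind="captions" src="captions-en.vtt" srclang="en" label="English captions">
      <track kind="descriptions" src="descriptions-en.vtt" srclang="en" label="English descriptions">
      <track kind="subtitles" src="subtitles-fr.vtt" srclang="fr" label="Français">
    </video>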
Alternative content must be both discoverable by the user, and accessible in device agnostic ways. The development of APIs and user-agent controls should adhere to the following UAAG guidance:
The user agent can facilitate the discovery of alternative content by following these criteria:
This feature can be user configurable to allow maximum flexibility in trading off the anticipated future need for the description against the amount of extra data storage required. A flexible solution giving maximum control to the user would be to provide a global setting with the following options:
Often forgotten in media systems, especially with the newer forms of packaging such as DVD menus and on-screen program guides, is the fact that the user needs to actually get to the content, control its playback, and turn on any required accessibility options. For user agents supporting accessibility APIs implemented for a platform, any media controls need to be connected to that API.
On self-contained products that do not support assistive technology, any menus in the content need to provide information in alternative formats (e.g., talking menus). Products with a separate remote control, or that are self-contained boxes, should ensure the physical design does not block access, and should make accessibility controls, such as the closed-caption toggle, as prominent as the volume or channel controls.
The video viewport plays a particularly important role with respect to alternative-content technologies. Mostly it provides a bounding box for many of the visually represented alternative-content technologies (e.g., captions, hierarchical navigation points, sign language), although some alternative content does not rely on a viewport (e.g., full transcripts, descriptive video).
One key principle to remember when designing player ‘skins’ is that the lower-third of the video may be needed for caption text. Caption consumers rely on being able to make fast eye movements between the captions and the video content. If the captions are in a non-standard place, this may cause viewers to miss information. The use of this area for things such as transport controls, while appealing aesthetically, may lead to accessibility conflicts.
If alternative content has a different height or width than the media content, then the user agent will reflow the (HTML) viewport (UAAG 2.0 1.8.7). This may create a need to provide an author hint to the web page when embedding alternative content, in order to instruct the web page how to render the content: to scale with the media resource, scale independently, or provide a position hint in relation to the media. On small devices where the video takes up the full viewport, only limited rendering choices may be possible, such that the UA may need to override author preferences.
This should be achievable through UA configuration or even through something like a greasemonkey script or user CSS which can override styles dynamically in the browser.
This can be achieved by simply zooming into the web page, which will automatically rescale the layout and reflow the content. This is a user-agent device requirement and should already be addressed in the UAAG. In live content, it may even be possible to adjust camera settings to achieve this requirement. It is also a "SHOULD"-level requirement, since it does not account for limitations of various devices.
If there are several types of overlapping overlays, the controls should stay on the bottom edge of the viewport and the others should be moved above this area, all stacked above each other.
Multiple secondary user devices must be directly addressable. This functionality is increasingly also known by the new term, "Second Screen," even though there may be more than two screens in any given viewing environment, and even though not all secondary devices are video displays. It must be assumed that many users will have at least one additional display device (such as a tablet), and/or at least one additional audio output device (such as a Bluetooth headset) attached to a primary video display device, an individual computer, or locally addressable on a LAN. It must be possible to configure certain types of media for presentation on specific devices, and these configuration settings must be readily overridable on a case-by-case basis by users. (A request to the UAAG for clarifications on a number of these points was made, and a detailed response was provided. The response requires review and integration into this document, but can be found today in the 22 July 2010 message on this topic.) Systems supporting multiple secondary devices for accessibility must:
The following people contributed to the development of this document.
Jim Allan (TSB), Kazuyuki Ashimura (W3C), Simon Bates, Chris Blouch (AOL), Judy Brewer (W3C/MIT), Ben Caldwell (Trace), Charles Chen (Google, Inc.), Christian Cohrs, Dimitar Denev (Fraunhofer Gesellschaft), Donald Evans (AOL), Geoff Freed (Invited Expert, NCAM), Kentarou Fukuda (IBM Corporation), Becky Gibson (IBM), Alfred S. Gilman, Andres Gonzalez (Adobe Systems Inc.), Georgios Grigoriadis (SAP AG), Jeff Grimes (Oracle), Barbara Hartel, John Hrvatin (Microsoft Corporation), Masahiko Kaneko (Microsoft Corporation), Earl Johnson (Sun), Jael Kurz, Diego La Monica (International Webmasters Association / HTML Writers Guild (IWA-HWG)), Gez Lemon (International Webmasters Association / HTML Writers Guild (IWA-HWG)), Aaron Leventhal (IBM Corporation), Alex Li (SAP), Thomas Logan (HiSoftware Inc.), William Loughborough (Invited Expert), Linda Mao (Microsoft), Anders Markussen (Opera Software), Matthew May (Adobe Systems Inc.), Joshue O Connor (Invited Expert), Artur Ortega (Yahoo!, Inc.), Lisa Pappas (Society for Technical Communication (STC)), Dave Pawson (RNIB), David Poehlman, Simon Pieters (Opera Software), Sarah Pulis (Media Access Australia), T.V. Raman (Google, Inc.), Jan Richards (IDRC), Gregory Rosmaita (Invited Expert), Tony Ross (Microsoft Corporation), Martin Schaus (SAP AG), Marc Silbey (Microsoft Corporation), Henri Sivonen (Mozilla), Andi Snow-Weaver (IBM Corporation), Henny Swan (Opera Software), Vitaly Sourikov, Mike Squillace (IBM), Gregg Vanderheiden (Invited Expert, Trace), Ryan Williams (Oracle), Tom Wlodkowski.
This publication has been funded in part with Federal funds from the U.S. Department of Education, National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR) under contract number ED-OSE-10-C-0067. The content of this publication does not necessarily reflect the views or policies of the U.S. Department of Education, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.