W3C

DRAFT: Accessibility Features of SMIL

W3C Draft NOTE 12 July 1999

This Version:
http://www.w3.org/WAI/EO/NOTE-smil-access-19990712
Latest Version:
http://www.w3.org/WAI/EO/NOTE-smil-access
Editors:
Marja-Riitta Koivunen (mrk@w3.org)
Ian Jacobs (ij@w3.org)

Abstract

This document summarizes the accessibility features of the Synchronized Multimedia Integration Language (SMIL), version 1.0 Recommendation ([SMIL10]). This document has been written so that other documents may refer in a consistent manner to the accessibility features of SMIL.

Status of this document

This document is a draft W3C Note made available by the W3C and the W3C Web Accessibility Initiative. This NOTE has not yet been jointly approved by the WAI Education and Outreach Working Group (EOWG), the WAI Protocols and Formats Working Group (PFWG), and the Synchronized Multimedia (SYMM) Working Group.

Publication of a W3C Note does not imply endorsement by the W3C Membership. A list of current W3C technical reports and publications, including working drafts and notes, can be found at http://www.w3.org/TR.

1 Introduction to Accessible Multimedia

Guidelines for creating accessible synchronized multimedia presentations are very similar to those for other Web documents, such as HTML documents [HTML40]. They explain to authors how to create documents that may be used by people who cannot see, hear, or move, or who may not be able to process some types of information easily or at all. They explain how to design documents that work with a variety of input devices, such as pointing devices, keyboards, head wands, or speech input. The W3C Recommendation "Web Content Accessibility Guidelines" [WAI-WEBCONTENT] explains to authors how to create accessible documents that are rich in (synchronized) multimedia content. For instance, users who are blind or have low vision may not be able to use the visual part of a presentation (video, images, graphics, etc.), so authors must provide equivalent information in another format such as text or audio. Similarly, users who are deaf or hard of hearing may not be able to use the audio part of a presentation (sound track, sound cues, etc.) and require text or visual equivalents. Documents that satisfy the "Web Content Accessibility Guidelines" will be accessible to users with disabilities, will render on a variety of browsing devices, and will benefit the Web community as a whole.

Some issues discussed in the "Web Content Accessibility Guidelines" arise specifically in the context of synchronized multimedia presentations; these include providing equivalent alternatives for audio and video tracks, giving users control of layout and style, and adapting content to user and system settings, the topics of the sections that follow.

As for HTML documents, part of the responsibility for making SMIL 1.0 documents accessible lies with the author and part with the user's software, the SMIL player. Authors must include equivalent alternatives for images, video, audio, etc. They must recognize that, to ensure accessibility, users must be able to control document style and layout. They must synchronize tracks correctly, describe relationships between tracks, provide useful default behavior, mark up the natural language of content, etc.

In turn, SMIL players must allow users to control document style and layout (e.g., to control font size) and to choose from alternatives provided by the author. Users must be able to speed up, slow down, or pause a presentation (as one can do with most home video players). Users must also be able to turn alternatives on and off and control their size, position, volume, etc. Users might also want to define the presentation details for simultaneous audio tracks, for instance by changing the voice of the audio description to a male voice when the dialog contains female voices. Users with some disabilities may require that time-sensitive information be rendered in a time-independent form. For instance, scenes that change at a rate specified by the author may change too quickly for some users; rendering the presentation as linked scenes allows the user to decide when to proceed. Multimedia players can also offer an index to time-dependent information in a time-independent form. For more information about accessible multimedia players, please consult the W3C "User Agent Accessibility Guidelines" [WAI-USERAGENT].

This Note describes the accessibility features of [SMIL10] and explains how authors and SMIL players should make use of them. The features are discussed in the following sections:

Section 2: Equivalent Alternatives
Section 3: Layout and Style
Section 4: Adapting Content to User and System Settings

2 Equivalent Alternatives

Multimedia documents have two main types of equivalent alternatives: discrete equivalents and stream equivalents. Discrete equivalents contain no time references and have no intrinsic duration (unless they are part of synchronization elements, such as the par element for parallel presentation or the seq element for sequential presentation). The most common discrete equivalents in SMIL are text equivalents specified by attributes such as the alt attribute of the img element.

Stream equivalents, such as text captions or auditory descriptions, have intrinsic duration and may contain references to time. For instance, a text stream equivalent consists of pieces of text associated with a time code.
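
For illustration only, here is a sketch of what such a text stream might contain. It is loosely modeled on the RealText format (the ".rtx" files used in examples below); the element names, time codes, and caption text here are hypothetical, and actual formats vary by player.

 <window duration="10">
   <time begin="1"/> Announcer: Online shopping continues to grow.
   <time begin="6"/> [sound of cash register]
 </window>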

As explained in [WAI-WEBCONTENT], text equivalents are fundamental to accessibility since they may be rendered visually, as speech, or by a braille device. Authors do not require special software to produce text. However, in multimedia documents text stream equivalents must be synchronized with other time-dependent media. We recommend embedding time codes in text streams in these cases. When resources allow, authors are encouraged to provide audio and video equivalents for the benefit of users who may not be able to read text or may not have access to software or hardware for speech synthesis or braille.

The following sections describe in more detail the SMIL features for specifying discrete and stream equivalents for video, audio, text, and other SMIL elements.

2.1 Discrete Equivalents

Authors specify discrete text equivalents for SMIL elements through the following attributes.

alt
For media object elements (img, video, audio, textstream, etc.). Specifies a short text equivalent that conveys the function of the media object element. Alternative text may be rendered instead of the media content, for instance when images or sound are turned off or not supported by the player.
longdesc
For media objects. Specifies a link to a longer, more complete description of the media content that supplements the description provided by the alt attribute. Authors should provide long descriptions of complex content, such as charts and graphs. When the media object has associated anchors, the long description should also provide information about the anchors' contents.
title
For media objects, linking elements (a and anchor), and most other SMIL 1.0 elements. Offers advisory information about the element.
abstract
For media objects, par, and seq. Summarizes the content of the element.
author
For media objects, par, and seq. Names the author of the content.

The following example is a simple definition of a video element using alt, title, and abstract attributes to specify a discrete text equivalent. In addition, it uses the longdesc attribute to link to a longer description of the video and its associated anchors.

<video src="rtsp://foo.com/graph.imf"
       title="Web Trends: Graph 1"
       alt="Web trends: increase in the number of online stores
            and consumers, but decrease in privacy."
       abstract="The number of Web users, online stores, and      
                 the influence of Web communities are 
                 all steadily increasing while privacy for
                 Web users is slowly diminishing."
       longdesc="http://foo.com/graph-description.htm"/> 

The next example shows how to use attributes in conjunction with hyperlinking elements a and anchor. Presentation A is a video interview of Joe and Tim, in that order. Anchors have been specified so that people may link directly to either Joe's or Tim's portion of the interview. The title attributes give information about the portions defined by the anchor elements. A link in presentation B makes use of those anchors; when the graph video containing the link is selected, Tim's portion of the interview will be presented.

Presentation A:

<video src="http://www.w3.org/BBC" title="BBC interview"
  alt="Joe's and Tim's interview for BBC"
  abstract="BBC interviews Joe and Tim about the Future of the Web">
   <anchor id="joe" begin="0s" end="5s"
    title="Joe's interview on Web trends"/>
   <anchor id="tim" begin="5s" end="10s"
    title="Tim's interview on Web trends"/>
 </video>

Presentation B:

<a href="http://www.hut.fi/presentationB#tim"
  title="Tim's interview on Web trends" >
    <video region="win1" id="graph"
     src="rtsp://foo.com/graph.imf"
     alt="Dynamic graphs of Web trends"/>
 </a>

2.2 Stream Equivalents

Two stream equivalent formats that promote accessibility are captions and audio descriptions. A caption is a text transcript of spoken words and non-spoken sound effects that is synchronized with an audio stream. Captions benefit people who are deaf or hard of hearing. They also benefit anyone in a setting where audio tracks would cause a disturbance, anyone whom ambient noise prevents from hearing the audio track, and anyone who has difficulty understanding spoken language.

An audio description is a recorded or synthesized voice that describes key visual elements of the presentation, including information about actions, body language, graphics, and scene changes. Like captions, audio descriptions are synchronized with the original audio stream, generally during natural pauses in the sound track. Synchronizing long audio descriptions may also affect the timing of the original audio and video tracks, since natural pauses may not be long enough to include them. Audio descriptions benefit people who are blind or have low vision. They also benefit anyone in an eyes-busy setting.

Below we discuss how to associate captions and audio descriptions with multimedia presentations in SMIL 1.0 such that users may control the presentation of the alternative stream. We also examine how SMIL 1.0 supports multilingual presentations and how this affects stream equivalents for accessibility.

Note. The SMIL 1.0 specification explains how to synchronize events in one or more text streams with events in other tracks. The examples in the following sections do not include explicit information about synchronization.
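
As an aside, some synchronization can also be expressed in SMIL 1.0 markup itself: the begin attribute, defined for children of a par element, offsets when a track starts. In the following minimal sketch (file names are hypothetical), the caption stream starts two seconds after the audio and video tracks:

 <par>
   <audio      src="audio.rm"/>
   <video      src="video.rm"/>
   <textstream src="closed-caps.rtx" begin="2s"/>
 </par>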

2.2.1 Captions

In SMIL 1.0, captions may be included in a presentation with the textstream element. The following example includes a caption in addition to the audio and video tracks.

 <par>
   <audio      src="audio.rm"/>
   <video      src="video.rm"/>
   <textstream src="closed-caps.rtx">
 </par>

The limitation of the previous example is that the user cannot easily turn the caption on or off. Style sheets (in conjunction with markup such as an "id" attribute) may be used to hide the text stream, but only for SMIL 1.0 players that support the particular style sheet language. Section 3.1 includes an example that illustrates how users may turn captions on and off through style sheets.

Since user control of presentation is vital to accessibility, SMIL 1.0 allows authors to create documents whose behavior varies depending on how the user has configured the player. When a SMIL element such as textstream has the system-captions test attribute with value "on" and the user has configured the player to support captions, the element may be rendered. Whether the element is actually rendered depends on other markup in the document (such as language support).

The following example is a TV news presentation that consists of four media object elements: a video track that shows the news announcer, an audio track containing her voice, and two text streams containing a stream of stock values and captions. All four elements are played in parallel because they are grouped by a par element. The caption will only be rendered if the user has configured the player to support captions.

 <par>
   <audio      src="audio.rm"/>
   <video      src="video.rm"/>
   <textstream src="stockticker.rtx"/>
   <textstream src="closed-caps.rtx"
    system-captions="on"/>
 </par>

The system-captions attribute can be used with elements other than textstream. Like the other SMIL test attributes (refer to [SMIL10], section 4.4), system-captions acts like a boolean flag that returns "true" or "false" according to the player configuration. Section 3.1 illustrates how system-captions can be used to specify different presentation layouts according to whether the user has configured the SMIL player to support captions.

Note. Authors should use system-captions="on" only for caption content, and system-captions="off" only for caption-related effects such as layout changes.

2.2.2 Audio Descriptions

In SMIL 1.0, audio descriptions may be included in a presentation with the audio element. However, SMIL 1.0 does not provide a mechanism (other than through style sheets) that allows users to turn player support for audio descriptions on or off. Section 3.1 includes an example that illustrates how users may turn audio descriptions on and off through style sheets.
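
For instance, the audio description can be declared as an additional audio element played in parallel with the main tracks (a minimal sketch; the file names are hypothetical):

 <par>
   <video src="video.rm"/>
   <audio src="audio.rm"/>
   <audio src="audio-desc.rm"/>
 </par>

Without further markup, the description track always plays; the style sheet technique of Section 3.1 is needed to let users turn it off.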

2.2.3 Multilingual presentations and stream equivalents

SMIL 1.0 allows authors to create multilingual presentations with subtitles (which are text streams) and overdubs (which are audio streams). Since these text and audio streams may co-exist with text and audio streams provided for accessibility, authors of accessible multilingual presentations should be aware of how they interact. For instance, captions and subtitles should be laid out so that they do not overlap on the screen. Audio tracks should not overlap unless carefully synchronized.

In SMIL 1.0, the system-overdub-or-caption test attribute allows users to select (through the player's user interface) whether they would rather have the player render overdubs or subtitles. Note. The term "caption" in "system-overdub-or-caption" does not refer to accessibility captions. Authors must not use this attribute to create accessibility captions.

In the following example, the TV news presentation is offered in both Spanish and English. If the user has configured the player to support both Spanish and overdubs, the Spanish audio track will be rendered. Otherwise, the second audio track in the switch element (the English audio track) will be rendered.

<par>
   <switch> <!-- audio -->
     <audio src="audio-es.rm"
      system-overdub-or-caption="overdub"
      system-language="es"/>
     <audio src="audio.rm"/>
   </switch>
   <video src="video.rm"/>
   <textstream src="stockticker.rtx"/>
   <textstream src="closed-caps.rtx"
               system-captions="on"/>
 </par>

To add Spanish subtitles to the example, we add a second textstream element. The first textstream element uses test attributes so that it will be rendered when the user expresses a preference for subtitles and Spanish. The second text stream will be rendered when the user has configured the player to support accessibility captions.

<par>
   <!-- audio section same as before -->
   <video src="video.rm"/>
   <textstream src="stockticker.rtx"/>
   <switch> <!-- captions or subtitles -->
     <textstream src="closed-caps-es.rtx"
      system-overdub-or-caption="caption"
      system-language="es"/>
     <textstream src="closed-caps.rtx"
      system-captions="on"/>
   </switch>
 </par>

Authors may not need to specify both subtitles and accessibility captions in the same language for the same presentation, since the two will be very similar. Captions are preferred since they include text descriptions of actions, sounds, etc. in addition to dialog. If captions are reused as subtitles, authors should take care that the text appears only once, either as a caption or as a subtitle. Below, we modify the previous example to include Spanish accessibility captions (the first textstream). Since this element is within a switch element, if it is selected, the two that follow are not rendered. If the user has not configured the player for captions but has configured it for subtitles, only the second textstream (the same source as the caption) is rendered. Finally, if the user has configured the player to support captions and a language other than Spanish, the third textstream is rendered.

<par>
   <!-- audio section same as before -->
   <video src="video.rm"/>
   <textstream src="stockticker.rtx"/>
   <switch> <!-- captions or subtitles -->
     <textstream src="closed-caps-es.rtx"
      system-captions="on"
      system-language="es"/>
     <textstream src="closed-caps-es.rtx"
      system-overdub-or-caption="caption"
      system-language="es"/>
     <textstream src="closed-caps.rtx"
      system-captions="on"/>
   </switch>
 </par>

Note. In SMIL 1.0, values for system-overdub-or-caption only refer to user preferences for either subtitles or overdubs; there are no values for the test attribute that refer to user preferences for neither or both.

3 Layout and Style

3.1 Layout

Authors may specify the visual layout of SMIL 1.0 media objects through SMIL's own layout markup or with a style sheet language such as CSS [CSS1, CSS2]. In both cases, the layout element specifies the presentation information. We recommend style sheets for a number of reasons: they are designed to ensure that the user has final control of the presentation, they may be shared by several documents, and they make document and site management easier. However, not all SMIL players support style sheets. SMIL's layout facilities allow authors to arrange rectangular regions visually (via the region element), much like frames in HTML.

In the following example, the CSS 'display' property is used to hide captions when system-captions="off".


<smil>
  <head>
    <layout type="text/css">
      [system-captions="off"] { display: none }
    </layout>
  </head>
  <body>
    <par>
      <video src="movie-vid.rm"/>
      <textstream src="closed-caps.rtx"/>
    </par>
  </body>
</smil>

Authors may also use style sheets to allow users to turn audio descriptions on and off. However, since SMIL 1.0 does not include a test attribute for audio descriptions, users can only turn them on by overriding the author's style sheet. In this example, the audio description is not played by default. The user could override the value 'none' with another value (e.g., 'block').


<smil>
  <head>
    <layout type="text/css">
      [system-captions="off"] { display: none }
      #audio { display: none }
    </layout>
  </head>
  <body>
    <par>
      <video src="movie-vid.rm"/>
      <audio id="audio" src="audio.rm"/>
      <textstream src="closed-caps.rtx"/>
    </par>
  </body>
</smil>
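
For illustration, in a player that supports CSS user style sheets, the user might turn the description on with a rule such as the following (whether and how a player accepts user style sheets is player-dependent):

 /* hypothetical user style sheet rule: play the audio description */
 #audio { display: block ! important }

Per CSS2, an "! important" declaration in a user style sheet takes precedence over the author's rule.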

The previous caption example does not make ideal use of screen real estate. The following example illustrates how to regain space when captions are turned off or not supported. In this example, the same layout is defined both with SMIL layout markup and with a CSS2 style sheet. Since both appear in a switch element, the SMIL player will use the CSS style sheet if it supports CSS, and the SMIL layout otherwise. Note that the type attribute of the layout element specifies the MIME type of the style sheet language, here "text/css".

The style sheets in this example specify two layouts. When the user has chosen to view captions, they appear in a region (the "captext" region) that takes up 20% of available vertical space below a region for the video presentation (the "capvideo" region), which takes up the other 80%. When the user does not wish to view captions, the video region takes up all available vertical space (the "video" region). The choice of which layout to use depends on the value of the system-captions test attribute.

<smil>
  <head>
    <switch>
      <layout type="text/css">
        { top: 20px; left: 20px }
        [region="video"] {top: 0px; height: 100%}
        [region="capvideo"] {top: 0px; height: 80%}
        [region="captext"] {top: 80%; height: 20%; overflow: scroll}
      </layout>
      <layout>
        <region id="video" top="0" height="100%" fit="meet"/>
        <region id="capvideo" top="0" height="80%" fit="meet"/>
        <region id="captext" top="80%" height="20%" fit="scroll"/>
      </layout>
    </switch>
  </head>
  <body>
    <par>
      <switch> <!-- if captions off use first region, else second -->
        <video region="video" src="movie-vid.rm"
         system-captions="off"/>
        <video region="capvideo" src="movie-vid.rm"/>
      </switch> <!-- if captions on render also captions -->
      <textstream region="captext" src="closed-caps.rtx"
       system-captions="on"/>
    </par>
  </body>
</smil>

3.2 Style

In SMIL 1.0, the only style attribute that can be set is background-color. Without other color definitions, it has little effect on accessibility.
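
For example (a sketch reusing the "captext" region from Section 3.1), background-color may be set on a region in SMIL layout; authors who set it should ensure sufficient contrast between the region's background and any text rendered in it:

 <layout>
   <region id="captext" top="80%" height="20%" fit="scroll"
           background-color="black"/>
 </layout>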

4 Adapting Content to User and System Settings

As mentioned earlier, authors of SMIL 1.0 documents can define alternative presentations based on user or system settings. Authors test these settings through test attributes set on various elements.

Test attributes for captions, overdubs, and language are described in Section 2.2. SMIL 1.0 also includes attributes that test the connection speed and some characteristics of the player. Authors may use these tests to tailor the content or presentation of a document to the user's device or connection. The following SMIL 1.0 test attributes may be used with synchronization elements:

system-captions
Tests support for captions (see Section 2.2).
system-overdub-or-caption
Tests support for overdubs or subtitles (see Section 2.2).
system-language
Tests natural language preferences.
system-bitrate
Tests the minimum approximate bandwidth required to display the element. Authors can use this, for example, to specify that by default high-quality images are not shown over slow connections.
system-screen-depth
Tests the minimum depth of the screen color palette, in bits, required to display the element. This attribute controls the presentation according to the capability of the screen to display images or video at a certain color depth.
system-screen-size
Tests the minimum screen size required to display the element. It can be used to control what is shown to users with a given screen size (an illustrative example follows the bitrate example below).

These attributes may be used, for example, to deliver content more appropriately to various devices and connections: if a connection is slow, the author may specify that images not be downloaded. While these attributes may make some content more accessible, they may also overly constrain what a user can access; for instance, a user may still want to download important images despite a slow connection. Authors should therefore use these attributes conservatively, and players should allow users to override the author's restrictions when necessary.

The following example delivers different qualities of video based on available bandwidth. The player evaluates each of the choices in the switch element in order and chooses the first one whose system-bitrate value is less than or equal to the available bandwidth of the connection between the media player and the media server.

 <switch> <!-- video -->
   <video src="high-quality-movie.rm" system-bitrate="40000">
   <video src="medium-quality-movie.rm" system-bitrate="24000">
   <video src="low-quality-movie.rm" system-bitrate="10000">
 </switch>
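
The screen test attributes work similarly. In the following sketch, the file names and threshold values are illustrative: a full-color chart is delivered only to displays with at least 24-bit color and a screen of at least 768 by 1024 pixels (SMIL 1.0 writes screen size as height, then "X", then width); other users receive a simpler image.

 <switch> <!-- image quality by display capability -->
   <img src="chart-color.png" system-screen-depth="24"
        system-screen-size="768X1024"/>
   <img src="chart-bw.gif"/>
 </switch>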

5 About W3C and WAI

5.1 About the Web Accessibility Initiative

W3C's Web Accessibility Initiative (WAI) addresses accessibility of the Web through five complementary activities that:

  1. Ensure that the technology of the Web supports accessibility
  2. Develop accessibility guidelines
  3. Develop tools to facilitate evaluation and repair of Web sites
  4. Conduct education and outreach
  5. Conduct research and development

WAI's International Program Office enables partnering of industry, disability organizations, accessibility research organizations, and governments interested in creating an accessible Web. WAI sponsors include the US National Science Foundation and Department of Education's National Institute on Disability and Rehabilitation Research; the European Commission's DG XIII Telematics for Disabled and Elderly Programme; IBM, Lotus Development Corporation, and NCR.

Additional information on WAI is available at http://www.w3.org/WAI.

5.2 About the WAI Web Accessibility Guidelines

Web accessibility guidelines are essential for Web site development and for Web-related applications development. WAI is coordinating with many organizations to produce three sets of guidelines:

  1. Web Content Accessibility Guidelines [WAI-WEBCONTENT]
  2. Authoring Tool Accessibility Guidelines [WAI-AUTOOLS]
  3. User Agent Accessibility Guidelines [WAI-USERAGENT]

5.3 About the World Wide Web Consortium (W3C)

The W3C was created to lead the Web to its full potential by developing common protocols that promote its evolution and ensure its interoperability. It is an international industry consortium jointly run by the MIT Laboratory for Computer Science (LCS) in the USA, the National Institute for Research in Computer Science and Control (INRIA) in France and Keio University in Japan. Services provided by the Consortium include: a repository of information about the World Wide Web for developers and users; reference code implementations to embody and promote standards; and various prototype and sample applications to demonstrate use of new technology. To date, more than 320 organizations are Members of the Consortium. For more information about the World Wide Web Consortium, see http://www.w3.org/

Acknowledgements

Many people in W3C and WAI have given valuable comments on this document. I especially want to thank Charles McCathieNevile, Philipp Hoschka, and Ian Jacobs.

References

A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.

[CSS2]
"Cascading Style Sheets, level 2", B. Bos, H. W. Lie, C. Lilley, and I. Jacobs, 17 May 1998.
[CSS1]
"Cascading Style Sheets, level 1", H. W. Lie and B. Bos, 17 December 1996. Revised 11 January 1999.
[HTML40]
"HTML 4.0 Recommendation", D. Raggett, A. Le Hors, and I. Jacobs, eds., 18 December 1997, revised 24 April 1998.
[SMIL10]
"Synchronized Multimedia Integration Language (SMIL) 1.0 Specification", P. Hoschka, ed., 15 June 1998.
[WAI-AUTOOLS]
"Authoring Tool Accessibility Guidelines", J. Treviranus, J. Richards, I. Jacobs, C. McCathieNevile, eds.
[WAI-WEBCONTENT]
"Web Content Accessibility Guidelines",W. Chisholm, G. Vanderheiden, and I. Jacobs, eds., 5 May 1999.
[WAI-USERAGENT]
"User Agent Accessibility Guidelines", J. Gunderson and I. Jacobs, eds.