This Wiki page is edited by participants of the HTML Accessibility Task Force. It does not necessarily represent consensus and it may have incorrect information or information that is not supported by other Task Force participants, WAI, or W3C. It may also have some very useful information.
Media Navigation
Related Bugs and Issues
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12662
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10693
http://www.w3.org/Bugs/Public/show_bug.cgi?id=13184
http://www.w3.org/html/wg/tracker/issues/163
Proposals due: 1st July
http://lists.w3.org/Archives/Public/public-html/2011May/0428.html
Use Case
Media resources are typically large, time-based objects that do not easily expose direct access to their semantic content. Seeking on the transport bar is often the only way to navigate to the location in the timeline that a user believes contains the information they are looking for. This, however, is a very inexact means of navigation, based on guessing rather than on semantic knowledge. It is of particularly little use to blind users, who cannot even gain the small insights that the timeline exposes.
We know from DVDs that direct access to chapters and subchapters in videos is a more successful and accurate means of navigation.
The DAISY standard provides a similarly accurate and useful means of navigation for blind users. Its section markers let them gain an overview of the media resource's content, allowing a type of "speed reading".
Just as the structure found particularly in nonfiction titles makes books more usable, media is more usable when its inherent semantic structure is exposed. Direct access to semantic structure is critical for persons with disabilities who cannot infer structure from purely presentational cues.
Requirements
HTML5 has introduced the notion of "chapter tracks" to satisfy users' navigational needs for media resources. As the specification stands right now, chapter tracks satisfy the DVD use case: a timeline is broken into linearly successive sections without any further subdivision.
However, to replicate the flexibility of the DAISY standard, we need to introduce several levels of hierarchical navigation. While DAISY supports an unlimited number of hierarchically organised navigation levels, a maximum of 6 levels has been seen in the wild and 3-4 levels are typical.
An example with multiple levels is a reading of the Bible with the following levels:
- h1: testaments (old/new)
- h2: books inside the testaments
- h3: chapters inside the books
- h4: verses inside chapters
- h5: phrases inside verses
- h6: words inside phrases
DAISY devices provide the following keyboard controls to support the navigation:
- up arrow: move up a hierarchical level
- down arrow: drill down into a hierarchical level
- left/right arrow: move between entities of a single hierarchical level
- enter: select to execute the navigation
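The key bindings above can be sketched as a cursor moving over a chapter tree. This is a minimal TypeScript illustration, not how DAISY devices or any media player actually implement it; `NavNode`, `NavCursor` and the start times in the sample data are hypothetical.

```typescript
// Hypothetical chapter-tree node (not from any spec); times are in seconds.
interface NavNode {
  label: string;
  start: number;
  children: NavNode[];
}

// Hypothetical cursor mimicking the DAISY-style keys: up = parent level,
// down = first child, left/right = previous/next entry on the same level,
// enter = seek target. Assumes at least one top-level entry.
class NavCursor {
  // Index of the selected entry at each level, outermost first.
  private path: number[] = [0];

  constructor(private readonly roots: NavNode[]) {}

  // Entries on the cursor's current level.
  private siblings(): NavNode[] {
    let list = this.roots;
    for (let i = 0; i < this.path.length - 1; i++) {
      list = list[this.path[i]].children;
    }
    return list;
  }

  current(): NavNode {
    return this.siblings()[this.path[this.path.length - 1]];
  }

  up(): void {
    if (this.path.length > 1) this.path.pop();
  }

  down(): void {
    if (this.current().children.length > 0) this.path.push(0);
  }

  left(): void {
    const last = this.path.length - 1;
    if (this.path[last] > 0) this.path[last] -= 1;
  }

  right(): void {
    const last = this.path.length - 1;
    if (this.path[last] < this.siblings().length - 1) this.path[last] += 1;
  }

  // "enter": the media time a player would seek to.
  enter(): number {
    return this.current().start;
  }
}

// Two levels of the Bible example; the start times are made up for illustration.
const bible: NavNode[] = [
  {
    label: "Old Testament",
    start: 0,
    children: [
      { label: "Genesis", start: 0, children: [] },
      { label: "Exodus", start: 5000, children: [] },
    ],
  },
  { label: "New Testament", start: 90000, children: [] },
];

const cursor = new NavCursor(bible);
cursor.down(); // selects "Genesis"
cursor.right(); // selects "Exodus"
cursor.enter(); // 5000
```

Keeping the selection as an index path makes "up" a simple pop and "down" a push, so no parent pointers are needed in the tree.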
Examples of hierarchical chapters from DAISY can be found at http://www.daisy.org/sample-content - take in particular those that say "demonstrating DAISY navigation".
Related Markup examples as background information
1. Headers (h1, h2, ..., h6)
Screen readers navigate HTML headers in a similar manner to how DAISY navigates: it is possible to jump between headers of the same level and drill down into lower levels.
2. Lists (ol, ul)
Screen readers also navigate ordered or unordered HTML lists in a similar manner.
3. Navigation (nav)
Typically ol/ul is used inside a nav element to provide the navigation structure; nav adds semantics that screen readers can use.
4. Section / Article (section, article)
These are new HTML5 elements that screen readers do not support yet, so they tend to be treated like a div.
Note that all screen readers that navigate through hierarchical constructs do so in a depth-first manner. Some also allow users to ignore the depth and continue on the same level. This seems best implemented using headers. (Is this true?)
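The two traversal orders mentioned above can be illustrated on a small outline. This is a minimal sketch over a hypothetical `NavNode` tree, showing depth-first reading order and "same level" navigation that skips deeper entries; it does not model the behaviour of any particular screen reader.

```typescript
// Hypothetical outline node, e.g. derived from h1..h6 headers or nested lists.
interface NavNode {
  label: string;
  children: NavNode[];
}

// Depth-first (pre-order): an entry, then everything beneath it, then its next sibling.
function depthFirst(nodes: NavNode[]): string[] {
  const order: string[] = [];
  for (const node of nodes) {
    order.push(node.label, ...depthFirst(node.children));
  }
  return order;
}

// "Ignore the depth and continue on the same level": all entries at one depth
// (0 = top level), in document order.
function sameLevel(nodes: NavNode[], depth: number): string[] {
  if (depth === 0) return nodes.map((node) => node.label);
  const result: string[] = [];
  for (const node of nodes) result.push(...sameLevel(node.children, depth - 1));
  return result;
}

// Helper to keep the example data short.
const n = (label: string, ...children: NavNode[]): NavNode => ({ label, children });

const outline: NavNode[] = [
  n("Chapter 1"),
  n("Chapter 2", n("Subchapter 1"), n("Subchapter 2", n("Paragraph 1"), n("Paragraph 2")), n("Subchapter 3")),
  n("Chapter 3"),
];
```

Here `depthFirst(outline)` visits Chapter 2's subchapters and paragraphs before Chapter 3, whereas `sameLevel(outline, 1)` yields only the three subchapters.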
DAISY/DTB/EPUB use a special XML file format called NCX (navigation control file for XML applications) to create a navigation structure over the HTML files provided as part of a document package. It was developed to provide quick access to the main structural elements of a DAISY document without having to parse the entire marked-up text files. It introduces the "navMap", "navPoint" and "navList" elements. Here is an example:
<navMap>
  <navPoint id="ncx1" class="h1" playOrder="1">
    <navLabel><text>Valentin Haüy The father of the education for the blind</text></navLabel>
    <content src="valentinhauy11.html#ops1" />
    <navPoint id="ncx2" class="h2" playOrder="2">
      <navLabel><text>List of contents</text></navLabel>
      <content src="valentinhauy11.html#rgn_cnt_0026" />
    </navPoint>
    <navPoint id="ncx3" class="h2" playOrder="4">
      <navLabel><text>Preface</text></navLabel>
      <content src="valentinhauy11.html#rgn_cnt_0095" />
    </navPoint>
    <navPoint id="ncx4" class="h2" playOrder="6">
      <navLabel><text>1. Research questions</text></navLabel>
      <content src="valentinhauy11.html#rgn_cnt_0103" />
    </navPoint>
    [...]
  </navPoint>
  [...]
</navMap>
This example shows two navigation levels.
For multimedia files, DAISY uses XHTML files with links to SMIL files, for example:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>ごんぎつね</title>
  <meta name="dc:format" content="Daisy 2.02"/>
</head>
<body>
  <h1 class="title" id="hotl_0001"><a href="hotl0001.smil#hotl_0001">ごん狐</a></h1>
  <h1 id="hotl_0003"><a href="hotl0002.smil#dol_1_2_hotl_0003">一</a></h1>
  <span class="page-normal" id="xhot_0005"><a href="hotl0002.smil#dol_1_2_hotl_0004">1</a></span>
  <h2 id="hotl_0005"><a href="hotl0003.smil#dol_1_3_hotl_0005">1-2</a></h2>
  <h2 id="hotl_0006"><a href="hotl0004.smil#dol_1_4_hotl_0006">1-3</a></h2>
  <h2 id="hotl_0007"><a href="hotl0005.smil#dol_1_5_hotl_0007">1-4</a></h2>
  <h2 id="hotl_0008"><a href="hotl0006.smil#dol_1_6_hotl_0008">1-5</a></h2>
  <h2 id="hotl_0009"><a href="hotl0007.smil#dol_1_7_hotl_0009">1-6</a></h2>
  <h2 id="hotl_000a"><a href="hotl0008.smil#dol_1_8_hotl_000a">1-7</a></h2>
  <h2 id="hotl_000b"><a href="hotl0009.smil#dol_1_9_hotl_000b">1-8</a></h2>
  <h2 id="hotl_000c"><a href="hotl000a.smil#dol_1_a_hotl_000c">1-9</a></h2>
  <h1 id="hotl_000e"><a href="hotl000c.smil#dol_1_c_hotl_000e">二</a></h1>
  <span class="page-normal" id="xhot_003b"><a href="hotl000c.smil#dol_1_12_hotl_000f">2</a></span>
  <h2 id="hotl_0010"><a href="hotl000d.smil#dol_1_13_hotl_0010">2-1</a></h2>
  <h2 id="hotl_0011"><a href="hotl000e.smil#dol_1_14_hotl_0011">2-2</a></h2>
  <h2 id="hotl_0012"><a href="hotl000f.smil#dol_1_15_hotl_0012">2-3</a></h2>
  <h2 id="hotl_0013"><a href="hotl0010.smil#dol_1_16_hotl_0013">2-4</a></h2>
</body>
</html>
This example shows two navigation levels plus a content level.
Possible Markup for TTML
See: http://www.w3.org/WAI/PF/HTML/wiki/TextFormat_Mapping_to_Requirements#cn1
<?xml version="1.0" encoding="utf-8"?>
<tt xml:lang="en" ttp:timeBase="clock" xmlns="http://www.w3.org/ns/ttml" xmlns:ttp="http://www.w3.org/ns/ttml#parameter" xmlns:ttm="http://www.w3.org/ns/ttml#metadata">
  <body ttm:role="x-nav-work" timeContainer='seq'>
    <div ttm:role="x-nav-section" timeContainer='seq'>
      <p ttm:role="x-nav-section" timeContainer='seq'>
        <span ttm:role="x-nav-section" dur="11.300s">Index point 1.1.1</span>
        <span ttm:role="x-nav-section" dur="20.100s">Index point 1.1.2</span>
        <span ttm:role="x-nav-section" dur="12.900s">Index point 1.1.3</span>
        <span ttm:role="x-nav-section" dur="13.700s">Index point 1.1.4</span>
      </p>
      <p ttm:role="x-nav-section" timeContainer='seq'>
        <span ttm:role="x-nav-section" dur="7.200s">Index point 1.2.1</span>
        <span ttm:role="x-nav-section" dur="28.500s">Index point 1.2.2</span>
        <span ttm:role="x-nav-section" dur="31.090s">Index point 1.2.3</span>
        <span ttm:role="x-nav-section" dur="41.000s">Index point 1.2.4</span>
      </p>
    </div>
    <div ttm:role="x-nav-section" timeContainer='seq'>
      <p ttm:role="x-nav-section" timeContainer='seq'>
        <span ttm:role="x-nav-section" dur="11.300s">Index point 2.1.1</span>
        <span ttm:role="x-nav-section" dur="20.100s">Index point 2.1.2</span>
        <span ttm:role="x-nav-section" dur="12.900s">Index point 2.1.3</span>
        <span ttm:role="x-nav-section" dur="13.700s">Index point 2.1.4</span>
      </p>
      <p ttm:role="x-nav-section" timeContainer='seq'>
        <span ttm:role="x-nav-section" dur="1.300s">Index point 2.2.1</span>
        <span ttm:role="x-nav-section" dur="2.100s">Index point 2.2.2</span>
        <span ttm:role="x-nav-section" dur="2.900s">Index point 2.2.3</span>
        <span ttm:role="x-nav-section" dur="3.700s">Index point 2.2.4</span>
      </p>
    </div>
  </body>
</tt>
This example shows two different navigation levels.
Possible Markup for WebVTT
By analogy with the other examples, WebVTT could also provide nested navigation within cues. This is not currently specified, but it is a possible extension for chapter tracks, perhaps along the following lines:
WEBVTT

00:00:00.000 --> 00:00:10.700
Title Slide

00:00:10.700 --> 00:00:47.600
Introduction by Naomi Black

00:00:47.600 --> 00:07:37.900
Talk on WebVTT
<nav>
<00:00:47.600>Impact of Captions on the Web
<00:01:50.100>Requirements of a Video text format
<00:03:33.000>Simple WebVTT file
<00:04:57.766>Styled WebVTT file
<00:06:16.666>Internationalized WebVTT file
</nav>
This example shows two navigation levels.
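To check that the nested `<nav>` syntax sketched above can be parsed without ambiguity, here is a small parser for it. Since the syntax is only a proposal, this is not a WebVTT parser; `parseNavCue` and `NavEntry` are hypothetical names, and cue markup other than `<nav>`, `</nav>` and timestamps is not handled.

```typescript
// Hypothetical tree produced from one chapter cue's text (not part of WebVTT).
interface NavEntry {
  start: number | null; // seconds; null for the cue's own (untimed) title
  title: string;
  children: NavEntry[];
}

// "hh:mm:ss.ttt" or "mm:ss.ttt" -> seconds.
function parseTimestamp(ts: string): number {
  return ts.split(":").map(Number).reduce((acc, part) => acc * 60 + part, 0);
}

// Hypothetical parser for the proposed nested <nav> cue syntax.
function parseNavCue(text: string): NavEntry {
  // Tokens: <nav>, </nav>, <timestamp>, or a run of plain text.
  const token = /<nav>|<\/nav>|<((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})>|[^<]+/g;
  const root: NavEntry = { start: null, title: "", children: [] };
  const parents: NavEntry[] = []; // entries whose <nav> list is currently open
  let current: NavEntry = root; // entry that receives plain text
  let m: RegExpExecArray | null;
  while ((m = token.exec(text)) !== null) {
    if (m[0] === "<nav>") {
      // Following timestamped entries become children of the current entry.
      parents.push(current);
    } else if (m[0] === "</nav>") {
      const closed = parents.pop();
      current = closed !== undefined ? closed : root;
    } else if (m[1] !== undefined) {
      // A timestamp starts a new entry inside the innermost open <nav>.
      const entry: NavEntry = { start: parseTimestamp(m[1]), title: "", children: [] };
      const parent = parents.length > 0 ? parents[parents.length - 1] : root;
      parent.children.push(entry);
      current = entry;
    } else {
      // Plain text: collapse surrounding whitespace and append to the current title.
      const words = m[0].trim();
      if (words) current.title = current.title ? `${current.title} ${words}` : words;
    }
  }
  return root;
}

const talk = parseNavCue(
  "Talk on WebVTT <nav> <00:00:47.600>Impact of Captions on the Web " +
    "<00:01:50.100>Requirements of a Video text format </nav>",
);
// talk.title is "Talk on WebVTT" and talk.children holds two timed entries.
```

A stack of open `<nav>` lists is enough here because each list is closed before its parent continues, which is also what lets the structure nest to any depth.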
Technical solutions for HTML5
A solution to this problem needs to provide markup for hierarchical navigation and a JavaScript API that exposes it, while preserving the relationships between the hierarchical levels so that the keyboard controls described in the requirements can be implemented.
The specification for navigation of media relies on chapter tracks: http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#attr-track-kind, and http://dev.w3.org/html5/spec/the-iframe-element.html#attr-track-kind.
Tracks of kind "chapters" provide: "Chapter titles, intended to be used for navigating the media resource. Displayed as an interactive list in the user agent's interface."
So, the key problem to solve is how to expose hierarchical structures to the browser/script.
Possible solutions
1. Within a (chapter) text track
The examples in the previous section show different ways of marking up a chapter track in an external text track resource so that cues can contain a hierarchical structure.
The nested time ranges created this way can be used for hierarchical navigation.
The mapping rules of these formats into HTML5 would then include a mapping either to heading tags (h1, h2, h3, ...) or to nested lists (ul or ol). A mapping to nested ul lists probably makes the most sense. The navigation structure is then available to the browser and can be passed on to AT.
Discussion:
- chapter cues can now also include lists
+ hierarchical relationship is clear
+ easy to communicate the hierarchical notion to AT, since it builds on existing structures
+ no change to the existing API is required
The hierarchical markup within a cue would end up as extra HTML markup in the cue text of a TextTrackCue, which can be accessed through getCueAsHTML() and thus exposed to AT.
Here is an example in WebVTT to demonstrate how it works:
webvtt.vtt:
WEBVTT

00:00:00.000 --> 00:00:10.700
Title Slide

00:00:10.700 --> 00:00:47.600
Introduction by Naomi Black

00:00:47.600 --> 00:07:37.900
Talk on WebVTT
<nav>
<00:00:47.600>Impact of Captions on the Web
<00:01:50.100>Requirements of a Video text format
<00:03:33.000>Simple WebVTT file
<00:04:57.766>Styled WebVTT file
<00:06:16.666>Internationalized WebVTT file
</nav>
track markup:
<video src="video.ogv" controls>
  <track src="webvtt.vtt" kind="chapters" label="chapter and subchapter level navigating">
</video>
And here is roughly how it is represented in the JS API:
TextTrackCueList {
  length : 3,
  TextTrackCue(0) {
    track: <TextTrack>,
    id : '',
    startTime: '00:00:00.000',
    endTime: '00:00:10.700',
    pauseOnExit: false,
    direction: horizontal,
    snapToLines: false,
    linePosition: 100,
    textPosition: 50,
    size: 3,
    alignment: center,
    getCueAsSource(): "Title Slide",
    getCueAsHTML(): "Title Slide"
  },
  TextTrackCue(1) {
    [..]
    startTime: '00:00:10.700',
    endTime: '00:00:47.600',
    getCueAsSource(): "Introduction by Naomi Black",
    getCueAsHTML(): "Introduction by Naomi Black"
  },
  TextTrackCue(2) {
    [..]
    startTime: '00:00:47.600',
    endTime: '00:07:37.900',
    getCueAsSource(): "Talk on WebVTT <nav> <00:00:47.600>Impact of Captions on the Web <00:01:50.100>Requirements of a Video text format <00:03:33.000>Simple WebVTT file <00:04:57.766>Styled WebVTT file <00:06:16.666>Internationalized WebVTT file </nav>",
    getCueAsHTML(): "Talk on WebVTT <ul> <li><? target='timestamp' data='00:00:47.600'>Impact of Captions on the Web</li> <li><? target='timestamp' data='00:01:50.100'>Requirements of a Video text format</li> <li><? target='timestamp' data='00:03:33.000'>Simple WebVTT file</li> <li><? target='timestamp' data='00:04:57.766'>Styled WebVTT file</li> <li><? target='timestamp' data='00:06:16.666'>Internationalized WebVTT file</li> </ul>"
  }
}
The getCueAsHTML() accessor returns a structured DocumentFragment, which AT can use to provide the navigation.
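The mapping to nested ul lists described for this solution can be sketched as a string renderer over a hypothetical `NavEntry` tree (start time in seconds, title, children). The `data-start` attribute is an illustrative choice only; it is not part of any spec and is not what getCueAsHTML() produces.

```typescript
// Hypothetical tree shape: start in seconds (null for an untimed title), title, children.
interface NavEntry {
  start: number | null;
  title: string;
  children: NavEntry[];
}

// Minimal escaping so cue text cannot inject markup into the generated list.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Renders entries as nested <ul> lists. Each child list is placed inside its
// parent's <li>, so the parent/child relationship survives in the markup.
function toNestedList(entries: NavEntry[]): string {
  if (entries.length === 0) return "";
  const items = entries.map((entry) => {
    // Hypothetical attribute carrying the seek time.
    const attr = entry.start === null ? "" : ` data-start="${entry.start}"`;
    return `<li${attr}>${escapeHtml(entry.title)}${toNestedList(entry.children)}</li>`;
  });
  return `<ul>${items.join("")}</ul>`;
}
```

For example, `toNestedList([{ start: 47.6, title: "Impact of Captions on the Web", children: [] }])` returns `<ul><li data-start="47.6">Impact of Captions on the Web</li></ul>`. Placing the child `ul` inside the parent `li`, rather than as a sibling, is what lets AT report nesting depth.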
2. With multiple tracks
It is possible to provide the different navigation approaches through multiple tracks that each contain a flat navigation structure.
Parallel tracks of kind "chapters" could be used for hierarchical navigation. The user can then switch between chapter tracks to get a finer or coarser navigation resolution.
Discussion:
- hierarchical relationship between tracks and their cues is unclear
- hard to communicate the hierarchical notion to AT
+ chapter tracks continue to work as they do now
+ the different tracks don't have to be strictly hierarchically dependent; they can just provide alternative chapter segmentations
To show an example, we need two input tracks, combined through the <track> markup.
Here are two WebVTT files that replicate what is shown in solution 1 above.
webvtt1.vtt:
WEBVTT

00:00:00.000 --> 00:00:10.700
Title Slide

00:00:10.700 --> 00:00:47.600
Introduction by Naomi Black

00:00:47.600 --> 00:07:37.900
Talk on WebVTT
webvtt2.vtt:
WEBVTT

00:00:47.600 --> 00:01:50.100
Impact of Captions on the Web

00:01:50.100 --> 00:03:33.000
Requirements of a Video text format

00:03:33.000 --> 00:04:57.766
Simple WebVTT file

00:04:57.766 --> 00:06:16.666
Styled WebVTT file

00:06:16.666 --> 00:07:37.900
Internationalized WebVTT file
track markup:
<video src="video.ogv" controls>
  <track src="webvtt1.vtt" kind="chapters" label="level 1 navigation">
  <track src="webvtt2.vtt" kind="chapters" label="level 2 navigation">
</video>
And here is how they would be represented in the JS API:
TextTrackCueList[0] {
  length : 3,
  TextTrackCue(0) { [..] startTime: '00:00:00.000', endTime: '00:00:10.700', getCueAsSource(): "Title Slide", getCueAsHTML(): "Title Slide" },
  TextTrackCue(1) { [..] startTime: '00:00:10.700', endTime: '00:00:47.600', getCueAsSource(): "Introduction by Naomi Black", getCueAsHTML(): "Introduction by Naomi Black" },
  TextTrackCue(2) { [..] startTime: '00:00:47.600', endTime: '00:07:37.900', getCueAsSource(): "Talk on WebVTT", getCueAsHTML(): "Talk on WebVTT" }
}
TextTrackCueList[1] {
  length: 5,
  TextTrackCue(0) { [..] startTime: '00:00:47.600', endTime: '00:01:50.100', getCueAsSource(): "Impact of Captions on the Web", getCueAsHTML(): "Impact of Captions on the Web" },
  TextTrackCue(1) { [..] startTime: '00:01:50.100', endTime: '00:03:33.000', getCueAsSource(): "Requirements of a Video text format", getCueAsHTML(): "Requirements of a Video text format" },
  TextTrackCue(2) { [..] startTime: '00:03:33.000', endTime: '00:04:57.766', getCueAsSource(): "Simple WebVTT file", getCueAsHTML(): "Simple WebVTT file" },
  etc.
}
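Because the two tracks carry no explicit link between levels, a script or user agent would have to infer one. One possible heuristic, sketched below with hypothetical names (`FlatCue`, `TreeCue`, `nestByContainment`), is to attach each finer cue to the coarser cue whose time range contains it. Cues that straddle a boundary end up as orphans, which illustrates why the relationship between the tracks is called unclear.

```typescript
interface FlatCue {
  start: number; // seconds
  end: number;
  text: string;
}

interface TreeCue extends FlatCue {
  children: TreeCue[];
}

// Heuristic (not specified anywhere): a finer cue belongs to the coarser cue whose
// [start, end] range fully contains it. Anything that fits no parent is an orphan.
function nestByContainment(
  coarse: FlatCue[],
  fine: FlatCue[],
): { tree: TreeCue[]; orphans: FlatCue[] } {
  const tree = coarse.map((c): TreeCue => ({ ...c, children: [] }));
  const orphans: FlatCue[] = [];
  for (const f of fine) {
    const parent = tree.find((p) => p.start <= f.start && f.end <= p.end);
    if (parent) parent.children.push({ ...f, children: [] });
    else orphans.push(f);
  }
  return { tree, orphans };
}

// The two example tracks, with times converted to seconds.
const level1: FlatCue[] = [
  { start: 0, end: 10.7, text: "Title Slide" },
  { start: 10.7, end: 47.6, text: "Introduction by Naomi Black" },
  { start: 47.6, end: 457.9, text: "Talk on WebVTT" },
];
const level2: FlatCue[] = [
  { start: 47.6, end: 110.1, text: "Impact of Captions on the Web" },
  { start: 110.1, end: 213, text: "Requirements of a Video text format" },
  { start: 213, end: 297.766, text: "Simple WebVTT file" },
  { start: 297.766, end: 376.666, text: "Styled WebVTT file" },
  { start: 376.666, end: 457.9, text: "Internationalized WebVTT file" },
];

const { tree, orphans } = nestByContainment(level1, level2);
// "Talk on WebVTT" receives all five level-2 cues; no cue is orphaned.
```

This works for the example only because the level-2 segmentation happens to fit inside one level-1 cue; alternative segmentations, which this solution explicitly allows, would produce orphans.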
3. Single-track, multiple cues
A third approach is to take the cues from the second example, but put all the cues from the different navigation levels into a single track.
As a consequence, multiple cues may be active at the same time.
Discussion:
- hierarchical relationship between the cues is unclear
- hierarchical relationship has to be deduced from the timing overlaps, which is very unreliable
- unclear which of the currently active cues will be chosen for navigation when a certain time is reached
- hierarchical character of the authored file is lost
- hard to communicate the hierarchical notion to AT
- may need to introduce a new attribute to indicate the hierarchical notion
+ easier to deal with the hierarchical character in JavaScript
Here is an example WebVTT file for this situation:
WEBVTT

00:00:00.000 --> 00:00:10.700
<h1>Title Slide

00:00:10.700 --> 00:00:47.600
<h1>Introduction by Naomi Black

00:00:47.600 --> 00:07:37.900
<h1>Talk on WebVTT

00:00:47.600 --> 00:01:50.100
<h2>Impact of Captions on the Web

00:01:50.100 --> 00:03:33.000
<h2>Requirements of a Video text format

00:03:33.000 --> 00:04:57.766
<h2>Simple WebVTT file

00:04:57.766 --> 00:06:16.666
<h2>Styled WebVTT file

00:06:16.666 --> 00:07:37.900
<h2>Internationalized WebVTT file
track markup:
<video src="video.ogv" controls>
  <track src="webvtt.vtt" kind="chapters" label="chapter and subchapter level navigating">
</video>
And here is how it would be represented in the JS API:
[added a possible level attribute]
TextTrackCueList {
  length : 8,
  TextTrackCue(0) { [..] startTime: '00:00:00.000', endTime: '00:00:10.700', getCueAsSource(): "Title Slide", getCueAsHTML(): "Title Slide", level: 1 },
  TextTrackCue(1) { [..] startTime: '00:00:10.700', endTime: '00:00:47.600', getCueAsSource(): "Introduction by Naomi Black", getCueAsHTML(): "Introduction by Naomi Black", level: 1 },
  TextTrackCue(2) { [..] startTime: '00:00:47.600', endTime: '00:07:37.900', getCueAsSource(): "Talk on WebVTT", getCueAsHTML(): "Talk on WebVTT", level: 1 },
  TextTrackCue(3) { [..] startTime: '00:00:47.600', endTime: '00:01:50.100', getCueAsSource(): "Impact of Captions on the Web", getCueAsHTML(): "Impact of Captions on the Web", level: 2 },
  etc.
}
In this example, we added a level attribute to indicate the hierarchical position.
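With such a level attribute, the overlap ambiguity in a single track can at least be resolved into an ordered breadcrumb. This sketch assumes a cue is active while start <= t < end, and uses made-up names (`LeveledCue`, `activeCues`, `breadcrumb`); the data are the cues of the example file converted to seconds.

```typescript
interface LeveledCue {
  start: number; // seconds
  end: number;
  text: string;
  level: number; // hypothetical attribute: 1 = top level
}

// Assumption: a cue is active while start <= t < end.
function activeCues(cues: LeveledCue[], t: number): LeveledCue[] {
  return cues.filter((c) => c.start <= t && t < c.end);
}

// Orders the overlapping active cues by the hypothetical level attribute,
// giving an outermost-first breadcrumb.
function breadcrumb(cues: LeveledCue[], t: number): string[] {
  return activeCues(cues, t)
    .sort((a, b) => a.level - b.level)
    .map((c) => c.text);
}

const cues: LeveledCue[] = [
  { start: 0, end: 10.7, text: "Title Slide", level: 1 },
  { start: 10.7, end: 47.6, text: "Introduction by Naomi Black", level: 1 },
  { start: 47.6, end: 457.9, text: "Talk on WebVTT", level: 1 },
  { start: 47.6, end: 110.1, text: "Impact of Captions on the Web", level: 2 },
  { start: 110.1, end: 213, text: "Requirements of a Video text format", level: 2 },
  { start: 213, end: 297.766, text: "Simple WebVTT file", level: 2 },
  { start: 297.766, end: 376.666, text: "Styled WebVTT file", level: 2 },
  { start: 376.666, end: 457.9, text: "Internationalized WebVTT file", level: 2 },
];

breadcrumb(cues, 100); // ["Talk on WebVTT", "Impact of Captions on the Web"]
```

Without the level field the two active cues at t = 100 could only be ordered by comparing their durations, which is exactly the unreliable deduction listed in the discussion above.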
Example markup with 3 hierarchical levels
Here we look at a concrete example that could come from either a TTML or a WebVTT document and end up as the same structure in a Web page. The example has 3 hierarchical levels, just to show the principle.
TTML example
TTML markup:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Test File</title>
</head>
<body>
  <div>
    <p begin="0.76s" end="3.45s"> Chapter 1 </p>
    <p begin="3.45s" end="8.0s"> Chapter 2
      <div>
        <p begin="3.45s" end="4.0s"> Subchapter 1 </p>
        <p begin="4.0s" end="6.0s"> Subchapter 2
          <div>
            <p begin="4.0s" end="4.5s"> Paragraph 1 </p>
            <p begin="4.5s" end="5.0s"> Paragraph 2 </p>
            <p begin="5.0s" end="6.0s"> Paragraph 3 </p>
          </div>
        </p>
        <p begin="6.0s" end="8.0s"> Subchapter 3 </p>
      </div>
    </p>
    <p begin="8.0s" end="16.0s"> Chapter 3 </p>
  </div>
</body>
</html>
WebVTT equivalent example
WebVTT markup:
WEBVTT

00:00:00.760 --> 00:00:03.450
Chapter 1

00:00:03.450 --> 00:00:08.000
Chapter 2
<nav>
<00:00:03.450>Subchapter 1
<00:00:04.000>Subchapter 2
<nav>
<00:00:04.000>Paragraph 1
<00:00:04.500>Paragraph 2
<00:00:05.000>Paragraph 3
</nav>
<00:00:06.000>Subchapter 3
</nav>

00:00:08.000 --> 00:00:16.000
Chapter 3
Track markup with both
<video src="video.ogv" controls>
  <track src="chapters.vtt" kind="chapters" label="chapter, subchapter and paragraph level navigating">
  <track src="chapters.ttml" kind="chapters" label="chapter, subchapter and paragraph level navigating">
</video>
Parsed JS representation of either
TextTrackCueList {
  length : 3,
  TextTrackCue(0) {
    [..]
    startTime: '00:00:00.760',
    endTime: '00:00:03.450',
    getCueAsSource(): "Chapter 1",
    getCueAsHTML(): "Chapter 1"
  },
  TextTrackCue(1) {
    [..]
    startTime: '00:00:03.450',
    endTime: '00:00:08.000',
    getCueAsSource(): "Chapter 2 <nav> <00:00:03.450>Subchapter 1 <00:00:04.000>Subchapter 2 <nav> <00:00:04.000>Paragraph 1 <00:00:04.500>Paragraph 2 <00:00:05.000>Paragraph 3 </nav> <00:00:06.000>Subchapter 3 </nav>",
    getCueAsHTML(): "Chapter 2 <ul> <li><? target='timestamp' data='00:00:03.450'>Subchapter 1</li> <li><? target='timestamp' data='00:00:04.000'>Subchapter 2 <ul> <li><? target='timestamp' data='00:00:04.000'>Paragraph 1</li> <li><? target='timestamp' data='00:00:04.500'>Paragraph 2</li> <li><? target='timestamp' data='00:00:05.000'>Paragraph 3</li> </ul></li> <li><? target='timestamp' data='00:00:06.000'>Subchapter 3</li> </ul>"
  },
  TextTrackCue(2) {
    [..]
    startTime: '00:00:08.000',
    endTime: '00:00:16.000',
    getCueAsSource(): "Chapter 3",
    getCueAsHTML(): "Chapter 3"
  }
}
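In this representation only the top-level cues carry end times; nested entries have only start times. The sketch below assumes each nested entry ends where its next sibling begins (or where its parent ends), and finds the chain of entries active at a given time, which is what a player would need to highlight the current chapter, subchapter and paragraph. `NavEntry`, `activePath` and `leaf` are hypothetical names.

```typescript
// Hypothetical tree node; times in seconds.
interface NavEntry {
  title: string;
  start: number;
  children: NavEntry[];
}

// Assumption: an entry ends where its next sibling starts; the last sibling
// ends where its parent ends. Returns titles from the outermost level inward.
function activePath(entries: NavEntry[], parentEnd: number, t: number): string[] {
  for (let i = 0; i < entries.length; i++) {
    const entry = entries[i];
    const end = i + 1 < entries.length ? entries[i + 1].start : parentEnd;
    if (entry.start <= t && t < end) {
      return [entry.title, ...activePath(entry.children, end, t)];
    }
  }
  return [];
}

// Helper for entries without children.
const leaf = (title: string, start: number): NavEntry => ({ title, start, children: [] });

// The three-level example above, times in seconds.
const chapters: NavEntry[] = [
  leaf("Chapter 1", 0.76),
  {
    title: "Chapter 2",
    start: 3.45,
    children: [
      leaf("Subchapter 1", 3.45),
      {
        title: "Subchapter 2",
        start: 4.0,
        children: [leaf("Paragraph 1", 4.0), leaf("Paragraph 2", 4.5), leaf("Paragraph 3", 5.0)],
      },
      leaf("Subchapter 3", 6.0),
    ],
  },
  leaf("Chapter 3", 8.0),
];

activePath(chapters, 16.0, 4.7); // ["Chapter 2", "Subchapter 2", "Paragraph 2"]
```

For the top level, the inferred end times (3.45s, 8.0s, 16.0s) happen to match the explicit cue end times, so the assumption only matters for the nested entries.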