This specification defines DAPT, a TTML-based file format for the exchange of timed text content in dubbing and audio description workflows.
Status of This Document
This section describes the status of this
document at the time of its publication. A list of current W3C
publications and the latest revision of this technical report can be found
in the W3C technical reports index at
https://www.w3.org/TR/.
Publication as a Working Draft does not
imply endorsement by W3C and its Members.
This is a draft document and may be updated, replaced or obsoleted by other
documents at any time. It is inappropriate to cite this document as other
than work in progress.
Future updates to this specification may incorporate
new features.
This document was produced by a group
operating under the
W3C Patent
Policy.
W3C maintains a
public list of any patent disclosures
made in connection with the deliverables of
the group; that page also includes
instructions for disclosing a patent. An individual who has actual
knowledge of a patent which the individual believes contains
Essential Claim(s)
must disclose the information in accordance with
section 6 of the W3C Patent Policy.
This specification defines a text-based profile of the Timed Text Markup Language version 2.0 [TTML2]
intended to support dubbing and audio description workflows worldwide,
to meet the requirements defined in [DAPT-REQS], and to permit usage of visual presentation
features within [TTML2] and its profiles, for example those in [TTML-IMSC1.2].
2. Introduction
This section is non-normative.
2.1 Transcripts and Scripts
A transcript is a text representation of pre-existing media in another form,
for example the dialogue in a video.
A script is a text representation of the intended content of media prior to its creation,
for example to guide an actor in recording an audio track.
Within this specification the term DAPT script is used generically to refer to both transcripts and scripts.
DAPT Scripts consist of timed text and associated metadata,
such as the character speaking.
In dubbing workflows, a transcript is generated and translated to create a script.
In audio description workflows, a transcript describes the video image,
and is then used directly as a script for recording an audio equivalent.
DAPT is a TTML-based format for the exchange of transcripts and scripts
(i.e. DAPT Scripts)
among authoring, prompting and playback tools in the localization and audio description pipelines.
A DAPT document is a serializable form of a DAPT Script designed to carry pertinent information for dubbing or audio description
such as type of DAPT script, dialogue, descriptions, timing, metadata, original language transcribed text, translated text, language information, and audio mixing instructions,
and to be extensible to allow user-defined annotations or additional future features.
A DAPT script is expected to be used to make audio visual media accessible
or localized for users who cannot understand it in its original form,
and to be used as part of the solution for meeting user needs
involving transcripts, including accessibility needs described in [media-accessibility-reqs],
as well as supporting users who need dialogue translated into a different language via dubbing.
The authoring workflow for both dubbing and audio description involves similar stages,
that share common requirements as described in [DAPT-REQS].
In both cases, the author reviews the content and
writes down what is happening, either in the dialogue or in the video image,
alongside the time when it happens.
Further transformation processes can change the text to a different language and
adjust the wording to fit precise timing constraints.
Then there is a stage in which an audio rendering of the script is generated,
for eventual mixing into the programme audio.
That mixing can occur prior to distribution,
or in the client directly.
Creating a dubbing script
is a complex, multi-step process involving:
Transcribing and timing the dialogue in the original language from a completed programme to create a transcript;
Notating dialogue with character information and other annotations;
Generating localization notes to guide further adaptation;
Translating the dialogue to a target language script;
Adapting the translation for dubbing,
for example to match the actor’s lip movements.
A dubbing script is a transcript or script
(depending on workflow stage) used for
recording translated dialogue to be mixed with the non-dialogue programme audio,
to generate a localized version of the programme in a different language,
known as a dubbed version, or dub for short.
Dubbing scripts can be useful as a starting point for creation of subtitles or closed captions in alternate languages.
This specification is designed to facilitate the addition of, and conversion to,
subtitle and caption documents in other profiles of TTML, such as [TTML-IMSC1.2],
for example by permitting subtitle styling syntax to be carried in DAPT documents.
Alternatively, styling can be applied to assist voice artists when recording scripted dialogue.
Creating audio description content is also a multi-stage process.
An audio description,
also known as video description
or in [media-accessibility-reqs] as described video,
is an audio service
to assist viewers who cannot fully see a visual presentation to understand the content.
It is the result of the audio rendition of one or more descriptions
mixed with the audio associated with the programme prior to any mixing with audio description
(sometimes referred to as main programme audio),
at moments when this does not clash with dialogue, to deliver an audio description mixed audio track.
A description is a set of words that describes an aspect of the programme presentation,
suitable for rendering into audio by means of vocalisation and recording
or used as a text alternative source for text to speech translation, as defined in [WCAG22].
More information about what audio description is and how it works can be found at [BBC-WHP051].
Writing the audio description script typically involves:
watching the video content of the programme,
or series of programmes,
identifying the key moments during which there is an opportunity to speak descriptions,
writing the description text to explain the important visible parts of the programme at that time,
creating an audio version of the descriptions, either by recording a human actor or using text to speech,
defining mixing instructions (applied using [TTML2] audio styling) for combining the audio with the programme audio.
The audio mixing can occur prior to distribution of the media,
or in the client player.
If the audio description script is delivered to the player,
the text can be used to provide an alternative rendering,
for example on a Braille display,
or using the user's configured screen reader.
2.2 Example documents
2.2.1 Basic document structure
The top level structure of a document is as follows:
The <tt> root element in the namespace http://www.w3.org/ns/ttml indicates that this is a TTML document
and the ttp:contentProfiles attribute indicates that it adheres to the DAPT content profile defined in this specification.
The daptm:workflowType attribute indicates the type of workflow.
The daptm:scriptType attribute indicates the type of script
but in this empty example, it is not relevant, as only the structure of the document is shown.
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    xml:lang="en"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:workflowType="dubbing"
    daptm:scriptType="originalTranscript">
  <head>
    <metadata>
      <!-- Additional metadata may be placed here -->
      <!-- Any characters must be defined here as a set of ttm:agent elements -->
    </metadata>
    <styling>
      <!-- Styling is optional and consists of a set of style elements -->
    </styling>
    <layout>
      <!-- Layout is optional and consists of a set of region elements -->
    </layout>
  </head>
  <body>
    <!-- Content goes here -->
  </body>
</tt>
The following examples correspond to the timed text scripts produced
at each stage of the workflow described in [DAPT-REQS].
The first example shows a script where timed opportunities for descriptions
or transcriptions have been identified but no text has been written:
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    xmlns:xml="http://www.w3.org/XML/1998/namespace"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:workflowType="audioDescription"
    daptm:scriptType="preRecording"
    xml:lang="en">
  <body>
    <div begin="10s" end="13s">
      <p daptm:langSrc="original">
        A woman climbs into a small sailing boat.
      </p>
    </div>
    <div begin="18s" end="20s">
      <p daptm:langSrc="original">
        The woman pulls the tiller and the boat turns.
      </p>
    </div>
  </body>
</tt>
After creating audio recordings, if not using text to speech, instructions for playback
mixing can be inserted. For example, the gain of "received" audio can be changed before mixing in
the audio played from inside the span, smoothly
animating the value on the way in and returning it on the way out:
<tt ... daptm:workflowType="audioDescription" daptm:scriptType="asRecorded" xml:lang="en">
...
<div begin="25s" end="28s">
  <p daptm:langSrc="original">
    <animate begin="0.0s" end="0.3s" tta:gain="1;0.39" fill="freeze"/>
    <animate begin="2.7s" end="3s" tta:gain="0.39;1"/>
    <span begin="0.3s" end="2.7s">
      <audio src="clip3.wav"/>
      The sails billow in the wind.</span>
  </p>
</div>
...
In the above example, the <div> element's
begin time becomes the "syncbase" for its child,
so the times on the <animate> and <span>
elements are relative to 25s here.
The first <animate> element drops the gain from 1
to 0.39 over 0.3s, freezing that value after it ends,
and the second one raises it back in the
final 0.3s of this description. Then the <span> is
timed to begin only after the first audio dip has finished.
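The syncbase arithmetic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of DAPT: the helper names `parse_seconds` and `to_media_time` are hypothetical, and only the simple `<number>s` offset-time form used in the example is handled.

```python
from decimal import Decimal
from typing import Tuple


def parse_seconds(value: str) -> Decimal:
    """Parse a simple offset time such as "0.3s" (assumption: seconds only)."""
    if not value.endswith("s"):
        raise ValueError(f"unsupported time expression: {value!r}")
    return Decimal(value[:-1])


def to_media_time(parent_begin: Decimal, begin: str, end: str) -> Tuple[Decimal, Decimal]:
    """Convert child begin/end offsets into times relative to the parent's syncbase."""
    return parent_begin + parse_seconds(begin), parent_begin + parse_seconds(end)


# The <div> begins at 25s, so its descendants' times are offsets from 25s.
div_begin = parse_seconds("25s")
dip = to_media_time(div_begin, "0.0s", "0.3s")      # first <animate>: gain 1 -> 0.39
speech = to_media_time(div_begin, "0.3s", "2.7s")   # <span> containing the audio
restore = to_media_time(div_begin, "2.7s", "3s")    # second <animate>: gain 0.39 -> 1

print(dip, speech, restore)
```

Under these assumptions the gain dip runs from 25s to 25.3s, the recorded description from 25.3s to 27.7s, and the gain is restored between 27.7s and 28s.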
If the audio recording is long and just a snippet needs to be played,
that can be done using clipBegin and clipEnd.
If we just want to play the part of the audio file from 5s to
8s it would look like:
...
<div begin="25s" end="28s">
  <p daptm:langSrc="original">
    <animate begin="0.0s" end="0.3s" tta:gain="1;0.39" fill="freeze"/>
    <animate begin="2.7s" end="3s" tta:gain="0.39;1"/>
    <span begin="0.3s" end="2.7s">
      <audio>
        <source>
          <data type="audio/wave">
            [base64-encoded audio data]
          </data>
        </source>
      </audio>
      The sails billow in the wind.</span>
  </p>
</div>
...
2.2.3 Dubbing Examples
From the basic structure of Example 1, a transcription of the audio programme produces an original language dubbing script,
which can look as follows. No specific style or layout is defined, and here the focus is on the transcription of the dialogue.
Characters are identified within the <metadata>.
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    xml:lang="fr"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:workflowType="dubbing"
    daptm:scriptType="originalTranscript">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div begin="10s" end="13s">
      <p daptm:langSrc="original" ttm:agent="character_1"><span>Et c'est grâce à ça qu'on va devenir riches.</span></p>
    </div>
  </body>
</tt>
After translating the text, the document is modified. It includes translation text, and
in this case the original text is preserved. The main document language is changed to indicate
that the focus is on the translated language:
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    xml:lang="en"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:workflowType="dubbing"
    daptm:scriptType="translatedTranscript">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div begin="10s" end="13s" ttm:agent="character_1">
      <p xml:lang="fr" daptm:langSrc="original"><span>Et c'est grâce à ça qu'on va devenir riches.</span></p>
      <p xml:lang="en" daptm:langSrc="translation"><span>And thanks to that, we're gonna get rich.</span></p>
    </div>
  </body>
</tt>
The process of adaptation, before recording, could adjust the wording and/or add further timing to assist in the recording.
The daptm:scriptType attribute is also modified, as in the following example:
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    xml:lang="en"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:workflowType="dubbing"
    daptm:scriptType="preRecording">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div begin="10s" end="13s" ttm:agent="character_1" daptm:onScreen="ON_OFF">
      <p xml:lang="fr" daptm:langSrc="original"><span>Et c'est grâce à ça qu'on va devenir riches.</span></p>
      <p xml:lang="en" daptm:langSrc="translation"><span begin="0s">And thanks to that,</span><span begin="1.5s"> we're gonna get rich.</span></p>
    </div>
  </body>
</tt>
3. Documentation Conventions
This document uses the following conventions:
When referring to an [XML] element in the prose, angled brackets and a specific style are used as follows: <someElement>. If the name of an element referenced in this specification
is not namespace qualified, then the TT namespace applies (see Namespaces).
When referring to an [XML] attribute in the prose, the attribute name is given with its prefix,
or without a prefix if the attribute is in the global namespace.
Attributes are styled as follows: attributePrefix:attributeName.
When defining new [XML] attributes, this specification uses the conventions used for
"value syntax expressions" in [TTML2]. For example, the following would define a new attribute
called daptm:foo as a string with two possible values:
bar and baz.
daptm:foo
: "bar"
| "baz"
When referring to the position of an element or attribute in the [XML] document,
the [XPath] LocationPath notation is used.
For example, to refer to the first <metadata> element child of
the <head> element child of
the <tt> element,
the following path would be used:
/tt/head/metadata[1].
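As an illustration of the LocationPath convention, the following Python sketch selects the first `<metadata>` child of `<head>` with the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset. XPath position predicates count from 1, so the first matching element is selected with `[1]`. The sample document, the `tt` prefix mapping and the `xml:id` values are assumptions made for this sketch; because ElementTree paths are evaluated relative to the root element, the leading `/tt` step is implicit.

```python
import xml.etree.ElementTree as ET

# Prefix mapping used only by this sketch; the TT namespace URI is from the examples above.
TT_NS = {"tt": "http://www.w3.org/ns/ttml"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical document with two <metadata> elements in <head>.
DOC = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <head>
    <metadata xml:id="m1"/>
    <metadata xml:id="m2"/>
  </head>
  <body/>
</tt>"""

root = ET.fromstring(DOC)  # root is the <tt> element

# Equivalent of /tt/head/metadata[1], evaluated relative to <tt>.
first_metadata = root.find("tt:head/tt:metadata[1]", TT_NS)
first_id = first_metadata.get(XML_ID)
print(first_id)
```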
4. DAPT Data Model and corresponding TTML syntax
This section specifies the data model for DAPT and its corresponding TTML syntax.
In the model, there are objects which can have properties and be associated with other objects.
In the TTML syntax, these objects and properties are expressed as elements and attributes,
though it is not always the case that objects are expressed as elements and properties as attributes.
Figure 1 illustrates the DAPT data model, hyperlinking every object and property
to its corresponding section in this document.
Shared properties are shown in italics.
All other conventions in the diagram are as per [uml].
The definitions of the types of documents and the corresponding daptm:scriptType values are:
Original Language Transcript:
When the daptm:scriptType value is originalTranscript,
the document is a literal transcription of the dialogue and on-screen text in their original spoken/written language(s).
Translated Transcript:
When the daptm:scriptType value is translatedTranscript,
the document represents a translation of the Original Language Transcript in a common language.
Pre-recording Script:
When the daptm:scriptType value is preRecording,
the document represents the result of the adaptation of an Original Language Transcript or
a Translated Transcript for recording, e.g. for better lip-sync in a dubbing workflow,
or to ensure that the words can fit within the time available in an audio description workflow.
The Primary Language is a mandatory property of a DAPT Script
which represents the default language for the Text content of Script Events.
This language may be the Original language or a Translation language.
When it represents a Translation language, it may be the final language
for which a dubbing or audio description script is being prepared,
called the Target Recording Language or it may be an intermediate, or pivot, language
used in the workflow.
The xml:lang attribute MUST be present on the <tt> element and its value MUST NOT be empty.
Note
All text content in a DAPT Script has a specified language.
When multiple languages are used, the Primary Language can correspond to the language of the majority of Script Events,
to the language being spoken for the longest duration, or to the language arbitrarily chosen by the author.
Some of the properties in the DAPT data model are common within more than one object type,
and carry the same semantic everywhere they occur.
These shared properties are listed in this section.
Editor's note
Would it be better to make a "Timed Object" class and subclass Script Event,
Mixing Instruction and Audio Recording from it?
4.1.6.1 Timing Properties
The following timing properties
define when the entities that contain them are active:
The Begin property defines when an object becomes active,
and is relative to the active begin time of the parent object.
DAPT Scripts begin at time zero on the media timeline.
The End property defines when an object stops being active,
and is relative to the active begin time of the parent object.
The Duration property defines the maximum duration of an object.
Note
If both an End and a Duration property are present,
the end time is the earlier of End and Begin + Duration.
Note
If any of the timing properties is omitted, the following rules apply:
The default value for Begin is zero, i.e. the same as the begin time of the parent object.
The default value for End is indefinite,
i.e. it resolves to the same as the end time of the parent timed object,
if there is one.
The end time of a DAPT Script is for practical purposes the end of the Related Media Object.
The default value for Duration is indefinite,
i.e. the end time resolves to the same as the end time of the parent object.
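The timing rules above can be summarised as a small Python sketch. It is a simplified model under stated assumptions, not a normative algorithm: the function name `resolve_interval` is hypothetical, times are plain `Decimal` seconds on the media timeline, `None` stands for "omitted" (or "indefinite" for the parent end), and clamping of an explicit end time to the parent's end is not modelled.

```python
from decimal import Decimal
from typing import Optional, Tuple


def resolve_interval(
    parent_begin: Decimal,
    parent_end: Optional[Decimal],
    begin: Optional[Decimal] = None,
    end: Optional[Decimal] = None,
    dur: Optional[Decimal] = None,
) -> Tuple[Decimal, Optional[Decimal]]:
    """Resolve Begin/End/Duration, given relative to the parent, to media times."""
    # Begin defaults to zero, i.e. the begin time of the parent object.
    resolved_begin = parent_begin + (begin if begin is not None else Decimal(0))

    candidates = []
    if end is not None:
        candidates.append(parent_begin + end)    # End is relative to the parent's begin
    if dur is not None:
        candidates.append(resolved_begin + dur)  # Begin + Duration

    if candidates:
        # If both End and Duration are present, the earlier one wins.
        return resolved_begin, min(candidates)
    # Neither present: the end resolves to the parent's end (may be indefinite).
    return resolved_begin, parent_end


# No timing properties: the object inherits the parent's interval.
event = resolve_interval(Decimal(10), Decimal(13))
# End (10 + 5 = 15) and Duration (11 + 2 = 13) both present: the earlier applies.
clipped = resolve_interval(Decimal(10), Decimal(60),
                           begin=Decimal(1), end=Decimal(5), dur=Decimal(2))
print(event, clipped)
```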
4.2 Character
In Dubbing Scripts, it is necessary to identify each character in the programme. This is done with a Character object which has the following properties:
a mandatory Identifier
which is a unique identifier used to reference the character from elsewhere in the document,
for example to indicate when a Character participates in a Script Event,
or to link a Character Style to its Character.
a mandatory Name which is the name of the Character in the programme
an optional Talent Name, which is the name of the actor speaking dialogue for this Character
zero or more Character Style objects
which can be applied to control the visual appearance of Script Events spoken by the Character,
for example during recording by an actor or when transforming the script into subtitles.
A Character is represented in a DAPT Document by the following structure and constraints:
A <ttm:agent> element corresponding to the Talent Name MUST be present at the path
/tt/head/metadata/ttm:agent, with the following constraints:
its type attribute MUST be set to person;
its xml:id attribute MUST be set;
it MUST have a <ttm:name> child element whose
type MUST be set to full and its content set to the Talent Name.
If more than one Character is associated with the same
Talent Name there SHOULD be a single
<ttm:agent> element corresponding to that Talent Name,
referenced separately by each of the Characters.
Each <ttm:agent> element corresponding to a Talent Name SHOULD appear before any of the Character <ttm:agent> elements
whose <ttm:actor> child element references it.
The Character is represented in a DAPT Document by a <ttm:agent> element present at the path
/tt/head/metadata/ttm:agent, with the following constraints:
The type attribute MUST be set to character.
The xml:id attribute MUST be present on the ttm:agent and set to the Character Identifier.
The ttm:agent MUST contain a ttm:name element with its type attribute set to alias and its content set to the Character Name.
If the Character has a Talent Name, it MUST contain a <ttm:actor> child element.
That child element MUST have an agent attribute set to
the xml:id of the <ttm:agent> element
corresponding to the Talent Name,
that is, whose type is set to person.
All <ttm:agent> elements SHOULD be contained in the first <metadata> element in the <head> element.
Note
There can be multiple <metadata> elements in the <head> element,
for example to include proprietary metadata
but the above recommends that only one is used to define the characters.
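The Character constraints above lend themselves to a simple structural check. The following Python sketch is illustrative only and is not a conformance test: the function `check_agents`, the sample document and the talent name "Example Actor" are hypothetical, and only a subset of the constraints (identifiers, name types, and the `<ttm:actor>` reference) is checked.

```python
import xml.etree.ElementTree as ET

NS = {
    "tt": "http://www.w3.org/ns/ttml",
    "ttm": "http://www.w3.org/ns/ttml#metadata",
}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical document: one Talent Name agent and one Character that references it.
DOC = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xml:lang="en">
  <head>
    <metadata>
      <ttm:agent type="person" xml:id="actor_A">
        <ttm:name type="full">Example Actor</ttm:name>
      </ttm:agent>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
        <ttm:actor agent="actor_A"/>
      </ttm:agent>
    </metadata>
  </head>
  <body/>
</tt>"""


def has_name(agent, name_type):
    """True if the agent has a non-empty <ttm:name> of the given type."""
    return any(n.get("type") == name_type and (n.text or "").strip()
               for n in agent.findall("ttm:name", NS))


def check_agents(root):
    """Return a list of human-readable problems; an empty list means none found."""
    errors = []
    agents = root.findall("tt:head/tt:metadata/ttm:agent", NS)
    people = {a.get(XML_ID) for a in agents if a.get("type") == "person"} - {None}
    for agent in agents:
        ident, kind = agent.get(XML_ID), agent.get("type")
        if ident is None:
            errors.append(f"{kind} agent has no xml:id")
        if kind == "person" and not has_name(agent, "full"):
            errors.append(f"{ident}: person agent needs a ttm:name of type full")
        if kind == "character":
            if not has_name(agent, "alias"):
                errors.append(f"{ident}: character agent needs a ttm:name of type alias")
            for actor in agent.findall("ttm:actor", NS):
                if actor.get("agent") not in people:
                    errors.append(f"{ident}: ttm:actor does not reference a person agent")
    return errors


problems = check_agents(ET.fromstring(DOC))
print(problems)
```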
Each Character Style is represented by one or more <style> elements
at the path /tt/head/styling/style.
Each such <style> element is associated with the Character by having a ttm:agent attribute
whose value is the xml:id of the <ttm:agent> element representing the Character.
A Script Event MAY apply Character Styles by including the xml:id of each style
in the style attribute of the <div> element that defines that Script Event.
A Text object MAY apply Character Styles by including the xml:id of each style
in the style attribute of the <p> element that defines that Text object.
A Text object SHOULD NOT apply a Character Style for a Character that is not associated with that Text object's Script Event.
Note
Any style attribute defined in [TTML2] or [TTML-IMSC1.2]
(or other profiles using non-W3C namespaces) can be present on the <style> element.
A <style> element MAY omit the ttm:agent attribute if it is not associated with a Character.
Such styles MAY be applied in the same way as any other style, via a reference in the style attribute.
Character Styles are applied to Script Events and Text by using the style attribute to specify the set of applicable styles.
Presentation Processors MUST NOT apply character styles to text if they are not specified using the style attribute.
We should define our own classes of conformant implementation types, to avoid using the generic "presentation processor" or "transformation processor" ones. We could link to them.
At the moment, I can think of the following classes:
DAPT Authoring Tool: tool that produces or consumes compliant DAPT documents. I don't think they map to TTML2 processors.
DAPT Audio Recorder/Renderer: tool that takes DAPT Audio Description scripts, e.g. with mixing instructions, and produces audio output, e.g. a WAVE file. I think it is a "presentation processor".
DAPT Validator: tool that verifies that a DAPT document is compliant with the specification. I'm not sure what it maps to in TTML2 terminology.
...
<styling>
  <style xml:id="style_a" ttm:agent="character_3" tts:color="#FFFFFF" tts:backgroundColor="#8F42AD"/>
</styling>
...
<div xml:id="event_6" ttm:agent="character_3" style="style_a" ... >
  <!-- Script event contents here, in Character 3's style -->
</div>
<div xml:id="event_7" ttm:agent="character_3" style="some_other_style" ... >
  <!-- Script event contents here, not in Character 3's style -->
</div>
4.3 Script Event
A Script Event object represents dialogue, on screen text or audio descriptions to be spoken and has the following properties:
A mandatory Script Event Identifier which is unique in the script
Optional Begin, End and Duration properties
that together define the Script Event's time interval in the programme timeline
Note
Typically Script Events do not overlap in time. However, there can be cases where they do, e.g. in Dubbing Scripts when different Characters speak different text at the same time.
While typically, a Script Event corresponds to one single Character, there are cases where multiple characters can be associated with a Script Event. This is when all Characters speak the same text at the same time.
Empty Text objects can be used to indicate explicitly that there is no text content.
It is recommended that empty Text objects are not used as a workflow placeholder to indicate incomplete work.
The begin and end attributes SHOULD be present.
The dur attribute MAY be present.
Note
As noted in [TTML2] if both an end attribute and a dur attribute are present,
the end time is the earlier of end and (begin + dur).
Note
If timing attributes are omitted, the following rules apply:
The default value for begin is zero, i.e. the same as the begin time of the parent element.
The default value for end is indefinite,
i.e. it resolves to the same as the end time of the parent timed element,
if there is one.
The topmost timed element is the <body> element,
whose end time is for practical purposes the end of the Related Media Object.
The default value for dur is indefinite, i.e. the end time resolves to the same as the end time of the parent element.
The ttm:agent attribute MAY be present and if present,
MUST contain a reference to each ttm:agent that represents an associated Character.
It MAY contain zero or more <p> elements representing each Text object.
The style attribute MAY be present. If present, it MAY contain a reference to the <style> defining the Character Style. Additional style references or inline styles MAY be used.
It MAY contain a <metadata> element representing the On Screen property.
4.4 Text
The Text object contains text content typically in a single language. This language may be the Original language or a Translation language.
The style attribute MAY be present.
If present, it MAY contain a reference to the <style> that defines the relevant Character Style.
Additional style references or inline styles MAY be used as defined in [TTML2],
and MAY be applied to sub-sections of the text defined by <span> elements.
The referenced or inline styles MAY include the
tta:speak or
tta:pitch attributes
representing a Synthesized Audio object.
The <p> element SHOULD have an xml:lang attribute corresponding to the language of the Text object.
Note
If a <p> element omits the xml:lang attribute then its computed language
is derived by inheritance from its parent element, and so forth up to the root <tt> element,
which is required to set the Primary Language via its xml:lang attribute.
Care should be taken if changing the Primary Language of a DAPT Script in case
doing so affects descendant elements unexpectedly.
Authors can mitigate this risk by explicitly setting xml:lang on all <p> elements.
<div xml:id="event_3" begin="9663f" end="9682f" style="style_a" ttm:agent="character_3">
  <p xml:lang="pt-BR" daptm:langSrc="original">Você vai ter.</p>
  <p xml:lang="fr" daptm:langSrc="translation">Bah, il arrive.</p>
</div>
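The language inheritance risk described in the note above can be demonstrated with a short Python sketch. It is a simplified illustration, not a definition of computed language in [TTML2]: the function `computed_languages` and the sample document are hypothetical, and an element without `xml:lang` simply takes the value of its nearest ancestor that has one.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
TT_P = "{http://www.w3.org/ns/ttml}p"

# Hypothetical script: the second <p> relies on inheriting the Primary Language.
DOC = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="fr">
  <body>
    <div>
      <p xml:lang="pt-BR">Você vai ter.</p>
      <p>Bah, il arrive.</p>
    </div>
  </body>
</tt>"""


def computed_languages(root):
    """Map every element to its own xml:lang, or else the nearest ancestor's."""
    result = {}

    def visit(element, inherited):
        lang = element.get(XML_LANG, inherited)
        result[element] = lang
        for child in element:
            visit(child, lang)

    visit(root, None)
    return result


root = ET.fromstring(DOC)
langs = computed_languages(root)
before = [langs[p] for p in root.iter(TT_P)]

# Changing the Primary Language silently relabels the <p> without xml:lang.
root.set(XML_LANG, "en")
langs = computed_languages(root)
after = [langs[p] for p in root.iter(TT_P)]
print(before, after)
```

In this sketch the explicitly labelled `<p>` keeps `pt-BR` in both cases, while the unlabelled French text is reported as `en` after the change, which is the unexpected effect that explicit `xml:lang` attributes avoid.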
Note
In some cases, a single section of original language dialogue can contain some words in other languages.
Rather than splitting a Script Event into multiple Script Events to deal with this,
Text objects in one language can also contain some words in a different language.
This is represented in a DAPT Document by setting the xml:lang attribute on
inner <span> elements.
It MAY contain zero or more <audio> elements representing each Audio Recording object.
It MAY contain zero or more <animate> elements representing each Mixing Instruction object.
4.5 Text Language Source
The Text Language Source property is an annotation indicating whether a Text object is
in the same language as the relevant part of the audio's
language (original), or if it is a representation in another language (translation):
Initial design is to use an abbreviated name and original|translation,
though I considered using an abbreviated value too, since this attribute will appear on every <p> element.
Abbreviating O for Original is probably a bad idea because the letter O and the number 0 can easily be confused.
I also considered P for Primary but that caused potential confusion between Primary Language and Primary Text Language Source.
daptm:langSrc
: "original"
| "translation"
4.6 On Screen
The On Screen property is an annotation indicating
the position in the scene relating to the subject of a Script Event,
for example of the character speaking:
ON - the Script Event's subject is on screen for the entire duration
OFF - the Script Event's subject is off screen for the entire duration
ON_OFF - the Script Event's subject starts on screen, but goes off screen at some point
OFF_ON - the Script Event's subject starts off screen, but goes on screen at some point
If omitted, the default value is "ON".
Note
The On Screen property is represented in a DAPT Document by a
daptm:onScreen attribute on the
<div> element, with the following constraints:
The following attribute corresponding to the On Screen Script Event property may be present:
The Script Event Description does not need to be unique, i.e. it does not need to have a different value for each Script Event.
For example a particular value could be re-used to identify in a human-readable way one or more Script Events that are intended to be processed together,
e.g. in a batch recording.
...
<body>
  <div begin="10s" end="13s">
    <ttm:desc>Scene 1</ttm:desc>
    <p daptm:langSrc="original"><span>A woman climbs into a small sailing boat.</span></p>
    <p xml:lang="fr" daptm:langSrc="translation"><span>Une femme monte à bord d'un petit bateau à voile.</span></p>
  </div>
  <div begin="18s" end="20s">
    <ttm:desc>Scene 1</ttm:desc>
    <p daptm:langSrc="original"><span>The woman pulls the tiller and the boat turns.</span></p>
    <p xml:lang="fr" daptm:langSrc="translation"><span>La femme tire sur la barre et le bateau tourne.</span></p>
  </div>
</body>
...
The <ttm:desc> element MAY have a daptm:descType attribute specified to indicate the type of description. The daptm:descType attribute is defined below. Its possible values are as indicated in the registry at YYY.
...
<body>
  <div begin="10s" end="13s">
    <ttm:desc daptm:descType="pronunciationNote">[oːnʲ]</ttm:desc>
    <p daptm:langSrc="original">Eóin looks around at the other assembly members.</p>
  </div>
</body>
...
Multiple <ttm:desc> elements MAY be present with different values of daptm:descType, as in the following example.
...
<body>
  <div begin="10s" end="13s">
    <ttm:desc daptm:descType="scene">Scene 1</ttm:desc>
    <ttm:desc daptm:descType="plotSignificance">High</ttm:desc>
    <p daptm:langSrc="original"><span>A woman climbs into a small sailing boat.</span></p>
    <p xml:lang="fr" daptm:langSrc="translation"><span>Une femme monte à bord d'un petit bateau à voile.</span></p>
  </div>
  <div begin="18s" end="20s">
    <ttm:desc daptm:descType="scene">Scene 1</ttm:desc>
    <ttm:desc daptm:descType="plotSignificance">Low</ttm:desc>
    <p daptm:langSrc="original"><span>The woman pulls the tiller and the boat turns.</span></p>
    <p xml:lang="fr" daptm:langSrc="translation"><span>La femme tire sur la barre et le bateau tourne.</span></p>
  </div>
</body>
...
4.8 Script Event Type
The Script Event Type property provides one or more space-separated keywords representing the type of the Script Event,
i.e. spoken text, or on-screen text,
and in the latter case, the type of on-screen text (title, credit, location, ...).
The possible keywords are indicated in the registry at XXXX.
An Audio object is used to specify an audio rendering of a Text.
The audio rendering can either be a recorded audio resource,
as an Audio Recording object,
or a directive to synthesize a rendering of the text via a text to speech engine,
which is a Synthesized Audio object.
Both are types of Audio object.
It is an error for an Audio not to be in the same language as its Text.
An Audio Recording is an Audio object that references an audio resource.
It has the following properties:
One or more alternative Sources, each of which is either
1) a link to an external audio resource
or 2) an embedded audio recording;
For each Source, one mandatory Type
that specifies the type ([MIME-TYPES]) of the audio resource,
for example audio/basic;
Optional Begin, End and Duration properties
that together define the Audio Recording's time interval in the programme timeline,
in relation to the parent element's time interval;
An optional In Time and an optional Out Time property
that together define a temporal subsection of the audio resource;
The default In Time is the beginning of the audio resource.
The default Out Time is the end of the audio resource.
If the temporal subsection of the audio resource is longer than
the duration of the Audio Recording's time interval,
then playback MUST be truncated to end when the
Audio Recording's time interval ends.
Note
If the temporal subsection of the audio resource is shorter than
the duration of the Audio Recording's time interval,
then the audio resource plays once.
Implementations can use the Type, and if present,
any relevant additional formatting information,
to decide which Source to play.
For example, given two Sources, one being a WAV file, and the other an MP3,
an implementation that can play only one of those formats,
or is configured to have a preference for one or the other,
would select the playable or preferred version.
An Audio Recording is represented in a DAPT Document by an
<audio>
element child of a <p> or <span> element
corresponding to the Text to which it applies.
The following constraints apply to the <audio> element:
The begin, end and dur attributes
represent respectively the Begin, End and Duration properties;
The clipBegin and clipEnd attributes
represent respectively the In Time and Out Time properties,
as illustrated by Example 5;
For each Source, if it is a link to an external audio resource,
the Source and Type properties are represented by exactly one of:
A src attribute that is not a fragment identifier,
and a type attribute respectively;
This mechanism cannot be used if there is more than one Source.
A <source> child element with a src attribute that is not a fragment identifier,
and a type attribute respectively;
A src attribute that is not a fragment identifier is a URL that references
an external audio resource, i.e. one that is not embedded within the DAPT Script.
No validation that the resource can be located is specified in DAPT.
Editor's note
Do we need both mechanisms here?
It's not clear what semantic advantage the child <source> element carries in this case.
Consider marking use of that child <source> element as "at risk"?
While working on the specification for adding audio recordings I reminded myself of the various ways in which an audio recording can be embedded and referenced, of which there are at least 5 in total. Requirement R15 of [DAPT](https://www.w3.org/TR/dapt-reqs/#requirements) is clear that both referenced and embedded options need to be available, but should we be syntactically restricting the options for each? Will raise as separate issues.
This second option has an additional possibility of specifying a format attribute in case type is inadequate. It also permits multiple <source> child elements, and we specify that in this case the implementation must choose no more than one.
[Edited 2023-03-29 to account for the "play no more than one" constraint added after the issue was opened]
If the Source is an embedded audio resource,
the Source and Type properties are represented together by exactly one of:
A src attribute that is a fragment identifier
that references either
an <audio> element
or a <data> element,
where the referenced element is a
child of /tt/head/resources
and specifies a type attribute
and the xml:id attribute used to reference it;
This mechanism cannot be used if there is more than one Source.
A <source> child element with a
src attribute that is a fragment identifier
that references either
an <audio> element
or a <data> element,
where the referenced element is a
child of /tt/head/resources
and specifies a type attribute
and the xml:id attribute used to reference it;
A <source> child element with a <data> element child
that specifies a type attribute
and contains the audio recording data.
In each of the cases above the type attribute represents the Type property.
A src attribute that is a fragment identifier is a pointer
to an audio resource that is embedded within the DAPT Script.
If <data> elements are defined, each one MUST contain
either #PCDATA or
<chunk> child elements
and MUST NOT contain any <source> child elements.
<data> and <source> elements MAY contain a format attribute
whose value implementations MAY use in addition to the type attribute value
when selecting an appropriate audio resource.
Editor's note
Do we need all 3 mechanisms here?
Do we need any?
There may be a use case for embedding audio data,
since it makes the single document a portable (though large)
entity that can be exchanged and transferred with no concern for missing resources,
and no need for e.g. manifest files.
If we do not need to support referenced embedded audio then only the last option is needed,
and is probably the simplest to implement.
One case for referenced embedded audio is that it more easily allows reuse of the
same audio in different document locations, though that seems like an unlikely
requirement in this use case. Another is that it means that all embedded audio is in
an easily located part of the document in /tt/head/resources, which
potentially could carry an implementation benefit?
Consider marking the embedded data features as "at risk"?
This second option has an additional possibility of specifying a format attribute in case type is inadequate. It also permits multiple <source> child elements, though it is unclear what the semantic is intended to be if multiple resources are specified - presumably, the implementation gets to choose one somehow.
If we are going to support embedded audio resources, they can either be defined in /tt/head/resources and then referenced, or the data can be included inline.
See also #115 - if we are going to support non-inline embedded audio resources, should we make an object for them and add it into the Data Model?
Issue 117: Embedded data: Do we need to support all the permitted encodings? What about length? (labels: question, CR must-have)
In TTML2's <data> element, an encoding can be specified, being one of:
base16
base32
base32hex
base64
base64url
Do we need to require processor support for all of them, or will the default base64 be adequate?
Also, it is possible to specify a length attribute that provides some feasibility of error checking, since the decoded data must be the specified length in bytes. Is requiring support for this a net benefit? Would it be used?
The computed value of xml:lang MUST be identical
to the computed value of xml:lang of the parent element
and any child <source> elements
and any referenced embedded <data> elements.
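As a hedged illustration of the constraints above, the following sketch shows one way each kind of Source might be written. The URLs, the xml:id values and the truncated base64 payload are assumptions made for this example only, and the attributes that DAPT requires on the <tt> element are omitted for brevity.

```xml
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <head>
    <resources>
      <!-- Embedded audio resource; the payload is a truncated placeholder,
           not a playable recording -->
      <data xml:id="embeddedAudio1" type="audio/wav" encoding="base64">UklGRiQAAABXQVZF</data>
    </resources>
  </head>
  <body>
    <div xml:id="se1" begin="10s" end="13s">
      <p>
        <!-- Single external Source: src and type attributes on <audio>;
             clipBegin and clipEnd represent the In Time and Out Time -->
        <audio src="https://example.com/audio/se1.wav" type="audio/wav"
               clipBegin="5s" clipEnd="8s"/>
        <span>A woman climbs into a small sailing boat.</span>
      </p>
    </div>
    <div xml:id="se2" begin="18s" end="20s">
      <p>
        <!-- Two alternative external Sources as <source> children;
             an implementation selects the playable or preferred one -->
        <audio>
          <source src="https://example.com/audio/se2.wav" type="audio/wav"/>
          <source src="https://example.com/audio/se2.mp3" type="audio/mpeg"/>
        </audio>
        <span>The woman pulls the tiller and the boat turns.</span>
      </p>
    </div>
    <div xml:id="se3" begin="25s" end="28s">
      <p>
        <!-- Embedded Source: the fragment identifier references the
             <data> element that is a child of /tt/head/resources -->
        <audio src="#embeddedAudio1"/>
        <span>The sails billow in the wind.</span>
      </p>
    </div>
  </body>
</tt>
```

No xml:lang attribute is set on the <audio>, <source> or <data> elements here, so each inherits the document language and the computed values match.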
4.9.2 Synthesized Audio
A Synthesized Audio is an Audio object that represents
a machine generated audio rendering of the parent Text content.
It has the following properties:
A mandatory Rate that specifies the rate of speech, being
normal,
fast or
slow;
An optional Pitch that allows adjustment of the pitch
of the speech.
A Synthesized Audio is represented in a DAPT Document by
the application of a
tta:speak
style attribute on the element representing the Text object to be spoken,
where the computed value of the attribute is
normal, fast or slow.
This attribute also represents the Rate Property.
The tta:pitch
style attribute represents the Pitch property.
A tta:pitch attribute on an element
whose computed value of tta:speak is none
has no effect.
Such an element is not considered to have an associated Synthesized Audio.
Note
The semantics of the Synthesized Audio vocabulary of DAPT are derived from equivalent features in [SSML] as indicated in [TTML2]. This version
of the specification does not specify how other features of [SSML] can be either generated from DAPT or embedded
into DAPT documents. The option to extend [SSML] support in future versions of this specification is deliberately left open.
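A minimal sketch of a Text with an associated Synthesized Audio might look as follows. The pitch value "+10%" is an assumption about the value syntax and should be checked against the tta:pitch definition in [TTML2]; the attributes that DAPT requires on the <tt> element are omitted for brevity.

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tta="http://www.w3.org/ns/ttml#audio"
    xml:lang="en">
  <body>
    <div xml:id="se1" begin="10s" end="13s">
      <!-- tta:speak="normal" requests a synthesized rendering and represents the Rate;
           tta:pitch represents the Pitch (value syntax assumed, see lead-in) -->
      <p tta:speak="normal" tta:pitch="+10%">
        <span>A woman climbs into a small sailing boat.</span>
      </p>
    </div>
  </body>
</tt>
```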
4.10 Mixing Instruction
A Mixing Instruction object is a static or animated adjustment
of the audio relating to the containing object.
It has the following properties:
Zero or more Gain properties.
The gain acts as a multiplier to be applied to the related Audio;
Zero or more Pan properties.
The pan adjusts the stereo (left/right) position;
Optional Begin, End and Duration properties
that together define the time interval during which the Mixing Instruction
applies;
An optional Fill property that specifies whether,
at the end time of an animated Mixing Instruction,
the specified Gain and Pan properties should be
retained (freeze) or reverted (remove).
A Mixing Instruction is represented by applying audio style attributes
to the element that corresponds to the relevant object, either inline,
by reference to a <style> element, or in a child (inline)
<animate> element:
The tta:gain
attribute represents the Gain property;
The tta:pan
attribute represents the Pan property.
If the Mixing Instruction is animated, that is,
if the adjustment properties change during the
containing object's active time interval, then it is represented by
one or more child <animate> elements.
This representation is required if more than one Gain or Pan property is needed,
or if any timing properties are needed.
The <animate>
element(s) MUST be children of
the element corresponding to the containing object,
and have the following constraints:
The begin, end and dur attributes
represent respectively the Begin, End and Duration properties;
The tta:gain
attribute represents the Gain property,
and uses the animation-value-list
syntax to express the list of values to be applied during the animation period;
The tta:pan
attribute represents the Pan property,
and uses the animation-value-list
syntax to express the list of values to be applied during the animation period.
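The two representations described above can be sketched as follows: a static Mixing Instruction written as inline attributes, and an animated one written as child <animate> elements. The gain and pan values are illustrative assumptions, and the attributes that DAPT requires on the <tt> element are omitted for brevity.

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tta="http://www.w3.org/ns/ttml#audio"
    xml:lang="en">
  <body>
    <div xml:id="se1" begin="25s" end="28s">
      <!-- Animated Gain on the Script Event: the value list moves from 1 to 0.39
           during the first 0.3s, and fill="freeze" retains the final value -->
      <animate begin="0s" end="0.3s" tta:gain="1;0.39" fill="freeze"/>
      <!-- A second animation returns the Gain from 0.39 to 1 during the final 0.3s -->
      <animate begin="2.7s" end="3s" tta:gain="0.39;1"/>
      <!-- Static Mixing Instruction on the Text: one Gain and one Pan, no timing -->
      <p tta:gain="0.8" tta:pan="-0.5">
        <span>The sails billow in the wind.</span>
      </p>
    </div>
  </body>
</tt>
```

The begin and end values on each <animate> element are relative to the begin time of the containing <div>, so the two animations cover the first and last 0.3s of the Script Event.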
The predefined entities are (including the leading ampersand and trailing semicolon):
&amp; for an ampersand & (Unicode code point U+0026)
&apos; for an apostrophe ' (Unicode code point U+0027)
&gt; for a greater than sign > (Unicode code point U+003E)
&lt; for a less than sign < (Unicode code point U+003C)
&quot; for a quote symbol " (Unicode code point U+0022)
Note
A DAPT Document can also be used as an in-memory model
for processing, in which case the serialisation requirements do not apply.
5.2 Foreign Elements and Attributes
A DAPT Document MAY contain elements and attributes that are neither specifically permitted nor forbidden by a profile.
Note
DAPT Documents remain subject to the content conformance requirements specified at Section 3.1 of [TTML2].
In particular, a DAPT Document can contain elements and attributes not in any TT namespace, i.e. in foreign namespaces, since such elements and attributes are pruned by the algorithm at Section 4 of [TTML2] prior to evaluating content conformance.
Note
For validation purposes it is good practice to define and use a content specification for all foreign namespace elements and attributes used within a DAPT Document.
Many dubbing and audio description workflows permit annotation of Script Events or documents with proprietary metadata.
Metadata vocabulary defined in this specification or in [TTML2] MAY be included.
Additional vocabulary in other namespaces MAY also be included.
Note
It is possible to add information such as the title of the programme using [TTML2] constructs.
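A minimal sketch of this, assuming a hypothetical programme title and omitting the attributes that DAPT requires on the <tt> element:

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xml:lang="en">
  <head>
    <metadata>
      <!-- Hypothetical programme title, carried in the TTML2 ttm:title element -->
      <ttm:title>A Sailing Adventure</ttm:title>
    </metadata>
  </head>
  <body/>
</tt>
```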
Note
It is possible to add workflow-specific information using a foreign namespace.
In the following example, a fictitious namespace vendorm from an "example vendor" is used
to provide document-level information not defined by DAPT.
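One possible form of such a document is sketched here. The namespace URI and every vendorm element name are invented for illustration; none of them is defined by DAPT, by [TTML2], or by any real vendor.

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:vendorm="http://www.example-vendor.com/ns/ttml#metadata"
    xml:lang="en">
  <head>
    <metadata>
      <!-- Foreign namespace elements carrying hypothetical workflow information -->
      <vendorm:programType>Episode</vendorm:programType>
      <vendorm:episodeSeason>5</vendorm:episodeSeason>
      <vendorm:episodeNumber>8</vendorm:episodeNumber>
      <vendorm:internalId>15734</vendorm:internalId>
    </metadata>
  </head>
  <body/>
</tt>
```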
The namespace prefix values defined above are for convenience and DAPT Documents MAY use any prefix value that conforms to [xml-names].
The namespaces defined by this proposal document are mutable [namespaceState]; all undefined names in these namespaces are reserved for future standardization by the W3C.
5.4 Related Media Object (TTML)
Within DAPT, the common language terms audio and video are used in the context of a programme.
The audio and video are each a part of what is defined in [TTML2] as the
Related Media Object that
provides the media timeline and is the source of the main programme audio,
and any visual timing references needed when adjusting timings relevant to the video image,
such as for lip synchronization.
Note
A DAPT document can identify the programme acting
as the Related Media Object using metadata. For example, it is possible
to use the <ebuttm:sourceMediaIdentifier> element defined in [EBU-TT-3390].
If the DAPT Document is intended to be used as the basis for producing
an [TTML-IMSC1.2] document,
the synchronization provisions of [TTML-IMSC1.2] apply
in relation to the video.
Timed content within the DAPT Document is intended to be rendered
starting and ending on specific audio samples.
Note
In the context of this specification rendering could be visual presentation of text,
for example to show an actor what words to speak, or could be audible playback of an audio resource,
or could be physical or haptic, such as a Braille display.
In constrained applications, such as real-time audio mixing and playback,
if accurate synchronization to the audio sample cannot be achieved in the rendered output,
the combined effects of authoring and playback inaccuracies in
timed changes in presentation SHOULD meet the synchronization requirements
of [EBU-R37], i.e. audio changes are not to precede image changes by
more than 40ms, and are not to follow them by more than 60ms.
Likewise, authoring applications SHOULD allow authors to meet the
requirements of [EBU-R37] by defining times with an accuracy
such that changes to audio are less than 15ms after any associated change in
the video image, and less than 5ms before any associated change in the video image.
Taken together, the above two constraints on overall presentation and
on DAPT documents intended for real-time playback mean that
content processors SHOULD complete audio presentation changes
no more than 35ms before the time specified in the DAPT document
(the 40ms limit less the 5ms authoring allowance)
and no more than 45ms after the time specified
(the 60ms limit less the 15ms authoring allowance).
The ttp:contentProfiles attribute
is used to declare the [TTML2] profiles to which the document conforms.
TTML documents representing DAPT Scripts MUST specify a ttp:contentProfiles attribute
on the <tt> element including one value equal to the
DAPT 1.0 Content Profile designator.
Other values MAY be present to declare conformance to other profiles of [TTML2],
and MAY include profile designators in proprietary namespaces.
TTML documents representing DAPT Scripts MAY specify a ttp:processorProfiles attribute
on the <tt> element.
If present, the ttp:processorProfiles attribute MUST include one value equal to
the designator of the DAPT 1.0 Processor Profile.
Other values MAY be present to declare additional processing constraints,
and MAY include profile designators in proprietary namespaces.
Note
ttp:processorProfiles can be used
to signal that features and extensions in additional profiles
need to be supported to process the Document Instance successfully.
For example, a local workflow might introduce particular metadata requirements,
and signal that the processor needs to support those by using an additional
processor profile designator.
Note
If the content author does not need to signal
processor requirements beyond those defined by DAPT
in order to process the DAPT document, then the
ttp:processorProfiles attribute is not expected to be present.
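The following sketch shows how these attributes might appear on the <tt> element. The two DAPT designators are taken from the profile definitions in E. Profiles; the second processor profile designator, under example.com, is a hypothetical local workflow profile introduced only for this example. Other attributes that DAPT requires on the <tt> element are omitted for brevity.

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xml:lang="en"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    ttp:processorProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/processor
                           http://example.com/ns/profile/local-workflow">
  <!-- ttp:contentProfiles is mandatory and includes the DAPT 1.0 Content Profile designator.
       ttp:processorProfiles is optional; when present it includes the DAPT 1.0
       Processor Profile designator, here followed by a hypothetical additional profile. -->
  <body/>
</tt>
```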
5.6.5 Other TTML2 Profile Vocabulary
[TTML2] specifies a vocabulary and semantics that can be used to define the set of features
that a document instance can make use of, or that a processor needs to support,
known as a Profile.
Except where specified, it is not a requirement of DAPT that this profile vocabulary is supported by
processors; nevertheless such support is permitted.
The majority of this profile vocabulary is used to indicate how a processor can compute the set of features
that it needs to support in order to process the Document Instance successfully.
The vocabulary is itself defined in terms of TTML2 features.
Those profile-related features are listed within E. Profiles as being optional.
They MAY be implemented in processors
and their associated vocabulary
MAY be present in Document Instances.
Note
Unless processor support for these features and vocabulary has been
arranged (using an out-of-band protocol), the vocabulary is not expected to be present.
The additional profile-related vocabulary for which processor support is
not required (but is permitted) in DAPT is:
Within a DAPT Script, the following constraints apply in relation to time attributes and time expressions:
5.7.1 ttp:timeBase
The only permitted ttp:timeBase value is media,
since E. Profiles prohibits all timeBase features
other than #timeBase-media.
This means that the beginning of the document timeline,
i.e. time "zero",
is the beginning of the Related Media Object.
5.7.2 timeContainer
The only permitted value of the timeContainer attribute is the default value, par.
Documents SHOULD omit the timeContainer attribute on all elements.
Documents MUST NOT set the timeContainer attribute to any value other than par on any element.
Note
This means that the begin attribute value for every timed element is relative to
the computed begin time of its parent element,
or for the <body> element, to time zero.
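As a sketch of how these relative times resolve, consider the following fragment. The timing values are illustrative, and the attributes that DAPT requires on the <tt> element are omitted for brevity.

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xml:lang="en"
    ttp:timeBase="media">
  <body>
    <!-- <body> has no begin attribute, so it begins at time zero;
         the <div> is therefore active from 10s to 13s on the media timeline -->
    <div xml:id="se1" begin="10s" end="13s">
      <!-- Times on <p> are relative to the begin time of the <div>:
           the <p> is active from 11s to 12.5s on the media timeline -->
      <p begin="1s" end="2.5s">
        <span>A woman climbs into a small sailing boat.</span>
      </p>
    </div>
  </body>
</tt>
```

No timeContainer attribute appears anywhere in the fragment, so every element uses the default value, par.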
5.7.3 ttp:frameRate
If the document contains any time expression that uses the f metric,
or any time expression that contains a frames component,
the ttp:frameRate attribute MUST be present on the <tt> element.
5.7.4 ttp:tickRate
If the document contains any time expression that uses the t metric,
the ttp:tickRate attribute MUST be present on the <tt> element.
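A hedged sketch of a document that uses both metrics follows. The frame rate of 25 and tick rate of 10000 are arbitrary values chosen to keep the arithmetic simple, and the attributes that DAPT requires on the <tt> element are omitted for brevity.

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xml:lang="en"
    ttp:frameRate="25"
    ttp:tickRate="10000">
  <body>
    <!-- f metric, so ttp:frameRate is present on <tt>:
         250 frames at 25 frames per second is 10s, and 325 frames is 13s -->
    <div xml:id="se1" begin="250f" end="325f">
      <p><span>A woman climbs into a small sailing boat.</span></p>
    </div>
    <!-- t metric, so ttp:tickRate is present on <tt>:
         180000 ticks at 10000 ticks per second is 18s, and 200000 ticks is 20s -->
    <div xml:id="se2" begin="180000t" end="200000t">
      <p><span>The woman pulls the tiller and the boat turns.</span></p>
    </div>
  </body>
</tt>
```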
5.7.5 Time expressions
All time expressions within a document SHOULD use the same syntax,
either clock-time or offset-time
as defined in [TTML2], with DAPT constraints applied.
Note
A DAPT clock-time has one of the forms:
hh:mm:ss.sss
hh:mm:ss
where
hh is hours,
mm is minutes,
ss is seconds, and
ss.sss is seconds with a decimal fraction of seconds (any precision).
Note
Clock time expressions that use frame components,
which look similar to "time code",
are prohibited due to the semantic confusion that has been observed
elsewhere when they are used, particularly with non-integer frame rates,
"drop modes" and sub-frame rates.
Note
An offset-time has one of the forms:
nn metric
nn.nn metric
where
nn is an integer,
nn.nn is a number with a decimal fraction (any precision), and
metric is one of:
h for hours,
m for minutes,
s for seconds,
ms for milliseconds,
f for frames, and
t for ticks.
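To illustrate the two syntaxes side by side, the sketch below uses clock-time throughout and gives the equivalent offset-time expressions in comments. The timing values are illustrative, and the attributes that DAPT requires on the <tt> element are omitted for brevity.

```xml
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <!-- clock-time in the hh:mm:ss form;
         the equivalent offset-time expressions are 10s and 13s -->
    <div xml:id="se1" begin="00:00:10" end="00:00:13">
      <p><span>A woman climbs into a small sailing boat.</span></p>
    </div>
    <!-- clock-time in the hh:mm:ss.sss form with a decimal fraction;
         the equivalent offset-time expressions are 18.5s (or 18500ms) and 20.25s -->
    <div xml:id="se2" begin="00:00:18.5" end="00:00:20.25">
      <p><span>The woman pulls the tiller and the boat turns.</span></p>
    </div>
  </body>
</tt>
```

Keeping to one syntax across the whole document, as here, follows the recommendation that all time expressions use the same form.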
When mapping a media time expression M to a frame F of the video,
e.g. for the purpose of accurately timing lip synchronization,
the content processor SHOULD map M to the frame F with the presentation time
that is the closest to, but not less than, M.
For example, a media time expression of 00:00:05.1 corresponds to frame
ceiling( 5.1 × ( 1000 / 1001 × 30) ) = ceiling( 152.847... ) = 153
of a video that has a frame rate of 1000 / 1001 × 30 ≈ 29.97.
5.8 Layout
This specification does not put additional constraints on the layout and rendering features defined in [TTML-IMSC1.2].
Note
Layout of the paragraphs may rely on the default TTML region (i.e. if no <layout> element is used in the <head> element) or may be explicit by the use of the region attribute, to refer to a <region> element present at /tt/head/layout/region.
5.9 Bidirectional text
The following metadata elements are permitted in DAPT and specified in [TTML2] as containing #PCDATA,
i.e. text data only with no element content.
Where bidirectional text is required within the character content of such an element,
Unicode control characters can be used to define the base direction within arbitrary ranges of text.
<ttm:copyright>
<ttm:desc>
<ttm:item>
<ttm:name>
<ttm:title>
The content elements <p> and <span> permit the direction of text
to be specified using the tts:direction and tts:unicodeBidi attributes.
Document authors should use this more robust mechanism rather than using Unicode control characters.
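A minimal sketch of the attribute-based mechanism follows. The text content is invented for illustration, and the attributes that DAPT requires on the <tt> element are omitted for brevity.

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling"
    xml:lang="en">
  <body>
    <div xml:id="se1" begin="10s" end="13s">
      <p>
        <span>The sign on the boat reads </span>
        <!-- Right-to-left run marked up with attributes
             rather than with Unicode control characters -->
        <span xml:lang="ar" tts:direction="rtl" tts:unicodeBidi="embed">مرحبا</span>
      </p>
    </div>
  </body>
</tt>
```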
6. Conformance
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST, MUST NOT, SHOULD, and SHOULD NOT in this document
are to be interpreted as described in
BCP 14
[RFC2119] [RFC8174]
when, and only when, they appear in all capitals, as shown here.
[TTML2] specifies a formal language for expressing document and processor requirements,
within the Profiling sub-system.
The normative requirements of this specification are defined using the conformance terminology
described above, and are also defined using this TTML2 profile mechanism.
Where TTML2 vocabulary is referenced, the syntactic and semantic requirements relating to that
vocabulary as defined in [TTML2] apply.
Whilst there is no requirement for a DAPT processor to implement the
TTML2 profile processing semantics in general,
implementers can use the TTML2 profiles defined in E. Profiles
as a means of verifying that their implementations meet the normative requirements
of DAPT, for example as a checklist.
Conversely, a general purpose [TTML2] processor that does support the TTML2
profile processing semantics can use the TTML2 profiles defined in E. Profiles
directly to determine if it is capable of processing a DAPT document.
With the exception of the following, the privacy considerations of [TTML2] apply:
Appendix P.3 (Resource Fetching), where only the considerations relating to the <audio> element apply.
Appendix P.7 (Access to Processing State) does not apply, since no support for the condition attribute is required.
Appendix P.9 (Privacy of Preference) is extended to include the potential indication that the user has a need for audio description.
B.1 Personal Information
DAPT documents typically contain the names of characters or people who feature within the associated media,
either fictional or real.
In general this information would be present within the media itself or be public via other routes.
If there is sensitivity associated with their being known to people with access to
the DAPT documents in which their identity is contained,
then such access should be managed with appropriate confidentiality.
For example those documents could be available within a closed authoring environment and
edited to remove the sensitive information prior to distribution to a wider audience.
If this scenario arises, information security good practices within the closed environment should be applied,
such as encryption of the document "at rest" and when being moved,
access controlled by authentication platforms, etc.
B.2 Audio format preference
DAPT documents can reference a set of alternate external audio resources for the same fragment of audio,
where the processor is expected to select one of the alternatives based on features such as format support.
If this pattern is used, it is possible that the processor's choice of audio resource,
being exposed to the origin, reveals information about that processor, such as its preferred audio format.
Applying the Mixing Instructions can be implemented using [webaudio].
Figure 2 shows the flow of programme audio
when audio-generating elements are active.
The pan and gain (if set) on the Script Event are applied first,
and the output is passed to the Text,
which mixes in the audio from any active Audio Recording,
itself subject to its own Mixing Instructions.
The Text's Mixing Instructions are then applied to the result,
before the output is mixed on to the master bus.
The above examples are simplified in at least two ways:
if a Text contains
<span> elements that themselves have Mixing Instructions
applied, then additional nodes would be needed;
the application of animated Mixing Instructions is not
shown explicitly. [webaudio] supports the timed variation of
input parameters to its nodes: it is possible to translate the
TTML <animate> semantics directly into
[webaudio] API calls to achieve the equivalent effect.
A TTML Profile specification is
a document that lists all the features of TTML that are required / optional / prohibited
within “document instances” (files) and “processors” (things that process the files),
and any extensions or constraints.
MUST satisfy all normative provisions specified by the profile;
MAY include any vocabulary, syntax or attribute value associated with a feature or extension whose disposition is permitted or optional in the profile;
MUST include any vocabulary, syntax or attribute value associated with a feature or extension whose disposition is required in the profile.
MUST NOT include any vocabulary, syntax or attribute value associated with a feature or extension whose disposition is prohibited in the profile.
MUST satisfy the Generic Processor Conformance requirements at Section 3.2.1 of [TTML2]
MUST satisfy all normative provisions specified by the profile; and
MUST implement presentation semantic support for every feature or extension designated as permitted or required by the profile, subject to any additional constraints on each feature as specified by the profile.
MAY implement presentation semantic support for every feature or extension designated as optional or prohibited by the profile, subject to any additional constraints on each feature as specified by the profile.
MUST satisfy the Generic Processor Conformance requirements at Section 3.2.1 of [TTML2];
MUST satisfy all normative provisions specified by the profile; and
MUST implement transformation semantic support for every feature or extension designated as permitted or required by the profile, subject to any additional constraints on each feature as specified by the profile.
MAY implement transformation semantic support for every feature or extension designated as optional or prohibited by the profile, subject to any additional constraints on each feature as specified by the profile.
The dispositions required, permitted, optional and prohibited as used in this specification
map to the [TTML2] <ttp:feature>
and <ttp:extension> elements'
value attribute values as follows:
The use of the terms presentation processor
and transformation processor within this document does not imply conformance
per se to any of the Standard Profiles defined in [TTML2].
In other words, it is not considered an error for
a presentation processor or transformation processor
to conform to the profile defined in this document
without also conforming to the TTML2 Presentation Profile or the TTML2 Transformation Profile.
Note
The use of the [TTML2] profiling
sub-system to describe DAPT conformance within this specification
is not intended to imply that DAPT processors are required to support any features of that
system other than those for which support is explicitly required by DAPT.
The permitted and prohibited dispositions do not refer to the specification of
a <ttp:feature> or <ttp:extension> element
as being permitted or prohibited within a <ttp:profile> element.
For example, a DAPT Script can include syntax permitted
by the IMSC ([TTML-IMSC1.2]) profiles of [TTML2] to enhance the presentation
of scripts to actors recording audio,
or to add styling important for later usage in subtitle or caption creation.
Editor's note
Editorial task: go through this list of features and check the disposition of each.
There should be no prohibited features that are permitted in IMSC.
This is the profile expression of the prohibition of xml:lang
on <audio> having a different computed value to the
parent element and descendant or referenced <source>
and <data> elements, as specified in
4.9.1 Audio Recording.
The DAPT Content Profile expresses the conformance requirements of DAPT Scripts
using the profile mechanism of [TTML2].
It can be used by a validating processor that supports the DAPT Processor Profile
to validate a DAPT Document.
There is no requirement to include the DAPT Content Profile within a Document Instance.
<?xml version="1.0" encoding="utf-8"?>
<!-- this file defines the "dapt-content" profile of ttml -->
<profile xmlns="http://www.w3.org/ns/ttml#parameter"
    designator="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    combine="mostRestrictive"
    type="content">
  <features xml:base="http://www.w3.org/ns/ttml/feature/">
    <!-- required (mandatory) feature support -->
    <feature value="required">#structure</feature>
    <feature value="required">#timeBase-media</feature>
    <!-- optional (voluntary) feature support -->
    <feature value="optional">#animate-fill</feature>
    <feature value="optional">#animate-minimal</feature>
    <feature value="optional">#audio</feature>
    <feature value="optional">#audio-description</feature>
    <feature value="optional">#audio-speech</feature>
    <feature value="optional">#bidi</feature>
    <feature value="optional" extends="#bidi">#bidi-version-2</feature>
    <feature value="optional">#chunk</feature>
    <feature value="optional">#content</feature>
    <feature value="optional">#contentProfiles</feature>
    <feature value="optional">#contentProfiles-combined</feature>
    <feature value="optional">#core</feature>
    <feature value="optional">#data</feature>
    <feature value="optional">#direction</feature>
    <feature value="optional">#embedded-audio</feature>
    <feature value="optional">#embedded-data</feature>
    <feature value="optional">#frameRate</feature>
    <feature value="optional">#frameRateMultiplier</feature>
    <feature value="optional">#gain</feature>
    <feature value="optional">#metadata</feature>
    <feature value="optional">#metadata-item</feature>
    <feature value="optional" extends="#metadata">#metadata-version-2</feature>
    <feature value="optional">#pan</feature>
    <feature value="optional">#permitFeatureNarrowing</feature>
    <feature value="optional">#permitFeatureWidening</feature>
    <feature value="optional">#pitch</feature>
    <feature value="optional">#presentation-audio</feature>
    <feature value="optional">#processorProfiles</feature>
    <feature value="optional">#processorProfiles-combined</feature>
    <feature value="optional">#resources</feature>
    <feature value="optional" extends="#animation">#set</feature>
    <feature value="optional">#set-fill</feature>
    <feature value="optional">#set-multiple-styles</feature>
    <feature value="optional">#source</feature>
    <feature value="optional">#speak</feature>
    <feature value="optional">#speech</feature>
    <feature value="optional">#styling</feature>
    <feature value="optional">#styling-chained</feature>
    <feature value="optional">#styling-inheritance-content</feature>
    <feature value="optional">#styling-inline</feature>
    <feature value="optional">#styling-referential</feature>
    <feature value="optional">#tickRate</feature>
    <feature value="optional">#time-clock</feature>
    <feature value="optional">#time-offset</feature>
    <feature value="optional">#time-offset-with-frames</feature>
    <feature value="optional">#time-offset-with-ticks</feature>
    <feature value="optional">#timing</feature>
    <feature value="optional">#unicodeBidi</feature>
    <feature value="optional">#unicodeBidi-isolate</feature>
    <feature value="optional" extends="#unicodeBidi">#unicodeBidi-version-2</feature>
    <feature value="optional">#xlink</feature>
    <!-- prohibited feature support -->
    <feature value="prohibited">#animation-out-of-line</feature>
    <feature value="prohibited">#clockMode</feature>
    <feature value="prohibited">#clockMode-gps</feature>
    <feature value="prohibited">#clockMode-local</feature>
    <feature value="prohibited">#clockMode-utc</feature>
    <feature value="prohibited">#dropMode</feature>
    <feature value="prohibited">#dropMode-dropNTSC</feature>
    <feature value="prohibited">#dropMode-dropPAL</feature>
    <feature value="prohibited">#dropMode-nonDrop</feature>
    <feature value="prohibited">#markerMode</feature>
    <feature value="prohibited">#markerMode-continuous</feature>
    <feature value="prohibited">#markerMode-discontinuous</feature>
    <feature value="prohibited">#subFrameRate</feature>
    <feature value="prohibited">#time-clock-with-frames</feature>
    <feature value="prohibited">#time-wall-clock</feature>
    <feature value="prohibited">#timeBase-clock</feature>
    <feature value="prohibited">#timeBase-smpte</feature>
    <feature value="prohibited">#timeContainer</feature>
  </features>
  <extensions xml:base="http://www.w3.org/ns/ttml/profile/dapt/extension/">
    <!-- required (mandatory) extension support -->
    <extension value="required">#contentProfiles-root</extension>
    <extension value="required">#scriptType-root</extension>
    <extension value="required">#serialization</extension>
    <extension value="required">#textLanguageSource</extension>
    <extension value="required">#workflowType-root</extension>
    <extension value="required">#xmlId-div</extension>
    <extension value="required">#xmlLang-root</extension>
    <!-- optional (voluntary) extension support -->
    <extension value="optional">#agent</extension>
    <extension value="optional">#onScreen</extension>
    <!-- prohibited extension support -->
    <extension value="prohibited">#profile-root</extension>
    <extension value="prohibited">#source-data</extension>
    <extension value="prohibited">#xmlLang-audio-nonMatching</extension>
  </extensions>
</profile>
E.3 DAPT Processor Profile
The DAPT Processor Profile expresses the processing requirements of DAPT Scripts
using the profile mechanism of [TTML2].
A processor that supports the required features and extensions of the DAPT Processor Profile
can, minimally, process all permitted features within a DAPT Document.
There is no requirement to include the DAPT Processor Profile within a Document Instance.
<?xml version="1.0" encoding="utf-8"?>
<!-- this file defines the "dapt-processor" profile of ttml -->
<profile xmlns="http://www.w3.org/ns/ttml#parameter"
    designator="http://www.w3.org/ns/ttml/profile/dapt1.0/processor"
    combine="mostRestrictive"
    type="processor">
  <features xml:base="http://www.w3.org/ns/ttml/feature/">
    <!-- required (mandatory) feature support -->
    <feature value="required">#animate-fill</feature>
    <feature value="required">#animate-minimal</feature>
    <feature value="required">#audio</feature>
    <feature value="required">#audio-description</feature>
    <feature value="required">#audio-speech</feature>
    <feature value="required">#bidi</feature>
    <feature value="required" extends="#bidi">#bidi-version-2</feature>
    <feature value="required">#chunk</feature>
    <feature value="required">#content</feature>
    <feature value="required">#contentProfiles</feature>
    <feature value="required">#core</feature>
    <feature value="required">#data</feature>
    <feature value="required">#direction</feature>
    <feature value="required">#embedded-audio</feature>
    <feature value="required">#embedded-data</feature>
    <feature value="required">#frameRate</feature>
    <feature value="required">#frameRateMultiplier</feature>
    <feature value="required">#gain</feature>
    <feature value="required">#metadata</feature>
    <feature value="required">#metadata-item</feature>
    <feature value="required" extends="#metadata">#metadata-version-2</feature>
    <feature value="required">#pan</feature>
    <feature value="required">#pitch</feature>
    <feature value="required">#presentation-audio</feature>
    <feature value="required">#resources</feature>
    <feature value="required" extends="#animation">#set</feature>
    <feature value="required">#set-fill</feature>
    <feature value="required">#set-multiple-styles</feature>
    <feature value="required">#source</feature>
    <feature value="required">#speak</feature>
    <feature value="required">#speech</feature>
    <feature value="required">#structure</feature>
    <feature value="required">#styling</feature>
    <feature value="required">#styling-chained</feature>
    <feature value="required">#styling-inheritance-content</feature>
    <feature value="required">#styling-inline</feature>
    <feature value="required">#styling-referential</feature>
    <feature value="required">#tickRate</feature>
    <feature value="required">#time-clock</feature>
    <feature value="required">#time-offset</feature>
    <feature value="required">#time-offset-with-frames</feature>
    <feature value="required">#time-offset-with-ticks</feature>
    <feature value="required">#timeBase-media</feature>
    <feature value="required">#timing</feature>
    <feature value="required">#transformation</feature>
    <feature value="required">#unicodeBidi</feature>
    <feature value="required">#unicodeBidi-isolate</feature>
    <feature value="required" extends="#unicodeBidi">#unicodeBidi-version-2</feature>
    <feature value="required">#xlink</feature>
    <!-- optional (voluntary) feature support -->
    <feature value="optional">#animation-out-of-line</feature>
    <feature value="optional">#clockMode</feature>
    <feature value="optional">#clockMode-gps</feature>
    <feature value="optional">#clockMode-local</feature>
    <feature value="optional">#clockMode-utc</feature>
    <feature value="optional">#contentProfiles-combined</feature>
    <feature value="optional">#dropMode</feature>
    <feature value="optional">#dropMode-dropNTSC</feature>
    <feature value="optional">#dropMode-dropPAL</feature>
    <feature value="optional">#dropMode-nonDrop</feature>
    <feature value="optional">#markerMode</feature>
    <feature value="optional">#markerMode-continuous</feature>
    <feature value="optional">#markerMode-discontinuous</feature>
    <feature value="optional">#permitFeatureNarrowing</feature>
    <feature value="optional">#permitFeatureWidening</feature>
    <feature value="optional">#processorProfiles</feature>
    <feature value="optional">#processorProfiles-combined</feature>
    <feature value="optional">#subFrameRate</feature>
    <feature value="optional">#time-clock-with-frames</feature>
    <feature value="optional">#time-wall-clock</feature>
    <feature value="optional">#timeBase-clock</feature>
    <feature value="optional">#timeBase-smpte</feature>
    <feature value="optional">#timeContainer</feature>
  </features>
  <extensions xml:base="http://www.w3.org/ns/ttml/profile/dapt/extension/">
    <!-- required (mandatory) extension support -->
    <extension value="required">#agent</extension>
    <extension value="required">#contentProfiles-root</extension>
    <extension value="required">#onScreen</extension>
    <extension value="required">#scriptType-root</extension>
    <extension value="required">#serialization</extension>
    <extension value="required">#textLanguageSource</extension>
    <extension value="required">#workflowType-root</extension>
    <extension value="required">#xmlId-div</extension>
    <extension value="required">#xmlLang-root</extension>
    <!-- optional (voluntary) extension support -->
    <extension value="optional">#profile-root</extension>
    <extension value="optional">#source-data</extension>
    <extension value="optional">#xmlLang-audio-nonMatching</extension>
  </extensions>
</profile>
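As an informal illustration of how an implementation might consume a profile document such as the one above, the following Python sketch groups feature or extension designators by their value attribute. It is a minimal sketch, not part of this specification: the function name designators_by_value and the PROFILE_EXCERPT constant are hypothetical, and the excerpt contains only a few of the entries listed in the full profile.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace of the TTML2 <profile>, <feature> and <extension> elements,
# as declared on the profile document above.
TTP = "http://www.w3.org/ns/ttml#parameter"

# Short excerpt of the processor profile; the full profile has many more entries.
PROFILE_EXCERPT = """<profile xmlns="http://www.w3.org/ns/ttml#parameter"
    designator="http://www.w3.org/ns/ttml/profile/dapt1.0/processor"
    combine="mostRestrictive" type="processor">
  <features xml:base="http://www.w3.org/ns/ttml/feature/">
    <feature value="required">#audio</feature>
    <feature value="optional">#clockMode</feature>
  </features>
  <extensions xml:base="http://www.w3.org/ns/ttml/profile/dapt/extension/">
    <extension value="required">#agent</extension>
    <extension value="optional">#profile-root</extension>
  </extensions>
</profile>"""


def designators_by_value(profile_xml: str, kind: str) -> dict[str, list[str]]:
    """Group <feature> or <extension> designators by their value attribute.

    kind is either "feature" or "extension" (hypothetical parameter).
    """
    root = ET.fromstring(profile_xml)
    grouped = defaultdict(list)
    for elem in root.iter(f"{{{TTP}}}{kind}"):
        grouped[elem.get("value")].append((elem.text or "").strip())
    return dict(grouped)


extensions = designators_by_value(PROFILE_EXCERPT, "extension")
features = designators_by_value(PROFILE_EXCERPT, "feature")
```

The sketch leaves the designators as fragment identifiers and does not apply the combine attribute or the extends relationships.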
F. Extensions
F.1 General
The following sections define extension designations,
expressed as relative URIs (fragment identifiers)
relative to the DAPT Extension Namespace base URI.
These extension designations are used in E. Profiles
to describe the normative provisions of DAPT that are not expressed
by [TTML2] profile features.
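For illustration only, resolving an extension designation to an absolute URI can be sketched in Python with standard URI reference resolution. The base URI is the xml:base value used on the extensions element in E. Profiles; the function name absolute_designation is hypothetical.

```python
from urllib.parse import urljoin

# xml:base of the <extensions> element in the DAPT profiles.
EXTENSION_BASE = "http://www.w3.org/ns/ttml/profile/dapt/extension/"


def absolute_designation(fragment: str, base: str = EXTENSION_BASE) -> str:
    """Resolve a relative extension designation such as "#agent" against base."""
    return urljoin(base, fragment)


agent_uri = absolute_designation("#agent")
```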
F.2 #agent
A transformation processor supports the #agent extension if
it recognizes and is capable of transforming values of the following
elements and attributes on
the <ttm:agent> element:
xml:id attribute
<ttm:name> element
and if it recognizes and is capable of transforming each of the following value combinations:
<ttm:agent> element with type="person"
and child <ttm:name> element with type="full";
<ttm:agent> element with type="character"
and child <ttm:name> element with type="alias".
A presentation processor supports the #agent extension if
it implements presentation semantic support of the above listed
elements, attributes and value combinations.
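The following Python sketch illustrates, under stated assumptions, how a transformation processor might recognize the elements, attributes and value combinations listed above. The document fragment, the agent identifiers and names, and the function name read_agents are all hypothetical; the sketch reads the values but does not transform them.

```python
import xml.etree.ElementTree as ET

TTM = "http://www.w3.org/ns/ttml#metadata"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The two value combinations named in this section.
RECOGNIZED = {("person", "full"), ("character", "alias")}

# Hypothetical example document; identifiers and names are invented.
EXAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xml:lang="en">
  <head>
    <metadata>
      <ttm:agent type="person" xml:id="actor_A">
        <ttm:name type="full">Example Actor</ttm:name>
      </ttm:agent>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">Example Character</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
</tt>"""


def read_agents(document_xml: str) -> dict:
    """Map each agent's xml:id to its type, its names, and whether it uses
    one of the recognized (agent type, name type) combinations."""
    root = ET.fromstring(document_xml)
    agents = {}
    for agent in root.iter(f"{{{TTM}}}agent"):
        names = [
            (name.get("type"), (name.text or "").strip())
            for name in agent.findall(f"{{{TTM}}}name")
        ]
        agents[agent.get(XML_ID)] = {
            "type": agent.get("type"),
            "names": names,
            "recognized": any(
                (agent.get("type"), name_type) in RECOGNIZED
                for name_type, _ in names
            ),
        }
    return agents


agents = read_agents(EXAMPLE)
```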
F.3 #contentProfiles-root
A transformation processor supports the #contentProfiles-root extension if
it recognizes and is capable of transforming values of the
ttp:contentProfiles attribute on the <tt> element.
A presentation processor supports the #contentProfiles-root extension if
it implements presentation semantic support of the
ttp:contentProfiles attribute on the <tt> element.
F.4 #onScreen
A transformation processor supports the #onScreen extension if
it recognizes and is capable of transforming values of the
daptm:onScreen attribute on the <div> element.
A presentation processor supports the #onScreen extension if
it implements presentation semantic support of the
daptm:onScreen attribute on the <div> element.
F.5 #profile-root
A transformation processor supports the #profile-root extension if
it recognizes and is capable of transforming values of the
ttp:profile attribute on the <tt> element.
A presentation processor supports the #profile-root extension if
it implements presentation semantic support of the
ttp:profile attribute on the <tt> element.
F.6 #scriptType-root
A transformation processor supports the #scriptType-root extension if
it recognizes and is capable of transforming values of the
daptm:scriptType attribute on the <tt> element.
A presentation processor supports the #scriptType-root extension if
it implements presentation semantic support of the
daptm:scriptType attribute on the <tt> element.
An example of a transformation processor that supports this extension is
a validating processor that provides appropriate feedback, for example warnings,
when the SHOULD requirements defined in 4.1.2 Script Type for a
DAPT Document's daptm:scriptType are not met,
and that reports an error if the extension is required by a
content profile but the Document Instance claiming
conformance to that profile either does not have a
daptm:scriptType attribute on the <tt> element
or has one whose value is not defined herein.
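The error-reporting part of such a validating processor could be sketched as follows. This is a non-normative Python sketch with several assumptions: the daptm namespace URI and the example value "originalTranscript" are assumed here and should be checked against the normative definitions, the set of permitted values is supplied by the caller rather than hard-coded, and the warnings for the SHOULD requirements of 4.1.2 Script Type are not implemented. The function name check_script_type is hypothetical.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
# Assumed namespace URI for the daptm prefix; verify against the specification.
DAPTM = "http://www.w3.org/ns/ttml/profile/dapt#metadata"


def check_script_type(root: ET.Element, allowed_values: set[str]) -> list[str]:
    """Return error messages for a missing or undefined daptm:scriptType."""
    if root.tag != f"{{{TT}}}tt":
        return ["error: root element is not <tt>"]
    value = root.get(f"{{{DAPTM}}}scriptType")
    if value is None:
        return ["error: daptm:scriptType is missing on <tt>"]
    if value not in allowed_values:
        return [f"error: daptm:scriptType value {value!r} is not defined"]
    return []


# Caller-supplied value set; this single value is an assumption for illustration.
ALLOWED = {"originalTranscript"}

ok_doc = ET.fromstring(
    '<tt xmlns="http://www.w3.org/ns/ttml" '
    'xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata" '
    'daptm:scriptType="originalTranscript"/>'
)
missing_doc = ET.fromstring('<tt xmlns="http://www.w3.org/ns/ttml"/>')

ok_errors = check_script_type(ok_doc, ALLOWED)
missing_errors = check_script_type(missing_doc, ALLOWED)
```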
F.7 #serialization
A serialized document that is valid with respect to the #serialization
extension is
an XML 1.0 [xml] document encoded using
UTF-8 character encoding as specified in [UNICODE],
that contains no entity declarations and
no entity references other than to predefined entities.
A transformation processor that writes documents supports
the #serialization extension if
it can write a serialized document as defined above.
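As a rough illustration of the constraints above, the following Python sketch checks a byte sequence for UTF-8 decodability, entity declarations, entity references other than the five predefined entities, and well-formedness. It is a heuristic, not a conformance test: the text scan does not skip comments or CDATA sections, and the encoding declaration is not inspected. The function name serialization_problems is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The five entities predefined by XML 1.0.
PREDEFINED_ENTITIES = {"amp", "lt", "gt", "apos", "quot"}
# Matches named entity references; numeric character references such as
# &#x20; are not entity references and are not matched.
ENTITY_REF = re.compile(r"&([A-Za-z_:][A-Za-z0-9_.:-]*);")


def serialization_problems(data: bytes) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8"]
    problems = []
    if "<!ENTITY" in text:
        problems.append("contains an entity declaration")
    for name in ENTITY_REF.findall(text):
        if name not in PREDEFINED_ENTITIES:
            problems.append(f"references non-predefined entity &{name};")
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")
    return problems


good = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<tt xmlns="http://www.w3.org/ns/ttml"><body><div>'
    b"<p>Tom &amp; Jerry&#x21;</p></div></body></tt>"
)
bad = b'<!DOCTYPE tt [<!ENTITY co "Example">]><tt>&co;</tt>'

good_problems = serialization_problems(good)
bad_problems = serialization_problems(bad)
```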
F.8 #source-data
A transformation processor supports the #source-data extension if
it recognizes and is capable of transforming values of the
<source> element
child of a
<data> element.
A presentation processor supports the #source-data extension if
it implements presentation semantic support of the
<source> element
child of a
<data> element.
F.9 #textLanguageSource
A transformation processor supports the #textLanguageSource extension if
it recognizes and is capable of transforming values of the
daptm:langSrc attribute on the <p> element.
A presentation processor supports the #textLanguageSource extension if
it implements presentation semantic support of the
daptm:langSrc attribute on the <p> element.
F.10 #workflowType-root
A transformation processor supports the #workflowType-root extension if
it recognizes and is capable of transforming values of the
daptm:workflowType attribute on the <tt> element.
A presentation processor supports the #workflowType-root extension if
it implements presentation semantic support of the
daptm:workflowType attribute on the <tt> element.
An example of a transformation processor that supports this extension is
a validating processor that reports an error if the extension is required by a
content profile but the Document Instance claiming
conformance to that profile either does not have a
daptm:workflowType attribute on the <tt> element
or has one whose value is not defined herein.
F.11 #xmlId-div
A transformation processor supports the #xmlId-div extension if
it recognizes and is capable of transforming values of the
xml:id attribute on the <div> element.
A presentation processor supports the #xmlId-div extension if
it implements presentation semantic support of the
xml:id attribute on the <div> element.
F.12 #xmlLang-audio-nonMatching
A transformation processor supports the #xmlLang-audio-nonMatching extension if
it recognizes and is capable of transforming values of the
xml:lang attribute on the <audio> element
that differ from the computed value of the same attribute of its
parent element or any of its descendant or referenced
<source> or <data> elements,
known as non-matching values.
A presentation processor supports the #xmlLang-audio-nonMatching extension if
it implements presentation semantic support of such non-matching xml:lang attribute values.
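A simplified detection of non-matching values might look like the following Python sketch. It makes several assumptions: only audio elements that carry their own xml:lang attribute are examined, descendant source and data elements are compared using their explicitly specified xml:lang values, referenced (non-descendant) elements are not followed, and language tags are compared as exact strings. The function name non_matching_audio and the file names in the example are hypothetical.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
AUDIO, SOURCE, DATA = (f"{{{TT}}}{n}" for n in ("audio", "source", "data"))


def non_matching_audio(root: ET.Element) -> list[tuple[str, str]]:
    """Return (audio xml:lang, differing xml:lang) pairs found in the tree."""
    found = []

    def walk(elem: ET.Element, inherited: str) -> None:
        # Computed xml:lang of elem: its own value, else the inherited one.
        computed = elem.get(XML_LANG, inherited)
        for child in elem:
            own = child.get(XML_LANG)
            if child.tag == AUDIO and own is not None:
                if own != computed:
                    found.append((own, computed))
                for desc in child.iter():
                    if desc is not child and desc.tag in (SOURCE, DATA):
                        lang = desc.get(XML_LANG)
                        if lang is not None and lang != own:
                            found.append((own, lang))
            walk(child, computed)

    walk(root, "")
    return found


DOC = ET.fromstring(
    """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body><div><p>
    <audio xml:lang="fr" src="clip1.wav"/>
    <audio src="clip2.wav"/>
  </p></div></body>
</tt>"""
)

pairs = non_matching_audio(DOC)
```

Because the content profile marks this extension as prohibited, a validator could report each returned pair as an error.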
F.13 #xmlLang-root
A transformation processor supports the #xmlLang-root extension if
it recognizes and is capable of transforming values of the
xml:lang attribute on the <tt> element.
A presentation processor supports the #xmlLang-root extension if
it implements presentation semantic support of the
xml:lang attribute on the <tt> element.
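The root-element extensions in this appendix lend themselves to a single presence check. The Python sketch below, offered as a non-normative illustration, maps the root extensions that the content profile in E. Profiles marks as required or prohibited to the attribute each one concerns and reports violations. The daptm namespace URI, the attribute values in the example, and the function name root_attribute_errors are assumptions; attribute values are not validated.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
TTP = "http://www.w3.org/ns/ttml#parameter"
# Assumed namespace URI for the daptm prefix; verify against the specification.
DAPTM = "http://www.w3.org/ns/ttml/profile/dapt#metadata"

# Root extensions marked "required" in the content profile.
REQUIRED_ROOT_ATTRIBUTES = {
    "#contentProfiles-root": f"{{{TTP}}}contentProfiles",
    "#scriptType-root": f"{{{DAPTM}}}scriptType",
    "#workflowType-root": f"{{{DAPTM}}}workflowType",
    "#xmlLang-root": f"{{{XML_NS}}}lang",
}
# Root extension marked "prohibited" in the content profile.
PROHIBITED_ROOT_ATTRIBUTES = {"#profile-root": f"{{{TTP}}}profile"}


def root_attribute_errors(root: ET.Element) -> list[str]:
    """Report missing required and present prohibited attributes on <tt>."""
    errors = []
    for extension, attribute in REQUIRED_ROOT_ATTRIBUTES.items():
        if root.get(attribute) is None:
            errors.append(f"{extension}: required attribute is missing")
    for extension, attribute in PROHIBITED_ROOT_ATTRIBUTES.items():
        if root.get(attribute) is not None:
            errors.append(f"{extension}: prohibited attribute is present")
    return errors


# Attribute values here are assumptions chosen for illustration.
good_root = ET.fromstring(
    '<tt xmlns="http://www.w3.org/ns/ttml" '
    'xmlns:ttp="http://www.w3.org/ns/ttml#parameter" '
    'xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata" '
    'xml:lang="en" '
    'ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content" '
    'daptm:scriptType="originalTranscript" '
    'daptm:workflowType="audioDescription"/>'
)
bad_root = ET.fromstring(
    '<tt xmlns="http://www.w3.org/ns/ttml" '
    'xmlns:ttp="http://www.w3.org/ns/ttml#parameter" '
    'ttp:profile="http://example.com/profile"/>'
)

good_errors = root_attribute_errors(good_root)
bad_errors = root_attribute_errors(bad_root)
```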
G. Acknowledgments
The editors would like to thank XXX for their contributions to this specification.