Internationalizing SSML
Kazuyuki Ashimura,
W3C, Team contact for the Voice Browser Working Group
<ashimura@w3.org>
Why Internationalize SSML?
Global users of the Web
- The Web is not only for native English speakers but for everyone in the world.
- SSML might be used for services that connect one country with another, such as international calls.
- SSML should provide features for the spoken languages of all countries and regions in the world.
Extension of SSML ability
- Enhancements for non-English languages to make SSML more useful in current and emerging markets (e.g. China, Korea, Japan, etc.).
- More precise pronunciation identification and prosodic controls are essential for richer speech synthesis.
- Work on non-English speech synthesis, especially for Asian languages, offers many useful suggestions.
Problem to be solved: Pronunciation ambiguity
- The SSML 1.0 vocabulary provides various ways to eliminate pronunciation ambiguities.
- However, many problems still remain...
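As a minimal sketch of the existing SSML 1.0 mechanisms, the following Python snippet embeds a small SSML document that uses <sub>, <phoneme>, and <say-as>, then parses it to confirm that it is well-formed. The sample words, the IPA string, and the say-as values ("date", "mdy") are illustrative assumptions; SSML 1.0 itself does not define the values of interpret-as, so their handling depends on the synthesis processor.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Three SSML 1.0 elements that pin down how a token should be read.
# All sample content below is illustrative, not taken from the slides.
ssml = f"""<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">
  <s><sub alias="World Wide Web Consortium">W3C</sub></s>
  <s><phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme></s>
  <s><say-as interpret-as="date" format="mdy">12/01/2005</say-as></s>
</speak>"""

# Parsing fails with ParseError if the markup is not well-formed XML.
root = ET.fromstring(ssml)

# Strip the namespace to list the element names in document order.
used = [element.tag.split("}")[1] for element in root.iter()]
print(used)
```

Each of these elements resolves one reading at a time and carries no information about pitch accent or tone.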
Example of pronunciation ambiguity in Japanese (1)
The same character sequence can have several different meanings depending on its pitch accent.
Note: "'" marks an accent nucleus (= a perceived fall in pitch).
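To make the "'" notation concrete, here is a small Python sketch that reads a marked kana string and reports where the accent nucleus falls. The helper name accent_nucleus and the sample word (はし in Tokyo-style accent) are our own illustrative choices, not taken from the slide, and the code assumes one kana character per mora and at most one "'" mark.

```python
def accent_nucleus(marked: str) -> tuple[str, int]:
    """Return (plain reading, nucleus position in moras).

    A position of 0 means the reading has no accent nucleus (flat).
    Assumes one character per mora and at most one "'" mark.
    """
    position = marked.find("'")
    if position == -1:
        return marked, 0
    return marked.replace("'", ""), position


# One kana sequence, three meanings, three accent patterns.
readings = {
    "chopsticks": "は'し",  # pitch falls after the first mora
    "bridge": "はし'",      # pitch falls after the second mora
    "edge": "はし",         # no fall
}

for meaning, marked in readings.items():
    plain, nucleus = accent_nucleus(marked)
    print(f"{meaning}: {plain}, nucleus after mora {nucleus}")
```

All three entries share the same plain reading, so the text alone cannot select the accent pattern.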
Example of pronunciation ambiguity in Japanese (2)
Sometimes the same character sequence can even have opposite meanings, depending on the combination of duration and intonation.
Controls for prosodic information
To solve the problem of pronunciation ambiguity, SSML needs additional specification.
- In particular, controls for prosodic information are essential for Asian tonal languages.
- Such controls can be specified for each step of the TTS process to control each database and/or model (e.g. model selection, model parameters).
Categories of prosodic controls
According to Fujisaki, prosodic information is classified into three categories.
We should therefore consider these three categories when we discuss prosodic controls.
- Linguistic Information
  - Symbolic information represented by a set of discrete symbols and rules for their combination.
  - It is either represented explicitly in the written language or can be easily and uniquely inferred from context.
  - It is discrete and categorical, for example, character sequences, parts of speech, accent types, etc.
- Paralinguistic Information
  - Information not inferable from the written counterpart but deliberately added by the speaker to modify or supplement the linguistic information.
  - It can be both discrete and continuous, for example, duration and speech rate, fundamental frequency transition, spectrum transition, etc.
- Nonlinguistic Information
  - Information concerning factors such as the age, gender, idiosyncrasy, and physical and emotional states of the speaker.
  - It is not directly related to linguistic or paralinguistic information, and is not generally under the control of the speaker.
Possible prosodic controls
- There are various prosodic controls that are useful for rendering non-English languages.
- Some of them are already included in SSML 1.0; others should be added.
- Additional topics and extensions to the current SSML will be proposed in this Workshop ;-)
Items in black: examples of potential controls borrowed from Fujisaki's definition
Items in red: elements for prosodic controls in SSML 1.0
Category of prosody | Input Level | Text Analysis | Prosody Analysis | Waveform Production
Linguistic Information | character sequences, part of speech, accent types; <p>, <s>, <say-as>, <sub>, <lexicon>, <phoneme> | ? | ? |
Paralinguistic Information | ? | duration and speech rate, fundamental frequency transition, spectrum transition; <prosody>, <emphasis>, <break> | |
Nonlinguistic Information | ? | ? | age, gender, idiosyncrasy, physical and emotional states of the speaker; <voice>, <audio> |
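The grouping of SSML 1.0 elements in the table above can be sketched as a lookup in Python. This is only an illustration under stated assumptions: the short category keys, the function name controls_by_category, and the sample document are hypothetical, and the mapping simply restates the table.

```python
import xml.etree.ElementTree as ET

# Grouping taken from the table above; the short keys are our own labels.
ELEMENTS_BY_CATEGORY = {
    "linguistic": ["p", "s", "say-as", "sub", "lexicon", "phoneme"],
    "paralinguistic": ["prosody", "emphasis", "break"],
    "nonlinguistic": ["voice", "audio"],
}
CATEGORY_OF = {
    name: category
    for category, names in ELEMENTS_BY_CATEGORY.items()
    for name in names
}


def controls_by_category(ssml: str) -> dict[str, list[str]]:
    """Group the SSML elements used in a document by prosodic category."""
    found: dict[str, list[str]] = {}
    for element in ET.fromstring(ssml).iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # drop the XML namespace
        category = CATEGORY_OF.get(local_name)
        if category is not None:  # e.g. <speak> itself is not in the table
            found.setdefault(category, []).append(local_name)
    return found


sample = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    '<p><s>Hello <emphasis>world</emphasis><break time="500ms"/></s></p>'
    '<voice gender="female">Goodbye</voice>'
    "</speak>"
)
print(controls_by_category(sample))
```

A report like this shows at a glance which categories a document already controls and which it leaves to the synthesis processor.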
Let's get started
Goals & Scope of the workshop
- To identify and prioritize extensions and additions to SSML that will improve the use of SSML for rendering non-English languages.
- The scope of the workshop is not limited to Asian languages.
- Suggestions for enhancements to SSML for the support of any non-English language are welcome, especially if they are relevant for multiple languages.
Topics
- Diacritics for auto-completion
- Representing special word classes
- Representing word boundaries
- Denoting language and character sets
- Tones
- Sentence structure
- Words with multiple pronunciations and meanings
- Text with multiple languages
- Expression, speaking style, and focus
- Other extensions and/or additions to SSML