Copyright ©2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document details the responses made by the Voice Browser Working Group to issues raised during the Candidate Recommendation (beginning 18 December 2003 and ending 18 February 2004) review of Speech Synthesis Markup Language (SSML) Version 1.0 Candidate Recommendation. Comments were provided by Voice Browser Working Group members, other W3C Working Groups, and the public via the www-voice-request@w3.org (archive) mailing list.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document of the W3C's Voice Browser Working Group describes the disposition of comments as of June 29, 2004 on Speech Synthesis Markup Language (SSML) Version 1.0 Candidate Recommendation. It may be updated, replaced or rendered obsolete by other W3C documents at any time.
Comments on this document and requests for further information should be sent to the Working Group's public mailing list www-voice@w3.org (archive). Note that, as a precaution against spam, you should first subscribe to this list by sending an email to <www-voice-request@w3.org> with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe).
This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only).
This document describes the disposition of comments in relation to the Speech Synthesis Markup Language (SSML) Version 1.0 (http://www.w3.org/TR/2003/CR-speech-synthesis-20031218/). Each issue is described by the name of the commenter, a description of the issue, and either the resolution or the reason that the issue was not resolved.
Notation: Each original comment is tracked by a "Candidate Recommendation Public Comment" [CRPC] designator. Each point within that original comment is identified by a point number. For example, "CRPC5-1" is the first point in the fifth CR public comment for the specification.
Item | Commenter | Proposed disposition | Status |
CRPC1-1 | David Descamps | N/A (Question) | Implicitly accepted |
CRPC1-2 | David Descamps | N/A (Question) | Implicitly accepted |
CRPC2-1 | David Descamps | N/A (Question) | Implicitly accepted |
CRPC3-1 | Roopa Trivedi | N/A (Question) | Implicitly accepted |
CRPC4-1 | Susan Lesch | Accepted | Accepted |
CRPC4-2 | Susan Lesch | Partially accepted | Accepted |
CRPC5-1 | Roopa Trivedi | N/A (Question) | Implicitly accepted |
CRPC6-72 | I18N Interest Group | Accepted | Accepted |
From David Descamps
quote:
"...
Relative changes in prosodic parameters should be carried across voice changes. However, different voices have different natural defaults for pitch, speaking rate, etc. because they represent different personalities, so absolute values of the prosodic parameters may vary across changes in the voice.
..."
If I understand that correctly, the synthesis processor must distinguish between, for example, a baseline pitch change and a relative pitch change?
In the "prosody" element: when you change your pitch by
- a number followed by "Hz", you change the baseline pitch
- a relative change, or "x-low", "low", "medium", "high", "x-high", or "default", you change the relative pitch.
Is that right?
Proposed disposition: N/A (Question)
Yes, the processor must differentiate between absolute changes of the baseline and changes relative to the baseline.
A number followed by "Hz" and "x-low" through "x-high", etc. are absolute (baseline) pitch changes. Only relative changes change the relative pitch.
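The distinction drawn in this response can be illustrated with a small classifier for `pitch` values. This is a sketch under our own simplified reading of the value syntax: the labels and an unsigned number followed by "Hz" are treated as absolute (baseline) values, while a signed number followed by "Hz" or "st", or a percentage, is treated as relative. The function name `classify_pitch` and the regular expressions are illustrative assumptions, not part of SSML.

```python
import re

# Labels allowed for the pitch attribute of <prosody>; these set the baseline.
PITCH_LABELS = {"x-low", "low", "medium", "high", "x-high", "default"}

# Simplified patterns (our own approximation of the value syntax, not normative):
ABSOLUTE_HZ = re.compile(r"\d+(\.\d+)?Hz")             # e.g. "120Hz"
RELATIVE_UNIT = re.compile(r"[+-]\d+(\.\d+)?(Hz|st)")  # e.g. "+10Hz", "-2st"
PERCENTAGE = re.compile(r"[+-]?\d+(\.\d+)?%")          # e.g. "15%", "-8.0%"


def classify_pitch(value: str) -> str:
    """Return "absolute" for a baseline change or "relative" for a relative one."""
    if value in PITCH_LABELS or ABSOLUTE_HZ.fullmatch(value):
        return "absolute"
    if RELATIVE_UNIT.fullmatch(value) or PERCENTAGE.fullmatch(value):
        return "relative"
    raise ValueError(f"unrecognized pitch value: {value!r}")


for v in ["120Hz", "x-high", "+10Hz", "-2st", "15%"]:
    print(v, classify_pitch(v))
```

Note that the sign is what separates "+10Hz" (relative) from "10Hz" (absolute) in this sketch.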
Email Trail:
From David Descamps
If my understanding of the previous point is correct, does a baseline change in a "prosody" element cancel a previous relative change?
Proposed disposition: N/A (Question)
Yes. Note that it would only cancel relevant relative changes. For example, setting the baseline pitch would not reset relative pitch *range* changes.
Although it is easy to construct silly or bizarre example combinations of absolute and relative changes, most of which will hopefully be ignored by intelligent processors, the goal of this separation was to simplify the case where an author has increased the tempo/pitch/etc. of a voice and wishes that same relative change to apply when a small amount of text in another language (and voice) is embedded in the stream.
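To make the cancellation rule concrete, here is a sketch of per-voice prosody state. The `ProsodyState` class, its fields, and the helper functions are hypothetical; they model only what this response states: an absolute pitch change discards accumulated relative *pitch* changes but leaves relative *range* changes alone, and a voice change swaps in a new natural baseline while carrying the relative changes across.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProsodyState:
    """Hypothetical per-voice prosody state; not a structure defined by SSML."""
    baseline_pitch_hz: float  # absolute baseline pitch currently in effect
    relative_pitch_hz: float  # accumulated relative pitch change
    relative_range_st: float  # accumulated relative pitch *range* change

    def effective_pitch(self) -> float:
        return self.baseline_pitch_hz + self.relative_pitch_hz


def set_baseline_pitch(state: ProsodyState, hz: float) -> ProsodyState:
    # An absolute pitch change cancels earlier relative *pitch* changes only;
    # relative *range* changes are left untouched.
    return replace(state, baseline_pitch_hz=hz, relative_pitch_hz=0.0)


def change_pitch_relative(state: ProsodyState, delta_hz: float) -> ProsodyState:
    # Relative changes accumulate on top of the current baseline.
    return replace(state, relative_pitch_hz=state.relative_pitch_hz + delta_hz)


def change_voice(state: ProsodyState, natural_baseline_hz: float) -> ProsodyState:
    # A new voice brings its own natural baseline; relative changes carry across.
    return replace(state, baseline_pitch_hz=natural_baseline_hz)


raised = change_pitch_relative(ProsodyState(100.0, 0.0, 2.0), 10.0)
print(change_voice(raised, 180.0).effective_pitch())
print(set_baseline_pitch(raised, 120.0))
```

Keeping the baseline and the relative offsets in separate fields is what lets a voice change replace one without losing the other.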
Email Trail:
From David Descamps
quote:
"...
Relative changes in prosodic parameters should be carried across voice changes. However, different voices have different natural defaults for pitch, speaking rate, etc. because they represent different personalities, so absolute values of the prosodic parameters may vary across changes in the voice.
..."
How does the synthesis processor have to deal with a relative pitch change in hertz or in semitones?
/******
For example, you have a male voice (baseline pitch of about 100 Hz) with a relative change of +10 Hz. You change to a female voice (baseline pitch of about 180 Hz) and you keep the relative pitch change:
male : 100 -> 110 Hz : +10.0%
female : 180 -> 190 Hz : +5.6%
The proportion of your relative pitch change in hertz has been corrupted!
*******/
How does the synthesis processor have to deal with a relative change in Hz or st across a voice change: keep the value or keep the proportion?
Proposed disposition: N/A (Question)
You would keep the value. It is already possible to make relative changes in percentage terms. It is the author's responsibility to specify the relative change in the terms desired.
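The arithmetic behind this answer can be checked with a sketch that applies the same relative change to both baselines from the example. The helper names are illustrative, and the semitone conversion assumes the usual equal-tempered definition (one semitone is a frequency ratio of 2^(1/12)). Keeping a value in Hz preserves the offset; keeping a percentage, or a value in semitones, preserves the proportion.

```python
def apply_hz(baseline_hz: float, delta_hz: float) -> float:
    """Relative change in hertz: the same offset is added to any baseline."""
    return baseline_hz + delta_hz


def apply_percent(baseline_hz: float, percent: float) -> float:
    """Relative change as a percentage: scales with the baseline."""
    return baseline_hz * (1 + percent / 100)


def apply_semitones(baseline_hz: float, semitones: float) -> float:
    """Relative change in semitones: a fixed frequency ratio of 2**(st/12)."""
    return baseline_hz * 2 ** (semitones / 12)


for name, baseline in [("male", 100.0), ("female", 180.0)]:
    hz = apply_hz(baseline, 10)
    pct = apply_percent(baseline, 10)
    st = apply_semitones(baseline, 2)
    print(f"{name}: +10Hz -> {hz:.1f} Hz ({(hz - baseline) / baseline:+.1%}), "
          f"+10% -> {pct:.1f} Hz, +2st -> {st:.1f} Hz")
```

With "+10Hz" the female voice moves by only about 5.6% instead of 10%, whereas "+10%" or "+2st" moves both voices by the same proportion; this is why the choice of unit is left to the author.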
Email Trail:
From Roopa Trivedi
SSML 1.0 spec says that
"gender: optional attribute indicating the preferred gender of the voice to speak the contained text. Enumerated values are: "male", "female", "neutral""
Does this mean that "neutral" is yet another type of gender supported by some TTS vendors? Or does it mean that the user does not wish to specify a gender and thus uses "neutral" to leave the gender selection elsewhere?
Proposed disposition: N/A (Question)
When you use the "<voice>" element, you are asking the synthesis processor for the "best" voice for your application. So if you ask for a "neutral" voice, the processor will do its best to find the voice that best suits your request.
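As an illustration only, a processor might treat `gender="neutral"` as one more preference to match against its own voice inventory. The inventory, the voice names, and the fallback policy below are all hypothetical; SSML leaves the actual selection strategy to the processor.

```python
import xml.etree.ElementTree as ET

SSML_NS = "{http://www.w3.org/2001/10/synthesis}"

# Hypothetical inventory; a real processor exposes its own voices.
VOICES = [
    {"name": "voice-a", "gender": "male"},
    {"name": "voice-b", "gender": "female"},
    {"name": "voice-c", "gender": "neutral"},
]


def pick_voice(ssml: str) -> str:
    """Pick a voice for the first <voice> child of <speak> (illustrative policy)."""
    root = ET.fromstring(ssml)
    voice = root.find(f"{SSML_NS}voice")
    wanted = voice.get("gender") if voice is not None else None
    # Prefer an exact gender match; otherwise fall back to the default voice.
    for candidate in VOICES:
        if candidate["gender"] == wanted:
            return candidate["name"]
    return VOICES[0]["name"]


doc = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US"><voice gender="neutral">Hello.</voice></speak>"""
print(pick_voice(doc))
```

A real processor would typically weigh several attributes (language, gender, age, variant) together rather than matching on gender alone.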
Email Trail:
From Susan Lesch
Analysis: The document is served as ISO-8859-1, as far as I can tell from .htaccess, but the change notes say "Changed examples to use utf-8." So somewhere in production there is an encoding mismatch. For example, § and the Japanese example text beginning 今日 [etc.] are displayed as sequences of garbled Latin-1 characters.
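The mismatch described in this analysis is easy to reproduce: bytes written as UTF-8 but decoded as ISO-8859-1 turn each non-ASCII character into two or more Latin-1 characters. The function names below are illustrative; the sketch only shows the failure mode and how the original text can be recovered.

```python
def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode as ISO-8859-1, as a mislabelled page would be."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo the misreading: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


garbled = misread_as_latin1("§")
assert garbled == "\u00c2\u00a7"  # the two characters "Â§"
assert repair(garbled) == "§"
```

Because ISO-8859-1 maps every byte to a character, the misreading never raises an error, which is why such problems tend to surface only when someone reads the rendered page.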
2004-03-09: Max proposed a response.
2004-05-18: SSML was written to be UTF-8. There still seems to be a problem with the document not being served as UTF-8. Dan will send email to Dave and Max asking them to address this problem. Assigned to Dave and Max.
2004-05-18: Dan sent email to Dave and Max.
2004-05-25: Max requests the most recent version of the document. Dan sends his final SSML draft of last year to Max and Dave.
2004-06-01: We agree to the following public response: "The original document was in UTF-8, but an error occurred somewhere in the publication process. It is our understanding that all specifications are now being served in UTF-8, so this should not be a problem for the Proposed Recommendation and Recommendation."
Proposed disposition: Accepted
The original document was in UTF-8, but an error occurred somewhere in the publication process. It is our understanding that all specifications are now being served in UTF-8, so this should not be a problem for the Proposed Recommendation and Recommendation.
Email Trail:
From Susan Lesch
Analysis: These caps can be lowercase to match your RFC 2119 convention:
- the processor MUST render
- The processor SHOULD also
- text MUST be rendered
But because of this use of must:
"Defining a comprehensive set of text format types is difficult because of the variety of languages that must be considered and because of the innate flexibility of written languages."
RFC 2119 markup would help. There is example XHTML and CSS in the Manual of Style (can be adapted).
2004-03-09: Max proposed a partial response.
2004-05-18: Group accepts Max's suggestion. Group accepts Susan's suggestion to lowercase the three keyword instances listed. Style changes suggested by Susan are not considered necessary and will only be done if the Editor has spare time.
Proposed disposition: Partially accepted
We will convert the three keyword instances you list to lower case.
We will change the offending "must" to "have to" in the sentence you quote.
Thank you for the style suggestion. We may or may not implement this, as time permits.
Email Trail:
From Roopa Trivedi
The SSML 1.0 spec says in the following section http://www.w3.org/TR/speech-synthesis/#S3.1.8
that
"SSML only specifies the say-as element, its attributes, and their purpose. It does not enumerate the possible values for the attributes. The Working Group expects to produce a separate document that will define standard values and associated normative behavior for these values."
Is there a separate document already that we can refer to for the values of say-as attributes? If yes, where can we find that? If not, would these values be vendor dependent for now?
Proposed disposition: N/A (Question)
Proposed:
Values for these attributes are not currently defined and are therefore effectively vendor-dependent for now. We are working on the separate document mentioned above.
Email Trail:
From I18N Interest Group
This item was originally point 145-72 in the Last Call Disposition of Comments. I18N wished to see further clarification in the next process stage for the specification.
Proposed disposition: Accepted
We will remove the offending line.
Email Trail: