Voice Extensible Markup Language (VoiceXML) 2.1

1 Introduction

The popularity of VoiceXML 2.0 [VXML2] spurred the development of numerous voice browser implementations early in the specification process. [VXML2] has been phenomenally successful in enabling the rapid deployment of voice applications that handle millions of phone calls every day. This success has led to the development of additional, innovative features that help developers build even more powerful voice-activated services. While it was too late to incorporate these additional features into [VXML2], the purpose of VoiceXML 2.1 is to formally specify the most common features to ensure their portability between platforms and at the same time maintain complete backwards-compatibility with [VXML2].

This document defines eight commonly implemented features that extend VoiceXML 2.0 [VXML2].

2 Referencing Grammars Dynamically

As described in section 3.1 of [VXML2], the <grammar> element allows the specification of a speech recognition or DTMF grammar. VoiceXML 2.1 extends the <grammar> element to support the following additional attribute:

Table 1: <grammar> Attributes
expr	Equivalent to src, except that the URI is dynamically determined by evaluating the given ECMAScript expression. The expression must be evaluated each time the grammar needs to be activated.

Exactly one of "src", "expr", or an inline grammar must be specified; otherwise, an error.badfetch event is thrown.

The following example dynamically selects the appropriate grammar and prompt based on whether or not the user is signed in.

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" 
  version="2.1">

  <var name="signedIn" expr="false"/>
  
  <var name="theGrammar" expr="signedIn ? 'loggedin.grxml' : 'anonymous.grxml'"/>
  <var name="thePrompt" expr="signedIn ? 
    'Say balances, trade stocks, or get quote' : 'Say sign in or get quote.' "/>
  
  <form id="main">
      <field name="opt">
        <grammar type="application/srgs+xml" expr="theGrammar"/>
        <prompt><value expr="thePrompt"/></prompt>
        <filled>
          <prompt>You said <value expr="opt"/></prompt>
        </filled>
      </field>
  </form>
</vxml>

The following example dynamically selects the appropriate grammar and prompt based on whether the user is a novice or an expert.

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" 
  version="2.1">

  <var name="userLevel" expr="'novice'"/>

  <var name="thePrompt" expr="userLevel == 'novice' ? 
    'To obtain your balances, say balances or press 1. To trade stocks, say stocks or press 2.' : 
    'Say balances or trade.' "/>
  
  <form id="main">
      <field name="opt">
        <grammar type="application/srgs+xml" expr="'mainmenu_' + userLevel + '.grxml'"/>
        <prompt><value expr="thePrompt"/></prompt>
        <filled>
          <prompt>You said <value expr="opt"/></prompt>
        </filled>
      </field>
  </form>
</vxml>
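Because the expr attribute is re-evaluated each time the grammar is activated, a dialog can also switch grammars between recognition attempts within the same field. The following sketch assumes two hypothetical grammar files, strict.grxml and relaxed.grxml, and a hypothetical variable grammarUri. After a nomatch, the handler changes the variable, so the next activation of the field uses the relaxed grammar.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form id="main">
    <!-- hypothetical grammar URIs; the field starts with the strict one -->
    <var name="grammarUri" expr="'strict.grxml'"/>
    <field name="opt">
      <grammar type="application/srgs+xml" expr="grammarUri"/>
      <prompt>Say a command.</prompt>
      <nomatch>
        <!-- expr is evaluated again the next time the field is activated -->
        <assign name="grammarUri" expr="'relaxed.grxml'"/>
        <reprompt/>
      </nomatch>
    </field>
  </form>
</vxml>
```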

3 Referencing Scripts Dynamically

As described in section 5.3.12 of [VXML2], the <script> element allows the specification of a block of client-side scripting language code, and is analogous to the [HTML4] <SCRIPT> element. VoiceXML 2.1 extends the <script> element to support the following additional attribute:

Table 2: <script> Attributes
expr	Equivalent to src, except that the URI is dynamically determined by evaluating the given ECMAScript expression. The expression must be evaluated each time the script needs to be executed.

Exactly one of "src", "expr", or an inline script must be specified; otherwise, an error.badfetch event is thrown.

The following example retrieves a parameterized script based on the value of the variable user_id.

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" 
  version="2.1">
  <form>
    <var name="user_id" expr="12345"/>
    <script expr="'http://acme.passport.net/?id=' + user_id"/>
  </form>
</vxml>
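Because the script URI is computed at run time, a failed fetch surfaces as an event when the <script> element is executed. The following sketch assumes a hypothetical per-locale script path. It catches error.badfetch, which, through the prefix matching of event names in [VXML2], also catches more specific events such as error.badfetch.http.404, and ends the session gracefully.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <var name="locale" expr="'en-US'"/>
  <form>
    <catch event="error.badfetch">
      <!-- the computed script URI could not be fetched -->
      <prompt>Sorry, this service is not available right now.</prompt>
      <exit/>
    </catch>
    <block>
      <!-- hypothetical script path built from the locale variable -->
      <script expr="'scripts/' + locale + '/format.js'"/>
      <prompt>Welcome.</prompt>
    </block>
  </form>
</vxml>
```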

4 Using <mark> to Detect Barge-in During Prompt Playback

As described in section 2.3.2 of [SSML], the <mark> element places a marker into the text/tag sequence. When a <mark> is executed during audio output, an SSML processor must either inform the VoiceXML interpreter or allow the interpreter to retrieve that information.

[SSML] defines a single attribute, name, on the <mark> element, allowing the programmer to name the mark. VoiceXML 2.1 extends the <mark> element to support the following additional attribute:

Table 3: <mark> Attributes
nameexpr	An ECMAScript expression which evaluates to the name of the mark.

Exactly one of "name" or "nameexpr" must be specified; otherwise, an error.badfetch event is thrown.

As described in section 4.1.1 of [VXML2], the <mark> element is permitted in conforming VoiceXML documents, but [VXML2] does not specify a standard way for VoiceXML processors to access <mark> element information. Processors of conforming VoiceXML 2.1 documents must set the following two properties on the application.lastresult$ object whenever the application.lastresult$ object is assigned (e.g. a <link> is matched) and a <mark> has been executed.

Table 4: <mark>-related application.lastresult$ properties
markname	The name of the mark last executed by the SSML processor before barge-in occurred or the end of audio playback occurred. If no mark was executed, this property is undefined.
marktime	The number of milliseconds that elapsed since the last mark was executed by the SSML processor until barge-in occurred or the end of audio playback occurred. If no mark was executed, this property is undefined.

When a <mark> is executed during the processing of a form item, the interpreter sets shadow variables, the names of which correspond to the properties of the application.lastresult$ object. The value of each shadow variable must be identical to the value of the corresponding application.lastresult$ property.
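When input is matched by a <link> rather than by a form item, no form item shadow variables are set, so the mark information has to be read from application.lastresult$. The following sketch assumes hypothetical grammar files help.grxml and menu.grxml. If the caller asks for help while the prompt is playing, the help dialog uses the name of the last mark executed to decide what to say.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <!-- hypothetical document-wide help command -->
  <link next="#help">
    <grammar type="application/srgs+xml" src="help.grxml"/>
  </link>
  <form id="menu">
    <field name="choice">
      <prompt>
        <mark name="menu_intro"/>
        Welcome to the main menu.
        <mark name="menu_options"/>
        Say balances or transfers.
      </prompt>
      <grammar type="application/srgs+xml" src="menu.grxml"/>
    </field>
  </form>
  <form id="help">
    <block>
      <!-- the link match assigned application.lastresult$, including markname -->
      <if cond="application.lastresult$.markname == 'menu_options'">
        <prompt>You asked for help while the options were playing.</prompt>
      <else/>
        <prompt>You asked for help during the introduction.</prompt>
      </if>
    </block>
  </form>
</vxml>
```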

The following example establishes marks at the beginning and at the end of an advertisement. In the <filled>, the code checks which mark, if any, was last executed when barge-in occurred. If the "ad_start" mark was executed but "ad_end" was not, the code checks that at least 5 seconds of the advertisement have been played and sets the played_ad variable accordingly.

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" 
  version="2.1">

  <var name="played_ad" expr="false"/>

  <form>   
    <field name="team">
      <prompt>
        <mark name="ad_start"/>
        Baseball scores brought to you by Elephant Peanuts.
        There's nothing like the taste of fresh roasted peanuts.
        Elephant Peanuts. Ask for them by name.
        <mark name="ad_end"/>
        <break time="500ms"/>
        Say the name of a team. For example, say Boston Red Sox.
      </prompt>
  
      <grammar type="application/srgs+xml" src="teams.grxml"/>
  
      <filled>
        <if cond="typeof(team$.markname) == 'string' &amp;&amp; 
           (team$.markname=='ad_end' || 
           (team$.markname=='ad_start' &amp;&amp; 
              team$.marktime &gt;= 5000))">
          <assign name="played_ad" expr="true"/>
        <else/>
          <assign name="played_ad" expr="false"/>
        </if>
      </filled>
  
    </field>
  </form>   

</vxml>

5 Using <data> to Fetch XML Without Requiring a Dialog Transition

The <data> element allows a VoiceXML application to fetch arbitrary XML data from a document server without transitioning to a new VoiceXML document. The XML data fetched by the <data> element is bound to ECMAScript through the variable named by the name attribute, which exposes a read-only subset of the W3C Document Object Model (DOM).

Attributes of <data> are:

Table 5: <data> Attributes
src	The URI specifying the location of the XML data to retrieve.
name	The name of the variable that exposes the DOM.
expr	Like src, except that the URI is dynamically determined by evaluating the given ECMAScript expression when the data needs to be fetched.
method	The request method: get (the default) or post.
namelist	The list of variables to submit. By default, no variables are submitted. If a namelist is supplied, it may contain individual variable references which are submitted with the same qualification used in the namelist. Declared VoiceXML and ECMAScript variables can be referenced.
enctype	The media encoding type of the submitted document. The default is application/x-www-form-urlencoded. Interpreters must also support multipart/form-data and may support additional encoding types.
fetchaudio	See Section 6.1 of [VXML2].
fetchhint	See Section 6.1 of [VXML2].
fetchtimeout	See Section 6.1 of [VXML2].
maxage	See Section 6.1 of [VXML2].
maxstale	See Section 6.1 of [VXML2].

Exactly one of "src" or "expr" must be specified; otherwise, an error.badfetch event is thrown.

Platforms should support parsing the XML data into a DOM. If an implementation does not support DOM, the name attribute must not be set, and any retrieved content must be ignored by the interpreter. If the name attribute is present, these implementations will throw error.unsupported.data.name.

If the name attribute is present, the VoiceXML interpreter must expose the retrieved content via a read-only subset of the DOM as specified in Appendix D. If the content cannot be retrieved, the interpreter throws an error as specified for fetch failures in Section 5.2.6 of [VXML2]. If the retrieved content is not well-formed XML, the interpreter throws error.badfetch.

Like the <var> element, the <data> element can occur in executable content or as a child of <form> or <vxml>. In addition, it shares the same scoping rules as the <var> element. If a <data> element has the same name as a variable already declared in the same scope, the variable is assigned a reference to the DOM exposed by the <data> element.

If use of the DOM causes a DOMException to be thrown, but the DOMException is not caught by an ECMAScript catch handler, the VoiceXML interpreter throws error.semantic.

As with the <submit> element, when an ECMAScript variable is submitted to the server, its value is first converted into a string. If the variable is an ECMAScript Object, the mechanism by which it is submitted is not currently defined. If a <data> element's namelist contains a variable that references recorded audio but the element does not specify an enctype of multipart/form-data, the behavior is not specified. It is probably inappropriate to attempt to URL-encode large quantities of data.
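The following sketch submits a recorded message with <data> using the post method and multipart/form-data encoding, so that the recorded audio is not URL-encoded. The upload URI and the variable caller_id are hypothetical, and the sketch assumes the server replies with a well-formed XML document whose access-control processing instruction lists the domain of the server that provided this VoiceXML document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form id="leave_message">
    <var name="caller_id" expr="'12345'"/>
    <record name="msg" beep="true" maxtime="30s">
      <prompt>Please leave a message after the beep.</prompt>
    </record>
    <block>
      <!-- hypothetical upload URI; the recording is sent as multipart/form-data -->
      <data name="receipt" src="http://www.example.com/upload"
        method="post" enctype="multipart/form-data"
        namelist="caller_id msg"/>
      <prompt>Your message has been saved.</prompt>
    </block>
  </form>
</vxml>
```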

In the examples that follow, the XML document fetched by the <data> element is in the following format:

<?xml version="1.0" encoding="UTF-8"?>
<?access-control allow="*.roadrunner.edu *.acme.edu"?>
<quote>
  <ticker>F</ticker>
  <name>Ford Motor Company</name>
  <change>1.00</change>
  <last>30.00</last>
</quote>

The following example assigns the value of the "last" element to the ECMAScript variable "price":

<data name="quote" src="quote.xml"/> 
<script><![CDATA[
  var price = quote.documentElement.getElementsByTagName("last").item(0).firstChild.data;
]]></script>
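Instead of looking up each element by name, a script can also walk the children of the document element. The following sketch assumes that the childNodes, nodeType, and nodeName properties of the DOM Node interface are part of the exposed subset (see Appendix D). It collects each child element of <quote> into a plain ECMAScript object named fields, a name chosen here for illustration.

```xml
<data name="quote" src="quote.xml"/>
<script><![CDATA[
  // Copy the text of every child element of <quote> into an object,
  // so that fields.ticker is "F" and fields.last is "30.00".
  var fields = new Object();
  var kids = quote.documentElement.childNodes;
  for (var i = 0; i < kids.length; i++) {
    var node = kids.item(i);
    // nodeType 1 is an element node; this skips the whitespace text
    // nodes between elements as well as any empty elements
    if (node.nodeType == 1 && node.firstChild != null) {
      fields[node.nodeName] = node.firstChild.data;
    }
  }
]]></script>
```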

The data is fetched when the <data> element is executed, subject to the caching rules established in Section 6.1 of [VXML2].

Before exposing the data in the XML document referenced by the <data> element via the DOM, the interpreter must check the "access-control" processing instruction included in the XML document indicating the domains allowed to access the data. If the processing instruction is absent, or the domain, partial domain suffix, or IP address of the document server that provided the VoiceXML document containing the <data> element is not listed, the interpreter must throw error.noauthorization. The format of the "access-control" processing instruction is formally described in Appendix E. If the VoiceXML interpreter encounters multiple "access-control" processing instructions in the retrieved XML document, it uses the last one encountered.
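An application can handle these failure cases with ordinary <catch> elements. The following sketch reuses the quote server URI from the example below and assumes it returns the quote format shown earlier. error.noauthorization is caught when the access-control check fails, and error.badfetch when the document cannot be retrieved or is not well-formed XML.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form id="get_quote">
    <catch event="error.noauthorization">
      <!-- the access-control PI does not list this document's server -->
      <prompt>Quotes are not available from this service.</prompt>
      <exit/>
    </catch>
    <catch event="error.badfetch">
      <!-- the fetch failed, or the response was not well-formed XML -->
      <prompt>Quotes are temporarily unavailable.</prompt>
      <exit/>
    </catch>
    <block>
      <data name="quote" src="http://www.quoteserver.com/getquote?ticker=f"/>
      <prompt>
        The last price was
        <value expr="quote.documentElement.getElementsByTagName('last').item(0).firstChild.data"/>.
      </prompt>
    </block>
  </form>
</vxml>
```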

The following example retrieves a stock quote in one dialog, caches the DOM in a variable at document scope, and uses the DOM to play back the quote in another dialog.

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" 
  version="2.1">
  <var name="quote"/>
  <var name="ticker" expr="'f'"/>

  <form id="get_quote">
     <block>
        <data name="quote" expr="'http://www.quoteserver.com/getquote?ticker=' + ticker"/>
        <assign name="document.quote" expr="quote.documentElement"/>
        <goto next="#play_quote"/>         
     </block>
  </form>

  <form id="play_quote">

     <script><![CDATA[
     // retrieve the value contained in the node t from the DOM exposed by d
     function GetData(d, t, nodata)
     {
        try {
           return d.getElementsByTagName(t).item(0).firstChild.data;
        }
        catch(e)
        {
           // the value could not be retrieved, so return this instead
           return nodata;
        }
     }
     ]]></script>

     <block>
        <!-- retrieve the change in the stock's value -->
        <var name="change" expr="GetData(quote, 'change', 0)"/>

        <!-- play the company name -->
        <audio expr="ticker + '.wav'"><value expr="GetData(quote, 'name', 'unknown')"/></audio>
        <!-- play 'unchanged', 'up', or 'down' based on zero, positive, or negative change -->
        <if cond="change == 0">
           <audio src="unchanged_at.wav"/>
        <else/>
           <if cond="change &gt; 0">
              <audio src="up.wav"/>
           <else/> <!-- negative -->
              <audio src="down.wav"/>
           </if>
           <audio src="by.wav"/>
           <!-- play change in value as positive number -->
           <audio><value expr="Math.abs(change)"/></audio>
           <audio src="to.wav"/>
        </if>
        <!-- play the current price per share -->
        <audio><value expr="GetData(quote, 'last', 0)"/></audio>
     </block>
  </form>
</vxml>

6 Concatenating Prompts Dynamically Using <foreach>

The <foreach> element allows a VoiceXML application to iterate through an ECMAScript array and to execute the content contained within the <foreach> element for each item in the array.

Attributes of <foreach> are:

Table 6: <foreach> Attributes
array	An ECMAScript expression that must evaluate to an array; otherwise, an error.semantic event is thrown.
item	The variable that stores each array item upon each iteration of the loop. A new variable will be declared if it is not already defined within the parent's scope.

The <foreach> element can occur in executable content and as a child of <prompt>.
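Outside a <prompt>, <foreach> simply repeats its executable content for each array item. The following sketch assumes a hypothetical array of purchase amounts and sums them with <assign> before queuing a single prompt. With the values shown, the prompt reports a total of 50 dollars.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form id="order_summary">
    <!-- hypothetical purchase amounts, in dollars -->
    <var name="amounts" expr="[12, 30, 8]"/>
    <var name="total" expr="0"/>
    <block>
      <!-- executable content: runs once per item and queues no audio -->
      <foreach item="amount" array="amounts">
        <assign name="total" expr="total + amount"/>
      </foreach>
      <prompt>Your total is <value expr="total"/> dollars.</prompt>
    </block>
  </form>
</vxml>
```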

The following example calls a user-defined function GetMovieList that returns an ECMAScript array. The array is assigned to the variable named 'prompts'. Upon entering the <field>, if a noinput or a nomatch event occurs, the VoiceXML interpreter reprompts the user by executing the second <prompt>. The second <prompt> executes the <foreach> element by iterating through the ECMAScript array 'prompts' and assigning each array element to the variable 'thePrompt'. On each iteration, the interpreter executes the <audio> and <break> elements contained within the <foreach> element.

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
  version="2.1">

  <script src="movies.js"/>

  <form id="pick_movie">
  
    <!-- 
    GetMovieList returns an array of objects
      with properties audio and tts.
      The size of the array is undetermined until runtime.
    -->  
    <var name="prompts" expr="GetMovieList()"/>

    <field name="movie">
      <grammar type="application/srgs+xml" src="movie_names.grxml"/>
  
      <prompt>Say the name of the movie you want.</prompt>
  
      <prompt count="2">
        <audio src="prelist.wav">When you hear the name of the movie you want, just say it.</audio>
        <foreach item="thePrompt" array="prompts">
          <audio expr="thePrompt.audio"><value expr="thePrompt.tts"/></audio>
          <break time="300ms"/>
        </foreach>
      </prompt>
  
      <noinput>
        I'm sorry. I didn't hear you.
        <reprompt/>
      </noinput>
  
      <nomatch>
        I'm sorry. I didn't get that.
        <reprompt/>
      </nomatch>
  
    </field>
  </form>
</vxml>

The following is a contrived implementation of the user-defined GetMovieList function:

function GetMovieList()
{
  var movies = new Array(3);
  movies[0] = new Object();
  movies[0].audio = "godfather.wav"; movies[0].tts = "the godfather";
  movies[1] = new Object();
  movies[1].audio = "high_fidelity.wav"; movies[1].tts = "high fidelity";
  movies[2] = new Object();
  movies[2].audio = "raiders.wav"; movies[2].tts = "raiders of the lost ark";
  
  return movies;
}

When the interpreter queues the second <prompt>, it expands the <foreach> element in the previous example to the following:

<audio src="godfather.wav">the godfather</audio>
<break time="300ms"/>
<audio src="high_fidelity.wav">high fidelity</audio>
<break time="300ms"/>
<audio src="raiders.wav">raiders of the lost ark</audio>
<break time="300ms"/>

The following example combines the use of the <mark> and <foreach> elements to more precisely identify which item in a list of movies the user has selected. During each iteration of the movies array, the interpreter stores the current item of the array in the variable movie. Upon execution of the <mark> element, the name of the <mark> is set to the unique identifier of the movie as specified by the id property of the object referenced by the variable movie.

In the <filled>, if the form item variable mov is set to the value "more", indicating the user's desire to hear more detail about a movie, the code retrieves the unique identifier of the desired movie from mov$.markname. If barge-in occurred within twenty milliseconds of the mark's execution, the code retrieves the unique identifier of the preceding movie under the assumption that the user was slow to react.

<vxml xmlns="http://www.w3.org/2001/vxml"
  version="2.1">

  <script src="movies.js"/>

  <form id="list_movies">
    <!-- 
      GetMovieList returns an array of objects
      with properties audio, tts, and id.
      The size of the array is undetermined until runtime.
    -->  
    <var name="movies" expr="GetMovieList()"/>
  
    <field name="mov">
      <prompt>
        Say the name of the movie.
      </prompt>
  
      <prompt count="2">
        Here's the list of movies. 
        To hear more about a movie, say 'tell me more'.
        <break time="500ms"/>
        <foreach item="movie" array="movies">
          <mark nameexpr="movie.id"/>
          <audio expr="movie.audio"><value expr="movie.tts"/></audio>
          <break time="500ms"/>
        </foreach>
      </prompt>
  
      <grammar type="application/srgs+xml" src="more.grxml"/>
      <grammar type="application/srgs+xml" src="movies.grxml"/>
  
      <catch event="nomatch">
        Sorry. I didn't get that.
        <reprompt/>
      </catch>
  
      <catch event="noinput">
        <reprompt/>
      </catch>
  
      <filled>
        <var name="movie_id"/>
        <if cond="'more' == mov">
          <!-- user wants more detail -->
          <if cond="mov$.markname != undefined &amp;&amp; mov$.marktime &lt;= 20">
            <!-- returns the id of the previous movie (or the first) -->
            <assign name="movie_id" expr="GetPreviousMovie(mov$.markname)"/>
          <else/>
            <assign name="movie_id" expr="mov$.markname"/>
          </if>
        <else/>
          <!-- user said a specific movie -->
          <assign name="movie_id" expr="mov"/>
        </if>
        <!-- 
          Given an id, GetMovieDetail returns an object
          with properties audio and tts.
        -->  
        <var name="detail" expr="GetMovieDetail(movie_id)"/>
        <audio expr="detail.audio"><value expr="detail.tts"/></audio>
      </filled>
    
    </field>
  
  </form>
</vxml>

The following is a contrived implementation of the user-defined GetMovieList function for this example; unlike the earlier version, it also assigns an id property to each movie:

function GetMovieList()
{
  var movies = new Array(3);
  movies[0] = new Object(); movies[0].id = "m0010";
  movies[0].audio = "godfather.wav"; movies[0].tts = "the godfather";
  movies[1] = new Object(); movies[1].id = "m0052";
  movies[1].audio = "high_fidelity.wav"; movies[1].tts = "high fidelity";
  movies[2] = new Object(); movies[2].id = "m0027";
  movies[2].audio = "raiders.wav"; movies[2].tts = "raiders of the lost ark";
  
  return movies;
}

When the interpreter queues the second <prompt>, it expands the <foreach> element in the previous example to the following:

<mark name="m0010"/>
<audio src="godfather.wav">the godfather</audio>
<break time="500ms"/>
<mark name="m0052"/>
<audio src="high_fidelity.wav">high fidelity</audio>
<break time="500ms"/>
<mark name="m0027"/>
<audio src="raiders.wav">raiders of the lost ark</audio>
<break time="500ms"/>
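The <filled> element in the example above calls GetPreviousMovie, which is not defined in this section. A contrived sketch consistent with the comment in the <filled> element could be added to the form as an inline script (or placed in movies.js without the <script> wrapper). It assumes that the array returned by GetMovieList is in playback order.

```xml
<script><![CDATA[
  // Hypothetical helper: given the id recorded in a mark name, return
  // the id of the movie listed before it, or the id of the first movie
  // if the given movie is first or its id is not found.
  function GetPreviousMovie(id)
  {
    var movies = GetMovieList();
    for (var i = 0; i < movies.length; i++) {
      if (movies[i].id == id) {
        return movies[i > 0 ? i - 1 : 0].id;
      }
    }
    return movies[0].id;
  }
]]></script>
```

With the contrived GetMovieList above, GetPreviousMovie("m0052") returns "m0010", and GetPreviousMovie("m0010") returns "m0010".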

7 Recording User Utterances While Attempting Recognition

Several elements defined in [VXML2] can instruct the interpreter to accept user input during execution. These elements include <field>, <initial>, <link>, <menu>, <record>, and <transfer>. VoiceXML 2.1 extends these elements to allow the interpreter to conditionally enable recording while simultaneously gathering input from the user.

To enable recording during recognition, set the value of the recordutterance property to true. If the recordutterance property is set to true in the current scope, the following three shadow variables are set on the appropriate form item variable:

Table 7: recordutterance-related shadow variables
recording	The variable that stores a reference to the recording, or undefined if no audio is collected.
recordingsize	The size of the recording in bytes, or undefined if no audio is collected.
recordingduration	The duration of the recording in milliseconds, or undefined if no audio is collected.

The interpreter sets the corresponding properties on the application.lastresult$ object, and the value of each property must be identical to the value of the corresponding shadow variable. In the case of <link> and <menu>, the interpreter only sets the application.lastresult$ properties.
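For a <field>, the shadow variables can be inspected directly in <filled>. The following sketch reuses the city and state dialog and logs the size and duration of the utterance recording when one was collected. Logging these values is purely illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form>
    <property name="recordutterance" value="true"/>
    <field name="city_state">
      <prompt>Say a city and state.</prompt>
      <grammar type="application/srgs+xml" src="citystate.grxml"/>
      <filled>
        <!-- the shadow variables are undefined if no audio was collected -->
        <if cond="city_state$.recording != undefined">
          <log>
            Utterance recording:
            <value expr="city_state$.recordingsize"/> bytes,
            <value expr="city_state$.recordingduration"/> ms
          </log>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```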

Support for this feature is optional on <record> and <transfer>. Platforms that support it set the aforementioned shadow variables on the associated form item variable and the corresponding properties on the application.lastresult$ object when the recordutterance property is set to true in an encompassing scope.

Like recordings created using the <record> element, utterance recordings can be submitted to a document server via HTTP POST using the namelist attribute of the <submit> and <subdialog> elements. The enctype attribute must be set to "multipart/form-data", and the method attribute must be set to "post".

In the following example, the dialog requests a city and state from the user. On the third recognition failure, the recording of the user's utterance is submitted to a Web server.

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" 
  version="2.1">
  <form>
    <property name="recordutterance" value="true"/>

    <field name="city_state">
     <prompt>
      Say a city and state.
     </prompt>

     <grammar type="application/srgs+xml" src="citystate.grxml"/>

     <nomatch>
      I'm sorry. I didn't get that.
      <reprompt/>
     </nomatch>

     <nomatch count="3">
       <submit method="post" 
         enctype="multipart/form-data" 
         next="upload.cgi"
         namelist="lastresult$.recording"/>
     </nomatch>
    </field>  
  </form>
</vxml>

7.1 Specifying the Media Format of Utterance Recordings

To specify the media format of the resulting recording, set the recordutterancetype property. Platforms must support the audio file formats specified in Appendix E of [VXML2]. Other formats may also be supported. The recordutterancetype property defaults to a platform-specific format which should be one of the required formats. Note that the recordutterancetype property does not affect the <record> element.
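For example, a document can request WAV recordings (audio/x-wav, one of the formats listed in Appendix E of [VXML2]) by setting recordutterancetype alongside recordutterance. In the following sketch, both properties are set at document scope so that they apply to every recognition in the document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <!-- document scope: record each recognition utterance as WAV -->
  <property name="recordutterance" value="true"/>
  <property name="recordutterancetype" value="audio/x-wav"/>
  <form>
    <field name="city_state">
      <prompt>Say a city and state.</prompt>
      <grammar type="application/srgs+xml" src="citystate.grxml"/>
    </field>
  </form>
</vxml>
```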

8 Adding namelist to <disconnect>

As described in section 5.3.11 of [VXML2], the <disconnect> element causes the interpreter context to disconnect from the user. VoiceXML 2.1 extends the <disconnect> element to support the following attribute:

Table 8: <disconnect> Attributes
namelist	Variable names to be returned to the interpreter context. The default is to return no variables, in which case the interpreter context receives an empty ECMAScript object.
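The following sketch returns two variables to the interpreter context when the call ends. The variable names and values are hypothetical, and what the interpreter context does with the returned object (for example, logging it) is platform-specific.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form id="goodbye">
    <!-- hypothetical values the hosting platform may want to record -->
    <var name="call_outcome" expr="'completed'"/>
    <var name="order_id" expr="'A1234'"/>
    <block>
      <prompt>Thank you for calling. Goodbye.</prompt>
      <!-- return both variables to the interpreter context -->
      <disconnect namelist="call_outcome order_id"/>
    </block>
  </form>
</vxml>
```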

9 Adding type to <transfer>

As described in section 2.3.7 of [VXML2], the <transfer> element directs the interpreter to connect the caller to another entity. VoiceXML 2.1 extends the <transfer> element to support the following additional attribute:

Table 9: <transfer> Attributes
type	The type of transfer. The value can be "bridge", "blind", or "consultation".

At most one of "bridge" or "type" may be specified; otherwise, an error.badfetch event is thrown.

As specified in section 2.3.7 of [VXML2], the <transfer> element is optional, though platforms should support it. Platforms that support <transfer> may support any combination of bridge, blind, or consultation transfer types.

If the value of the type attribute is set to "bridge", the interpreter's behavior must be identical to its behavior when the value of the bridge attribute is set to "true". If the value of the type attribute is set to "blind", the interpreter's behavior must be identical to its behavior when the bridge attribute is set to "false". The behavior of the bridge attribute is fully specified in section 2.3.7 of [VXML2]. If the type attribute is specified and the bridge attribute is absent, the value of the type attribute takes precedence over the default value of the bridge attribute.

The bridge attribute is maintained for backwards compatibility with [VXML2]. Since all of the functionality of the bridge attribute has been incorporated into the type attribute, developers are encouraged to use the type attribute on platforms that implement it.

The connecttimeout attribute of <transfer> applies if the type attribute is set to "bridge" or "consultation".

The maxtime attribute of <transfer> applies if the type attribute is set to "bridge".
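The following sketch uses type="bridge", which behaves like bridge="true", together with the connecttimeout and maxtime attributes that apply to bridge transfers. The destination number is a placeholder, and the outcome values tested in <filled> are bridge transfer outcomes defined in section 2.3.7 of [VXML2].

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form id="agent_xfer">
    <!-- placeholder destination; type="bridge" is equivalent to bridge="true" -->
    <transfer name="agent" dest="tel:+1-555-555-0100"
      type="bridge" connecttimeout="20s" maxtime="300s">
      <prompt>Connecting you to an agent.</prompt>
      <filled>
        <if cond="agent == 'maxtime_disconnect'">
          <prompt>Your five minutes with the agent have ended.</prompt>
        <elseif cond="agent == 'busy' || agent == 'noanswer'"/>
          <prompt>No agent is available. Please call again later.</prompt>
        </if>
      </filled>
    </transfer>
  </form>
</vxml>
```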

9.1 Consultation Transfer

The consultation transfer is similar to a blind transfer except that the outcome of the transfer call setup is known and the caller is not dropped as a result of an unsuccessful transfer attempt. When performing a consultation transfer, the platform monitors the progress of the transfer until the connection is established between caller and callee. If the connection cannot be established (e.g., no answer or a busy line), the session remains active and control returns to the application. As in the case of a blind transfer, if the connection is established, the interpreter disconnects from the session, connection.disconnect.transfer is thrown, and document interpretation continues normally. Any connection between the caller and the callee remains in place regardless of document execution.

Issue (xfer1):

The Voice Browser Working Group would like to solicit feedback on the proposed value: "consultation". Alternative proposals should consider the specified transfer functionality in comparison to existing standards.

Resolution:

None recorded.

Figure 1: Audio Connections during a consultation transfer: <transfer type="consultation"/>

The possible outcomes for a consultation transfer before the connection to the callee is established are:

Table 10: Consultation Transfer Outcomes Prior to Connection Being Established
Action	Value of form item variable	Event	Reason
caller disconnects		connection.disconnect.hangup	The caller hung up.
caller cancels transfer before outgoing call begins	near_end_disconnect		The caller cancelled the transfer attempt via a DTMF or voice command before the outgoing call begins (during playback of queued audio).
callee busy	busy		The callee was busy.
network busy	network_busy		An intermediate network refused the call.
callee does not answer	noanswer		There was no answer within the time specified by the connecttimeout attribute.
---	unknown		The transfer ended but the reason is not known.

The possible outcomes for a consultation transfer once the connection to the callee is established are:

Table 11: Consultation Transfer Outcomes After Connection Has Been Established
Action	Value of form item variable	Event	Reason
transfer begins	undefined	connection.disconnect.transfer	The caller was transferred to another line and will not return.
transfer ends	unknown		The transfer ended but the reason is not known.

9.2 Consultation Transfer Errors and Events

One of the following events may be thrown during a consultation transfer:

Table 12: Events thrown during consultation transfer
Event	Reason
connection.disconnect.hangup	The caller hung up.
connection.disconnect.transfer	The caller was transferred to another line and will not return.

If a consultation transfer could not be made, one of the following errors will be thrown:

Table 13: Consultation transfer attempt error events
Error	Reason
error.connection.noauthorization	The caller is not allowed to call the destination.
error.connection.baddestination	The destination URI is malformed.
error.connection.noroute	The platform is not able to place a call to the destination.
error.connection.noresource	The platform cannot allocate resources to place the call.
error.connection.protocol.nnn	The protocol stack for this connection raised an exception that does not correspond to one of the other error.connection events.
error.unsupported.transfer.consultation	The platform does not support consultation transfer.
error.unsupported.uri	The platform does not support the URI format used. The special variable _message (section 5.2.2 of [VXML2]) will contain the string "The URI x is not a supported URI format" where x is the URI from the dest or destexpr <transfer> attributes.
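Because platforms may support any combination of transfer types, an application can catch error.unsupported.transfer.consultation and fall back to another type. The following sketch, with a placeholder destination, falls back to a blind transfer. Unlike a consultation transfer, a blind transfer does not return control to the application if the callee is busy or does not answer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form id="try_consultation">
    <catch event="error.unsupported.transfer.consultation">
      <!-- platform lacks consultation transfer: try a blind transfer instead -->
      <goto next="#blind_xfer"/>
    </catch>
    <transfer name="xfer" dest="tel:+1-555-555-0100" type="consultation"/>
  </form>
  <form id="blind_xfer">
    <transfer name="xfer" dest="tel:+1-555-555-0100" type="blind"/>
  </form>
</vxml>
```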

9.3 Example of a Consultation Transfer

The following example attempts to perform a consultation transfer of the caller to another party. Prompts may be included before or within the <transfer> element. These can be used to inform the caller of what is happening, with a notice such as "Please wait while we transfer your call." The <prompt> within the <block> and the <prompt> within the <transfer> are queued and played before the transfer is actually performed. After the prompt queue is flushed, the outgoing call is initiated. The "transferaudio" attribute specifies an audio file to be played to the caller in place of audio from the far end until the far end answers. If the audio source is longer than the connect time, the audio stops playing immediately when the far end answers.

Figure 2: Sequence and timing during an example of a consultation transfer

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" 
  xmlns="http://www.w3.org/2001/vxml">
<catch event="connection.disconnect.transfer">
   <!-- far-end answered -->
   <log> Connection with the callee established: transfer executed.</log>
</catch>

<form id="consultation_xfer">
   <block>
     <!-- queued and played before starting the transfer -->
     <prompt>
        Calling Riley.
     </prompt>
   </block>
   <!-- Play music while attempting to connect to far-end -->
   <!-- Wait up to 60 seconds for the far end to answer  -->
   <transfer name="mycall" dest="tel:+1-555-123-4567"
      transferaudio="music.wav" connecttimeout="60s" type="consultation">
     <!-- queued and played before starting the transfer -->
     <prompt>
        Please wait...
     </prompt>
     <filled>
        <if cond="mycall == 'busy'">
           <prompt>
             Riley's line is busy. Please call again later.
           </prompt>
        <elseif cond="mycall == 'noanswer'"/>
           <prompt>
             Riley can't answer the phone now. Please call
             again later.
           </prompt>
        </if>
      </filled>
   </transfer>
   <!-- submit call statistics to server -->
   <block>
      <submit namelist="mycall" next="/cgi-bin/report"/>
   </block>
</form>
</vxml>

A VoiceXML Document Type Definition

The VoiceXML 2.1 DTD is not available yet.

B VoiceXML Schema

The VoiceXML 2.1 XML Schema Definition is not available yet.

C Conformance

This section is normative.

C.1 Conforming VoiceXML Document

A conforming VoiceXML document is a well-formed [XML] document that requires only the facilities described as mandatory in this specification and in [VXML2]. Such a document must meet all of the following criteria:

The document must conform to the constraints expressed in the VoiceXML Schema.
The root element of the document must be <vxml>.
The <vxml> element must include a "version" attribute with the value "2.1".
The <vxml> element must designate the VoiceXML namespace using the "xmlns" attribute [XMLNAMES]. The namespace for VoiceXML is defined to be http://www.w3.org/2001/vxml.
It is recommended that the <vxml> element also include "xmlns:xsi" and "xsi:schemaLocation" attributes to indicate the location of the schema for the VoiceXML namespace. If the "xsi:schemaLocation" attribute is present, it must include a reference to the VoiceXML Schema:
```
xsi:schemaLocation="http://www.w3.org/2001/vxml 
  http://www.w3.org/TR/voicexml21/vxml.xsd"
```
There may be a DOCTYPE declaration in the document prior to the root element. If present, the public identifier included in the DOCTYPE declaration must reference the VoiceXML DTD using its Formal Public Identifier.
```
<!DOCTYPE vxml 
  PUBLIC "-//W3C//DTD VOICEXML 2.1//EN" 
  "http://www.w3.org/TR/voicexml21/vxml.dtd">
```
The system identifier may be modified appropriately. The DTD subset must not be used to override any parameter entities in the DTD.

Here is an example of a Conforming VoiceXML document:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xsi:schemaLocation="http://www.w3.org/2001/vxml 
  http://www.w3.org/TR/voicexml21/vxml.xsd">

  <form>
    <block>hello</block>
  </form>

</vxml>

Note that this example includes the recommended "xmlns:xsi" and "xsi:schemaLocation" attributes, as well as an XML declaration. An XML declaration like the one above is not required in all XML documents, but VoiceXML document authors are strongly encouraged to use XML declarations in all their documents. Such a declaration is required when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding was determined by a higher-level protocol.

Neither the VoiceXML language nor these conformance criteria designate size limits on any aspect of VoiceXML documents. There are no maximum values on the number of elements, the amount of character data, or the number of characters in attribute values.

C.2 Using VoiceXML with other namespaces

The VoiceXML namespace may be used with other XML namespaces as per [XMLNAMES], although such documents are not strictly conforming VoiceXML documents as defined above. Future work by W3C will address ways to specify conformance for documents involving multiple namespaces.

C.3 Conforming VoiceXML Processors

A VoiceXML processor is a user agent that can parse and process Conforming VoiceXML documents.

In a Conforming VoiceXML Processor, the XML parser must be able to parse and process all well-formed XML constructs defined within [XML] and [XMLNAMES]. It is not required that a Conforming VoiceXML processor use a validating parser.

A Conforming VoiceXML Processor must be a Conforming Speech Synthesis Markup Language Processor [SSML] and a Conforming XML Grammar Processor [SRGS] except for differences described in this document. If a syntax error is detected while processing a grammar document, then an "error.badfetch" event must be thrown.

A Conforming VoiceXML Processor must support the syntax and semantics of all VoiceXML elements as described in this document and in [VXML2]. Consequently, a Conforming VoiceXML Processor must not throw an "error.unsupported.<element>" event for any VoiceXML element that must be supported when processing a Conforming VoiceXML Document.

When a Conforming VoiceXML Processor encounters a Conforming VoiceXML Document with non-VoiceXML elements or attributes which are proprietary, defined only in earlier versions of VoiceXML, or defined in a non-VoiceXML namespace, and which cannot be processed, then it must throw an "error.badfetch" event.

When a Conforming VoiceXML Processor encounters a document with a root element designating a namespace other than VoiceXML, its behavior is undefined.

There is, however, no conformance requirement with respect to performance characteristics of the VoiceXML Processor.

D ECMAScript Language Binding for DOM

This appendix contains the ECMAScript binding for the subset of Level 2 of the Document Object Model exposed by the <data> element.

Prototype Object DOMException

The DOMException class has the following constants:

DOMException.INDEX_SIZE_ERR: This constant is of type Number and its value is 1.
DOMException.DOMSTRING_SIZE_ERR: This constant is of type Number and its value is 2.
DOMException.HIERARCHY_REQUEST_ERR: This constant is of type Number and its value is 3.
DOMException.WRONG_DOCUMENT_ERR: This constant is of type Number and its value is 4.
DOMException.INVALID_CHARACTER_ERR: This constant is of type Number and its value is 5.
DOMException.NO_DATA_ALLOWED_ERR: This constant is of type Number and its value is 6.
DOMException.NO_MODIFICATION_ALLOWED_ERR: This constant is of type Number and its value is 7.
DOMException.NOT_FOUND_ERR: This constant is of type Number and its value is 8.
DOMException.NOT_SUPPORTED_ERR: This constant is of type Number and its value is 9.
DOMException.INUSE_ATTRIBUTE_ERR: This constant is of type Number and its value is 10.
DOMException.INVALID_STATE_ERR: This constant is of type Number and its value is 11.
DOMException.SYNTAX_ERR: This constant is of type Number and its value is 12.
DOMException.INVALID_MODIFICATION_ERR: This constant is of type Number and its value is 13.
DOMException.NAMESPACE_ERR: This constant is of type Number and its value is 14.
DOMException.INVALID_ACCESS_ERR: This constant is of type Number and its value is 15.

The DOMException object has the following properties:

code: This property is of type Number.

Object DOMImplementation

The object has the following methods:

hasFeature(feature, version): This method returns a Boolean.
The feature parameter is of type String.
The version parameter is of type String.
createDocumentType(qualifiedName, publicId, systemId): This method returns a DocumentType object.
The qualifiedName parameter is of type String.
The publicId parameter is of type String.
The systemId parameter is of type String.
This method can raise a DOMException object.
createDocument(namespaceURI, qualifiedName, doctype): This method returns a Document object.
The namespaceURI parameter is of type String.
The qualifiedName parameter is of type String.
The doctype parameter is a DocumentType object.
This method can raise a DOMException object.

Object DocumentFragment

DocumentFragment has all the properties and methods of the Node object.

Object Document

Document has all the properties and methods of the Node object as well as the properties and methods defined below.

The Document object has the following properties:

doctype: This read-only property is a DocumentType object.
implementation: This read-only property is a DOMImplementation object.
documentElement: This read-only property is an Element object.

The object has the following methods:

createElement(tagName): This method returns an Element object.
The tagName parameter is of type String.
This method can raise a DOMException object.
createDocumentFragment(): This method returns a DocumentFragment object.
This method can raise a DOMException object.
createTextNode(data): This method returns a Text object.
The data parameter is of type String.
createComment(data): This method returns a Comment object.
The data parameter is of type String.
createCDATASection(data): This method returns a CDATASection object.
The data parameter is of type String.
This method can raise a DOMException object.
createProcessingInstruction(target, data): This method returns a ProcessingInstruction object.
The target parameter is of type String.
The data parameter is of type String.
This method can raise a DOMException object.
createAttribute(name): This method returns an Attr object.
The name parameter is of type String.
This method can raise a DOMException object.
createEntityReference(name): This method returns an EntityReference object.
The name parameter is of type String.
This method can raise a DOMException object.
getElementsByTagName(tagname): This method returns a NodeList object.
The tagname parameter is of type String.
This method can raise a DOMException object.
importNode(importedNode, deep): This method returns a Node object.
The importedNode parameter is a Node object.
The deep parameter is of type Boolean.
This method can raise a DOMException object.
createElementNS(namespaceURI, qualifiedName): This method returns an Element object.
The namespaceURI parameter is of type String.
The qualifiedName parameter is of type String.
This method can raise a DOMException object.
createAttributeNS(namespaceURI, qualifiedName): This method returns an Attr object.
The namespaceURI parameter is of type String.
The qualifiedName parameter is of type String.
This method can raise a DOMException object.
getElementsByTagNameNS(namespaceURI, localName): This method returns a NodeList object.
The namespaceURI parameter is of type String.
The localName parameter is of type String.
getElementById(elementId): This method returns an Element object.
The elementId parameter is of type String.
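The following sketch illustrates how the Document properties and methods listed above might be used from a VoiceXML 2.1 document. It assumes that the variable named by the "name" attribute of <data> holds the DOM Document of the fetched XML and that <data> may appear in executable content. The data source quote.xml, its <quote> and <price> elements, and the variable names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical data source quote.xml:
       <quote><ticker>ACME</ticker><price>12.34</price></quote> -->
  <form id="get_quote">
    <block>
      <!-- Assumption: after the fetch, "quote" holds the DOM Document -->
      <data name="quote" src="quote.xml"/>
      <!-- documentElement is the root <quote> element -->
      <var name="root" expr="quote.documentElement"/>
      <!-- getElementsByTagName returns a NodeList; item(0) is the first match -->
      <var name="priceElt" expr="root.getElementsByTagName('price').item(0)"/>
      <!-- The first child of <price> is a Text node; data is its string value -->
      <prompt>
        The price is <value expr="priceElt.firstChild.data"/>.
      </prompt>
    </block>
  </form>
</vxml>
```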

Prototype Object Node

The Node class has the following constants:

Node.ELEMENT_NODE: This constant is of type Number and its value is 1.
Node.ATTRIBUTE_NODE: This constant is of type Number and its value is 2.
Node.TEXT_NODE: This constant is of type Number and its value is 3.
Node.CDATA_SECTION_NODE: This constant is of type Number and its value is 4.
Node.ENTITY_REFERENCE_NODE: This constant is of type Number and its value is 5.
Node.ENTITY_NODE: This constant is of type Number and its value is 6.
Node.PROCESSING_INSTRUCTION_NODE: This constant is of type Number and its value is 7.
Node.COMMENT_NODE: This constant is of type Number and its value is 8.
Node.DOCUMENT_NODE: This constant is of type Number and its value is 9.
Node.DOCUMENT_TYPE_NODE: This constant is of type Number and its value is 10.
Node.DOCUMENT_FRAGMENT_NODE: This constant is of type Number and its value is 11.
Node.NOTATION_NODE: This constant is of type Number and its value is 12.

The Node object has the following properties:

nodeName: This read-only property is of type String.
nodeValue: This property is of type String. This property can raise a DOMException object on retrieval.
nodeType: This read-only property is of type Number.
parentNode: This read-only property is a Node object.
childNodes: This read-only property is a NodeList object.
firstChild: This read-only property is a Node object.
lastChild: This read-only property is a Node object.
previousSibling: This read-only property is a Node object.
nextSibling: This read-only property is a Node object.
attributes: This read-only property is a NamedNodeMap object.
ownerDocument: This read-only property is a Document object.
namespaceURI: This read-only property is of type String.
prefix: This property is of type String.
localName: This read-only property is of type String.

The object has the following methods:

insertBefore(newChild, refChild): This method returns a Node object.
The newChild parameter is a Node object.
The refChild parameter is a Node object.
This method can raise a DOMException object.
replaceChild(newChild, oldChild): This method returns a Node object.
The newChild parameter is a Node object.
The oldChild parameter is a Node object.
This method can raise a DOMException object.
removeChild(oldChild): This method returns a Node object.
The oldChild parameter is a Node object.
This method can raise a DOMException object.
appendChild(newChild): This method returns a Node object.
The newChild parameter is a Node object.
This method can raise a DOMException object.
hasChildNodes(): This method returns a Boolean.
cloneNode(deep): This method returns a Node object.
The deep parameter is of type Boolean.
normalize(): This method has no return value.
isSupported(feature, version): This method returns a Boolean.
The feature parameter is of type String.
The version parameter is of type String.
hasAttributes(): This method returns a Boolean.

Object NodeList

The NodeList object has the following properties:

length: This read-only property is of type Number.

The object has the following methods:

item(index): This method returns a Node object.
The index parameter is of type Number.
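As a sketch of the Node and NodeList interfaces working together, the ECMAScript function below walks a node's childNodes list with length and item(), keeps only element children by comparing nodeType with the value of Node.ELEMENT_NODE (1), and returns the text of each. It assumes that the variable named by the "name" attribute of <data> holds the DOM Document of the fetched XML and that each child element contains only text. The data source items.xml and the function name elementTexts are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical data source items.xml:
       <items><item>checking</item><item>savings</item></items> -->
  <script><![CDATA[
    // Hypothetical helper: collect the text of every element child of a node.
    // 1 is the value of Node.ELEMENT_NODE in this binding.
    function elementTexts(parent) {
      var result = [];
      var kids = parent.childNodes;            // a NodeList
      for (var i = 0; i < kids.length; i++) {
        var kid = kids.item(i);                // item(index) returns a Node
        if (kid.nodeType == 1 && kid.firstChild != null) {
          // Assumes the element's first child is a Text node
          result.push(kid.firstChild.data);
        }
      }
      return result;
    }
  ]]></script>
  <form id="list_items">
    <block>
      <!-- Assumption: after the fetch, "items" holds the DOM Document -->
      <data name="items" src="items.xml"/>
      <var name="names" expr="elementTexts(items.documentElement)"/>
      <!-- Speak each collected name in turn -->
      <foreach item="n" array="names">
        <prompt><value expr="n"/></prompt>
      </foreach>
    </block>
  </form>
</vxml>
```

The numeric literal is used because this appendix does not state whether a global Node object is visible in the interpreter's scripting scope; where a platform exposes it, Node.ELEMENT_NODE can replace the literal. Skipping non-element children matters because whitespace between elements may appear as Text nodes in childNodes.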

Object NamedNodeMap

The NamedNodeMap object has the following properties:

length: This read-only property is of type Number.

The object has the following methods:

getNamedItem(name): This method returns a Node object.
The name parameter is of type String.
setNamedItem(arg): This method returns a Node object.
The arg parameter is a Node object.
This method can raise a DOMException object.
removeNamedItem(name): This method returns a Node object.
The name parameter is of type String.
This method can raise a DOMException object.
item(index): This method returns a Node object.
The index parameter is of type Number.
getNamedItemNS(namespaceURI, localName): This method returns a Node object.
The namespaceURI parameter is of type String.
The localName parameter is of type String.
This method can raise a DOMException object.
setNamedItemNS(arg): This method returns a Node object.
The arg parameter is a Node object.
This method can raise a DOMException object.
removeNamedItemNS(namespaceURI, localName): This method returns a Node object.
The namespaceURI parameter is of type String.
The localName parameter is of type String.
This method can raise a DOMException object.

Object CharacterData

CharacterData has all the properties and methods of the Node object as well as the properties and methods defined below.

The CharacterData object has the following properties:

data: This property is of type String. This property can raise a DOMException object on retrieval.
length: This read-only property is of type Number.

The object has the following methods:

substringData(offset, count): This method returns a String.
The offset parameter is of type Number.
The count parameter is of type Number.
This method can raise a DOMException object.
appendData(arg): This method has no return value.
The arg parameter is of type String.
This method can raise a DOMException object.
insertData(offset, arg): This method has no return value.
The offset parameter is of type Number.
The arg parameter is of type String.
This method can raise a DOMException object.
deleteData(offset, count): This method has no return value.
The offset parameter is of type Number.
The count parameter is of type Number.
This method can raise a DOMException object.
replaceData(offset, count, arg): This method has no return value.
The offset parameter is of type Number.
The count parameter is of type Number.
The arg parameter is of type String.
This method can raise a DOMException object.
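The CharacterData interface is useful when only part of a text value should be spoken. In this hypothetical sketch, the Text node inside a <number> element is read with length and substringData() so that only the last four characters of a card number are played. It assumes that the variable named by the "name" attribute of <data> holds the DOM Document of the fetched XML; the data source card.xml is hypothetical, and the number is assumed to have at least four characters, since a negative offset would cause substringData() to raise a DOMException (INDEX_SIZE_ERR).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical data source card.xml:
       <card><number>4111222233334444</number></card> -->
  <form id="confirm_card">
    <block>
      <!-- Assumption: after the fetch, "card" holds the DOM Document -->
      <data name="card" src="card.xml"/>
      <!-- The Text node inside <number> is a CharacterData object -->
      <var name="num"
        expr="card.documentElement.getElementsByTagName('number').item(0).firstChild"/>
      <!-- substringData(offset, count): keep only the last four characters -->
      <var name="lastFour" expr="num.substringData(num.length - 4, 4)"/>
      <prompt>
        Using the card ending in <value expr="lastFour"/>.
      </prompt>
    </block>
  </form>
</vxml>
```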

Object Attr

Attr has all the properties and methods of the Node object as well as the properties and methods defined below.

The Attr object has the following properties:

name: This read-only property is of type String.
specified: This read-only property is of type Boolean.
value: This property is of type String.
ownerElement: This read-only property is an Element object.

Object Element

Element has all the properties and methods of the Node object as well as the properties and methods defined below.

The Element object has the following properties:

tagName: This read-only property is of type String.

The object has the following methods:

getAttribute(name): This method returns a String.
The name parameter is of type String.
setAttribute(name, value): This method has no return value.
The name parameter is of type String.
The value parameter is of type String.
This method can raise a DOMException object.
removeAttribute(name): This method has no return value.
The name parameter is of type String.
This method can raise a DOMException object.
getAttributeNode(name): This method returns an Attr object.
The name parameter is of type String.
setAttributeNode(newAttr): This method returns an Attr object.
The newAttr parameter is an Attr object.
This method can raise a DOMException object.
removeAttributeNode(oldAttr): This method returns an Attr object.
The oldAttr parameter is an Attr object.
This method can raise a DOMException object.
getElementsByTagName(name): This method returns a NodeList object.
The name parameter is of type String.
getAttributeNS(namespaceURI, localName): This method returns a String.
The namespaceURI parameter is of type String.
The localName parameter is of type String.
setAttributeNS(namespaceURI, qualifiedName, value): This method has no return value.
The namespaceURI parameter is of type String.
The qualifiedName parameter is of type String.
The value parameter is of type String.
This method can raise a DOMException object.
removeAttributeNS(namespaceURI, localName): This method has no return value.
The namespaceURI parameter is of type String.
The localName parameter is of type String.
This method can raise a DOMException object.
getAttributeNodeNS(namespaceURI, localName): This method returns an Attr object.
The namespaceURI parameter is of type String.
The localName parameter is of type String.
setAttributeNodeNS(newAttr): This method returns an Attr object.
The newAttr parameter is an Attr object.
This method can raise a DOMException object.
getElementsByTagNameNS(namespaceURI, localName): This method returns a NodeList object.
The namespaceURI parameter is of type String.
The localName parameter is of type String.
hasAttribute(name): This method returns a Boolean.
The name parameter is of type String.
hasAttributeNS(namespaceURI, localName): This method returns a Boolean.
The namespaceURI parameter is of type String.
The localName parameter is of type String.
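The following sketch shows the Element methods hasAttribute() and getAttribute() in a dialog: hasAttribute() distinguishes a missing attribute from one with an unexpected value before getAttribute() is consulted. It assumes that the variable named by the "name" attribute of <data> holds the DOM Document of the fetched XML; the data source account.xml and its "id" and "status" attributes are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical data source account.xml:
       <account id="1234" status="open"/> -->
  <form id="check_account">
    <block>
      <!-- Assumption: after the fetch, "acct" holds the DOM Document -->
      <data name="acct" src="account.xml"/>
      <var name="elt" expr="acct.documentElement"/>
      <!-- hasAttribute returns a Boolean; getAttribute returns a String -->
      <if cond="!elt.hasAttribute('status')">
        <prompt>No status is on file for that account.</prompt>
      <elseif cond="elt.getAttribute('status') == 'open'"/>
        <prompt>Account <value expr="elt.getAttribute('id')"/> is open.</prompt>
      <else/>
        <prompt>Account <value expr="elt.getAttribute('id')"/> is not open.</prompt>
      </if>
    </block>
  </form>
</vxml>
```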

Object Text

Text has all the properties and methods of the CharacterData object.

The object has the following methods:

splitText(offset): This method returns a Text object.
The offset parameter is of type Number.
This method can raise a DOMException object.

Object Comment

Comment has all the properties and methods of the CharacterData object.

Object CDATASection

CDATASection has all the properties and methods of the Text object.

Object DocumentType

DocumentType has all the properties and methods of the Node object as well as the properties and methods defined below.

The DocumentType object has the following properties:

name: This read-only property is of type String.
entities: This read-only property is a NamedNodeMap object.
notations: This read-only property is a NamedNodeMap object.
publicId: This read-only property is of type String.
systemId: This read-only property is of type String.
internalSubset: This read-only property is of type String.

Object Notation

Notation has all the properties and methods of the Node object as well as the properties and methods defined below.

The Notation object has the following properties:

publicId: This read-only property is of type String.
systemId: This read-only property is of type String.

Object Entity

Entity has all the properties and methods of the Node object as well as the properties and methods defined below.

The Entity object has the following properties:

publicId: This read-only property is of type String.
systemId: This read-only property is of type String.
notationName: This read-only property is of type String.

Object EntityReference

EntityReference has all the properties and methods of the Node object.

Object ProcessingInstruction

ProcessingInstruction has all the properties and methods of the Node object as well as the properties and methods defined below.

The ProcessingInstruction object has the following properties:

target: This read-only property is of type String.
data: This property is of type String.

E Securing access to <data>

Before exposing the XML document referenced by the <data> element to a voice application via the DOM, the interpreter must verify that the host requesting the document is allowed to access the data. It does so by comparing the hostname and IP address of the document server from which the document containing the <data> element was fetched against the list of hostnames, hostname suffixes, and IP addresses in the access control processing instruction included in the XML document referenced by the <data> element. Access to the data is allowed only if there is a match; otherwise, the interpreter throws error.noauthorization.

The following grammar describes the syntax for the access control processing instruction used by the <data> element. The grammar is specified using Extended Backus-Naur Form (EBNF) notation. For more information on this syntax, see section 6, Notation, in [XML]. For definitions of the HostName and IPv4address productions, see [RFC2396].

Access Control Processing Instruction

[1]	`AccessControlPI`	::=	`'<?access-control' S 'allow="'AccessList'"?>'`
[2]	`AccessList`	::=	`AccessItem (S AccessItem)* | '*'`
[3]	`AccessItem`	::=	`HostName | PartialHostName | IPv4address`
[4]	`PartialHostName`	::=	`'*.' HostName`

In the following example, the hosts named "voice.roadrunner.edu" and "voice.acme.edu" are allowed access to the data. A data request from a VoiceXML document located on any other host (e.g. "voice.coyote.net") will fail.

<?access-control allow="voice.roadrunner.edu voice.acme.edu"?>

Numerous hosts within a domain may require data access, and listing them all is impractical. For this reason, the VoiceXML interpreter supports wildcard matching through the use of an asterisk ('*') at the beginning of a domain name. In the following example, all hosts within the "roadrunner.edu" and "acme.edu" domains are allowed access to the data in the document that contains the PI:

<?access-control allow="*.roadrunner.edu *.acme.edu"?>

To allow any host in any domain to access the data, set the value of allow to a single asterisk ('*') as shown in the following example:

<?access-control allow="*"?>
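To show where the processing instruction sits and how a denied request surfaces, the following sketch pairs a hypothetical data source (shown in the comment) with a VoiceXML document that fetches it and catches error.noauthorization. The host data.roadrunner.edu, the path balance.xml, and the <balance> element are assumptions for illustration. Following the rules above, if the VoiceXML document was not fetched from a host matching "*.roadrunner.edu", the interpreter throws error.noauthorization instead of exposing the DOM.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical data source http://data.roadrunner.edu/balance.xml;
       the access control PI goes in the prolog, before the root element:
       <?xml version="1.0" encoding="UTF-8"?>
       <?access-control allow="*.roadrunner.edu"?>
       <balance>1500.00</balance>
  -->
  <!-- Thrown when this document's server does not match the allow list -->
  <catch event="error.noauthorization">
    <prompt>Sorry, your balance is not available right now.</prompt>
    <exit/>
  </catch>
  <form id="balance">
    <block>
      <!-- Assumption: after the fetch, "bal" holds the DOM Document -->
      <data name="bal" src="http://data.roadrunner.edu/balance.xml"/>
      <prompt>
        Your balance is <value expr="bal.documentElement.firstChild.data"/>.
      </prompt>
    </block>
  </form>
</vxml>
```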

Issue (data_sec):

The Voice Browser Working Group would like to solicit feedback from Web security experts on the general applicability of this technique for preventing unauthorized access to XML documents.

Resolution:

None recorded.

F References

F.1 Normative References

DOM2: DOM Level 2, ed. Arnaud Le Hors et al. W3C Recommendation, November 2000. See http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/.
HTML4: HTML 4.01, ed. Dave Raggett et al. W3C Recommendation, December 1999. See http://www.w3.org/TR/1999/REC-html401-19991224/.
SRGS: Speech Recognition Grammar Specification Version 1.0, ed. Andrew Hunt and Scott McGlashan. W3C Recommendation, March 2004. See http://www.w3.org/TR/2004/REC-speech-grammar-20040316/.
SSML: Speech Synthesis Markup Language, ed. Daniel C. Burnett et al. W3C Working Draft, December 2002. See http://www.w3.org/TR/2002/WD-speech-synthesis-20021202/.
RFC2396: Uniform Resource Identifiers (URI): Generic Syntax, ed. T. Berners-Lee et al. IETF RFC 2396, August 1998. See http://www.ietf.org/rfc/rfc2396.txt.
VXML2: VoiceXML 2.0, ed. Scott McGlashan et al. W3C Recommendation, March 2004. See http://www.w3.org/TR/2004/REC-voicexml20-20040316/.
XML: Extensible Markup Language (XML) 1.0, ed. Tim Bray et al. W3C Recommendation, October 2000. See http://www.w3.org/TR/2000/REC-xml-20001006.
XMLNAMES: Namespaces in XML, ed. Tim Bray et al. W3C Recommendation, January 1999. See http://www.w3.org/TR/1999/REC-xml-names-19990114/.

Voice Extensible Markup Language (VoiceXML) 2.1

W3C Working Draft 23 March 2004

Abstract

Status of this Document

Table of Contents

Appendices

1 Introduction

2 Referencing Grammars Dynamically

3 Referencing Scripts Dynamically

4 Using <mark> to Detect Barge-in During Prompt Playback

5 Using <data> to Fetch XML Without Requiring a Dialog Transition

6 Concatenating Prompts Dynamically Using <foreach>

7 Recording User Utterances While Attempting Recognition

7.1 Specifying the Media Format of Utterance Recordings

8 Adding namelist to <disconnect>

9 Adding type to <transfer>

9.1 Consultation Transfer

9.2 Consultation Transfer Errors and Events

9.3 Example of a Consultation Transfer

A VoiceXML Document Type Definition

B VoiceXML Schema

C Conformance

C.1 Conforming VoiceXML Document

C.2 Using VoiceXML with other namespaces

C.3 Conforming VoiceXML Processors

D ECMAScript Language Binding for DOM

E Securing access to <data>

Access Control Processing Instruction

F References

F.1 Normative References