Semantic Interpretation for Speech Recognition

W3C Working Draft 8 November 2004

This version: http://www.w3.org/TR/2004/WD-semantic-interpretation-20041108/
Latest version: http://www.w3.org/TR/semantic-interpretation/
Previous version: http://www.w3.org/TR/2003/WD-semantic-interpretation-20030401/
Editors: Luc Van Tichelen, ScanSoft

Abstract

This document defines the process of Semantic Interpretation for Speech Recognition and the syntax and semantics of semantic interpretation tags that can be added to speech recognition grammars to compute information to return to an application on the basis of rules and tokens that were matched by the speech recognizer. In particular, it defines the syntax and semantics of the contents of Tags in the Speech Recognition Grammar Specification.

Semantic Interpretation may be useful in combination with other specifications, such as the Stochastic Language Models (N-Gram) Specification, but its use with N-grams has not yet been studied.

The results of semantic interpretation describe the meaning of a natural language utterance. The current specification represents this information as an ECMAScript object, and defines a mechanism to serialize the result into XML. The W3C Multimodal Interaction Activity is defining a data format (EMMA) for representing information contained in user utterances. It is believed that semantic interpretation will be able to produce results that can be included in EMMA.

Status of this document

This document is a public W3C Last Call Working Draft for review by W3C members and other interested parties.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only).

This document was produced under the 24 January 2002 Current Patent Practice as amended by the W3C Patent Policy Transition Procedure. The Working Group maintains a patent disclosure page relevant to this document; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy.

This specification describes the syntax and semantics for semantic interpretation tags in speech recognition grammars, and forms part of the proposals for the W3C Speech Interface Framework. It is intended to be used with Speech Recognition grammars as defined in Speech Recognition Grammar Specification.

This Last Call Working Draft incorporates several changes to the previous working draft. Based on reviewer feedback, "tags" as attribute were dropped, and an alternative syntax has been defined.

This document is for public review, and comments and discussion are welcomed on the public mailing list <www-voice@w3.org>. The archive for the list is accessible online.

The working group's intention is to advance this specification to Candidate Recommendation during the 1st quarter of 2005 (see Work Items of the Voice Browser Activity). Reviewers are encouraged to send their comments on this working draft before 5 December 2004.

1. Introduction
- 1.1. Semantic Results
- 1.2. Basic Principles
- 1.3. ECMAScript Compact Profile
2. Normative References and Conventions
- 2.1. Normative References
- 2.2. Notational Conventions
3. Expressions in Semantic Interpretation Tags
- 3.1. Rule Variables and Semantic Values
- 3.2. Semantic Interpretation Tags
  - 3.2.1. Semantic Interpretation Scripts
  - 3.2.2. Semantic Interpretation String Literals
- 3.3. Syntax for Rule Variables
4. Semantic Interpretation Grammars
- 4.1. Semantic Interpretation Grammars
- 4.2. Global Variable Declarations and Initialization
5. Default Assignment
6. Visibility Rules and order of tag evaluation for ABNF/XML Speech Recognition Grammar Format
- 6.1. Logical Parse Structure
- 6.2. Flat Parse List
- 6.3. Scoping and Visibility Rules for Script Syntax grammars
- 6.4. Order of tag execution for Script Syntax grammars
- 6.5. Examples
7. Using Semantic Interpretation to generate XML results
- 7.1. Serialization of ECMAScript result into an XML fragment
- 7.2. Use of _attributes and _value
- 7.3. Namespaces
8. Example Grammar with Semantic Interpretation Tags
9. Conformance
Acknowledgments
References

1. Introduction

This section is informative.

1.1. Semantic Results

Grammar Processors, and in particular speech recognizers, use a grammar to define the words and sequences of words that make up the input language they can accept. The major task of a grammar processor is to find the sequence of words described by the grammar that (best) matches a given utterance, or to report that no such sequence exists.

In an application, knowing the sequence of words that were uttered is sometimes interesting but often not the most practical way of handling the information that is presented in the user utterance. What is needed is a computer processable representation of the information, the semantic result, more than a natural language transcript.

Semantic Interpretation Tags provide a means to attach instructions for the computation of such semantic results to a speech recognition grammar.

When used with a VoiceXML Processor, it is expected that a Semantic Interpretation Tag Processor will convert the result generated by an SRGS speech grammar processor into an ECMAScript object that can then be processed as specified in the VoiceXML 2.0 specification section 3.1.6 Mapping Semantic Interpretation Results to VoiceXML forms.

The W3C Multimodal Interaction working group is defining a data format (EMMA) for the representation of information contained in the user's input (a spoken utterance or other forms of input available through the modalities in the interaction). It is expected that Semantic Interpretation for Speech Recognition will be generating results that can be integrated into EMMA.

This document defines the syntax and the semantics of Semantic Interpretation Tags for use with the Speech Recognition Grammar Specification.

Semantic Interpretation Tags as defined here may also be usable with the N-Gram Specification, but this specification does not specifically address such use and does not guarantee that the tags meet its needs.

1.2. Basic Principles

The basic principles for the Semantic Interpretation mechanism defined in this specification are the following:

- semantic information is represented as values associated with non-terminals
- statements in Semantic Interpretation Tags are either valid ECMAScript (Compact Profile) or string literals
- expression evaluation order is connected to the grammar rule definitions and the sequence of words in the recognized utterance

1.3. ECMAScript Compact Profile

While there was no explicit requirements document created for the properties of a semantic interpretation syntax, the working group gradually learned that there are some conflicting desires to be met.

Certainly, Semantic Interpretation Tags must be easy for developers to use, and they should at a minimum provide the expressive power needed for the majority of applications. ECMAScript (ECMA-262) would meet these requirements.

On the other hand, there are concerns about performance and other implications of using ECMAScript (such as variable scoping, platform access, etc.).

The ECMAScript Compact Profile (ECMA 327) is a strict subset of the third edition of ECMA-262. It has been designed to meet the needs of resource-constrained environments. Special attention has been paid to constraining ECMAScript features that require proportionately large amounts of system memory, and continuous or proportionately large amounts of processing power. In particular, it is designed to facilitate prior compilation for execution in a lightweight environment. This makes it attractive for use in association with speech grammar rules for extracting semantic results from speech recognition.

2. Normative References and Conventions

2.1. Normative References

This document normatively references the ECMA-327 Standard "ECMAScript 3rd Edition Compact Profile", June 2001, further referenced as ES-CP.

The ES-CP itself references the ECMA-262 Standard "ECMAScript Language Specification", 3rd Edition - December 1999.

For informative purposes, some text from the ECMA-262 has been copied in this document. Where that is done, unless otherwise specified, such text should be considered informative and the corresponding reference to the ECMA-262 standard is normative.

All sections in this specification are normative, unless otherwise indicated.

2.2. Notational Conventions

Throughout the specification the following abbreviations will be used:

Abbreviation	Description
ES n	Shorthand notation for ECMA-262 Section number n.
ES-CP	ECMAScript Compact Profile, see section 2.1.
SI	Semantic Interpretation.

This specification uses the notational conventions for Syntactic and Lexical Grammars as given in ES 5.1, and the same Algorithm Conventions as in ES 5.2.

3. Expressions in Semantic Interpretation Tags

3.1. Rule Variables and Semantic Values

Semantic Interpretation Tags compute semantic values. During the semantic interpretation process, these values can be assigned to variables that are associated with the rules in the grammar. These variables are known as Rule Variables.

Every grammar rule has a single Rule Variable that holds a semantic value. The Rule Variable is typically assigned its value by the SI tags within its grammar rule. SI tags also have access to the Rule Variables of any other rules referenced by the current grammar rule and already processed by that point in the utterance (according to the visibility constraints defined in section 6). The Rule Variables of other rules are referenced by the name of their grammar rule, as described in section 3.3.2.

Rule Variables can hold semantic values of any type defined in ES-CP. They are not explicitly typed. Rule Variables that have not been assigned a value are not defined. SI authors will typically use scalar types, e.g. string or numeric values, in lower level rules and more structured objects in higher level rules (particularly root rules).

In addition to semantic values, certain other values corresponding to Rule Variables are available during SI processing.

For every Rule Variable there is an associated variable named "text", of type String, which holds the substring (the series of tokens) in the utterance that is governed by the corresponding grammar rule. Text variables are not part of the Rule Variable and cannot be modified.

Likewise, for every Rule Variable, there is an associated variable called "score", of type Number, which holds a value that is related to the confidence or probability of the corresponding grammar rule or some similar measure. Higher score values indicate higher confidence or probability for the corresponding grammar rule. Processors that don't compute or don't have access to such values can return a constant value for every score. Score variables are not part of the Rule Variable and cannot be modified.

The semantic result for an utterance is the value of the Rule Variable of the root rule when all semantic interpretation evaluations have been completed. For certain result formats (e.g. EMMA), this value is serialized into an XML document according to the description in section 7. It is outside the scope of this specification to define how the semantic result is communicated to the application.

Informative Note:

In the context of the W3C Voice Browser architecture, the semantic result will be directly cast into ECMAScript variables in the VoiceXML interpreter (see VoiceXML 2.0 section 3.1.6. Mapping Semantic Interpretation Results to VoiceXML forms).

In the W3C Multimodal architecture, the semantic result is expected to be transformed into EMMA following the mechanism described in section 7.

In other contexts, the mechanism described in section 7. can be used to transform the semantic result into other XML formats.

Informative Note:

Score values are highly dependent on the processor's implementation.

In most implementations using speech recognition, scores are likely dependent on factors such as audio channel quality, grammar contents, grammar weights, language, individual speaker characteristics, and others. Scores for a particular word or phrase within a grammar are typically comparable over instances of the same word or phrase over time. Scores for different words in a single grammar are also typically comparable to one another. Scores across grammars, or scores for words and word sequences, or scores between different processors, are very often not comparable.

It is anticipated that scores will be useful only for annotating the results, not for influencing the results during SI processing.

Note that an SI processor doesn't require a speech recognizer, and thus that the score does not even have to be related to speech recognition.

3.2. Semantic Interpretation Tags

This specification defines the syntax for the contents of tags in the grammar. Two Semantic Interpretation tag syntaxes are available, and the value of the tag-format declaration in the grammar determines which one is in use. The choice of syntax changes only how tags are processed during Semantic Interpretation; in all other respects the grammar behaves identically.

The "Script" tag syntax, enabled by setting the tag-format to "semantics/1.0", defines the contents of tags to be ECMAScript. Each tag is a valid ES-CP program. Section 3.2.1. describes the processing of this tag syntax in more detail.

The "String Literal" tag syntax, enabled by setting the tag-format to "semantics/1.0-literals", defines the contents of tags to be strings. This syntax does not have the expressive power of a full scripting language, but does provide a way to produce semantic results consisting of simple strings. Section 3.2.2. describes this tag syntax in more detail.

Within one grammar, it is not possible to mix the two tag syntaxes. All tags in one grammar must have the same tag-format. However, it is possible for externally referenced grammars to have a different tag-format to the parent grammar they are referenced from.

Informative Note:

Semantic Interpretation Tags are added in the string content of the tag elements in the grammar rule expansion, as described in Section 2.6 Tags of the Speech Recognition Grammar Specification. This specification further uses the term Semantic Interpretation Tag (or SI Tag) to refer to such a tag.

Below are two example formats of SI Tags in the Speech Recognition Grammar Specification; tag-content represents the content of the tag which can be either a Script or a String Literal.

XML Form

In the XML grammar format, SI Tags are specified as the content of the tag element.

XMLSemanticTag:
    <tag/>
    <tag> </tag>
    <tag> tag-content </tag>

ABNF Form

In the ABNF grammar format, SI Tags are enclosed in curly braces or in the three-character sequences '{!{' and '}!}'.

ABNFSemanticTag :
  {}
  { tag-content }
  {!{ tag-content }!}

3.2.1. Semantic Interpretation Scripts

A Semantic Interpretation Script holds a string that is treated as the source text of a valid ES-CP Program (with Program as defined by ES 14).

The environment in which SI tags are embedded may introduce escaped characters, character references or other markup that has to be resolved by the environment. The result after resolution is treated as ES-CP.

It is illegal to make an assignment to a variable that has not been previously declared (either implicitly as is the case for Rule Variables or explicitly by using a var statement). Attempting to assign to an undeclared variable will result in a runtime error.

3.2.2. Semantic Interpretation String Literals

A tag using the String Literal tag syntax has content that is a sequence of zero or more characters. If the character sequence is not empty, it has to follow either the DoubleStringCharacters or the SingleStringCharacters production of ES 7.8.4.

During processing, a tag with a String Literal has the same effect as a script that assigns the content of the tag, as a string literal, to the Rule Variable of the rule the tag is in.

Informative Note:

As a consequence, if multiple tags are present in the rule expansion, the Rule Variable is set to the value of the last tag in the expansion. Prior tags are overwritten by the final tag.
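
The overwrite behavior described in this note can be sketched in plain ECMAScript. This is an illustrative model only, not part of the specification: the helper name evaluateLiteralTags and its parameters are hypothetical, and the sketch assumes the tag contents are supplied in the order in which they occur in the matched expansion.

```javascript
// Hypothetical helper: models how String Literal tags set a rule's Rule Variable.
// `tags` lists the tag contents in the order they occur in the matched expansion.
function evaluateLiteralTags(tags, initialValue) {
  let out = initialValue;
  for (const content of tags) {
    // Each tag behaves like a script that assigns its content, as a string, to out.
    out = String(content);
  }
  return out;
}

// One tag: the Rule Variable takes that tag's content.
const single = evaluateLiteralTags(["yes"], {});
// Two tags in one expansion: the later tag overwrites the earlier one.
const multiple = evaluateLiteralTags(["maybe", "yes"], {});
```

In this model both calls produce the string "yes"; the content of the earlier tag in the second call is lost.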

A grammar using the Script tag syntax can reference rules of a grammar using the String Literal syntax. The value of the string literal can be obtained by the parent rule using the Rule Variable of the referenced rule. The recognized text of the referenced rule is also available in the meta.latest().text and meta.rulename.text (or the $$$.text and $rulename$.text) variables.

A grammar using the String Literal tag syntax can reference rules in other grammars (which can be using either the Script syntax or the String Literal syntax). See section 5. Default Assignment for the way semantic results from a referenced grammar can be used in a grammar with String Literal tag syntax.

Examples:

The syntax for the XML Form and for the ABNF Form are provided below.

XML Form

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         tag-format="semantics/1.0-literals" root="answer">
				 
  <rule id="answer" scope="public">  
    <one-of>
      <item> <ruleref uri="#yes"/> </item>
      <item> <ruleref uri="#no"/> </item>
    </one-of>
  </rule>
	
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item>yeah<tag>yes</tag></item>
      <item> <token>you bet</token><tag>yes</tag></item>
      <item xml:lang="fr-CA">oui <tag>yes</tag></item>
    </one-of> 
  </rule> 
  
  <rule id="no">
    <one-of>
      <item>no</item>
      <item>nope</item>
      <item>no way</item>
    </one-of>
    <tag>no</tag>
  </rule>
  
</grammar>

The grammar with string literals is equivalent to the grammar with SI Scripts below:

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         tag-format="semantics/1.0" root="answer">

  <rule id="answer" scope="public">  
    <one-of>
      <item> <ruleref uri="#yes"/> </item>
      <item> <ruleref uri="#no"/> </item>
    </one-of>
  </rule>
					 
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item>yeah<tag>out="yes";</tag></item>
      <item> <token>you bet</token><tag>out="yes";</tag></item>
      <item xml:lang="fr-CA">oui <tag>out="yes";</tag></item>
    </one-of> 
  </rule> 
  
  <rule id="no">
    <one-of>
      <item>no</item>
      <item>nope</item>
      <item>no way</item>
    </one-of>
    <tag>out="no";</tag>
  </rule>  

</grammar>

ABNF Form

#ABNF 1.0 ;
language en-US;
tag-format <semantics/1.0-literals>;
root $answer;

public $answer = $yes | $no;



$yes = yes | yeah {yes} | "you bet" {!{yes}!} | "oui"!fr-CA {yes};



$no = (no | nope | no way) {no};

The grammar with string literals is equivalent to the grammar with SI Scripts below:

#ABNF 1.0 ;
language en-US;
tag-format <semantics/1.0>;
root $answer;

public $answer = $yes | $no;

$yes = yes | yeah {$="yes";} | "you bet" {!{$="yes";}!} | "oui"!fr-CA {$="yes";};


$no = (no | nope | no way) {$="no";};

3.3. Syntax for Rule Variables

SI Scripts can access Rule Variables using the syntax defined in this section. This syntax applies only to documents for which the SI Tags hold SI Scripts (and not to documents where SI Tags contain String Literals).

Two variant syntaxes are available for working with Rule Variables. Both syntaxes can be used inside SI Scripts.

Informative Note:

The syntax introduced in this version of the working draft has been designed based on feedback on the previous working draft, that the original syntax was overloading the use of the $ sign.

The working group determined it was desirable to maintain the original syntax alongside the new syntax rather than replace it.

Throughout this document examples will alternate between the two variant syntaxes. Both syntaxes can be used in both the XML and ABNF grammar formats.

3.3.1. Syntax for accessing the grammar rule's Rule Variable

Every grammar rule has a single Rule Variable that holds an ES-CP value. This Rule Variable can both be evaluated and assigned to.

It is identified by out or by the dollar sign $.

Properties of the Rule Variable can be individually accessed by out.Identifier or $.Identifier, where Identifier is the name of the property.

Informative Note:

out         identifies the Rule Variable
out.pizza   identifies the pizza property of the Rule Variable

$           identifies the Rule Variable
$.pizza     identifies the pizza property of the Rule Variable

Informative Note:

The Semantic Interpretation Script typically assigns a value to the Rule Variable of its embedding grammar rule. The Rule Variable is initialized to an empty Object before the first tag in the grammar rule is executed (see section 6.3). The SI author will usually either add properties to this Object or alternatively discard it by assigning a primitive value (e.g. String or Number) to the Rule Variable. Since the Rule Variable is initialized before the tag is executed, a var statement is not required prior to assigning to it.

As a consequence of normal ECMAScript behavior, the SI author is free to override the Rule Variable type as well as value within the bounds of legal ECMAScript. Note that ES-CP enforces rules that affect Semantic Interpretation Scripts. For example, ES-CP reserved words cannot be used as a property. Thus, out.for is illegal because it uses the ES-CP reserved word for.

Examples:

out.prop = 'my property'                                           an Object with property name prop
out = 'my value'                                                   a String with value 'my value'
$.prop = 'my property'; $ = 'my value'                             a String with value 'my value'
out = 'my value'; out.prop = 'my property'                         a String with value 'my value'
$.prop1 = 'a'; $.prop2 = 'b'; $ = $.prop1 + $.prop2                a String with value 'ab'
out = 'my value'; out = new Object(); out.prop = 'my property'     an Object with property name prop
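
The outcomes listed above follow from ordinary ECMAScript behavior and can be reproduced in a general-purpose engine. The sketch below is an approximation, not an SI processor: the helper runTag is hypothetical, it uses only the out syntax, and it relies on the Function constructor compiling its body as non-strict code so that setting a property on a primitive string is silently discarded.

```javascript
// Hypothetical helper, not part of the specification: runs one SI Script
// against a fresh Rule Variable and returns the resulting value.
function runTag(script) {
  // The Rule Variable starts as an empty Object, as described in section 3.3.1.
  const body = "var out = {}; " + script + "; return out;";
  // Function-constructor code is non-strict unless it declares otherwise.
  return new Function(body)();
}

const r1 = runTag("out.prop = 'my property'");
const r2 = runTag("out = 'my value'; out.prop = 'my property'");
const r3 = runTag("out.prop1 = 'a'; out.prop2 = 'b'; out = out.prop1 + out.prop2");
const r4 = runTag("out = 'my value'; out = new Object(); out.prop = 'my property'");
```

Here r1 and r4 are Objects with a prop property, r2 is the String 'my value' (the property assignment has no lasting effect), and r3 is the String 'ab'.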

3.3.2. Syntax for accessing Rule Variables of referenced grammar rules

An SI Script can access the Rule Variable of a referenced grammar rule only from SI Tags that appear after (to the right of or below) the rule reference in the grammar expansion, and only if the referenced rule was used in the expansion that matched the input utterance. Section 6 describes in more detail, using the concepts of the logical parse structure and the flat parse list, when the Rule Variables associated to rule references can be referenced in SI Tags.

Rule Variables associated to referenced rules can both be evaluated and assigned to.

The Rule Variable associated to a rule reference is identified by rules.Rulename or by $Rulename, where Rulename is the rulename of the rule, as defined in SRGS Section 3.1 Basic Rule Definition.

Individual properties of a Rule Variable can be identified by rules.Rulename.Identifier or by $Rulename.Identifier, where Rulename is the name of the rule and Identifier is the name of the property.

Every SI Script has access to a rules object that has a property holding the Rule Variable value for every visible rule; the property name is the name of the rule to which the Rule Variable is associated.

The Rule Variable for the latest rule reference that was used in the expansion matching the utterance up to the position of the SI Tag can also be referenced through rules.latest() or $$.

In an expression, both the Rule Variables of the current grammar rule and the referenced rules can be evaluated and assigned to.

Special Rules (NULL, VOID, GARBAGE) can not be evaluated.

Informative Note:

The rules.Rulename and $Rulename notations can be used only for explicit local rule references and for explicit references to a named rule of a grammar, not for implicit rule references. (See SRGS Section 2.2 Rule Reference for a definition of explicit and implicit rule references).

To refer to the Rule Variable for a rule that is referenced by an implicit reference to the root rule of a grammar, the rules.latest() or $$ notation can be used.
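
One way to picture the rules object is as a plain ECMAScript object that a processor fills in as rule references are matched. The sketch below is a hypothetical model: makeRules and record are illustrative names that the specification does not define, and returning undefined before any rule reference has matched is an assumption of the sketch.

```javascript
// Hypothetical model of the rules object seen by SI Scripts.
function makeRules() {
  let latestName = null;
  const rules = {
    // Returns the Rule Variable of the most recently matched rule reference.
    latest() {
      return latestName === null ? undefined : rules[latestName];
    },
  };
  // Called by the (hypothetical) processor after a referenced rule has matched.
  function record(name, value) {
    rules[name] = value;
    latestName = name;
  }
  return { rules, record };
}

const { rules, record } = makeRules();
record("foodsize", "large");
record("kindofdrink", "coke");
const latestValue = rules.latest(); // Rule Variable of kindofdrink
const sizeValue = rules.foodsize;   // earlier reference, still reachable by name
```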

Examples:


out                    the Rule Variable for the current grammar rule
out.prop               the property prop of the Rule Variable for the current grammar rule

rules.rname            the Rule Variable associated to the referenced rule rname
rules.rname.prop       the property prop of the Rule Variable associated to the referenced rule rname

rules.latest()         the Rule Variable associated to the latest matching rule reference before the SI Tag
rules.latest().prop    the property prop of the Rule Variable associated to the latest matching
                       rule reference before the SI Tag



$                      the Rule Variable for the current grammar rule
$.prop                 the property prop of the Rule Variable for the current grammar rule

$rname                 the Rule Variable associated to the referenced rule rname
$rname.prop            the property prop of the Rule Variable
                       associated to the referenced rule rname


$$                     the Rule Variable associated to the latest matching rule reference before the SI Tag
$$.prop                the property prop of the Rule Variable associated to the latest matching
                       rule reference before the SI Tag

Informative Note:

Section 6 describes the visibility rules for accessing Rule Variables. If according to these rules a Rule Variable is not visible, one can still evaluate or declare and assign to the variable with that name (it is then simply behaving as a local variable). The value assigned to a local variable that has the name of a Rule Variable will be overwritten when that Rule Variable is visible according to Section 6. This behavior can be used to "initialize" Rule Variables to handle cases where a referenced rule may not actually be matched depending on the input to the grammar.

Examples:

In the following grammar, by declaring and assigning $foodsize a default value, the value for the "drink" rule will always be

{
  drinksize: "medium",
  type: "coke"
}

regardless of whether the input is 'coke' or 'medium coke':

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         tag-format="semantics/1.0" root="drink">
				 
   <rule id="drink">
       <tag> var $foodsize="medium"; </tag> <!-- Note: var is required since $foodsize is not declared yet -->
       <item repeat="0-1">
           <ruleref uri="#foodsize"/>
       </item>
       <ruleref uri="#kindofdrink"/> 
       <tag> $.drinksize=$foodsize; $.type=$kindofdrink; </tag>
   </rule>

   <rule id="foodsize">
       <one-of>
           <item> small </item>
           <item> medium </item>
           <item> large </item>
       </one-of>
   </rule>

   <rule id="kindofdrink">
     <one-of>
       <item> coke </item>
       <item> pepsi </item>
     </one-of>
   </rule>

</grammar>
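
The effect of the two tags in the "drink" rule can be traced in plain ECMAScript. The sketch below is a hypothetical simulation rather than an SI processor: evaluateDrink and its matched parameter are illustrative, with matched holding the Rule Variable values of the referenced rules that actually took part in the parse.

```javascript
// Hypothetical simulation of the "drink" rule for one parse.
// matched: values of the referenced rules that matched, keyed by rule name.
function evaluateDrink(matched) {
  // First tag: var $foodsize = "medium"; acts as a local default.
  let foodsize = "medium";
  if ("foodsize" in matched) {
    // The optional rule matched: its Rule Variable becomes visible
    // and overwrites the local default.
    foodsize = matched.foodsize;
  }
  // Second tag: $.drinksize = $foodsize; $.type = $kindofdrink;
  const out = {};
  out.drinksize = foodsize;
  out.type = matched.kindofdrink;
  return out;
}

const plainCoke = evaluateDrink({ kindofdrink: "coke" });
const mediumCoke = evaluateDrink({ foodsize: "medium", kindofdrink: "coke" });
const largeCoke = evaluateDrink({ foodsize: "large", kindofdrink: "coke" });
```

The first two calls yield the same object, with drinksize "medium" and type "coke"; only an explicitly different size, as in the third call, changes drinksize.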

3.3.3. Syntax for variables associated with the grammar rule or referenced grammar rules

A Rule Variable's text variable is identified by meta.rulename.text or $rulename$.text, where rulename is the name of the Rule Variable.

The text variable of the Rule Variable referred to by rules.latest() or $$ is identified by meta.latest().text or $$$.text.

The text variable associated to the current grammar rule is identified by meta.current().text or $meta.text. The text variable of the current grammar rule is read-only.

A Rule Variable's score variable is identified by meta.rulename.score or $rulename$.score, where rulename is the name of the Rule Variable.

The score variable of the Rule Variable referred to by rules.latest() or $$ is identified by meta.latest().score or $$$.score.

The score variable associated to the current grammar rule is identified by meta.current().score or $meta.score. The score variable of the current grammar rule is read-only.

Examples:

meta.rname.text         the text variable of the Rule Variable referred to by rules.rname
meta.latest().text      the text variable of the Rule Variable referred to by rules.latest()
meta.current().text     the text variable of the current grammar rule (read-only)


$rname$.text            the text variable of the Rule Variable referred to by $rname
$$$.text                the text variable of the Rule Variable referred to by $$
$meta.text              the text variable of the current grammar rule (read-only)

Informative Note:

Since the text and score variables of the current grammar rule are read-only, they behave as read-only properties as defined in ES-CP. As a consequence, attempts to assign to the text or score variable associated to the Rule Variable of the current grammar rule will be ignored.
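
The "ignored assignment" behavior can be approximated in a modern ECMAScript engine. The sketch below is an assumption-laden model: Object.defineProperty and Reflect.set come from later editions of ECMA-262 and are not part of ES-CP, makeMeta is a hypothetical helper, and the text and score values are invented for illustration.

```javascript
// Hypothetical model of the meta information for the current grammar rule.
function makeMeta(text, score) {
  const meta = {};
  // writable: false models the read-only attribute of the text and score variables.
  Object.defineProperty(meta, "text", { value: text, writable: false, enumerable: true });
  Object.defineProperty(meta, "score", { value: score, writable: false, enumerable: true });
  return meta;
}

const currentMeta = makeMeta("medium coke", 0.87);
// Reflect.set reports a rejected write by returning false instead of throwing.
const accepted = Reflect.set(currentMeta, "text", "large coke");
const textAfterWrite = currentMeta.text; // unchanged by the attempted write
```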

4. Semantic Interpretation Grammars

4.1. Semantic Interpretation Grammars

This specification defines a Semantic Interpretation Grammar to be a Speech Recognition Grammar as defined by SRGS that

- has the tag-format value of "semantics/1.0" or "semantics/1.0-literals"
- processes the contents of the tags as specified in this specification
- extends the use of the <tag> element to the grammar header for the purpose of setting global variables

4.2. Global Variable Declarations and Initialization

The header of an SRGS grammar may contain one or more global SI Tags. In grammars using the Script tag syntax, these tags are executed before any of the SI Tags in the matching grammar rules are evaluated. There are no ordering constraints between SI Tags and other valid SRGS grammar header items (section 4.1 of SRGS). Global tags are ignored in Grammars using the String Literal tag syntax.

The SI Tags are evaluated only once, in a global scope that will be shared by all evaluations (see section 6.3).

Whereas all evaluations for SI Tags in flat parse lists for matching rules have access to the global scope for reading only, the SI Tags in the grammar header have write access to the global scope. This is the primary function of these tags: to initialize the global scope for use in the SI Tags.

XML Form

In the XML Form, global SI Tags are SI Tags that appear outside all rules in the grammar header, before the first rule.

Examples:

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         tag-format="semantics/1.0" root="rule">
				 

  <tag>var x=1;</tag>
  <tag>var y='low{1}';</tag>

  <rule id="rule">. . .</rule>
	
</grammar>

ABNF Form

In the ABNF Form, global SI Tags are SI Tags followed by a semicolon, that appear outside all rules in the grammar header, before the first rule. Both tag delimiting syntaxes can be used.

Examples:

#ABNF 1.0;
language en-US;
tag-format <semantics/1.0>;
root $rule;

{var x=1;};
{!{var y='low{1}';}!};
$rule = . . .;

5. Default Assignment

For a given parse, if there is no SI Tag attached to the expansion in the grammar rule that is used to match the utterance, then the value for the out Rule Variable is determined as follows. If there are no rule references in the parse, the value for the text meta variable (meta.current().text) is automatically copied into the Rule Variable (which then becomes of type string). Otherwise, the value of the Rule Variable of the last rule reference in the parse (rules.latest()) is automatically copied into the Rule Variable.
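
The default assignment rule can be summarized as a small function. This is a sketch under stated assumptions, not a normative algorithm: defaultAssignment is a hypothetical name, text stands for meta.current().text, and ruleRefs is assumed to hold the Rule Variable values of the rule references in the parse, in order.

```javascript
// Hypothetical model of default assignment for a rule whose matched
// expansion contains no SI Tag.
//   text:     the tokens matched by the rule (meta.current().text)
//   ruleRefs: Rule Variable values of the rule references in the parse, in order
function defaultAssignment(text, ruleRefs) {
  if (ruleRefs.length === 0) {
    // No rule references: the matched text becomes the Rule Variable.
    return text;
  }
  // Otherwise: the value of the last rule reference (rules.latest()) is copied.
  return ruleRefs[ruleRefs.length - 1];
}

const noRefs = defaultAssignment("coca cola", []);
const oneRef = defaultAssignment("I want to fly to Boston", ["BOS"]);
const twoRefs = defaultAssignment("I want to fly from Chicago to Boston", ["ORD", "BOS"]);
```

With two rule references only the last value is kept, which is why default assignment cannot combine information from several references.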

Examples:

For the following rule, rules.drink is either "coke", "pepsi" or "coca cola". Similarly for meta.drink.text.

<rule id="drink">
  <one-of>
    <item>coke</item>
    <item>pepsi</item>
    <item>coca cola</item>
  </one-of> 
</rule>

For the following rule, there is a String Literal tag associated with "coca cola" and hence rules.drink is either "coke" or "pepsi". However, meta.drink.text is either "coke", "coca cola", or "pepsi".

<rule id="drink">
  <one-of>
    <item>coke</item>
    <item>pepsi</item>
    <item>coca cola<tag>coke</tag></item>
  </one-of> 
</rule>

For the following grammar, the utterance "I want to fly to Boston" will return the result "BOS".

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         tag-format="semantics/1.0-literals" root="flight">
				 
         
   <rule id="flight" scope="public">
     I want to fly to 
     <ruleref uri="#airports"/>
   </rule>

   <rule id="airports" scope="private">
     <one-of>
       <ruleref uri="#USairport"/>
       <ruleref uri="#otherairport"/>
     </one-of>
   </rule>

   <rule id="USairport" scope="private">
     <one-of>
       <item>Boston<tag>BOS</tag></item>
       <item>New York<tag>JFK</tag></item>
       <item>Chicago<tag>ORD</tag></item>
     </one-of>
   </rule>
	 
   <rule id="otherairport" scope="private">
     <one-of>
       <item>Brussels<tag>BRU</tag></item>
       <item>Paris<tag>CDG</tag></item>
       <item>Rome<tag>FCO</tag></item>
     </one-of>
   </rule>
	 
</grammar>

Note that the default assignment has been designed to handle only the simplest but most frequent cases. It cannot combine information from different rule references. For example, the grammar below returns information about the last airport only, not about both airports: the utterance "I want to fly from Chicago to Boston" will return the result "BOS".

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         tag-format="semantics/1.0-literals" root="flight">
				 

   <rule id="flight" scope="public">
     I want to fly from
     <one-of>
       <item><ruleref uri="#USairport"/></item>
       <item><ruleref uri="#otherairport"/></item>
     </one-of>
     to
     <one-of>
       <item><ruleref uri="#USairport"/></item>
       <item><ruleref uri="#otherairport"/></item>
     </one-of>
   </rule>

   <rule id="USairport" scope="private">
     <one-of>
       <item>Boston<tag>BOS</tag></item>
       <item>New York<tag>JFK</tag></item>
       <item>Chicago<tag>ORD</tag></item>
     </one-of>
   </rule>

   <rule id="otherairport" scope="private">
     <one-of>
       <item>Brussels<tag>BRU</tag></item>
       <item>Paris<tag>CDG</tag></item>
       <item>Rome<tag>FCO</tag></item>
     </one-of>
   </rule>

</grammar>

To make this grammar return both airports, explicit script tags must be added, as shown below. This functionality cannot be achieved by relying only on literal tags and default assignments.

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         tag-format="semantics/1.0" root="flight">
				 

   <rule id="flight" scope="public">
     I want to fly from
     <one-of>
       <item>
         <ruleref uri="http://www.example.com/places.grxml"/>
       </item>
       <item>
         <ruleref uri="http://www.example.com/places.grxml#otherairport"/>
       </item>
     </one-of>
     <tag> out.departure = rules.latest(); </tag>
     to
     <one-of>
       <item>
         <ruleref uri="http://www.example.com/places.grxml"/>
       </item>
       <item>
         <ruleref uri="http://www.example.com/places.grxml#otherairport"/>
       </item>
     </one-of>
     <tag> out.arrival = rules.latest(); </tag>
   </rule>

</grammar>

Grammar http://www.example.com/places.grxml:

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         tag-format="semantics/1.0-literals" root="USairport">
				 

   <rule id="USairport" scope="public">
     <one-of>
       <item>Boston<tag>BOS</tag></item>
       <item>New York<tag>JFK</tag></item>
       <item>Chicago<tag>ORD</tag></item>
     </one-of>
   </rule>

   <rule id="otherairport" scope="public">
     <one-of>
       <item>Brussels<tag>BRU</tag></item>
       <item>Paris<tag>CDG</tag></item>
       <item>Rome<tag>FCO</tag></item>
     </one-of>
   </rule>
	 
</grammar>

6. Visibility Rules and order of tag evaluation for ABNF/XML Speech Recognition Grammar Format

This section defines the visibility rules and order of tag evaluation for SI Tags used in the Speech Recognition Grammar Format (ABNF and XML Form). When SI Tags are embedded in other markup languages (e.g. in N-grams), the visibility rules and order of evaluation may be defined differently.

6.1. Logical Parse Structure

The visibility rules and the order of evaluation of semantic interpretation tags are defined in terms of the logical parse structure as defined in Appendix H. Logical Parse Structure of the Speech Recognition Grammar Specification.

Note that while this appendix is informative for the Speech Recognition Grammar Specification, it is normative for the Semantic Interpretation specification. This does not imply that grammar processors must implement a logical parse structure, nor that ambiguities or recursion should be handled in any specific way beyond what is required for a conformant speech recognition grammar processor. The logical parse structure is only a means to illustrate the order of evaluation and visibility rules for SI Tags. Implementations are not required to expose the logical structure and may use a different internal representation as long as it yields the results described here.

The Logical Parse Structure is a formal syntax for describing the sequence and relation of tags and rule references to the tokens that are input to the grammar processor.

The Logical Parse output is represented as an array of output entities en, e.g. [e1, e2, e3].

Output entities can be one of three kinds:

a token, represented as a string holding the literal matching the input to the processor
a tag, represented as a SI Tag in curly braces
a rule reference, represented using the ABNF form for rule references, followed by an array with the output entities generated from that rule reference

Appendix H of the Speech Recognition Grammar Specification contains a full description of how to create the logical parse on a grammar for a given input to a grammar processor.

For the purpose of building the logical parse, all String Literals are assumed to be converted into the equivalent SI Script as defined in 3.2.2.

Examples:

The sentence "turn the heating off" on the following XML Form grammar

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         tag-format="semantics/1.0" root="command">
				 
   <rule id="command">
      <one-of>
         <item>set</item>
         <item>turn</item>
      </one-of>
      <ruleref uri="#object"/>
      <ruleref uri="#state"/>
      <tag>$.o=$object; $.s=$state;</tag>
   </rule>

   <rule id="object"> 
      <item repeat="0-1">the</item>
      <one-of>
         <item>
            <one-of>
               <item>heating</item>
               <item>cooling</item>
            </one-of>
            <tag>$="airco";</tag>
          </item>
          <item>radio<tag>$="radio";</tag></item>
          <item>lights<tag>$="lights";</tag></item>  
       </one-of>
   </rule>

   <rule id="state">
      <one-of>
         <item>to</item>
         <item><ruleref special="NULL"/></item>
      </one-of>
      <one-of>
         <item>on<tag>$="1";</tag></item>
         <item>off<tag>$="0";</tag></item>
         <item>warm<tag>$="w";</tag></item>
         <item>cool<tag>$="c";</tag></item>
         <item>cold<tag>$="c";</tag></item>
      </one-of> 
   </rule>

</grammar>

or equivalent ABNF Form grammar

#ABNF 1.0;
language en-US;
tag-format <semantics/1.0>;
root $command;

$command = (set | turn) $object $state {$.o=$object; $.s=$state;};
$object = [the] ((heating | cooling){$="airco";} | radio{$="radio";} | lights{$="lights";});
$state = (to|$NULL) (on{$="1";} | off{$="0";} | warm{$="w";} | cool{$="c";} | cold{$="c";});

would result in the logical parse

[$command [turn,
           $object [the,
                    heating,
                    {$="airco";}],
           $state  [off,
                    {$="0";}],
           {$.o=$object; $.s=$state;}]
]

6.2. Flat Parse List

The logical parse structure is a tree-like structure that shows all terminals, tags and rule references governed by a given rule. This tree can also be represented in a flattened list of parses, with one parse for every grammar rule application.

The flat parse for a given rule application is represented as:

the rule name followed by a sequence number in parentheses and a colon
a list of output entities

The output entities are as in the logical parse structure, except that rule references are represented without an array of output entities and are instead followed by a sequence number in parentheses.

Examples:

The equivalent flat parse list for the above example is:

$command(1): turn, $object(1), $state(1), {$.o=$object; $.s=$state;}
$object(1): the, heating, {$="airco";}
$state(1): off, {$="0";}
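The relation between the logical parse structure and the flat parse list can be sketched in ECMAScript. The sketch below is illustrative only: the function name flattenLogicalParse and the object layout used to represent tokens, tags and rule references are assumptions made for the example, as is the choice to number applications of a rule in order of appearance across the whole parse.

```javascript
// Hypothetical representation (not defined by this specification):
//   token          -> a string, e.g. "turn"
//   tag            -> { tag: "out.x=1;" }
//   rule reference -> { rule: "object", entities: [ ... ] }
function flattenLogicalParse(root) {
  const counters = {}; // last sequence number handed out per rule name
  const flat = [];     // one entry per rule application, in order of appearance

  function visit(ref) {
    counters[ref.rule] = (counters[ref.rule] || 0) + 1;
    const name = "$" + ref.rule + "(" + counters[ref.rule] + ")";
    const entry = { name: name, entities: [] };
    flat.push(entry);
    for (const e of ref.entities) {
      if (typeof e === "string") {
        entry.entities.push(e);                 // token
      } else if ("tag" in e) {
        entry.entities.push("{" + e.tag + "}"); // SI Tag
      } else {
        entry.entities.push(visit(e));          // rule reference, by name only
      }
    }
    return name;
  }

  visit(root);
  return flat.map((p) => p.name + ": " + p.entities.join(", "));
}

// Logical parse of "turn the heating off" from the example above.
const logicalParse = {
  rule: "command",
  entities: [
    "turn",
    { rule: "object", entities: ["the", "heating", { tag: '$="airco";' }] },
    { rule: "state", entities: ["off", { tag: '$="0";' }] },
    { tag: "$.o=$object; $.s=$state;" },
  ],
};
const flatList = flattenLogicalParse(logicalParse);
```

Each nested rule reference becomes its own entry in the list, and the referencing entry keeps only the rule name with its sequence number.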

The following example illustrates the use of the sequence number for rules that are applied more than once. Consider the grammar with String Literals, in XML Form:

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         tag-format="semantics/1.0-literals" root="a">
				 
   <rule id="a">
      <item repeat="1-"><ruleref uri="#b"/></item>
      <ruleref uri="#c"/>
      <one-of>
         <item>         
            <item repeat="0-1">t1</item>
            <tag>tag1</tag>
         </item>
         <item>
            <ruleref uri="#d"/>
            <tag>tag2</tag>
         </item>
      </one-of>
   </rule>

   <rule id="b">
      <one-of>
         <item>t2</item>
         <item>t3<tag>tag3</tag></item>
         <item>t4</item>
      </one-of>
   </rule>

   <rule id="c">
      <item repeat="1-2">t5<tag>tag5</tag></item>
   </rule>

   <rule id="d">
      t6 <ruleref uri="#c"/>
   </rule>

</grammar>

or equivalently in ABNF Form:

#ABNF 1.0;
language en-US;
tag-format <semantics/1.0-literals>;
root $a;

$a = ($b)<1-> $c ((t1)<0-1> {tag1} | $d {tag2});
$b = t2 | t3 {tag3} | t4;
$c = (t5 {tag5})<1-2>;
$d = t6 $c;

Given the input "t2 t3 t5 t5", the logical parse structure is:

[$a [$b [t2], $b [t3, {tag3}], $c [t5, {tag5}, t5, {tag5}], {tag1}]]

and the flat parse list is:

$a: $b(1), $b(2), $c(1), {tag1}
$b(1): t2
$b(2): t3, {tag3}
$c(1): t5, {tag5}, t5, {tag5}

6.3. Scoping and Visibility Rules for Script Syntax grammars

These scoping and visibility rules are defined on the basis of the flat parse list as specified in section 6.2.

The Global Scope

Before evaluating any scripts in the flat parse list, a global anonymous ECMAScript scope is created for the grammar. This global scope is initialized by executing the scripts that are in the global tags in the grammar header (see 4.2.).

During evaluation of a script in the flat parse list, the global scope is accessible for reading only.

Informative Note:

Every script has only one global scope associated: the global scope for the grammar in which the script appears. Scripts in referenced rules that are located in a referenced external grammar are thus executed with access to that referenced grammar's global scope, and don't have access to the referencing grammar's global scope.

Order of Tag Execution

The tags within a flat parse are executed in the order in which they appear, left to right. The global tags (in the grammar header) are executed in document order.

Scope Chains and Access to Variables

For each flat parse, a new anonymous ECMAScript scope is created that is a direct child of the global scope object for the grammar in which the related rule is defined. The ECMAScript scope chains thus always have the global scope (the scope of the whole parse) as top-level object, and the scope belonging to the parse list as successor.

Access to variables in tag executions is resolved through the scope chain according to the ECMAScript rules (cf. ES 10.1.4).

The variables object according to ES-CP is the scope object created for this rule. This means that local variables that are defined in tags belonging to a rule reference are created in the scope object that was created for this rule.

Before the first tag in a flat parse is executed, the environment of a new scope is set up in the following way:

The variable $ is initialized as an empty object
The out variable is initialized to a reference to $
$meta.text is initialized (read-only) to the text variable of the current grammar rule
meta.current().text is initialized (read-only) to the text variable of the current grammar rule
$meta.score is initialized (read-only) to the score value related to the current grammar rule
meta.current().score is initialized (read-only) to the score value related to the current grammar rule

When execution of the flat parse is finished, the scope object of this flat parse is removed from the scope chain. The scope belonging to the referencing flat parse is then updated in the following way:

$rulename of the scope of the referencing rule, where rulename is the name of the referenced rule, is set to the value of the variable $ of the child scope.
rules.rulename of the scope of the referencing rule, where rulename is the name of the referenced rule, is set to the value of the variable out of the child scope.
$rulename$.text and meta.rulename.text of the scope of the referencing rule, where rulename is the name of the referenced rule, are set to the concatenation of all terminals within the rule reference.
$rulename$.score and meta.rulename.score of the scope of the referencing rule, where rulename is the name of the referenced rule, are set to score value for the referenced rule.
$$ = $rulename (both variables are in the scope of the referencing rule)
rules.latest() = rules.rulename (both variables are in the scope of the referencing rule)
$$$.text = $rulename$.text (both variables are in the scope of the referencing rule)
meta.latest().text = meta.rulename.text (both variables are in the scope of the referencing rule)
$$$.score = $rulename$.score (both variables are in the scope of the referencing rule)
meta.latest().score = meta.rulename.score (both variables are in the scope of the referencing rule)

If any of these variables already exist, they are overwritten.
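The set-up of a scope and the update of the referencing scope can be sketched in ECMAScript as follows. This is a minimal sketch under stated assumptions, not a normative implementation: the helper names createScope, requireLatest and completeRuleReference and the latestName bookkeeping property are hypothetical, only the out, rules and meta forms are modelled (the $-prefixed forms are omitted), and the texts and scores in the usage are arbitrary illustration values. A rule that is itself named "latest" or "current" would clash with the accessor functions in this simplified model.

```javascript
// Hypothetical model of the scope created for one flat parse.
function createScope(text, score) {
  const scope = {
    out: {},               // the Rule Variable, initially an empty object
    latestName: undefined, // most recently completed rule reference, if any
    rules: {},
    meta: {},
  };
  scope.meta.current = () => ({ text: text, score: score });
  scope.rules.latest = () => scope.rules[requireLatest(scope)];
  scope.meta.latest = () => scope.meta[requireLatest(scope)];
  return scope;
}

// Models the run-time error raised when no rule reference precedes the tag.
function requireLatest(scope) {
  if (scope.latestName === undefined) {
    throw new ReferenceError("no rule reference has completed in this scope");
  }
  return scope.latestName;
}

// Update applied to the referencing scope when a referenced rule finishes.
function completeRuleReference(parent, ruleName, child) {
  parent.rules[ruleName] = child.out;           // overwrites any earlier value
  parent.meta[ruleName] = child.meta.current(); // text and score of the reference
  parent.latestName = ruleName;                 // drives rules.latest() and meta.latest()
}

// Usage: a rule that references a hypothetical rule "city" twice.
const fromtoScope = createScope("from Boston to Paris", 0.9);

const firstCity = createScope("Boston", 0.8);
firstCity.out = "BOS";
completeRuleReference(fromtoScope, "city", firstCity);
const afterFirst = fromtoScope.rules.latest();

const secondCity = createScope("Paris", 0.7);
secondCity.out = "CDG";
completeRuleReference(fromtoScope, "city", secondCity);
const afterSecond = fromtoScope.rules.city;
```

After the second reference completes, rules.city and rules.latest() in the referencing scope both yield the value of the second application, since the earlier value has been overwritten.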

Informative Note:

Whether or not the $, $rulename, $rulename$.text and $rulename$.score variables are enumerated when enumerating the scope object is not defined by this specification and may vary across implementations. Authors are discouraged from enumerating the scope object.

Note: Assigning literals to a Rule Variable will result in the out or $ variable being updated independently of the other. Mixing the two syntaxes in the same grammar rule is not recommended.

Visibility

The consequences of these scoping rules are:

Within a parse list, results of previously executed rule references that are direct children of this list are available as $rulename and rules.rulename.
If a rule was referenced multiple times in the same scope, the result of the last instantiation is visible.
$$ and rules.latest() always refer to the result of the previous reference in the current scope; $$$.text and meta.latest().text refer to the corresponding text utterance; and $$$.score and meta.latest().score refer to the corresponding score value.

Global Variables

Since the global scope is read-only, assignments to global variables are not allowed in SI Tags in rules. They are only possible in the global SI Tags in the grammar header (see 4.2.).
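One way to picture the read-only global scope is an ECMAScript object that is frozen once the header tags have run, with the scope of each rule application chained to it. The sketch below only models the behaviour described above; the use of Object.freeze and a prototype chain, and the names globalScope, ruleScope and tryAssignGlobal, are assumptions made for the example and are not mandated by this specification.

```javascript
// Global scope, initialized by the global SI Tags in the grammar header.
const globalScope = {};
globalScope.x = 1;          // effect of a header tag such as: var x=1;
Object.freeze(globalScope); // modelling assumption: read-only from here on

// Scope of one rule application, chained to the global scope.
const ruleScope = Object.create(globalScope);

function tryAssignGlobal() {
  "use strict"; // make the rejected write observable as an exception
  try {
    ruleScope.x = 2; // a tag in a rule attempting to write the global x
    return "assigned";
  } catch (e) {
    return e instanceof TypeError ? "rejected" : "other error";
  }
}

const readValue = ruleScope.x;         // reading through the chain is allowed
const writeResult = tryAssignGlobal(); // writing is refused
```

Reading x from the rule scope succeeds through the chain, while the attempted write is refused and leaves the global value unchanged.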

Examples:

The following rule contains two Rule Variables associated with the same rule "city". The XML Form is:

<rule id="fromto">
   from 
   <ruleref uri="#city"/>
   <tag>out.fromcity=rules.city.name;</tag>
   to
   <ruleref uri="#city"/>
   <tag>out.tocity=meta.city.text;</tag>  
</rule>

and the equivalent ABNF Form is:

$fromto = from $city {out.fromcity=rules.city.name;} to $city {out.tocity=meta.city.text;};

To determine which of the Rule Variable instances the tags refer to, we can build the flat parse for $fromto, which is always of the form:

$fromto: from, $city(1), {out.fromcity=rules.city.name;}, to, $city(2), {out.tocity=meta.city.text;}

From this it follows that rules.city.name in the first tag refers to the first Rule Variable rules.city in the rule, and that the reference to meta.city.text in the second tag is to the second Rule Variable named rules.city.

In the following rule, the flat parse depends on whether the input matches the optional rule "b". The XML Form is:

<rule id="a">
   <ruleref uri="#b"/>
   <item repeat="0-1"><ruleref uri="#b"/></item>
   <tag>out.x=rules.b.x;</tag>
</rule>

and the equivalent ABNF Form is:

$a = $b [$b] {out.x=rules.b.x;};

The two possible flat parses are:

$a: $b(1), {out.x=rules.b.x;}
$a: $b(1), $b(2), {out.x=rules.b.x;}

The reference rules.b.x in the tag will thus refer to either the first or the last rule "b", depending on whether the optional rule "b" was matched in the input.

The SI Tag in the rule below contains a couple of references to Rule Variables that are undefined since there is no Rule Variable with that name before the tag in the flat parse. The XML Form is:

<rule id="a">
   <ruleref uri="#b"/>
   <item repeat="0-1"><ruleref uri="#c"/></item>
   <tag>out.x=rules.c; out.y=rules.d; out.z=rules.e;</tag>
   <ruleref uri="#e"/>
</rule>

and the equivalent ABNF Form is:

$a = $b [$c] {out.x=rules.c; out.y=rules.d; out.z=rules.e;} $e;

The two possible flat parses are:

$a: $b(1), {out.x=rules.c; out.y=rules.d; out.z=rules.e;}, $e(1)
$a: $b(1), $c(1), {out.x=rules.c; out.y=rules.d; out.z=rules.e;}, $e(1)

This means that:

out.x is undefined if rule "c" didn't match in the utterance
out.y is undefined because rule "d" is not in the rule expansion at all
out.z is undefined because rule "e" doesn't appear before the tag

6.4. Order of tag execution for Script Syntax grammars

Within a single SI Tag, the order of evaluation is determined by ES-CP for the evaluation of a valid ES-CP Program (ES 14).

All global SI Tags (in tags in the grammar header) are executed once, before any SI Tags within a grammar rule are executed (see 4.2.).

The order of evaluating multiple SI Tags within a grammar rule is the order in which the SI Tags appear in the flat parse list for that rule application. The flat parse list also determines how many SI elements will be generated from an SI tag that occurs in a grammar rule. Every SI Tag element in a flat parse list is evaluated exactly once. The order of evaluating String Literals is determined by the order in which the equivalent SI Tag appears in the flat parse list (see 6.2.).

The computation of the semantic value of a rule reference in a flat parse list may occur at any time during the processing of the entire logical parse structure, subject to the following condition: the semantic value of a rule reference must be computed before any SI tag using that reference's value is processed.

Examples:

Consider the following rules in XML Form:

<rule id="a">
   <ruleref uri="#b"/>
   <tag>$.y=$b.x;</tag>
   <item repeat="0-1"><ruleref uri="#b"/><tag>$.y=$.y+$b.x;</tag></item>
</rule>

<rule id="b">
   foo
   <tag>$.x=1;</tag>
   <one-of>
      <item>bar<tag>$.x=3;</tag></item>
      <item>
         <item repeat="1-">boo<tag>$.x=$.x+1;</tag></item>
      </item>
   </one-of>
</rule>

or equivalently in ABNF Form:

$a = $b  {$.y=$b.x;} [$b {$.y=$.y+$b.x;}];
$b = foo {$.x=1;} (bar {$.x=3;} | (boo {$.x=$.x+1;})<1->);

For the input "foo boo boo boo", the flat parse lists are:

$a: $b(1), {$.y=$b.x;}
$b(1): foo, {$.x=1;}, boo, {$.x=$.x+1;}, boo, {$.x=$.x+1;}, boo, {$.x=$.x+1;}

and $.y evaluates to 4. For the input "foo bar foo boo", the flat parse lists are:

$a: $b(1), {$.y=$b.x;}, $b(2), {$.y=$.y+$b.x;}
$b(1): foo, {$.x=1;}, bar, {$.x=3;}
$b(2): foo, {$.x=1;}, boo, {$.x=$.x+1;}

and $.y evaluates to 5.
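The two results above can be reproduced with a small ECMAScript sketch that walks a flat parse list and executes the tags in order. It is an illustration under stated assumptions rather than a reference implementation: the function name evaluateFlatParse and the data layout are hypothetical, the global scope and the meta and score variables are not modelled, and rule references are evaluated eagerly when they are encountered (one ordering that satisfies the condition that a reference's value is computed before any tag uses it). The tags are executed with the Function constructor and a with statement purely for demonstration.

```javascript
// flatParses maps "name(seq)" to the entities of that rule application.
// Entities: a string (token), { tag: "..." }, or { rule: "b", seq: 1 }.
function evaluateFlatParse(flatParses, key) {
  const scope = { $: {} }; // $ starts as an empty object
  for (const entity of flatParses[key]) {
    if (typeof entity === "string") continue; // tokens carry no script
    if ("tag" in entity) {
      // Execute the tag with the scope's variables visible as identifiers.
      new Function("scope", "with (scope) { " + entity.tag + " }")(scope);
    } else {
      const childKey = entity.rule + "(" + entity.seq + ")";
      const value = evaluateFlatParse(flatParses, childKey);
      scope["$" + entity.rule] = value; // $rulename, overwritten on reuse
      scope.$$ = value;                 // $$ follows the latest reference
    }
  }
  return scope.$;
}

// Input "foo boo boo boo".
const parsesA = {
  "a(1)": [{ rule: "b", seq: 1 }, { tag: "$.y=$b.x;" }],
  "b(1)": ["foo", { tag: "$.x=1;" }, "boo", { tag: "$.x=$.x+1;" },
           "boo", { tag: "$.x=$.x+1;" }, "boo", { tag: "$.x=$.x+1;" }],
};
const resultA = evaluateFlatParse(parsesA, "a(1)");

// Input "foo bar foo boo".
const parsesB = {
  "a(1)": [{ rule: "b", seq: 1 }, { tag: "$.y=$b.x;" },
           { rule: "b", seq: 2 }, { tag: "$.y=$.y+$b.x;" }],
  "b(1)": ["foo", { tag: "$.x=1;" }, "bar", { tag: "$.x=3;" }],
  "b(2)": ["foo", { tag: "$.x=1;" }, "boo", { tag: "$.x=$.x+1;" }],
};
const resultB = evaluateFlatParse(parsesB, "a(1)");
```

In this sketch resultA.y is 4 and resultB.y is 5, matching the values stated above. A tag that reads a rule variable before the corresponding rule reference has been processed fails with an ECMAScript ReferenceError, which corresponds to the run-time errors discussed in this section.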

6.5. Examples

The rules.b.x and rules.c.x refer to the respective Rule Variable properties:

<rule id="a">
   <ruleref uri="#b"/>
   <ruleref uri="#c"/>
   <tag>out.x = rules.b.x + rules.c.x;</tag>
</rule>

The $c.x causes a run-time error because it is used to the left of rule "c":

<rule id="a">
   <ruleref uri="#b"/>
   <tag>$.x = $b.x + $c.x;</tag>
   <ruleref uri="#c"/>
</rule>

The rules.b.x evaluates to the x property of rules.b if rule "b" is matched on the input utterance. Otherwise it causes a run-time error:

<rule id="a">
   <item repeat="0-1"><ruleref uri="#b"/></item>
   <ruleref uri="#c"/>
   <tag>out.x = rules.b.x + rules.c.x;</tag>
</rule>

A safer way to write this rule could be (assuming x is of type number):

<rule id="a">
   <tag>out.x=0;</tag>
   <item repeat="0-1"><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item>
   <ruleref uri="#c"/>
   <tag>out.x = out.x + rules.c.x;</tag>
</rule>

The rules.b.x evaluates to the last occurrence of rule "b" in the repeat:

<rule id="a">
   <item repeat="1-"><ruleref uri="#b"/></item>
   <ruleref uri="#c"/>
   <tag>$.x=$b.x+$c.x;</tag>
</rule>

If the purpose was to add or concatenate over each occurrence of rules.b, it should be written as:

<rule id="a">
   <item repeat="1-"><ruleref uri="#b"/><tag>$.x=$.x+$b.x;</tag></item>
   <ruleref uri="#c"/>
   <tag>$.x=$.x+$c.x;</tag>
</rule>

The rules.b evaluates to the last occurrence of rules.b in the repeat="0-" expansion, if any - otherwise it is undefined:

<rule id="a">
   <item repeat="0-"><ruleref uri="#b"/><ruleref uri="#d"/></item>
   <ruleref uri="#c"/>
   <tag>out.x=rules.b+rules.c.x;</tag>
</rule>

Either $b.x or $c.x will cause a run-time error depending on the input utterance:

<rule id="a">
   <one-of>
      <item><ruleref uri="#b"/></item>
      <item><ruleref uri="#c"/></item>
   </one-of>
   <tag>$.x=$b.x+$c.x;</tag>
</rule>

This could be better written as:

<rule id="a">
   <one-of>
      <item><ruleref uri="#b"/><tag>$.x=$b.x;</tag></item>
      <item><ruleref uri="#c"/><tag>$.x=$c.x;</tag></item>
   </one-of>   
</rule>

The rules.b.x refers to whichever rules.b actually matched:

<rule id="a">
   <one-of>
      <item><ruleref uri="#b"/> a</item>
      <item>a <ruleref uri="#b"/></item>
   </one-of>   
   <ruleref uri="#c"/>
   <tag>out.x=rules.b.x+rules.c.x;</tag>
</rule>

One of the operands to every addition causes a run-time error here depending on the input utterance:

<rule id="a">
   <one-of>
      <item><ruleref uri="#b"/></item>
      <item><ruleref uri="#c"/></item>
   </one-of>
   <one-of>
      <item><ruleref uri="#d"/></item>
      <item><ruleref uri="#e"/></item>
   </one-of>
   <tag>out.x=(rules.b.x+rules.c.x) * (rules.d.x+rules.e.x);</tag>
</rule>

This rule can be better written as:

<rule id="a">
   <one-of>
      <item><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item>
      <item><ruleref uri="#c"/><tag>out.x=rules.c.x;</tag></item>
   </one-of>
   <one-of>
      <item><ruleref uri="#d"/><tag>out.x=out.x*rules.d.x;</tag></item>
      <item><ruleref uri="#e"/><tag>out.x=out.x*rules.e.x;</tag></item>
   </one-of>
</rule>

Evaluation of $b.x always causes a run-time error because the expression will be evaluated only when rule "c" matches, not rule "b". (When rule "b" matches, the default assignment would cause $=$b.)

<rule id="a">
   <one-of>
      <item><ruleref uri="#b"/></item>
      <item><ruleref uri="#c"/><tag>$.x=$b.x+$c.x;</tag></item>
   </one-of>
</rule>

A more useful rule could be:

<rule id="a">
   <one-of>
      <item><ruleref uri="#b"/><tag>$.x=$b.x;</tag></item>
      <item><ruleref uri="#c"/><tag>$.x=$c.x;</tag></item>
   </one-of>
</rule>

The expression is only evaluated if rule "c" matches; in that case both rules.b and rules.c are defined:

<rule id="a">
   <ruleref uri="#b"/>
   <item repeat="0-1">
      <ruleref uri="#c"/>
      <tag>out.x=rules.b.x+rules.c.x;</tag>
   </item>
</rule>

The expression is evaluated for every occurrence of rule "c". Note that only the last occurrence of rule "c" contributes to the result: $.x ends up as the sum of $b.x and the $c.x of the last occurrence, because every evaluation overwrites the previous result.

<rule id="a">
   <ruleref uri="#b"/>
   <item repeat="1-">
      <ruleref uri="#c"/>
      <tag>$.x = $b.x + $c.x;</tag>
   </item>
</rule>

Same effect as the previous example, except that the expression is not evaluated at all if rule "c" does not match.

<rule id="a">
   <ruleref uri="#b"/>
   <item repeat="0-">
      <ruleref uri="#c"/>
      <tag>out.x = rules.b.x + rules.c.x;</tag>
   </item>
</rule>

These rules do the obvious concatenation of digits. Note that the ds property is first initialized to "" because otherwise in the first evaluation of the expression, ds would be undefined and would cause a run-time error:

<rule id="digits">
   <tag>$.ds="";</tag>
   <item repeat="1-">
      <ruleref uri="#digit"/>
      <tag>$.ds = $.ds + $digit;</tag>
   </item>
</rule>

<rule id="digit">
   <one-of>
      <item>"0"</item>
      <item>"1"</item>
      <item>"2"</item>
      <item>"3"</item>
      <item>"4"</item>
      <item>"5"</item>
      <item>"6"</item>
      <item>"7"</item>
      <item>"8"</item>
      <item>"9"</item>
   </one-of>
</rule>

The rules.latest() resolves to rules.c:

<rule id="a">
   <ruleref uri="#b"/>
   <ruleref uri="#c"/>
   <tag>out=rules.latest();</tag>
</rule>

The $$ resolves to $b:

<rule id="a">
   <ruleref uri="#c"/>
   <ruleref uri="#b"/>   
   <tag>$=$$;</tag>
</rule>

The rules.latest() cannot be resolved and causes a run-time error:

<rule id="a">
   b c
   <tag>out=rules.latest();</tag>
</rule>

If rule "b" matches, $$ resolves to $b. If rule "c" matches, $$ resolves to $c.

<rule id="x">
   <ruleref uri="#a"/>
   <one-of>
      <item><ruleref uri="#b"/></item>
      <item><ruleref uri="#c"/></item>
   </one-of>
   <tag>$=$$;</tag>
</rule>

This is equivalent to:

<rule id="x">
   <ruleref uri="#a"/>
   <one-of>
      <item><ruleref uri="#b"/><tag>$=$$;</tag></item>
      <item><ruleref uri="#c"/><tag>$=$$;</tag></item>
   </one-of>
</rule>

The rules.latest() resolves to rules.b if rule "b" matches; otherwise it resolves to rules.a.

<rule id="x">
   <ruleref uri="#a"/>
   <item repeat="0-1"><ruleref uri="#b"/></item>
   <tag>out=rules.latest();</tag>
</rule>

The effect is equivalent to:

<rule id="x">
   <ruleref uri="#a"/><tag>out=rules.latest();</tag>
   <item repeat="0-1"><ruleref uri="#b"/><tag>out=rules.latest();</tag></item>   
</rule>

The $$ resolves to the last occurrence of rules.a:

<rule id="x">
   <item repeat="1-"><ruleref uri="#a"/></item>
   <tag>$=$$;</tag>
</rule>

The effect is equivalent to:

<rule id="x">
   <item repeat="1-"><ruleref uri="#a"/><tag>$=$$;</tag></item>
</rule>

7. Using Semantic Interpretation to generate XML results

Semantic Interpretation processors may be used in environments where a return result is expected in XML format (for example, those supporting EMMA, the forthcoming W3C specification for the representation of user input).

If returning XML results, the following serialization rules must be used to generate an XML fragment from the Semantic Interpretation process. Notice that these serialization rules apply to semantic values generated by authored SI tags during SI processing, and do not preclude the addition of further information into the XML result by an individual SI processor (for example, recognizer annotations corresponding to acoustic confidence scores or other such information). This specification does not define the XML documents in which the generated fragment can be embedded.

The serialization into XML has been designed as a convenient mechanism to generate XML fragments directly from SI grammars. It has not been designed as a generic conversion mechanism from ES-CP objects into XML fragments. It is not a generic conversion mechanism for at least the following reasons:

Not all valid ECMA names are valid XML Names; invalid XML Names can cause the conversion to fail
ECMA Objects can contain circular references. Handling these is platform specific
Not all information in an ECMA Object is serialized; in particular, Object Type information and "DontEnum" properties are not serialized
The conversion makes use of some reserved names. Using these names in different ways can cause unexpected results.
The conversion is not reversible

7.1. Serialization of ECMAScript result into an XML fragment

The serialization of the ECMAScript result into an XML fragment is constituted by the following general transformations:

If the ECMAScript top-level Rule Variable is not an Object but a simple scalar type (String, Number, Boolean, Null or Undefined) then the resulting XML fragment only consists of character data without any mark-up. The character data will be the value of the top-level Rule Variable as if the ToString() operation had been performed on an argument of this type (e.g., for Boolean, "true" or "false").
Each property (see note below) in the ECMAScript top-level Rule Variable becomes an XML element. The name of the element will be the same as the name of the property.
If the value of the property is a simple scalar type (String, Number, Boolean, Null or Undefined) then the character data content of the XML element will be the value of this property as if the ToString() operation had been performed on an argument of this type (e.g., for Boolean, "true" or "false")
If the property is of type Object, then each child property of this object becomes a child element, and the contents of these child elements are in turn processed.
Indexed elements of an Array object (e.g. a[0], a[1], etc.) become XML child elements with name 'item'. Each <item> element has an attribute named 'index', which is the index of the corresponding element in the array. In addition, the XML element containing the <item>s includes an attribute named 'length', whose value is given by the length property of the ECMAScript Array object. Any other properties of an Array object, for instance the keys of an associative array (e.g. a["prop"]), are subject to the same transformation rules as the regular properties of an object. In a sparse array, only those elements which hold defined values will be serialized.
Properties with the name _attributes, _value, _nsdecl and _nsprefix will be treated in the ways described below.

Note: Properties which have the "DontEnum" attribute (see ES 8.6.1) are not serialized. This prevents functions and built-in properties from being serialized.

The values of properties of type String may contain special characters such as < and &, which could be erroneously treated as the start of markup by XML processors. An SI processor can use CDATA sections or character escaping to avoid this problem.

It is an error to transform into XML an ECMAScript object that contains properties whose names are not allowed in XML. This can occur when a property of a Rule Variable has a name that is not a legal name for an XML element.

It is possible for circular references to exist between ECMAScript objects, for example, if an object contains a property that references itself. The handling of circular references is platform specific.

Informative Note:

Following the above principles, to take the top-level Rule Variable with the properties drink and pizza of the example grammar in section 8:

   {
      drink: {
         liquid: "coke",
         drinksize: "medium"
      },
      pizza: {
         number: "3",
         pizzasize: "large",
         topping: [ "pepperoni", "mushrooms" ]
      }
   }

SI processing in an XML environment would generate the following XML fragment:

	
	  <drink>
	    <liquid> coke </liquid> 
	    <drinksize> medium </drinksize>
	  </drink>
	  <pizza>
	    <number> 3 </number>
	    <pizzasize> large </pizzasize>
	    <topping length="2">
	      <item index="0"> pepperoni </item>
	      <item index="1"> mushrooms </item>
	    </topping>
	  </pizza>
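
The basic transformation rules can be sketched in a few lines of ordinary JavaScript. This is an illustration under stated assumptions, not a normative algorithm: the function names content and element are hypothetical, only plain properties and Array objects are covered (not _attributes, _value, _nsdecl or _nsprefix), no whitespace is added around character data, and the input is assumed to contain no circular references.

```javascript
// Illustrative sketch only; names are hypothetical.
function escapeText(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

// Serialize the content of a value (what goes between the tags).
function content(value) {
  if (value === null || typeof value !== "object") return escapeText(value);
  var out = "";
  for (var name in value) {
    if (!Object.prototype.hasOwnProperty.call(value, name)) continue;
    var child = value[name];
    if (child === undefined) continue;   // only defined values are serialized
    if (Array.isArray(value) && /^\d+$/.test(name)) {
      // Indexed array elements become <item index="..."> children.
      out += '<item index="' + name + '">' + content(child) + "</item>";
    } else {
      // Any other property becomes a child element named after the property.
      out += element(name, child);
    }
  }
  return out;
}

// Serialize one named property as an XML element; arrays get a length attribute.
function element(name, value) {
  var attrs = Array.isArray(value) ? ' length="' + value.length + '"' : "";
  return "<" + name + attrs + ">" + content(value) + "</" + name + ">";
}

var result = {
  drink: { liquid: "coke", drinksize: "medium" },
  pizza: { number: "3", pizzasize: "large", topping: ["pepperoni", "mushrooms"] }
};
console.log(content(result));
```

Applied to the Rule Variable above, the sketch produces the same elements and attributes as the document shown, apart from whitespace.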

The following example ECMAScript object would cause an error because the $size$ property, while a valid name in ECMAScript, is not a valid name for an XML element:

  {
    drink: {
       liquid: "coke",
       $size$: "medium"
    }
  }
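
A processor could check for such names before serializing. The sketch below is an assumption-laden illustration: findIllegalNames is a hypothetical helper, the regular expression is a simplified ASCII-only approximation of the XML Name production (the real production also admits many non-ASCII characters, and the colon is excluded here because of its role in namespaces), and the input is assumed to contain no circular references.

```javascript
// Simplified, ASCII-only approximation of an XML element name.
var ASCII_XML_NAME = /^[A-Za-z_][A-Za-z0-9_.\-]*$/;

// Hypothetical helper: collect every property name that would not be a
// legal XML name. Array indices are skipped because they are serialized
// as <item> elements rather than under their own names.
function findIllegalNames(value, found) {
  found = found || [];
  if (value === null || typeof value !== "object") return found;
  for (var name in value) {
    var isIndex = Array.isArray(value) && /^\d+$/.test(name);
    if (!isIndex && !ASCII_XML_NAME.test(name)) found.push(name);
    findIllegalNames(value[name], found);
  }
  return found;
}

console.log(findIllegalNames({ drink: { liquid: "coke", $size$: "medium" } }));
```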

Informative Note:

As a consequence of these transformation rules, the XML fragment resulting from grammars using the String Literal tag syntax will contain character data corresponding to the top-level Rule Variable string value with no additional elements or attributes.

7.2. Use of _attributes and _value

Variables named _attributes and _value can be created and used by the SI author to enable the generation of richer XML results, including the following structures:

XML elements that contain both elements and character data
XML elements that contain attributes
XML elements containing a mixture of elements, attributes and character data

The _attributes object is used to hold property name/value pairs which will be rendered as XML attributes of the object which contains _attributes.

The _value variable is used to hold a scalar value for character data contained in an element or to hold the value of an attribute.

Semantic Interpretation processors treat these objects in the following way:

properties specified in the _attributes object are rendered as XML attributes of the containing object.
the value of _value is treated as character data content of the containing object or the value of an attribute if the containing object is a child of _attributes.

If the value of _value is not a scalar type, the ToString() operation is performed to generate a string value.

It is an error to transform into XML an ECMAScript object that contains properties whose names are not allowed in XML. This can occur when a property of an _attributes object has a name that is not a legal name for an XML attribute.

Examples:

The following ECMAScript object:

	{
	  martini: {
	    gin: {
	      _value: "Bombay Sapphire",
	      _attributes: {
	        ratio: 8
	      }
	    },
	    vermouth: {
	      _value: "Noilly Prat",
	      _attributes: {
	        ratio: 1
	      }
	    },
	    _attributes: {
	      method: "shaken"
	    }
	  }
	}

would generate the following XML result:

	  ...
	      <martini method="shaken">
	        <gin ratio="8"> Bombay Sapphire </gin>
	        <vermouth ratio="1"> Noilly Prat </vermouth>
	      </martini>
	  ...
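
The treatment of _attributes and _value can be sketched as follows. This is a hedged illustration, not a normative algorithm: the function names are hypothetical, Array objects are not handled, and placing the character data from _value before any child elements is an assumption of this sketch.

```javascript
// Illustrative sketch only; names are hypothetical.
function escapeXml(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/"/g, "&quot;");
}

// An attribute value is either a scalar or an object carrying _value.
function scalarOf(v) {
  return (v !== null && typeof v === "object") ? v._value : v;
}

function element(name, value) {
  if (value === null || typeof value !== "object") {
    return "<" + name + ">" + escapeXml(value) + "</" + name + ">";
  }
  var attrs = "", body = "";
  var a = value._attributes || {};
  for (var an in a) {
    // Properties of _attributes become XML attributes of the containing element.
    attrs += " " + an + '="' + escapeXml(scalarOf(a[an])) + '"';
  }
  // _value becomes character data; String() conversion happens inside escapeXml.
  if (value._value !== undefined) body += escapeXml(value._value);
  for (var pn in value) {
    if (pn === "_attributes" || pn === "_value") continue;
    body += element(pn, value[pn]);
  }
  return "<" + name + attrs + ">" + body + "</" + name + ">";
}

var result = {
  martini: {
    gin: { _value: "Bombay Sapphire", _attributes: { ratio: 8 } },
    vermouth: { _value: "Noilly Prat", _attributes: { ratio: 1 } },
    _attributes: { method: "shaken" }
  }
};
console.log(element("martini", result.martini));
```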

7.3. Namespaces

The object named _nsdecl is used to declare a namespace (XML Names) in an element. The property named _nsprefix enables the SI author to associate an XML element or attribute with a particular namespace.

When an object contains the _nsdecl property, the namespace declaration is attached to the resultant XML serialized element for this object. The _prefix property of _nsdecl indicates the namespace prefix and the _name property of _nsdecl indicates the corresponding namespace name (usually a URI reference). If the _prefix property is an empty string, the default namespace is declared. If both _prefix and _name are empty strings, the namespace declaration xmlns="" applies.

When an Array object contains the _nsprefix property, the prefix also applies to the automatically generated <item> elements and length and index attributes.

Note that this transformation produces an XML fragment - see XML Names for rules on valid namespace usage in XML.

Examples:

The following ECMAScript object:

  {
    drink: {
        _nsdecl: {
            _prefix: "n1",
            _name: "http://www.example.com/n1"
        },

        _nsprefix: "n1",

        liquid: {
            _nsdecl: {
                _prefix: "n2",
                _name: "http://www.example.com/n2"
            },
            _attributes: {
                color: {
                    _nsprefix: "n2",
                    _value: "black"
                }
            },
            _value: "coke"
        },

        size: "medium"
    }
  }

would generate the following XML result:

    <n1:drink xmlns:n1="http://www.example.com/n1">
        <liquid n2:color="black" xmlns:n2="http://www.example.com/n2"> coke </liquid>
        <size> medium </size>  
    </n1:drink>

Note that the _nsprefix property only applies to its parent object and hence neither the liquid element nor the size element is associated with a namespace in this fragment.
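
The namespace rules can be sketched in the same style. The code below is an illustration under assumptions, not a normative algorithm: all function names are hypothetical, Array objects (and the propagation of _nsprefix to <item> elements) are not handled, and emitting the namespace declaration after the other attributes simply mirrors the example above.

```javascript
// Illustrative sketch only; names are hypothetical.
function esc(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/"/g, "&quot;");
}

// Qualify a name with a prefix, if one is given.
function qname(prefix, name) {
  return prefix ? prefix + ":" + name : name;
}

// An empty _prefix declares the default namespace.
function nsDeclaration(decl) {
  if (!decl) return "";
  var attr = decl._prefix ? "xmlns:" + decl._prefix : "xmlns";
  return " " + attr + '="' + esc(decl._name) + '"';
}

var RESERVED = { _attributes: 1, _value: 1, _nsdecl: 1, _nsprefix: 1 };

function element(name, value) {
  if (value === null || typeof value !== "object") {
    return "<" + name + ">" + esc(value) + "</" + name + ">";
  }
  var tag = qname(value._nsprefix, name);   // _nsprefix applies to this element only
  var attrs = "";
  var a = value._attributes || {};
  for (var an in a) {
    var av = a[an];
    var isObj = av !== null && typeof av === "object";
    attrs += " " + qname(isObj ? av._nsprefix : "", an) +
             '="' + esc(isObj ? av._value : av) + '"';
  }
  attrs += nsDeclaration(value._nsdecl);
  var body = value._value !== undefined ? esc(value._value) : "";
  for (var pn in value) {
    if (RESERVED.hasOwnProperty(pn)) continue;
    body += element(pn, value[pn]);
  }
  return "<" + tag + attrs + ">" + body + "</" + tag + ">";
}

var drink = {
  _nsdecl: { _prefix: "n1", _name: "http://www.example.com/n1" },
  _nsprefix: "n1",
  liquid: {
    _nsdecl: { _prefix: "n2", _name: "http://www.example.com/n2" },
    _attributes: { color: { _nsprefix: "n2", _value: "black" } },
    _value: "coke"
  },
  size: "medium"
};
console.log(element("drink", drink));
```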

8. Example Grammar with Semantic Interpretation Tags

Example in XML Form:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
                  "http://www.w3.org/TR/speech-grammar/grammar.dtd">

<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar 
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         version="1.0" mode="voice" tag-format="semantics/1.0" root="order">
				 
   <rule id="order">
      I would like a
      <ruleref uri="#drink"/>
      <tag> out.drink = new Object(); out.drink.liquid=rules.drink.type; out.drink.drinksize=rules.drink.drinksize; </tag>
      and
      <ruleref uri="#pizza"/>
      <tag> out.pizza=rules.pizza; </tag>
   </rule>

   <rule id="kindofdrink">
      <one-of>
         <item> coke </item>
         <item> pepsi </item>
         <item> coca cola <tag> out="coke"; </tag> </item>
      </one-of>
   </rule>

   <rule id="foodsize">
      <tag> out="medium"; </tag> <!--  "medium" is default if nothing said -->
      <item repeat="0-1">
         <one-of>
            <item> small </item>
            <item> medium </item>
            <item> large </item>
            <item> regular <tag> out="medium"; </tag></item>
         </one-of>
      </item>
   </rule>

   <!-- Construct Array of toppings, return Array -->
   <rule id="tops"> 
      <tag> out=new Array; </tag>
      <ruleref uri="#top"/> 
      <tag> out.push(rules.top); </tag>
      <item repeat="1-">
         and
         <ruleref uri="#top"/>
         <tag> out.push(rules.top); </tag>
      </item>
   </rule>

   <rule id="top">
      <one-of>
         <item> anchovies </item>
         <item> pepperoni </item>
         <item> mushroom <tag> out="mushrooms"; </tag> </item>
         <item> mushrooms </item>
      </one-of>
   </rule>

   <!-- Two named properties (drinksize and type) on left hand side Rule Variable -->
   <rule id="drink">
      <ruleref uri="#foodsize"/>
      <ruleref uri="#kindofdrink"/> 
      <tag> out.drinksize=rules.foodsize; out.type=rules.kindofdrink; </tag>
   </rule>
 
   <!-- Three properties on rules.pizza's Rule Variable -->
   <rule id="pizza">
      <ruleref uri="#number"/>
      <ruleref uri="#foodsize"/> 
      <tag> out.pizzasize=rules.foodsize; out.number=rules.number; </tag>
      pizzas with
      <ruleref uri="#tops"/>
      <tag> out.topping=rules.tops; </tag>
   </rule>
 
   <rule id="number">
      <one-of>
         <item>
            <tag> out="1"; </tag>
            <one-of> 
               <item> a </item>
               <item> one </item>
            </one-of>
         </item>
         <item> two <tag> out="2"; </tag> </item>
         <item> three <tag> out="3"; </tag> </item>
      </one-of>
   </rule>

</grammar>

Example in ABNF Form:

#ABNF 1.0 UTF-8;
language en;
mode voice;
tag-format <semantics/1.0>;
root $order;

$order = I would like a $drink {$.drink = new Object(); $.drink.liquid = $drink.type;
        $.drink.drinksize = $drink.drinksize;}
   and $pizza {$.pizza=$pizza;};

$kindofdrink = coke | pepsi | "coca cola"{$="coke";};

// "medium" is default if nothing said
$foodsize = [ {$="medium";} | small | medium | large | regular {$="medium";}]; 

// Construct Array of toppings, return Array  
$tops = {$=new Array;} $top {$.push($top);}
  (and $top {$.push($top);})<1-> ;
 
$top = anchovies | pepperoni | mushroom{$="mushrooms";} | mushrooms;

// Two named properties (drinksize and type) on left hand side Rule Variable 
$drink = $foodsize $kindofdrink {$.drinksize=$foodsize; $.type=$kindofdrink; };

// Three properties on rules.pizza's Rule Variable 
$pizza = $number $foodsize {$.pizzasize=$foodsize; $.number=$number;} pizzas
   with $tops {$.topping=$tops;};

$number = (a | one){$="1";} | two{$="2";} | three{$="3";};

Given the above grammar, the following utterance:

"I would like a coca cola and three large pizzas with pepperoni and mushrooms."

would create the following object as the Rule Variable of the rule "order":

{
  drink: {
    liquid: "coke",
    drinksize: "medium"
  },
  pizza: {
    number: "3",
    pizzasize: "large",
    topping: [ "pepperoni", "mushrooms" ]
  }
}
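
To see how the tags combine into this result, the tag bodies of the non-leaf rules can be traced by hand in ordinary JavaScript. The sketch below makes several assumptions: each function stands in for one grammar rule, the local variables out and rules merely imitate the Rule Variable and the referenced rules' values, out is assumed to start as an empty object where the grammar does not assign it, and the values of the leaf rules (foodsize, kindofdrink, number, top) for this utterance are supplied directly rather than computed.

```javascript
// Hand trace of the tags; function and parameter names are illustrative.
function tops(matchedToppings) {
  var out = new Array();                 // {$=new Array;}
  for (var i = 0; i < matchedToppings.length; i++) {
    var rules = { top: matchedToppings[i] };
    out.push(rules.top);                 // {$.push($top);}
  }
  return out;
}

function drink(rules) {
  var out = {};
  out.drinksize = rules.foodsize;
  out.type = rules.kindofdrink;
  return out;
}

function pizza(rules) {
  var out = {};
  out.pizzasize = rules.foodsize;
  out.number = rules.number;
  out.topping = rules.tops;
  return out;
}

function order(rules) {
  var out = {};
  out.drink = new Object();
  out.drink.liquid = rules.drink.type;
  out.drink.drinksize = rules.drink.drinksize;
  out.pizza = rules.pizza;
  return out;
}

// Leaf values assumed for the utterance above.
var result = order({
  drink: drink({ foodsize: "medium", kindofdrink: "coke" }),
  pizza: pizza({ number: "3", foodsize: "large",
                 tops: tops(["pepperoni", "mushrooms"]) })
});
console.log(JSON.stringify(result));
```

The printed properties may appear in a different order than in the listing above; property order is not significant for the comparison.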

9. Conformance

9.1. Conforming Semantic Interpretation Tags

A Semantic Interpretation Tag (SI Tag) is a conforming SI Tag if its content matches the syntax defined in the normative sections of this document.

There is no normative restriction on the size of an SI Tag.

9.2. Conforming Grammar Fragments and Grammar Documents with SI Tags

A stand-alone ABNF or XML Grammar Document or an XML Grammar Fragment with SI Tags is conforming if:

the document or grammar fragment is a conforming ABNF or XML document or fragment as defined by the conformance requirements in the Speech Recognition Grammar Specification, and
every tag in the grammar document or fragment is a conforming SI Tag, and
the tag-format for the grammar fragment or document is "semantics/1.0" or "semantics/1.0-literals".

Informative Note:

The Speech Recognition Grammar Specification provides a tag-format declaration that identifies the format of the contents of the tag element in a speech grammar. The tag-format to reference Semantic Interpretation Tags conforming with the present specification is defined here as "semantics/1.0" or "semantics/1.0-literals". Note that the former is the default tag-format in the current Speech Recognition Grammar Specification when no explicit tag-format is specified.

It is expected that future revisions of this specification will use higher version numbers.

Other tag-formats can be used with Speech Recognition Grammars; in this case the tag-format must be explicitly declared and must not begin with "semantics/x.y" (where x and y are any digits).
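
The tag-format rules above can be summarized in a small check. This is a sketch only: classifyTagFormat and its return values are hypothetical names introduced for illustration, and the case of a grammar without any tag-format declaration is not covered.

```javascript
// Hypothetical helper: classify a tag-format string along the lines of this section.
function classifyTagFormat(tagFormat) {
  if (tagFormat === "semantics/1.0") return "script";            // Script tags
  if (tagFormat === "semantics/1.0-literals") return "literals"; // String Literals
  // Other tag-formats must not begin with "semantics/x.y" (x and y digits).
  if (/^semantics\/\d\.\d/.test(tagFormat)) return "reserved";
  return "other";
}

console.log(classifyTagFormat("semantics/1.0"));
console.log(classifyTagFormat("semantics/2.0"));
console.log(classifyTagFormat("x-vendor/tags"));
```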

9.3. Conforming Semantic Interpretation Processors

A Semantic Interpretation Processor is a program that can parse and process Semantic Interpretation Tags to produce semantic results. Semantic Interpretation Processors are executed in a hosting environment (e.g. a grammar processor or VoiceXML processor).

A Conforming Semantic Interpretation Processor

Must be capable of accepting and processing any conforming Semantic Interpretation Tag as defined in 9.1.
Must execute every conforming Semantic Interpretation Tag according to the normative sections in this specification.
Should inform the hosting environment at the time it evaluates a conforming Semantic Interpretation Tag that causes a runtime error.
Must inform the hosting environment when it encounters a non-conforming Semantic Interpretation Tag. A processor is free to inform the hosting environment of such a non-conforming tag at any time between loading the non-conforming SI Tag and evaluating the offending language construct in it. There is no requirement for a processor to continue processing documents containing a non-conforming tag.

Informative Note:

We anticipate that a processor may encounter the following non-conforming conditions:

Non-conforming document by developer error (or error in automatic document generation).
Not conforming by use of a proprietary semantic interpretation syntax in the grammar tags.
Not conforming by use of proprietary extensions to SI Tags.

The W3C Voice Browser Working Group has applied to the IETF to register MIME types for both the ABNF and XML grammar forms (see Appendix G, Media Types and File Suffix, of the Speech Recognition Grammar Specification).

The ABNF MIME type will identify ABNF grammars containing only conforming SI Tags. If the grammar contains tags of any other format then a different MIME type must be used.

Similarly, the XML grammar MIME type will identify XML grammars containing only conforming SI Tags. If the grammar contains tags of any other format then a different MIME type must be used.

A grammar that contains tags in a format other than conforming SI Tags must have an explicit tag format declaration specifying the format (see Speech Recognition Grammar Specification 4.8 Tag Format Declaration). The tag-format for a grammar that contains conforming Semantic Interpretation Tags is "semantics/1.0" (for Script tags) or "semantics/1.0-literals" (for String Literals).

Note: a VoiceXML 2.0 processor will require support for Semantic Interpretation Tags as defined here, but may additionally support other grammar formats, or SRGS grammars with other tag formats (probably identified by a different MIME type).

9.4. Conforming ABNF and XML Grammar Processors

An ABNF or XML Grammar Processor is a conforming processor if:

it is a conforming ABNF or XML Grammar Processor as defined in the Speech Recognition Grammar Specification, and
it is a conforming Semantic Interpretation Processor

Acknowledgments

This document was written with the participation of members of the W3C Voice Browser Working Group. The following have significantly contributed to writing this specification:

Paolo Baggia, Loquendo
Dominique Boucher, Locus Dialogue
Dan Burnett, Nuance Communications
Dave Burke, Voxpilot
Sasha Caskey, SpeechWorks International
Andrew Hunt, ScanSoft
Stefan Krause, ScanSoft
Jeff Kusnitz, IBM
Bruce Lucas, IBM
Mitsuru Oshima, General Magic
Stephen Potter, Microsoft
Jan Verhasselt, ScanSoft
Dave Wood, Microsoft

References

[ECMA]: ECMA International - Standardizing Information and Communication Systems; http://www.ecma-international.org/
[ECMA-262]: ECMAScript Language Specification, 3rd Edition - December 1999, published by ECMA; http://www.ecma-international.org/publications/standards/Ecma-262.htm
[EMMA Requirements]: Requirements for EMMA (Extensible MultiModal Annotation), W3C Multimodal Interaction Activity; http://www.w3.org/TR/EMMAreqs/
[EMMA]: EMMA: Extensible MultiModal Annotation markup language, W3C Multimodal Interaction Activity; http://www.w3.org/TR/emma/
[ES-CP]: ECMA-327 Standard "ECMAScript 3rd Edition Compact Profile", June 2001, published by ECMA; http://www.ecma-international.org/publications/standards/Ecma-327.htm
[N-grams]: Stochastic Language Models (N-Gram) Specification, W3C Voice Browser Activity; http://www.w3.org/TR/ngram-spec/
[MMI]: W3C Multimodal Interaction Activity; http://www.w3.org/2002/mmi/
[SRGS]: Speech Recognition Grammar Specification for the W3C Speech Interface Framework, W3C Voice Browser Activity; http://www.w3.org/TR/speech-grammar/
[Voice]: W3C Voice Browser Activity; http://www.w3.org/Voice/
[VoiceXML]: Voice Extensible Markup Language (VoiceXML) Version 2.0, W3C Voice Browser Activity; http://www.w3.org/TR/voicexml20/
[XML Names]: Namespaces in XML; http://www.w3.org/TR/REC-xml-names/