Copyright ©2003-2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document defines the process of Semantic Interpretation for Speech Recognition and the syntax and semantics of semantic interpretation tags that can be added to speech recognition grammars to compute information to return to an application on the basis of rules and tokens that were matched by the speech recognizer. In particular, it defines the syntax and semantics of the contents of Tags in the Speech Recognition Grammar Specification.
Semantic Interpretation may be useful in combination with other specifications, such as the Stochastic Language Models (N-Gram) Specification, but their use with N-grams has not yet been studied.
The results of semantic interpretation describe the meaning of a natural language utterance. The current specification represents this information as an ECMAScript object, and defines a mechanism to serialize the result into XML. The W3C Multimodal Interaction Activity is defining a data format (EMMA) for representing information contained in user utterances. It is believed that semantic interpretation will be able to produce results that can be included in EMMA.
This document is a public W3C Last Call Working Draft for review by W3C members and other interested parties.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only).
This document was produced under the 24 January 2002 Current Patent Practice as amended by the W3C Patent Policy Transition Procedure. The Working Group maintains a patent disclosure page relevant to this document; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy.
This specification describes the syntax and semantics for semantic interpretation tags in speech recognition grammars, and forms part of the proposals for the W3C Speech Interface Framework. It is intended to be used with Speech Recognition grammars as defined in Speech Recognition Grammar Specification.
This Last Call Working Draft incorporates several changes to the previous working draft. Based on reviewer feedback, "tags" as attribute were dropped, and an alternative syntax has been defined.
This document is for public review, and comments and discussion are welcomed on the public mailing list <www-voice@w3.org>. The archive for the list is accessible online.
The working group's intention is to advance this specification to Candidate Recommendation during the 1st quarter of 2005 (see Work Items of the Voice Browser Activity). Reviewers are encouraged to send their comments on this working draft before 5 December 2004.
This section is informative.
Grammar Processors, and in particular speech recognizers, use a grammar that defines the words and sequences of words that make up the input language they can accept. The major task of a grammar processor is to find the sequence of words described by the grammar that best matches a given utterance, or to report that no such sequence exists.
In an application, knowing the sequence of words that were uttered is sometimes interesting but often not the most practical way of handling the information that is presented in the user utterance. What is needed is a computer-processable representation of the information, the semantic result, rather than a natural language transcript.
Semantic Interpretation Tags provide a means to attach instructions for the computation of such semantic results to a speech recognition grammar.
When used with a VoiceXML Processor, it is expected that a Semantic Interpretation Tag Processor will convert the result generated by an SRGS speech grammar processor into an ECMAScript object that can then be processed as specified in the VoiceXML 2.0 specification section 3.1.6 Mapping Semantic Interpretation Results to VoiceXML forms.
The W3C Multimodal Interaction working group is defining a data format (EMMA) for the representation of information contained in the user's input (a spoken utterance or other forms of input available through the modalities in the interaction). It is expected that Semantic Interpretation for Speech Recognition will be generating results that can be integrated into EMMA.
This document defines the syntax and the semantics of Semantic Interpretation Tags for use with the Speech Recognition Grammar Specification.
It is possible that Semantic Interpretation Tags as defined here can be used also with the N-Gram Specification, but the current specification does not specifically address such use and does not guarantee that the Semantic Interpretation Tags as defined here are meeting the needs of such use.
The basic principles for the Semantic Interpretation mechanism defined in this specification are the following:
While there was no explicit requirements document created for the properties of a semantic interpretation syntax, the working group gradually learned that there are some conflicting desires to be met.
Certainly, the Semantic Interpretation Tags must be easy for developers to use, and they should provide at least the expressive power that is needed for the majority of applications. ECMAScript (ECMA-262) would meet these requirements.
On the other hand, there are concerns about performance and other implications of using ECMAScript (such as variable scoping, platform access, etc.).
The ECMAScript Compact Profile (ECMA 327) is a strict subset of the third edition of ECMA-262. It has been designed to meet the needs of resource-constrained environments. Special attention has been paid to constraining ECMAScript features that require proportionately large amounts of system memory, and continuous or proportionately large amounts of processing power. In particular, it is designed to facilitate prior compilation for execution in a lightweight environment. This makes it attractive for use in association with speech grammar rules for extracting semantic results from speech recognition.
This document normatively references the ECMA-327 Standard "ECMAScript 3rd Edition Compact Profile", June 2001, further referenced as ES-CP.
The ES-CP itself references the ECMA-262 Standard "ECMAScript Language Specification", 3rd Edition - December 1999.
For informative purposes, some text from the ECMA-262 has been copied in this document. Where that is done, unless otherwise specified, such text should be considered informative and the corresponding reference to the ECMA-262 standard is normative.
All sections in this specification are normative, unless otherwise indicated.
Throughout the specification the following abbreviations will be used:
Abbreviation | Description |
ES n | Shorthand notation for ECMA-262 Section number n. |
ES-CP | ECMAScript Compact Profile, see section 2.1. |
SI | Semantic Interpretation. |
This specification uses the notational conventions for Syntactic and Lexical Grammars as given in ES 5.1, and the same Algorithm Conventions as in ES 5.2.
Semantic Interpretation Tags compute semantic values. During the semantic interpretation process, these values can be assigned to variables that are associated with the rules in the grammar. These variables are known as Rule Variables.
Every grammar rule has a single Rule Variable that holds a semantic value. The Rule Variable is typically assigned its value by the SI tags within its grammar rule. SI tags also have access to the Rule Variables of any other rules referenced by the current grammar rule and already processed by that point in the utterance (according to the visibility constraints defined in section 6.). The Rule Variables of other rules are referenced by the name of their grammar rule, as described in section 3.3.1.
Rule Variables can hold semantic values of any type defined in ES-CP. They are not explicitly typed. Rule Variables that have not been assigned a value are not defined. SI authors will typically use scalar types, e.g. string or numeric values, in lower level rules and more structured objects in higher level rules (particularly root rules).
In addition to semantic values, certain other values corresponding to Rule Variables are available during SI processing.
For every Rule Variable there is an associated variable named "text", of type string, which holds the substring (the series of tokens) in the utterance that is governed by the corresponding grammar rule. Text variables are not part of the Rule Variable and can not be modified.
Likewise, for every Rule Variable, there is an associated variable called "score", of type Number, which holds a value that is related to the confidence or probability of the corresponding grammar rule or some similar measure. Higher score values indicate higher confidence or probability over the corresponding grammar rule. Processors that don't compute or don't have access to such values can return a constant value for every score. Score variables are not part of the Rule Variable and can not be modified.
The semantic result for an utterance is the value of the Rule Variable of the root rule when all semantic interpretation evaluations have been completed. For certain result formats (e.g. EMMA), this value is serialized into an XML document according to the description in section 7. It is outside the scope of this specification to define how the semantic result is communicated to the application.
In the context of the W3C Voice Browser architecture, the semantic result will be directly cast into ECMAScript variables in the VoiceXML interpreter (see VoiceXML 2.0 section 3.1.6. Mapping Semantic Interpretation Results to VoiceXML forms).
In the W3C Multimodal architecture, the semantic result is expected to be transformed into EMMA following the mechanism described in section 7.
In other contexts, the mechanism described in section 7. can be used to transform the semantic result into other XML formats.
Score values are highly dependent on the processor's implementation.
In most implementations using speech recognition, scores are likely dependent on factors such as audio channel quality, grammar contents, grammar weights, language, individual speaker characteristics, and others. Scores for a particular word or phrase within a grammar are typically comparable over instances of the same word or phrase over time. Scores for different words in a single grammar are also typically comparable to one another. Scores across grammars, or scores for words and word sequences, or scores between different processors, are very often not comparable.
It is anticipated that scores will be useful only for annotating the results, not for influencing the results during SI processing.
Note that an SI processor doesn't require a speech recognizer, and thus that the score does not even have to be related to speech recognition.
This specification defines the syntax for the contents of tags in the grammar. There are two different Semantic Interpretation tag syntaxes that can be used. The value of the tag-format declaration in the grammar determines which of the two syntaxes is being used. The different syntaxes only change the processing of tags during Semantic Interpretation; in all other respects the grammar behaves identically.
The "Script" tag syntax, enabled by setting the tag-format to "semantics/1.0", defines the contents of tags to be ECMAScript. Each tag is a valid ES-CP program. Section 3.2.1. describes the processing of this tag syntax in more detail.
The "String Literal" tag syntax, enabled by setting the tag-format to "semantics/1.0-literals", defines the contents of tags to be strings. This syntax does not have the expressive power of a full scripting language, but does provide a way to produce semantic results consisting of simple strings. Section 3.2.2. describes this tag syntax in more detail.
Within one grammar, it is not possible to mix the two tag syntaxes. All tags in one grammar must have the same tag-format. However, it is possible for externally referenced grammars to have a different tag-format to the parent grammar they are referenced from.
Semantic Interpretation Tags are added in the string content of the tag elements in the grammar rule expansion, as described in Section 2.6 Tags of the Speech Recognition Grammar Specification. This specification further uses the term Semantic Interpretation Tag (or SI Tag) to refer to such a tag.
Below are two example formats of SI Tags in the Speech Recognition Grammar Specification; tag-content represents the content of the tag which can be either a Script or a String Literal.
In the XML grammar format, SI Tags are specified as the content of the tag element.
XMLSemanticTag:
    <tag/>
    <tag> </tag>
    <tag> tag-content </tag>
In the ABNF grammar format, SI Tags are enclosed in curly braces or in the three-character sequences '{!{' and '}!}'.
ABNFSemanticTag:
    {}
    { tag-content }
    {!{ tag-content }!}
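The two ABNF delimiter styles can be illustrated with a small sketch. The function name `abnfTagContent` is hypothetical and not part of this specification, and the sketch assumes the SRGS rule that a tag delimited by single curly braces cannot itself contain a closing brace, which is why the three-character delimiters exist.

```javascript
// Hypothetical helper (not defined by this specification): extracts the
// tag-content from one ABNF SI Tag, given the complete tag text.
function abnfTagContent(tag) {
  // Three-character delimiters: content may contain single '{' and '}'.
  if (tag.startsWith("{!{") && tag.endsWith("}!}")) {
    return tag.slice(3, -3);
  }
  // Single curly braces: assumed not to allow '}' inside the content.
  if (tag.startsWith("{") && tag.endsWith("}")) {
    const content = tag.slice(1, -1);
    if (content.includes("}")) {
      throw new Error("content with '}' needs the {!{ ... }!} delimiters");
    }
    return content;
  }
  throw new Error("not an ABNF SI Tag: " + tag);
}

const plainTag = abnfTagContent("{var x=1;}");
const bracedTag = abnfTagContent("{!{var y='low{1}';}!}");
```

Here `plainTag` holds the script `var x=1;` and `bracedTag` holds a script whose string literal contains curly braces.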
A Semantic Interpretation Script holds a string that is treated as the source text of a valid ES-CP Program (with Program as defined by ES 14).
The environment in which SI tags are embedded may introduce escaped characters, character references or other markup that has to be resolved by the environment. The result after resolution is treated as ES-CP.
It is illegal to make an assignment to a variable that has not been previously declared (either implicitly as is the case for Rule Variables or explicitly by using a var statement). Attempting to assign to an undeclared variable will result in a runtime error.
A tag using the String Literal tag syntax has content that is a sequence of zero or more characters. If the character sequence is not empty, it has to follow either the DoubleStringCharacters or the SingleStringCharacters production of ES 7.8.4.
During processing, a tag with a String Literal has the same effect as a script that assigns the content of the tag, as a string literal, to the Rule Variable of the rule the tag is in.
As a consequence, if multiple tags are present in the rule expansion, the Rule Variable is set to the value of the last tag in the expansion. Prior tags are overwritten by the final tag.
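The equivalence between a String Literal tag and an assigning script can be sketched as follows. Both helper names are hypothetical, and the sketch assumes tag content without escape sequences, so that `JSON.stringify` produces a suitable string literal. It is an illustration of the described behavior, not a conformant processor.

```javascript
// Hypothetical helper: builds the SI Script that has the same effect as a
// String Literal tag with the given content.
function literalTagToScript(content) {
  // JSON.stringify yields a double-quoted, escaped string literal.
  return "out=" + JSON.stringify(content) + ";";
}

// Hypothetical helper: evaluates the tags of one rule expansion in order
// and returns the final value of the Rule Variable.
function evaluateLiteralTags(tagContents) {
  let out = {}; // the Rule Variable starts out as an empty Object
  for (const content of tagContents) {
    const script = literalTagToScript(content);
    out = new Function("out", script + " return out;")(out);
  }
  return out;
}

const lastWins = evaluateLiteralTags(["first", "second"]);
```

Because each tag assigns to the same Rule Variable, `lastWins` holds the content of the last tag only.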
A grammar using the Script tag syntax can reference rules of a grammar using the String Literal syntax. The value of the string literal can be obtained by the parent rule using the Rule Variable of the referenced rule. The recognized text of the referenced rule is also available in the meta.latest().text and meta.rulename.text (or the $$$.text and $rulename$.text) variables.
A grammar using the String Literal tag syntax can reference rules in other grammars (which can be using either the Script syntax or the String Literal syntax). See section 5. Default Assignment for the way semantic results from a referenced grammar can be used in a grammar with String Literal tag syntax.
The syntax for the XML Form and for the ABNF Form are provided below.
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0-literals" root="answer">
  <rule id="answer" scope="public">
    <one-of>
      <item> <ruleref uri="#yes"/> </item>
      <item> <ruleref uri="#no"/> </item>
    </one-of>
  </rule>
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item>yeah<tag>yes</tag></item>
      <item> <token>you bet</token><tag>yes</tag></item>
      <item xml:lang="fr-CA">oui <tag>yes</tag></item>
    </one-of>
  </rule>
  <rule id="no">
    <one-of>
      <item>no</item>
      <item>nope</item>
      <item>no way</item>
    </one-of>
    <tag>no</tag>
  </rule>
</grammar>
The grammar with string literals is equivalent to the grammar with SI Scripts below:
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="answer">
  <rule id="answer" scope="public">
    <one-of>
      <item> <ruleref uri="#yes"/> </item>
      <item> <ruleref uri="#no"/> </item>
    </one-of>
  </rule>
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item>yeah<tag>out="yes";</tag></item>
      <item> <token>you bet</token><tag>out="yes";</tag></item>
      <item xml:lang="fr-CA">oui <tag>out="yes";</tag></item>
    </one-of>
  </rule>
  <rule id="no">
    <one-of>
      <item>no</item>
      <item>nope</item>
      <item>no way</item>
    </one-of>
    <tag>out="no";</tag>
  </rule>
</grammar>
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0-literals>;
root $answer;

public $answer = $yes | $no;
$yes = yes | yeah {yes} | "you bet" {!{yes}!} | "oui"!fr-CA {yes};
$no = (no | nope | no way) {no};
The grammar with string literals is equivalent to the grammar with SI Scripts below:
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0>;
root $answer;

public $answer = $yes | $no;
$yes = yes | yeah {$="yes";} | "you bet" {!{$="yes";}!} | "oui"!fr-CA {$="yes";};
$no = (no | nope | no way) {$="no";};
SI Scripts can access Rule Variables using the syntax defined in this section. This syntax applies only to documents for which the SI Tags hold SI Scripts (and not to documents where SI Tags contain String Literals).
Two variant syntaxes are available for working with Rule Variables. Both syntaxes can be used inside SI Scripts.
The syntax introduced in this version of the working draft was designed in response to feedback on the previous working draft that the original syntax overloaded the $ sign.
The working group determined it was desirable to keep the original syntax alongside the new syntax rather than replace it.
Throughout this document examples will alternate between the two variant syntaxes. Both syntaxes can be used in both the XML and ABNF grammar formats.
Every grammar rule has a single Rule Variable that holds an ES-CP value. This Rule Variable can both be evaluated and assigned to.
It is identified by out or by the dollar sign $. Properties of the Rule Variable can be individually accessed by out.Identifier or $.Identifier, where Identifier is the name of the property.
Notation | Meaning |
out | identifies the Rule Variable |
out.pizza | identifies the pizza property of the Rule Variable |
$ | identifies the Rule Variable |
$.pizza | identifies the pizza property of the Rule Variable |
The Semantic Interpretation Script typically assigns a value to the Rule Variable of its embedding grammar rule. The Rule Variable is initialized to an empty Object before the first tag in the grammar rule is executed (see section 6.3). The SI author will usually either add properties to this Object or alternatively discard it by assigning a primitive value (e.g. String or Number) to the Rule Variable. Since the Rule Variable is initialized before the tag is executed, a var statement is not required prior to assigning to it.
As a consequence of normal ECMAScript behavior, the SI author is free to override the Rule Variable type as well as value within the bounds of legal ECMAScript. Note that ES-CP enforces rules that affect Semantic Interpretation Scripts. For example, ES-CP reserved words cannot be used as a property. Thus, out.for is illegal because it uses the ES-CP reserved word for.
SI Script | Resulting Rule Variable |
out.prop = 'my property' | an Object with property name prop |
out = 'my value' | a String with value 'my value' |
$.prop = 'my property'; $ = 'my value' | a String with value 'my value' |
out = 'my value'; out.prop = 'my property' | a String with value 'my value' |
$.prop1 = 'a'; $.prop2 = 'b'; $ = $.prop1 + $.prop2 | a String with value 'ab' |
out = 'my value'; out = new Object(); out.prop = 'my property' | an Object with property name prop |
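The examples above can be tried out with a minimal sketch that runs a script against a freshly initialized Rule Variable. The function `runTag` is a hypothetical stand-in for an SI processor and supports only the `out` notation. It relies on a standard ECMAScript engine in non-strict mode, where assigning a property on a primitive String is silently ignored.

```javascript
// Hypothetical stand-in for an SI processor: runs one SI Script and returns
// the final value of the Rule Variable `out`.
function runTag(script) {
  // The Rule Variable is initialized to an empty Object before the tag runs.
  // Code created with the Function constructor is non-strict unless it opts
  // in, so `out.prop = ...` on a primitive String has no effect.
  const body = "var out = {}; " + script + "; return out;";
  return new Function(body)();
}

const asObject = runTag("out.prop = 'my property'");
const asString = runTag("out = 'my value'; out.prop = 'my property'");
const combined = runTag("out.prop1 = 'a'; out.prop2 = 'b'; out = out.prop1 + out.prop2");
```

In this sketch `asObject` is an Object with the property `prop`, while `asString` and `combined` are the Strings 'my value' and 'ab'.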
SI Scripts can access the Rule Variable associated with grammar rules referenced in SI Tags that appear after (to the right or below) the rule reference in the grammar expansion, and only if the referenced rule was used in the expansion that matched the input utterance. See visibility rules in section 6 for a more detailed description of when Rule Variables associated to rule references can be referenced in SI Tags, using the concept of the logical parse structure and the flat parse list.
Rule Variables associated to referenced rules can both be evaluated and assigned to.
The Rule Variable associated to a rule reference is identified by rules.Rulename or by $Rulename, where Rulename is the rulename of the rule, as defined in SRGS Section 3.1 Basic Rule Definition. Individual properties of a Rule Variable can be identified by rules.Rulename.Identifier or by $Rulename.Identifier, where Rulename is the name of the rule and Identifier is the name of the property.
Every SI Script has access to a rules object that has a property holding the Rule Variable value for every visible rule; the property name is the name of the rule to which the Rule Variable is associated.
The Rule Variable for the latest rule reference that was used in the expansion matching the utterance up to the position of the SI Tag can also be referenced through rules.latest() or $$.
In an expression, both the Rule Variables of the current grammar rule and the referenced rules can be evaluated and assigned to.
Special Rules (NULL, VOID, GARBAGE) can not be evaluated.
The rules.Rulename and $Rulename notations can be used only for explicit local rule references and for explicit references to a named rule of a grammar, not for implicit rule references. (See SRGS Section 2.2 Rule Reference for a definition of explicit and implicit rule references). To refer to the Rule Variable for a rule that is referenced by an implicit reference to the root rule of a grammar, the rules.latest() or $$ notation can be used.
Notation | Meaning |
out | the Rule Variable for the current grammar rule |
out.prop | the property prop of the Rule Variable for the current grammar rule |
rules.rname | the Rule Variable associated to the referenced rule rname |
rules.rname.prop | the property prop of the Rule Variable associated to the referenced rule rname |
rules.latest() | the Rule Variable associated to the latest matching rule reference before the SI Tag |
rules.latest().prop | the property prop of the Rule Variable associated to the latest matching rule reference before the SI Tag |
$ | the Rule Variable for the current grammar rule |
$.prop | the property prop of the Rule Variable for the current grammar rule |
$rname | the Rule Variable associated to the referenced rule rname |
$rname.prop | the property prop of the Rule Variable associated to the referenced rule rname |
$$ | the Rule Variable associated to the latest matching rule reference before the SI Tag |
$$.prop | the property prop of the Rule Variable associated to the latest matching rule reference before the SI Tag |
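One way to picture the rules object is the following sketch. The names `makeRules` and `record` are hypothetical, and the sketch ignores the visibility rules of section 6 and the case of a rule that happens to be named latest. It only shows how named access and rules.latest() relate.

```javascript
// Hypothetical model of the `rules` object seen by SI Scripts.
function makeRules() {
  const order = []; // names of matched rule references, oldest first
  const rules = {
    // Rule Variable of the latest matched rule reference, if any.
    latest() {
      return order.length ? this[order[order.length - 1]] : undefined;
    },
  };
  // Hypothetical hook: called when a referenced rule has been matched and
  // its Rule Variable value is known.
  function record(name, value) {
    rules[name] = value;
    order.push(name);
  }
  return { rules, record };
}

const demo = makeRules();
demo.record("object", "airco");
demo.record("state", "0");
```

After these two calls, `demo.rules.object` is "airco" and `demo.rules.latest()` returns the value recorded for the rule state.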
Section 6 describes the visibility rules for accessing Rule Variables. If according to these rules a Rule Variable is not visible, one can still evaluate or declare and assign to the variable with that name (it is then simply behaving as a local variable). The value assigned to a local variable that has the name of a Rule Variable will be overwritten when that Rule Variable is visible according to Section 6. This behavior can be used to "initialize" Rule Variables to handle cases where a referenced rule may not actually be matched depending on the input to the grammar.
The following grammar produces the result { drinksize: "medium", type: "coke" } regardless of whether the input is 'coke' or 'medium coke':
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="drink">
  <rule id="drink">
    <tag> var $foodsize="medium"; </tag>
    <!-- Note: var required since $foodsize not declared yet -->
    <item repeat="0-1">
      <ruleref uri="#foodsize"/>
    </item>
    <ruleref uri="#kindofdrink"/>
    <tag> $.drinksize=$foodsize; $.type=$kindofdrink; </tag>
  </rule>
  <rule id="foodsize">
    <one-of>
      <item> small </item>
      <item> medium </item>
      <item> large </item>
    </one-of>
  </rule>
  <rule id="kindofdrink">
    <one-of>
      <item> coke </item>
      <item> pepsi </item>
    </one-of>
  </rule>
</grammar>
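The initialization behavior of this grammar can be traced with a small sketch. The function `interpretDrink` and its `matched` argument are hypothetical: `matched` maps the names of the referenced rules that were actually matched to their Rule Variable values. This hand-written trace of the two tags is not a general SI processor.

```javascript
// Hypothetical trace of the "drink" rule for one parse.
function interpretDrink(matched) {
  // Effect of the first tag: var $foodsize="medium";
  const scope = { $foodsize: "medium" };
  // When a referenced rule is matched, its Rule Variable becomes visible
  // and overwrites a local variable of the same name.
  for (const [rule, value] of Object.entries(matched)) {
    scope["$" + rule] = value;
  }
  // Effect of the last tag: $.drinksize=$foodsize; $.type=$kindofdrink;
  return { drinksize: scope.$foodsize, type: scope.$kindofdrink };
}

const onlyDrink = interpretDrink({ kindofdrink: "coke" });
const sizeAndDrink = interpretDrink({ foodsize: "medium", kindofdrink: "coke" });
const largeDrink = interpretDrink({ foodsize: "large", kindofdrink: "pepsi" });
```

`onlyDrink` and `sizeAndDrink` have the same content, while `largeDrink` shows the initial value being replaced by the matched size.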
A Rule Variable's text variable is identified by meta.rulename.text or $rulename$.text, where rulename is the name of the Rule Variable. The text variable associated to rules.latest() or $$ is identified by meta.latest().text or $$$.text. The text variable associated to the current grammar rule is identified by meta.current().text or $meta.text. The text variable of the current grammar rule is read-only.
A Rule Variable's score variable is identified by meta.rulename.score or $rulename$.score, where rulename is the name of the Rule Variable. The score variable associated to rules.latest() or $$ is identified by meta.latest().score or $$$.score. The score variable associated to the current grammar rule is identified by meta.current().score or $meta.score. The score variable of the current grammar rule is read-only.
Notation | Meaning |
meta.rname.text | the text variable of the Rule Variable referenced to by rules.rname |
meta.latest().text | the text variable of the Rule Variable referenced to by rules.latest() |
meta.current().text | the text variable of the current grammar rule (read-only) |
$rname$.text | the text variable of the Rule Variable referenced to by $rname |
$$$.text | the text variable of the Rule Variable referenced to by $$ |
$meta.text | the text variable of the current grammar rule (read-only) |
Since the text and score variables of the current grammar rule are read-only, they behave as read-only properties as defined in ES-CP. As a consequence, attempts to assign to the text or score variable associated to the Rule Variable of the current grammar rule will be ignored.
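The read-only behavior can be sketched with ordinary ECMAScript property attributes. The function `makeMeta` and the sample values are hypothetical. `Object.defineProperty` and `Reflect.set` come from later ECMAScript editions than ES-CP and are used here only to make the effect observable.

```javascript
// Hypothetical model of the meta information of the current grammar rule.
function makeMeta(text, score) {
  const meta = {};
  // Non-writable properties: assignments leave the values unchanged.
  Object.defineProperty(meta, "text", { value: text, writable: false, enumerable: true });
  Object.defineProperty(meta, "score", { value: score, writable: false, enumerable: true });
  return meta;
}

const current = makeMeta("coca cola", 0.8);
// Reflect.set reports whether the assignment took effect without throwing.
const textAccepted = Reflect.set(current, "text", "coke");
const scoreAccepted = Reflect.set(current, "score", 1.0);
```

Both `textAccepted` and `scoreAccepted` are false, and `current` still holds the original text and score.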
The header of an SRGS grammar may contain one or more global SI Tags. In grammars using the Script tag syntax, these tags are executed before any of the SI Tags in the matching grammar rules are evaluated. There are no ordering constraints between SI Tags and other valid SRGS grammar header items (section 4.1 of SRGS). Global tags are ignored in Grammars using the String Literal tag syntax.
The SI Tags are evaluated only once, in a global scope that will be shared by all evaluations (see 6.3.)
Whereas all evaluations for SI Tags in flat parse lists for matching rules have access to the global scope for reading only, the SI Tags in the grammar header have write access to the global scope. This is the primary function of these tags: to initialize the global scope for use in the SI Tags.
In the XML Form, global SI Tags are SI Tags that appear outside all rules in the grammar header, before the first rule.
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="rule">
  <tag>var x=1;</tag>
  <tag>var y='low{1}';</tag>
  <rule id="rule">. . .</rule>
</grammar>
In the ABNF Form, global SI Tags are SI Tags followed by a semicolon, that appear outside all rules in the grammar header, before the first rule. Both tag delimiting syntaxes can be used.
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0>;
root $rule;

{var x=1;};
{!{var y='low{1}';}!};

$rule = . . .;
For a given parse, if there is no SI Tag attached to the expansion in the grammar rule that is used to match the utterance, then the value for the out Rule Variable is determined as follows. If there are no rule references in the parse, the value for the text meta variable (meta.current().text) is automatically copied into the Rule Variable (which then becomes of type string). Otherwise, the value of the Rule Variable of the last rule reference in the parse (rules.latest()) is automatically copied into the Rule Variable.
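The default assignment can be summarized in a short sketch. The function `defaultAssignment` and its parameters are hypothetical: `text` stands for meta.current().text and `ruleRefValues` for the Rule Variables of the rule references in the parse, in the order in which they were matched.

```javascript
// Hypothetical helper: value of the Rule Variable of a rule whose matched
// expansion carries no SI Tag.
function defaultAssignment(text, ruleRefValues) {
  if (ruleRefValues.length === 0) {
    // No rule references in the parse: copy the matched text.
    return text;
  }
  // Otherwise copy the Rule Variable of the last rule reference.
  return ruleRefValues[ruleRefValues.length - 1];
}

const fromText = defaultAssignment("coke", []);
const fromLastRef = defaultAssignment("I want to fly from Chicago to Boston", ["ORD", "BOS"]);
```

`fromText` is the matched text itself, and `fromLastRef` is the value of the last rule reference only, which is why the default assignment cannot combine information from several rule references.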
<rule id="drink">
  <one-of>
    <item>coke</item>
    <item>pepsi</item>
    <item>coca cola</item>
  </one-of>
</rule>
For the following rule, there is a String Literal tag associated with "coca cola" and hence rules.drink is either "coke" or "pepsi". However, meta.drink.text is either "coke", "coca cola", or "pepsi".
<rule id="drink">
  <one-of>
    <item>coke</item>
    <item>pepsi</item>
    <item>coca cola<tag>coke</tag></item>
  </one-of>
</rule>
For the following grammar, the utterance "I want to fly to Boston" will return the result "BOS".
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0-literals" root="flight">
  <rule id="flight" scope="public">
    I want to fly to
    <ruleref uri="#airports"/>
  </rule>
  <rule id="airports" scope="private">
    <one-of>
      <item><ruleref uri="#USairport"/></item>
      <item><ruleref uri="#otherairport"/></item>
    </one-of>
  </rule>
  <rule id="USairport" scope="private">
    <one-of>
      <item>Boston<tag>BOS</tag></item>
      <item>New York<tag>JFK</tag></item>
      <item>Chicago<tag>ORD</tag></item>
    </one-of>
  </rule>
  <rule id="otherairport" scope="private">
    <one-of>
      <item>Brussels<tag>BRU</tag></item>
      <item>Paris<tag>CDG</tag></item>
      <item>Rome<tag>FCO</tag></item>
    </one-of>
  </rule>
</grammar>
Note that the default assignment has been designed to handle the simplest but most frequent cases only. It can not cope with combining information from different rule references. For example, the grammar below would return the information about the last airport only, not about both airports. For the following grammar, the utterance "I want to fly from Chicago to Boston" will return the result "BOS".
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0-literals" root="flight">
  <rule id="flight" scope="public">
    I want to fly from
    <one-of>
      <item><ruleref uri="#USairport"/></item>
      <item><ruleref uri="#otherairport"/></item>
    </one-of>
    to
    <one-of>
      <item><ruleref uri="#USairport"/></item>
      <item><ruleref uri="#otherairport"/></item>
    </one-of>
  </rule>
  <rule id="USairport" scope="private">
    <one-of>
      <item>Boston<tag>BOS</tag></item>
      <item>New York<tag>JFK</tag></item>
      <item>Chicago<tag>ORD</tag></item>
    </one-of>
  </rule>
  <rule id="otherairport" scope="private">
    <one-of>
      <item>Brussels<tag>BRU</tag></item>
      <item>Paris<tag>CDG</tag></item>
      <item>Rome<tag>FCO</tag></item>
    </one-of>
  </rule>
</grammar>
In order to make this grammar return both airports, one would have to add in explicit script tags, as shown below. This functionality can not be achieved by relying only on literal tags and default assignments.
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="flight">
  <rule id="flight" scope="public">
    I want to fly from
    <one-of>
      <item> <ruleref uri="http://www.example.com/places.grxml"/> </item>
      <item> <ruleref uri="http://www.example.com/places.grxml#otherairport"/> </item>
    </one-of>
    <tag> out.departure = rules.latest(); </tag>
    to
    <one-of>
      <item> <ruleref uri="http://www.example.com/places.grxml"/> </item>
      <item> <ruleref uri="http://www.example.com/places.grxml#otherairport"/> </item>
    </one-of>
    <tag> out.arrival = rules.latest(); </tag>
  </rule>
</grammar>
Grammar http://www.example.com/places.grxml:
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0-literals" root="USairport">
  <rule id="USairport" scope="public">
    <one-of>
      <item>Boston<tag>BOS</tag></item>
      <item>New York<tag>JFK</tag></item>
      <item>Chicago<tag>ORD</tag></item>
    </one-of>
  </rule>
  <rule id="otherairport" scope="public">
    <one-of>
      <item>Brussels<tag>BRU</tag></item>
      <item>Paris<tag>CDG</tag></item>
      <item>Rome<tag>FCO</tag></item>
    </one-of>
  </rule>
</grammar>
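For the utterance "I want to fly from Chicago to Boston", the two script tags of the flight rule can be traced with the sketch below. The function `evaluateParse` and the shape of its `entries` argument are hypothetical: each entry is either a matched rule reference with its Rule Variable value (`ref`) or the script of an SI Tag (`tag`), in utterance order. The sketch handles a single rule only and is not a full SI processor.

```javascript
// Hypothetical evaluator for the tags of one rule, given the matched rule
// references and SI Tags in the order in which they occur.
function evaluateParse(entries) {
  let latest; // Rule Variable of the latest matched rule reference
  const rules = { latest: () => latest };
  let out = {}; // Rule Variable of the current rule
  for (const entry of entries) {
    if ("ref" in entry) {
      latest = entry.ref;
    } else {
      // Run the SI Script with `out` and `rules` in scope.
      out = new Function("out", "rules", entry.tag + " return out;")(out, rules);
    }
  }
  return out;
}

const flight = evaluateParse([
  { ref: "ORD" }, // Chicago, from the referenced String Literal grammar
  { tag: "out.departure = rules.latest();" },
  { ref: "BOS" }, // Boston
  { tag: "out.arrival = rules.latest();" },
]);
```

Because each tag reads rules.latest() right after its own rule reference, `flight` carries both airports, with "ORD" as departure and "BOS" as arrival.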
This section defines the visibility rules and order of tag evaluation for SI Tags used in the Speech Recognition Grammar Format (ABNF and XML Form). When SI Tags are embedded in other markup languages (e.g. in N-grams), the visibility rules and order of evaluation may be defined differently.
The visibility rules and the order of evaluation of semantic interpretation tags are defined in terms of the logical parse structure as defined in Appendix H. Logical Parse Structure of the Speech Recognition Grammar Specification.
Note that while this appendix is informative for the Speech Recognition Grammar Specification, it is normative for the Semantic Interpretation specification. This does not imply that grammar processors must implement a logical parse structure, nor that ambiguities or recursion should be handled in any specific way beyond what is required for a conformant speech recognition grammar processor. The Logical Parse Structure is only a means to illustrate the order of evaluation and the visibility rules for SI Tags. Implementations are not required to expose the logical structure and may use a different internal representation as long as it yields the results described here.
The Logical Parse Structure is a formal syntax for describing the sequence and relation of tags and rule references to the tokens that are input to the grammar processor.
The Logical Parse output is represented as an array of output entities en, e.g. [e1, e2, e3].
Output entities can be one of three kinds:
Appendix H of the Speech Recognition Grammar Specification contains a full description of how to create the logical parse on a grammar for a given input to a grammar processor.
For the purpose of building the logical parse, all String Literals are assumed to be converted into the equivalent SI Script as defined in 3.2.2.
The sentence "turn the heating off" on the following XML Form grammar
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" tag-format="semantics/1.0" root="command"> <rule id="command"> <one-of> <item>set</item> <item>turn</item> </one-of> <ruleref uri="#object"/> <ruleref uri="#state"/> <tag>$.o=$object; $.s=$state;</tag> </rule> <rule id="object"> <item repeat="0-1">the</item> <one-of> <item> <one-of> <item>heating</item> <item>cooling</item> </one-of> <tag>$="airco";</tag> </item> <item>radio<tag>$="radio";</tag></item> <item>lights<tag>$="lights";</tag></item> </one-of> </rule> <rule id="state"> <one-of> <item>to</item> <item><ruleref special="NULL"/></item> </one-of> <one-of> <item>on<tag>$="1";</tag></item> <item>off<tag>$="0";</tag></item> <item>warm<tag>$="w";</tag></item> <item>cool<tag>$="c";</tag></item> <item>cold<tag>$="c";</tag></item> </one-of> </rule> </grammar>
or the equivalent ABNF Form grammar
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0>;
root $command;
$command = (set | turn) $object $state {$.o=$object; $.s=$state;};
$object = [the] ((heating | cooling){$="airco";} | radio{$="radio";} | lights{$="lights";});
$state = (to|$NULL) (on{$="1";} | off{$="0";} | warm{$="w";} | cool{$="c";} | cold{$="c";});
would result in the logical parse
[$command [turn, $object [the, heating, {$="airco";}], $state [off, {$="0";}], {$.o=$object; $.s=$state;}] ]
The logical parse structure is a tree-like structure that shows all terminals, tags and rule references governed by a given rule. This tree can also be represented in a flattened list of parses, with one parse for every grammar rule application.
The flat parse for a given rule application is represented as:
The output entities are as in the logical parse structure, except that rule references are represented without an array of output entities and are instead followed by a sequence number in parentheses.
The equivalent flat parse list for the above example is:
$command(1): turn, $object(1), $state(1), {$.o=$object; $.s=$state;}
$object(1): the, heating, {$="airco";}
$state(1): off, {$="0";}
The following example illustrates the use of the sequence number for rules that are applied more than once. Consider the grammar with String Literals, in XML Form:
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" tag-format="semantics/1.0-literals" root="a"> <rule id="a"> <item repeat="1-"><ruleref uri="#b"/></item> <ruleref uri="#c"/> <one-of> <item> <item repeat="0-1">t1</item> <tag>tag1</tag> </item> <item> <ruleref uri="#d"/> <tag>tag2</tag> </item> </one-of> </rule> <rule id="b"> <one-of> <item>t2</item> <item>t3<tag>tag3</tag></item> <item>t4</item> </one-of> </rule> <rule id="c"> <item repeat="1-2">t5<tag>tag5</tag></item> </rule> <rule id="d"> t6 <ruleref uri="#c"/> </rule> </grammar>
or equivalently in ABNF Form:
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0-literals>;
root $a;
$a = ($b)<1-> $c ((t1)<0-1> {tag1} | $d {tag2});
$b = t2 | t3 {tag3} | t4;
$c = (t5 {tag5})<1-2>;
$d = t6 $c;
Given the input "t2 t3 t5 t5", the logical parse structure is:
[$a [$b [t2], $b [t3, {tag3}], $c [t5, {tag5}, t5, {tag5}], {tag1}]]
and the flat parse list is:
$a: $b(1), $b(2), $c(1), {tag1}
$b(1): t2
$b(2): t3, {tag3}
$c(1): t5, {tag5}, t5, {tag5}
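The relation between the logical parse structure and the flat parse list can be illustrated with a short ECMAScript sketch. This is only an illustration under stated assumptions: the in-memory representation (strings for tokens, `{tag}` objects for tags, `{rule, children}` objects for rule references) and the function name `flattenParse` are hypothetical, and the sketch assumes that sequence numbers are counted per rule name over the whole parse, which is consistent with the example above but not confirmed for nested references.

```javascript
// Hypothetical representation of a logical parse (not defined by the spec):
//   "t2"                        a token
//   { tag: "tag3" }             a tag
//   { rule: "b", children }     a rule reference with its output entities
function flattenParse(root) {
  const counters = {};   // next sequence number per rule name (assumption: global)
  const flat = [];       // one entry per rule application, in pre-order

  function visit(node, label) {
    const entry = { label, items: [] };
    flat.push(entry);
    for (const child of node.children) {
      if (typeof child === "string") {
        entry.items.push(child);                       // token
      } else if ("tag" in child) {
        entry.items.push("{" + child.tag + "}");       // tag
      } else {
        counters[child.rule] = (counters[child.rule] || 0) + 1;
        const childLabel = "$" + child.rule + "(" + counters[child.rule] + ")";
        entry.items.push(childLabel);                  // reference, no nested array
        visit(child, childLabel);                      // its own flat parse
      }
    }
  }

  visit(root, "$" + root.rule);
  return flat.map((e) => e.label + ": " + e.items.join(", "));
}

// Logical parse for the input "t2 t3 t5 t5".
const parse = {
  rule: "a",
  children: [
    { rule: "b", children: ["t2"] },
    { rule: "b", children: ["t3", { tag: "tag3" }] },
    { rule: "c", children: ["t5", { tag: "tag5" }, "t5", { tag: "tag5" }] },
    { tag: "tag1" },
  ],
};
const flatList = flattenParse(parse);
console.log(flatList.join("\n"));
```

Under these assumptions the printed lines match the flat parse list given for this input.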
Before evaluating any scripts in the flat parse list, a global anonymous ECMAScript scope is created for the grammar. This global scope is initialized by executing the scripts that are in the global tags in the grammar header (see 4.2.).
During evaluation of a script in the flat parse list, the global scope is accessible for reading only.
Every script has only one global scope associated: the global scope for the grammar in which the script appears. Scripts in referenced rules that are located in a referenced external grammar are thus executed with access to that referenced grammar's global scope, and don't have access to the referencing grammar's global scope.
For each flat parse, a new anonymous ECMAScript scope is created that is a direct child of the global scope object for the grammar in which the related rule is defined. The ECMAScript scope chains thus always have the global scope (the scope of the whole parse) as top-level object, and the scope belonging to the flat parse as its successor.
Access to variables in tag executions is resolved through the scope chain according to the ECMAScript rules (cf. ES 10.1.4).
The variables object according to ES-CP is the scope object created for this rule. This means that local variables that are defined in tags belonging to a rule reference are created in the scope object that was created for this rule.
Before the first tag in a flat parse is executed, the environment of a new scope is set up in the following way:
$ is initialized as an empty object
out variable is initialized to a reference to $
$meta.text is initialized (read-only) to the text variable of the current grammar rule
meta.current().text is initialized (read-only) to the text variable of the current grammar rule
$meta.score is initialized (read-only) to the score value related to the current grammar rule
meta.current().score is initialized (read-only) to the score value related to the current grammar rule
When execution of the flat parse is finished, the scope object of this flat parse is removed from the scope chain. The scope belonging to the referencing flat parse is then updated in the following way:
$rulename of the scope of the referencing rule, where rulename is the name of the referenced rule, is set to the value of the variable $ of the child scope.
rules.rulename of the scope of the referencing rule, where rulename is the name of the referenced rule, is set to the value of the variable out of the child scope.
$rulename$.text and meta.rulename.text of the scope of the referencing rule, where rulename is the name of the referenced rule, are set to the concatenation of all terminals within the rule reference.
$rulename$.score and meta.rulename.score of the scope of the referencing rule, where rulename is the name of the referenced rule, are set to the score value for the referenced rule.
$$ = $rulename (both variables are in the scope of the referencing rule)
rules.latest() = rules.rulename (both variables are in the scope of the referencing rule)
$$$.text = $rulename$.text (both variables are in the scope of the referencing rule)
meta.latest().text = meta.rulename.text (both variables are in the scope of the referencing rule)
$$$.score = $rulename$.score (both variables are in the scope of the referencing rule)
meta.latest().score = meta.rulename.score (both variables are in the scope of the referencing rule)
Whether or not the $, $rulename, $rulename$.text and $rulename$.score variables are enumerated when enumerating the scope object is not defined by this specification and may vary across implementations. Authors are discouraged from enumerating the scope object.
Note: Assigning literals to a Rule Variable will result in the out or $ variable being updated independently of the other. Mixing the two syntaxes in the same grammar rule is not recommended.
$rulename and rules.rulename.
$$ and rules.latest() always refer to the result of the previous reference in the current scope; $$$.text and meta.latest().text refer to the corresponding text utterance; and $$$.score and meta.latest().score refer to the corresponding score value.
Since the global scope is read only, assignments to global variables are not allowed in SI Tags in rules. They are only possible in the global SI Tags in the grammar header (see 4.2.).
The following rule contains two Rule Variables associated with the same rule "city". The XML Form is:
<rule id="fromto"> from <ruleref uri="#city"/> <tag>out.fromcity=rules.city.name;</tag> to <ruleref uri="#city"/> <tag>out.tocity=meta.city.text;</tag> </rule>
and the equivalent ABNF Form is:
$fromto = from $city {out.fromcity=rules.city.name;} to $city {out.tocity=meta.city.text;};
To determine which of the Rule Variable instances the tags refer to, we can build the flat parse for $fromto, which is always of the form:
$fromto: from, $city(1), {out.fromcity=rules.city.name;}, to, $city(2), {out.tocity=meta.city.text;}
From this it follows that rules.city.name in the first tag refers to the first Rule Variable rules.city in the rule, and that the reference to meta.city.text in the second tag is to the second Rule Variable named rules.city.
In the following rule, the flat parse depends on whether the input matches the optional rule "b". The XML Form is:
<rule id="a"> <ruleref uri="#b"/> <item repeat="0-1"><ruleref uri="#b"/></item> <tag>out.x=rules.b.x;</tag> </rule>
and the equivalent ABNF Form is:
$a = $b [$b] {out.x=rules.b.x;};
The two possible flat parses are:
$a: $b(1), {out.x=rules.b.x;}
$a: $b(1), $b(2), {out.x=rules.b.x;}
The reference rules.b.x in the tag will thus refer to either the first or the last rule "b", depending on whether the optional rule "b" was matched in the input.
The SI Tag in the rule below contains a couple of references to Rule Variables that are undefined since there is no Rule Variable with that name before the tag in the flat parse. The XML Form is:
<rule id="a"> <ruleref uri="#b"/> <item repeat="0-1"><ruleref uri="#c"/></item> <tag>out.x=rules.c; out.y=rules.d; out.z=rules.e;</tag> <ruleref uri="#e"/> </rule>
and the equivalent ABNF Form is:
$a = $b [$c] {out.x=rules.c; out.y=rules.d; out.z=rules.e;} $e;
The two possible flat parses are:
$a: $b(1), {out.x=rules.c; out.y=rules.d; out.z=rules.e;}, $e(1)
$a: $b(1), $c(1), {out.x=rules.c; out.y=rules.d; out.z=rules.e;}, $e(1)
This means that:
out.x is undefined if rule "c" didn't match in the utterance
out.y is undefined because rule "d" is not in the rule expansion at all
out.z is undefined because rule "e" doesn't appear before the tag
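The three cases above can be reproduced with a small ECMAScript sketch. It is a simplified model rather than a conforming processor: the parse representation (`rule`, `children`, `tag` fields), the function name `evaluate`, and the tags placed in rules "b", "c" and "e" are hypothetical, and default assignments, the `meta` variables and the `$` syntax are not modeled.

```javascript
// Hypothetical parse representation: strings are tokens, { tag } objects are
// SI Tags, { rule, children } objects are rule references.
function evaluate(node) {
  let out = {};                              // result of this rule application
  let latest;                                // value returned by rules.latest()
  const rules = { latest: () => latest };
  for (const child of node.children) {
    if (typeof child === "string") continue; // tokens run no script
    if ("tag" in child) {
      // Run the tag with out and rules visible; return out so that a tag
      // may also replace it wholesale (e.g. out="C";).
      out = new Function("out", "rules", child.tag + "\n;return out;")(out, rules);
    } else {
      // The referenced rule's result only becomes visible to later tags.
      latest = rules[child.rule] = evaluate(child);
    }
  }
  return out;
}

// Illustrative sub-rules; their tokens and tags are assumptions.
const tag = { tag: "out.x=rules.c; out.y=rules.d; out.z=rules.e;" };
const b = { rule: "b", children: ["b", { tag: 'out="B";' }] };
const c = { rule: "c", children: ["c", { tag: 'out="C";' }] };
const e = { rule: "e", children: ["e", { tag: 'out="E";' }] };

const withoutC = evaluate({ rule: "a", children: [b, tag, e] });
const withC = evaluate({ rule: "a", children: [b, c, tag, e] });
console.log(withoutC.x, withC.x, withC.y, withC.z);
```

In this model `out.x` is only defined when rule "c" precedes the tag in the flat parse, while `out.y` and `out.z` stay undefined in both parses.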
Within a single SI Tag, the order of evaluation is determined by ES-CP for the evaluation of a valid ES-CP Program (ES 14).
All global SI Tags (in tags in the grammar header) are executed once, before any SI Tags within a grammar rule are executed (see 4.2.).
The order of evaluating multiple SI Tags within a grammar rule is the order in which the SI Tags appear in the flat parse list for that rule application. The flat parse list also determines how many SI elements will be generated from an SI tag that occurs in a grammar rule. Every SI Tag element in a flat parse list is evaluated exactly once. The order of evaluating String Literals is determined by the order in which the equivalent SI Tag appears in the flat parse list (see 6.2.).
The computation of the semantic value of a rule reference in a flat parse list may occur at any time during the processing of the entire logical parse structure, subject to the following condition: the semantic value of a rule reference must be computed before any SI tag using that reference's value is processed.
Consider the following rules in XML Form:
<rule id="a"> <ruleref uri="#b"/> <tag>$.y=$b.x;</tag> <item repeat="0-1"><ruleref uri="#b"/><tag>$.y=$.y+$b.x;</tag></item> </rule> <rule id="b"> foo <tag>$.x=1;</tag> <one-of> <item>bar<tag>$.x=3;</tag></item> <item> <item repeat="1-">boo<tag>$.x=$.x+1;</tag></item> </item> </one-of> </rule>
or equivalently in ABNF Form:
$a = $b {$.y=$b.x;} [$b {$.y=$.y+$b.x;}];
$b = foo {$.x=1;} (bar {$.x=3;} | (boo {$.x=$.x+1;})<1->);
For the input "foo boo boo boo", the flat parse lists are:
$a: $b(1), {$.y=$b.x;}
$b(1): foo, {$.x=1;}, boo, {$.x=$.x+1;}, boo, {$.x=$.x+1;}, boo, {$.x=$.x+1;}
and $.y evaluates to 4. For the input "foo bar foo boo", the flat parse lists are:
$a: $b(1), {$.y=$b.x;}, $b(2), {$.y=$.y+$b.x;}
$b(1): foo, {$.x=1;}, bar, {$.x=3;}
$b(2): foo, {$.x=1;}, boo, {$.x=$.x+1;}
and $.y evaluates to 5.
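The two results can be traced with a short ECMAScript sketch that evaluates tags in flat-parse order. It is an illustration under stated assumptions, not a conforming processor: the parse representation (`rule`, `children`, `tag` fields) and the function name `evaluateRule` are hypothetical, the per-rule scope is modeled with a plain object and a non-strict `with` block, and default assignments, scores, text variables and the global scope are left out.

```javascript
// Hypothetical parse representation (not defined by the specification):
//   "foo"                      a token
//   { tag: "$.x=1;" }          an SI Tag
//   { rule: "b", children }    a rule reference with its own parse
function evaluateRule(node) {
  const scope = { $: {} };                   // "$" starts out as an empty object
  for (const child of node.children) {
    if (typeof child === "string") continue; // tokens run no script
    if ("tag" in child) {
      // Function() bodies are non-strict, so "with" can expose the scope
      // object's properties ($, $b, $$) as plain variables to the tag.
      new Function("scope", "with (scope) { " + child.tag + " }")(scope);
    } else {
      const value = evaluateRule(child);     // child runs in its own fresh scope
      scope["$" + child.rule] = value;       // $rulename
      scope.$$ = value;                      // latest rule reference
    }
  }
  return scope.$;
}

const boo = ["boo", { tag: "$.x=$.x+1;" }];
const fooBooBooBoo = { rule: "b", children: ["foo", { tag: "$.x=1;" }, ...boo, ...boo, ...boo] };
const fooBar = { rule: "b", children: ["foo", { tag: "$.x=1;" }, "bar", { tag: "$.x=3;" }] };
const fooBoo = { rule: "b", children: ["foo", { tag: "$.x=1;" }, ...boo] };

// Input "foo boo boo boo"
const first = evaluateRule({ rule: "a", children: [fooBooBooBoo, { tag: "$.y=$b.x;" }] });
// Input "foo bar foo boo"
const second = evaluateRule({
  rule: "a",
  children: [fooBar, { tag: "$.y=$b.x;" }, fooBoo, { tag: "$.y=$.y+$b.x;" }],
});
console.log(first.y, second.y); // 4 5
```

Because each rule reference overwrites `$b` in the referencing scope, the second tag in the second input sees the value of the second application of rule "b".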
<rule id="a"> <ruleref uri="#b"/> <ruleref uri="#c"/> <tag>out.x = rules.b.x + rules.c.x;</tag> </rule>
The $c.x causes a run-time error because it is used to the left of rule "c":
<rule id="a"> <ruleref uri="#b"/> <tag>$.x = $b.x + $c.x;</tag> <ruleref uri="#c"/> </rule>
The rules.b.x evaluates to the x property of rules.b if rule "b" is matched on the input utterance. Otherwise it causes a run-time error:
<rule id="a"> <item repeat="0-1"><ruleref uri="#b"/></item> <ruleref uri="#c"/> <tag>out.x = rules.b.x + rules.c.x;</tag> </rule>
A safer way to write this rule could be (assuming x is of type number):
<rule id="a"> <tag>out.x=0;</tag> <item repeat="0-1"><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item> <ruleref uri="#c"/> <tag>out.x = out.x + rules.c.x;</tag> </rule>
The rules.b.x evaluates to the last occurrence of rule "b" in the repeat:
<rule id="a"> <item repeat="1-"><ruleref uri="#b"/></item> <ruleref uri="#c"/> <tag>$.x=$b.x+$c.x;</tag> </rule>
If the purpose was to add or concatenate over each occurrence of rules.b, it should be written as:
<rule id="a"> <item repeat="1-"><ruleref uri="#b"/><tag>$.x=$.x+$b.x;</tag></item> <ruleref uri="#c"/> <tag>$.x=$.x+$c.x;</tag> </rule>
The rules.b evaluates to the last occurrence of rules.b in the repeat="0-" expansion, if any; otherwise it is undefined:
<rule id="a"> <item repeat="0-"><ruleref uri="#b"/><ruleref uri="#d"/></item> <ruleref uri="#c"/> <tag>out.x=rules.b+rules.c.x;</tag> </rule>
Either $b.x or $c.x will cause a run-time error, depending on the input utterance:
<rule id="a"> <one-of> <item><ruleref uri="#b"/></item> <item><ruleref uri="#c"/></item> </one-of> <tag>$.x=$b.x+$c.x;</tag> </rule>
This could be better written as:
<rule id="a"> <one-of> <item><ruleref uri="#b"/><tag>$.x=$b.x;</tag></item> <item><ruleref uri="#c"/><tag>$.x=$c.x;</tag></item> </one-of> </rule>
The rules.b.x refers to whichever rules.b actually matched:
<rule id="a"> <one-of> <item><ruleref uri="#b"/> a</item> <item>a <ruleref uri="#b"/></item> </one-of> <ruleref uri="#c"/> <tag>out.x=rules.b.x+rules.c.x;</tag> </rule>
One operand of each addition causes a run-time error here, depending on the input utterance:
<rule id="a"> <one-of> <item><ruleref uri="#b"/></item> <item><ruleref uri="#c"/></item> </one-of> <one-of> <item><ruleref uri="#d"/></item> <item><ruleref uri="#e"/></item> </one-of> <tag>out.x=(rules.b.x+rules.c.x) * (rules.d.x+rules.e.x);</tag> </rule>
This rule can be better written as:
<rule id="a"> <one-of> <item><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item> <item><ruleref uri="#c"/><tag>out.x=rules.c.x;</tag></item> </one-of> <one-of> <item><ruleref uri="#d"/><tag>out.x=out.x*rules.d.x;</tag></item> <item><ruleref uri="#e"/><tag>out.x=out.x*rules.e.x;</tag></item> </one-of> </rule>
Evaluation of $b.x always causes a run-time error because the expression will be evaluated only when rule "c" matches, not rule "b". (When rule "b" matches, the default assignment would cause $=$b$.text.)
<rule id="a"> <one-of> <item><ruleref uri="#b"/></item> <item><ruleref uri="#c"/><tag>$.x=$b.x+$c.x;</tag></item> </one-of> </rule>
A more useful rule could be:
<rule id="a"> <one-of> <item><ruleref uri="#b"/><tag>$.x=$b.x;</tag></item> <item><ruleref uri="#c"/><tag>$.x=$c.x;</tag></item> </one-of> </rule>
The expression is only evaluated if rule "c" matches; in that case both rules.b and rules.c are defined:
<rule id="a"> <ruleref uri="#b"/> <item repeat="0-1"> <ruleref uri="#c"/> <tag>out.x=rules.b.x+rules.c.x;</tag> </item> </rule>
The expression is evaluated for every occurrence of rule "c". Note that the final value of $.x is $b.x added to the $c.x of the last occurrence of rule "c", because every evaluation overwrites the previous result.
<rule id="a"> <ruleref uri="#b"/> <item repeat="1-"> <ruleref uri="#c"/> <tag>$.x = $b.x + $c.x;</tag> </item> </rule>
Same effect as the previous example, except that now the expression is not evaluated if rule "c" did not match at least once.
<rule id="a"> <ruleref uri="#b"/> <item repeat="0-"> <ruleref uri="#c"/> <tag>out.x = rules.b.x + rules.c.x;</tag> </item> </rule>
These rules do the obvious concatenation of digits. Note that the ds property is first initialized to "" because otherwise in the first evaluation of the expression, ds would be undefined and would cause a run-time error:
<rule id="digits"> <tag>$.ds="";</tag> <item repeat="1-"> <ruleref uri="#digit"/> <tag>$.ds = $.ds + $digit;</tag> </item> </rule> <rule id="digit"> <one-of> <item>"0"</item> <item>"1"</item> <item>"2"</item> <item>"3"</item> <item>"4"</item> <item>"5"</item> <item>"6"</item> <item>"7"</item> <item>"8"</item> <item>"9"</item> </one-of> </rule>
The rules.latest() resolves to rules.c:
<rule id="a"> <ruleref uri="#b"/> <ruleref uri="#c"/> <tag>out=rules.latest();</tag> </rule>
The $$ resolves to $b:
<rule id="a"> <ruleref uri="#c"/> <ruleref uri="#b"/> <tag>$=$$;</tag> </rule>
The rules.latest() cannot be resolved and causes a run-time error:
<rule id="a"> b c <tag>out=rules.latest();</tag> </rule>
If rule "b" matches, $$ resolves to $b. If rule "c" matches, $$ resolves to $c.
<rule id="x"> <ruleref uri="#a"/> <one-of> <item><ruleref uri="#b"/></item> <item><ruleref uri="#c"/></item> </one-of> <tag>$=$$;</tag> </rule>
This is equivalent to:
<rule id="x"> <ruleref uri="#a"/> <one-of> <item><ruleref uri="#b"/><tag>$=$$;</tag></item> <item><ruleref uri="#c"/><tag>$=$$;</tag></item> </one-of> </rule>
The rules.latest() resolves to rules.b if rule "b" matches; otherwise it resolves to rules.a.
<rule id="x"> <ruleref uri="#a"/> <item repeat="0-1"><ruleref uri="#b"/></item> <tag>out=rules.latest();</tag> </rule>
The effect is equivalent to:
<rule id="x"> <ruleref uri="#a"/><tag>out=rules.latest();</tag> <item repeat="0-1"><ruleref uri="#b"/><tag>out=rules.latest();</tag></item> </rule>
The $$ resolves to the last occurrence of rules.a:
<rule id="x"> <item repeat="1-"><ruleref uri="#a"/></item> <tag>$=$$;</tag> </rule>
The effect is equivalent to:
<rule id="x"> <item repeat="1-"><ruleref uri="#a"/><tag>$=$$;</tag></item> </rule>
Semantic Interpretation processors may be used in environments where a return result is expected in XML format (for example, those supporting EMMA, the forthcoming W3C specification for the representation of user input).
If returning XML results, the following serialization rules must be used to generate an XML fragment from the Semantic Interpretation process. Notice that these serialization rules apply to semantic values generated by authored SI tags during SI processing, and do not preclude the addition of further information into the XML result by an individual SI processor (for example, recognizer annotations corresponding to acoustic confidence scores or other such information). This specification does not define the XML documents in which the generated fragment can be embedded.
The serialization into XML has been designed as a convenient mechanism to generate XML fragments directly from SI grammars. It has not been designed as a generic conversion mechanism from ES-CP objects into XML fragments, for at least the following reasons:
The serialization of the ECMAScript result into an XML fragment is constituted by the following general transformations:
Note: Properties which have the "DontEnum" attribute (see ES 8.6.1) are not serialized. This prevents functions and built-in properties from being serialized.
The values of properties of type String may contain special characters such as < and &, which could be erroneously treated as the start of markup by XML processors. An SI processor can use CDATA sections or character escaping to avoid this problem.
It is an error to transform into XML an ECMAScript object that contains properties with names that are not allowed in XML. This can occur when a property of a Rule Variable has a name that is not a legal name for an XML element.
It is possible for circular references to exist between ECMAScript objects, for example, if an object contains a property that references itself. The handling of circular references is platform specific.
Following the above principles, given the top-level Rule Variable with the properties drink and pizza from the example grammar in section 8:
{ drink: { liquid: "coke", drinksize: "medium" }, pizza: { number: "3", pizzasize: "large", topping: [ "pepperoni", "mushrooms" ] } }
SI processing in an XML environment would generate the following XML fragment:
<drink> <liquid> coke </liquid> <drinksize> medium </drinksize> </drink> <pizza> <number> 3 </number> <pizzasize> large </pizzasize> <topping length="2"> <item index="0"> pepperoni </item> <item index="1"> mushrooms </item> </topping> </pizza>
The following example ECMAScript object would cause an error because the $size$ property, while a valid name in ECMAScript, is not a valid name for an XML element:
{ drink: { liquid: "coke", $size$: "medium" } }
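The general transformation illustrated by these two examples can be sketched in ECMAScript. This is an approximation inferred from the examples, not the normative serialization: the function names `serialize` and `escapeText` are hypothetical, the name check is a simplified ASCII-only test rather than the full XML Name production, character escaping is chosen over CDATA sections, circular references are not handled, and no whitespace is emitted around text content.

```javascript
// Simplified, ASCII-only approximation of an XML element name (an assumption).
const XML_NAME = /^[A-Za-z_][A-Za-z0-9._-]*$/;

// Escape characters that could be mistaken for the start of markup.
function escapeText(text) {
  return String(text).replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

function serialize(value) {
  // Scalars become character data.
  if (value === null || typeof value !== "object") return escapeText(value);
  // Array elements become <item index="..."> children.
  if (Array.isArray(value)) {
    return value.map((v, i) => `<item index="${i}">${serialize(v)}</item>`).join("");
  }
  // Object.keys only lists enumerable properties, so built-ins are skipped.
  return Object.keys(value).map((name) => {
    if (!XML_NAME.test(name)) {
      throw new Error(`"${name}" is not a legal XML element name`);
    }
    const child = value[name];
    const length = Array.isArray(child) ? ` length="${child.length}"` : "";
    return `<${name}${length}>${serialize(child)}</${name}>`;
  }).join("");
}

const order = {
  drink: { liquid: "coke", drinksize: "medium" },
  pizza: { number: "3", pizzasize: "large", topping: ["pepperoni", "mushrooms"] },
};
const xml = serialize(order);
console.log(xml);
```

Calling `serialize({ drink: { liquid: "coke", $size$: "medium" } })` throws in this sketch, mirroring the error case described above.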
Variables named _attributes and _value can be created and used by the SI author to enable the generation of richer XML results, including the following structures:
The _attributes object is used to hold property name/value pairs which will be rendered as XML attributes of the object which contains _attributes.
The _value variable is used to hold a scalar value for character data contained in an element or to hold the value of an attribute.
Semantic Interpretation processors treat these objects in the following way:
If the value of _value is not a scalar type, the ToString() operation is performed to generate a string value.
It is an error to transform into XML an ECMAScript object that contains properties with names that are not allowed in XML. This can occur when a property in _attributes has a name that is not a legal name for an XML attribute.
For example, the object
{ martini: { gin: { _value: "Bombay Sapphire", _attributes: { ratio: 8 } }, vermouth: { _value: "Noilly Prat", _attributes: { ratio: 1 } }, _attributes: { method: "shaken" } } }
would generate the following XML result:
... <martini method="shaken"> <gin ratio="8"> Bombay Sapphire </gin> <vermouth ratio="1"> Noilly Prat </vermouth> </martini> ...
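How _attributes and _value might be mapped onto an element can be sketched in ECMAScript as follows. The sketch is inferred from the martini example only: the function names `toElement` and `esc` are hypothetical, arrays, namespaces, name validation and attribute values given as objects are deliberately left out, and no whitespace is emitted around text content.

```javascript
// Escape text for use in character data or in a double-quoted attribute.
function esc(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;");
}

function toElement(name, value) {
  // Scalars become an element with character data only.
  if (value === null || typeof value !== "object") {
    return `<${name}>${esc(value)}</${name}>`;
  }
  // Properties of _attributes become XML attributes of this element.
  let attributes = "";
  for (const [key, attr] of Object.entries(value._attributes || {})) {
    attributes += ` ${key}="${esc(attr)}"`;
  }
  // _value supplies the character data; esc() applies String() to it.
  let body = "_value" in value ? esc(value._value) : "";
  // All remaining properties become child elements.
  for (const key of Object.keys(value)) {
    if (key !== "_attributes" && key !== "_value") {
      body += toElement(key, value[key]);
    }
  }
  return `<${name}${attributes}>${body}</${name}>`;
}

const martini = {
  gin: { _value: "Bombay Sapphire", _attributes: { ratio: 8 } },
  vermouth: { _value: "Noilly Prat", _attributes: { ratio: 1 } },
  _attributes: { method: "shaken" },
};
const martiniXml = toElement("martini", martini);
console.log(martiniXml);
```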
The object named _nsdecl is used to declare a namespace (XML Names) in an element. The property named _nsprefix enables the SI author to associate an XML element or attribute with a particular namespace.
When an object contains the _nsdecl property, the namespace declaration is attached to the resultant XML serialized element for this object. The _prefix property of _nsdecl indicates the namespace prefix and the _name property of _nsdecl indicates the corresponding namespace name (usually a URI reference). If the _prefix property is an empty string, the default namespace is declared. If both _prefix and _name are empty strings, the namespace declaration xmlns="" applies.
When an Array object contains the _nsprefix property, the prefix also applies to the automatically generated <item> elements and length and index attributes.
Note that this transformation produces an XML fragment - see XML Names for rules on valid namespace usage in XML.
For example, the object
{ drink: { _nsdecl: { _prefix: "n1", _name: "http://www.example.com/n1" }, _nsprefix: "n1", liquid: { _nsdecl: { _prefix: "n2", _name: "http://www.example.com/n2" }, _attributes: { color: { _nsprefix: "n2", _value: "black" } }, _value: "coke" }, size: "medium" } }
would generate the following XML result:
<n1:drink xmlns:n1="http://www.example.com/n1"> <liquid n2:color="black" xmlns:n2="http://www.example.com/n2"> coke </liquid> <size> medium </size> </n1:drink>
Note that the _nsprefix property only applies to its parent object and hence neither the liquid element nor the size element is associated with a namespace in this fragment.
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN" "http://www.w3.org/TR/speech-grammar/grammar.dtd"> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/06/grammar http://www.w3.org/TR/speech-grammar/grammar.xsd" version="1.0" mode="voice" tag-format="semantics/1.0" root="order"> <rule id="order"> I would like a <ruleref uri="#drink"/> <tag> out.drink = new Object(); out.drink.liquid=rules.drink.type; out.drink.drinksize=rules.drink.drinksize; </tag> and <ruleref uri="#pizza"/> <tag> out.pizza=rules.pizza; </tag> </rule> <rule id="kindofdrink"> <one-of> <item> coke </item> <item> pepsi </item> <item> coca cola <tag> out="coke"; </tag> </item> </one-of> </rule> <rule id="foodsize"> <tag> out="medium"; </tag> <!-- "medium" is default if nothing said --> <item repeat="0-1"> <one-of> <item> small </item> <item> medium </item> <item> large </item> <item> regular <tag> out="medium"; </tag></item> </one-of> </item> </rule> <!-- Construct Array of toppings, return Array --> <rule id="tops"> <tag> out=new Array; </tag> <ruleref uri="#top"/> <tag> out.push(rules.top); </tag> <item repeat="1-"> and <ruleref uri="#top"/> <tag> out.push(rules.top); </tag> </item> </rule> <rule id="top"> <one-of> <item> anchovies </item> <item> pepperoni </item> <item> mushroom <tag> out="mushrooms"; </tag> </item> <item> mushrooms </item> </one-of> </rule> <!-- Two named properties (drinksize and type) on left hand side Rule Variable --> <rule id="drink"> <ruleref uri="#foodsize"/> <ruleref uri="#kindofdrink"/> <tag> out.drinksize=rules.foodsize; out.type=rules.kindofdrink; </tag> </rule> <!-- Three properties on rules.pizza's Rule Variable --> <rule id="pizza"> <ruleref uri="#number"/> <ruleref uri="#foodsize"/> <tag> out.pizzasize=rules.foodsize; out.number=rules.number; </tag> pizzas with <ruleref uri="#tops"/> <tag> out.topping=rules.tops; </tag> </rule>
<rule id="number"> <one-of> <item> <tag> out="1"; </tag> <one-of> <item> a </item> <item> one </item> </one-of> </item> <item> two<tag> out="2"; </tag> </item> <item> three<tag> out="3"; </tag> </item> </one-of> </rule> </grammar>
Example in ABNF Form:
#ABNF 1.0 UTF-8;
language en;
mode voice;
tag-format <semantics/1.0>;
root $order;
$order = I would like a $drink {$.drink = new Object(); $.drink.liquid = $drink.type; $.drink.drinksize = $drink.drinksize;} and $pizza {$.pizza=$pizza;};
$kindofdrink = coke | pepsi | "coca cola"{$="coke";};
// "medium" is default if nothing said
$foodsize = [ {$="medium";} | small | medium | large | regular {$="medium";}];
// Construct Array of toppings, return Array
$tops = {$=new Array;} $top {$.push($top);} (and $top {$.push($top);})<1-> ;
$top = anchovies | pepperoni | mushroom{$="mushrooms";} | mushrooms;
// Two named properties (drinksize and type) on left hand side Rule Variable
$drink = $foodsize $kindofdrink {$.drinksize=$foodsize; $.type=$kindofdrink; };
// Three properties on rules.pizza's Rule Variable
$pizza = $number $foodsize {$.pizzasize=$foodsize; $.number=$number;} pizzas with $tops {$.topping=$tops;};
$number = (a | one){$="1";} | two{$="2";} | three{$="3";};
On the above grammar, the following utterance
"I would like a coca cola and three large pizzas with pepperoni and mushrooms."
would create the following structured Rule Variable on the rule "order":
{ drink: { liquid: "coke", drinksize: "medium" }, pizza: { number: "3", pizzasize: "large", topping: [ "pepperoni", "mushrooms" ] } }
A Semantic Interpretation Tag (SI Tag) is a conforming SI Tag if its content matches the syntax defined in the normative sections of this document.
There is no normative restriction on the size of an SI Tag.
A stand-alone ABNF or XML Grammar Document or an XML Grammar Fragment with SI Tags is conforming if:
the tag-format for the grammar fragment or document is "semantics/1.0" or "semantics/1.0-literals".
The Speech Recognition Grammar Specification provides a tag-format declaration that identifies the format of the contents of the tag element in a speech grammar. The tag-format to reference Semantic Interpretation Tags conforming with the present specification is defined here as "semantics/1.0" or "semantics/1.0-literals". Note that the former is the default tag-format in the current Speech Recognition Grammar Specification when no explicit tag-format is specified.
It is expected that future revisions of this specification will use higher version numbers.
Other tag-formats can be used with Speech Recognition Grammars; in this case the tag-format must be explicitly declared and must not begin with "semantics/x.y" (where x and y are any digits).
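The tag-format rules above can be summarized in a small ECMAScript sketch. The function name `classifyTagFormat` and its return labels are hypothetical, and treating an absent declaration as "semantics/1.0" follows the default noted above; this is one possible reading, not a normative check.

```javascript
// Classify a grammar's tag-format declaration (labels are illustrative only).
function classifyTagFormat(tagFormat) {
  // No explicit declaration: "semantics/1.0" is the default.
  if (tagFormat === undefined || tagFormat === "semantics/1.0") return "script";
  if (tagFormat === "semantics/1.0-literals") return "literals";
  // Anything else beginning with "semantics/x.y" (x, y digits) is reserved
  // and must not be used for other tag formats.
  if (/^semantics\/\d\.\d/.test(tagFormat)) return "reserved";
  return "other";
}

console.log(classifyTagFormat("semantics/1.0-literals")); // "literals"
console.log(classifyTagFormat("semantics/2.0"));          // "reserved"
```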
A Semantic Interpretation Processor is a program that can parse and process Semantic Interpretation Tags to produce semantic results. Semantic Interpretation Processors are executed in a hosting environment (e.g. a grammar processor or VoiceXML processor).
A Conforming Semantic Interpretation Processor
We anticipate that a processor may encounter the following non-conforming conditions:
The W3C Voice Browser Working Group has applied to IETF to register MIME types for both the ABNF and XML grammar forms (See Appendix G. Media Types and File Suffix of the Speech Recognition Grammar Specification)
The ABNF MIME type will identify ABNF grammars containing only conforming SI Tags. If the grammar contains tags of any other format then a different MIME type must be used.
Similarly, the XML grammar MIME type will identify XML grammars containing only conforming SI Tags. If the grammar contains tags of any other format then a different MIME type must be used.
A grammar that contains tags in a format other than conforming SI Tags must have an explicit tag format declaration specifying the format (see Speech Recognition Grammar Specification 4.8 Tag Format Declaration). The tag-format for a grammar that contains conforming Semantic Interpretation Tags is "semantics/1.0" (for Script tags) or "semantics/1.0-literals" (for String Literals).
Note: a VoiceXML 2.0 processor will require support for Semantic Interpretation Tags as defined here, but may additionally support other grammar formats or SRGS with other tags (probably identified by another MIME type).
An ABNF or XML Grammar Processor is a conforming processor if:
This document was written with the participation of members of the W3C Voice Browser Working Group. The following have significantly contributed to writing this specification: