Copyright ©2003-2006 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document defines the process of Semantic Interpretation for Speech Recognition and the syntax and semantics of semantic interpretation tags that can be added to speech recognition grammars to compute information to return to an application on the basis of rules and tokens that were matched by the speech recognizer. In particular, it defines the syntax and semantics of the contents of Tags in the Speech Recognition Grammar Specification [SRGS].
The results of semantic interpretation describe the meaning of a natural language utterance. The current specification represents this information as an ECMAScript object, and defines a mechanism to serialize the result into XML. The W3C Multimodal Interaction Activity [MMI] is defining an XML data format [EMMA] for containing and annotating the information in user utterances. It is expected that the EMMA language will be able to integrate results generated by Semantic Interpretation for Speech Recognition.
Semantic Interpretation may be useful in combination with other specifications, such as Stochastic Language Models [N-GRAM], but its use with N-grams has not yet been studied.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the 3 November 2006 W3C Last Call Working Draft of "Semantic Interpretation for Speech Recognition (SISR) Version 1.0". The Last Call period ends 24 November 2006.
Following the publication of this specification as Candidate Recommendation, a substantive change was required to remove the starttime/endtime feature. A list of changes since the Candidate Recommendation can be found in Appendix E. Alternatively, a non-normative version of this specification highlighting the changes since the previous version is available.
The Voice Browser Working Group believes that this specification addresses its requirements and all previous Last Call and Candidate Recommendation issues (see the Disposition of Comments document).
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document has been produced as part of the Voice Browser Activity (activity statement), following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only).
This document is for public review, and comments and discussion are welcomed on the (archived) public mailing list <www-voice@w3.org>.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This section is informative.
Grammar Processors, and in particular speech recognizers, use a grammar that defines the words and sequences of words making up the input language that they can accept. The major task of a grammar processor consists of finding the sequence of words described by the grammar that (best) matches a given utterance, or reporting that no such sequence exists.
In an application, knowing the sequence of words that were uttered is sometimes interesting but often not the most practical way of handling the information that is present in the user utterance. What is needed is a computer processable representation of the information, the Semantic Result, more than a natural language transcript. The process of producing a Semantic Result representing the meaning of a natural language utterance is called Semantic Interpretation (SI).
The Semantic Interpretation process described in this specification uses Semantic Interpretation Tags (SI Tags) (see section 3.2) to provide a means to attach instructions for the computation of such semantic results to a speech recognition grammar. When used with a [VOICEXML20] Processor, it is expected that a Semantic Interpretation Grammar Processor will convert the result generated by an [SRGS] speech grammar processor into an ECMAScript object that can then be processed as specified in section 3.1.6 Mapping Semantic Interpretation Results to VoiceXML Forms in [VOICEXML20].
The W3C Multimodal Interaction Activity [MMI] is defining an XML data format [EMMA] for containing and annotating the information in user utterances. It is expected that the EMMA language will be able to integrate results generated by Semantic Interpretation for Speech Recognition.
This document defines the syntax and the semantics of Semantic Interpretation Tags for use with the Speech Recognition Grammar Specification [SRGS]. It is possible that Semantic Interpretation Tags as defined here can also be used with Stochastic Language Models [N-GRAM], but the current specification does not specifically address such use and does not guarantee that the Semantic Interpretation Tags as defined here meet the needs of such use.
The basic principles for the Semantic Interpretation mechanism defined in this specification are the following:
This specification uses the ECMAScript Compact Profile [ECMA-327], which is a strict subset of [ECMA-262]. [ECMA-327] has been designed to meet the needs of resource-constrained environments. Special attention has been paid to constraining ECMAScript features that require proportionately large amounts of system memory, and continuous or proportionately large amounts of processing power. In particular, it is designed to facilitate prior compilation for execution in a lightweight environment. This makes it attractive for use in association with speech grammar rules for extracting semantic results from speech recognition.
In this document, the key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" are to be interpreted as described in [RFC2119]. Requirement levels for conforming Semantic Interpretation for Speech Recognition implementations are defined in Appendix A.
The sections in the main body of this document are normative unless otherwise specified. The appendices and examples in this document are informative unless otherwise indicated explicitly.
This specification normatively references [ECMA-327], which in turn references [ECMA-262]. The notation ES n is used in this document as shorthand for section number n in [ECMA-262].
SI Tags compute semantic values. During the semantic interpretation process, these values can be assigned to variables that are associated with the rules in the grammar. These variables are known as Rule Variables.
Every grammar rule has a single Rule Variable that holds a semantic value. The Rule Variable is typically assigned its value by the SI Tags within its grammar rule. SI Tags also have access to the Rule Variables of any other rules referenced by the current grammar rule and already processed up to that point in the utterance (according to the visibility constraints defined in section 6). The Rule Variables of other rules are referenced by the name of their grammar rule, as described in section 3.3.2.
Rule Variables can hold semantic values of any type defined in [ECMA-327]. They are not explicitly typed. Rule Variables that have not been assigned a value are not defined. SI authors will typically use scalar types, e.g. string or numeric values, in lower level rules and more structured objects in higher level rules (particularly root rules).
In addition to semantic values, certain other values corresponding to Rule Variables are available during SI processing.
For every Rule Variable there is an associated variable named text, of type String, which holds the substring (the series of tokens) in the utterance that is governed by the corresponding grammar rule. Text variables are not part of the Rule Variable (see section 3.3.3) and the value of the text variables cannot be modified.
Likewise, for every Rule Variable, there is an associated variable called score, of type Number, which holds a value that is related to the confidence or probability of the corresponding grammar rule or some similar measure. Higher score values indicate higher confidence or probability over the corresponding grammar rule. Processors that don't compute or don't have access to such values must return undefined as the score value. Score variables are not part of the Rule Variable and the value of the score variables cannot be modified.
The semantic result for an utterance is the value of the Rule Variable of the root rule when all semantic interpretation evaluations have been completed. For certain result formats (e.g. [EMMA]), this value is serialized into an XML document according to the description in section 7. It is outside the scope of this specification to define how the semantic result is communicated to the application.
This section is informative.
In the context of the W3C Voice Browser architecture, the semantic result will be directly cast into ECMAScript variables in the VoiceXML interpreter (see section 3.1.6 in [VOICEXML20]). In the W3C Multimodal Interaction Framework [MMI-FRAMEWORK], the semantic result is expected to be transformed into EMMA following the mechanism described in section 7. In other contexts, the mechanism described in section 7 can be used to transform the semantic result into other XML formats.
Score values are highly dependent on the processor's implementation. In most implementations using speech recognition, scores are likely to be dependent on factors such as audio channel quality, grammar contents, grammar weights, language, individual speaker characteristics, and others. Scores for a particular word or phrase within a grammar are typically comparable over instances of the same word or phrase over time. Scores for different words in a single grammar are also typically comparable to one another. Scores across grammars, or scores for words and word sequences, or scores between different processors, are very often not comparable. It is anticipated that scores will be useful only for annotating the results, not for influencing the results during SI processing. Note that an SI processor doesn't require a speech recognizer, and thus that the score does not even have to be related to speech recognition.
Semantic Interpretation Tags are added in the string content of the tag elements in the grammar rule expansion, as described in section 2.6 of [SRGS]. This specification further uses the term Semantic Interpretation Tag (or SI Tag) to refer to such a tag.
This specification defines two different Semantic Interpretation tag syntaxes. The two possible values of the tag-format declaration in the grammar define which of the two syntaxes is being used. The different syntaxes only change the processing of tags during Semantic Interpretation; in all other respects the grammar behaves identically.
The "Script" tag syntax, enabled by setting the tag-format to semantics/1.0, defines the contents of tags to be ECMAScript. Each tag is a valid [ECMA-327] program. Section 3.2.2 describes the processing of this tag syntax in more detail.
The "String Literal" tag syntax, enabled by setting the tag-format to semantics/1.0-literals, defines the contents of tags to be strings. This syntax does not have the expressive power of a full scripting language, but does provide a way to produce semantic results consisting of simple strings. Section 3.2.3 describes this tag syntax in more detail.
Within one grammar, it is not possible to mix the two tag syntaxes. All tags in one grammar must have the same tag-format. However, it is possible for externally referenced grammars to have a different tag-format from the parent grammar that references them.
Below are two example formats of SI Tags in the Speech Recognition Grammar Specification [SRGS] (tag-content represents the content of the tag which can be either ECMAScript code or a String Literal).
In the XML grammar format, SI Tags are specified as the content of the <tag> element:
<tag> tag-content </tag>
In the ABNF grammar format, SI Tags are enclosed in curly braces or in the three-character sequences '{!{' and '}!}':
{ tag-content }
{!{ tag-content }!}
A Semantic Interpretation Script (SI Script) holds a string that is treated as the source text of a valid [ECMA-327] Program ("Program" is defined by ES 14).
The environment in which SI Tags are embedded may introduce escaped characters, character references, or other markup that has to be resolved by the environment. The result after resolution is treated as ECMAScript code.
It is illegal to make an assignment to a variable that has not been previously declared (either implicitly as is the case for Rule Variables or explicitly by using a var statement). Attempting to assign to an undeclared variable will result in a runtime error.
A tag using the String Literal tag syntax has content that is a sequence of zero or more characters. If the character sequence is not empty, it has to follow either the DoubleStringCharacters or the SingleStringCharacters production of ES 7.8.4.
During processing, a tag with a String Literal has the same effect as a script that assigns the content of the tag, as a string literal, to the Rule Variable of the rule the tag is in.
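The equivalence described above can be sketched in a few lines of JavaScript. This is an illustration only: the helper names literalTagToScript and evaluateLiteralTag are hypothetical, and JSON.stringify is used merely as a convenient way to produce a quoted and escaped string literal. This specification does not prescribe any particular conversion routine.

```javascript
// Hypothetical helper (not defined by this specification): turns the
// content of a String Literal tag into the equivalent SI Script source.
function literalTagToScript(content) {
  // JSON.stringify yields a double-quoted literal with quotes and
  // backslashes escaped, so the script assigns the content unchanged.
  return "out=" + JSON.stringify(content) + ";";
}

// Evaluates the generated script against a fresh, empty Rule Variable
// and returns the resulting value of `out`.
function evaluateLiteralTag(content) {
  const script = literalTagToScript(content);
  return new Function("out", script + "\nreturn out;")({});
}

console.log(literalTagToScript("yes"));  // out="yes";
console.log(evaluateLiteralTag("BOS"));  // BOS
```

Content containing quotation marks survives the round trip because the generated literal is escaped before it is evaluated.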
This section is informative.
If multiple tags are present in the rule expansion, the Rule Variable is set to the value of the last tag in the expansion. Prior tags are overwritten by the final tag.
A grammar using the Script tag syntax can reference rules of a grammar using the String Literal tag syntax. The value of the string literal can be obtained by the parent rule using the Rule Variable of the referenced rule. The recognized text of the referenced rule is also available in the meta.latest().text and meta.rulename.text variables (where rulename is the name of the rule).
A grammar using the String Literal tag syntax can reference rules in other grammars (which can be using either the Script tag syntax or the String Literal tag syntax). One consequence of this is that a grammar using the String Literal tag syntax can return a non-string result (e.g. an ECMAScript Object, Number, Boolean, etc) if it references a grammar that uses the Script tag syntax which returns a non-string result. See section 5 for the way semantic results from a referenced grammar can be used in a grammar with String Literal tag syntax.
Authors should take care to set the tag-format correctly. Using the String Literal tag syntax when the tag-format is set to semantics/1.0 will generally result in a runtime error. However, the converse (using the Script tag syntax when the tag-format is set to semantics/1.0-literals) will not produce a runtime error but rather result in erroneously populating Rule Variables with ECMAScript code.
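The two failure modes can be illustrated with a small JavaScript sketch. The evaluateTag function is a hypothetical stand-in for a Semantic Interpretation processor handling a single tag. It assumes that script tags are run with only out in scope, which is a simplification of the scoping described in this specification.

```javascript
// Hypothetical evaluator for a single tag under a given tag-format.
function evaluateTag(tagFormat, content) {
  if (tagFormat === "semantics/1.0-literals") {
    // String Literal syntax: the content itself becomes the value.
    return content;
  }
  // Script syntax: the content is run as ECMAScript with `out` in scope.
  return new Function("out", content + "\nreturn out;")({});
}

// Script content under the literal format: no error is raised, but the
// Rule Variable ends up holding source code instead of "yes".
const wrongValue = evaluateTag("semantics/1.0-literals", 'out="yes";');

// Literal content under the script format: `yes` is an undeclared
// identifier, so evaluating it throws.
let scriptError = null;
try {
  evaluateTag("semantics/1.0", "yes");
} catch (e) {
  scriptError = e;
}

console.log(wrongValue);           // out="yes";
console.log(scriptError !== null); // true
```

Literal content that is not even syntactically valid ECMAScript, such as a multi-word phrase, would fail earlier, when the script is parsed.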
Examples of equivalent grammars, one using the Script tag syntax and the other using the String Literal tag syntax, are given below for both the XML Form and ABNF Form.
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0-literals" root="answer">
  <rule id="answer" scope="public">
    <one-of>
      <item><ruleref uri="#yes"/></item>
      <item><ruleref uri="#no"/></item>
    </one-of>
  </rule>
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item>yeah<tag>yes</tag></item>
      <item><token>you bet</token><tag>yes</tag></item>
      <item xml:lang="fr-CA">oui<tag>yes</tag></item>
    </one-of>
  </rule>
  <rule id="no">
    <one-of>
      <item>no</item>
      <item>nope</item>
      <item>no way</item>
    </one-of>
    <tag>no</tag>
  </rule>
</grammar>
The grammar above with the String Literal tag syntax is equivalent to the grammar below with the Script tag syntax:
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="answer">
  <rule id="answer" scope="public">
    <one-of>
      <item><ruleref uri="#yes"/></item>
      <item><ruleref uri="#no"/></item>
    </one-of>
  </rule>
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item>yeah<tag>out="yes";</tag></item>
      <item><token>you bet</token><tag>out="yes";</tag></item>
      <item xml:lang="fr-CA">oui<tag>out="yes";</tag></item>
    </one-of>
  </rule>
  <rule id="no">
    <one-of>
      <item>no</item>
      <item>nope</item>
      <item>no way</item>
    </one-of>
    <tag>out="no";</tag>
  </rule>
</grammar>
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0-literals>;
root $answer;

public $answer = $yes | $no;
$yes = yes | yeah {yes} | "you bet" {!{yes}!} | "oui"!fr-CA {yes};
$no = (no | nope | no way) {no};
The grammar above with the String Literal tag syntax is equivalent to the grammar below with the Script tag syntax:
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0>;
root $answer;

public $answer = $yes | $no;
$yes = yes | yeah {out="yes";} | "you bet" {!{out="yes";}!} | "oui"!fr-CA {out="yes";};
$no = (no | nope | no way) {out="no";};
A SI Script can access Rule Variables using the syntax defined in this section. This syntax applies only to documents for which the SI Tags hold SI Scripts (and not to documents where SI Tags contain the String Literals tag syntax).
Every grammar rule has a single Rule Variable that holds a [ECMA-327] value. This Rule Variable can both be evaluated and assigned to.
The Rule Variable is identified by out.
Properties of the Rule Variable can be individually accessed by out.identifier, where identifier is the name of the property.
out        (identifies the Rule Variable)
out.pizza  (identifies the pizza property of the Rule Variable)
This section is informative.
The Semantic Interpretation Script typically assigns a value to the Rule Variable of its embedding grammar rule. The Rule Variable is initialized to an empty Object before the first tag in the grammar rule is executed (see section 6.3). The SI author will usually either add properties to this Object or alternatively discard it by assigning a primitive value (e.g. String or Number) to the Rule Variable. Since the Rule Variable is initialized before the tag is executed, a var statement is not required prior to assigning to it.
As a consequence of normal ECMAScript behavior, the SI author is free to override the Rule Variable type as well as value within the bounds of legal ECMAScript. Note that [ECMA-327] enforces rules that affect Semantic Interpretation Scripts. For example, [ECMA-327] reserved words cannot be used as property names. Thus, out.for is illegal because it uses the [ECMA-327] reserved word for.
// An Object with property name prop
out.prop = "my property";

// A String with value "my value"
out = "my value";

// A String with value "my value"
out.prop = "my property";
out = "my value";

// A String with value "my value"
out = "my value";
out.prop = "my property";

// A String with value "ab"
out.prop1 = "a";
out.prop2 = "b";
out = out.prop1 + out.prop2;

// An Object with property name prop
out = "my value";
out = new Object();
out.prop = "my property";
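The behavior of these assignments can be checked with a short JavaScript harness. The runRuleTags function is hypothetical: it assumes that the Rule Variable starts as an empty Object and that each tag is run in turn as non-strict ECMAScript with only out in scope, which simplifies the scoping described in this specification.

```javascript
// Hypothetical harness: runs the tags of one rule in order, starting
// from an empty Object as the Rule Variable.
function runRuleTags(tags) {
  let out = {};
  for (const tag of tags) {
    // Each tag sees the current `out`; a reassignment inside the tag
    // is carried forward to the next tag through the return value.
    out = new Function("out", tag + "\nreturn out;")(out);
  }
  return out;
}

// Adding a property keeps the Object.
console.log(runRuleTags(['out.prop = "my property";']));

// Assigning a primitive discards the Object and its properties.
console.log(runRuleTags(['out.prop = "my property"; out = "my value";']));

// Setting a property on a String value has no lasting effect.
console.log(runRuleTags(['out = "my value"; out.prop = "my property";']));

// Properties can be combined before the Object is replaced.
console.log(runRuleTags(['out.prop1 = "a"; out.prop2 = "b";', 'out = out.prop1 + out.prop2;']));
```

The third case relies on non-strict semantics, where assigning a property to a primitive value is silently ignored.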
SI Scripts can access the Rule Variable associated with grammar rules referenced in SI Tags that appear after (to the right or below) the rule reference in the grammar expansion, and only if the referenced rule was used in the expansion that matched the input utterance. See visibility rules in section 6 for a more detailed description of when Rule Variables associated to rule references can be referenced in SI Tags, using the concept of the logical parse structure and the flat parse list.
Rule Variables associated to referenced rules can both be evaluated and assigned to. Every SI Script has access to a rules object that has a property holding the Rule Variable value for every visible rule. The Rule Variable associated to a rule reference is identified by rules.rulename, where rulename is the rulename of the rule, as defined in Section 3.1 Basic Rule Definition in [SRGS]. Individual properties of a Rule Variable can be identified by rules.rulename.identifier, where rulename is the name of the rule and identifier is the name of the property.
The Rule Variable for the latest rule reference that was used in the expansion matching the utterance up to the position of the SI Tag can also be referenced through rules.latest().
In an expression, both the Rule Variables of the current grammar rule and the referenced rules can be evaluated and assigned to.
Special rules (NULL, VOID, GARBAGE) cannot be evaluated.
This section is informative.
The rules.rulename notation (where rulename is the name of a referenced rule) can be used equivalently for explicit local rule references, for explicit references to a named rule of a grammar, and for implicit rule references (see Section 2.2 Rule Reference in [SRGS] for a definition of explicit and implicit rule references). In the case of a legal implicit rule reference, the rule name is indicated by the root attribute of the <grammar> element (XML form) or the root keyword (ABNF form) in the referenced grammar.
// The Rule Variable associated to the referenced rule "rulename"
rules.rulename

// The property "prop" of the Rule Variable associated with the referenced
// rule "rulename"
rules.rulename.prop

// The Rule Variable associated to the latest matching rule reference before
// the SI Tag
rules.latest()

// The property "prop" of Rule Variable associated to latest matching rule
// reference before the SI Tag
rules.latest().prop
Section 6 describes the visibility rules for accessing Rule Variables. If according to these rules a Rule Variable is not visible, one can still evaluate or declare and assign to the variable with that name (it is just a property on the rules object). The value assigned to a property of the rules object that has the name of a Rule Variable will be overwritten when that Rule Variable is visible according to section 6. This behavior can be used to "initialize" Rule Variables to handle cases where a referenced rule may not actually be matched depending on the input to the grammar.
In the following grammar, by declaring and assigning rules.foodsize a default value, the value for the drink rule will always be:
{ drinksize: "medium", type: "coke" }
regardless of whether the input is 'coke' or 'medium coke':
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="drink">
  <rule id="drink">
    <!-- Note: rules object always exists in scope -->
    <tag>rules.foodsize="medium"; </tag>
    <item repeat="0-1">
      <ruleref uri="#foodsize"/>
    </item>
    <ruleref uri="#kindofdrink"/>
    <tag>out.drinksize=rules.foodsize; out.type=rules.kindofdrink;</tag>
  </rule>
  <rule id="foodsize">
    <one-of>
      <item>small</item>
      <item>medium</item>
      <item>large</item>
    </one-of>
  </rule>
  <rule id="kindofdrink">
    <one-of>
      <item>coke</item>
      <item>pepsi</item>
    </one-of>
  </rule>
</grammar>
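The effect of the initialization in the drink rule can be traced with a hand-written JavaScript sketch. The interpretDrink function is hypothetical and hard-codes the structure of this one grammar; it is not a general grammar processor. It assumes that the foodsize and kindofdrink rules, which contain no tags, yield their matched text as their value.

```javascript
// Hypothetical hand-coded interpretation of the "drink" rule above.
function interpretDrink(utterance) {
  const sizes = ["small", "medium", "large"];
  const drinks = ["coke", "pepsi"];
  const words = utterance.split(" ");
  const rules = {};
  const out = {};

  // First tag of the rule: rules.foodsize="medium";
  rules.foodsize = "medium";

  // Optional reference to the foodsize rule: when it matches, its Rule
  // Variable becomes visible and overwrites the default set above.
  if (sizes.includes(words[0])) {
    rules.foodsize = words.shift();
  }

  // Mandatory reference to the kindofdrink rule.
  if (words.length !== 1 || !drinks.includes(words[0])) {
    throw new Error("utterance is not covered by the grammar");
  }
  rules.kindofdrink = words[0];

  // Final tag: out.drinksize=rules.foodsize; out.type=rules.kindofdrink;
  out.drinksize = rules.foodsize;
  out.type = rules.kindofdrink;
  return out;
}

console.log(interpretDrink("coke"));        // drinksize "medium", type "coke"
console.log(interpretDrink("medium coke")); // drinksize "medium", type "coke"
console.log(interpretDrink("large pepsi")); // drinksize "large", type "pepsi"
```

Without the first assignment, rules.foodsize would be undefined for the input "coke".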
A Rule Variable's text variable is identified by meta.rulename.text, where rulename is the name of the Rule Variable. The text variable of the Rule Variable referred to by rules.latest() is identified by meta.latest().text. The text variable associated to the current grammar rule is identified by meta.current().text. The text variable of the current grammar rule is read-only.
A Rule Variable's score variable is identified by meta.rulename.score, where rulename is the name of the Rule Variable. The score variable of the Rule Variable referred to by rules.latest() is identified by meta.latest().score. The score variable associated to the current grammar rule is identified by meta.current().score. The score variable of the current grammar rule is read-only.
This section is informative.
Since the text and score variables of the current grammar rule are read-only, they behave as read-only properties as defined in [ECMA-327]. As a consequence, attempts to assign to the text or score variable associated to the Rule Variable of the current grammar rule will be ignored. Note, however, that the text and score properties of a referenced rule (i.e. those properties of meta.rulename, where rulename is the referenced rule, or of meta.latest()) are not read-only.
// The text variable of the Rule Variable called "rulename"
meta.rulename.text

// The text variable of the Rule Variable referenced to by rules.latest()
meta.latest().text

// The text (read-only) variable of the current grammar rule
meta.current().text
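The read-only behavior of meta.current() can be approximated in modern JavaScript as sketched below. The makeMeta function is hypothetical, and Object.freeze is only an approximation: it is not part of [ECMA-327] and it also prevents new properties from being added, which goes beyond what this section states. The score is left undefined, as for a processor that does not compute scores.

```javascript
// Hypothetical construction of a meta object whose current() entry has
// read-only text and score properties.
function makeMeta(text, score) {
  const current = Object.freeze({ text: text, score: score });
  return { current: () => current };
}

const meta = makeMeta("turn the heating off", undefined);

// Reflect.set reports whether the assignment took effect, without
// depending on strict or non-strict mode.
const accepted = Reflect.set(meta.current(), "text", "something else");

console.log(accepted);             // false
console.log(meta.current().text);  // turn the heating off
console.log(meta.current().score); // undefined
```

In non-strict code, the plain assignment meta.current().text = "something else" is silently ignored in the same way.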
semantics/1.0
or
semantics/1.0-literals
<tag>
element to
the grammar header for the purpose of setting global
variables
The header of an [SRGS] grammar may contain one or more global SI Tags. In grammars using the Script tag syntax, these tags are executed before any of the SI Tags in the matching grammar rules are evaluated. There are no ordering constraints between SI Tags and other valid SRGS grammar header items (see section 4.1 of [SRGS]). Global tags are ignored in grammars using the String Literal tag syntax.
The global SI Tags are evaluated only once in a global scope that will be shared by all evaluations (see section 6.3).
Whereas all evaluations for SI Tags in flat parse lists for matching rules have access to the global scope for reading only, the SI Tags in the grammar header have write access to the global scope. This is the primary function of these tags: to initialize the global scope for use in the SI Tags.
In the XML Form, global SI Tags are SI Tags that appear outside all rules in the grammar header and before the first rule.
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="rule">
  <tag>var x=1;</tag>
  <tag>var y='abcd';</tag>
  <rule id="rule">
    <one-of>
      <item>yes</item>
      <item>no</item>
    </one-of>
  </rule>
</grammar>
In the ABNF Form, global SI Tags are SI Tags followed by a semicolon, that appear outside all rules in the grammar header and before the first rule. Both tag delimiting syntaxes are illustrated in the example.
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0>;
root $rule;

{var x=1;};
{!{var y='abcd';}!};

$rule = yes | no;
For a given parse, if there is no SI Tag attached to the expansion in the grammar rule that is used to match the utterance, then the value for the out Rule Variable is determined as follows. If there are no rule references in the parse, the value for the text meta variable (meta.current().text) is automatically copied into the Rule Variable (which then becomes of type String). Otherwise, the value of the Rule Variable of the last rule reference in the parse (rules.latest()) is automatically copied into the Rule Variable.
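The default assignment just described amounts to a two-way choice, sketched here in JavaScript. The function name defaultRuleValue and its parameters are hypothetical: text stands for meta.current().text, and ruleRefValues is assumed to hold the Rule Variable values of the rule references in the parse, in order.

```javascript
// Hypothetical helper: computes the default value of `out` for a rule
// whose matched expansion contains no SI Tag.
function defaultRuleValue(text, ruleRefValues) {
  if (ruleRefValues.length === 0) {
    // No rule references in the parse: copy meta.current().text.
    return text;
  }
  // Otherwise copy the value of the last rule reference, which is what
  // rules.latest() refers to at the end of the rule.
  return ruleRefValues[ruleRefValues.length - 1];
}

console.log(defaultRuleValue("coca cola", []));                              // coca cola
console.log(defaultRuleValue("I want to fly to Boston", ["BOS"]));           // BOS
console.log(defaultRuleValue("from Chicago to Boston", ["ORD", "BOS"]));     // BOS
```

Only the last referenced value is kept, so information from earlier rule references is lost unless a tag combines them explicitly.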
For the following rule, rules.drink is either "coke", "pepsi" or "coca cola". Similarly for meta.drink.text.
<rule id="drink">
  <one-of>
    <item>coke</item>
    <item>pepsi</item>
    <item>coca cola</item>
  </one-of>
</rule>
For the following rule, there is a String Literal tag associated with "coca cola" and hence rules.drink is either "coke" or "pepsi". However, meta.drink.text is either "coke", "coca cola", or "pepsi".
<rule id="drink">
  <one-of>
    <item>coke</item>
    <item>pepsi</item>
    <item>coca cola<tag>coke</tag></item>
  </one-of>
</rule>
For the following grammar, the utterance "I want to fly to Boston" will return the result "BOS".
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0-literals" root="flight">
  <rule id="flight" scope="public">
    I want to fly to
    <ruleref uri="#airports"/>
  </rule>
  <rule id="airports" scope="private">
    <one-of>
      <item><ruleref uri="#USairport"/></item>
      <item><ruleref uri="#otherairport"/></item>
    </one-of>
  </rule>
  <rule id="USairport" scope="private">
    <one-of>
      <item>Boston<tag>BOS</tag></item>
      <item>New York<tag>JFK</tag></item>
      <item>Chicago<tag>ORD</tag></item>
    </one-of>
  </rule>
  <rule id="otherairport" scope="private">
    <one-of>
      <item>Brussels<tag>BRU</tag></item>
      <item>Paris<tag>CDG</tag></item>
      <item>Rome<tag>FCO</tag></item>
    </one-of>
  </rule>
</grammar>
Note that the default assignment has been designed to handle the simplest but most frequent cases only. It cannot cope with combining information from different rule references. For example, the grammar below would return the information about the last airport only, not about both airports. For the following grammar, the utterance "I want to fly from Chicago to Boston" will return the result "BOS".
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0-literals" root="flight">
  <rule id="flight" scope="public">
    I want to fly from
    <one-of>
      <item><ruleref uri="#USairport"/></item>
      <item><ruleref uri="#otherairport"/></item>
    </one-of>
    to
    <one-of>
      <item><ruleref uri="#USairport"/></item>
      <item><ruleref uri="#otherairport"/></item>
    </one-of>
  </rule>
  <rule id="USairport" scope="private">
    <one-of>
      <item>Boston<tag>BOS</tag></item>
      <item>New York<tag>JFK</tag></item>
      <item>Chicago<tag>ORD</tag></item>
    </one-of>
  </rule>
  <rule id="otherairport" scope="private">
    <one-of>
      <item>Brussels<tag>BRU</tag></item>
      <item>Paris<tag>CDG</tag></item>
      <item>Rome<tag>FCO</tag></item>
    </one-of>
  </rule>
</grammar>
In order to make this grammar return both airports, one would have to use the Script tag syntax, as shown below. This functionality cannot be achieved by relying only on literal tags and default assignments.
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="flight">
  <rule id="flight" scope="public">
    I want to fly from
    <one-of>
      <item>
        <ruleref uri="http://www.example.com/places.grxml"/>
      </item>
      <item>
        <ruleref uri="http://www.example.com/places.grxml#otherairport"/>
      </item>
    </one-of>
    <tag>out.departure = rules.latest();</tag>
    to
    <one-of>
      <item>
        <ruleref uri="http://www.example.com/places.grxml"/>
      </item>
      <item>
        <ruleref uri="http://www.example.com/places.grxml#otherairport"/>
      </item>
    </one-of>
    <tag>out.arrival = rules.latest();</tag>
  </rule>
</grammar>
Grammar http://www.example.com/places.grxml:
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0-literals" root="USairport">
  <rule id="USairport" scope="public">
    <one-of>
      <item>Boston<tag>BOS</tag></item>
      <item>New York<tag>JFK</tag></item>
      <item>Chicago<tag>ORD</tag></item>
    </one-of>
  </rule>
  <rule id="otherairport" scope="public">
    <one-of>
      <item>Brussels<tag>BRU</tag></item>
      <item>Paris<tag>CDG</tag></item>
      <item>Rome<tag>FCO</tag></item>
    </one-of>
  </rule>
</grammar>
This section defines the visibility rules and order of tag evaluation for SI Tags used in the Speech Recognition Grammar Format (ABNF and XML Form). When SI Tags are embedded in other markup languages (e.g. in [N-GRAM]), the visibility rules and order of evaluation may be defined differently.
After the initialization of the global scope (see section 6.3), the visibility rules and the order of evaluation of semantic interpretation tags are defined in terms of the logical parse structure as defined in Appendix H Logical Parse Structure in [SRGS] .
Note that while this appendix is informative for the Speech Recognition Grammar Specification, it is normative for the Semantic Interpretation specification. This does not imply that grammar processors must implement a logical parse structure, nor that ambiguities or recursion should be handled in any specific way over what is required for a conformant speech recognition grammar processor. The Logical Parse Structure is only a means to illustrate the order of evaluation and visibility rules for SI Tags. Implementations are not required to expose the logical structure and may use different internal representation as long as these yield the results described here.
The Logical Parse Structure is a formal syntax for describing the sequence and relation of tags and rule references to the tokens that are input to the grammar processor.
The Logical Parse output is represented as an array of output entities en, e.g. [e1, e2, e3].
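Output entities can be one of three kinds: tokens (terminals matched in the input), tags (written as the tag content in curly braces, e.g. {out="airco";}), and rule references (written as $rulename followed by the array of output entities governed by that rule).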
Appendix H in [SRGS] contains a full description of how to create the logical parse on a grammar for a given input to a grammar processor.
For the purpose of building the logical parse, all String Literals are assumed to be converted into the equivalent SI Script as defined in section 3.2.3.
The sentence "turn the heating off" on the following XML Form grammar
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" tag-format="semantics/1.0" root="command"> <rule id="command"> <one-of> <item>set</item> <item>turn</item> </one-of> <ruleref uri="#object"/> <ruleref uri="#state"/> <tag>out.o=rules.object; out.s=rules.state;</tag> </rule> <rule id="object"> <item repeat="0-1">the</item> <one-of> <item> <one-of> <item>heating</item> <item>cooling</item> </one-of> <tag>out="airco";</tag> </item> <item>radio<tag>out="radio";</tag></item> <item>lights<tag>out="lights";</tag></item> </one-of> </rule> <rule id="state"> <one-of> <item>to</item> <item><ruleref special="NULL"/></item> </one-of> <one-of> <item>on<tag>out="1";</tag></item> <item>off<tag>out="0";</tag></item> <item>warm<tag>out="w";</tag></item> <item>cool<tag>out="c";</tag></item> <item>cold<tag>out="c";</tag></item> </one-of> </rule> </grammar>
or equivalent ABNF Form grammar
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0>;
root $command;
$command = (set | turn) $object $state {out.o=rules.object; out.s=rules.state;};
$object = [the] ((heating | cooling){out="airco";} | radio{out="radio";} | lights{out="lights";});
$state = (to|$NULL) (on{out="1";} | off{out="0";} | warm{out="w";} | cool{out="c";} | cold{out="c";});
will result in the logical parse
[$command [turn, $object [the, heating, {out="airco";}], $state [off, {out="0";}], {out.o=rules.object; out.s=rules.state;}] ]
The logical parse structure is a tree-like structure that shows all terminals, tags and rule references governed by a given rule. This tree can also be represented in a flattened list of parses, with one parse for every grammar rule application.
The flat parse for a given rule application is represented as the rule name with its sequence number, followed by a colon and the comma-separated output entities of that rule application.
The output entities are as in the logical parse structure, except that rule references are represented without an array of output entities but followed by a sequence number in parentheses.
The equivalent flat parse list for the above example is:
$command(1): turn, $object(1), $state(1), {out.o=rules.object; out.s=rules.state;}
$object(1): the, heating, {out="airco";}
$state(1): off, {out="0";}
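The conversion from a logical parse to a flat parse list can be sketched in ECMAScript. This is a non-normative illustration: the node representation (strings for tokens, `{ tag }` objects for tags, `{ rule, children }` objects for rule references) and the function name `flattenParse` are assumptions made for this example, as is the choice to number rule applications per rule name in the order in which the references are encountered.

```javascript
// Logical parse for "turn the heating off" (hypothetical representation):
// strings are tokens, { tag } objects are tags,
// { rule, children } objects are rule references.
const logicalParse = {
  rule: "command",
  children: [
    "turn",
    { rule: "object", children: ["the", "heating", { tag: 'out="airco";' }] },
    { rule: "state", children: ["off", { tag: 'out="0";' }] },
    { tag: "out.o=rules.object; out.s=rules.state;" },
  ],
};

function flattenParse(root) {
  const counters = new Map(); // rule name -> last sequence number handed out
  const flat = [];
  const nextSeq = (name) => {
    const n = (counters.get(name) || 0) + 1;
    counters.set(name, n);
    return n;
  };
  const visit = (node, seq) => {
    const pending = []; // rule references to expand after this flat parse
    const entities = node.children.map((child) => {
      if (typeof child === "string") return child;
      if ("tag" in child) return "{" + child.tag + "}";
      const childSeq = nextSeq(child.rule);
      pending.push([child, childSeq]);
      return "$" + child.rule + "(" + childSeq + ")";
    });
    flat.push("$" + node.rule + "(" + seq + "): " + entities.join(", "));
    pending.forEach(([child, childSeq]) => visit(child, childSeq));
  };
  visit(root, nextSeq(root.rule));
  return flat;
}

console.log(flattenParse(logicalParse).join("\n"));
```

Each entry of the returned array corresponds to one rule application, so a rule that is applied twice appears twice with different sequence numbers.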
The following example illustrates the use of the sequence number for rules that are applied more than once. Consider the grammar with String Literals, in XML Form:
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" tag-format="semantics/1.0-literals" root="a"> <rule id="a"> <item repeat="1-"><ruleref uri="#b"/></item> <ruleref uri="#c"/> <one-of> <item> <item repeat="0-1">t1</item> <tag>tag1</tag> </item> <item> <ruleref uri="#d"/> <tag>tag2</tag> </item> </one-of> </rule> <rule id="b"> <one-of> <item>t2</item> <item>t3<tag>tag3</tag></item> <item>t4</item> </one-of> </rule> <rule id="c"> <item repeat="1-2">t5<tag>tag5</tag></item> </rule> <rule id="d"> t6 <ruleref uri="#c"/> </rule> </grammar>
or equivalently in ABNF Form:
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0-literals>;
root $a;
$a = ($b)<1-> $c ((t1)<0-1> {tag1} | $d {tag2});
$b = t2 | t3 {tag3} | t4;
$c = (t5 {tag5})<1-2>;
$d = t6 $c;
Given the input "t2 t3 t5 t5", the logical parse structure is:
[$a [$b [t2], $b [t3, {tag3}], $c [t5, {tag5}, t5, {tag5}], {tag1}] ]
and the flat parse list is:
$a: $b(1), $b(2), $c(1), {tag1}
$b(1): t2
$b(2): t3, {tag3}
$c(1): t5, {tag5}, t5, {tag5}
Before evaluating any scripts in the flat parse list, a global anonymous ECMAScript scope is created for the grammar. This global scope is initialized by executing the scripts that are in the global tags in the grammar header (see section 4.2).
During evaluation of a script in the flat parse list, the global scope is accessible for reading only.
Every script has only one global scope associated: the global scope for the grammar in which the script appears. Scripts in referenced rules that are located in a referenced external grammar are thus executed with access to that referenced grammar's global scope, and don't have access to the referencing grammar's global scope.
The tags within a flat parse are executed in the order in which they appear, left to right. The global tags (in the grammar header) are executed in document order. See section 6.4 for details.
For each flat parse, a new anonymous ECMAScript scope is created that is a direct child of the global scope object for the grammar in which the related rule is defined. The ECMAScript scope chains thus always have the global scope (the scope of the whole parse) as the top-level object, and the scope belonging to the parse list as the successor.
Access to variables in tag executions is resolved with the scope chain according to the ECMAScript rules (ES 10.1.4).
The variables object according to [ECMA-327] is the scope object created for this rule. This means that local variables that are defined in tags belonging to a rule reference are created in the scope object that was created for this rule.
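The relation between a rule scope and its read-only global scope can be approximated with an ordinary prototype chain. This is only an analogy under stated assumptions: a real processor uses ECMAScript scope objects rather than prototypes, and the names `globalScope`, `createRuleScope`, `tryAssign` and the variable `defaultSize` are hypothetical.

```javascript
// Global scope of one grammar: filled by the header tags, then made read-only.
const globalScope = Object.freeze({ defaultSize: "medium" });

// One scope per flat parse, chained to the global scope of its grammar.
function createRuleScope(global) {
  return Object.create(global);
}

// Attempts an assignment the way a tag would; reports whether it took effect.
function tryAssign(scope, name, value) {
  "use strict";
  try {
    scope[name] = value;
    return true;
  } catch (err) {
    return false; // TypeError: the inherited property is read-only
  }
}

const ruleScope = createRuleScope(globalScope);
console.log(ruleScope.defaultSize);                      // read through the chain
console.log(tryAssign(ruleScope, "defaultSize", "big")); // rejected
console.log(tryAssign(ruleScope, "count", 1));           // local variable, accepted
```

Local variables land on the rule scope object itself, so they never leak into the global scope or into the scopes of other rule applications.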
Before the first tag in a flat parse is executed, the environment of a new scope is set up in the following way:
out is initialized to a new object as constructed by the expression new Object().
rules is initialized to a new object as constructed by the expression new Object().
meta is initialized to a new object as constructed by the expression new Object().
meta.current().text is initialized (read-only) to the text variable of the current grammar rule.
meta.current().score is initialized (read-only) to the score value related to the current grammar rule.
rules.latest() returns undefined.
meta.latest() returns undefined.
When execution of the flat parse is finished, the scope object of this flat parse is removed from the scope chain. The scope belonging to the referencing flat parse is then updated in the following way (replace rulename with the name of the rule in what follows):
rules.rulename of the scope of the referencing rule is set to the value of the variable out of the child scope.
meta.rulename.text of the scope of the referencing rule is set to the concatenation of all terminals within the rule reference.
meta.rulename.score of the scope of the referencing rule is set to the score value for the referenced rule.
rules.latest() = rules.rulename (both variables are in the scope of the referencing rule).
meta.latest().text = meta.rulename.text (both variables are in the scope of the referencing rule).
meta.latest().score = meta.rulename.score (both variables are in the scope of the referencing rule).
Note: Whether or not the out, rules and meta variables are enumerated when enumerating the scope object is not defined by this specification and may vary over implementations. Authors are discouraged from using enumeration of the scope object.
The value of a referenced rule is accessed as rules.rulename (where rulename is the name of the referenced rule). rules.latest() always refers to the result of the previous reference in the current scope; meta.latest().text refers to the corresponding text utterance; and meta.latest().score refers to the corresponding score value.
Since the global scope is read-only, assignments to global variables are not allowed in SI Tags in rules. They are only possible in the global SI Tags in the grammar header (see section 4.2).
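The set-up of a rule scope and the update of the referencing scope can be sketched as follows. This is a non-normative illustration: the helper names `newFlatParseScope` and `completeReference` are hypothetical, the score values are made up, and read-only access is only approximated by freezing the object returned by `meta.current()`.

```javascript
function newFlatParseScope(text, score) {
  const scope = { out: new Object(), rules: new Object(), meta: new Object() };
  let latestName; // name of the most recently completed rule reference, if any

  scope.meta.current = () => Object.freeze({ text, score });
  scope.rules.latest = () =>
    latestName === undefined ? undefined : scope.rules[latestName];
  scope.meta.latest = () =>
    latestName === undefined ? undefined : scope.meta[latestName];

  // Called when the flat parse of a referenced rule has finished.
  scope.completeReference = (rulename, child) => {
    scope.rules[rulename] = child.out;
    scope.meta[rulename] = {
      text: child.meta.current().text,
      score: child.meta.current().score,
    };
    latestName = rulename;
  };
  return scope;
}

const command = newFlatParseScope("turn the heating off", 0.9);
const object = newFlatParseScope("the heating", 0.8);
object.out = "airco"; // effect of the tag {out="airco";}
command.completeReference("object", object);
console.log(command.rules.object, command.meta.latest().text);
```

Because `rules.latest()` and `meta.latest()` look up the most recent rule name, a second reference to the same rule simply overwrites the earlier Rule Variable.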
The following rule contains two Rule Variables associated with the same rule "city". The XML Form is:
<rule id="fromto"> from <ruleref uri="#city"/> <tag>out.fromcity=rules.city.name;</tag> to <ruleref uri="#city"/> <tag>out.tocity=meta.city.text;</tag> </rule>
and the equivalent ABNF Form is:
$fromto = from $city {out.fromcity=rules.city.name;} to $city {out.tocity=meta.city.text;};
To determine which of the Rule Variable instances the tags refer to, we can build the flat parse for $fromto, which is always of the form:
$fromto: from, $city(1), {out.fromcity=rules.city.name;}, to, $city(2), {out.tocity=meta.city.text;}
From this it follows that rules.city.name in the first tag refers to the first Rule Variable rules.city in the rule, and that the reference to meta.city.text in the second tag is to the second Rule Variable named rules.city.
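The effect of the two references to the same rule can be traced step by step. In this non-normative sketch the city values and the helper `completeCity` are hypothetical; the statements marked as tags are the two SI Tags of the rule, executed in flat parse order.

```javascript
const out = {};
const rules = {};
const meta = {};

// Hypothetical results of the two $city references.
const firstCity = { value: { name: "Boston" }, text: "boston" };
const secondCity = { value: { name: "Paris" }, text: "paris" };

// Updates the referencing scope when a $city reference has finished.
function completeCity(reference) {
  rules.city = reference.value;
  meta.city = { text: reference.text };
}

completeCity(firstCity);        // $city(1)
out.fromcity = rules.city.name; // first tag sees the first Rule Variable
completeCity(secondCity);       // $city(2) overwrites rules.city and meta.city
out.tocity = meta.city.text;    // second tag sees the second Rule Variable
console.log(out);
```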
In the following rule, the flat parse depends on whether the input matches the optional rule b. The XML Form is:
<rule id="a"> <ruleref uri="#b"/> <item repeat="0-1"><ruleref uri="#b"/></item> <tag>out.x=rules.b.x;</tag> </rule>
and the equivalent ABNF Form is:
$a = $b [$b] {out.x=rules.b.x;};
The two possible flat parses are:
$a: $b(1), {out.x=rules.b.x;}
$a: $b(1), $b(2), {out.x=rules.b.x;}
The reference rules.b.x in the tag will thus refer to either the first or the last rule b, depending on whether the optional rule b was matched in the input.
The SI Tag in the rule below contains a couple of references to Rule Variables that are undefined since there is no Rule Variable with that name before the tag in the flat parse. The XML Form is:
<rule id="a"> <ruleref uri="#b"/> <item repeat="0-1"><ruleref uri="#c"/></item> <tag>out.x=rules.c; out.y=rules.d; out.z=rules.e;</tag> <ruleref uri="#e"/> </rule>
and the equivalent ABNF Form is:
$a = $b [$c] {out.x=rules.c; out.y=rules.d; out.z=rules.e;} $e;
The two possible flat parses are:
$a: $b(1), {out.x=rules.c; out.y=rules.d; out.z=rules.e;}, $e(1)
$a: $b(1), $c(1), {out.x=rules.c; out.y=rules.d; out.z=rules.e;}, $e(1)
This means that:
out.x is undefined if rule c didn't match in the utterance.
out.y is undefined because rule d is not in the rule expansion at all.
out.z is undefined because rule e doesn't appear before the tag.
Within a single SI Tag, the order of evaluation is determined by [ECMA-327] for the evaluation of a valid [ECMA-327] Program (ES 14).
All global SI Tags (in tags in the grammar header) are executed once, before any SI Tags within a grammar rule are executed (see section 4.2).
The order of evaluating multiple SI Tags within a grammar rule is the order in which the SI Tags appear in the flat parse list for that rule application. The flat parse list also determines how many SI elements will be generated from an SI Tag that occurs in a grammar rule. Every SI Tag element in a flat parse list is evaluated exactly once. The order of evaluating String Literals is determined by the order in which the equivalent SI Tag appears in the flat parse list (see section 6.2).
The computation of the semantic value of a rule reference in a flat parse list may occur at any time during the processing of the entire logical parse structure, subject to the following condition: the semantic value of a rule reference must be computed before any SI Tag using that reference's value is processed.
Consider the following rules in XML Form:
<rule id="a"> <ruleref uri="#b"/> <tag>out.y=rules.b.x;</tag> <item repeat="0-1"> <ruleref uri="#b"/><tag>out.y=out.y+rules.b.x;</tag> </item> </rule> <rule id="b"> foo <tag>out.x=1;</tag> <one-of> <item>bar<tag>out.x=3;</tag></item> <item> <item repeat="1-">boo<tag>out.x=out.x+1;</tag></item> </item> </one-of> </rule>
or equivalently in ABNF Form:
$a = $b {out.y=rules.b.x;} [$b {out.y=out.y+rules.b.x;}]; $b = foo {out.x=1;} (bar {out.x=3;} | (boo {out.x=out.x+1;})<1->);
For the input "foo boo boo boo", the flat parse lists are:
$a: $b(1), {out.y=rules.b.x;}
$b(1): foo, {out.x=1;}, boo, {out.x=out.x+1;}, boo, {out.x=out.x+1;}, boo, {out.x=out.x+1;}
and out.y evaluates to 4.
For the input "foo bar foo boo", the flat parse lists are:
$a: $b(1), {out.y=rules.b.x;}, $b(2), {out.y=out.y+rules.b.x;}
$b(1): foo, {out.x=1;}, bar, {out.x=3;}
$b(2): foo, {out.x=1;}, boo, {out.x=out.x+1;}
and out.y evaluates to 5.
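The two results above can be reproduced with a small evaluator that walks a logical parse and runs the tags in order. This is a non-normative sketch: the node representation and the names `evaluateRule`, `tag`, `b` and `boo` are assumptions for this example, tags are executed with the `Function` constructor, and the global scope, `meta` and `rules.latest()` are left out.

```javascript
// Evaluates one rule application of a logical parse and returns its out value.
function evaluateRule(node) {
  let out = new Object();
  const rules = new Object();
  for (const child of node.children) {
    if (typeof child === "string") continue; // tokens carry no script
    if ("tag" in child) {
      // Run the tag with out and rules in scope; return out so that a tag
      // may also replace it wholesale (e.g. out="airco";).
      const run = new Function("out", "rules", child.tag + "\nreturn out;");
      out = run(out, rules);
    } else {
      // A rule reference is fully evaluated before any later tag uses it.
      rules[child.rule] = evaluateRule(child);
    }
  }
  return out;
}

const tag = (text) => ({ tag: text });
const b = (...children) => ({ rule: "b", children });
const boo = ["boo", tag("out.x=out.x+1;")];

// Logical parse for "foo boo boo boo"
const first = {
  rule: "a",
  children: [
    b("foo", tag("out.x=1;"), ...boo, ...boo, ...boo),
    tag("out.y=rules.b.x;"),
  ],
};

// Logical parse for "foo bar foo boo"
const second = {
  rule: "a",
  children: [
    b("foo", tag("out.x=1;"), "bar", tag("out.x=3;")),
    tag("out.y=rules.b.x;"),
    b("foo", tag("out.x=1;"), ...boo),
    tag("out.y=out.y+rules.b.x;"),
  ],
};

console.log(evaluateRule(first).y, evaluateRule(second).y);
```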
The references rules.b.x and rules.c.x refer to the respective Rule Variable properties:
<rule id="a"> <ruleref uri="#b"/> <ruleref uri="#c"/> <tag>out.x = rules.b.x + rules.c.x;</tag> </rule>
The reference rules.c.x causes a run-time error because it is used to the left of rule c:
<rule id="a"> <ruleref uri="#b"/> <tag>out.x = rules.b.x + rules.c.x;</tag> <ruleref uri="#c"/> </rule>
The reference rules.b.x evaluates to the x property of rules.b if rule b is matched on the input utterance. Otherwise it causes a run-time error:
<rule id="a"> <item repeat="0-1"><ruleref uri="#b"/></item> <ruleref uri="#c"/> <tag>out.x = rules.b.x + rules.c.x;</tag> </rule>
A safer way to write this rule could be (assuming x is of type Number):
<rule id="a"> <tag>out.x=0;</tag> <item repeat="0-1"><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item> <ruleref uri="#c"/> <tag>out.x = out.x + rules.c.x;</tag> </rule>
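The difference between the two versions of this rule can be illustrated in plain ECMAScript. In this non-normative sketch the functions `unsafeTag` and `saferTags` are hypothetical stand-ins for the tags of the two rules, and the `if` statement models the fact that a tag inside an optional item only runs when that item matched.

```javascript
// Single tag after an optional reference to rule b.
function unsafeTag(rules) {
  const out = {};
  out.x = rules.b.x + rules.c.x; // throws if rules.b is undefined
  return out;
}

// Tags of the safer rule, in flat parse order.
function saferTags(rules) {
  const out = {};
  out.x = 0;                                    // <tag>out.x=0;</tag>
  if (rules.b !== undefined) out.x = rules.b.x; // tag inside the optional item
  out.x = out.x + rules.c.x;                    // final tag
  return out;
}

const onlyC = { c: { x: 2 } };
const bothRules = { b: { x: 5 }, c: { x: 2 } };
console.log(saferTags(onlyC).x, saferTags(bothRules).x);
```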
The reference rules.b.x evaluates to the x property of the last occurrence of rule b in the repeat:
<rule id="a"> <item repeat="1-"><ruleref uri="#b"/></item> <ruleref uri="#c"/> <tag>out.x=rules.b.x+rules.c.x;</tag> </rule>
If the purpose was to add or concatenate over each occurrence of rules.b, it should be written as follows (out.x must first be initialized, for example to 0 or "", which this fragment omits):
<rule id="a"> <item repeat="1-"> <ruleref uri="#b"/><tag>out.x=out.x+rules.b.x;</tag> </item> <ruleref uri="#c"/> <tag>out.x=out.x+rules.c.x;</tag> </rule>
The reference rules.b evaluates to the last occurrence of rules.b in the repeat="0-" expansion, if any; otherwise it is undefined:
<rule id="a"> <item repeat="0-"><ruleref uri="#b"/><ruleref uri="#d"/></item> <ruleref uri="#c"/> <tag>out.x=rules.b+rules.c.x;</tag> </rule>
Either rules.b.x or rules.c.x will cause a run-time error, depending on the input utterance:
<rule id="a"> <one-of> <item><ruleref uri="#b"/></item> <item><ruleref uri="#c"/></item> </one-of> <tag>out.x=rules.b.x+rules.c.x;</tag> </rule>
This could be better written as:
<rule id="a"> <one-of> <item><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item> <item><ruleref uri="#c"/><tag>out.x=rules.c.x;</tag></item> </one-of> </rule>
The reference rules.b.x refers to whichever rules.b actually matched:
<rule id="a"> <one-of> <item><ruleref uri="#b"/> a</item> <item>a <ruleref uri="#b"/></item> </one-of> <ruleref uri="#c"/> <tag>out.x=rules.b.x+rules.c.x;</tag> </rule>
One of the operands to every addition causes a run-time error here depending on the input utterance:
<rule id="a"> <one-of> <item><ruleref uri="#b"/></item> <item><ruleref uri="#c"/></item> </one-of> <one-of> <item><ruleref uri="#d"/></item> <item><ruleref uri="#e"/></item> </one-of> <tag>out.x=(rules.b.x+rules.c.x) * (rules.d.x+rules.e.x);</tag> </rule>
This rule can be better written as:
<rule id="a"> <one-of> <item><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item> <item><ruleref uri="#c"/><tag>out.x=rules.c.x;</tag></item> </one-of> <one-of> <item><ruleref uri="#d"/><tag>out.x=out.x*rules.d.x;</tag></item> <item><ruleref uri="#e"/><tag>out.x=out.x*rules.e.x;</tag></item> </one-of> </rule>
Evaluation of rules.b.x always causes a run-time error because the expression will be evaluated only when rule c matches, not rule b. (When rule b matches, the default assignment would cause out=meta.b.text.)
<rule id="a"> <one-of> <item><ruleref uri="#b"/></item> <item><ruleref uri="#c"/><tag>out.x=rules.b.x+rules.c.x;</tag></item> </one-of> </rule>
A more useful rule could be:
<rule id="a"> <one-of> <item><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item> <item><ruleref uri="#c"/><tag>out.x=rules.c.x;</tag></item> </one-of> </rule>
The expression is only evaluated if rule c matches; in that case both rules.b and rules.c are defined:
<rule id="a"> <ruleref uri="#b"/> <item repeat="0-1"> <ruleref uri="#c"/> <tag>out.x=rules.b.x+rules.c.x;</tag> </item> </rule>
The expression is evaluated for every occurrence of rule c. Note that out.x ends up holding rules.b.x plus rules.c.x for the last occurrence of rule c only, because every evaluation overwrites the previous result.
<rule id="a"> <ruleref uri="#b"/> <item repeat="1-"> <ruleref uri="#c"/> <tag>out.x = rules.b.x + rules.c.x;</tag> </item> </rule>
The effect is the same as in the previous example, except that the expression is not evaluated at all if rule c does not match at least once.
<rule id="a"> <ruleref uri="#b"/> <item repeat="0-"> <ruleref uri="#c"/> <tag>out.x = rules.b.x + rules.c.x;</tag> </item> </rule>
These rules do the obvious concatenation of digits. Note that the ds property is first initialized to "" because otherwise in the first evaluation of the expression, ds would be undefined and would cause a run-time error:
<rule id="digits"> <tag>out.ds="";</tag> <item repeat="1-"> <ruleref uri="#digit"/> <tag>out.ds = out.ds + rules.digit;</tag> </item> </rule> <rule id="digit"> <one-of> <item>"0"</item> <item>"1"</item> <item>"2"</item> <item>"3"</item> <item>"4"</item> <item>"5"</item> <item>"6"</item> <item>"7"</item> <item>"8"</item> <item>"9"</item> </one-of> </rule>
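The accumulation performed by the digits rule can be traced as a loop with one iteration per matched digit. This is a non-normative sketch: the function `concatenateDigits` is hypothetical, and it assumes that each application of the digit rule, which contains no tags, yields the matched text as a string.

```javascript
function concatenateDigits(digits) {
  const out = {};
  out.ds = "";                     // <tag>out.ds="";</tag>
  for (const digit of digits) {    // one iteration per matched $digit
    const rules = { digit };       // value returned by the digit rule
    out.ds = out.ds + rules.digit; // tag inside the repeat
  }
  return out;
}

console.log(concatenateDigits(["4", "2", "7"]).ds);
```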
rules.latest() resolves to rules.c:
<rule id="a"> <ruleref uri="#b"/> <ruleref uri="#c"/> <tag>out=rules.latest();</tag> </rule>
rules.latest() resolves to rules.b:
<rule id="a"> <ruleref uri="#c"/> <ruleref uri="#b"/> <tag>out=rules.latest();</tag> </rule>
rules.latest() returns undefined:
<rule id="a"> b c <tag>out=rules.latest();</tag> </rule>
If rule b matches, rules.latest() resolves to rules.b. If rule c matches, rules.latest() resolves to rules.c:
<rule id="x"> <ruleref uri="#a"/> <one-of> <item><ruleref uri="#b"/></item> <item><ruleref uri="#c"/></item> </one-of> <tag>out=rules.latest();</tag> </rule>
This is equivalent to:
<rule id="x"> <ruleref uri="#a"/> <one-of> <item><ruleref uri="#b"/><tag>out=rules.latest();</tag></item> <item><ruleref uri="#c"/><tag>out=rules.latest();</tag></item> </one-of> </rule>
rules.latest() resolves to rules.b if rule b matches; if not, it resolves to rules.a:
<rule id="x"> <ruleref uri="#a"/> <item repeat="0-1"><ruleref uri="#b"/></item> <tag>out=rules.latest();</tag> </rule>
The effect is equivalent to:
<rule id="x"> <ruleref uri="#a"/><tag>out=rules.latest();</tag> <item repeat="0-1"><ruleref uri="#b"/><tag>out=rules.latest();</tag></item> </rule>
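The behaviour of rules.latest() in these examples follows from tracking the most recently completed rule reference. The sketch below is non-normative: `trackReferences` and the placeholder values are hypothetical, and the input lists the rule references in the order in which they were matched.

```javascript
// Builds the rules object after the given references have completed.
function trackReferences(matched) {
  const rules = {};
  let latest; // undefined until the first reference completes
  rules.latest = () => latest;
  for (const [name, value] of matched) {
    rules[name] = value;
    latest = value;
  }
  return rules;
}

// Rule x with the optional reference to b matched, then not matched.
console.log(trackReferences([["a", "value of a"], ["b", "value of b"]]).latest());
console.log(trackReferences([["a", "value of a"]]).latest());
```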
rules.latest() resolves to the last occurrence of rules.a:
<rule id="x"> <item repeat="1-"><ruleref uri="#a"/></item> <tag>out=rules.latest();</tag> </rule>
The effect is equivalent to:
<rule id="x"> <item repeat="1-"><ruleref uri="#a"/><tag>out=rules.latest();</tag></item> </rule>
Semantic Interpretation processors may be used in environments where a return result is expected in XML format (for example, those supporting [EMMA]).
If returning XML results, the following serialization rules must be used to generate an XML fragment from the Semantic Interpretation process. Notice that these serialization rules apply to semantic values generated by authored SI Tags during SI processing, and do not preclude the addition of further information into the XML result by an individual SI processor (for example, recognizer annotations corresponding to acoustic confidence scores or other such information). This specification does not define the XML documents in which the generated fragment can be embedded.
The serialization into XML has been designed as a convenient mechanism to generate XML fragments directly from SI grammars. It has not been designed as a generic conversion mechanism from [ECMA-327] objects into XML fragments. It is not a generic conversion mechanism for at least the following reasons:
DontEnum properties are not serialized.
The serialization of the ECMAScript result into an XML fragment is governed by the following transformation rules:
If the value of the top-level Rule Variable is not an Object but a simple scalar type (String, Number, Boolean, Null or Undefined), then the resulting XML fragment consists only of character data without any mark-up. The character data will be the value of the top-level Rule Variable as if the ToString() operation had been performed on an argument of this type (e.g., for Boolean, the result would be true or false).
Each property of an object becomes an XML element named after the property. For a property of a simple scalar type, the character data content of the element is the value of the property as if the ToString() operation had been performed on an argument of this type.
The indexed elements of an Array object (e.g. a[0], a[1], etc.) become XML child elements with name <item>. Each <item> element has an attribute named index, which is the index of the corresponding element in the array. In addition, the XML element containing the <item> elements includes an attribute named length, whose value is given by the length property of the ECMAScript Array object. Any other properties of an Array object, for instance the keys of an associative array (e.g. a["prop"]), are subject to the same transformation rules as the regular properties of an object. In a sparse array, only those elements which hold defined values will be serialized.
Properties named _attributes, _value, _nsdecl and _nsprefix will be treated according to the rules described in the sections below.
Notes:
Properties with the DontEnum attribute (see ES 8.6.1) are not serialized. This prevents functions and built-in properties from being serialized.
If the top-level Rule Variable is an Array object, the length attribute will not be present because there will be no XML element containing the <item> child elements.
Following the above principles, take the top-level Rule Variable with the properties drink and pizza from the example grammar in section 8:
{ drink: { liquid:"coke", drinksize:"medium"}, pizza: { number: "3", pizzasize: "large", topping: [ "pepperoni", "mushrooms" ] } }
SI processing in an XML environment would generate the following document:
<drink> <liquid>coke</liquid> <drinksize>medium</drinksize> </drink> <pizza> <number>3</number> <pizzasize>large</pizzasize> <topping length="2"> <item index="0">pepperoni</item> <item index="1">mushrooms</item> </topping> </pizza>
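The basic transformation rules can be sketched as a recursive function. This is a non-normative illustration under stated assumptions: the function name `serialize` is hypothetical, the name check `XML_NAME` is a simplified ASCII-only approximation of XML element names, character escaping is omitted, and the special properties _attributes, _value, _nsdecl and _nsprefix are not handled.

```javascript
// Simplified, ASCII-only approximation of an XML element name (assumption).
const XML_NAME = /^[A-Za-z_][A-Za-z0-9._-]*$/;

function serialize(value) {
  // Scalars become character data, as if ToString() had been applied.
  if (value === null || typeof value !== "object") return String(value);

  let xml = "";
  const isArray = Array.isArray(value);
  if (isArray) {
    // forEach skips the holes of a sparse array.
    value.forEach((element, index) => {
      if (element !== undefined) {
        xml += `<item index="${index}">${serialize(element)}</item>`;
      }
    });
  }
  for (const name of Object.keys(value)) {
    if (isArray && /^\d+$/.test(name)) continue; // indexed elements done above
    if (!XML_NAME.test(name)) {
      throw new Error(`"${name}" is not a valid XML element name`);
    }
    const child = value[name];
    // The element that contains <item> children carries the length attribute.
    const length = Array.isArray(child) ? ` length="${child.length}"` : "";
    xml += `<${name}${length}>${serialize(child)}</${name}>`;
  }
  return xml;
}

const order = {
  drink: { liquid: "coke", drinksize: "medium" },
  pizza: { number: "3", pizzasize: "large", topping: ["pepperoni", "mushrooms"] },
};
console.log(serialize(order));
```

Apart from whitespace, the output for `order` corresponds to the fragment shown above. A top-level Array produces only `<item>` elements, since there is no containing element to carry the length attribute.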
The following example ECMAScript object would cause an error because the $size$ property, while a valid name in ECMAScript, is not a valid name for an XML Element:
{ drink: { liquid:"coke", $size$:"medium"} }
Variables named _attributes and _value can be created and used by the author to enable the generation of richer XML results, such as elements that carry attributes in addition to character data.
The _attributes object is used to hold property name/value pairs which will be rendered as XML attributes of the object which contains _attributes.
The _value variable is used to hold a scalar value for character data contained in an element or to hold the value of an attribute.
Semantic Interpretation processors treat these objects in the following way:
The properties of the _attributes object are rendered as XML attributes of the containing object.
_value is treated as character data content of the containing object or the value of an attribute if the containing object is a child of _attributes.
If the value of _value is not a scalar type, the ToString() operation is performed to generate a string value.
It is an error if a property of _attributes has a name that is not a legal name for an XML attribute.
The following ECMAScript object:
{ martini: { gin: { _value: "Bombay Sapphire", _attributes: { ratio: 8 } }, vermouth: { _value: "Noilly Prat", _attributes: { ratio: 1 } }, _attributes: { method: "shaken" } } }
would generate the following XML result:
... <martini method="shaken"> <gin ratio="8">Bombay Sapphire</gin> <vermouth ratio="1">Noilly Prat</vermouth> </martini> ...
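The treatment of _attributes and _value can be sketched as follows. This is a non-normative illustration: the function name `serializeElement` is hypothetical, Array handling, name validation and character escaping are left out, and placing the _value character data before any child elements is an assumption of this sketch.

```javascript
function serializeElement(name, value) {
  // Scalars become character data, as if ToString() had been applied.
  if (value === null || typeof value !== "object") {
    return `<${name}>${String(value)}</${name}>`;
  }
  // Properties of _attributes become XML attributes of this element.
  let attrs = "";
  for (const [attrName, attr] of Object.entries(value._attributes || {})) {
    // An attribute may be a scalar or an object holding its value in _value.
    const text = attr !== null && typeof attr === "object" ? attr._value : attr;
    attrs += ` ${attrName}="${String(text)}"`;
  }
  // _value becomes the character data content of this element.
  let content = "_value" in value ? String(value._value) : "";
  for (const [childName, child] of Object.entries(value)) {
    if (childName === "_attributes" || childName === "_value") continue;
    content += serializeElement(childName, child);
  }
  return `<${name}${attrs}>${content}</${name}>`;
}

const result = {
  martini: {
    gin: { _value: "Bombay Sapphire", _attributes: { ratio: 8 } },
    vermouth: { _value: "Noilly Prat", _attributes: { ratio: 1 } },
    _attributes: { method: "shaken" },
  },
};
const xml = Object.entries(result)
  .map(([name, value]) => serializeElement(name, value))
  .join("");
console.log(xml);
```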
The object named _nsdecl is used to declare a namespace [XML Names] in an element.
The property named _nsprefix enables the SI author to associate an XML element or attribute with a particular namespace.
When an object contains the _nsdecl property, the namespace declaration is attached to the resultant XML serialized element for this object. The _prefix property of _nsdecl indicates the namespace prefix and the _name property of _nsdecl indicates the corresponding namespace name (usually a URI reference). If the _prefix property is an empty string, the default namespace is declared. If both _prefix and _name are empty strings, the namespace declaration xmlns="" applies.
When an Array object contains the _nsprefix property, the prefix also applies to the automatically generated <item> elements and length and index attributes.
Note that this transformation produces an XML fragment - see [XML Names] for rules on valid namespace usage in XML.
_nsprefix can be used for example to generate XML attributes such as emma:hook or emma:tokens when generating XML fragments to be embedded in EMMA documents. See Appendix C of the [EMMA] specification for more information and examples. The namespace declaration with _nsdecl may not be needed when provided by the XML document in which the fragment will be embedded.
The following ECMAScript object:
{ drink: { _nsdecl: { _prefix:"n1", _name:"http://www.example.com/n1" }, _nsprefix:"n1", liquid: { _nsdecl: { _prefix:"n2", _name:"http://www.example.com/n2" }, _attributes: { color: { _nsprefix:"n2", _value:"black" } }, _value:"coke" }, size:"medium" } }
would generate the following XML result:
<n1:drink xmlns:n1="http://www.example.com/n1"> <liquid n2:color="black" xmlns:n2="http://www.example.com/n2">coke</liquid> <size>medium</size> </n1:drink>
Note that the _nsprefix property only applies to its parent object and hence neither the <liquid> element nor the <size> element is associated with a namespace in this fragment.
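The namespace handling in this example can be sketched by extending an element serializer with _nsdecl and _nsprefix. This is a non-normative illustration: the names `serializeElement`, `qualify`, `isObject` and `RESERVED` are hypothetical, Arrays, name validation and character escaping are left out, and writing the namespace declaration after the other attributes is an assumption chosen to mirror the fragment above.

```javascript
const RESERVED = new Set(["_attributes", "_value", "_nsdecl", "_nsprefix"]);
const isObject = (v) => v !== null && typeof v === "object";
const qualify = (prefix, name) => (prefix ? `${prefix}:${name}` : name);

function serializeElement(name, value) {
  if (!isObject(value)) return `<${name}>${String(value)}</${name}>`;

  // _nsprefix applies only to the object that contains it.
  const tag = qualify(value._nsprefix, name);
  let attrs = "";
  for (const [attrName, attr] of Object.entries(value._attributes || {})) {
    const prefix = isObject(attr) ? attr._nsprefix : undefined;
    const text = isObject(attr) ? attr._value : attr;
    attrs += ` ${qualify(prefix, attrName)}="${String(text)}"`;
  }
  if (value._nsdecl) {
    const { _prefix, _name } = value._nsdecl;
    // An empty prefix declares the default namespace.
    attrs += _prefix === "" ? ` xmlns="${_name}"` : ` xmlns:${_prefix}="${_name}"`;
  }
  let content = "_value" in value ? String(value._value) : "";
  for (const [childName, child] of Object.entries(value)) {
    if (!RESERVED.has(childName)) content += serializeElement(childName, child);
  }
  return `<${tag}${attrs}>${content}</${tag}>`;
}

const drink = {
  _nsdecl: { _prefix: "n1", _name: "http://www.example.com/n1" },
  _nsprefix: "n1",
  liquid: {
    _nsdecl: { _prefix: "n2", _name: "http://www.example.com/n2" },
    _attributes: { color: { _nsprefix: "n2", _value: "black" } },
    _value: "coke",
  },
  size: "medium",
};
console.log(serializeElement("drink", drink));
```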
With the grammar illustrated below, the following utterance
"I would like a coca cola and three large pizzas with pepperoni and mushrooms."
would create the following Rule Variable on the rule order:
{ drink: { liquid:"coke", drinksize:"medium"}, pizza: { number: "3", pizzasize: "large", topping: [ "pepperoni", "mushrooms" ] } }
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN" "http://www.w3.org/TR/speech-grammar/grammar.dtd"> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/06/grammar http://www.w3.org/TR/speech-grammar/grammar.xsd" version="1.0" mode="voice" tag-format="semantics/1.0" root="order"> <rule id="order"> I would like a <ruleref uri="#drink"/> <tag>out.drink = new Object(); out.drink.liquid=rules.drink.type; out.drink.drinksize=rules.drink.drinksize;</tag> and <ruleref uri="#pizza"/> <tag>out.pizza=rules.pizza;</tag> </rule> <rule id="kindofdrink"> <one-of> <item>coke</item> <item>pepsi</item> <item>coca cola<tag>out="coke";</tag></item> </one-of> </rule> <rule id="foodsize"> <tag>out="medium";</tag> <!-- "medium" is default if nothing said --> <item repeat="0-1"> <one-of> <item>small<tag>out="small";</tag></item> <item>medium</item> <item>large<tag>out="large";</tag></item> <item>regular<tag>out="medium";</tag></item> </one-of> </item> </rule> <!-- Construct Array of toppings, return Array --> <rule id="tops"> <tag>out=new Array;</tag> <ruleref uri="#top"/> <tag>out.push(rules.top);</tag> <item repeat="1-"> and <ruleref uri="#top"/> <tag>out.push(rules.top);</tag> </item> </rule> <rule id="top"> <one-of> <item>anchovies</item> <item>pepperoni</item> <item>mushroom<tag>out="mushrooms";</tag></item> <item>mushrooms</item> </one-of> </rule> <!-- Two properties (drinksize, type) on left hand side Rule Variable --> <rule id="drink"> <ruleref uri="#foodsize"/> <ruleref uri="#kindofdrink"/> <tag>out.drinksize=rules.foodsize; out.type=rules.kindofdrink;</tag> </rule> <!-- Three properties on rules.pizza --> <rule id="pizza"> <ruleref uri="#number"/> <ruleref uri="#foodsize"/> <tag>out.pizzasize=rules.foodsize; out.number=rules.number;</tag> pizzas with <ruleref uri="#tops"/> <tag>out.topping=rules.tops;</tag> </rule> <rule 
id="number"> <one-of> <item> <tag>out="1";</tag> <one-of> <item>a</item> <item>one</item> </one-of> </item> <item>two<tag>out="2";</tag></item> <item>three<tag>out="3";</tag></item> </one-of> </rule> </grammar>
#ABNF 1.0 UTF-8; language en; mode voice; tag-format <semantics/1.0>; root $order; $order = I would like a $drink {out.drink = new Object(); out.drink.liquid = rules.drink.type; out.drink.drinksize = rules.drink.drinksize;} and $pizza {out.pizza=rules.pizza;}; $kindofdrink = coke | pepsi | "coca cola"{out="coke";}; // "medium" is default if nothing said $foodsize = {out="medium";} [small {out="small";} | medium | large {out="large";}| regular {out="medium";}]; // Construct Array of toppings, return Array $tops = {out=new Array;} $top {out.push(rules.top);} (and $top {out.push(rules.top);})<1->; $top = anchovies | pepperoni | mushroom{out="mushrooms";} | mushrooms; // Two properties (drinksize, type) on left hand side Rule Variable $drink = $foodsize $kindofdrink {out.drinksize=rules.foodsize; out.type=rules.kindofdrink; }; // Three properties on rules.pizza's Rule Variable $pizza = $number $foodsize {out.pizzasize=rules.foodsize; out.number=rules.number;} pizzas with $tops {out.topping=rules.tops;}; $number = (a | one){out="1";} | two{out="2";} | three{out="3";};
The following grammar demonstrates the use of Semantic Interpretation for computation within a grammar.
This simple number grammar accepts as input whole numbers between 0 and 99,999 inclusive. It demonstrates how rule references may be reused multiple times and the returned SI information processed differently each time. The grammar also shows how the Rule Variable may be given a default value (0 in this case) and also used as an intermediate variable during computation (essentially incrementing the running total stored in the Rule Variable). In this example, the Rule Variable type is changed from an Object to a Number but an alternative strategy might just as easily store the number as a property of the Rule Variable object.
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN" "http://www.w3.org/TR/speech-grammar/grammar.dtd"> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/06/grammar http://www.w3.org/TR/speech-grammar/grammar.xsd" version="1.0" mode="voice" tag-format="semantics/1.0" root="main"> <rule id="main"> <one-of> <item> <ruleref uri="#sub_hundred_thousand"/> <tag>out = rules.sub_hundred_thousand;</tag> </item> <item> <ruleref uri="#sub_thousand"/> <tag>out = rules.sub_thousand;</tag> </item> <item> <ruleref uri="#sub_hundred"/> <tag>out = rules.sub_hundred;</tag> </item> </one-of> </rule> <rule id="sub_hundred_thousand"> <ruleref uri="#sub_hundred"/> <tag>out = (1000 * rules.sub_hundred)</tag> thousand <item repeat="0-1"> <item repeat="0-1">and</item> <ruleref uri="#sub_thousand"/><tag>out += rules.sub_thousand;</tag> </item> </rule> <rule id="sub_thousand"> <ruleref uri="#sub_hundred"/> <tag>out = (100 * rules.sub_hundred);</tag> hundred <item repeat="0-1"> <item repeat="0-1">and</item> <ruleref uri="#sub_hundred"/><tag>out += rules.sub_hundred;</tag> </item> </rule> <rule id="sub_hundred"> <tag>out = 0;</tag> <one-of> <item>zero</item> <item><ruleref uri="#teens"/><tag>out += rules.teens;</tag></item> <item> <ruleref uri="#tens"/><tag>out += rules.tens;</tag> <item repeat="0-1"> <ruleref uri="#digit"/> <tag>out += rules.digit;</tag> </item> </item> <item><ruleref uri="#digit"/><tag>out += rules.digit;</tag></item> </one-of> </rule> <rule id="tens"> <one-of> <item>twenty<tag>out = 20;</tag></item> <item>thirty<tag>out = 30;</tag></item> <item>forty<tag>out = 40;</tag></item> <item>fifty<tag>out = 50;</tag></item> <item>sixty<tag>out = 60;</tag></item> <item>seventy<tag>out = 70;</tag></item> <item>eighty<tag>out = 80;</tag></item> <item>ninety<tag>out = 90;</tag></item> </one-of> </rule> <rule id="teens"> <one-of> 
<item>ten<tag>out = 10;</tag></item> <item>eleven<tag>out = 11;</tag></item> <item>twelve<tag>out = 12;</tag></item> <item>thirteen<tag>out = 13;</tag></item> <item>fourteen<tag>out = 14;</tag></item> <item>fifteen<tag>out = 15;</tag></item> <item>sixteen<tag>out = 16;</tag></item> <item>seventeen<tag>out = 17;</tag></item> <item>eighteen<tag>out = 18;</tag></item> <item>nineteen<tag>out = 19;</tag></item> </one-of> </rule> <rule id="digit"> <one-of> <item>one<tag>out = 1;</tag></item> <item>two<tag>out = 2;</tag></item> <item>three<tag>out = 3;</tag></item> <item>four<tag>out = 4;</tag></item> <item>five<tag>out = 5;</tag></item> <item>six<tag>out = 6;</tag></item> <item>seven<tag>out = 7;</tag></item> <item>eight<tag>out = 8;</tag></item> <item>nine<tag>out = 9;</tag></item> </one-of> </rule> </grammar>
#ABNF 1.0 UTF-8;

language en;
mode voice;
tag-format <semantics/1.0>;
root $main;

$main = $sub_hundred_thousand { out = rules.sub_hundred_thousand; }
      | $sub_thousand { out = rules.sub_thousand; }
      | $sub_hundred { out = rules.sub_hundred; };

$sub_hundred_thousand = $sub_hundred { out = (1000 * rules.sub_hundred); } thousand
                        [ [and] $sub_thousand { out += rules.sub_thousand; } ];

$sub_thousand = $sub_hundred { out = (100 * rules.sub_hundred); } hundred
                [ [and] $sub_hundred { out += rules.sub_hundred; } ];

$sub_hundred = { out = 0; }
               (zero
               | $teens { out += rules.teens; }
               | $tens { out += rules.tens; } [ $digit { out += rules.digit; } ]
               | $digit { out += rules.digit; });

$tens = twenty { out = 20; }
      | thirty { out = 30; }
      | forty { out = 40; }
      | fifty { out = 50; }
      | sixty { out = 60; }
      | seventy { out = 70; }
      | eighty { out = 80; }
      | ninety { out = 90; };

$teens = ten { out = 10; }
       | eleven { out = 11; }
       | twelve { out = 12; }
       | thirteen { out = 13; }
       | fourteen { out = 14; }
       | fifteen { out = 15; }
       | sixteen { out = 16; }
       | seventeen { out = 17; }
       | eighteen { out = 18; }
       | nineteen { out = 19; };

$digit = one { out = 1; }
       | two { out = 2; }
       | three { out = 3; }
       | four { out = 4; }
       | five { out = 5; }
       | six { out = 6; }
       | seven { out = 7; }
       | eight { out = 8; }
       | nine { out = 9; };
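To make the arithmetic in the tags of this grammar easier to follow, the sketch below hand-translates the rules into plain ECMAScript functions. It is an informal illustration only, not the processing model of a Semantic Interpretation Processor: the function names (subHundred, scaled, interpret) and the token/position representation are assumptions introduced here, and each function simply returns the value that the corresponding rule would assign to out.

```javascript
// Word-to-value tables mirroring the $digit, $teens and $tens rules.
const DIGIT = new Map([["one", 1], ["two", 2], ["three", 3], ["four", 4], ["five", 5],
                       ["six", 6], ["seven", 7], ["eight", 8], ["nine", 9]]);
const TEENS = new Map([["ten", 10], ["eleven", 11], ["twelve", 12], ["thirteen", 13],
                       ["fourteen", 14], ["fifteen", 15], ["sixteen", 16],
                       ["seventeen", 17], ["eighteen", 18], ["nineteen", 19]]);
const TENS = new Map([["twenty", 20], ["thirty", 30], ["forty", 40], ["fifty", 50],
                      ["sixty", 60], ["seventy", 70], ["eighty", 80], ["ninety", 90]]);

// $sub_hundred: out starts at 0, then the matched alternative adds its value.
// Returns { out, pos } on a match, or null when no alternative matches at pos.
function subHundred(tokens, pos) {
  let out = 0;
  const word = tokens[pos];
  if (word === "zero") return { out, pos: pos + 1 };
  if (TEENS.has(word)) return { out: out + TEENS.get(word), pos: pos + 1 };
  if (TENS.has(word)) {
    out += TENS.get(word);
    pos += 1;
    // Optional trailing $digit, as in "twenty three".
    if (DIGIT.has(tokens[pos])) { out += DIGIT.get(tokens[pos]); pos += 1; }
    return { out, pos };
  }
  if (DIGIT.has(word)) return { out: out + DIGIT.get(word), pos: pos + 1 };
  return null;
}

// Shared shape of $sub_thousand and $sub_hundred_thousand:
//   $sub_hundred { out = factor * rules.sub_hundred; } unit
//   [ [and] $rest { out += rules.rest; } ]
function scaled(tokens, pos, factor, unit, rest) {
  const head = subHundred(tokens, pos);
  if (head === null || tokens[head.pos] !== unit) return null;
  let out = factor * head.out;
  pos = head.pos + 1;
  // The optional group is consumed only if $rest actually matches.
  const next = tokens[pos] === "and" ? pos + 1 : pos;
  const tail = rest(tokens, next);
  if (tail !== null) { out += tail.out; pos = tail.pos; }
  return { out, pos };
}

const subThousand = (tokens, pos) => scaled(tokens, pos, 100, "hundred", subHundred);
const subHundredThousand = (tokens, pos) => scaled(tokens, pos, 1000, "thousand", subThousand);

// $main: try each alternative and require that the whole utterance is consumed.
function interpret(utterance) {
  const tokens = utterance.trim().split(/\s+/);
  for (const rule of [subHundredThousand, subThousand, subHundred]) {
    const result = rule(tokens, 0);
    if (result !== null && result.pos === tokens.length) return result.out;
  }
  return null; // utterance is not covered by the grammar
}
```

Under these assumptions, interpret("twenty three thousand four hundred and five") yields 23405: $sub_hundred contributes 23, the first tag scales it to 23000, and the optional $sub_thousand adds 4 * 100 + 5. An utterance such as "hundred" on its own is not covered by the grammar, so the sketch returns null for it.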
This section is normative.
A Semantic Interpretation Tag (SI Tag) is a Conforming SI Tag if its content matches the syntax as defined in the normative sections in this document.
There is no normative restriction on the size of an SI Tag.
A Conforming Semantic Interpretation Grammar is a stand-alone ABNF or XML Grammar Document or an XML Grammar Fragment where:
the tag format declaration has the value semantics/1.0 or semantics/1.0-literals.
A grammar that contains tags in a format other than the one specified by this document or its successors must have a tag format declaration whose value does not begin with the string semantics/x.y, where x and y are digits (see Speech Recognition Grammar Specification, Section 4.8 Tag Format Declaration [SRGS]).
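The rule on tag format values can be sketched as a small check. This is a non-normative illustration: the function name classifyTagFormat, its return labels, and the assumption that the value is passed without the angle brackets used in ABNF headers are all introduced here and are not part of this specification.

```javascript
// The two tag format values named by this specification.
const SISR_TAG_FORMATS = new Set(["semantics/1.0", "semantics/1.0-literals"]);

// Hypothetical helper: classify a tag-format declaration value.
//   "sisr"     - one of the two values named above
//   "reserved" - begins with semantics/x.y (x and y single digits) but is
//                neither of those values; grammars using other tag formats
//                must not declare such a value
//   "other"    - any other value
function classifyTagFormat(value) {
  if (SISR_TAG_FORMATS.has(value)) return "sisr";
  if (/^semantics\/[0-9]\.[0-9]/.test(value)) return "reserved";
  return "other";
}
```

For example, classifyTagFormat("semantics/1.0-literals") returns "sisr", while a made-up value such as "vendor/format-1" returns "other" and therefore does not collide with the semantics/x.y prefix.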
A Semantic Interpretation Processor is a program that can parse and process Conforming SI Tags to produce semantic results. Semantic Interpretation Processors are executed in a hosting environment (e.g. a grammar processor).
A Conforming Semantic Interpretation Processor:
A Semantic Interpretation Grammar Processor is a system that can parse and process Conforming Semantic Interpretation Grammars. Specifically, a Semantic Interpretation Grammar Processor is a conforming processor if:
Anyone wishing to state conformance of a Grammar Fragment or Grammar Document with SI Tags (document) to this specification should use the following wording:
This document conforms to W3C's "Semantic Interpretation for Speech Recognition", available at http://www.w3.org/TR/2006/CR-semantic-interpretation-20060111/.
Anyone wishing to state conformance of a processor to this specification should use the following wording:
[PROCESSOR] is a Conforming [ (1) ABNF, (2) XML, (3) ABNF and XML ] Semantic Interpretation Grammar Processor according to W3C's "Semantic Interpretation for Speech Recognition", available at http://www.w3.org/TR/2006/CR-semantic-interpretation-20060111/ [with support for XML Transformation].
Make the appropriate substitutions:
This document was written with the participation of members of the W3C Voice Browser Working Group [VBWG]. The following have significantly contributed to writing this specification:
The following is a summary of the major changes since the Candidate Recommendation was published on January 11, 2006, based on input from reviewers and the working group: