Copyright © 2005 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document describes possible strategies for tokenizing the [XML Path Language (XPath) 2.0] and [XQuery 1.0: An XML Query Language] languages, and is provided as a helpful guide to those who are designing an implementation for these languages, and as background material for the normative EBNF found in the language specifications. In the future this document may be expanded to cover more general parsing strategies.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a public W3C Working Draft for review by W3C Members and other interested parties. Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
Building a Tokenizer for XPath or XQuery has been defined through the efforts of a joint task force of the XML Query Working Group and the XSL Working Group (both part of the XML Activity). It is designed to be read in conjunction with the following documents: [XML Path Language (XPath) 2.0] and [XQuery 1.0: An XML Query Language].
This is the first public Working Draft of these guidelines. Public comments on this document are invited. Comments should be entered into the last-call issue tracking system for this specification (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C XPath/XQuery mailing list, public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/), with “[Parsing]” at the beginning of the subject field.
As of this publication, the Working Group expects to eventually publish this document as a Working Group Note. It is not expected to become a W3C Recommendation, and therefore it has no associated W3C Patent Policy licensing obligations. If this expectation changes, the Working Group will have an opportunity to fulfill the associated patent policy requirements with respect to a future draft.
1 Introduction
1.1 Structure of Parsers
1.2 Lexical Scanning Strategies
1.2.1 Token Granularity
1.3 Parser Strategies
2 Lexical Processing
2.1 Lexical States
2.1.1 XQuery Lexical States
2.1.2 XPath Lexical States
A XPath/XQuery Grammar
A.1 EBNF
A.1.1 XPath EBNF
A.1.2 XQuery EBNF
B References
B.1 Main References
C Glossary (Non-Normative)
The XQuery language is, for the most part, a super-set of the smaller XPath language [1]. XPath is designed to be a simple addressing and expression language, while XQuery includes prologue declarations, function, variable, and extension declarations, XML element construction, and advanced iteration constructs. These languages do not contain reserved keywords, which is to say data references in the form of QNames are designed to
never overlap with keywords in the language. In order to achieve the combination of expression language, directory-style addressing, prologue declarations, type matching, and XML element construction, the meaning of the tokens are extremely context dependent. For instance, the string div
may be an element reference, or may be a division operator. Likewise, the *
might be a wildcard element reference, may be a multiplication operator, or may be a kleene operator.
As illustration, the following is a very unlikely, but legal, XQuery example:
declare namespace namespace = "http://example.com"; declare union <union>for gibberish { for $for in for return <for>***div div</for> }</union>, if(if) then then else else- +-++-**-* instance of element(*)* * * **---++div- div -div
XPath and XQuery parsers may be either top-down or bottom-up. Traditionally, the parsing process is decomposed into subcomponents:
[Definition: Lexical scanner, which assigns symbols known as tokens to units in the input, in essence recognizing the basic words and punctuation of the language. Typically, the lexer uses regular expressions as a specification for how to do this tokenization. The rules that govern the lexical structure of the language, sometimes known as microsyntax, should be simple, regular, and as context-free as possible.]
[Definition: Parser, which arranges the tokens into a hierarchical structure. Typically, the parser uses a BNF or variant description (EBNF in the case of XPath/XQuery) as a specification for this parsing.]
Since the tokenization of XPath/XQuery is so context dependent, this breakdown may not be as simple as with other programming languages. Furthermore, the granularity of tokenization is an issue.
We can speculate on an enumeration of some strategies for implementing a lexical scanner as follows:
[Definition: Scan-while-parse scanner. Using this strategy, the parser would evaluate the character stream into tokens, while at the same time branching to the respective hierarchical context. Thus the words can be discovered in context. Complex parsers are typically not built in this fashion, but, given the issue of context sensitivity, this may be a reasonable approach. This would probably be implemented as a hand-constructed recursive-descent parser. ]
[Definition: Fuzzy token scanner. Using this strategy, a typical simple lexical scanner is implemented, but the meaning of the tokens are often left somewhat ambiguous. Thus, the word div
might be assigned the symbol QNameOrKeyword. The words can be disambiguated during the context sensitive parse process. This approach would probably be a hand-constructed, Top-down parser.]
[Definition: Two-pass scanner. This is the same as the Fuzzy token scanner, but the ambiguous words can be disambiguated by a second scan, using some rules (probably expressed as a context-free grammar) that reflect the basic structure of the EBNF (call this the three-pass approach). The three-pass approach could conceivably be implemented via a parser constructor program, with the addition of a custom disambiguator in between the lexer and parser. This method could be used for either Top-down or Bottom-up techniques.]
[Definition: State-driven scanner. This approach recognizes sets of words only in certain states, each word or punctuation possibly transitioning to a next state. In other words, it implements lexical recognition as transition tables for a push-down automata.]
In XPath/XQuery, the meaning of a word may often be determined by the words and punctuation ahead of it. Thus, the word if
is only the "if" keyword if it is followed by a left-parenthesis (ignoring whitespace). Likewise, the word declare
may be a QName, or may be part of a NamespaceDecl in the prolog, if followed by a namespace
word. Inversely, namespace
is determined to be a keyword verses a QName by it's appearance after declare
. In the
XQuery and XPath specifications, in the EBNF appendix, the notation "< ... >" is used to indicate a grouping of terminals that together may help disambiguate the individual symbols.
A scanner might take one of two approaches for assigning token units to the character stream:
[Definition: Short tokens. Using this approach, declare namespace
would be considered two tokens, the exact symbols assigned based on lexical look-ahead (and look-behind).] In this case, a parser that has at least a look-ahead of two tokens is required (in particular, a two-token look-ahead should be sufficient for a LALR parser).
[Definition: Long tokens. Using this approach, declare namespace
would be considered a single token.] In this case, a parser that has a look-ahead of only one token can be implemented.
The general strategies for implementing the parser are as follows:
Recursive-descent parser. For our purposes, this term means a hand-constructed, recursive descent parser.
[Definition: LL Parser (Left to right, Leftmost derivation, hence LL). ] The XPath/XQuery EBNF is expressed as an LL(1) grammar, so an LL parser can use the EBNF from the specifications fairly directly.
[Definition: LALR Parser (Look Ahead Left to right, Rightmost derivation, hence LALR). ] The XPath/XQuery EBNF is expressed as an LL(1) grammar, so an LALR parser must use what is essentially an inverted form of the original EBNF.
Many lexical scanner generators will allow specifications of tokens that are recognized based on some condition. It is possible to condition the XQuery and XPath token recognition in a lexical scanner in terms of a series of states and transitions between those states.
[Definition: A lexical state is a condition where a defined set of patterns are recognized.] The tables below define the complete lexical rules for [XPath/XQuery]. Each two column table corresponds to a lexical state and lists in the first column the tokens that are recognized when in that state. Any token not listed should result in a syntax error when in that state. When a given token is recognized in the given state, the transition to the next state and/or a lexical action is given in the right column. [Definition: A lexical action is an action that occurs as an side-effect of a pattern recognition.] Following are a list of lexical actions used in the tables:
The current state, before the transition, is pushed onto an abstract stack.
A STATE parameter is pushed onto an abstract stack.
The most recently pushed state is removed from the top of the stack, and becomes the current state.
This is a special action that backs up the character stream by n characters. See the description of the OCCURRENCEINDICATOR state for more information.
This simply means that the current state is maintained with no modification.
The lexical states have, in many cases, close connection to the parser productions. However, just because a token is recognized in a certain lexical state, does not mean it will be legal in the current EBNF production.
The XQuery EBNF has some places where whitespace has explicit specification, which is to say, whitespace is not ignored in these states: START_TAG, END_TAG, ELEMENT_CONTENT, XML_COMMENT, PROCESSING_INSTRUCTION, PROCESSING_INSTRUCTION_CONTENT, CDATA_SECTION, QUOT_ATTRIBUTE_CONTENT, and APOS_ATTRIBUTE_CONTENT.
Editorial note | |
The following tables were hand-constructed and have not, at the time of this writing, been exhaustively verified against all possible paths that may be legal in the XQuery and XPath EBNFs, which is to say, it is possible they contain bugs. |
This state is for patterns that occur at the beginning of an expression or subexpression.
Pattern | Transition To State | ||
---|---|---|---|
DecimalLiteral, "..", ".", DoubleLiteral, IntegerLiteral, NotNumber, <NCName ":" "*">, QName, ")", <"*" ":" NCName>, "*", StringLiteral, <"declare" "construction">, <"declare" "default" "order"> |
|
||
<"declare" "default" "collation">, <"declare" "namespace">, <"module" "namespace">, <"declare" "base-uri"> |
|
||
<"declare" "default" "element">, <"declare" "default" "function">, <"import" "schema">, <"import" "module">, <"declare" "copy-namespaces"> |
|
||
"$", <"for" "$">, <"let" "$">, <"some" "$">, <"every" "$"> |
|
||
<"declare" "variable" "$"> |
|
||
<")" "as"> |
|
||
<"element" "(">, <"attribute" "(">, <"schema-element" "(">, <"schema-attribute" "(">, <"comment" "(">, <"text" "(">, <"node" "(">, <"document-node" "("> |
|
||
<"processing-instruction" "("> |
|
||
"<!--" |
|
||
"<?" |
|
||
"<![CDATA[" |
|
||
"<" |
|
||
<"declare" "boundary-space"> |
|
||
"}" |
|
||
<"validate" "{">, <"validate" ValidationMode> |
|
||
<"typeswitch" "("> |
|
||
<"element" "{">, <"attribute" "{"> |
|
||
<"attribute" QName "{">, <"element" QName "{">, <"document" "{">, <"text" "{">, <"processing-instruction" "{">, <"processing-instruction" NCName "{">, <"comment" "{"> |
|
||
<"declare" "function"> |
|
||
"(:" |
|
||
"{", <"ordered" "{">, <"unordered" "{"> |
|
||
<"declare" "ordering"> |
|
||
<"xquery" "version"> |
|
||
"(#" |
|
||
<"declare" "option"> |
|
||
<"at" URILiteral> |
|
||
";", ",", "(", <QName "(">, <"if" "(">, "-", "+", "//", "/", <"ancestor-or-self" "::">, <"ancestor" "::">, <"attribute" "::">, <"child" "::">, <"descendant-or-self" "::">, <"descendant" "::">, <"following-sibling" "::">, <"following" "::">, <"parent" "::">, <"preceding-sibling" "::">, <"preceding" "::">, <"self" "::">, "@" |
|
Special state to recognize declare ordering specific keywords.
Pattern | Transition To State | |
---|---|---|
"ordered", "unordered" |
|
This state is for patterns that are defined for operators.
Pattern | Transition To State | ||
---|---|---|---|
"{" |
|
||
";", "then", "else", "external", "and", "at", ":=", ",", "div", "=", "except", "eq", "ge", "gt", "le", "lt", "ne", ">=", ">>", ">", "idiv", "intersect", "in", "is", "[", "<=", "<<", "<", "-", "mod", "*", "!=", <"order" "by">, <"stable" "order" "by">, "or", "+", "return", "satisfies", "//", "/", "to", "union", "|", "where", ("preserve" | "strip") |
|
||
<"castable" "as">, <"cast" "as"> |
|
||
<"instance" "of">, <"treat" "as">, "case", "as", <")" "as"> |
|
||
"}" |
|
||
"$", <"for" "$">, <"let" "$"> |
|
||
"(:" |
|
||
")", "?", <"empty" "greatest">, <"empty" "least">, "ascending", "descending", "default", "]" |
|
||
"collation" |
|
||
StringLiteral, NotOperatorKeyword |
|
This state is for recognition of XQuery version specific keywords.
Pattern | Transition To State | |
---|---|---|
";" |
|
|
StringLiteral, "encoding" |
|
This state occurs inside of a namespace declaration, and is needed to recognize a NCName that is to be used as the prefix, as opposed to allowing a QName to occur. (Otherwise, the difference between NCName and QName are ambiguous.)
Pattern | Transition To State | ||
---|---|---|---|
URILiteral, ",", <"at" URILiteral> |
|
||
"(:" |
|
||
"=", NCName |
|
||
";" |
|
This state is to recognize a URILiteral that transits to the OPERATOR state.
Pattern | Transition To State | |
---|---|---|
URILiteral |
|
This state occurs at places where the keyword "namespace" is expected, which would otherwise be ambiguous compared to a QName. QNames can not occur in this state.
Pattern | Transition To State | ||
---|---|---|---|
URILiteral |
|
||
"inherit", "no-inherit" |
|
||
"namespace" |
|
||
"(:" |
|
||
<"default" "element">, "preserve", "no-preserve", "," |
|
This state occurs at places where the keywords "preserve" and "strip" is expected to support "declare xmlspace". QNames can not occur in this state.
Pattern | Transition To State | ||
---|---|---|---|
"preserve", "strip" |
|
||
"(:" |
|
This state distinguishes tokens that can occur only inside the SingleType production.
Pattern | Transition To State | ||
---|---|---|---|
QName |
|
||
"(:" |
|
This state distinguishes tokens that can occur only inside the ItemType production.
Pattern | Transition To State | ||
---|---|---|---|
"$" |
|
||
<"void" "(" ")">, QName |
|
||
"(:" |
|
||
<"element" "(">, <"attribute" "(">, <"schema-element" "(">, <"schema-attribute" "(">, <"comment" "(">, <"text" "(">, <"node" "(">, <"document-node" "("> |
|
||
<"processing-instruction" "("> |
|
||
QName, <"item" "(" ")"> |
|
||
"(#" |
|
||
";", "then", "else" |
|
||
<"at" URILiteral> |
|
||
"external", "and", "at", ":=", ",", "div", "=", "except", "eq", "ge", "gt", "le", "lt", "ne", ">=", ">>", ">", "idiv", "intersect", "in", "is", "[", "(", "<=", "<<", "<", "-", "mod", "!=", <"order" "by">, <"stable" "order" "by">, "or", "return", "satisfies", "to", "union", "|", "where" |
|
||
<"castable" "as">, <"cast" "as"> |
|
||
<"instance" "of">, <"treat" "as">, "case", "as", <")" "as"> |
|
This state is for the psuedo-parameters for the KindTest productions.
Pattern | Transition To State | ||
---|---|---|---|
"{" |
|
||
")" |
|
||
"*", QName |
|
||
<"element" "(">, <"schema-element" "("> |
|
||
"(:" |
|
This state is similar to KINDTEST, but recognizes NCNames instead of QNames.
Pattern | Transition To State | ||
---|---|---|---|
")" |
|
||
"(:" |
|
||
NCName, StringLiteral |
|
This state is expecting to close a KINDTEST sequence.
Pattern | Transition To State | ||
---|---|---|---|
")" |
|
||
"," |
|
||
"{" |
|
||
"?" |
|
||
"(:" |
|
This special state is needed to distinguish occurrence indicators that appear in the SequenceType production. For instance, compare "foo instance of baz*" to "baz*foo". In the first case, the "*" is interpreted as an occurrence indicator, and in the second case, it must be interpreted as a multiplication operator. But, when in the OCCURRENCEINDICATOR state, if anything else other than "?", "*", and "+", those symbols must be interpreted in the OPERATOR state. For instance, this would occur with the expression "foo instance of baz and $x", with the operator "and". This backing up of the lexical characters in order to reset the state, is symbolized by the notation "input_stream.backup(1)". NotOccurrenceIndicator is a special symbol for any character that is not an occurrence indicator.
Pattern | Transition To State | ||
---|---|---|---|
[NotOccurrenceIndicator] |
|
||
"?", "*", "+" |
|
||
"(:" |
|
This state is entered in the prolog for an option declaration, and recognizes a QName that transits to a DEFAULT state rather than a OPERATOR state.
Pattern | Transition To State | |
---|---|---|
QName |
|
This state is entered in a a pragma expression, and recognizes a QName that transits to a PRAGMACONTENTS state rather than a OPERATOR state.
Pattern | Transition To State | |
---|---|---|
QName |
|
This state recognizes characters in pragma content, and transits out of this state when a '#)' pattern is recognized.
Pattern | Transition To State | |
---|---|---|
"#)" |
|
|
S, Char |
|
This state differentiates variable names from qualified names. This allows only the pattern of a QName to be recognized when otherwise ambiguities could occur.
Pattern | Transition To State | ||
---|---|---|---|
VarName |
|
||
"(:" |
|
This state allows attributes in the native XML syntax, and marks the beginning of an element construction. Element constructors also push the current state, popping it at the conclusion of an end tag. In the START_TAG state, the string ">" is recognized as a token which is associated with the transition to the original state.
Pattern | Transition To State | |
---|---|---|
"/>" |
|
|
">" |
|
|
'"' |
|
|
"'" |
|
|
"=" |
|
|
S, QName |
|
This state allows XML-like content, without these characters being misinterpreted as expressions. The character "{" marks a transition to the DEFAULT state, i.e. the start of an embedded expression, and the "}" character pops back to the ELEMENT_CONTENT state. To allow curly braces to be used as character content, a double left or right curly brace is interpreted as a single curly brace character. The string "</" is interpreted as the beginning of an end tag, which is associated with a transition to the END_TAG state.
Pattern | Transition To State | ||
---|---|---|---|
"</" |
|
||
"{" |
|
||
"<!--" |
|
||
"<?" |
|
||
"<![CDATA[" |
|
||
"<" |
|
||
ElementContentChar |
|
||
PredefinedEntityRef, CharRef, "{{", "}}" |
|
When the end tag is terminated, the state is popped to the state that was pushed at the start of the corresponding start tag.
Pattern | Transition To State | |
---|---|---|
">" |
|
|
S, QName |
|
The "<--" token marks the beginning of an XML Comment, and the "-->" token marks the end. This allows no special interpretation of other characters in this state.
Pattern | Transition To State | |
---|---|---|
"-->" |
|
|
(Char - '-'), <'-' (Char - '-')> |
|
The "(:" token marks the beginning of an expression Comment, and the ":)" token marks the end. This allows no special interpretation of other characters in this state.
Pattern | Transition To State | ||
---|---|---|---|
":)" |
|
||
"(:" |
|
||
Char |
|
In this state, only patterns that are legal in a processing instruction name are recognized.
Pattern | Transition To State | |
---|---|---|
S |
|
|
"?>" |
|
|
PITarget |
|
In this state, only characters are that are legal in processing instruction content are recognized.
Pattern | Transition To State | |
---|---|---|
"?>" |
|
|
Char |
|
In this state, only lexemes that are legal in a CDATA section are recognized.
Pattern | Transition To State | |
---|---|---|
"]]>" |
|
|
Char |
|
This state allows content legal for attributes. The character "{" marks a transition to the DEFAULT state, i.e. the start of an embedded expression, and the "}" character pops back to the original state. To allow curly braces to be used as character content, a double left or right curly brace is interpreted as a single curly brace character. This state is the same as APOS_ATTRIBUTE_CONTENT, except that apostrophes are allowed without escaping, and an unescaped quote marks the end of the state.
Pattern | Transition To State | ||
---|---|---|---|
'"' |
|
||
"{" |
|
||
EscapeQuot, QuotAttrContentChar |
|
||
PredefinedEntityRef, CharRef, "{{", "}}" |
|
This state is the same as QUOT_ATTRIBUTE_CONTENT, except that quotes are allowed, and an unescaped apostrophe marks the end of the state.
Pattern | Transition To State | ||
---|---|---|---|
"'" |
|
||
"{" |
|
||
EscapeApos, AposAttrContentChar |
|
||
PredefinedEntityRef, CharRef, "{{", "}}" |
|
This state is for patterns that occur at the beginning of an expression or subexpression.
Pattern | Transition To State | ||
---|---|---|---|
DecimalLiteral, "..", ".", DoubleLiteral, IntegerLiteral, NotNumber, <NCName ":" "*">, QName, ")", <"*" ":" NCName>, "*", StringLiteral |
|
||
"$", <"for" "$">, <"some" "$">, <"every" "$"> |
|
||
<"element" "(">, <"attribute" "(">, <"schema-element" "(">, <"schema-attribute" "(">, <"comment" "(">, <"text" "(">, <"node" "(">, <"document-node" "("> |
|
||
<"processing-instruction" "("> |
|
||
"(:" |
|
||
",", "(", <QName "(">, <"if" "(">, "-", "+", "//", "/", <"ancestor-or-self" "::">, <"ancestor" "::">, <"attribute" "::">, <"child" "::">, <"descendant-or-self" "::">, <"descendant" "::">, <"following-sibling" "::">, <"following" "::">, <"namespace" "::">, <"parent" "::">, <"preceding-sibling" "::">, <"preceding" "::">, <"self" "::">, "@" |
|
This state is for patterns that are defined for operators.
Pattern | Transition To State | ||
---|---|---|---|
"then", "else", "and", ",", "div", "=", "except", "eq", "ge", "gt", "le", "lt", "ne", ">=", ">>", ">", "idiv", "intersect", "in", "is", "[", "<=", "<<", "<", "-", "mod", "*", "!=", "or", "+", "return", "satisfies", "//", "/", "to", "union", "|" |
|
||
<"castable" "as">, <"cast" "as"> |
|
||
<"instance" "of">, <"treat" "as"> |
|
||
"$", <"for" "$"> |
|
||
"(:" |
|
||
")", "?", "]" |
|
||
StringLiteral, NotOperatorKeyword |
|
This state distinguishes tokens that can occur only inside the SingleType production.
Pattern | Transition To State | ||
---|---|---|---|
QName |
|
||
"(:" |
|
This state distinguishes tokens that can occur only inside the ItemType production.
Pattern | Transition To State | ||
---|---|---|---|
<"void" "(" ")">, QName |
|
||
"(:" |
|
||
<"element" "(">, <"attribute" "(">, <"schema-element" "(">, <"schema-attribute" "(">, <"comment" "(">, <"text" "(">, <"node" "(">, <"document-node" "("> |
|
||
<"processing-instruction" "("> |
|
||
QName, <"item" "(" ")"> |
|
||
"then", "else" |
|
||
"and", ",", "div", "=", "except", "eq", "ge", "gt", "le", "lt", "ne", ">=", ">>", ">", "idiv", "intersect", "in", "is", "[", "(", "<=", "<<", "<", "-", "mod", "!=", "or", "return", "satisfies", "to", "union", "|" |
|
||
<"castable" "as">, <"cast" "as"> |
|
||
<"instance" "of">, <"treat" "as"> |
|
This state is for the psuedo-parameters for the KindTest productions.
Pattern | Transition To State | ||
---|---|---|---|
")" |
|
||
"*", QName |
|
||
<"element" "(">, <"schema-element" "("> |
|
||
"(:" |
|
This state is similar to KINDTEST, but recognizes NCNames instead of QNames.
Pattern | Transition To State | ||
---|---|---|---|
")" |
|
||
"(:" |
|
||
NCName, StringLiteral |
|
This state is expecting to close a KINDTEST sequence.
Pattern | Transition To State | ||
---|---|---|---|
")" |
|
||
"," |
|
||
"?" |
|
||
"(:" |
|
This special state is needed to distinguish occurrence indicators that appear in the SequenceType production. For instance, compare "foo instance of baz*" to "baz*foo". In the first case, the "*" is interpreted as an occurrence indicator, and in the second case, it must be interpreted as a multiplication operator. But, when in the OCCURRENCEINDICATOR state, if anything else other than "?", "*", and "+", those symbols must be interpreted in the OPERATOR state. For instance, this would occur with the expression "foo instance of baz and $x", with the operator "and". This backing up of the lexical characters in order to reset the state, is symbolized by the notation "input_stream.backup(1)". NotOccurrenceIndicator is a special symbol for any character that is not an occurrence indicator.
Pattern | Transition To State | ||
---|---|---|---|
[NotOccurrenceIndicator] |
|
||
"?", "*", "+" |
|
||
"(:" |
|
This state differentiates variable names from qualified names. This allows only the pattern of a QName to be recognized when otherwise ambiguities could occur.
Pattern | Transition To State | ||
---|---|---|---|
VarName |
|
||
"(:" |
|
The "(:" token marks the beginning of an expression Comment, and the ":)" token marks the end. This allows no special interpretation of other characters in this state.
Pattern | Transition To State | ||
---|---|---|---|
":)" |
|
||
"(:" |
|
||
Char |
|
For convenience, the basic EBNF of the XPath and XQuery grammars are reproduced here. For further details please consult the respective specifications.
The following grammars use the same simple Extended Backus-Naur Form (EBNF) notation as [XML 1.0] with the following minor differences. The notation "< ... >" is used to indicate a grouping of terminals that together may help disambiguate the individual symbols. To help readability, this "< ... >" notation is absent in the EBNF in the main body of this document. For further details concerning the EBNF notation used, see Section A.1 EBNFXQ in [XQuery 1.0: An XML Query Language] or Section A.1 EBNFXP in [XML Path Language (XPath) 2.0].
Comments on grammar productions are between '/*' and '*/' symbols. A 'gn:' prefix means a 'Grammar Note', and is meant as a clarification for parsing rules, and are explained in Section A.1.1 Grammar NotesXQ in [XQuery 1.0: An XML Query Language] or Section A.1.1 Grammar NotesXP in [XML Path Language (XPath) 2.0]. A 'ws:' prefix explains the whitespace rules for the production, the details of which are explained in Section A.2.2 Whitespace RulesXQ in [XQuery 1.0: An XML Query Language] or Section A.2.2 Whitespace RulesXP in [XML Path Language (XPath) 2.0].
[1] | XPath |
::= | Expr |
|
[2] | Expr |
::= | ExprSingle ("," ExprSingle)* |
|
[3] | ExprSingle |
::= | ForExpr |
|
[4] | ForExpr |
::= | SimpleForClause "return" ExprSingle |
|
[5] | SimpleForClause |
::= | <"for" "$"> VarName "in" ExprSingle ("," "$" VarName "in" ExprSingle)* |
|
[6] | QuantifiedExpr |
::= | (<"some" "$"> | <"every" "$">) VarName "in" ExprSingle ("," "$" VarName "in" ExprSingle)* "satisfies" ExprSingle |
|
[7] | IfExpr |
::= | <"if" "("> Expr ")" "then" ExprSingle "else" ExprSingle |
|
[8] | OrExpr |
::= | AndExpr ( "or" AndExpr )* |
|
[9] | AndExpr |
::= | ComparisonExpr ( "and" ComparisonExpr )* |
|
[10] | ComparisonExpr |
::= | RangeExpr ( (ValueComp |
|
[11] | RangeExpr |
::= | AdditiveExpr ( "to" AdditiveExpr )? |
|
[12] | AdditiveExpr |
::= | MultiplicativeExpr ( ("+" | "-") MultiplicativeExpr )* |
|
[13] | MultiplicativeExpr |
::= | UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )* |
|
[14] | UnionExpr |
::= | IntersectExceptExpr ( ("union" | "|") IntersectExceptExpr )* |
|
[15] | IntersectExceptExpr |
::= | InstanceofExpr ( ("intersect" | "except") InstanceofExpr )* |
|
[16] | InstanceofExpr |
::= | TreatExpr ( <"instance" "of"> SequenceType )? |
|
[17] | TreatExpr |
::= | CastableExpr ( <"treat" "as"> SequenceType )? |
|
[18] | CastableExpr |
::= | CastExpr ( <"castable" "as"> SingleType )? |
|
[19] | CastExpr |
::= | UnaryExpr ( <"cast" "as"> SingleType )? |
|
[20] | UnaryExpr |
::= | ("-" | "+")* ValueExpr |
|
[21] | ValueExpr |
::= | PathExpr |
|
[22] | GeneralComp |
::= | "=" | "!=" | "<" | "<=" | ">" | ">=" |
/* gn: ltXP */ |
[23] | ValueComp |
::= | "eq" | "ne" | "lt" | "le" | "gt" | "ge" |
|
[24] | NodeComp |
::= | "is" | "<<" | ">>" |
|
[25] | PathExpr |
::= | ("/" RelativePathExpr?) |
/* gn: leading-lone-slashXP */ |
[26] | RelativePathExpr |
::= | StepExpr (("/" | "//") StepExpr)* |
|
[27] | StepExpr |
::= | AxisStep | FilterExpr |
|
[28] | AxisStep |
::= | (ForwardStep | ReverseStep) PredicateList |
|
[29] | ForwardStep |
::= | (ForwardAxis NodeTest) | AbbrevForwardStep |
|
[30] | ForwardAxis |
::= | <"child" "::"> |
|
[31] | AbbrevForwardStep |
::= | "@"? NodeTest |
|
[32] | ReverseStep |
::= | (ReverseAxis NodeTest) | AbbrevReverseStep |
|
[33] | ReverseAxis |
::= | <"parent" "::"> |
|
[34] | AbbrevReverseStep |
::= | ".." |
|
[35] | NodeTest |
::= | KindTest | NameTest |
|
[36] | NameTest |
::= | QName | Wildcard |
|
[37] | Wildcard |
::= | "*" |
/* ws: explicitXP */ |
[38] | FilterExpr |
::= | PrimaryExpr PredicateList |
|
[39] | PredicateList |
::= | Predicate* |
|
[40] | Predicate |
::= | "[" Expr "]" |
|
[41] | PrimaryExpr |
::= | Literal | VarRef | ParenthesizedExpr | ContextItemExpr | FunctionCall |
|
[42] | Literal |
::= | NumericLiteral | StringLiteral |
|
[43] | NumericLiteral |
::= | IntegerLiteral | DecimalLiteral | DoubleLiteral |
|
[44] | VarRef |
::= | "$" VarName |
|
[45] | ParenthesizedExpr |
::= | "(" Expr? ")" |
|
[46] | ContextItemExpr |
::= | "." |
|
[47] | FunctionCall |
::= | <QName "("> (ExprSingle ("," ExprSingle)*)? ")" |
/* gn: parensXP */ |
/* gn: reserved-function-namesXP */ | ||||
[48] | SingleType |
::= | AtomicType "?"? |
|
[49] | SequenceType |
::= | (ItemType OccurrenceIndicator?) |
|
[50] | OccurrenceIndicator |
::= | "?" | "*" | "+" |
/* gn: occurrence-indicatorsXP */ |
[51] | ItemType |
::= | AtomicType | KindTest | <"item" "(" ")"> |
|
[52] | AtomicType |
::= | QName |
|
[53] | KindTest |
::= | DocumentTest |
|
[54] | AnyKindTest |
::= | <"node" "("> ")" |
|
[55] | DocumentTest |
::= | <"document-node" "("> (ElementTest | SchemaElementTest)? ")" |
|
[56] | TextTest |
::= | <"text" "("> ")" |
|
[57] | CommentTest |
::= | <"comment" "("> ")" |
|
[58] | PITest |
::= | <"processing-instruction" "("> (NCName | StringLiteral)? ")" |
|
[59] | AttributeTest |
::= | <"attribute" "("> (AttribNameOrWildcard ("," TypeName)?)? ")" |
|
[60] | AttribNameOrWildcard |
::= | AttributeName | "*" |
|
[61] | SchemaAttributeTest |
::= | <"schema-attribute" "("> AttributeDeclaration ")" |
|
[62] | AttributeDeclaration |
::= | AttributeName |
|
[63] | ElementTest |
::= | <"element" "("> (ElementNameOrWildcard ("," TypeName "?"?)?)? ")" |
|
[64] | ElementNameOrWildcard |
::= | ElementName | "*" |
|
[65] | SchemaElementTest |
::= | <"schema-element" "("> ElementDeclaration ")" |
|
[66] | ElementDeclaration |
::= | ElementName |
|
[67] | AttributeName |
::= | QName |
|
[68] | ElementName |
::= | QName |
|
[69] | TypeName |
::= | QName |
[70] | IntegerLiteral |
::= | Digits |
|
[71] | DecimalLiteral |
::= | ("." Digits) | (Digits "." [0-9]*) |
/* ws: explicitXP */ |
[72] | DoubleLiteral |
::= | (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits |
/* ws: explicitXP */ |
[73] | StringLiteral |
::= | ('"' (('"' '"') | [^"])* '"') | ("'" (("'" "'") | [^'])* "'") |
/* ws: explicitXP */ |
[74] | VarName |
::= | QName |
|
[75] | Digits |
::= | [0-9]+ |
|
[76] | Comment |
::= | "(:" (CommentContents | Comment)* ":)" |
/* ws: explicitXP */ |
/* gn: commentsXP */ | ||||
[77] | CommentContents |
::= | (Char+ - (Char* ':)' Char*)) |
|
[78] | QName |
::= | [http://www.w3.org/TR/REC-xml-names/#NT-QName]Names |
/* gn: xml-versionXP */ |
[79] | NCName |
::= | [http://www.w3.org/TR/REC-xml-names/#NT-NCName]Names |
/* gn: xml-versionXP */ |
[80] | Char |
::= | [http://www.w3.org/TR/REC-xml#NT-Char]XML |
/* gn: xml-versionXP */ |
[1] | Module |
::= | VersionDecl? (MainModule | LibraryModule) |
|
[2] | VersionDecl |
::= | <"xquery" "version"> StringLiteral ("encoding" StringLiteral)? Separator |
|
[3] | MainModule |
::= | Prolog QueryBody |
|
[4] | LibraryModule |
::= | ModuleDecl Prolog |
|
[5] | ModuleDecl |
::= | <"module" "namespace"> NCName "=" URILiteral Separator |
|
[6] | Prolog |
::= | ((Setter | Import | NamespaceDecl | DefaultNamespaceDecl) Separator)* ((VarDecl | FunctionDecl | OptionDecl) Separator)* |
|
[7] | Setter |
::= | BoundarySpaceDecl | DefaultCollationDecl | BaseURIDecl | ConstructionDecl | OrderingModeDecl | EmptyOrderDecl | CopyNamespacesDecl |
|
[8] | Import |
::= | SchemaImport | ModuleImport |
|
[9] | Separator |
::= | ";" |
|
[10] | NamespaceDecl |
::= | <"declare" "namespace"> NCName "=" URILiteral |
|
[11] | BoundarySpaceDecl |
::= | <"declare" "boundary-space"> ("preserve" | "strip") |
|
[12] | DefaultNamespaceDecl |
::= | (<"declare" "default" "element"> | <"declare" "default" "function">) "namespace" URILiteral |
|
[13] | OptionDecl |
::= | <"declare" "option"> QName StringLiteral |
|
[14] | OrderingModeDecl |
::= | <"declare" "ordering"> ("ordered" | "unordered") |
|
[15] | EmptyOrderDecl |
::= | <"declare" "default" "order"> (<"empty" "greatest"> | <"empty" "least">) |
|
[16] | CopyNamespacesDecl |
::= | <"declare" "copy-namespaces"> PreserveMode "," InheritMode |
|
[17] | PreserveMode |
::= | "preserve" | "no-preserve" |
|
[18] | InheritMode |
::= | "inherit" | "no-inherit" |
|
[19] | DefaultCollationDecl |
::= | <"declare" "default" "collation"> URILiteral |
|
[20] | BaseURIDecl |
::= | <"declare" "base-uri"> URILiteral |
|
[21] | SchemaImport |
::= | <"import" "schema"> SchemaPrefix? URILiteral (<"at" URILiteral> ("," URILiteral)*)? |
|
[22] | SchemaPrefix |
::= | ("namespace" NCName "=") | (<"default" "element"> "namespace") |
|
[23] | ModuleImport |
::= | <"import" "module"> ("namespace" NCName "=")? URILiteral (<"at" URILiteral> ("," URILiteral)*)? |
|
[24] | VarDecl |
::= | <"declare" "variable" "$"> VarName TypeDeclaration? ((":=" ExprSingle) | "external") |
|
[25] | ConstructionDecl |
::= | <"declare" "construction"> ("preserve" | "strip") |
|
[26] | FunctionDecl |
::= | <"declare" "function"> <QName "("> ParamList? (")" | (<")" "as"> SequenceType)) (EnclosedExpr | "external") |
|
[27] | ParamList |
::= | Param ("," Param)* |
|
[28] | Param |
::= | "$" VarName TypeDeclaration? |
|
[29] | EnclosedExpr |
::= | "{" Expr "}" |
|
[30] | QueryBody |
::= | Expr |
|
[31] | Expr |
::= | ExprSingle ("," ExprSingle)* |
|
[32] | ExprSingle |
::= | FLWORExpr |
|
[33] | FLWORExpr |
::= | (ForClause | LetClause)+ WhereClause? OrderByClause? "return" ExprSingle |
|
[34] | ForClause |
::= | <"for" "$"> VarName TypeDeclaration? PositionalVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle)* |
|
[35] | PositionalVar |
::= | "at" "$" VarName |
|
[36] | LetClause |
::= | <"let" "$"> VarName TypeDeclaration? ":=" ExprSingle ("," "$" VarName TypeDeclaration? ":=" ExprSingle)* |
|
[37] | WhereClause |
::= | "where" ExprSingle |
|
[38] | OrderByClause |
::= | (<"order" "by"> | <"stable" "order" "by">) OrderSpecList |
|
[39] | OrderSpecList |
::= | OrderSpec ("," OrderSpec)* |
|
[40] | OrderSpec |
::= | ExprSingle OrderModifier |
|
[41] | OrderModifier |
::= | ("ascending" | "descending")? (<"empty" "greatest"> | <"empty" "least">)? ("collation" URILiteral)? |
|
[42] | QuantifiedExpr |
::= | (<"some" "$"> | <"every" "$">) VarName TypeDeclaration? "in" ExprSingle ("," "$" VarName TypeDeclaration? "in" ExprSingle)* "satisfies" ExprSingle |
|
[43] | TypeswitchExpr |
::= | <"typeswitch" "("> Expr ")" CaseClause+ "default" ("$" VarName)? "return" ExprSingle |
|
[44] | CaseClause |
::= | "case" ("$" VarName "as")? SequenceType "return" ExprSingle |
|
[45] | IfExpr |
::= | <"if" "("> Expr ")" "then" ExprSingle "else" ExprSingle |
|
[46] | OrExpr |
::= | AndExpr ( "or" AndExpr )* |
|
[47] | AndExpr |
::= | ComparisonExpr ( "and" ComparisonExpr )* |
|
[48] | ComparisonExpr |
::= | RangeExpr ( (ValueComp |
|
[49] | RangeExpr |
::= | AdditiveExpr ( "to" AdditiveExpr )? |
|
[50] | AdditiveExpr |
::= | MultiplicativeExpr ( ("+" | "-") MultiplicativeExpr )* |
|
[51] | MultiplicativeExpr |
::= | UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )* |
|
[52] | UnionExpr |
::= | IntersectExceptExpr ( ("union" | "|") IntersectExceptExpr )* |
|
[53] | IntersectExceptExpr |
::= | InstanceofExpr ( ("intersect" | "except") InstanceofExpr )* |
|
[54] | InstanceofExpr |
::= | TreatExpr ( <"instance" "of"> SequenceType )? |
|
[55] | TreatExpr |
::= | CastableExpr ( <"treat" "as"> SequenceType )? |
|
[56] | CastableExpr |
::= | CastExpr ( <"castable" "as"> SingleType )? |
|
[57] | CastExpr |
::= | UnaryExpr ( <"cast" "as"> SingleType )? |
|
[58] | UnaryExpr |
::= | ("-" | "+")* ValueExpr |
|
[59] | ValueExpr |
::= | ValidateExpr | PathExpr | ExtensionExpr |
|
[60] | GeneralComp |
::= | "=" | "!=" | "<" | "<=" | ">" | ">=" |
/* gn: ltXQ */ |
[61] | ValueComp |
::= | "eq" | "ne" | "lt" | "le" | "gt" | "ge" |
|
[62] | NodeComp |
::= | "is" | "<<" | ">>" |
|
[63] | ValidateExpr |
::= | (<"validate" "{"> | (<"validate" ValidationMode> "{")) Expr "}" |
|
[64] | ExtensionExpr |
::= | Pragma+ "{" Expr? "}" |
|
[65] | Pragma |
::= | "(#" S? QName PragmaContents "#)" |
/* ws: explicitXQ */ |
[66] | PragmaContents |
::= | (Char* - (Char* '#)' Char*)) |
|
[67] | PathExpr |
::= | ("/" RelativePathExpr?) |
/* gn: leading-lone-slashXQ */ |
[68] | RelativePathExpr |
::= | StepExpr (("/" | "//") StepExpr)* |
|
[69] | StepExpr |
::= | AxisStep | FilterExpr |
|
[70] | AxisStep |
::= | (ForwardStep | ReverseStep) PredicateList |
|
[71] | ForwardStep |
::= | (ForwardAxis NodeTest) | AbbrevForwardStep |
|
[72] | ForwardAxis |
::= | <"child" "::"> |
|
[73] | AbbrevForwardStep |
::= | "@"? NodeTest |
|
[74] | ReverseStep |
::= | (ReverseAxis NodeTest) | AbbrevReverseStep |
|
[75] | ReverseAxis |
::= | <"parent" "::"> |
|
[76] | AbbrevReverseStep |
::= | ".." |
|
[77] | NodeTest |
::= | KindTest | NameTest |
|
[78] | NameTest |
::= | QName | Wildcard |
|
[79] | Wildcard |
::= | "*" |
/* ws: explicitXQ */ |
[80] | FilterExpr |
::= | PrimaryExpr PredicateList |
|
[81] | PredicateList |
::= | Predicate* |
|
[82] | Predicate |
::= | "[" Expr "]" |
|
[83] | PrimaryExpr |
::= | Literal | VarRef | ParenthesizedExpr | ContextItemExpr | FunctionCall | Constructor | OrderedExpr | UnorderedExpr |
|
[84] | Literal |
::= | NumericLiteral | StringLiteral |
|
[85] | NumericLiteral |
::= | IntegerLiteral | DecimalLiteral | DoubleLiteral |
|
[86] | VarRef |
::= | "$" VarName |
|
[87] | ParenthesizedExpr |
::= | "(" Expr? ")" |
|
[88] | ContextItemExpr |
::= | "." |
|
[89] | OrderedExpr |
::= | <"ordered" "{"> Expr "}" |
|
[90] | UnorderedExpr |
::= | <"unordered" "{"> Expr "}" |
|
[91] | FunctionCall |
::= | <QName "("> (ExprSingle ("," ExprSingle)*)? ")" |
/* gn: parensXQ */ |
/* gn: reserved-function-namesXQ */ | ||||
[92] | Constructor |
::= | DirectConstructor |
|
[93] | DirectConstructor |
::= | DirElemConstructor |
|
[94] | DirElemConstructor |
::= | "<" QName DirAttributeList ("/>" | (">" DirElemContent* "</" QName S? ">")) |
/* ws: explicitXQ */ |
/* gn: ltXQ */ | ||||
[95] | DirAttributeList |
::= | (S (QName S? "=" S? DirAttributeValue)?)* |
/* ws: explicitXQ */ |
[96] | DirAttributeValue |
::= | ('"' (EscapeQuot | QuotAttrValueContent)* '"') |
/* ws: explicitXQ */ |
[97] | QuotAttrValueContent |
::= | QuotAttrContentChar |
|
[98] | AposAttrValueContent |
::= | AposAttrContentChar |
|
[99] | DirElemContent |
::= | DirectConstructor |
|
[100] | CommonContent |
::= | PredefinedEntityRef | CharRef | "{{" | "}}" | EnclosedExpr |
|
[101] | DirCommentConstructor |
::= | "<!--" DirCommentContents "-->" |
/* ws: explicitXQ */ |
[102] | DirCommentContents |
::= | ((Char - '-') | <'-' (Char - '-')>)* |
/* ws: explicitXQ */ |
[103] | DirPIConstructor |
::= | "<?" PITarget (S DirPIContents)? "?>" |
/* ws: explicitXQ */ |
[104] | DirPIContents |
::= | (Char* - (Char* '?>' Char*)) |
/* ws: explicitXQ */ |
[105] | CDataSection |
::= | "<![CDATA[" CDataSectionContents "]]>" |
/* ws: explicitXQ */ |
[106] | CDataSectionContents |
::= | (Char* - (Char* ']]>' Char*)) |
/* ws: explicitXQ */ |
[107] | ComputedConstructor |
::= | CompDocConstructor |
|
[108] | CompDocConstructor |
::= | <"document" "{"> Expr "}" |
|
[109] | CompElemConstructor |
::= | (<"element" QName "{"> | (<"element" "{"> Expr "}" "{")) ContentExpr? "}" |
|
[110] | ContentExpr |
::= | Expr |
|
[111] | CompAttrConstructor |
::= | (<"attribute" QName "{"> | (<"attribute" "{"> Expr "}" "{")) Expr? "}" |
|
[112] | CompTextConstructor |
::= | <"text" "{"> Expr "}" |
|
[113] | CompCommentConstructor |
::= | <"comment" "{"> Expr "}" |
|
[114] | CompPIConstructor |
::= | (<"processing-instruction" NCName "{"> | (<"processing-instruction" "{"> Expr "}" "{")) Expr? "}" |
|
[115] | SingleType |
::= | AtomicType "?"? |
|
[116] | TypeDeclaration |
::= | "as" SequenceType |
|
[117] | SequenceType |
::= | (ItemType OccurrenceIndicator?) |
|
[118] | OccurrenceIndicator |
::= | "?" | "*" | "+" |
/* gn: occurrence-indicatorsXQ */ |
[119] | ItemType |
::= | AtomicType | KindTest | <"item" "(" ")"> |
|
[120] | AtomicType |
::= | QName |
|
[121] | KindTest |
::= | DocumentTest |
|
[122] | AnyKindTest |
::= | <"node" "("> ")" |
|
[123] | DocumentTest |
::= | <"document-node" "("> (ElementTest | SchemaElementTest)? ")" |
|
[124] | TextTest |
::= | <"text" "("> ")" |
|
[125] | CommentTest |
::= | <"comment" "("> ")" |
|
[126] | PITest |
::= | <"processing-instruction" "("> (NCName | StringLiteral)? ")" |
|
[127] | AttributeTest |
::= | <"attribute" "("> (AttribNameOrWildcard ("," TypeName)?)? ")" |
|
[128] | AttribNameOrWildcard |
::= | AttributeName | "*" |
|
[129] | SchemaAttributeTest |
::= | <"schema-attribute" "("> AttributeDeclaration ")" |
|
[130] | AttributeDeclaration |
::= | AttributeName |
|
[131] | ElementTest |
::= | <"element" "("> (ElementNameOrWildcard ("," TypeName "?"?)?)? ")" |
|
[132] | ElementNameOrWildcard |
::= | ElementName | "*" |
|
[133] | SchemaElementTest |
::= | <"schema-element" "("> ElementDeclaration ")" |
|
[134] | ElementDeclaration |
::= | ElementName |
|
[135] | AttributeName |
::= | QName |
|
[136] | ElementName |
::= | QName |
|
[137] | TypeName |
::= | QName |
[138] | IntegerLiteral |
::= | Digits |
|
[139] | DecimalLiteral |
::= | ("." Digits) | (Digits "." [0-9]*) |
/* ws: explicitXQ */ |
[140] | DoubleLiteral |
::= | (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits |
/* ws: explicitXQ */ |
[141] | URILiteral |
::= | StringLiteral |
|
[142] | StringLiteral |
::= | ('"' (PredefinedEntityRef | CharRef | ('"' '"') | [^"&])* '"') | ("'" (PredefinedEntityRef | CharRef | ("'" "'") | [^'&])* "'") |
/* ws: explicitXQ */ |
[143] | PITarget |
::= | [http://www.w3.org/TR/REC-xml#NT-PITarget]XML |
/* gn: xml-versionXQ */ |
[144] | VarName |
::= | QName |
|
[145] | ValidationMode |
::= | "lax" | "strict" |
|
[146] | Digits |
::= | [0-9]+ |
|
[147] | PredefinedEntityRef |
::= | "&" ("lt" | "gt" | "amp" | "quot" | "apos") ";" |
/* ws: explicitXQ */ |
[148] | CharRef |
::= | [http://www.w3.org/TR/REC-xml#NT-CharRef]XML |
/* gn: xml-versionXQ */ |
[149] | EscapeQuot |
::= | '""' |
|
[150] | EscapeApos |
::= | "''" |
|
[151] | ElementContentChar |
::= | Char - [{}<&] |
|
[152] | QuotAttrContentChar |
::= | Char - ["{}<&] |
|
[153] | AposAttrContentChar |
::= | Char - ['{}<&] |
|
[154] | Comment |
::= | "(:" (CommentContents | Comment)* ":)" |
/* ws: explicitXQ */ |
/* gn: commentsXQ */ | ||||
[155] | CommentContents |
::= | (Char+ - (Char* ':)' Char*)) |
|
[156] | QName |
::= | [http://www.w3.org/TR/REC-xml-names/#NT-QName]Names |
/* gn: xml-versionXQ */ |
[157] | NCName |
::= | [http://www.w3.org/TR/REC-xml-names/#NT-NCName]Names |
/* gn: xml-versionXQ */ |
[158] | S |
::= | [http://www.w3.org/TR/REC-xml#NT-S]XML |
/* gn: xml-versionXQ */ |
[159] | Char |
::= | [http://www.w3.org/TR/REC-xml#NT-Char]XML |
/* gn: xml-versionXQ */ |
Fuzzy token scanner. Using this strategy, a typical simple lexical scanner is implemented, but the meaning of the tokens are often left somewhat ambiguous. Thus, the word div
might be assigned the symbol QNameOrKeyword. The words can be disambiguated during the context sensitive parse process. This approach would probably be a hand-constructed, Top-down parser.
LALR Parser (Look Ahead Left to right, Rightmost derivation, hence LALR).
A lexical action is an action that occurs as an side-effect of a pattern recognition.
Lexical scanner, which assigns symbols known as tokens to units in the input, in essence recognizing the basic words and punctuation of the language. Typically, the lexer uses regular expressions as a specification for how to do this tokenization. The rules that govern the lexical structure of the language, sometimes known as microsyntax, should be simple, regular, and as context-free as possible.
A lexical state is a condition where a defined set of patterns are recognized.
LL Parser (Left to right, Leftmost derivation, hence LL).
Long tokens. Using this approach, declare namespace
would be considered a single token.
Parser, which arranges the tokens into a hierarchical structure. Typically, the parser uses a BNF or variant description (EBNF in the case of XPath/XQuery) as a specification for this parsing.
Scan-while-parse scanner. Using this strategy, the parser would evaluate the character stream into tokens, while at the same time branching to the respective hierarchical context. Thus the words can be discovered in context. Complex parsers are typically not built in this fashion, but, given the issue of context sensitivity, this may be a reasonable approach. This would probably be implemented as a hand-constructed recursive-descent parser.
Short tokens. Using this approach, declare namespace
would be considered two tokens, the exact symbols assigned based on lexical look-ahead (and look-behind).
State-driven scanner. This approach recognizes sets of words only in certain states, each word or punctuation possibly transitioning to a next state. In other words, it implements lexical recognition as transition tables for a push-down automata.
Two-pass scanner. This is the same as the Fuzzy token scanner, but the ambiguous words can be disambiguated by a second scan, using some rules (probably expressed as a context-free grammar) that reflect the basic structure of the EBNF (call this the three-pass approach). The three-pass approach could conceivably be implemented via a parser constructor program, with the addition of a custom disambiguator in between the lexer and parser. This method could be used for either Top-down or Bottom-up techniques.
XQuery does not include the ForExpr production and <"namespace" "::"> axis.