The presentation of this document has been augmented to identify changes from a previous version. Three kinds of changes are highlighted: new, added text, changed text, and deleted text.
This document is also available in these non-normative formats: XML and Changes since previous Recommendation.
Copyright © 2014 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
This document defines the syntax and formal semantics of XQuery and XPath Full Text 3.1, which is a language that extends XQuery 3.1 [XQuery 3.1: An XML Query Language] and XPath 3.1 [XML Path Language (XPath) 3.1] with full-text search capabilities.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is governed by the 14 October 2005 W3C Process Document.
This is a Last Call Working Draft as described in the Process Document. It was jointly developed by the W3C XML Query Working Group and the W3C XSLT Working Group, each of which is part of the XML Activity. Comments on this document will be formally accepted at least through TO BE SPECIFIED. The Working Groups expect to advance this specification to Recommendation Status.
This version of Full Text has to be described in a customized paragraph before publication as a FPWD. The purpose of this First Public Working Draft is to align the grammar of XQuery and XPath Full Text 3.1 with the grammars of [XQuery 3.1: An XML Query Language] and [XML Path Language (XPath) 3.1].
No implementation report currently exists. However, a Test Suite for this document is under development. Implementors are encouraged to run this test suite and report their results. The Test Suite can be found at http://dev.w3.org/cvsweb/2011/xpath-full-text-31-test-suite/.
No substantive changes have been made to this specification since its previous publication as a Working Draft.
Please report errors in this document using W3C's public Bugzilla system (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery public comments mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string “[FT31]” in the subject line of your report, whether made in Bugzilla or in email. Please use multiple Bugzilla entries (or, if necessary, multiple email messages) if you have more than one comment to make. Archives of the comments and responses are available at http://lists.w3.org/Archives/Public/public-qt-comments/.
Publication as a Last Call Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by groups operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the XML Query Working Group and also maintains a public list of any patent disclosures made in connection with the deliverables of the XSL Working Group; those pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
1 Introduction
1.1 Full-Text Search and XML
1.2 Organization of this document
1.3 A word about namespaces
2 Full-Text Extensions to XQuery and XPath
2.1 Processing Model
2.2 Full-Text Contains Expression
2.2.1 Description
2.2.2 Examples
2.3 Score Variables
2.3.1 Using Weights Within a Scored FTContainsExpr
2.4 Highlight Expression
2.5 Extensions to the Static Context
3 Full-Text Selections
3.1 Primary Full-Text Selections
3.1.1 Weights
3.2 Search Tokens and Phrases
3.3 Cardinality Selection
3.4 Match Options
3.4.1 Language Option
3.4.2 Wildcard Option
3.4.3 Thesaurus Option
3.4.4 Stemming Option
3.4.5 Case Option
3.4.6 Diacritics Option
3.4.7 Stop Word Option
3.4.8 Extension Option
3.5 Logical Full-Text Operators
3.5.1 Or-Selection
3.5.2 And-Selection
3.5.3 Mild-Not Selection
3.5.4 Not-Selection
3.6 Positional Filters
3.6.1 Ordered Selection
3.6.2 Window Selection
3.6.3 Distance Selection
3.6.4 Scope Selection
3.6.5 Anchoring Selection
3.7 Ignore Option
3.8 Extension Selections
4 Semantics
4.1 Tokenization
4.1.1 Examples
4.1.2 Representations of Tokenized Text and Matching
4.2 Evaluation of FTSelections
4.2.1 AllMatches
4.2.1.1 Formal Model
4.2.1.2 Examples
4.2.1.3 XML representation
4.2.2 XML Representation
4.2.3 The evaluate function
4.2.4 FTWords
4.2.5 Match Options Semantics
4.2.5.1 Types
4.2.5.2 High-Level Semantics
4.2.5.3 Formal Semantics Functions
4.2.5.4 FTCaseOption
4.2.5.5 FTDiacriticsOption
4.2.5.6 FTStemOption
4.2.5.7 FTThesaurusOption
4.2.5.8 FTStopWordOption
4.2.5.9 FTLanguageOption
4.2.5.10 FTWildCardOption
4.2.6 Full-Text Operators Semantics
4.2.6.1 FTOr
4.2.6.2 FTAnd
4.2.6.3 FTUnaryNot
4.2.6.4 FTMildNot
4.2.6.5 FTOrder
4.2.6.6 FTScope
4.2.6.7 FTContent
4.2.6.8 FTWindow
4.2.6.9 FTDistance
4.2.6.10 FTTimes
4.3 FTContainsExpr
4.4 Scoring
4.5 Example
5 Conformance
5.1 Minimal Conformance
5.2 Optional Features
5.2.1 FTMildNot Operator
5.2.2 FTUnaryNot Operator
5.2.3 FTUnit and FTBigUnit
5.2.4 FTOrder Operator
5.2.5 FTScope Operator
5.2.6 FTWindow Operator
5.2.7 FTDistance Operator
5.2.8 FTTimes Operator
5.2.9 FTContent Operator
5.2.10 FTCaseOption
5.2.11 FTStopWordOption
5.2.12 FTLanguageOption
5.2.13 FTIgnoreOption
5.2.14 Scoring
5.2.15 Weights
A EBNF for XQuery 3.1 Grammar with Full Text extensions
A.1 Terminal Symbols
B EBNF for XPath 3.1 Grammar with Full-Text extensions
B.1 Terminal Symbols
C Static Context Components
D Error Conditions
E XML Syntax (XQueryX) for XQuery and XPath Full Text 3.1
E.1 XQueryX representation of XQuery and XPath Full Text 3.1
E.2 XQueryX stylesheet for XQuery and XPath Full Text 3.1
E.3 XQueryX for XQuery and XPath Full Text 3.1 example
E.3.1 Example
E.3.1.1 XQuery solution in XQuery and XPath Full Text 3.1 Use Cases:
E.3.1.2 A Solution in Full Text XQueryX:
E.3.1.3 Transformation of Full Text XQueryX Solution into XQuery Full Text
F References
F.1 Normative References
F.2 Non-normative References
G Acknowledgements (Non-Normative)
H Glossary (Non-Normative)
I Checklist of Implementation-Defined Features (Non-Normative)
J Change Log (Non-Normative)
This document defines the language and the formal semantics of XQuery and XPath Full Text 3.1. This language is designed to meet the requirements identified in, and to support the queries in, W3C XQuery and XPath Full Text Requirements and Use Cases [XQuery and XPath Full Text 3.1 Requirements and Use Cases].
In this document, examples and material labeled as "Note" are provided for explanatory purposes and are not normative.
XQuery and XPath Full Text 3.1 extends the syntax and semantics of XQuery 3.1 and XPath 3.1.
Additionally, this document defines an XML syntax for XQuery and XPath Full Text 3.1. The most recent versions of the two XQueryX XML Schemas and the XQueryX XSLT stylesheet for XQuery and XPath Full Text 3.1 are available at http://www.w3.org/2014/09/xpath-full-text/xpath-full-text-31-xqueryx.xsd, http://www.w3.org/2014/09/xpath-full-text/xpath-full-text-31-xqueryx-ftmatchoption-extensions.xsd, and http://www.w3.org/2014/09/xpath-full-text/xpath-full-text-31-xqueryx.xsl, respectively.
As XML becomes mainstream, users expect to be able to search their XML documents. This requires a standard way to do full-text search, as well as structured searches, against XML documents. A similar requirement for full-text search led ISO to define the SQL/MM-FT [SQL/MM] standard. SQL/MM-FT defines extensions to SQL to express full-text searches providing functionality similar to that defined in this full-text language extension to XQuery 3.1 and XPath 3.1.
XML documents may contain highly structured data (fixed schemas, known types such as numbers, dates), semi-structured data (flexible schemas and types), markup data (text with embedded tags), and unstructured data (untagged free-flowing text). Where a document contains unstructured or semi-structured data, it is important to be able to search using Information Retrieval techniques such as scoring and weighting.
Full-text search is different from substring search in many ways:
A full-text search searches for tokens and phrases rather than substrings. A substring search for news items that contain the string "lease" will return a news item that contains "Foobar Corporation releases version 20.9 ...". A full-text search for the token "lease" will not.
There is an expectation that a full-text search will support language-based searches which substring search cannot. An example of a language-based search is "find me all the news items that contain a token with the same linguistic stem as 'mouse'" (finds "mouse" and "mice"). Another example based on token proximity is "find me all the news items that contain the tokens 'XML' and 'Query' allowing up to 3 intervening tokens".
Full-text search must address the vagaries and nuances of language. Search results are often of varying usefulness. When you search a web site for cameras that cost less than $100, this is an exact search. There is a set of cameras that matches this search, and a set that does not. Similarly, when you do a string search across news items for "mouse", there is only 1 expected result set. When you do a full-text search for all the news items that contain the token "mouse", you probably expect to find news items containing the token "mice", and possibly "rodents", or possibly "computers". Not all results are equal. Some results are more "mousey" than others. Because full-text search may be inexact, we have the notion of score or relevance. We generally expect to see the most relevant results at the top of the results list.
Note:
As XQuery and XPath evolve, they may apply the notion of score to querying structured data. For example, when making travel plans or shopping for cameras, it is sometimes useful to get an ordered list of near matches in addition to exact matches. If XQuery and XPath define a generalized inexact match, we expect XQuery and XPath to utilize the scoring framework provided by XQuery and XPath Full Text 3.1.
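The contrast between substring search and token-based search drawn above can be illustrated with a small Python sketch. It is not part of this specification: the `tokenize` function is a hypothetical sample tokenizer (lower-cased runs of letters and digits), whereas real tokenization is implementation-defined.

```python
import re


def tokenize(text):
    # Hypothetical sample tokenizer (an assumption, not the spec's):
    # lower-cased runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def substring_search(text, needle):
    # Plain substring search: matches anywhere inside the character string.
    return needle.lower() in text.lower()


def token_search(text, token):
    # Token search: the query token must equal a whole token of the text.
    return token.lower() in tokenize(text)


news = "Foobar Corporation releases version 20.9"
print(substring_search(news, "lease"))  # True: "releases" contains "lease"
print(token_search(news, "lease"))      # False: no token equals "lease"
```

Under this sample tokenization, the substring search finds "lease" inside "releases", while the token search does not, which mirrors the news-item example.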
[Definition: Full-text queries are performed on tokens and phrases. Tokens and phrases are produced via tokenization.] Informally, tokenization breaks a character string into a sequence of tokens, units of punctuation, and spaces.
Tokenization, in general terms, is the process of converting a text string into smaller units that are used in query processing. Those units, called tokens, are the most basic text units that a full-text search can refer to. Full-text operators typically work on sequences of tokens found in the target text of a search. These tokens are characterized by integers that capture the relative position(s) of the token inside the string, the relative position(s) of the sentence containing the token, and the relative position(s) of the paragraph containing the token. The positions typically comprise a start and an end position.
Tokenization, including the definition of the term "tokens", SHOULD be implementation-defined. Implementations SHOULD expose the rules and sample results of tokenization as much as possible to enable users to predict and interpret the results of tokenization. Tokenization operates on the string value of an item; for element nodes this does not include the content of attribute nodes, but for attribute nodes it does. Tokenization is defined more formally in 4.1 Tokenization.
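As a rough illustration of the positional information described above, the following Python sketch tokenizes a string and records, for each token, its relative position and the relative positions of its sentence and paragraph. Everything here is an assumption made for illustration: the `TokenInfo` field names, the use of a single integer rather than a start and end position, and the regular expressions that decide where tokens, sentences, and paragraphs end. An actual tokenizer is free to behave differently.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenInfo:
    token: str       # the token text, lower-cased by this sample tokenizer
    pos: int         # relative position of the token in the string
    sentence: int    # relative position of the containing sentence
    paragraph: int   # relative position of the containing paragraph


def tokenize(text):
    infos = []
    pos = 0
    sentence = 0
    # Assumption: paragraphs are separated by blank lines.
    for para_no, para in enumerate(re.split(r"\n\s*\n", text), start=1):
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        for sent in re.split(r"(?<=[.!?])\s+", para.strip()):
            words = re.findall(r"\w+", sent)
            if not words:
                continue
            sentence += 1
            for word in words:
                pos += 1
                infos.append(TokenInfo(word.lower(), pos, sentence, para_no))
    return infos


sample = "XML is everywhere. Users search it.\n\nFull text helps."
for info in tokenize(sample):
    print(info)
```

With this sample input, "users" is the fourth token, lies in the second sentence, and belongs to the first paragraph.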
[Definition: A token is a non-empty sequence of characters returned by a tokenizer as a basic unit to be searched. Beyond that, tokens are implementation-defined.] [Definition: A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.]
Note:
Consecutive tokens need not be separated by either punctuation or space, and tokens may overlap.
Note:
In some natural languages, tokens and words can be used interchangeably.
[Definition: A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. A tokenizer is not required to support sentences.]
[Definition: A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. A tokenizer is not required to support paragraphs.]
Some XML elements represent semantic markup, e.g., <title>. Others represent formatting markup, e.g., <b> to indicate bold. Semantic markup serves well as token boundaries. Some formatting markup serves well as token boundaries; for example, paragraphs are most commonly delimited by formatting markup. Other formatting markup may not serve well as token boundaries. Implementations are free to provide implementation-defined ways to differentiate between the markup's effect on token boundaries during tokenization. In the absence of an implementation-defined way to differentiate, element markup (start tags, end tags, and empty-element tags) creates token boundaries.
A sample tokenization is used for the examples in this document. The results might be different for other tokenizations.
Tokenization enables functions and operators that operate on a part or the root of the token (e.g., wildcards, stemming).
Tokenization enables functions and operators which work with the relative positions of tokens (e.g., proximity operators).
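The role of relative token positions in proximity operators can be sketched in Python as follows. The sketch is illustrative only and assumes a simple word-based sample tokenizer; the helper names `positions` and `within_distance` are hypothetical and do not correspond to any function defined in this document.

```python
import re


def positions(text, token):
    # Relative positions (0-based) of a token under a sample tokenizer.
    tokens = re.findall(r"\w+", text.lower())
    return [i for i, t in enumerate(tokens) if t == token.lower()]


def within_distance(text, first, second, max_intervening):
    # True if some occurrence of `first` and some occurrence of `second`
    # have at most `max_intervening` tokens between them, in either order.
    return any(
        i != j and abs(i - j) - 1 <= max_intervening
        for i in positions(text, first)
        for j in positions(text, second)
    )


item = "XML and the new Query language"
print(within_distance(item, "XML", "Query", 3))  # True: three tokens intervene
print(within_distance(item, "XML", "Query", 2))  # False
```

This corresponds to the earlier informal request for items containing "XML" and "Query" with up to 3 intervening tokens.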
This specification focuses on functionality that serves all languages. It also selectively includes functionalities useful within specific families of languages. For example, searching within sentences and paragraphs is useful to many western languages and to some non-western languages, so that functionality is incorporated into this specification.
Certain aspects of language processing are described in this specification as implementation-defined or implementation-dependent.
[Definition: Implementation-defined indicates an aspect that may differ between implementations, but must be specified by the implementor for each particular implementation.]
[Definition: Implementation-dependent indicates an aspect that may differ between implementations, is not specified by this or any W3C specification, and is not required to be specified by the implementor for any particular implementation.]
This document is organized as follows. We first present a high level syntax for the XQuery and XPath Full Text 3.1 language along with some examples. Then, we present the syntax and examples of the basic primitives in the XQuery and XPath Full Text 3.1 language. This is followed by the semantics of the XQuery and XPath Full Text 3.1 language. The appendix contains a section that provides an EBNF for the XPath 3.1 Grammar with Full-Text Extensions, an EBNF for XQuery 3.1 Grammar with Full-Text Extensions, acknowledgements and a glossary.
Certain namespace prefixes are predeclared by XQuery 3.1 and, by implication, by this specification, and bound to fixed namespace URIs. These namespace prefixes are as follows:
xml = http://www.w3.org/XML/1998/namespace
xs = http://www.w3.org/2001/XMLSchema
xsi = http://www.w3.org/2001/XMLSchema-instance
fn = http://www.w3.org/2005/xpath-functions
local = http://www.w3.org/2005/xquery-local-functions
In addition to the prefixes in the above list, this document uses the prefix err to represent the namespace URI http://www.w3.org/2005/xqt-errors. This namespace prefix is not predeclared and its use in this document is not normative.
Error codes that are not defined in this document are defined in other XQuery 3.1 and XPath 3.1 specifications, particularly [XML Path Language (XPath) 3.1] and [XQuery and XPath Functions and Operators 3.1].
Finally, this document uses the prefix fts to represent a namespace containing a number of functions used in this document to describe the semantics of XQuery and XPath Full Text functions. There is no requirement that these functions be implemented; therefore, no URI is associated with that prefix.
XQuery and XPath Full Text 3.1 extends the languages of XQuery 3.1 and XPath 3.1 in three ways. It:
Adds a new expression called FTContainsExpr;
Enhances the syntax of FLWOR expressions in XQuery 3.1 and for expressions in XPath 3.1 with optional score variables; and
Adds static context declarations for full-text match options to the query prolog.
Additionally, it extends the data model and processing models in various ways.
A full-text contains expression (2.2 Full-Text Contains Expression) is composed of several parts:
An XPath 3.1 or XQuery 3.1 expression (StringConcatExpr) that specifies the sequence of items to be searched. [Definition: Those items are called the search context.]
The full-text selection to be applied (3 Full-Text Selections). Full-text selections are, syntactically and semantically, fully composable and contain:
Required:
Tokens and phrases for which a search is performed (3.2 Search Tokens and Phrases).
Optional:
Match options, such as indicators for case sensitivity and stop words (3.4 Match Options);
Boolean full-text operators, that compose a full-text selection from simpler full-text selections (3.5 Logical Full-Text Operators);
Other full-text operators that are constraints on the positions of matches, such as indicators for distance between tokens and for the cardinality of matches (3.6 Positional Filters and 3.3 Cardinality Selection); and
The weighting information. Each individual search term in a full-text selection may be annotated with optional weight information. This information may be used during the evaluation of the full-text selections to calculate scoring, information that quantifies the relevance of the result to the given search criteria.
An optional XPath 3.1 or XQuery 3.1 expression (UnionExpr) that specifies the set of nodes, descendants of the StringConcatExpr, whose contents must be ignored for the purpose of determining a match during the search (3.7 Ignore Option).
The results of the evaluation of the full-text selection operators are instances of the AllMatches model, which complements the XQuery Data Model (XDM) for processing full-text queries. An AllMatches instance describes all possible solutions to the full-text query for a given search context item. Each solution is described by a Match instance. A Match instance contains the tokens from the search context item that must be included (described using StringInclude instances, which model the positive terms) and the tokens from the search context item that must be excluded (described using StringExclude instances, which model the negative terms). Each negative or positive term is modeled as a tuple: the position of the query token or phrase in the full-text selection, and a TokenInfo structure that describes a set of tokens in the text string which match the query token or phrase.
Figure 1 provides a schematic overview of the XQuery and XPath Full Text 3.1 processing steps that are discussed in detail below. Some of these steps are completely outside the domain of XQuery; in Figure 1, these are depicted outside the black line that represents the boundaries of the language. The diagram shows only the central pieces of the XQuery Processing Model (see Section 2.2 Processing Model of the XQuery specification), but zooms in on the Execution Engine, where the processing of the full-text extensions takes place. The full-text processing steps are labeled as FTn within the diagram and are referenced within the text.
Like all XQuery expressions, an FTContainsExpr returns an XDM Instance (see Fig. 1). With the exception of FTWords, which consumes TokenInfos, all full-text selections are closed under the AllMatches data model, i.e., their input and output are AllMatches instances. Tokenization transforms an XDM instance into TokenInfos, which ultimately get converted into AllMatches instances by the evaluation of full-text selections. Thus, the evaluation of nested full-text and XQuery expressions moves back and forth between these two models.
The resulting AllMatches instance obtained by the evaluation of an FTContainsExpr is converted into a Boolean value before being returned to the enclosing XPath or XQuery operation as follows. If at least one member of the disjunction contains only positive terms, then the value returned is true. If all members of the disjunction contain negative terms, the result is false.
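The conversion of an AllMatches instance into a Boolean value can be sketched in Python as follows. This is a simplified, non-normative model: the `Match` class and its `includes` and `excludes` fields are hypothetical stand-ins for Match instances with their StringInclude and StringExclude entries, and an AllMatches instance is modeled as a plain list of matches (the disjunction).

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    # Positive terms (stand-ins for StringInclude entries).
    includes: list = field(default_factory=list)
    # Negative terms (stand-ins for StringExclude entries).
    excludes: list = field(default_factory=list)


def all_matches_to_boolean(all_matches):
    # True if at least one Match has no negative terms; False otherwise,
    # including the case where there are no matches at all.
    return any(not match.excludes for match in all_matches)


positive_only = Match(includes=["dog", "cat"])
with_negative = Match(includes=["dog"], excludes=["train"])

print(all_matches_to_boolean([with_negative, positive_only]))  # True
print(all_matches_to_boolean([with_negative]))                 # False
print(all_matches_to_boolean([]))                              # False
```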
Weighting information, in an implementation-dependent fashion, may be used when calculating the scoring information computed and made available by FTContainsExpr to the optional score construct.
Given the components of a given full-text contains expression, the evaluation algorithm will proceed according to the following steps, also referenced in the processing model diagram as steps FTn (see Fig. 1):
Evaluate the search context expression (resulting in the sequence of search context items), the ignore option, if any (resulting in the set of ignored nodes), and any other XQuery/XPath expressions nested within the full-text contains expression. (FT1)
Tokenize the query string(s). (FT2.1)
For each search context item:
Delete the ignored nodes from the search context item.
Tokenize the result of the previous step. This produces a sequence of tokens. (FT2.2) Note that implementations may (as an optimization) perform tokenization as part of the External Processing that is described in the XQuery Processing Model, when an XML document is parsed into an Infoset/PSVI and ultimately into an XQuery Data Model instance.
Evaluate the FTSelection against the tokens of the search context. (FT3, FT4)
Convert the topmost AllMatches instances into a Boolean value. (FT5)
The additional scoring information (also part of FT5) that is produced by the evaluation of the full-text contains expression is implementation-dependent and is not specified in this document. The scoring information is made available at the same time the Boolean value is returned.
(A more detailed version of the above procedure appears in Section 4.3 FTContainsExpr.)
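The steps above can be sketched, in a heavily simplified form, with Python and its standard `xml.etree.ElementTree` module. The sketch makes several assumptions that are not part of this specification: the search context items are already-evaluated elements, ignored nodes are identified by tag name, the full-text selection is restricted to a conjunction (ftand) of tokens and phrases, the tokenizer is a simple word-based sample, and no scoring is computed. All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def tokenize(text):
    # Sample tokenizer (assumption): lower-cased runs of word characters.
    return re.findall(r"\w+", text.lower())


def string_value_without(elem, ignored_tags):
    # String value of an element with the content of ignored descendants
    # removed. Parts are joined with spaces so that element markup creates
    # token boundaries.
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in ignored_tags:
            parts.append(string_value_without(child, ignored_tags))
        parts.append(child.tail or "")
    return " ".join(parts)


def has_phrase(tokens, phrase):
    # True if `phrase` occurs as a contiguous run of tokens.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def contains_text(items, phrases, ignored_tags=()):
    query = [tokenize(p) for p in phrases]                   # FT2.1
    for item in items:                                       # items from FT1
        text = string_value_without(item, set(ignored_tags)) # drop ignored nodes
        tokens = tokenize(text)                              # FT2.2
        if all(has_phrase(tokens, q) for q in query):        # FT3, FT4 (ftand only)
            return True                                      # FT5
    return False


book = ET.fromstring(
    "<book><title>Web Site Usability</title><note>testing</note></book>"
)
print(contains_text([book], ["web site", "usability"]))
print(contains_text([book], ["testing"], ignored_tags=["note"]))
```

In this sketch the first call finds both phrases in the title, while the second call finds nothing because the only occurrence of "testing" lies inside an ignored element.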
Section 3 Full-Text Selections describes the syntax and the informal semantics of full-text operators. Their formal semantics as well as the formal definition of the AllMatches data model are given in Section 4 Semantics.
[Definition: A full-text contains expression is an expression that evaluates a sequence of items against a full-text selection.]
As a syntactic construct, a full-text contains expression (grammar symbol: FTContainsExpr) behaves like a comparison expression (see Section 3.5.2 General Comparisons of the XQuery specification). This grammar rule introduces FTContainsExpr.
[87] ComparisonExpr ::= FTContainsExpr ( (ValueComp | GeneralComp | NodeComp) FTContainsExpr )?
A full-text contains expression may be used anywhere a ComparisonExpr may be used. The contains text operator has higher precedence than other comparison operators, so the results of contains text expressions may be compared without enclosing them in parentheses.
[88] FTContainsExpr ::= StringConcatExpr ( "contains" "text" FTSelection FTIgnoreOption? )?
A full-text contains expression returns a Boolean value. It returns true if there is some item returned by the StringConcatExpr that, after tokenization, matches the full-text selection FTSelection. Since tokenization includes tokens derived only from the string values of items, a full-text contains expression searches the text of element nodes and of their descendant elements. The string value of other kinds of nodes, such as attributes and comments, will not be included unless the attribute or comment node itself is the target (StringConcatExpr) of the full-text contains expression. See Section 3 Full-Text Selections for more details. For the purpose of determining a match, certain descendants of nodes (identified by FTIgnoreOption) in the StringConcatExpr may be ignored, as specified in Section 3.7 Ignore Option.
An XQuery and XPath Full Text 3.1 processor SHOULD try to use the information available in xml:lang for processing of collations, as well as the various match options defined in Section 3.4 Match Options.
The following example in XQuery 3.1 Full Text returns the author of each book with a title containing a token with the same root as "dog" and the token "cat".
for $b in /books/book where $b/title contains text ("dog" using stemming) ftand "cat" return $b/author
The same example in XPath 3.1 Full Text is written as:
/books/book[title contains text ("dog" using stemming) ftand "cat"]/author
In the next example a ComparisonExpr is combined with an FTContainsExpr using the logical XQuery operator "and". The query selects books that have a price of less than 50 and a title which contains a token with the same root as "train":
/books/book[price < 50 and title contains text ("train" using stemming)]
The following example shows the combination of two contains text expressions, the results of which are compared using the not-equals operator. The query selects books where either the title contains the token "dog" and the token "cat" and the content does not contain a token with the same root as "train", or where the title fails to have one of the matching tokens but the content does:
/books/book[title contains text "dog" ftand "cat" ne content contains text ("train" using stemming)]
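The selection logic of the preceding query can be spelled out with a short Python truth table. This is only an illustration of how comparing two Boolean results with not-equals behaves; the function and parameter names are hypothetical, and the two Boolean inputs stand for the results of the two contains text expressions.

```python
def ne(left: bool, right: bool) -> bool:
    # For two single Boolean values, not-equals is true exactly when they differ.
    return left != right


def book_selected(title_has_dog_and_cat: bool, content_has_train_stem: bool) -> bool:
    # Models: title contains text ... ne content contains text ...
    return ne(title_has_dog_and_cat, content_has_train_stem)


for title in (True, False):
    for content in (True, False):
        print(title, content, book_selected(title, content))
```

A book is selected only when exactly one of the two full-text conditions holds.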
Besides specifying a match of a full-text query as a Boolean condition, full-text query applications typically also have the ability to associate scores with the results. [Definition: The score of a full-text query result expresses its relevance to the search conditions.]
XQuery and XPath Full Text 3.1 extends the languages of XQuery 3.1 and XPath 3.1 further by adding optional score variables to the for and let clauses of FLWOR expressions.
The production for the extended for clause in XQuery 3.1 follows.
[45] ForClause ::= "for" ForBinding ("," ForBinding)*
[46] ForBinding ::= "$" VarName TypeDeclaration? AllowingEmpty? PositionalVar? FTScoreVar? "in" ExprSingle
[49] FTScoreVar ::= "score" "$" VarName
In XPath 3.1, the SimpleForClause is extended similarly.
When a score variable is present in a for clause, the evaluation of the expression following the in keyword not only needs to determine the result sequence of the expression, i.e., the sequence of items which are iteratively bound to the for variable. It must also determine in each iteration the relevance "score" value of the current item and bind the score variable to that value.
The scope of a score variable bound in a for or let clause comprises all subexpressions of the containing FLWOR expression that appear after the variable binding. The scope does not include the expression to which the variable is bound. The for and let clauses of a given FLWOR expression may bind the same score variable name more than once. In this case, each new binding occludes the previous one, which becomes inaccessible in the remainder of the FLWOR expression.
The expanded QName of a score variable bound in a for clause must be distinct from both the expanded QName of the variable with which it is associated and the expanded QName of any positional variable with which it is associated [err:XQST0089] (XQuery 3.0).
The semantics of scoring and how it relates to second-order functions is discussed in Section 4.4 Scoring.
In the following example book elements are determined that satisfy the condition [content contains text "web site" ftand "usability" and .//chapter/title contains text "testing"]. The scores assigned to the book elements are returned.
for $b score $s in /books/book[content contains text "web site" ftand "usability" and .//chapter/title contains text "testing"] return $s
The example above is also a valid example of the XPath 3.1 extension.
Scores are typically used to order results, as in the following, more complete example.
for $b score $s in /books/book[content contains text "web site" ftand "usability"]
where $s > 0.5
order by $s descending
return <result>
         <title> {$b//title} </title>
         <score> {$s} </score>
       </result>
Note that the score variable gets one score value for each item in the value of the expression after the in keyword, regardless of the number of FTContainsExprs in that expression.
In the following example, two separate full-text contains expressions are used to select the matching paragraphs. There is still just one score for each para returned. The highest scoring paragraphs will be returned first:
for $p score $s in //book[title contains text "software"]/para[. contains text "usability"] order by $s descending return $p
The following more elaborate example uses multiple score variables to return the matching paragraphs ordered so that those from the highest scoring books precede those from the lowest scoring books, where the highest scoring paragraphs of each book are returned before the lower scoring paragraphs of that book:
for $b score $score1 in //book[title contains text "software"]
order by $score1 descending
return
  for $p score $score2 in $b/para[. contains text "usability"]
  order by $score2 descending
  return $p
The score variable is bound to a value which reflects the relevance of the match criteria in the full-text selections to the items returned by the respective StringConcatExprs. The calculation of relevance is implementation-dependent, but score evaluation must follow these rules:
Score values are of type xs:double in the range [0, 1].
For score values greater than 0, a higher score must imply a higher degree of relevance.
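The two rules above can be illustrated with a toy scorer in Python. The scoring formula is purely an assumption made for this sketch (the fraction of query tokens found in the text); the actual calculation of relevance is implementation-dependent, and the function names are hypothetical.

```python
import re


def tokenize(text):
    # Sample tokenizer (assumption): lower-cased runs of word characters.
    return re.findall(r"\w+", text.lower())


def toy_score(text, query_tokens):
    # Toy relevance: fraction of query tokens present in the text.
    # The result is a float in the range [0, 1].
    if not query_tokens:
        return 0.0
    present = set(tokenize(text))
    hits = sum(1 for q in query_tokens if q.lower() in present)
    return hits / len(query_tokens)


paragraphs = [
    "Cooking for beginners",
    "Usability of software",
    "Software usability testing in practice",
]
query = ["software", "usability", "testing"]

# Comparable to ordering by a score variable in descending order.
ranked = sorted(paragraphs, key=lambda p: toy_score(p, query), reverse=True)
for p in ranked:
    print(round(toy_score(p, query), 2), p)
```

In this toy model a paragraph that contains more of the query tokens is treated as more relevant and therefore receives a higher score.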
Similarly to their use in a for clause, score variables may be specified in a let clause. A score variable in a let clause is also bound to the score of the expression evaluation, but in the let clause one score is determined for the complete result.
The production for the extended let clause follows.
[50] LetClause ::= "let" LetBinding ("," LetBinding)*
[51] LetBinding ::= (("$" VarName TypeDeclaration?) | FTScoreVar) ":=" ExprSingle
When using the score option in a for clause, the expression following the in keyword has the dual purpose of filtering, i.e., driving the iteration, and determining the scores. It is possible to separately specify expressions for filtering and scoring by combining a simple for clause with a let clause that uses scoring. The following is an example of this.
for $b in /books/book[.//chapter/title contains text "testing"]
let score $s := $b/content contains text "web site" ftand "usability"
order by $s descending
return <result score="{$s}">{$b}</result>
This example returns book elements with chapter titles that contain "testing". Along with the book elements, scores are returned. These scores, however, reflect whether the book content contains "web site" and "usability".
Note that it is not a requirement of the score of an FTContainsExpr to be 0 if the expression evaluates to false, nor to be non-zero if the expression evaluates to true. Hence, in the example above it is not possible to infer the Boolean value of the FTContainsExpr in the let clause from the calculated score of a returned result element. For instance, an implementation may want to assign a non-zero score to a book that contained "web site", but not "usability", as this may be considered more relevant than a book that does not contain "web site" or "usability".
The expression ExprSingle associated with the score variable is passed to the scoring algorithm. The scoring algorithm calculates the score value based on the passed expression (not on the value returned by evaluating the expression). The set of expressions supported by the scoring algorithm is implementation-defined. If an expression not supported by the scoring algorithm is passed to the scoring algorithm, the result is implementation-defined.
The use of score variables introduces a second-order aspect to the evaluation of expressions which cannot be emulated by (first-order) XQuery functions. Consider the following replacement of the clause let score $s := FTContainsExpr:
let $s := score(FTContainsExpr)
where a function score is applied to some FTContainsExpr. If the function score were first-order, it would only be applied to the result of the evaluation of its argument, which is one of the Boolean constants true or false. Hence, there would be at most two possible values such a score function would be able to return and no further differentiation would be possible.
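The difference can be illustrated outside XQuery. The following Python sketch is not part of this specification; the function names and the toy "fraction of terms present" formula are assumptions chosen only to show why a scorer that sees the expression can distinguish more cases than one that sees only its Boolean result.

```python
def first_order_score(result):
    # A first-order function only sees the Boolean result,
    # so it can return at most two distinct values.
    return 1.0 if result else 0.0


def second_order_score(tokens, terms):
    # A second-order scorer receives the expression itself (reduced here
    # to its list of search terms) and can grade partial matches.
    # Toy formula (an assumption): fraction of the terms that are present.
    lowered = {t.lower() for t in tokens}
    present = sum(1 for term in terms if term.lower() in lowered)
    return present / len(terms) if terms else 0.0


text = "the web design of this book".split()
terms = ["web", "usability"]
boolean_result = all(t in text for t in terms)  # models ftand over both terms
print(first_order_score(boolean_result))        # 0.0
print(second_order_score(text, terms))          # 0.5
```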
[Definition: Scoring may be influenced by adding weight declarations to search tokens, phrases, and expressions.] Weight declarations are introduced syntactically in the FTPrimaryWithOptions production, described in Section 3.1.1 Weights.
The weights assigned are not related to any absolute standard, but typically have a relationship to other weights within the same FTContains expression.
The effect of weights on the resulting score is implementation-dependent. However, scoring algorithms MUST conform to the constraint that when no explicit weight is specified, the default weight is 1.0.
The following example illustrates how different weights can be used for different search terms.
for $b in /books/book
let score $s := $b/content contains text ("web site" weight {0.5}) ftand ("usability" weight {2})
return <result score="{$s}">{$b}</result>
Full-text search applications typically need to display a content fragment with the search terms highlighted in the search results. This allows the user to peek into the content of the document to quickly judge the relevance of the result. Applications may also need to display the entire document with all the search terms highlighted, and permit the user to browse all the highlighted terms forwards and backwards.
XQuery and XPath Full Text 3.1 supports highlighting by introducing the FtHighlightsExpr.
# place holder FtHighlightsExpr ::= StringConcatExpr "highlights" "text" FTSelection FTIgnoreOption?
A full text highlight expression MUST return a highlight variable with the following structure.
map { "offsets" : map { 4 : 3, 9 : 4 }, "summary" : ("text", "term", "text", "term") }
"offsets" is a map of the character offsets of the highlighted terms. The character offset is the offset in the string value returned by the StringConcatExpr after 3.7 Ignore Option is applied. Refer to the section Tokenization for a more detailed definition of "the string value".
The keys of the offsets map are the start character positions of the highlighted terms. The values are the character lengths of the highlighted terms. The offsets MUST NOT contain overlapping ranges.
"summary" is a sequence of strings in which every string at an even position is a highlighted search term. The combined string value of the "summary" sequence is a snippet of the entire searched string content. The intent of the summary field is to allow an application to quickly construct a highlighted summary along with the search results.
The size and computation of the summary fragment with the highlighted search terms is implementation-dependent. The implementation SHOULD consider both weight and frequency of the search terms. The summary MAY be an empty sequence if the implementation chooses to not calculate the summary.
The evaluation of FtHighlightsExpr conforms to the same rules as specified in the Evaluation of FTSelections. The "offsets" information may be obtained by applying the following steps.
Introduce character offset information to the TokenInfo structure.
Perform match as detailed in the Evaluation of FTSelections.
Extract the character offset information from all the TokenInfos contained in the final AllMatches result.
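Once the offsets have been extracted, a consumer can recover the highlighted terms from the searched string. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that start positions are 1-based, as they would be when passed to fn:substring.

```python
def highlighted_terms(text, offsets):
    """Return the substrings described by an offsets map.

    `offsets` maps a start position to a length, mirroring the "offsets"
    entry of the highlight variable. Positions are assumed to be 1-based.
    """
    terms = []
    previous_end = 0  # 1-based position of the last character consumed
    for start in sorted(offsets):
        length = offsets[start]
        if start <= previous_end:
            raise ValueError("offsets must not contain overlapping ranges")
        terms.append(text[start - 1:start - 1 + length])
        previous_end = start + length - 1
    return terms


print(highlighted_terms("my web, site map", {4: 3, 9: 4}))
# ['web', 'site']
```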
The following example illustrates the basic usage of the FtHighlightsExpr.
for $b score $s in /books/book[content contains text ("web site") using stemming]
let $highlight := $b/content highlights text ("web site") using stemming
return ($b, $highlight)
This example demonstrates how to construct an HTML fragment from the summary.
for $b score $s in /books/book[content contains text ("web site") using stemming]
let $highlight := $b/content highlights text ("web site") using stemming
let $summary := <summary>{
    for $token at $i in $highlight?summary
    return if ($i mod 2 = 0) then <strong>{$token}</strong> else $token
  }</summary>
return <result>{$b}{$summary}</result>
The following is a more elaborate XQuery example showing how to construct a summary using the offsets information based solely on the term frequencies.
(: $size is the desired size of the summary in characters; the function
   attempts to find the most densely highlighted section of no more than
   $size characters in the content :)
declare function local:summary($offsets as map(*), $content as xs:string, $size as xs:integer) as xs:string* {
  let $sections :=
    for sliding window $w in (for $x in map:keys($offsets) order by $x return $x)
      start $s when fn:true()
      end $e when $e - $s le $size
    return array { $w }
  let $most-dense-section :=
    subsequence(
      for $section in $sections
      order by array:size($section) descending
      return $section,
      1, 1)
  let $summary :=
    for sliding window $w in $most-dense-section?*
      start $offset at $s when fn:true()
      end $next-offset at $e when $e - $s eq 1
    let $end := $offset + $offsets($offset)
    return
      if (fn:count($w) eq 2)
      then (fn:substring($content, $offset, $offsets($offset)),
            fn:substring($content, $end, $next-offset - $end))
      else fn:substring($content, $offset, $offsets($offset))
  return ("", $summary) (: compensate for highlighted term at even position :)
};

for $b in /books/book[content contains text ("web site") using stemming]
let $highlight := $b/content highlights text ("web site") using stemming
let $summary-tokens := local:summary($highlight?offsets, fn:string($b/content), 1024)
let $summary := <summary>{
    for $token at $i in $summary-tokens
    return if ($i mod 2 = 0) then <strong>{$token}</strong> else $token
  }</summary>
return <result>{$b}{$summary}</result>
The XQuery Static Context is extended with a component for each full-text match option group. The settings of these components can be changed by using the following declaration syntax in the Prolog.
[6] | Prolog | ::= | ((DefaultNamespaceDecl | Setter | NamespaceDecl | Import | FTOptionDecl) Separator)* ((ContextItemDecl | AnnotatedDecl | OptionDecl) Separator)* |
[26] | FTOptionDecl | ::= | "declare" "ft-option" FTMatchOptions |
Match options modify the match semantics of full-text expressions. They are described in detail in Section 3.4 Match Options. When a match option is specified explicitly in a full-text expression, it overrides the setting of the respective component in the static context.
This section describes the full-text selections which contain the full-text operators in a full-text contains expression (FTContainsExpr), as well as the match options which modify the matching semantics of the full-text selections. In the following, the syntax for each type of full-text selection is given together with an informal statement of its meaning.
[Definition: A full-text selection specifies the conditions of a full-text search. ]
[217] | FTSelection | ::= | FTOr FTPosFilter* |
As shown in the grammar, a full-text selection consists of search conditions possibly involving logical operators (FTOr), followed by an arbitrary number of positional filters (FTPosFilter).
The syntax and semantics of the individual full-text selection operators follow.
This XML document is the source document for examples in this section.
<books>
  <book number="1">
    <title shortTitle="Improving Web Site Usability">Improving the Usability of a Web Site Through Expert Reviews and Usability Testing</title>
    <author>Millicent Marigold</author>
    <author>Montana Marigold</author>
    <editor>Véra Tudor-Medina</editor>
    <content>
      <p>The usability of a Web site is how well the site supports the users in achieving specified goals. A Web site should facilitate learning, and enable efficient and effective task completion, while propagating few errors. </p>
      <note>This book has been approved by the Web Site Users Association. </note>
    </content>
  </book>
</books>
Tokenization is implementation-defined. A sample tokenization is used for the examples in this section.
This sample tokenization uses white space, punctuation and XML tags as word-breakers, periods followed by a space as sentence boundaries, and <p> for paragraph boundaries. The first sentence and paragraph start at the beginning of the document, and the last sentence and paragraph end at the end of the document.
The results may be different for other tokenizations.
The first five tokens in this example using the sample tokenization would be "Improving", "the", "usability", "of", and "a".
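The word-breaking part of the sample tokenization can be approximated as follows. This Python sketch is an assumption-laden illustration, not a normative tokenizer: the function name is hypothetical, sentence and paragraph boundaries are ignored, and `\w+` is used as a simplification of "characters that are neither white space nor punctuation".

```python
import re


def sample_tokenize(xml_fragment):
    # Treat XML tags as word-breakers by replacing each tag with a space.
    text = re.sub(r"<[^>]*>", " ", xml_fragment)
    # White space and punctuation separate tokens; \w+ keeps runs of
    # letters, digits and underscores together (a simplification).
    return re.findall(r"\w+", text)


title = "<title>Improving the Usability of a Web Site</title>"
print(sample_tokenize(title)[:5])
# ['Improving', 'the', 'Usability', 'of', 'a']
```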
Unless stated otherwise, the results assume a case-insensitive match.
[224] | FTPrimary | ::= | (FTWords FTTimes?) | ("(" FTSelection ")") | FTExtensionSelection |
[Definition: A primary full-text selection is the basic form of a full-text selection. It specifies tokens and phrases as search conditions (FTWords), optionally followed by a cardinality constraint (FTTimes). An FTSelection in parentheses and the FTExtensionSelection are also primary full-text selections.]
[223] | FTPrimaryWithOptions | ::= | FTPrimary FTMatchOptions? FTWeight? |
[218] | FTWeight | ::= | "weight" "{" Expr "}" |
As shown in the grammar, a full-text primary selection may be optionally followed by match options (which are discussed in 3.4 Match Options) and by a "weight" value that is specified using an expression enclosed in braces.
The Expr is evaluated as if it were an argument to a function with an expected type xs:double.
The weight MUST have an absolute value between 0.0 and 1000.0 inclusive.
If the absolute value of the weight is greater than 1000.0, an error is raised: [err:FTDY0016].
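The range check on weights can be sketched as follows. This Python fragment is only an illustration; the exception class and function name are hypothetical, and the conversion with `float` merely stands in for the function-conversion rules that apply to an xs:double parameter.

```python
class FullTextError(Exception):
    """Hypothetical exception carrying an error code such as FTDY0016."""

    def __init__(self, code, message):
        super().__init__(f"[err:{code}] {message}")
        self.code = code


def check_weight(value):
    # Stand-in for converting the value to xs:double.
    weight = float(value)
    # The absolute value must lie between 0.0 and 1000.0 inclusive.
    if abs(weight) > 1000.0:
        raise FullTextError("FTDY0016", f"weight {weight} is out of range")
    return weight


print(check_weight(2))      # 2.0
print(check_weight(-0.5))   # -0.5
```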
Note:
As a consequence of the flexibility given to implementations under Section 2.3.4 Errors and OptimizationXQ, it is possible that evaluation of weight declarations in an FTContainsExpr for which no scores are evaluated may be skipped by the implementation and errors with them may go unreported.
[225] | FTWords | ::= | FTWordsValue FTAnyallOption? |
[226] | FTWordsValue | ::= | StringLiteral | ("{" Expr "}") |
[228] | FTAnyallOption | ::= | ("any" "word"?) | ("all" "words"?) | "phrase" |
FTWords finds matches that contain the specified tokens and phrases.
FTWords consists of two parts: a mandatory FTWordsValue part and an optional FTAnyallOption part. FTWordsValue specifies the tokens and phrases that must be contained in the matches. FTAnyallOption specifies how containment is checked.
In general, the tokens and phrases in FTWordsValue are specified using a nested XQuery expression. To simplify notation, the enclosing braces may be omitted if FTWordsValue consists of a single string literal.
The following rules specify how an FTWordsValue matches tokens and phrases. First, the FTWordsValue is converted to a sequence of strings as though it were an argument to a function with the expected type of xs:string*. If the sequence is empty, the FTWords yields no matches. Otherwise, each of those strings is tokenized into a sequence of tokens as described in Section 4.1 Tokenization.
Then, FTAnyallOption is checked.
If FTAnyallOption is "any", the sequence of tokens for each string is considered as a phrase. If the sequence of tokens is empty, then the phrase contributes nothing to the set of matches for the FTWords. Otherwise, a match is found in the tokenized form of the text being searched, whenever that form contains a subsequence of tokens that corresponds to the sequence of query tokens in an implementation-defined way and that subsequence of tokens covers consecutive token positions in the tokenized text. If the value of the FTWordsValue contains more than one string, the different strings are considered to be alternatives, i.e., the search context must contain at least one of the generated phrases. Each resulting match will contain exactly one such phrase.
If FTAnyallOption is "all", the sequence of tokens for each string is considered as a phrase. If any such sequence of tokens is empty, the FTWords yields no matches. The resulting matches must contain all of the generated phrases.
If FTAnyallOption is "phrase", the tokens from all the strings are concatenated in a single sequence, which is considered as a phrase. If the sequence of tokens is empty, the FTWords yields no matches. The resulting matches must contain the generated phrase.
If FTAnyallOption is "any word", the tokens from all the strings are combined into a single set. If the set is empty, the FTWords yields no matches. The search context must contain at least one of the tokens in the set. Each resulting match will contain exactly one such token.
If FTAnyallOption is "all words", the tokens from all the strings are combined into a single set. If the set is empty, the FTWords yields no matches. The resulting matches must contain all of the tokens in the set.
If the FTWordsValue evaluates to a single string, the use of "any", "all", and "phrase" in FTAnyallOption produces the same results.
If FTAnyallOption is omitted, "any" is the default.
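The five FTAnyallOption cases above can be summarized in a small Boolean model. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, tokenization is reduced to splitting on white space, matching is case-insensitive exact token equality, and only the truth value (not the set of matches) is computed.

```python
def find_phrase(text, phrase):
    """Return the start indexes where `phrase` covers consecutive tokens."""
    n = len(phrase)
    if n == 0:
        return []
    return [i for i in range(len(text) - n + 1) if text[i:i + n] == phrase]


def ft_words(text, strings, option="any"):
    """Return True if the token list `text` satisfies the FTWords."""
    text = [t.lower() for t in text]
    phrases = [s.lower().split() for s in strings]  # toy tokenization
    if option == "any":
        return any(find_phrase(text, p) for p in phrases)
    if option == "all":
        return bool(phrases) and all(find_phrase(text, p) for p in phrases)
    if option == "phrase":
        joined = [tok for p in phrases for tok in p]
        return bool(find_phrase(text, joined))
    words = {tok for p in phrases for tok in p}
    if option == "any word":
        return any(w in text for w in words)
    if option == "all words":
        return bool(words) and all(w in text for w in words)
    raise ValueError(f"unknown option: {option}")


p = "The usability of a Web site is how well the site supports the users".split()
print(ft_words(p, ["Web Site Usability"]))               # False
print(ft_words(p, ["Web Site Usability"], "all words"))  # True
```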
The following expression returns the sample book element, because its title element contains the token "Expert":
//book[./title contains text "Expert"]
The following expression returns the sample book element, because its title element contains the phrase "Expert Reviews":
//book[./title contains text "Expert Reviews"]
The following expression returns the sample book element, because its title element contains the two tokens "Expert" and "Reviews":
//book[./title contains text {"Expert", "Reviews"} all]
The following expression returns false for our sample document, because the p element doesn't contain the phrase "Web Site Usability" although it contains all of the tokens in the phrase:
//book//p contains text "Web Site Usability"
The following expression returns book numbers of book elements by "Marigold" with a title about "Web Site Usability", sorting them in descending score order:
for $book in /books/book[.//author contains text "Marigold"]
let score $score := $book/title/@shortTitle contains text "Web Site Usability"
where $score > 0.8
order by $score descending
return $book/@number
[229] | FTTimes | ::= | "occurs" FTRange "times" |
[Definition: A cardinality selection consists of an FTWords followed by the FTTimes postfix operator.] A cardinality selection selects matches for which the operand FTWords is matched a specified number of times.
A cardinality selection limits the number of different matches of FTWords within the specified range. The semantics of FTRange are described in 3.6.3 Distance Selection.
In the document fragment "very very big":
The FTWords "very big" has 1 match consisting of the second "very" and "big".
The FTWords {"very", "big"} all has 2 matches; one consisting of the first "very" and "big", and the other containing the second "very" and "big".
The FTWords {"very", "big"} any has 3 matches.
The following expression returns the example book element's number, because the book element contains 2 or more occurrences of "usability":
//book[. contains text "usability" occurs at least 2 times]/@number
The following expression returns the empty sequence, because there are 3 occurrences of {"usability", "testing"} any in the designated title:
//book[@number="1" and title contains text {"usability", "testing"} any occurs at most 2 times]
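The match counts for the "very very big" fragment, and the effect of an FTTimes range on them, can be reproduced with a small model. This Python sketch is illustrative only: the function names are hypothetical, tokens are assumed to be already lowercased, and only the "any" and "all" options are modeled.

```python
from itertools import product


def occurrences(text, phrase):
    """Return the token positions of each consecutive occurrence of `phrase`."""
    n = len(phrase)
    if n == 0:
        return []
    return [tuple(range(i, i + n))
            for i in range(len(text) - n + 1) if text[i:i + n] == phrase]


def count_matches(text, strings, option="any"):
    per_phrase = [occurrences(text, s.split()) for s in strings]
    if not per_phrase:
        return 0
    if option == "any":
        # One match per occurrence of any of the alternative phrases.
        return sum(len(occ) for occ in per_phrase)
    if option == "all":
        # One match per combination of one occurrence of every phrase.
        return len(list(product(*per_phrase)))
    raise ValueError(f"unsupported option: {option}")


def occurs(count, at_least=0, at_most=None):
    """Model of FTTimes: is `count` within the requested range?"""
    return count >= at_least and (at_most is None or count <= at_most)


text = "very very big".split()
print(count_matches(text, ["very big"]))            # 1
print(count_matches(text, ["very", "big"], "all"))  # 2
print(count_matches(text, ["very", "big"], "any"))  # 3
```

Under the same assumptions, the title of the sample book yields 3 matches for {"usability", "testing"} any, so `occurs(3, at_most=2)` is false.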
Full-text match options modify the matching behaviour of the primary full-text selection to which they are applied.
[223] | FTPrimaryWithOptions | ::= | FTPrimary FTMatchOptions? FTWeight? |
[239] | FTMatchOptions | ::= | ("using" FTMatchOption)+ |
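[240] | FTMatchOption | ::= | FTLanguageOption | FTWildCardOption | FTThesaurusOption | FTStemOption | FTCaseOption | FTDiacriticsOption | FTStopWordOption | FTExtensionOption |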
[Definition: Match options modify the set of tokens in the query, or how they are matched against tokens in the text.]
[Definition: Each of the alternatives of production FTMatchOption other than FTExtensionOption corresponds to one match option group. ] The match options from any given group are mutually exclusive, i.e., only one of these settings can be in effect, whereas match options of different groups can be combined freely.
It is a static error [err:FTST0019] if, within a single FTMatchOptions, there is more than one match option of any given match option group. For example, if the FTCaseOption "lowercase" is specified, then "uppercase" cannot also be specified as part of the same FTMatchOptions.
Although match options only take effect in the application of FTWords, the syntax also allows match options to be specified that modify the non-primitive full-text selection "(" FTSelection ")". Such a higher-level match option provides a default for the respective match option group for any embedded FTPrimary, just as match option declarations in the Prolog provide default match options for the whole query.
Match options are propagated through the query via the static context. For each of the seven match option groups, the static context has a component that contains one option from that group. The seven settings are initialized by the implementation in accordance with the table in Appendix C Static Context Components, and are modified by any FTOptionDecls in the Prolog. The resulting settings are then propagated unchanged to every FTContainsExpr in the module (including those in VarDecls and FunctionDecls, and including any that happen to be nested within another FTContainsExpr).
At any given FTContainsExpr, the settings from the static context are copied to the FTContainsExpr's inner settings, which are then propagated down the syntax tree. At each FTPrimaryWithOptions, the locally specified match options (if any) overwrite the corresponding inner setting(s). At each FTWords, the inner settings are used as the effective match options for tokenizing the query strings and matching them against the tokens in the text. (These inner settings could be seen as a parallel set of components in the static context, but Section 4 Semantics models them as structures that get passed as parameters to various semantic functions.)
Thus, when a match option appears in an FTSelection, it applies to the associated FTPrimary, but not to any FTContainsExprs that happen to be embedded within that FTPrimary. Instead, for a nested FTContainsExpr, the default match options are those declared in the Prolog or, if not declared in the Prolog, then supplied by the implementation's initial values.
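The propagation rules can be modeled as successive overlays of settings. The Python sketch below is illustrative only: the dictionary keys, the function names, and the initial values (including "de" as the default language) are assumptions, since the actual initial values are given by the implementation.

```python
INITIAL_SETTINGS = {          # assumed initial values, one per option group
    "language": "de",
    "wildcards": False,
    "thesaurus": None,
    "stemming": False,
    "case": "insensitive",
    "diacritics": "insensitive",
    "stop words": None,
}


def static_context(prolog_decls):
    """Apply the FTOptionDecls of the Prolog to the initial settings."""
    settings = dict(INITIAL_SETTINGS)
    for decl in prolog_decls:
        settings.update(decl)
    return settings


def effective_options(static, local_option_chain):
    """Compute the effective match options at one FTWords.

    `local_option_chain` lists the match options written at each
    FTPrimaryWithOptions on the path from the FTContainsExpr down to the
    FTWords, outermost first.
    """
    inner = dict(static)          # copied at the FTContainsExpr
    for local in local_option_chain:
        inner.update(local)       # local options overwrite inner settings
    return inner


static = static_context([{"stemming": True}])
print(effective_options(static, [{"case": "sensitive"}, {"stemming": False}])["stemming"])
# False
print(effective_options(static, [])["stemming"])  # a nested FTContainsExpr starts over
# True
```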
An FTMatchOption applies to the FTPrimary that immediately precedes it. That FTPrimary is either an FTWords (possibly qualified by an FTTimes), an FTExtensionSelection, or a parenthesized FTSelection.
[Definition: The order in which effective match options for an FTWords are applied is called the match option application order.] This order is significant because match options are not always commutative. For example, synonym(stem(word)) is not always the same as stem(synonym(word)).
The match option application order is subject to some constraints:
The Language Option must be applied first
The Stemming Option must be applied before the Case Option and the Diacritics Option
Aside from these constraints, the full order of the application of match options is implementation-defined.
More information on their semantics is given in 4.2.5 Match Options Semantics.
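The two ordering constraints can be checked mechanically. In this Python sketch the option names and the function name are hypothetical; any order that satisfies both constraints is accepted, reflecting that the rest of the order is implementation-defined.

```python
def is_valid_application_order(order):
    """Check the constraints on a list of match option names."""
    pos = {name: i for i, name in enumerate(order)}
    # The Language Option must be applied first.
    if "language" in pos and pos["language"] != 0:
        return False
    # The Stemming Option must precede the Case and Diacritics Options.
    if "stemming" in pos:
        for later in ("case", "diacritics"):
            if later in pos and pos[later] < pos["stemming"]:
                return False
    return True


print(is_valid_application_order(["language", "stemming", "case", "diacritics"]))  # True
print(is_valid_application_order(["language", "case", "stemming"]))                # False
```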
If no match option declarations are present in the prolog and the implementation does not define any overwriting of the static context components for the match options, the query:
/books/book/title contains text "usability"
is, assuming "de" is the implementation-defined default language, equivalent to the query:
/books/book/title contains text "usability" using language "de" using no wildcards using no thesaurus using no stemming using case insensitive using diacritics insensitive using no stop words
We describe each match option group in more detail in the following sections.
[250] | FTLanguageOption | ::= | "language" StringLiteral |
[Definition: A language option modifies token matching by specifying the language of search tokens and phrases.]
The StringLiteral following the keyword language designates one language. It must be castable to xs:language; otherwise, an error is raised: [err:XPTY0004]XP30.
The "language" option influences tokenization, stemming, and stop words in an implementation-defined way. The "language" option MAY influence the behavior of other match options in an implementation-defined way.
The set of standardized language identifiers is defined in [BCP 47]. The set of valid language identifiers among the standardized set is implementation-defined. An implementation MAY choose to use private extensions introduced by a singleton 'x' for additional language identifiers, or other singletons for registered extensions as described in sec. 2.2.6 of [BCP 47]. It is implementation-defined what additional language identifiers, if any, are valid. If an invalid language identifier is specified, then the behavior is implementation-defined. If the implementation chooses to raise an error in that case, it must raise [err:FTST0009]. An implementation MUST treat language identifiers that [BCP 47] defines as equivalent as identifying the same language. For example, "mn" and "MN" are equivalent, as language tags are case insensitive, and "de" and "deu" are equivalent, as they are different codes for the same language. However, it is implementation-defined whether an implementation treats a particular language identifier with script, region, or variant portions as equivalent to the language identifier without them. For example, an implementation may treat "en-UK" as equivalent to "en" and "en-US" but "sr-Latn" as different from "sr" and "sr-Cyrl".
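The two required equivalences from the examples above can be sketched as a normalization step. This Python fragment is only an illustration: the function names are hypothetical and the alias table holds a single entry taken from the example, not a complete language registry. It deliberately keeps script, region, and variant subtags, because their treatment is implementation-defined.

```python
ALIASES = {"deu": "de"}  # tiny illustrative table, not a full registry


def canonical_language(tag):
    # Language tags are case insensitive, so compare them in lowercase.
    subtags = tag.lower().split("-")
    # Map alternative codes for the same language to one code.
    subtags[0] = ALIASES.get(subtags[0], subtags[0])
    return "-".join(subtags)


def same_language(a, b):
    return canonical_language(a) == canonical_language(b)


print(same_language("mn", "MN"))            # True
print(same_language("de", "deu"))           # True
print(same_language("sr-Latn", "sr-Cyrl"))  # False
```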
The default language is specified in the static context.
When an XQuery and XPath Full Text processor evaluates text in a document that is governed by an xml:lang attribute and the portion of the full-text query doing that evaluation contains an FTLanguageOption that specifies a different language from the language specified by the governing xml:lang attribute, the language-related behavior of that full-text query is implementation-defined.
This is an example where the language option is used to select the appropriate stop word list:
//book[@number="1"]/content//p contains text "salon de thé" using stop words default using language "fr"
[251] | FTWildCardOption | ::= | "wildcards" | ("no" "wildcards") |
[Definition: A wildcard option modifies token and phrase matching by specifying whether or not wildcards are recognized in query strings.]
When the "wildcards" option is used, wildcard syntax may be included within query strings. A wildcard consists of an indicator (a period or full stop, "."), optionally followed by a qualifier. Each wildcard in a query token will match zero or more characters within a token in the text being searched, as described below. The number of characters that can be matched depends on the qualifier. The forms of wildcard syntax specified by this document are:
A single period, without any qualifiers: Matches a single arbitrary character.
A period immediately followed by a single question mark, "?": Matches either no characters or one character.
A period immediately followed by a single asterisk, "*": Matches zero or more characters.
A period immediately followed by a single plus sign, "+": Matches one or more characters.
A period immediately followed by a sequence of characters that matches the regular expression {[0-9]+,[0-9]+}: Matches a number of characters, where the number is no less than the number represented by the series of digits before the comma, and no greater than the number represented by the series of digits following the comma.
If a period in the query string is immediately followed by a left curly brace, but the subsequent characters do not conform to the given regular expression, then an error is raised: [err:FTDY0020].
A question mark, asterisk, plus sign, or left curly brace that is not immediately preceded by a period is not treated as a qualifier. For example, using the sample tokenization and "wildcards", the query string "wil+" does not match the search text "will" or "willlllll", but only matches the search text "wil". (The sample tokenization treats the plus sign as punctuation.)
When "wildcards" is used, any character in a query string can be "escaped" by immediately preceding it with a backslash, "\". That is, a backslash immediately followed by any character represents that character literally, preventing any special interpretation that the "wildcards" option might otherwise attach to it. In particular:
Escaping a period prevents its interpretation as a wildcard.
Escaping a question mark, asterisk, plus sign, or left curly brace ensures that it is not interpreted as a qualifier.
An escaped backslash ("\\") represents a literal backslash.
If a query string is terminated by an unescaped backslash, an error is raised: [err:FTDY0020].
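The wildcard and escaping rules can be modeled by translating a query token into a regular expression. The Python sketch below is an illustration under stated assumptions: the exception class and function names are hypothetical, the query string is assumed to have been tokenized already (so punctuation handling by the tokenizer is not modeled), and matching is case-insensitive against a single text token.

```python
import re


class FTDY0020(Exception):
    """Hypothetical exception standing in for error [err:FTDY0020]."""


def wildcard_to_regex(query_token):
    out = []
    i = 0
    n = len(query_token)
    while i < n:
        ch = query_token[i]
        if ch == "\\":
            # A backslash makes the next character literal.
            if i + 1 >= n:
                raise FTDY0020("query string ends with an unescaped backslash")
            out.append(re.escape(query_token[i + 1]))
            i += 2
        elif ch == ".":
            nxt = query_token[i + 1] if i + 1 < n else ""
            if nxt in ("?", "*", "+"):
                out.append("." + nxt)
                i += 2
            elif nxt == "{":
                m = re.match(r"\{([0-9]+),([0-9]+)\}", query_token[i + 1:])
                if m is None:
                    raise FTDY0020("malformed range qualifier")
                out.append(".{%s,%s}" % (m.group(1), m.group(2)))
                i += 1 + m.end()
            else:
                out.append(".")  # a bare period matches exactly one character
                i += 1
        else:
            # Qualifier characters without a preceding period are literal.
            out.append(re.escape(ch))
            i += 1
    return re.compile("".join(out), re.IGNORECASE | re.DOTALL)


def wildcard_match(query_token, text_token):
    return wildcard_to_regex(query_token).fullmatch(text_token) is not None


print(wildcard_match("w.ll", "well"))           # True
print(wildcard_match("improv.*", "Improving"))  # True
```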
Note:
A query string of the form "abc\"xyz" does not represent the three characters "abc" followed by a literal double-quote followed by the three characters "xyz". Instead, this is a malformed StringLiteral, and the processor will report a syntax error [err:XPST0003]XP30.
When the "no wildcards" option is used, no wildcards are recognized in query strings. Periods, question marks, asterisks, plus signs, left curly braces, and backslashes are always recognized as ordinary text characters.
The default is "no wildcards".
The following expression returns true, because the p element contains "well":
//book[@number="1"]//p contains text "w.ll" using wildcards
The following expression returns true, because the title element contains "site":
//book[@number="1"]/title contains text ".?site" using wildcards
The following expression returns true, because the title element contains "improving":
//book[@number="1"]/title contains text "improv.*" using wildcards
The following expression raises error [err:FTDY0020], because the query string uses incorrect syntax:
//book[@number="1"]/p contains text "wi.{5,7]" using wildcards
The following expression returns true, because the title contains "site":
//book[@number="1"]/title contains text "\s\i\t\e" using wildcards
The following expression returns true, because the title contains "Usability":
//book[@number="1"]/title contains text "Usab.+\\" using wildcards
(Note that "\\" represents a literal backslash, which the sample tokenization treats as punctuation.)
The following expression raises error [err:FTDY0020], because the query string ends with an unescaped backslash:
//book[@number="1"]/p contains text "will\" using wildcards
The following expression returns false, because the p element does not contain the phrase "w ll":
//book[@number="1"]//p contains text "w.ll" using no wildcards
(Note that, without wildcards, the sample tokenization will treat the period in "w.ll" as punctuation, thus producing "w" and "ll" as separate tokens.)
[244] | FTThesaurusOption | ::= | ("thesaurus" (FTThesaurusID | "default")) | ("thesaurus" "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")") | ("no" "thesaurus") |
[245] | FTThesaurusID | ::= | "at" URILiteral ("relationship" StringLiteral)? (FTLiteralRange "levels")? |
[216] | URILiteral | ::= | StringLiteral |
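[246] | FTLiteralRange | ::= | ("exactly" IntegerLiteral) | ("at" "least" IntegerLiteral) | ("at" "most" IntegerLiteral) | ("from" IntegerLiteral "to" IntegerLiteral) |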
[Definition: A thesaurus option modifies token and phrase matching by specifying whether a thesaurus is used or not.] If thesauri are used, the thesaurus option specifies information to locate the thesauri either by default or through a URI reference. It also states the relationship to be applied and how many levels within the thesaurus to be traversed.
If the thesaurus option specifies a thesaurus with a relative URI, that relative URI is resolved to an absolute URI using the base URI in the static context and that absolute URI is used to identify the thesaurus.
If the URI specifies a thesaurus that is not found in the statically known thesauri, an error is raised [err:FTST0018].
Thesauri add related tokens and phrases to the query or change query tokens. Thus, the user may narrow, broaden, or otherwise modify the query using synonyms, hypernyms (more generic terms), etc. The search is performed as though the user has specified all related query tokens and phrases in a disjunction (FTOr).
Note:
A thesaurus may be standards-based or locally-defined. It may be a traditional thesaurus, or a taxonomy, soundex, ontology, or topic map. How the thesaurus is represented is implementation-dependent.
An FTThesaurusID may optionally contain a StringLiteral to specify the relationship sought between tokens and phrases written in the query and terms in the thesaurus. Relationships include, but are not limited to, the relationships and their abbreviations presented in [ISO 2788] and their equivalents in other languages. The set of relationships supported by an implementation is implementation-defined, but implementations SHOULD support the relationships defined in [ISO 2788]. The following list of terms have the meanings defined in [ISO 2788]. If a query specifies thesaurus relationships not supported by the thesaurus, or does not specify a relationship, the behavior is implementation-defined.
equivalence relationships (synonyms): PREFERRED TERM (USE), NONPREFERRED USED FOR TERM (UF);
hierarchical relationships: BROADER TERM (BT), NARROWER TERM (NT), BROADER TERM GENERIC (BTG), NARROWER TERM GENERIC (NTG), BROADER TERM PARTITIVE (BTP), NARROWER TERM PARTITIVE (NTP), TOP Terms (TT); and
associative relationships: RELATED TERM (RT).
An FTThesaurusID may also optionally include an FTLiteralRange to specify the number of levels to be queried in hierarchical relationships. An FTLiteralRange is a constrained form of FTRange, and specifies a (possibly empty) range of integer values according to the same rules.
Note:
For historical reasons, an implementation MAY allow an FTLiteralRange to have subexpressions more general than IntegerLiterals, and MAY even allow its subexpressions to be dynamically evaluated.
The effect of specifying a particular range of levels in an FTThesaurusID is implementation-defined. This includes cases involving empty ranges, negative levels, or levels not supported by the thesaurus.
If no levels are specified, the default is to query all levels in hierarchical relationships or to query an implementation-defined number of levels in hierarchical relationships.
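Thesaurus expansion along one relationship, limited to a number of levels, might look like the following. This Python sketch is purely illustrative: the thesaurus contents, the function name, and the interpretation of "levels" as breadth-first traversal depth are all assumptions, because the representation of a thesaurus and the effect of a level range are left to the implementation.

```python
THESAURUS = {  # hypothetical data: (term, relationship) -> related terms
    ("people", "NT"): ["users", "customers"],
    ("users", "NT"): ["administrators"],
    ("duty", "UF"): ["task"],
}


def expand(term, relationship, max_levels=1):
    """Return the term plus related terms, to be searched as a disjunction."""
    found = [term]
    frontier = [term]
    for _ in range(max_levels):
        new = []
        for current in frontier:
            for related in THESAURUS.get((current, relationship), []):
                if related not in found and related not in new:
                    new.append(related)
        found.extend(new)
        frontier = new
    return found


print(expand("duty", "UF"))
# ['duty', 'task']
print(expand("people", "NT", max_levels=2))
# ['people', 'users', 'customers', 'administrators']
```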
The "thesaurus" option specifies that string matches include tokens that can be found in one of the specified thesauri. When "default" is used in place of an FTThesaurusID, the thesauri specified in the static context are used, which are either given by the prolog declaration for the thesaurus option or, if no such declaration exists, a system-defined default thesaurus with a system-defined relationship. The default thesaurus may be used in combination with other explicitly specified thesauri.
The "no thesaurus" option specifies that no thesaurus will be used.
The default is "no thesaurus".
The following expression returns true, because it finds a content element containing "task" which the thesaurus identified as a synonym for "duty":
.//book/content contains text "duty" using thesaurus at "http://bstore1.example.com/UsabilityThesaurus.xml" relationship "UF"
The following expression returns a book element, because it finds a content element containing "users", which is a narrower term of "people":
doc("http://bstore1.example.com/full-text.xml") /books/book[./content contains text "people" using thesaurus at "http://bstore1.example.com/UsabilityThesaurus.xml" relationship "NT" at most 2 levels]
Assuming the thesaurus available at URL "http://bstore1.example.com/UsabilitySoundex.xml" contains soundex capabilities, the following query returns a book element containing "Marigold" which sounds like "Merrygould":
doc("http://bstore1.example.com/full-text.xml") /books/book[. contains text "Merrygould" using thesaurus at "http://bstore1.example.com/UsabilitySoundex.xml" relationship "sounds like"]
[243] | FTStemOption | ::= | "stemming" | ("no" "stemming") |
[Definition: A stemming option modifies token and phrase matching by specifying whether stemming is applied or not. ]
The "stemming" option specifies that matches may contain tokens that have the same stem as the tokens and phrases written in the query. It is implementation-defined what a stem of a token is.
The "no stemming" option specifies that the tokens and phrases are not stemmed.
It is implementation-defined whether the stemming is based on an algorithm, dictionary, or mixed approach.
The default is "no stemming".
The following expression returns true, because the title of the specified book contains "improving", which has the same stem as "improve":
/books/book[@number="1"]/title contains text "improve" using stemming
[241] | FTCaseOption | ::= | ("case" "insensitive") | ("case" "sensitive") | "lowercase" | "uppercase" |
[Definition: A case option modifies the matching of tokens and phrases by specifying how uppercase and lowercase characters are considered.]
There are four possible character case options:
Using the option "case insensitive", tokens and phrases are matched, regardless of the case of characters of the query tokens and phrases.
Using the option "case sensitive", tokens and phrases are matched, if and only if the case of their characters is the same as written in the query.
Using the option "lowercase", tokens and phrases are matched, if and only if they match the query without regard to character case, but contain only lowercase characters.
Using the option "uppercase", tokens and phrases are matched, if and only if they match the query without regard to character case, but contain only uppercase characters.
The default is "case insensitive".
The effect of the case options is also influenced by the query's default collation (see Section 2.1.1 Static ContextXQ and Section 4.4 Default Collation DeclarationXQ). The following table summarizes how these interact.
Case option \ Default collation | UCC (Unicode Codepoint Collation) | CCS (some generic case-sensitive collation) | CCI (some generic case-insensitive collation) |
---|---|---|---|
case insensitive | compare as if both lower | case-insensitive variant of CCS if it exists, else error | CCI |
case sensitive | UCC | CCS | case-sensitive variant of CCI if it exists, else error |
lowercase | compare using UCC after applying fn:lower-case() to the query string | compare using CCS after applying fn:lower-case() to the query string | CCI |
uppercase | compare using UCC after applying fn:upper-case() to the query string | compare using CCS after applying fn:upper-case() to the query string | CCI |
Note:
In this table, "else error" means "Otherwise, an error is raised: [err:FOCH0002]FO30". The phrase "if it exists" is used, because the case-sensitive collation CCS does not always have a case-insensitive variant (and, even if one exists, it may not be possible to determine it algorithmically), and because the case-insensitive collation CCI does not always have a case-sensitive variant (and, even if one exists, it may not be possible to determine it algorithmically).
The following expression returns false, because the title element doesn't contain "usability" in lower-case characters:
//book[@number="1"]/title contains text "Usability" using lowercase
The following expression returns true, because the character case is not considered:
//book[@number="1"]/title contains text "usability" using case insensitive
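The UCC column of the table above can be sketched for a single query token and a single document token as follows. This is an illustration only, not part of the specification: the function name case_match is hypothetical, and Python's str.lower() and str.upper() are assumed here as stand-ins for fn:lower-case() and fn:upper-case().

```python
def case_match(query_token: str, doc_token: str, option: str = "case insensitive") -> bool:
    """Compare one query token with one document token (UCC column only)."""
    if option == "case insensitive":
        # "compare as if both lower"
        return query_token.lower() == doc_token.lower()
    if option == "case sensitive":
        # plain codepoint comparison
        return query_token == doc_token
    if option == "lowercase":
        # lower-case the query string, then compare by codepoint
        return query_token.lower() == doc_token
    if option == "uppercase":
        # upper-case the query string, then compare by codepoint
        return query_token.upper() == doc_token
    raise ValueError(f"unknown case option: {option}")
```

Under this sketch, the query token "Usability" with the option "lowercase" does not match a document token spelled "Usability", but does match one spelled "usability".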
[242] | FTDiacriticsOption | ::= | ("diacritics" "insensitive") | ("diacritics" "sensitive") |
[Definition: A diacritics option modifies token and phrase matching by specifying how diacritics are considered. ]
There are two possible diacritics options:
The option "diacritics" "insensitive" matches tokens and phrases with and without diacritics. Whether diacritics are written in the query or not is not considered.
The option "diacritics" "sensitive" matches tokens and phrases only if they contain the diacritics as they are written in the query.
The default is "diacritics insensitive".
The effect of the diacritics options is also influenced by the query's default collation (see Section 2.1.1 Static ContextXQ and Section 4.4 Default Collation DeclarationXQ). The following table summarizes how these interact.
Diacritics option \ Default collation | UCC (Unicode Codepoint Collation) | CDS (some generic diacritics-sensitive collation) | CDI (some generic diacritics-insensitive collation) |
---|---|---|---|
diacritics insensitive | UCC comparison, but without considering diacritics | diacritics-insensitive variant of CDS if it exists, else error | CDI |
diacritics sensitive | UCC | CDS | diacritics-sensitive variant of CDI if it exists, else error |
Note:
In this table, "else error" means "Otherwise, an error is raised: [err:FOCH0002]FO30". The phrase "if it exists" is used, because the diacritics-sensitive collation CDS does not always have a diacritics-insensitive variant (and, even if one exists, it may not be possible to determine it algorithmically), and because the diacritics-insensitive collation CDI does not always have a diacritics-sensitive variant (and, even if one exists, it may not be possible to determine it algorithmically).
The following expression returns true, because the token "Véra" in the editor element is matched, as the acute accent is not considered in the comparison:
//book[@number="1"]//editor contains text "Vera" using diacritics insensitive
This returns false, because the editor element does not contain the token "Vera" in this exact form, i.e. without any diacritics:
//book[@number="1"]/editors contains text "Vera" using diacritics sensitive
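One possible reading of "UCC comparison, but without considering diacritics" is sketched below. It assumes that diacritics can be identified as Unicode combining marks after canonical decomposition (NFD); the specification does not prescribe this method, and the function names are hypothetical. Case handling is governed separately by the case option and is left out.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose characters and drop combining marks (assumed definition of 'diacritics')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def diacritics_match(query_token: str, doc_token: str, sensitive: bool = False) -> bool:
    """Compare two tokens under 'diacritics sensitive' or 'diacritics insensitive'."""
    if sensitive:
        return query_token == doc_token
    return strip_diacritics(query_token) == strip_diacritics(doc_token)

print(diacritics_match("Vera", "V\u00e9ra"))                  # insensitive comparison
print(diacritics_match("Vera", "V\u00e9ra", sensitive=True))  # sensitive comparison
```

With the token "Véra" from the examples above, the insensitive comparison succeeds and the sensitive comparison fails.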
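[247] | FTStopWordOption | ::= | ("stop" "words" FTStopWords FTStopWordsInclExcl*) | ("stop" "words" "default" FTStopWordsInclExcl*) | ("no" "stop" "words") |
[248] | FTStopWords | ::= | ("at" URILiteral) | ("(" StringLiteral ("," StringLiteral)* ")") |
[249] | FTStopWordsInclExcl | ::= | ("union" | "except") FTStopWords |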
[Definition: A stop word option controls matching of tokens by specifying whether stop words are used or not. Stop words are tokens in the query that match any token in the text being searched. ] More precisely, a stop word option defines a collection of stop words according to the rules below. Then, in every FTWords to which the stop word option applies, each query token is checked: if it appears (using an implementation-defined comparison) in the specified collection of stop words, it is considered a stop word.
Normally a stop word matches exactly one token, but there may be implementation-defined conditions under which a stop word may match a different number of tokens.
Tokens matched by stop words retain their position numbers and are counted by FTDistance and FTWindow filters.
FTStopWords specifies the list of stop words either explicitly as a comma-separated list of string literals, or by the keyword "at" followed by a literal URI. If the URI specifies a list of stop words that is not found in the statically known stop word lists, an error is raised [err:FTST0008]. Whether the stop word list is resolved from the statically known stop word lists or given explicitly, no tokenization is performed on the stop words: they are used as they occur in the list.
If the stop words option specifies a stop word list with a relative URI, that relative URI is resolved to an absolute URI using the base URI in the static context and that absolute URI is used to identify the stop word list.
Multiple stop word lists may be combined using "union" or "except". The keywords "union" and "except" are applied from left to right. If "union" is specified, every string occurring in the lists specified by the left-hand side or the right-hand side is a stop word. If "except" is specified, only strings occurring in the list specified by the left-hand side but not in the list specified by the right-hand side are stop words.
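The left-to-right combination of stop word lists can be sketched with set operations. The function name and the (keyword, list) representation are hypothetical, and exact string equality is assumed in place of the implementation-defined comparison.

```python
def combine_stop_words(first, operations):
    """Apply 'union' / 'except' steps from left to right to an initial stop word list.

    first      -- the strings of the first FTStopWords
    operations -- list of (keyword, strings) pairs, one per FTStopWordsInclExcl
    """
    result = set(first)
    for keyword, words in operations:
        if keyword == "union":
            result |= set(words)   # strings from either side are stop words
        elif keyword == "except":
            result -= set(words)   # strings on the right-hand side are removed
        else:
            raise ValueError(f"unknown keyword: {keyword}")
    return result

print(combine_stop_words(["a", "the", "then"], [("except", ["the", "then"])]))  # {'a'}
```

Because the keywords are applied from left to right, the order of the steps matters: removing "a" and then adding it back leaves "a" in the collection, while the reverse order does not.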
The "stop words default" option specifies that an implementation-defined collection of stop words is used.
The "no stop words" option specifies that no stop words are used. This is equivalent to specifying an empty list of stop words.
The default is "no stop words".
Note:
Some implementations may apply stop word lists during indexing and be unable to comply with query-time requests to not apply those stop words. An implementation may still support stop-word options (and therefore not raise [err:FTST0006]) by applying any additional stop words specified in the query. Pre-application of irrevocable stop word lists falls under implementation-defined tokenization behavior in this case, and a query that specifies "no stop words" may still have some words ignored. In addition, an implementation that applies irrevocable stop word lists at indexing time may therefore, as part of the implementation-defined tokenization, fail to count those stop words in the token counts. Since the query strings will be tokenized in accordance with the same rules, those stop words would likewise not count in the position counts for the query string. Thus, irrevocable stop words of this sort are invisible to the normal rules of full-text matching defined in this specification, and are handled purely as a tokenization issue. The examples in this specification assume that stop words are not removed at tokenization in this way.
The following expression returns true, because the document contains the phrase "propagating few errors":
/books/book[@number="1"]//p contains text "propagating of errors" using stop words ("a", "the", "of")
Note the asymmetry in the stop word semantics: the property of being a stop word is only relevant to query terms, not to document terms. Hence, it is irrelevant for the above-mentioned match whether "few" is a stop word or not, and on the other hand we do not want the query above to match "propagating" followed by 2 stop words, or even a sequence of 3 stop words in the document.
Similarly, the following expression also returns true, because the document contains the text "completion, while propagating few errors":
/books/book[@number="1"]//p contains text "in the propagating of" using stop words ("a", "in", "the", "of")
This expression, however, returns false, because the p element in the document ends with "errors." so there are not enough tokens to match the stop words in the query:
/books/book[@number="1"]//p contains text "propagating few errors of the" using stop words ("a", "in", "the", "of")
The following expression returns false. In this case specifying "few" as a stop word has no effect, since "few" does not appear in the query. Although the words "propagating" and "errors" appear in the text being searched, the phrase "propagating errors" cannot be matched, since that phrase does not occur.
/books/book[@number="1"]//p contains text "propagating errors" using stop words ("few")
The following expression returns false, because "of" is not in the p
element between "propagating" and "errors":
/books/book[@number="1"]//p contains text "propagating of errors" using no stop words
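The stop word examples above can be reproduced with a small phrase matcher in which a query token that is a stop word matches any single document token. This is a simplified sketch: it assumes exact string equality for tokens, that every stop word matches exactly one token, and that the relevant part of the p element tokenizes to the five tokens shown. The function name is hypothetical.

```python
def phrase_matches(query_tokens, doc_tokens, stop_words=()):
    """True if the query phrase occurs in doc_tokens, treating query stop words as wildcards."""
    stops = set(stop_words)
    n = len(query_tokens)
    for start in range(len(doc_tokens) - n + 1):
        candidate = doc_tokens[start:start + n]
        # Only query tokens are checked against the stop word list, never document tokens.
        if all(q in stops or q == d for q, d in zip(query_tokens, candidate)):
            return True
    return False

doc = ["completion", "while", "propagating", "few", "errors"]
print(phrase_matches("propagating of errors".split(), doc, ["a", "the", "of"]))                # True
print(phrase_matches("in the propagating of".split(), doc, ["a", "in", "the", "of"]))          # True
print(phrase_matches("propagating few errors of the".split(), doc, ["a", "in", "the", "of"]))  # False
print(phrase_matches("propagating errors".split(), doc, ["few"]))                              # False
print(phrase_matches("propagating of errors".split(), doc))                                    # False
```

The third call fails because no document tokens remain after "errors" to be matched by the trailing stop words, and the fourth fails because "few" is only a stop word when it appears in the query.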
The following expression uses the stop word list specified at the URL. Assuming that the specified stop word list contains the word "then", this query is reduced to a query on the phrase "planning X conducting", allowing any token as a substitute for X. It returns a book element, because its content element contains "planning then conducting". It would also return the book if the phrase "planning and conducting" or "planning before conducting" had been in its content:
doc("http://bstore1.example.com/full-text.xml") /books/book[.//content contains text "planning then conducting" using stop words at "http://bstore1.example.com/StopWordList.xml"]
The following expression returns book elements containing "planning then conducting", but does not return book elements containing "planning and conducting", since it is exempting "then" from being a stop word:
doc("http://bstore1.example.com/full-text.xml") /books/book[.//content contains text "planning then conducting" using stop words at "http://bstore1.example.com/StopWordList.xml" except ("the", "then")]
[Definition: An extension option is a match option that acts in an implementation-defined way. ]
[252] | FTExtensionOption | ::= | "option" EQName StringLiteral |
An extension option consists of an identifying QName and a StringLiteral. Typically, a particular option will be recognized by some implementations and not by others. The syntax is designed so that option declarations can be successfully parsed by all implementations.
The QName of an extension option must resolve to a namespace URI and local name, using the statically known namespaces.
Note:
There is no default namespace for options.
Each implementation recognizes an implementation-defined set of namespace URIs used to denote extension options.
If the namespace part of the QName is not a namespace recognized by the implementation as one used to denote extension options, then the extension option is ignored.
Otherwise, the effect of the extension option, including its error behavior, is implementation-defined. For example, if the local part of the QName is not recognized, or if the StringLiteral does not conform to the rules defined by the implementation for the particular extension option, the implementation may choose whether to report an error, ignore the extension option, or take some other action.
Implementations may impose rules on where particular extension options may appear relative to other match options, and the interpretation of an option declaration may depend on its position.
An extension option must not be used to change the syntax accepted by the processor, or to suppress the detection of static errors. However, it may be used without restriction to modify the set of tokens in the query or how they are matched against tokens in the text being searched. An extension option has the same scope as other match options.
The following examples illustrate several possible uses for extension options:
This extension option is set as part of the static context of all full-text expressions in the module and might be used to ensure that queries are insensitive to Arabic short-vowels.
declare namespace exq = "http://example.org/XQueryImplementation"; declare ft-option using option exq:diacritics "short-vowel insensitive";
This extension option applies only to the matching in the full-text selection in which it is found and might be used to specify how compound words should be matched.
declare namespace exq = "http://example.org/XQueryImplementation"; //para[. contains text ("Kinder" ftand "Platz" distance exactly 1 words) using stemming using option exq:compounds "distance=1" ]
Full-text selections can be combined with the logical connectives ftor (full-text or), ftand (full-text and), not in (mild not), and ftnot (unary full-text not).
[219] | FTOr | ::= | FTAnd ( "ftor" FTAnd )* |
[220] | FTAnd | ::= | FTMildNot ( "ftand" FTMildNot )* |
[221] | FTMildNot | ::= | FTUnaryNot ( "not" "in" FTUnaryNot )* |
[222] | FTUnaryNot | ::= | ("ftnot")? FTPrimaryWithOptions |
[Definition: An or-selection combines two full-text selections using the ftor operator.]
An or-selection finds all matches that satisfy at least one of the operand full-text selections.
The following expression returns the book element written by "Millicent":
//book[.//author contains text "Millicent" ftor "Voltaire"]
[Definition: An and-selection combines two full-text selections using the ftand operator.]
An and-selection finds matches that satisfy all of the operand full-text selections simultaneously. A match of an and-selection is formed by combining matches for each of the operand full-text selections as described in 4.2.6.2 FTAnd.
For example, "usability" ftand "testing" will find two matches in //book[@number="1"]/title: each of the two matches for the FTWords selection "usability" (the two occurrences of "usability" in the string value of the title element) is combined with the single match for the FTWords "testing" (only one occurrence of "testing" in the title).
Since the above and-selection has at least one match, the following expression will return "true".
//book[@number="1"]/title contains text ("usability" ftand "testing")
The following expression returns false, because "Millicent" and "Montana" are not contained by the same author element in any book element:
//book/author contains text "Millicent" ftand "Montana"
No author element in any book element contains both "Millicent" and "Montana". Therefore, for any such author element, there is either one match for the FTWords "Millicent" and zero matches for the FTWords "Montana", or vice versa, or no matches for either of them. In any of these cases, the and-selection will have zero matches.
[Definition: A mild-not selection combines two full-text selections using the not in operator.]
The not in operator is a milder form of the operator combination ftand ftnot. The selection A not in B matches a token sequence that matches A, but not when it is a part of a match of B. In contrast, A ftand ftnot B only finds matches when the token sequence contains A and does not contain B.
As an example, consider a search for "Mexico" not in "New Mexico". This may return, among others, a document which is all about "Mexico" but mentions at the end that "New Mexico was named after Mexico". The occurrence of "Mexico" in "New Mexico" is not considered, but other occurrences of "Mexico" are matched. Note that this document would not be matched by the full-text selection "Mexico" ftand ftnot "New Mexico".
A match to a mild-not selection must contain at least one token that satisfies the first condition and does not satisfy the second condition. If it contains a token that satisfies both the first and the second condition, the token is not considered as a match.
The following expression returns true, because "usability" appears in the title and the p elements and the token within the phrase "Usability Testing" in the title element is not considered:
/books/book contains text "usability" not in "usability testing"
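The "Mexico" not in "New Mexico" example can be sketched as a filter over token positions. This is a deliberately simplified illustration, not the formal semantics: it handles only phrase operands, uses zero-based positions and exact string equality, and ignores StringExcludes and the [err:FTDY0017] condition. The function names and the sample token sequence are hypothetical.

```python
def phrase_positions(tokens, phrase):
    """Return the list of position lists at which `phrase` occurs in `tokens`."""
    n = len(phrase)
    return [list(range(i, i + n))
            for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == phrase]

def mild_not(tokens, a, b):
    """Matches of phrase `a` that are not entirely part of a match of phrase `b`."""
    covered_by_b = {pos for match in phrase_positions(tokens, b) for pos in match}
    # Keep a match of `a` if at least one of its tokens lies outside every match of `b`.
    return [m for m in phrase_positions(tokens, a) if not set(m) <= covered_by_b]

tokens = "mexico is large new mexico was named after mexico".split()
print(mild_not(tokens, ["mexico"], ["new", "mexico"]))  # [[0], [8]]
```

The occurrence at position 4 is dropped because it is part of "new mexico", while the occurrences at positions 0 and 8 remain matches.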
If either operand of a mild-not selection yields an AllMatches that contains a Match that contains a StringExclude, then a dynamic error [err:FTDY0017] is raised.
Note:
This situation can arise if the operand contains a not-selection or a cardinality constraint (FTTimes) involving exactly, at most, or from ... to.
[Definition: A not-selection is a full-text selection starting with the prefix operator ftnot.]
A not-selection selects matches that do not satisfy the operand full-text selection. Details about how such matches are constructed are given in 4.2.6.3 FTUnaryNot.
The following expression returns the empty sequence, because all book elements contain "usability":
//book[. contains text ftnot "usability"]
The following expression returns true, because book elements contain "improving" and "usability" but not "improving usability":
//book contains text "improving" ftand "usability" ftand ftnot "improving usability"
The following expression returns book elements containing "web site usability" but not "usability testing":
//book[title/@shortTitle contains text "web site usability" ftand ftnot "usability testing"]
[231] | FTPosFilter | ::= | FTOrder | FTWindow | FTDistance | FTScope | FTContent |
[Definition: Positional filters are postfix operators that serve to filter matches based on various constraints on their positional information.]
Recall that the grammar rule for FTSelection allows an arbitrary number of positional filters to follow an FTOr. In a group of multiple adjacent positional filters, FTOrder filters are applied first, and then the other positional filters are applied from left to right, skipping the FTOrder filters. That is, the first filter is applied to the result of the FTOr, the second is applied to the result of that first application, and so on.
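The application order described above can be sketched as a reordering of the filters as written. The string representation of filters and the function name are hypothetical and serve only to illustrate the rule.

```python
def application_order(filters):
    """Return positional filters in the order they are applied.

    FTOrder filters ("ordered") come first; the remaining filters keep
    their left-to-right order as written in the query.
    """
    ordered = [f for f in filters if f == "ordered"]
    others = [f for f in filters if f != "ordered"]
    return ordered + others

print(application_order(["window 5 words", "ordered", "same sentence"]))
# ['ordered', 'window 5 words', 'same sentence']
```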
An FTOr consists of one or more FTAnds (separated by ftor), each of which could be an FTPosFilter applied to an embedded FTOr, enclosed in parentheses.
[232] | FTOrder | ::= | "ordered" |
[Definition: An ordered selection consists of a full-text selection followed by the postfix operator "ordered".] An ordered selection constrains the order of tokens and phrases to be the same as the order in which they are written in the operand selection.
The default is unordered. Unordered is in effect when ordered is not specified in the query. Unordered cannot be written explicitly in the query.
An ordered selection selects matches which satisfy the operand full-text selection and which also satisfy the following constraint: the order that the matching tokens or phrases have in the text being searched is the same order that the corresponding query tokens or phrases have in the operand selection. In both cases, the ordering is determined from the minimum start positions of the constituent tokens.
The following expression returns true, because titles of book elements contain "web site" and "usability" in the order in which they are written in the query, i.e., "web site" must precede "usability":
//book/title contains text ("web site" ftand "usability") ordered
The following expression returns false, because although "Montana" and "Millicent" both appear in the book element, they do not appear in the order they are written in the query:
//book[@number="1"] contains text ("Montana" ftand "Millicent") ordered
[233] | FTWindow | ::= | "window" AdditiveExpr FTUnit |
[235] | FTUnit | ::= | "words" | "sentences" | "paragraphs" |
[Definition: A window selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTWindow.]
A window selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases, more precisely the individual StringIncludes of that match, are found within a number of FTUnits (words, sentences, and paragraphs). The number of FTUnits is specified by an AdditiveExpr that is converted as though it were an argument to a function with the expected type of xs:integer.
A window selection may cross element boundaries. The size of the window is not affected by the presence or absence of element boundaries. Stop words are included in the computation of the window size whether they are ignored by the query or not.
A window selection examines the matches generated by the preceding portion of the FTSelection, and selects those for which the matched tokens and phrases (more precisely, the individual StringIncludes of that match) are all found within a window whose size is a specified number of FTUnits (words, sentences, or paragraphs); for each such window, the window selection then generates a match containing the merge of those StringIncludes, plus any StringExcludes that fall within the window.
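For the unit "words", the window constraint can be sketched as a check on the span of the matched token positions. This sketch assumes 1-based token positions and ignores StringExcludes as well as the "sentences" and "paragraphs" units. The function name and the sample token sequence are hypothetical.

```python
def fits_window(positions, size):
    """True if all matched token positions fall inside a window of `size` words."""
    return max(positions) - min(positions) + 1 <= size

tokens = "improving the usability of a web site".split()
# 1-based positions of the three query tokens: web=6, site=7, usability=3
matched = [tokens.index(t) + 1 for t in ("web", "site", "usability")]
print(fits_window(matched, 5))  # True
print(fits_window(matched, 3))  # False
```

The three tokens span positions 3 to 7, which is exactly 5 words, so a window of 5 words is satisfied and a window of 3 words is not.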
The following expression returns true, because "web", "site", and "usability" are within a window of 5 tokens in the title element:
/books/book/title contains text "web" ftand "site" ftand "usability" window 5 words
The following expression returns true, because "web" and "site" in the order they are written in the query and either "usability" or "testing" are within a window of at most 10 tokens:
/books/book contains text ("web" ftand "site" ordered) ftand ("usability" ftor "testing") window 10 words
The following expression returns false, because the instances of "web site" and "usability" in the title element are not within a window of 3 words. The phrase "Web Site Usability" in the attribute does not apply because the attribute is not part of the string value of the node. A similar query with a window of 5 words would return true.
/books/book//title contains text "web site" ftand "usability" window 3 words
The following expression returns the sample book element, because its number attribute is 1 and it contains a window of 2 words which contains an occurrence of "efficient" but not an occurrence of "and". There is just one such matching window in the sample text and it contains "enable efficient".
/books/book[@number="1" and . contains text "efficient" ftand ftnot "and" window 2 words]
The following expression returns the empty sequence, because in the selected book element, there is no occurrence of "efficient" within a window of 3 tokens which would not also contain an occurrence of "and":
/books/book[@number="1" and . contains text "efficient" ftand ftnot "and" window 3 words]
In order to allow meaningful results for nested positional filters, e.g., a window selection embedded inside a distance selection, the resulting matches for window selections are formed from the input matches that satisfy the window constraint as follows. All StringIncludes of such a match are coerced into a single StringInclude that spans all token positions from the smallest to the largest position of any input StringIncludes. This is explained in more detail in Section 3.6.3 Distance Selection.
[234] | FTDistance | ::= | "distance" FTRange FTUnit |
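[230] | FTRange | ::= | ("exactly" AdditiveExpr) | ("at" "least" AdditiveExpr) | ("at" "most" AdditiveExpr) | ("from" AdditiveExpr "to" AdditiveExpr) |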
[Definition: A distance selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTDistance.]
A distance selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases satisfy the specified distance conditions.
Distances in the search context are measured in units of tokens, sentences, or paragraphs. Roughly speaking, the distance between two matches is the number of intervening units, so a distance of zero tokens (sentences, paragraphs) means no intervening tokens (sentences, paragraphs). More precisely, given two matches, we first determine their order by sorting on starting position and if necessary on ending position. Let M1 be the "earlier" and M2 be the "later". (If there are overlapping tokens involved, the designations "earlier" and "later" may not be intuitively obvious.) Then the distance between the two is M2's starting position minus M1's ending position, minus 1.
When computing distances in the search context, a distance selection may cross element boundaries; they affect the distance computed only to the extent that they affect the tokenization of the search context. Stop words are counted in those computations whether they are ignored or not.
When a distance selection applies a distance condition to more than two matches, the distance condition is required to hold on each successive pair of matches.
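The distance computation and the successive-pair rule can be sketched as follows for the unit "words". Each match is represented as a hypothetical (start, end) pair of inclusive token positions; the function names and the use of infinities for unbounded limits are assumptions of this sketch.

```python
def distance(m1, m2):
    """Distance in tokens between two matches given as (start, end) positions."""
    # Sort on starting position and, if necessary, on ending position.
    earlier, later = sorted([m1, m2])
    return later[0] - earlier[1] - 1

def satisfies_distance(matches, low, high):
    """True if every successive pair of matches has a distance within [low, high]."""
    ordered = sorted(matches)
    return all(low <= distance(a, b) <= high
               for a, b in zip(ordered, ordered[1:]))

# Positions in "the usability of a web site": usability=2, web=5, site=6
usability, web, site = (2, 2), (5, 5), (6, 6)
print(distance(usability, web))   # 2
print(distance(web, site))        # 0
print(distance(usability, site))  # 3
print(satisfies_distance([usability, web, site], float("-inf"), 2))  # True
```

The last call corresponds to "distance at most 2 words": the pair "usability" and "site" is 3 tokens apart, but only successive pairs are checked, so the condition holds.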
An FTDistance expresses a distance condition in terms of an FTUnit and an FTRange. An FTUnit can be words, sentences, or paragraphs, where words refers to a distance measured in tokens. An FTRange specifies a range of integer values by providing a minimum and/or maximum value for some integer quantity. (Here, where the FTRange appears in an FTDistance, that quantity is a distance. When it appears in an FTTimes, the quantity is a number of occurrences.) Each AdditiveExpr specified in an FTRange is converted as though it were an argument to a function with the expected parameter type of xs:integer.
Let the value of the first (or only) operand be M. If "from" is specified, let the value of the second operand be N.
If "exactly" is specified, then the range is the closed interval [M, M]. If "at least" is specified, then the range is the half-closed interval [M, unbounded). If "at most" is specified, then the range is the half-closed interval (unbounded, M]. If "from-to" is specified, then the range is the closed interval [M, N]. Note: If M is greater than N, the range is empty.
Here are some examples of FTRanges:
'exactly 0' specifies the range [0, 0].
'at least 1' specifies the range [1,unbounded).
'at most 1' specifies the range (unbounded, 1].
'from 5 to 10' specifies the range [5, 10].
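The four forms of FTRange can be sketched as a mapping to closed or half-closed intervals. The function names are hypothetical, and the unbounded ends are represented here with math.inf as a convenience of the sketch.

```python
import math

def ft_range(kind, m, n=None):
    """Return the (low, high) bounds described by an FTRange."""
    if kind == "exactly":
        return (m, m)
    if kind == "at least":
        return (m, math.inf)
    if kind == "at most":
        return (-math.inf, m)
    if kind == "from":
        return (m, n)  # empty when m is greater than n
    raise ValueError(f"unknown FTRange form: {kind}")

def in_range(value, bounds):
    """True if `value` lies within the inclusive bounds."""
    low, high = bounds
    return low <= value <= high

print(ft_range("exactly", 0))                   # (0, 0)
print(in_range(7, ft_range("from", 5, 10)))     # True
print(in_range(7, ft_range("from", 10, 5)))     # False
```

The last call shows the empty range that results when M is greater than N: no value can satisfy it.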
The following expression returns false, because "completion" and "errors" are less than 11 tokens apart:
/books/book contains text ("completion" ftand "errors" distance at least 11 words)
The following expression returns true:
/books/book contains text "web" ftand "site" ftand "usability" distance at most 2 words
The search context contains two occurrences of the phrase "the usability of a web site" (once in the <title> and once in the <content>). In this phrase, the tokens "usability" and "web" have a distance of 2 words, and the tokens "web" and "site" have a distance of 0 words, both of which satisfy the constraint distance at most 2 words. (The tokens "usability" and "site" have a distance of 3 words, but this does not cause the distance filter to fail, because these are not successive matches.) Thus, the full-text selection yields two matches, and the whole expression yields true. (The phrase "Improving Web Site Usability" would also satisfy the given full-text selection, but in the sample document it occurs in an attribute value, and so does not contribute to the string value or the tokenization of the book element.)
The following expression returns the empty sequence, because between any token "usability" and the token in any occurrence of the phrase "web site" that is the nearest to the token "usability" there is always more than one intervening token:
/books/book[.//p contains text "web site" ftand "usability" distance at most 1 words]
The following expression returns the book title, because for the occurrences of the tokens "web" and "users" in the note element only one intervening token appears:
/books/book[. contains text "web" ftand "users" distance at most 1 words]/title
In order to allow meaningful results for nested positional filters, e.g., a distance selection embedded inside another distance selection, the resulting matches for distance selections are formed from the input matches that satisfy the distance constraint as follows. All StringIncludes of such a match are coerced into a single StringInclude that spans all token positions from the smallest to the largest position of any input StringIncludes. Thus, a distance selection that embeds a window or a distance selection takes the result of the embedded selection as a single unit.
The following gives an example of nested distance selections:
/books/book contains text ((("richard" ftand "nixon") distance at most 2 words) ftand (("george" ftand "bush") distance at most 2 words) distance at least 20 words)
This expression finds book elements that contain, for instance, "Richard M. Nixon" and "George W. Bush" at least 20 words apart. The matches for the inner distance selections are treated as single units (represented by StringIncludes) by the outer distance selection. If such phrases are present in the search context, the outer distance selection enforces a constraint on the number of intervening tokens ("at least 20") between the last token of "Richard M. Nixon" and the first token of "George W. Bush".
[236] | FTScope | ::= | ("same" | "different") FTBigUnit |
[237] | FTBigUnit | ::= | "sentence" | "paragraph" |
[Definition: A scope selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTScope.]
A scope selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases are contained in the same scope or in different scopes.
Possible scopes are sentences and paragraphs.
By default, there are no restrictions on the scope of the matches.
The following expression returns false, because the tokens "usability" and "Marigold" are not contained within the same sentence:
//book contains text "usability" ftand "Marigold" same sentence
The following expression returns true, because the tokens "usability" and "Marigold" are contained within different sentences:
//book contains text "usability" ftand "Marigold" different sentence
The following expression returns a book element, because it contains "usability" and "testing" in the same paragraph:
//book[. contains text "usability" ftand "testing" same paragraph]
The following expression returns a book element, because "site" and "errors" appear in the same sentence:
//book[. contains text "site" ftand "errors" same sentence]
It is possible that both "same sentence" and "different sentence" conditions are simultaneously satisfied for several tokens and/or phrases within the same document fragment. This can be observed if there are occurrences of the tokens and/or phrases both within the same sentence and within different sentences. For example, consider the following document fragment.
<introduction> ... The usability of a Web site is how well the site supports the user in achieving specified goals. ... Expert reviews and usability testing are methods of identifying problems in layout, terminology, and navigation. ... </introduction>
This sample satisfies both conditions ("usability" ftand "reviews") different sentence and ("usability" ftand "reviews") same sentence. The tokens "usability" and "reviews" occur both in different sentences (the first and second sentences shown) and in the same sentence (the second sentence shown).
The above observation also holds for the "same paragraph" and "different paragraph" conditions.
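The observation above can be checked with a small sketch. It is written in Python purely for illustration and is not part of this specification; the token list, the sentence numbers, and the function names are assumptions. It treats a scope condition as satisfied when at least one combination of occurrences of the searched tokens lies in one sentence ("same") or in pairwise distinct sentences ("different").

```python
from itertools import product

# Hypothetical tokenization of the shown fragment: (token, sentence number).
# Sentence 1: "The usability of a Web site is ..."
# Sentence 2: "Expert reviews and usability testing are ..."
TOKENS = [
    ("the", 1), ("usability", 1), ("of", 1), ("a", 1), ("web", 1), ("site", 1),
    ("expert", 2), ("reviews", 2), ("and", 2), ("usability", 2), ("testing", 2),
]


def sentences_of(word):
    """Sentence numbers of every occurrence of word."""
    return [sentence for token, sentence in TOKENS if token == word]


def scope_holds(words, kind):
    """True if some combination of occurrences satisfies the scope condition."""
    for combination in product(*(sentences_of(w) for w in words)):
        distinct = len(set(combination))
        if kind == "same" and distinct == 1:
            return True
        if kind == "different" and distinct == len(combination):
            return True
    return False
```

With this data, both `scope_holds(["usability", "reviews"], "same")` and `scope_holds(["usability", "reviews"], "different")` hold, because "usability" occurs in both sentences.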
[238] | FTContent | ::= | ("at" "start") | ("at" "end") | ("entire" "content") |
[Definition: An anchoring selection consists of a full-text selection followed by one of the postfix operators "at start", "at end", or "entire content".]
An anchoring selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases are the first, last, or all tokens in the tokenized form of the items being searched.
Using the "at start" operator, tokens or phrases are matched if they cover the first token position in the tokenized string value of the item being searched.
Using the "at end" operator, tokens or phrases are matched if they cover the last token position in the tokenized string value of the item being searched.
Using the "entire content" operator, tokens or phrases are matched if they cover all token positions of the tokenized string value of the item being searched.
The following expression returns each title
element starting with the
phrase "improving the usability of a web site":
/books//title[. contains text "improving the usability of a web site" at start]
The following expression returns the p
element of the sample,
because it ends with the phrase
"propagating few errors":
/books//p[. contains text "propagat.*" using wildcards ftand "few errors" distance at most 2 words at end]
Since the distance operator doesn't imply an ordering, the last example
would also yield a match if the p
element ended with, say,
"few errors are propagated".
The following expression returns each note
element whose entire content
is "this book has been approved by the web site users association":
/books//note[. contains text "this book has been approved by the web site users association" entire content]
The following example returns true because
both the content
and the note
elements match:
/books//* contains text "Association" at end
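The three anchoring conditions described above can be sketched as a check on token positions. This Python fragment is illustrative only and not part of this specification; the function name and the representation of a match as a set of covered positions are assumptions.

```python
def anchored(match_positions, item_positions, kind):
    """Check an anchoring condition.

    match_positions: token positions covered by the matched tokens or phrases.
    item_positions:  all token positions of the tokenized item being searched.
    """
    covered = set(match_positions)
    if kind == "at start":
        return min(item_positions) in covered
    if kind == "at end":
        return max(item_positions) in covered
    if kind == "entire content":
        return set(item_positions) <= covered
    raise ValueError("unknown anchoring operator: " + kind)


# A note element with 11 tokens:
# "this book has been approved by the web site users association"
NOTE_POSITIONS = range(1, 12)
```

For example, a match of "Association" at position 11 satisfies "at end" but neither "at start" nor "entire content", while a match of the whole phrase (positions 1 through 11) satisfies all three.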
[253] | FTIgnoreOption | ::= | "without" "content" UnionExpr
|
The ignore option specifies a set of nodes whose contents are ignored. It is applicable only to a top-level FTSelection (see FTContainsExpr). [Definition: Ignored nodes are the set of nodes whose content is ignored.] Ignored nodes are identified by the XQuery expression UnionExpr. The value of the UnionExpr must be a sequence of zero or more nodes; otherwise a type error is raised [err:XPTY0004]XP30.
Let I1, I2, ..., In be the sequence of items of the search context and let N1, N2, ..., Nk be the sequence of nodes that UnionExpr evaluates to. For each Ij (j=1..n) a copy is made that omits each node Ni (i=1..k). Those copies form the new search context. If UnionExpr evaluates to an empty sequence, no nodes are omitted.
In the following fragment, if $x//annotation is ignored, "Web Usability" will be found 2 times: once in the title element and once in the editor element. The 2 occurrences in the 2 annotation elements are ignored. On the other hand, "expert" will not be found, as it appears only in an annotation element.
let $x := <book> <title>Web Usability and Practice</title> <author>Montana <annotation> this author is an expert in Web Usability</annotation> Marigold </author> <editor>Véra Tudor-Medina on Web <annotation> best editor on Web Usability</annotation> Usability </editor> </book>
By default, no element content is ignored.
Note:
Nodes MAY be ignored during indexing and during query processing. The ignore option applies only to query processing. Whether and how indexing ignores nodes is out of scope for this specification.
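The effect of the ignore option on the book fragment above can be sketched as follows. The sketch is written in Python for illustration only; the helper names, the element-name based selection of ignored nodes, and the simple word tokenizer are assumptions, not part of this specification. It builds a copy of the search context that omits the ignored elements while keeping the text that follows them, then counts phrase occurrences in the copy.

```python
import copy
import re
import xml.etree.ElementTree as ET

BOOK = (
    "<book><title>Web Usability and Practice</title>"
    "<author>Montana <annotation> this author is an expert in Web Usability"
    "</annotation> Marigold </author>"
    "<editor>Véra Tudor-Medina on Web <annotation> best editor on Web Usability"
    "</annotation> Usability </editor></book>"
)


def without_content(root, tag):
    """Return a copy of root that omits every descendant element named tag."""
    clone = copy.deepcopy(root)
    for parent in list(clone.iter()):
        previous = None
        for child in list(parent):
            if child.tag == tag:
                # Keep the text that follows the omitted element (its tail).
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
            else:
                previous = child
    return clone


def count_phrase(root, phrase):
    """Count case-insensitive occurrences of phrase in the string value of root."""
    tokens = re.findall(r"\w+", "".join(root.itertext()).lower())
    words = phrase.lower().split()
    last_start = len(tokens) - len(words)
    return sum(tokens[i:i + len(words)] == words for i in range(last_start + 1))
```

With `root = ET.fromstring(BOOK)`, counting "Web Usability" in `without_content(root, "annotation")` gives 2 occurrences and counting "expert" gives none, in line with the description above.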
[Definition: An extension selection is a full-text selection whose semantics are implementation-defined.] Typically, a particular extension will be recognized by some implementations and not by others. The syntax is designed so that extension selections can be successfully parsed by all implementations, and so that fallback behavior can be defined for implementations that do not recognize a particular extension.
[227] | FTExtensionSelection | ::= |
Pragma+ "{" FTSelection? "}" |
[108] | Pragma | ::= | "(#" S? EQName (S
PragmaContents)? "#)" |
[109] | PragmaContents | ::= | (Char* - (Char* '#)' Char*)) |
An extension selection consists of one or more pragmas followed by a full-text selection enclosed in curly braces. See
Section
3.14 Extension ExpressionsXQ for information on
pragmas in general.
A pragma is denoted by the delimiters (# and #), and consists of an identifying QName followed by implementation-defined content.
The content of a pragma may consist of any string of characters that does not contain the ending delimiter #). The QName of a pragma must resolve to a namespace URI and local name, using the statically known namespaces.
Note:
Since there is no default namespace for pragmas, a pragma QName must include a namespace prefix.
Each implementation recognizes an implementation-defined set of namespace URIs used to denote pragmas.
If the namespace part of a pragma QName is not recognized by the implementation as a pragma namespace, then the pragma is ignored. If all the pragmas in an FTExtensionSelection are ignored, then the extension selection is just the full-text selection enclosed in curly braces; if this full-text selection is absent, then a static error is raised [err:XQST0079]XQ30.
If an implementation recognizes the namespace of one or more pragmas in an FTExtensionSelection, then the value of the FTExtensionSelection, including its error behavior, is implementation-defined. For example, an implementation that recognizes the namespace of a pragma QName, but does not recognize the local part of the QName, might choose either to raise an error or to ignore the pragma.
It is a static error [err:XQST0013]XQ30 if an implementation recognizes a pragma but determines that its content is invalid.
If an implementation recognizes a pragma, it must report any static errors in the following full-text selection even if it will not apply that selection.
The following examples illustrate three ways in which extension selections might be used.
A pragma can be used to furnish a hint for how to evaluate the following full-text selection, without actually changing the result. For example:
declare namespace exq = "http://example.org/XQueryImplementation"; /books/book/author[name contains text (# exq:use-index #) {'Berners-Lee'}]
An implementation that recognizes the exq:use-index
pragma might use an
index to evaluate the full-text selection that follows. An implementation that
does not recognize this pragma would evaluate the full-text selection in its normal
way.
A pragma might be used to modify the semantics of the following full-text selection in ways that would not (in the absence of the pragma) be conformant with this specification. For example, a pragma might be used to change distance counting so that adjacent words are at a distance of 1 (otherwise they would be at a distance of 0):
declare namespace exq = "http://example.org/XQueryImplementation"; /books/book[.//p contains text (# exq:distance #) { "web site" ftand "usability" distance at most 1 words }]
Such changes to the language semantics must be scoped to the expression contained within the curly braces following the pragma.
A pragma might contain syntactic constructs that are evaluated in place of the following full-text selection. In this case, the following selection itself (if it is present) provides a fallback for use by implementations that do not recognize the pragma. For example:
declare namespace exq = "http://example.org/XQueryImplementation"; //city[. contains text (# exq:classifier with class 'Animals' #) {"animal" using thesaurus at "http://example.org/thesaurus.xml" relationship "RT"}]
Here an implementation that recognizes the pragma will return the result of evaluating the proprietary syntax with class 'Animals', while an implementation that does not recognize the pragma will instead return the result of the thesaurus option.
If no fallback expression is required, or
if none is feasible, then the expression between the curly braces may be
omitted, in which case implementations that do not recognize the pragma will
raise a static error.
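The fallback rules for extension selections can be summarized in a short sketch. It is written in Python for illustration only and is not part of this specification; the set of recognized namespaces, the tuple representation of a pragma, and the two callback parameters are assumptions.

```python
class StaticError(Exception):
    """Stands in for a static error such as err:XQST0079."""


# Assumption: the pragma namespaces this hypothetical implementation recognizes.
RECOGNIZED_NAMESPACES = {"http://example.org/XQueryImplementation"}


def evaluate_extension(pragmas, fallback, run_extension, run_selection):
    """Evaluate an extension selection.

    pragmas:       list of (namespace URI, local name, content) tuples
    fallback:      the selection between the curly braces, or None if omitted
    run_extension: implementation-defined evaluation for recognized pragmas
    run_selection: ordinary evaluation of a full-text selection
    """
    recognized = [p for p in pragmas if p[0] in RECOGNIZED_NAMESPACES]
    if recognized:
        # Behavior, including error behavior, is implementation-defined here.
        return run_extension(recognized, fallback)
    if fallback is None:
        # Every pragma was ignored and there is nothing to fall back on.
        raise StaticError("err:XQST0079")
    return run_selection(fallback)
```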
This section describes the formal semantics of XQuery and XPath Full Text 3.1. The figure below shows how XQuery and XPath Full Text 3.1 integrates with XQuery 3.1 and XPath 3.1.
The following diagram represents the interaction of XQuery and XPath Full Text 3.1 with the rest of XQuery 3.1 and XPath 3.1. It illustrates how full-text expressions can be nested within XQuery 3.1 and XPath 3.1 expressions and vice versa.
Step 1 represents the composability of XQuery 3.1 and XPath 3.1 expressions and the fact that such expressions evaluate to a sequence of XDM items. This process is outside the scope of this document and will not be discussed further.
Step 2 shows how XQuery 3.1 and XPath 3.1 expressions can be nested within full-text expressions. If an XQuery 3.1 or XPath 3.1 expression is nested on the left-hand side of an FTContains expression or within FTWords, the XDM items that result from evaluating that expression are converted to their tokenized form, as described in Tokenization. If the XQuery 3.1 or XPath 3.1 expression is nested within another type of FTSelection, the items in its result sequence are converted to atomic values, as discussed in FTSelections.
Step 3 represents the composability of FTSelections. Each FTSelection operates on zero or more AllMatches and returns an AllMatches. The process is described in the Evaluation of FTSelections section.
Step 4 shows how XQuery and XPath Full Text 3.1 and scoring expressions can be nested into XQuery 3.1 and XPath 3.1 expressions. The sections 4.3 FTContainsExpr and 4.4 Scoring describe how this is achieved.
Note:
In the list above and throughout the rest of this section, bold typeface has been used to distinguish the concepts that are part of the AllMatches model.
The functions and schemas defined in this section are considered to be within the fts: namespace (as discussed in section 1.3 A word about namespaces). These functions and schemas are used only for describing the semantics. There is no requirement that an implementation of this specification must use the functions, schemas, or algorithms described in this section of this specification. The only requirement is that implementations must achieve the same results that an implementation that does use these functions, schemas, and algorithms would achieve.
Note that by using XQuery 3.1 and XPath 3.1 to specify the formal semantics, we avoid the need to introduce new formalism. We simply reuse the formal semantics of XQuery 3.1 and XPath 3.1.
[Definition: Formally, tokenization is the process of converting an XDM item to a collection of tokens, taking any structural information of the item into account to identify token, sentence, and paragraph boundaries. Each token is assigned a starting and ending position.]
Tokenization, including the definition of the term "token", SHOULD be implementation-defined. Implementations SHOULD expose the rules and sample results of tokenization as much as possible to enable users to predict and interpret the results of tokenization. Tokenization MUST conform to these constraints:
Each token MUST consist of one or more characters.
Tokenization of an item that is neither a map nor an array MUST include only tokens derived from the string value of that item. The string value is defined in [XQuery and XPath Data Model (XDM) 3.1] in Section 2.6.5 String ValuesDM; for element nodes it does not include the contents of attributes, but for attribute nodes it does.
Tokenization of a map or array item MUST include only tokens derived from the combined string value of all of the values contained in the map or array. Thus, for a map, the keys MUST NOT be included in tokenization. This rule is applied recursively for all nested map or array values. The following XQuery expresses this rule:
(: Suppose local:implementation-defined-tokens is the implementation-defined tokenizer for XDM items that are neither maps nor arrays. :)
declare function local:fulltext-tokens($context-item as item()) as xs:string* {
  typeswitch ($context-item)
    case map(*) return local:json-tokens($context-item)
    case array(*) return local:json-tokens($context-item)
    default return local:implementation-defined-tokens($context-item)
};
(: Tokenize every value of a map or every member of an array; map keys are skipped. :)
declare function local:json-tokens($json-doc as function(*)) as xs:string* {
  for $val in $json-doc?* return local:fulltext-tokens($val)
};
The tokenizer SHOULD, when tokenizing two equal items, identify the same tokens in each. The cases where it does not are implementation-defined.
The starting and ending position of a token MUST be integers, and the starting position MUST be less than or equal to the ending position.
In the tokenization of an item, consider the range of token positions from the smallest starting position to the largest ending position; every token position in that range must be covered by some token in the tokenization. That is, for every token position P, there must exist some token T such that T's starting position <= P <= T's ending position.
The tokenizer MUST preserve the containment hierarchy (paragraphs contain sentences contain tokens) by adhering to the following constraints:
Each token is contained in at most one sentence and at most one paragraph. (In particular, this means that no tokens of any sentence are contained in any other sentence, and no tokens of any paragraph are contained in any other paragraph.)
All tokens of a sentence are contained in at most one paragraph.
The range of token positions from the smallest starting position to the largest ending position in a sentence does not overlap with the token position range from any other sentence.
The range of token positions from the smallest starting position to the largest ending position in a paragraph does not overlap with the token position range from any other paragraph.
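The position and containment constraints above can be expressed as a simple checker. The following Python sketch is illustrative only and not part of this specification; it assumes a tokenizer that reports sentences and paragraphs and represents each token as a hypothetical (start, end, sentence, paragraph) tuple.

```python
def span_by(tokens, index):
    """Map each sentence (index 2) or paragraph (index 3) to its position range."""
    spans = {}
    for token in tokens:
        low, high = spans.get(token[index], (token[0], token[1]))
        spans[token[index]] = (min(low, token[0]), max(high, token[1]))
    return spans


def overlapping(spans):
    """True if any two position ranges overlap."""
    ordered = sorted(spans.values())
    return any(a[1] >= b[0] for a, b in zip(ordered, ordered[1:]))


def check_tokenization(tokens):
    """Return the list of violated constraints (empty if none)."""
    problems = []
    if not tokens:
        return problems
    if any(start > end for start, end, _, _ in tokens):
        problems.append("start after end")
    covered = set()
    for start, end, _, _ in tokens:
        covered.update(range(start, end + 1))
    low = min(t[0] for t in tokens)
    high = max(t[1] for t in tokens)
    if any(p not in covered for p in range(low, high + 1)):
        problems.append("uncovered position")
    paragraph_of = {}
    for _, _, sentence, paragraph in tokens:
        if paragraph_of.setdefault(sentence, paragraph) != paragraph:
            problems.append("sentence spans paragraphs")
            break
    if overlapping(span_by(tokens, 2)):
        problems.append("overlapping sentences")
    if overlapping(span_by(tokens, 3)):
        problems.append("overlapping paragraphs")
    return problems
```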
Useful information for tokenizer implementors may be found in [UAX29].
Note:
Usually, the starting and ending positions of a token are the same. For some languages, some tokenizers may identify overlapping tokens. For example, the German word "Donaudampfschifffahrtskapitaensmuetze" might be tokenized into the following tokens: "Donaudampfschifffahrtskapitaensmuetze", "Donau", "dampf", "schiff", "dampfschiff", "kapitaen", "muetze", "kapitaensmuetze", "schifffahrt", "dampfschifffahrt", and perhaps others. In the face of overlapping tokens, it is implementation-dependent what positions a tokenizer assigns to each such token. For example, a tokenizer might assign the same position value to each of the tokens "Donaudampfschifffahrtskapitaensmuetze", "Donau", "dampf", "schiff", "dampfschiff", etc. In that case, the distance between each (overlapping) token assigned the same position is -1. Tokenizers might retain additional information about those overlapping tokens that allows the full-text implementation to distinguish among them.
Consider the sentence "Ich sehe den Dampfschifffahrtskapitän auf dem Fluß." If an implementation tokenizes "Dampfschifffahrtskapitän" as overlapping tokens at the same position, then the implementation could still determine that the query "'Schifffahrt Dampf' window 0 words ordered" fails to match the sentence because phrase matching is implementation-defined and may make use of additional implementation-dependent token information.
Even more complex situations can arise. Consider, for example,
the German sentence "Er stellte sie vor." A sophisticated tokenizer
might construct the token "vorstellen" covering positions 2 through 4,
which overlaps the token "sie" at position 3. For the purposes of
distance calculations, tokens are considered in the order of their
starting positions, so the distance between "vorstellen" and
"sie" would be 3-4-1=-2. (See fts:wordDistance, below.)
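The arithmetic in these examples can be reproduced with a minimal sketch. It is written in Python for illustration only; it is not the normative definition of fts:wordDistance, and the function name and the (start, end) pair representation are assumptions. The two tokens are ordered by starting position, and the distance is the starting position of the later token minus the ending position of the earlier token, minus one.

```python
def word_distance(a, b):
    """Distance in words between two tokens given as (start, end) pairs."""
    first, second = sorted([a, b])  # order by starting position
    return second[0] - first[1] - 1


# "vorstellen" covers positions 2 through 4, "sie" is at position 3.
VORSTELLEN = (2, 4)
SIE = (3, 3)
```

Here `word_distance(VORSTELLEN, SIE)` evaluates 3 - 4 - 1, adjacent single-position tokens are at distance 0, and two tokens assigned the same position are at distance -1.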
For example, the following expression must return false, because the token "secret" occurs only within an attribute and a comment, neither of which contributes characters to the string value of the 'p' element node:
<p kind='secret'>Sensitive material <!-- secret --></p> contains text 'secret'
The following document may lead to overlapping tokens to account for the ambiguity caused by the hyphen:
<p>I will re- sign tomorrow.</p>
The following document fragment is the source document for examples in this section. A sample tokenization is used for the examples in this section. The results might be different for other tokenizations.
Unless stated otherwise, the results assume a case-insensitive match.
<offers> <offer id="1000" price="10000"> Ford Mustang 2000, 65K, excellent condition, runs great, AC, CC, power all </offer> <offer id="1001" price="8000"> Honda Accord 1999, 78K, A/C, cruise control, runs and looks great, excellent condition </offer> <offer id="1005" price="5500"> Ford Mustang, 1995, 150K highway mileage, no rust, excellent condition </offer> </offers>
In this sample tokenization, tokens are delimited by punctuation and whitespace symbols.
The token "Ford" is at relative position 1.
The token "Mustang" is at relative position 2.
The token "2000" is at relative position 3.
Relative position numbers are assigned sequentially through the end of the document.
Hence in this example each token occupies exactly one position, and no overlapping of tokens occurs. The relative positions of tokens are shown below in parentheses.
<offers> <offer id="1000" price="10000"> Ford(1) Mustang(2) 2000(3), 65K(4), excellent(5) condition(6), runs(7) great(8), AC(9), CC(10), power(11) all(12) </offer> <offer id="1001" price="8000"> Honda(13) Accord(14) 1999(15), 78K(16), A(17)/C(18), cruise(19) control(20), runs(21) and(22) looks(23) great(24), excellent(25) condition(26) </offer> <offer id="1005" price="5500"> Ford(27) Mustang(28), 1995(29), 150K(30) highway(31) mileage(32), no(33) rust(34), excellent(35) condition(36) </offer> </offers>
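The sample tokenization can be reproduced with a few lines of code. The following Python sketch is illustrative only and not part of this specification; the function name and the choice of a word-character regular expression as the delimiter rule are assumptions. It assigns relative token positions sequentially across the document and treats each offer element as one paragraph.

```python
import re
import xml.etree.ElementTree as ET

OFFERS = """<offers>
<offer id="1000" price="10000"> Ford Mustang 2000, 65K, excellent condition,
runs great, AC, CC, power all </offer>
<offer id="1001" price="8000"> Honda Accord 1999, 78K, A/C, cruise control,
runs and looks great, excellent condition </offer>
<offer id="1005" price="5500"> Ford Mustang, 1995, 150K highway mileage,
no rust, excellent condition </offer>
</offers>"""


def tokenize(xml_text):
    """Return (token, relative position, relative paragraph) triples."""
    root = ET.fromstring(xml_text)
    tokens = []
    position = 0
    for paragraph, offer in enumerate(root.iter("offer"), start=1):
        # Tokens are delimited by punctuation and whitespace;
        # attribute values are not part of the string value.
        for word in re.findall(r"\w+", "".join(offer.itertext())):
            position += 1
            tokens.append((word, position, paragraph))
    return tokens
```

Calling `tokenize(OFFERS)` yields 36 tokens, starting with `("Ford", 1, 1)` and `("Mustang", 2, 1)`; "A/C" is split into two tokens at positions 17 and 18.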
The relative positions of paragraphs are determined similarly. In this sample tokenization, the paragraph delimiters are start tags and end tags.
The tokens in the first 'offer' element are assigned relative paragraph number 1.
The tokens from the next 'offer' element are assigned relative paragraph number 2.
Relative paragraph numbers are assigned sequentially through the end of the document.
The relative positions of sentences are determined similarly using sentence delimiters.
Implementations may provide the means to ignore or side-step certain structural elements when performing tokenization. In the following example, the implementation has decided to ignore the markup for <bold> and prune out the entire subtree headed by <deleted>.
<para><deleted>This sentence was deleted.</deleted> This <bold>entire paragraph</bold> is one sentence as far as the tokenizer is concerned. </para>
Using the same notation as before, this sample tokenization is shown below. All the tokens marked with a token position also have the same sentence and paragraph relative positions. Note that there are no tokens marked for the ignored subtree.
<para><deleted>This sentence was deleted.</deleted> This(1) <bold>entire(2) paragraph(3)</bold> is(4) one(5) sentence(6) as(7) far(8) as(9) the(10) tokenizer(11) is(12) concerned(13). </para>
The following is a sample JSON document.
{ "offers" : [ { "id": 1000, "price" : 10000, "description" : "Ford Mustang 2000", "note" : null }, { "id": 1001, "price" : 11000, "description" : "Honda Accord 1999", "sold" : true } ] }
After being parsed by fn:parse-json, this JSON document is converted to the following map. The tokenization SHOULD be as follows.
map { "offers" : [ map { "id": 1000 (1), "price" : 10000 (2), "description" : "Ford (3) Mustang (4) 2000 (5)", "note" : () }, map { "id": 1001 (6), "price" : 11000 (7), "description" : "Honda (8) Accord (9) 1999 (10)", "sold" : fn:true() (11) } ] }
Note that the token for fn:true() is its string value "true", as defined by XDM.
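The recursive rule for maps and arrays can be mirrored in a short sketch. It is written in Python for illustration only and is not part of this specification; Python dictionaries and lists stand in for XDM maps and arrays, `None` stands in for the empty sequence produced by a JSON null, and the word tokenizer for atomic values is an assumption.

```python
import json
import re

DOCUMENT = """{ "offers" : [
  { "id": 1000, "price" : 10000, "description" : "Ford Mustang 2000", "note" : null },
  { "id": 1001, "price" : 11000, "description" : "Honda Accord 1999", "sold" : true }
] }"""


def fulltext_tokens(item):
    """Tokens of item in order; map keys never contribute tokens."""
    if isinstance(item, dict):
        return [t for value in item.values() for t in fulltext_tokens(value)]
    if isinstance(item, list):
        return [t for member in item for t in fulltext_tokens(member)]
    if item is None:
        return []  # a JSON null has no string value to tokenize
    if isinstance(item, bool):
        return ["true" if item else "false"]  # string value of a boolean
    return re.findall(r"\w+", str(item))
```

Applied to `json.loads(DOCUMENT)`, the sketch produces 11 tokens in the order shown above, ending with "true"; none of the keys "offers", "id", "price", "description", "note", or "sold" appears.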
[Definition: A QueryItem is a sequence of QueryTokenInfos representing the collection of tokens derived from tokenizing one query string. ]
[Definition: A QueryTokenInfo is the identity of a token inside a query string. ] Each QueryTokenInfo is associated with a position that captures the relative position of the query string in the query.
[Definition: A TokenInfo represents a contiguous collection of tokens from an XML document. ] Each TokenInfo is associated with:
startPos: the smallest starting position of a token in the sequence
endPos: the largest ending position of any token of the sequence
startSent: the relative position of the sentence containing the token with the smallest starting position, or zero if the tokenizer does not report sentences
endSent: the relative position of the sentence containing the token with the largest ending position, or zero if the tokenizer does not report sentences
startPara: the relative position of the paragraph containing the token with the smallest starting position, or zero if the tokenizer does not report paragraphs
endPara: the relative position of the paragraph containing the token with the largest ending position, or zero if the tokenizer does not report paragraphs
The following matching function is the central implementation-defined primitive performing the full-text retrieval.
declare function fts:matchTokenInfos ( $searchContext as item(), $matchOptions as element(fts:matchOptions), $stopWords as xs:string*, $queryTokens as element(fts:queryToken)* ) as element(fts:tokenInfo)* external;
The above function returns the TokenInfos in items in $searchContext that match the query string represented by the sequence $queryTokens, when using the match options in $matchOptions and stop words in $stopWords. If $queryTokens is a sequence of more than one query token, each returned TokenInfo must represent a phrase matching that sequence.
Note:
While this matching function assumes a tokenized representation of the query strings, it does not assume a tokenized representation of the input items in $searchContext, i.e. the texts being searched. Hence, the tokenization of the search context is implicit in this function and coupled to the retrieval of matches. Of course, this does not imply that tokenization of the search context cannot be done a priori.
The tokenization of each item in $searchContext does not necessarily take into account the match options in $matchOptions or the query tokens in $queryTokens. This allows implementations to tokenize and index input data without the knowledge of particular match options used in full-text queries.
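Because fts:matchTokenInfos is implementation-defined, any concrete code can only be one possible reading of it. The following Python sketch is illustrative only and not part of this specification; it assumes an already tokenized search context given as hypothetical (word, position, paragraph) triples, performs a case-insensitive comparison, ignores match options and stop words, and reports sentences as zero because this sample tokenizer does not identify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenInfo:
    startPos: int
    endPos: int
    startSent: int = 0  # zero: the tokenizer does not report sentences
    endSent: int = 0
    startPara: int = 0
    endPara: int = 0


def match_token_infos(tokens, query_tokens):
    """Return one TokenInfo per occurrence of the query tokens as a phrase."""
    wanted = [q.lower() for q in query_tokens]
    if not wanted:
        return []
    found = []
    for i in range(len(tokens) - len(wanted) + 1):
        window = tokens[i:i + len(wanted)]
        words = [word.lower() for word, _, _ in window]
        # A phrase must occupy consecutive token positions.
        consecutive = all(b[1] == a[1] + 1 for a, b in zip(window, window[1:]))
        if words == wanted and consecutive:
            first, last = window[0], window[-1]
            found.append(TokenInfo(first[1], last[1], 0, 0, first[2], last[2]))
    return found
```

For the query tokens "Ford" and "Mustang", each returned TokenInfo spans two positions, so startPos and endPos differ.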
The XQuery 3.1 and XPath 3.1 Data Model is inadequate to support fully composable FTSelections. Full-text operations, such as FTSelections, operate on linguistic units, such as positions of tokens, which are not captured in the XQuery 3.1 and XPath 3.1 Data Model (XDM).
XQuery and XPath Full Text adds relative token, sentence, and paragraph position numbers via AllMatches. AllMatches make FTSelections fully composable.
[Definition: An AllMatches describes the possible results of an FTSelection.] The UML Static Class diagram of AllMatches is shown on the diagram given below.
The AllMatches object contains zero or more Matches.
[Definition: Each Match describes one result to the FTSelection.] The result is described in terms of zero or more StringIncludes and zero or more StringExcludes.
[Definition: A StringMatch is a possible match of a sequence of query tokens with a corresponding sequence of tokens in a document. A StringMatch may be a StringInclude or StringExclude.] The queryPos attribute specifies the position of the query token in the query. This attribute is needed for FTOrders. The matched document token sequence is described in the TokenInfo associated with the StringMatch.
[Definition: A StringInclude is a StringMatch that describes a TokenInfo that must be contained in the document.]
[Definition: A StringExclude is a StringMatch that describes a TokenInfo that must not be contained in the document.]
Intuitively, AllMatches specifies the TokenInfos that a search context item contains and does not contain to satisfy an FTSelection.
The AllMatches structure resembles the Disjunctive Normal Form (DNF) in propositional and first-order logic. The AllMatches is a disjunction of Matches. Each Match is a conjunction of StringIncludes and StringExcludes.
Since in most of the examples below the tokens span only a single position, we characterize the TokenInfo instance by simply giving this position, written as "Pos:X". This should be read as the value for both the startPos and the endPos attributes. Furthermore, for expository reasons, we include in each StringMatch example an attribute "query string", set to the original query string, to make clear which query string each match came from.
The simplest example of an FTSelection is an FTWords such
as "Mustang"
. The
AllMatches corresponding to this FTWords is given below.
As shown, the AllMatches consists of two Matches. Each
Match represents one possible result of the FTWords
"Mustang"
. The result represented by the first
Match, represented as a StringInclude, contains the token
"Mustang" at position 2. The result described by the second Match
contains the token "Mustang" at position 28.
A more complex example of an FTSelection is an FTWords
such as "Ford Mustang"
. The AllMatches for this
FTWords is given below.
There are two possible results for this FTWords, and these are represented by the two Matches. Each of the Matches requires two tokens to be matched. The first Match is obtained by matching "Ford" at position 1 and matching "Mustang" at position 2. Similarly, the second Match is obtained by matching "Ford" at position 27 and "Mustang" at position 28.
An even more complex example of an FTSelection is "Mustang" ftand ftnot "rust", which searches for "Mustang" but not "rust". The AllMatches for this FTSelection is given below.
This example introduces StringExclude. StringExclude corresponds to negation in DNF (Disjunctive Normal Form). It specifies that the result described by the corresponding Match must not match the token at the specified position. In this example, the first Match specifies that "Mustang" is matched at position 2, and that the token "rust" at position 34 is not matched.
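The DNF reading of AllMatches can be made concrete with a small sketch. It is written in Python for illustration only and simplifies the model considerably: a StringMatch is a hypothetical (kind, query string, position) tuple, a Match is a set of StringMatches, an AllMatches is a list of Matches, and queryPos and the other TokenInfo attributes are left out. Conjunction is modeled as a cross product of Matches, and negation follows the usual rule for negating a disjunction of conjunctions.

```python
from itertools import product

FLIP = {"include": "exclude", "exclude": "include"}


def ft_words(word, positions):
    """AllMatches for a single token: one Match per occurrence."""
    return [frozenset({("include", word, p)}) for p in positions]


def ft_and(left, right):
    """Every pairing of a left Match with a right Match."""
    return [l | r for l, r in product(left, right)]


def ft_not(all_matches):
    """Negate a DNF: pick one StringMatch from each Match and invert it."""
    return [
        frozenset((FLIP[kind], word, pos) for kind, word, pos in choice)
        for choice in product(*all_matches)
    ]


# "Mustang" occurs at positions 2 and 28, "rust" at position 34.
MUSTANG = ft_words("Mustang", [2, 28])
RUST = ft_words("rust", [34])
RESULT = ft_and(MUSTANG, ft_not(RUST))
```

`RESULT` contains two Matches. The first includes "Mustang" at position 2 and excludes "rust" at position 34; the second includes "Mustang" at position 28 with the same exclusion.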
AllMatches has a well-defined hierarchical structure. Therefore, the AllMatches can be easily modeled in XML. This XML representation and those which follow formally describe the semantics of FTSelections. For example, the XML representation of AllMatches formally specifies how an FTSelection operates on zero or more AllMatches to produce a resulting AllMatches.
The XML schema for representing AllMatches is given below.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fts="http://www.w3.org/2007/xpath-full-text" targetNamespace="http://www.w3.org/2007/xpath-full-text" elementFormDefault="qualified" attributeFormDefault="unqualified"> <xs:complexType name="allMatches"> <xs:sequence> <xs:element ref="fts:match" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> <xs:attribute name="stokenNum" type="xs:integer" use="required" /> </xs:complexType> <xs:element name="allMatches" type="fts:allMatches"/> <xs:complexType name="match"> <xs:sequence> <xs:element ref="fts:stringInclude" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="fts:stringExclude" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:element name="stringInclude" type="fts:stringMatch" /> <xs:element name="stringExclude" type="fts:stringMatch" /> <xs:element name="match" type="fts:match"/> <xs:complexType name="stringMatch"> <xs:sequence> <xs:element ref="fts:tokenInfo"/> </xs:sequence> <xs:attribute name="queryPos" type="xs:integer" use="required"/> <xs:attribute name="isContiguous" type="xs:boolean" use="required"/> </xs:complexType> <xs:complexType name="tokenInfo"> <xs:attribute name="startPos" type="xs:integer" use="required"/> <xs:attribute name="endPos" type="xs:integer" use="required"/> <xs:attribute name="startSent" type="xs:integer" use="required"/> <xs:attribute name="endSent" type="xs:integer" use="required"/> <xs:attribute name="startPara" type="xs:integer" use="required"/> <xs:attribute name="endPara" type="xs:integer" use="required"/> </xs:complexType> <xs:element name="tokenInfo" type="fts:tokenInfo"/> <xs:complexType name="queryItem"> <xs:sequence> <xs:element ref="fts:queryToken" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:complexType name="queryTokenInfo"> <xs:attribute name="word" type="xs:string" use="required"/> <xs:attribute name="queryPos" type="xs:integer" use="required"/> </xs:complexType> <xs:element name="queryToken" 
type="fts:queryTokenInfo"/> </xs:schema>
The stokenNum attribute in AllMatches is related to the representation of the semantics as XQuery functions. Therefore, it is not considered part of the AllMatches model. The stokenNum attribute stores the number of query tokens used when evaluating the AllMatches. This value is used to compute the correct value for the queryPos attribute in new StringMatches.
FTSelections are fully composable and may be nested arbitrarily under other FTSelections. Each FTSelection may be associated with match options (such as stemming and stop words) and score weights. Since score weights are solely interpreted by the formal semantics scoring function, they do not influence the semantics of FTSelections. Therefore, score weights are not considered in the formal semantics.
The XML structures defined by the following schema
represent FTSelections
within the semantic functions of section 4 Semantics.
This representation is used for definitional purposes only
and should not be confused with
the XML representation for queries in Appendix E XML Syntax (XQueryX) for XQuery and XPath Full Text 3.1.
Every FTSelection is represented as an XML element. Every nested FTSelection is represented as a nested descendant element. For binary FTSelections, e.g., FTAnd, the nested FTSelections are represented in <left> and <right> descendant elements. For unary FTSelections, a <selection> descendant element is used. Additional characteristics of FTSelections, e.g., the distance unit for FTDistance, are stored in attributes.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fts="http://www.w3.org/2007/xpath-full-text" targetNamespace="http://www.w3.org/2007/xpath-full-text" elementFormDefault="qualified" attributeFormDefault="unqualified"> <xs:include schemaLocation="AllMatches.xsd" /> <xs:include schemaLocation="MatchOptions.xsd" /> <xs:complexType name="ftSelection"> <xs:sequence> <xs:choice> <xs:element name="ftWords" type="fts:ftWords"/> <xs:element name="ftAnd" type="fts:ftAnd"/> <xs:element name="ftOr" type="fts:ftOr"/> <xs:element name="ftUnaryNot" type="fts:ftUnaryNot"/> <xs:element name="ftMildNot" type="fts:ftMildNot"/> <xs:element name="ftOrder" type="fts:ftOrder"/> <xs:element name="ftScope" type="fts:ftScope"/> <xs:element name="ftContent" type="fts:ftContent"/> <xs:element name="ftDistance" type="fts:ftDistance"/> <xs:element name="ftWindow" type="fts:ftWindow"/> <xs:element name="ftTimes" type="fts:ftTimes"/> </xs:choice> <xs:element ref="fts:matchOptions" minOccurs="0"/> <xs:element name="weight" type="xs:double" minOccurs="0"/> </xs:sequence> </xs:complexType> <xs:element name="selection" type="fts:ftSelection"/> <xs:complexType name="ftWords"> <xs:sequence> <xs:element ref="fts:queryItem" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> <xs:attribute name="type" type="fts:ftWordsType" use="required"/> </xs:complexType> <xs:element name="queryItem" type="fts:queryItem"/> <xs:complexType name="ftAnd"> <xs:sequence> <xs:element name="left" type="fts:ftSelection"/> <xs:element name="right" type="fts:ftSelection"/> </xs:sequence> </xs:complexType> <xs:complexType name="ftOr"> <xs:sequence> <xs:element name="left" type="fts:ftSelection"/> <xs:element name="right" type="fts:ftSelection"/> </xs:sequence> </xs:complexType> <xs:complexType name="ftUnaryNot"> <xs:sequence> <xs:element name="selection" type="fts:ftSelection"/> </xs:sequence> </xs:complexType> <xs:complexType name="ftMildNot"> <xs:sequence> <xs:element name="left" type="fts:ftSelection"/> 
<xs:element name="right" type="fts:ftSelection"/> </xs:sequence> </xs:complexType> <xs:complexType name="ftOrder"> <xs:sequence> <xs:element name="selection" type="fts:ftSelection"/> </xs:sequence> </xs:complexType> <xs:complexType name="ftScope"> <xs:sequence> <xs:element name="selection" type="fts:ftSelection"/> </xs:sequence> <xs:attribute name="type" type="fts:scopeType" use="required"/> <xs:attribute name="scope" type="fts:scopeSelector" use="required"/> </xs:complexType> <xs:complexType name="ftContent"> <xs:sequence> <xs:element name="selection" type="fts:ftSelection"/> </xs:sequence> <xs:attribute name="type" type="fts:contentMatchType" use="required"/> </xs:complexType> <xs:complexType name="ftDistance"> <xs:sequence> <xs:element name="range" type="fts:ftRangeSpec"/> <xs:element name="selection" type="fts:ftSelection"/> </xs:sequence> <xs:attribute name="type" type="fts:distanceType" use="required"/> </xs:complexType> <xs:complexType name="ftWindow"> <xs:sequence> <xs:element name="selection" type="fts:ftSelection"/> </xs:sequence> <xs:attribute name="size" type="xs:integer" use="required"/> <xs:attribute name="type" type="fts:distanceType" use="required"/> </xs:complexType> <xs:complexType name="ftTimes"> <xs:sequence> <xs:element name="range" type="fts:ftRangeSpec"/> <xs:element name="selection" type="fts:ftWords"/> </xs:sequence> </xs:complexType> <xs:simpleType name="ftWordsType"> <xs:restriction base="xs:string"> <xs:enumeration value="any"/> <xs:enumeration value="all"/> <xs:enumeration value="phrase"/> <xs:enumeration value="any word"/> <xs:enumeration value="all word"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="scopeType"> <xs:restriction base="xs:string"> <xs:enumeration value="same"/> <xs:enumeration value="different"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="scopeSelector"> <xs:restriction base="xs:string"> <xs:enumeration value="paragraph"/> <xs:enumeration value="sentence"/> </xs:restriction> </xs:simpleType> 
<xs:simpleType name="distanceType"> <xs:restriction base="xs:string"> <xs:enumeration value="paragraph"/> <xs:enumeration value="sentence"/> <xs:enumeration value="word"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="contentMatchType"> <xs:restriction base="xs:string"> <xs:enumeration value="at start"/> <xs:enumeration value="at end"/> <xs:enumeration value="entire content"/> </xs:restriction> </xs:simpleType> </xs:schema>
The evaluate function
The semantics for the evaluation of FTSelections is defined using the fts:evaluate function. The function takes four parameters: 1) an FTSelection, 2) a search context item, 3) the default set of match options that apply to the evaluation of the FTSelection, and 4) the number of query tokens processed so far. The fts:evaluate function returns the AllMatches that is the result of evaluating the FTSelection. When fts:evaluate is applied to some FTSelection X, it calls the function fts:ApplyX to build the resulting AllMatches. If X is applied on nested FTSelections, the fts:evaluate function is recursively called on these nested FTSelections and the returned AllMatches are used in the evaluation of fts:ApplyX.
The semantics for the fts:evaluate function is given below.
declare function fts:evaluate (
      $ftSelection as element(*, fts:ftSelection),
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryTokenNum as xs:integer )
   as element(fts:allMatches)
{
   if (fn:count($ftSelection/fts:matchOptions) > 0) then
      (: First we deal with all match options that the    :)
      (: FTSelection might bear: we add the match options :)
      (: to the current match options structure, and      :)
      (: pass the new structure to the recursive call.    :)
      let $newFTSelection :=
         <fts:selection>{
            $ftSelection/*[fn:not(self::fts:matchOptions)]
         }</fts:selection>
      return fts:evaluate($newFTSelection,
                          $searchContext,
                          fts:replaceMatchOptions($matchOptions,
                                                  $ftSelection/fts:matchOptions),
                          $queryTokenNum)
   else if (fn:count($ftSelection/fts:weight) > 0) then
      (: Weight has no bearing on semantics -- just       :)
      (: call "evaluate" on nested FTSelection. The       :)
      (: remaining children are re-wrapped so that the    :)
      (: recursive call again receives an FTSelection.    :)
      let $newFTSelection :=
         <fts:selection>{
            $ftSelection/*[fn:not(self::fts:weight)]
         }</fts:selection>
      return fts:evaluate($newFTSelection,
                          $searchContext,
                          $matchOptions,
                          $queryTokenNum)
   else
      typeswitch ($ftSelection/*[1])
         case $nftSelection as element(fts:ftWords) return
            (: Apply the FTWords in the search context :)
            fts:ApplyFTWords($searchContext,
                             $matchOptions,
                             $nftSelection/@type,
                             $nftSelection/fts:queryItem,
                             $queryTokenNum + 1)
         case $nftSelection as element(fts:ftAnd) return
            let $left := fts:evaluate($nftSelection/fts:left,
                                      $searchContext,
                                      $matchOptions,
                                      $queryTokenNum)
            let $newQueryTokenNum := $left/@stokenNum
            let $right := fts:evaluate($nftSelection/fts:right,
                                       $searchContext,
                                       $matchOptions,
                                       $newQueryTokenNum)
            return fts:ApplyFTAnd($left, $right)
         case $nftSelection as element(fts:ftOr) return
            let $left := fts:evaluate($nftSelection/fts:left,
                                      $searchContext,
                                      $matchOptions,
                                      $queryTokenNum)
            let $newQueryTokenNum := $left/@stokenNum
            let $right := fts:evaluate($nftSelection/fts:right,
                                       $searchContext,
                                       $matchOptions,
                                       $newQueryTokenNum)
            return fts:ApplyFTOr($left, $right)
         case $nftSelection as element(fts:ftUnaryNot) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $queryTokenNum)
            return fts:ApplyFTUnaryNot($nested)
         case $nftSelection as element(fts:ftMildNot) return
            let $left := fts:evaluate($nftSelection/fts:left,
                                      $searchContext,
                                      $matchOptions,
                                      $queryTokenNum)
            let $newQueryTokenNum := $left/@stokenNum
            let $right := fts:evaluate($nftSelection/fts:right,
                                       $searchContext,
                                       $matchOptions,
                                       $newQueryTokenNum)
            return fts:ApplyFTMildNot($left, $right)
         case $nftSelection as element(fts:ftOrder) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $queryTokenNum)
            return fts:ApplyFTOrder($nested)
         case $nftSelection as element(fts:ftScope) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $queryTokenNum)
            return fts:ApplyFTScope($nftSelection/@type,
                                    $nftSelection/@scope,
                                    $nested)
         case $nftSelection as element(fts:ftContent) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $queryTokenNum)
            return fts:ApplyFTContent($searchContext,
                                      $nftSelection/@type,
                                      $nested)
         case $nftSelection as element(fts:ftDistance) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $queryTokenNum)
            return fts:ApplyFTDistance($nftSelection/@type,
                                       $nftSelection/fts:range,
                                       $nested)
         case $nftSelection as element(fts:ftWindow) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $queryTokenNum)
            return fts:ApplyFTWindow($nftSelection/@type,
                                     $nftSelection/@size,
                                     $nested)
         case $nftSelection as element(fts:ftTimes) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $queryTokenNum)
            return fts:ApplyFTTimes($nftSelection/fts:range,
                                    $nested)
         default return <fts:allMatches stokenNum="0" />
};
For concreteness, assume that the FTSelection was invoked inside a contains text expression such as searchContext contains text ftSelection. In order to determine the AllMatches result of ftSelection, the fts:evaluate function is invoked as follows: fts:evaluate($ftSelection, $searchContext, $matchOptions, 0), where $ftSelection is the XML representation of the ftSelection and $searchContext is bound to the result of the evaluation of the XQuery expression searchContext. Initially, $queryTokenNum is 0, i.e., no query tokens have been processed. The variable $matchOptions is bound to the list of match options as defined in the static context (see Appendix C Static Context Components). Match options embedded in $ftSelection modify the match options collection as evaluation proceeds.
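As an illustration of the XML representation that fts:evaluate receives, the following sketch shows how the FTSelection "Mustang" ftand "Honda" might be encoded according to the schema above. The fts:queryToken element is taken from the function signatures in this section; its word attribute is an assumption, since the included AllMatches.xsd is not reproduced here.

```xquery
declare namespace fts = "http://www.w3.org/2007/xpath-full-text";

(: Sketch of the XML representation of: "Mustang" ftand "Honda".  :)
(: The "word" attribute on fts:queryToken is assumed, not defined :)
(: in this section.                                               :)
<fts:selection>
  <fts:ftAnd>
    <fts:left>
      <fts:ftWords type="any">
        <fts:queryItem><fts:queryToken word="Mustang"/></fts:queryItem>
      </fts:ftWords>
    </fts:left>
    <fts:right>
      <fts:ftWords type="any">
        <fts:queryItem><fts:queryToken word="Honda"/></fts:queryItem>
      </fts:ftWords>
    </fts:right>
  </fts:ftAnd>
</fts:selection>
```

With this input, the typeswitch in fts:evaluate selects the fts:ftAnd case, evaluates fts:left and fts:right recursively, and hands both AllMatches to fts:ApplyFTAnd.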
Given the invocation fts:evaluate($ftSelection, $searchContext, $matchOptions, 0), evaluation proceeds as follows. First, $ftSelection is checked to see whether
1) it contains a match option,
2) it contains a weight specification,
3) it is an FTWords, or
4) none of the above hold.
If $ftSelection
contains one or more match options,
these are combined with the inherited match options
via a call to fts:replaceMatchOptions
(see 4.2.5 Match Options Semantics).
The evaluate
function is then invoked on the
nested FTSelection with the new set of match options,
and the result of that call is returned.
If $ftSelection
contains a weight
specification, then the specification is ignored because it
does not alter the semantics. The evaluate
function is recursively called on the nested FTSelection and the
resulting AllMatches is returned.
If $ftSelection
is an FTWords, then
it does
not have any nested FTSelections. Consequently, this is the base
of the recursive call, and the AllMatches result of the FTWords
is computed and returned. The AllMatches is computed by invoking
the ApplyFTWords
function with the current
search context and other necessary information.
If $ftSelection contains neither a match option nor a weight specification and is not an FTWords, the FTSelection performs a full-text operation, such as ftand, ftor, or window. These operations are fully compositional and may be invoked on nested FTSelections. Consequently, evaluation proceeds as follows.
First, the evaluate function is recursively invoked on each nested FTSelection. The result of evaluating each nested FTSelection is an AllMatches. These AllMatches are then transformed into the resulting AllMatches by applying the full-text operation corresponding to the enclosing FTSelection, called FTSelection1 here; in the code this operation is generically named ApplyX for an FTSelection of kind X.
For example, let FTSelection1 be FTSelection2 ftand FTSelection3. Here FTSelection2 and FTSelection3 may themselves be arbitrarily nested FTSelections. Thus, evaluate is invoked on FTSelection2 and FTSelection3, and the resulting AllMatches are transformed to the final AllMatches using the ApplyFTAnd function corresponding to ftand.
The semantics of the ApplyX
function for
each FTSelection kind X is given below.
An FTWords that consists of a single query string consisting of a sequence of tokens to be matched as a phrase is evaluated by the applyQueryTokensAsPhrase function. Its parameters are 1) the search context, 2) the list of match options, 3) the query string to be matched as a sequence of fts:queryToken items, and 4) the position where the latter query string occurs in the query.
(: simplified version not dealing with special match options :)
declare function fts:applyQueryTokensAsPhrase (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryTokens as element(fts:queryToken)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$queryPos}">
   {
      for $tokenInfo in
         fts:matchTokenInfos(
            $searchContext,
            $matchOptions,
            (),
            $queryTokens )
      return
         <fts:match>
            <fts:stringInclude queryPos="{$queryPos}" isContiguous="true">
               {$tokenInfo}
            </fts:stringInclude>
         </fts:match>
   }
   </fts:allMatches>
};
If, after the application of all the match options, the sequence of query tokens returned for an FTWords is empty, an empty AllMatches is returned.
The AllMatches corresponding to an FTWords is a set of Matches. Each of the Matches is associated with a starting and an ending position indicating where the corresponding query tokens were found. For example, the AllMatches result for the FTWords "Mustang" is given below. To simplify the presentation in the figures, we write Pos: N if the attributes startPos and endPos are the same, with N being that position.
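In XML form, such an AllMatches could look like the sketch below. The element structure follows the constructors used in applyQueryTokensAsPhrase; the fts:tokenInfo element name is an assumption (AllMatches.xsd is not reproduced in this section), the token positions 1 and 27 are invented for illustration, and any sentence or paragraph position attributes are omitted.

```xquery
declare namespace fts = "http://www.w3.org/2007/xpath-full-text";

(: Hypothetical AllMatches for the FTWords "Mustang", assuming the :)
(: token occurs at positions 1 and 27 of the search context.       :)
<fts:allMatches stokenNum="1">
  <fts:match>
    <fts:stringInclude queryPos="1" isContiguous="true">
      <fts:tokenInfo startPos="1" endPos="1"/>
    </fts:stringInclude>
  </fts:match>
  <fts:match>
    <fts:stringInclude queryPos="1" isContiguous="true">
      <fts:tokenInfo startPos="27" endPos="27"/>
    </fts:stringInclude>
  </fts:match>
</fts:allMatches>
```

Each occurrence of the query token yields one Match with a single StringInclude, so the number of Matches equals the number of occurrences.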
There are five variations of FTWords depending on how the tokens and phrases in the nested XQuery 3.1 and XPath 3.1 expression are matched.
When any word
is specified, at
least one token in the tokenization of the nested expression must be
matched.
When all word
is specified, all
tokens in the tokenization of the nested expression must be
matched.
When phrase
is specified, all
tokens in the tokenization of the nested expression must be
matched as a phrase.
When any
is specified, at least one
string atomic value in the nested expression must be
matched as a phrase.
When all
is specified, all
string atomic values in the nested expression must be
matched as a phrase.
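The five variations can be compared side by side in query syntax. The sketch below assumes a processor that supports the Full Text extension, uses an invented title as the search context, and describes each condition under a typical word tokenization; actual tokenization is implementation-defined.

```xquery
(: Invented search context, for illustration only. :)
let $title := <title>Ford Mustang and Honda Civic compared</title>
return (
  (: any word: at least one of Ford, Mustang, Honda, Accord must occur :)
  $title contains text {"Ford Mustang", "Honda Accord"} any word,

  (: all words: every one of Ford, Mustang, Honda, Accord must occur :)
  $title contains text {"Ford Mustang", "Honda Accord"} all words,

  (: phrase: the tokens Ford Mustang Honda Accord must occur as one phrase :)
  $title contains text {"Ford Mustang", "Honda Accord"} phrase,

  (: any: "Ford Mustang" or "Honda Accord" must occur as a phrase :)
  $title contains text {"Ford Mustang", "Honda Accord"} any,

  (: all: both "Ford Mustang" and "Honda Accord" must occur as phrases :)
  $title contains text {"Ford Mustang", "Honda Accord"} all
)
```

Note that the query syntax writes all words, whereas the XML representation in the schema above uses the enumeration value "all word".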
The semantics for FTWords when any word
is specified
is given below. Since FTWords
does not have nested FTSelections, the
ApplyFTWords
function does not take
AllMatches parameters corresponding to nested
FTSelection results.
declare function fts:MakeDisjunction (
      $curRes as element(fts:allMatches),
      $rest as element(fts:allMatches)* )
   as element(fts:allMatches)
{
   if (fn:count($rest) = 0) then
      $curRes
   else
      let $firstAllMatches := $rest[1]
      let $restAllMatches := fn:subsequence($rest, 2)
      let $newCurRes := fts:ApplyFTOr($curRes, $firstAllMatches)
      return fts:MakeDisjunction($newCurRes, $restAllMatches)
};

declare function fts:ApplyFTWordsAnyWord (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   (: Tokenization of query string has already occurred. :)
   (: Get sequence of QueryTokens over all query items.  :)
   let $queryTokens := $queryItems/fts:queryToken
   return
      if (fn:count($queryTokens) eq 0) then
         <fts:allMatches stokenNum="0" />
      else
         let $allAllMatches :=
            for $queryToken at $pos in $queryTokens
            return fts:applyQueryTokensAsPhrase($searchContext,
                                                $matchOptions,
                                                $queryToken,
                                                $queryPos + $pos - 1)
         let $firstAllMatches := $allAllMatches[1]
         let $restAllMatches := fn:subsequence($allAllMatches, 2)
         return fts:MakeDisjunction($firstAllMatches, $restAllMatches)
};
The tokenized query strings are passed to ApplyFTWordsAnyWord as a sequence of fts:queryItem elements, each containing the tokens of a single query string. A single flattened sequence of all tokens (of type fts:queryToken) over all query items is constructed. For each of these tokens, the result of FTWords is computed using applyQueryTokensAsPhrase. Finally, the disjunction of all resulting AllMatches is computed.
The semantics for FTWords when all word is specified is similar to the above, but composes a conjunction instead of a disjunction. It is given below.
declare function fts:MakeConjunction (
      $curRes as element(fts:allMatches),
      $rest as element(fts:allMatches)* )
   as element(fts:allMatches)
{
   if (fn:count($rest) = 0) then
      $curRes
   else
      let $firstAllMatches := $rest[1]
      let $restAllMatches := fn:subsequence($rest, 2)
      let $newCurRes := fts:ApplyFTAnd($curRes, $firstAllMatches)
      return fts:MakeConjunction($newCurRes, $restAllMatches)
};

declare function fts:ApplyFTWordsAllWord (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   (: Tokenization of query strings has already occurred. :)
   (: Get sequence of QueryTokens over all query items    :)
   let $queryTokens := $queryItems/fts:queryToken
   return
      if (fn:count($queryTokens) eq 0) then
         <fts:allMatches stokenNum="0" />
      else
         let $allAllMatches :=
            for $queryToken at $pos in $queryTokens
            return fts:applyQueryTokensAsPhrase($searchContext,
                                                $matchOptions,
                                                $queryToken,
                                                $queryPos + $pos - 1)
         let $firstAllMatches := $allAllMatches[1]
         let $restAllMatches := fn:subsequence($allAllMatches, 2)
         return fts:MakeConjunction($firstAllMatches, $restAllMatches)
};
The semantics for FTWords if phrase
is specified
is given below.
declare function fts:ApplyFTWordsPhrase (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   (: Get sequence of QueryTokenInfos over all query items :)
   let $queryTokens := $queryItems/fts:queryToken
   return
      if (fn:count($queryTokens) eq 0) then
         <fts:allMatches stokenNum="0" />
      else
         fts:applyQueryTokensAsPhrase($searchContext,
                                      $matchOptions,
                                      $queryTokens,
                                      $queryPos)
};
The ApplyFTWordsPhrase
function
also flattens the sequence of query items to a sequence of
query tokens, but then calls
applyQueryTokensAsPhrase
on that
entire sequence, instead of calling it on each query token
individually. Hence, the sequence of all query tokens is
matched as a single phrase and the computed TokenInfos
are returned.
The semantics for FTWords when any
is specified is
given below.
declare function fts:ApplyFTWordsAny (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   if (fn:count($queryItems) eq 0) then
      <fts:allMatches stokenNum="0" />
   else
      let $firstQueryItem := $queryItems[1]
      let $restQueryItem := fn:subsequence($queryItems, 2)
      let $firstAllMatches :=
         fts:ApplyFTWordsPhrase($searchContext,
                                $matchOptions,
                                $firstQueryItem,
                                $queryPos)
      let $newQueryPos :=
         if ($firstAllMatches//@queryPos) then
            fn:max($firstAllMatches//@queryPos) + 1
         else
            $queryPos
      let $restAllMatches :=
         fts:ApplyFTWordsAny($searchContext,
                             $matchOptions,
                             $restQueryItem,
                             $newQueryPos)
      return fts:ApplyFTOr($firstAllMatches, $restAllMatches)
};
The FTWords with any
specified forms the disjunction of the AllMatches that
are the result of the matching of each query item as a phrase.
The semantics for FTWords when all
is specified
is given below.
declare function fts:ApplyFTWordsAll (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   if (fn:count($queryItems) = 0) then
      <fts:allMatches stokenNum="0" />
   else
      let $firstQueryItem := $queryItems[1]
      let $restQueryItem := fn:subsequence($queryItems, 2)
      let $firstAllMatches :=
         fts:ApplyFTWordsPhrase($searchContext,
                                $matchOptions,
                                $firstQueryItem,
                                $queryPos)
      return
         if ($restQueryItem) then
            let $newQueryPos :=
               if ($firstAllMatches//@queryPos) then
                  fn:max($firstAllMatches//@queryPos) + 1
               else
                  $queryPos
            let $restAllMatches :=
               fts:ApplyFTWordsAll($searchContext,
                                   $matchOptions,
                                   $restQueryItem,
                                   $newQueryPos)
            return fts:ApplyFTAnd($firstAllMatches, $restAllMatches)
         else
            $firstAllMatches
};
The difference between all
and
any
is the use of conjunction instead of
disjunction.
The ApplyFTWords
function combines
all of these functions.
declare function fts:ApplyFTWords (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $type as fts:ftWordsType,
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   if ($type eq "any word") then
      fts:ApplyFTWordsAnyWord($searchContext,
                              $matchOptions,
                              $queryItems,
                              $queryPos)
   else if ($type eq "all word") then
      fts:ApplyFTWordsAllWord($searchContext,
                              $matchOptions,
                              $queryItems,
                              $queryPos)
   else if ($type eq "phrase") then
      fts:ApplyFTWordsPhrase($searchContext,
                             $matchOptions,
                             $queryItems,
                             $queryPos)
   else if ($type eq "any") then
      fts:ApplyFTWordsAny($searchContext,
                          $matchOptions,
                          $queryItems,
                          $queryPos)
   else
      fts:ApplyFTWordsAll($searchContext,
                          $matchOptions,
                          $queryItems,
                          $queryPos)
};
XQuery 3.1 functions are used to define the semantics of FTMatchOptions. These functions operate on an XML representation of the FTMatchOptions. The representation closely follows the syntax. Each FTMatchOption is represented by an XML element. Additional characteristics of the match option are represented as attributes. The schema is given below.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fts="http://www.w3.org/2007/xpath-full-text" targetNamespace="http://www.w3.org/2007/xpath-full-text" elementFormDefault="qualified" attributeFormDefault="unqualified"> <xs:complexType name="ftMatchOptions"> <xs:sequence> <xs:element ref="fts:thesaurus" minOccurs="0" maxOccurs="1"/> <xs:element ref="fts:stopwords" minOccurs="0" maxOccurs="1"/> <xs:element ref="fts:case" minOccurs="0" maxOccurs="1"/> <xs:element ref="fts:diacritics" minOccurs="0" maxOccurs="1"/> <xs:element ref="fts:stem" minOccurs="0" maxOccurs="1"/> <xs:element ref="fts:wildcard" minOccurs="0" maxOccurs="1"/> <xs:element ref="fts:language" minOccurs="0" maxOccurs="1"/> </xs:sequence> </xs:complexType> <xs:element name="matchOptions" type="fts:ftMatchOptions"/> <xs:element name="case" type="fts:ftCaseOption" /> <xs:element name="diacritics" type="fts:ftDiacriticsOption" /> <xs:element name="thesaurus" type="fts:ftThesaurusOption" /> <xs:element name="stem" type="fts:ftStemOption" /> <xs:element name="wildcard" type="fts:ftWildCardOption" /> <xs:element name="language" type="fts:ftLanguageOption" /> <xs:element name="stopwords" type="fts:ftStopWordOption" /> <xs:complexType name="ftCaseOption"> <xs:sequence> <xs:element name="value"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="case insensitive"/> <xs:enumeration value="case sensitive"/> <xs:enumeration value="lowercase"/> <xs:enumeration value="uppercase"/> </xs:restriction> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType> <xs:complexType name="ftDiacriticsOption"> <xs:sequence> <xs:element name="value"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="diacritics insensitive"/> <xs:enumeration value="diacritics sensitive"/> </xs:restriction> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType> <xs:complexType name="ftThesaurusOption"> <xs:sequence> <xs:element name="thesaurusName" type="xs:string" minOccurs="0" 
maxOccurs="1"/> <xs:element name="relationship" type="xs:string" minOccurs="0" maxOccurs="1"/> <xs:element name="range" type="fts:ftRangeSpec" minOccurs="0" maxOccurs="1"/> </xs:sequence> <xs:attribute name="thesaurusIndicator"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="using"/> <xs:enumeration value="no"/> </xs:restriction> </xs:simpleType> </xs:attribute> </xs:complexType> <xs:complexType name="ftRangeSpec"> <xs:attribute name="type" type="fts:rangeSpecType" use="required"/> <xs:attribute name="m" type="xs:integer"/> <xs:attribute name="n" type="xs:integer" use="required"/> </xs:complexType> <xs:simpleType name="rangeSpecType"> <xs:restriction base="xs:string"> <xs:enumeration value="exactly"/> <xs:enumeration value="at least"/> <xs:enumeration value="at most"/> <xs:enumeration value="from to"/> </xs:restriction> </xs:simpleType> <xs:complexType name="ftStemOption"> <xs:sequence> <xs:element name="value"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="stemming"/> <xs:enumeration value="no stemming"/> </xs:restriction> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType> <xs:complexType name="ftWildCardOption"> <xs:sequence> <xs:element name="value"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="wildcards"/> <xs:enumeration value="no wildcards"/> </xs:restriction> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType> <xs:complexType name="ftLanguageOption"> <xs:sequence> <xs:element name="value" type="xs:string"/> </xs:sequence> </xs:complexType> <xs:complexType name="ftStopWordOption"> <xs:sequence> <xs:choice> <xs:element name="default-stopwords"> <xs:complexType /> </xs:element> <xs:element name="stopword" type="xs:string" /> <xs:element name="uri" type="xs:anyURI" /> </xs:choice> <xs:element name="oper" minOccurs="0" maxOccurs="unbounded"> <xs:complexType> <xs:choice> <xs:element name="stopword" type="xs:string" /> <xs:element name="uri" type="xs:anyURI" /> 
</xs:choice> <xs:attribute name="type"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="union"/> <xs:enumeration value="except"/> </xs:restriction> </xs:simpleType> </xs:attribute> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:schema>
The previous section described FTSelections without giving any details about how FTMatchOptions need to be interpreted. All processing of FTMatchOptions was delegated to the function matchTokenInfos, which is implementation-defined. In this section, further details on the semantics of FTMatchOptions are given.
The extension is achieved by modifying an existing function and adding functions that are specific to the FTMatchOptions.
Modifications in the semantics of existing functions
The semantics of most of the FTSelections remains unmodified. The modifications are to the method for matching a sequence of query tokens.
declare function fts:applyQueryTokensAsPhrase (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryTokens as element(fts:queryToken)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   let $thesaurusOption := $matchOptions/fts:thesaurus[1]
   return
      if ($thesaurusOption and
          $thesaurusOption/@thesaurusIndicator eq "using") then
         let $noThesaurusOptions :=
            <fts:matchOptions>{
               $matchOptions/*[fn:not(self::fts:thesaurus)]
            }</fts:matchOptions>
         let $lookupRes := fts:applyThesaurusOption($thesaurusOption,
                                                    $noThesaurusOptions,
                                                    $queryTokens)
         return fts:ApplyFTWordsAny($searchContext,
                                    $noThesaurusOptions,
                                    $lookupRes,
                                    $queryPos)
      else
         (: from here on we have a single sequence of query tokens   :)
         (: which is to be matched as a phrase; no alternatives anymore :)
         <fts:allMatches stokenNum="{$queryPos}">
         {
            for $pos in
               fts:matchTokenInfos(
                  $searchContext,
                  $matchOptions,
                  fts:applyStopWordOption($matchOptions/fts:stopwords),
                  $queryTokens )
            return
               <fts:match>
                  <fts:stringInclude queryPos="{$queryPos}" isContiguous="true">
                     {$pos}
                  </fts:stringInclude>
               </fts:match>
         }
         </fts:allMatches>
};
Two FTMatchOptions need to be processed differently than the rest of the FTMatchOptions as shown in the function above.
Unlike all other FTMatchOptions, the semantics of the FTThesaurusOption cannot be formulated as an operation on individual query tokens, because a thesaurus lookup may return alternative query items for a whole phrase, i.e., a sequence of query tokens. Since the result of a thesaurus lookup is a sequence of alternatives, there must be a higher level of processing. The above call to applyThesaurusOption returns, for the given sequence of query tokens (representing a phrase), all thesaurus expansions for the selected thesaurus, relationship and level range as a sequence of query items. The alternative expansions are evaluated as a disjunction using the fts:ApplyFTWordsAny function. The matching of the alternatives is performed with FTThesaurusOption turned off to avoid double expansions, i.e., expansion of an already expanded token.
For the semantics of the FTStopWordOption the list of stop words needs to be computed as demanded by the special syntax for stop word lists involving the operators "union" and "except".
Semantics of new FTMatchOptions functions
The expansion of FTSelections also includes adding additional functions that are specific to the FTMatchOptions.
The evaluate function above handles match options occurring in the query structure by using a call to the function replaceMatchOptions, which is defined below. The latter function replaces match options from the list given by the first argument with match options of the same group in the list given by the second argument, if any. If an option is present in the second list but not in the first list, the option is included in the resulting list as well. Intuitively, the replaceMatchOptions function computes the effective match options for a given FTSelection. The function uses the options specified specifically for the current FTSelection ($ftSelection/fts:matchOptions) to override any options of the same group declared up the query tree ($matchOptions).
declare function fts:replaceMatchOptions (
      $matchOptions as element(fts:matchOptions),
      $newMatchOptions as element(fts:matchOptions) )
   as element(fts:matchOptions)
{
   <fts:matchOptions>
   {
      (if ($newMatchOptions/fts:thesaurus) then
          $newMatchOptions/fts:thesaurus
       else $matchOptions/fts:thesaurus),
      (if ($newMatchOptions/fts:stopwords) then
          $newMatchOptions/fts:stopwords
       else $matchOptions/fts:stopwords),
      (if ($newMatchOptions/fts:case) then
          $newMatchOptions/fts:case
       else $matchOptions/fts:case),
      (if ($newMatchOptions/fts:diacritics) then
          $newMatchOptions/fts:diacritics
       else $matchOptions/fts:diacritics),
      (if ($newMatchOptions/fts:stem) then
          $newMatchOptions/fts:stem
       else $matchOptions/fts:stem),
      (if ($newMatchOptions/fts:wildcard) then
          $newMatchOptions/fts:wildcard
       else $matchOptions/fts:wildcard),
      (if ($newMatchOptions/fts:language) then
          $newMatchOptions/fts:language
       else $matchOptions/fts:language)
   }
   </fts:matchOptions>
};
This function determines how match options of the same group overwrite each other, so that only one option of the same group remains.
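The override behaviour can be tried out in isolation with the following sketch. It is not the normative function: local:override is a hypothetical, condensed restatement of the same rule (for each option group, prefer the option of the current FTSelection, else keep the inherited one), and the option values are invented for illustration.

```xquery
declare namespace fts = "http://www.w3.org/2007/xpath-full-text";

(: Hypothetical helper restating the override rule per option group. :)
declare function local:override(
  $inherited as element(fts:matchOptions),
  $own as element(fts:matchOptions)
) as element(fts:matchOptions) {
  <fts:matchOptions>{
    for $group in ("thesaurus", "stopwords", "case", "diacritics",
                   "stem", "wildcard", "language")
    let $new := $own/fts:*[fn:local-name() eq $group]
    return
      if (fn:exists($new)) then $new
      else $inherited/fts:*[fn:local-name() eq $group]
  }</fts:matchOptions>
};

(: Options inherited from further up the query tree. :)
let $inherited :=
  <fts:matchOptions>
    <fts:case><fts:value>case insensitive</fts:value></fts:case>
    <fts:stem><fts:value>no stemming</fts:value></fts:stem>
  </fts:matchOptions>
(: Options attached to the current FTSelection. :)
let $own :=
  <fts:matchOptions>
    <fts:stem><fts:value>stemming</fts:value></fts:stem>
    <fts:language><fts:value>en</fts:value></fts:language>
  </fts:matchOptions>
return local:override($inherited, $own)
```

The result keeps the inherited fts:case option, replaces the inherited fts:stem option with the local "stemming" value, and adds the local fts:language option.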
The details of the semantics of the remaining FTMatchOptions are determined by the implementation-defined function matchTokenInfos.
FTMatchOption functions which are necessary to support match option processing are given below.
declare function fts:resolveStopWordsUri (
      $uri as xs:string? )
   as xs:string* external;

declare function fts:lookupThesaurus (
      $tokens as element(fts:queryToken)*,
      $thesaurusName as xs:string?,
      $relationship as xs:string?,
      $range as element(fts:range)?,
      $noThesaurusOptions as element(fts:matchOptions) )
   as element(fts:queryItem)* external;
The function resolveStopWordsUri
is used to resolve any URI to a sequence of strings to be
used as stop words.
The function lookupThesaurus finds all expansions related to $tokens in the thesaurus $thesaurusName using the relationship $relationship within the optional number of levels $range. If $tokens consists of more than one query token, it is regarded as a phrase. The current match options other than the thesaurus option are also passed to the function, via $noThesaurusOptions, allowing the implementation to apply any of those match options (whichever it deems relevant) to the input or output of the actual thesaurus lookup.
The thesaurus function returns a sequence of expansion alternatives. Each alternative is regarded as a new search phrase and is represented as a query item. Alternatives are treated as though they are connected with a disjunction (FTOr).
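For illustration, a lookup for the single query token "car" might return a sequence such as the one sketched below. The expansions are invented, the word attribute on fts:queryToken is an assumption, and whether the original token is itself included among the alternatives depends on the thesaurus implementation.

```xquery
declare namespace fts = "http://www.w3.org/2007/xpath-full-text";

(: Hypothetical return value of fts:lookupThesaurus for "car". :)
(: Each fts:queryItem is one alternative search phrase; the    :)
(: last alternative is a two-token phrase.                     :)
(
  <fts:queryItem><fts:queryToken word="car"/></fts:queryItem>,
  <fts:queryItem><fts:queryToken word="automobile"/></fts:queryItem>,
  <fts:queryItem>
    <fts:queryToken word="motor"/>
    <fts:queryToken word="vehicle"/>
  </fts:queryItem>
)
```

Passed to fts:ApplyFTWordsAny, each of these query items is matched as a phrase and the resulting AllMatches are combined with fts:ApplyFTOr.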
FTMatchOptions of type FTCaseOption are passed in the $matchOptions parameter to matchTokenInfos. If the FTCaseOption is "lowercase", the returned TokenInfos must span only tokens that are all lowercase. If the FTCaseOption is "uppercase", the returned TokenInfos must span only tokens that are all uppercase. If the FTCaseOption is "case insensitive", the function must return all TokenInfos matching the query tokens when disregarding character case. If the FTCaseOption is "case sensitive", the function must return all TokenInfos that also accord with the query tokens in character case.
FTMatchOptions of type FTDiacriticsOption are passed in the $matchOptions parameter to matchTokenInfos. If the FTDiacriticsOption is "diacritics insensitive", the function must return all TokenInfos matching the query tokens when disregarding diacritical marks. If the FTDiacriticsOption is "diacritics sensitive", the function must return all TokenInfos that also accord with the query tokens in diacritical marks.
FTMatchOptions of type FTStemOption are passed in the $matchOptions parameter to matchTokenInfos. The effect of the option "stemming" on matching tokens is implementation-defined; however, it is expected that this option allows linguistic variants of the query tokens to be matched. If the FTStemOption is "no stemming", the returned TokenInfos must span exact matches (i.e., not including linguistic variations) of the query tokens.
The semantics for the FTThesaurusOption is given below.
declare function fts:applyThesaurusOption (
      $matchOption as element(fts:thesaurus),
      $noThesaurusOptions as element(fts:matchOptions),
      $queryTokens as element(fts:queryToken)* )
   as element(fts:queryItem)*
{
   if ($matchOption/@thesaurusIndicator = "using") then
      fts:lookupThesaurus( $queryTokens,
                           $matchOption/fts:thesaurusName,
                           $matchOption/fts:relationship,
                           $matchOption/fts:range,
                           $noThesaurusOptions )
   else if ($matchOption/@thesaurusIndicator = "no") then
      <fts:queryItem>
         {$queryTokens}
      </fts:queryItem>
   else ()
};
Stop words interact with FTDistance and FTWindow. The semantics for the FTStopWordOption is given below.
declare function fts:applyStopWordOption (
      $stopWordOption as element(fts:stopwords)? )
   as xs:string*
{
   if ($stopWordOption) then
      let $swords :=
         typeswitch ($stopWordOption/*[1])
            case $e as element(fts:stopword) return
               $e/text()
            case $e as element(fts:uri) return
               fts:resolveStopWordsUri($e/text())
            case element(fts:default-stopwords) return
               fts:resolveStopWordsUri(())
            default return ()
      return fts:calcStopWords( $swords, $stopWordOption/fts:oper )
   else ()
};

declare function fts:calcStopWords (
      $stopWords as xs:string*,
      $opers as element(fts:oper)* )
   as xs:string*
{
   if ( fn:empty($opers) ) then
      $stopWords
   else
      let $swords :=
         typeswitch ($opers[1]/*[1])
            case $e as element(fts:stopword) return
               $e/text()
            case $e as element(fts:uri) return
               fts:resolveStopWordsUri($e/text())
            default return ()
      return
         (: The first operator has been consumed; recurse on :)
         (: all remaining operators (position 2 onwards).    :)
         if ($opers[1]/@type eq "union") then
            fts:calcStopWords( ($stopWords, $swords),
                               $opers[fn:position() gt 1] )
         else (: "except" :)
            fts:calcStopWords( $stopWords[fn:not(. = $swords)],
                               $opers[fn:position() gt 1] )
};
Given the applicable setting of the Stop Word Option, the function fts:applyStopWordOption calls fts:calcStopWords to compute the set of stop words, and returns that set as an instance of xs:string*. This set is then passed to fts:matchTokenInfos, which uses it to affect the matching of tokens. Both functions use the function fts:resolveStopWordsUri to resolve any URI to a sequence of strings.
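As a non-normative illustration, the following sketch shows how a chain of "union" and "except" operators is folded into a final stop word list. It is a simplified variant of fts:calcStopWords that handles inline stop word lists only; the namespace URI, the local: function name, and the sample words are assumptions made for this example.

```xquery
xquery version "3.1";
(: Placeholder namespace URI, assumed for this sketch only. :)
declare namespace fts = "http://example.org/fts-sketch";

(: Simplified variant of fts:calcStopWords: inline fts:stopword
   lists only, no URI resolution. :)
declare function local:calcStopWords(
  $stopWords as xs:string*,
  $opers as element(fts:oper)*
) as xs:string* {
  if (fn:empty($opers)) then $stopWords
  else
    let $swords := $opers[1]/fts:stopword/fn:string()
    return
      if ($opers[1]/@type eq "union")
      then local:calcStopWords(($stopWords, $swords), fn:tail($opers))
      else local:calcStopWords($stopWords[fn:not(. = $swords)], fn:tail($opers))
};

(: Start from ("a", "the"), add "of", then remove "the". :)
let $opers := (
  <fts:oper type="union"><fts:stopword>of</fts:stopword></fts:oper>,
  <fts:oper type="except"><fts:stopword>the</fts:stopword></fts:oper>
)
return local:calcStopWords(("a", "the"), $opers)
(: evaluates to "a", "of" :)
```

The operators are applied strictly from left to right, so a word removed by an "except" can be reintroduced by a later "union".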
FTMatchOptions of type FTWildCardOption are passed in the $matchOptions parameter to matchTokenInfos. If the FTWildCardOption is "wildcards", the function must return all TokenInfos in the search context that span tokens that are wildcard expansions of the corresponding query token. The wildcard expansions are described in Section 3.2.7 FTWildCardOption. If the FTWildCardOption is "no wildcards", all query tokens must be matched literally.
The parameters of the ApplyFTOr function are the two AllMatches parameters corresponding to the results of the two nested FTSelections. The semantics is given below.
declare function fts:ApplyFTOr (
    $allMatches1 as element(fts:allMatches),
    $allMatches2 as element(fts:allMatches) )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{fn:max(($allMatches1/@stokenNum,
                                      $allMatches2/@stokenNum))}">
    {$allMatches1/fts:match,$allMatches2/fts:match}
  </fts:allMatches>
};
The ApplyFTOr function creates a new AllMatches in which the Matches are the union of those found in the input AllMatches. Each Match represents one possible result of the corresponding FTSelection. Thus, a Match from either of the AllMatches is a result.
For example, consider the FTSelection "Mustang" ftor "Honda". The AllMatches corresponding to "Mustang" and "Honda" are given below.
The AllMatches produced by ApplyFTOr is given below.
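The union can also be traced with a small, non-normative sketch. It restates fts:ApplyFTOr over untyped elements; the namespace URI, the local: helper functions, and the token positions (1, 27, and 50) are assumptions chosen for illustration.

```xquery
xquery version "3.1";
(: Placeholder namespace URI, assumed for this sketch only. :)
declare namespace fts = "http://example.org/fts-sketch";

(: Hypothetical helper: a Match with one single-token StringInclude. :)
declare function local:match($queryPos as xs:integer, $pos as xs:integer)
  as element(fts:match) {
  <fts:match>
    <fts:stringInclude queryPos="{$queryPos}" isContiguous="true">
      <fts:tokenInfo startPos="{$pos}" endPos="{$pos}"/>
    </fts:stringInclude>
  </fts:match>
};

(: Untyped restatement of fts:ApplyFTOr. :)
declare function local:applyFTOr(
  $am1 as element(fts:allMatches),
  $am2 as element(fts:allMatches)
) as element(fts:allMatches) {
  <fts:allMatches
      stokenNum="{fn:max(($am1/@stokenNum, $am2/@stokenNum) ! xs:integer(.))}">
    { $am1/fts:match, $am2/fts:match }
  </fts:allMatches>
};

let $mustang :=
  <fts:allMatches stokenNum="1">{ local:match(1, 1), local:match(1, 27) }</fts:allMatches>
let $honda :=
  <fts:allMatches stokenNum="2">{ local:match(2, 50) }</fts:allMatches>
return
  local:applyFTOr($mustang, $honda)
    /fts:match/fts:stringInclude/fts:tokenInfo/@startPos ! fn:string(.)
(: evaluates to "1", "27", "50": one Match per input Match :)
```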
The parameters of the ApplyFTAnd function are the two AllMatches corresponding to the results of the two nested FTSelections. The semantics is given below.
declare function fts:ApplyFTAnd (
    $allMatches1 as element(fts:allMatches),
    $allMatches2 as element(fts:allMatches) )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{fn:max(($allMatches1/@stokenNum,
                                      $allMatches2/@stokenNum))}" >
    {
      for $sm1 in $allMatches1/fts:match
      for $sm2 in $allMatches2/fts:match
      return <fts:match> {$sm1/*, $sm2/*} </fts:match>
    }
  </fts:allMatches>
};
The result of the conjunction is a new AllMatches that contains the "Cartesian product" of the Matches of the participating FTSelections. Every resulting Match is formed by combining the StringInclude and StringExclude components of one Match from each of the nested FTSelections. Thus every resulting Match contains the positions needed to satisfy a Match from both original FTSelections and excludes the positions that would violate those same Matches.
For example, consider the FTSelection "Mustang" ftand "rust". The source AllMatches are given below.
The AllMatches produced by ApplyFTAnd is given below.
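The Cartesian product can be illustrated with the following non-normative sketch, which restates fts:ApplyFTAnd over untyped elements. The namespace URI, the local: helpers, and the token positions ("Mustang" at 1 and 27, "rust" at 30 and 80) are assumptions made for this example.

```xquery
xquery version "3.1";
(: Placeholder namespace URI, assumed for this sketch only. :)
declare namespace fts = "http://example.org/fts-sketch";

(: Hypothetical helper: a Match with one single-token StringInclude. :)
declare function local:match($queryPos as xs:integer, $pos as xs:integer)
  as element(fts:match) {
  <fts:match>
    <fts:stringInclude queryPos="{$queryPos}" isContiguous="true">
      <fts:tokenInfo startPos="{$pos}" endPos="{$pos}"/>
    </fts:stringInclude>
  </fts:match>
};

(: Untyped restatement of fts:ApplyFTAnd. :)
declare function local:applyFTAnd(
  $am1 as element(fts:allMatches),
  $am2 as element(fts:allMatches)
) as element(fts:allMatches) {
  <fts:allMatches
      stokenNum="{fn:max(($am1/@stokenNum, $am2/@stokenNum) ! xs:integer(.))}">
    {
      for $m1 in $am1/fts:match
      for $m2 in $am2/fts:match
      return <fts:match>{ $m1/*, $m2/* }</fts:match>
    }
  </fts:allMatches>
};

let $mustang :=
  <fts:allMatches stokenNum="1">{ local:match(1, 1), local:match(1, 27) }</fts:allMatches>
let $rust :=
  <fts:allMatches stokenNum="2">{ local:match(2, 30), local:match(2, 80) }</fts:allMatches>
for $m in local:applyFTAnd($mustang, $rust)/fts:match
return fn:string-join($m/fts:stringInclude/fts:tokenInfo/@startPos ! fn:string(.), "-")
(: evaluates to "1-30", "1-80", "27-30", "27-80" :)
```

Two Matches on each side yield four resulting Matches, each carrying one StringInclude from either operand.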
The ApplyFTUnaryNot function has one AllMatches parameter corresponding to the result of the nested FTSelection to be negated. The semantics is given below.
declare function fts:InvertStringMatch (
    $strm as element(*,fts:stringMatch) )
  as element(*,fts:stringMatch)
{
  if ($strm instance of element(fts:stringExclude)) then
    <fts:stringInclude queryPos="{$strm/@queryPos}"
                       isContiguous="{$strm/@isContiguous}">
      {$strm/fts:tokenInfo}
    </fts:stringInclude>
  else
    <fts:stringExclude queryPos="{$strm/@queryPos}"
                       isContiguous="{$strm/@isContiguous}">
      {$strm/fts:tokenInfo}
    </fts:stringExclude>
};

declare function fts:UnaryNotHelper (
    $matches as element(fts:match)* )
  as element(fts:match)*
{
  if (fn:empty($matches)) then <fts:match/>
  else
    for $sm in $matches[1]/*
    for $rest in fts:UnaryNotHelper( fn:subsequence($matches, 2) )
    return
      <fts:match>
        { fts:InvertStringMatch($sm), $rest/* }
      </fts:match>
};

declare function fts:ApplyFTUnaryNot (
    $allMatches as element(fts:allMatches) )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
    { fts:UnaryNotHelper($allMatches/fts:match) }
  </fts:allMatches>
};
The generation of the resulting AllMatches of an FTUnaryNot resembles the transformation of the negation of a propositional formula in DNF back to DNF. The negation of an AllMatches requires the inversion of all the StringMatches within the AllMatches. This inversion occurs as follows. The function fts:InvertStringMatch inverts a StringInclude into a StringExclude and vice versa. The function fts:UnaryNotHelper transforms the source Matches into the resulting Matches by choosing one StringInclude or StringExclude component from each source Match, inverting it, and combining the inverted components into a new Match; one resulting Match is formed for every such combination.
For example, consider the FTSelection ftnot ("Mustang" ftor "Honda"). The source AllMatches is given below:
The FTUnaryNot transforms the StringIncludes to StringExcludes as illustrated below.
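The transformation can be traced with the following non-normative sketch, an untyped restatement of fts:InvertStringMatch and fts:UnaryNotHelper. The namespace URI, the local: function names, and the token positions (1, 27, and 50) are assumptions made for this example.

```xquery
xquery version "3.1";
(: Placeholder namespace URI, assumed for this sketch only. :)
declare namespace fts = "http://example.org/fts-sketch";

(: Hypothetical helper: a Match with one single-token StringInclude. :)
declare function local:match($queryPos as xs:integer, $pos as xs:integer)
  as element(fts:match) {
  <fts:match>
    <fts:stringInclude queryPos="{$queryPos}" isContiguous="true">
      <fts:tokenInfo startPos="{$pos}" endPos="{$pos}"/>
    </fts:stringInclude>
  </fts:match>
};

(: Swap StringInclude and StringExclude, keeping attributes and TokenInfo. :)
declare function local:invert($sm as element()) as element() {
  if ($sm instance of element(fts:stringExclude))
  then <fts:stringInclude>{ $sm/@*, $sm/* }</fts:stringInclude>
  else <fts:stringExclude>{ $sm/@*, $sm/* }</fts:stringExclude>
};

(: One resulting Match per way of picking one component from each source Match. :)
declare function local:unaryNotHelper($matches as element(fts:match)*)
  as element(fts:match)* {
  if (fn:empty($matches)) then <fts:match/>
  else
    for $sm in $matches[1]/*
    for $rest in local:unaryNotHelper(fn:tail($matches))
    return <fts:match>{ local:invert($sm), $rest/* }</fts:match>
};

let $source :=
  <fts:allMatches stokenNum="2">{
    local:match(1, 1), local:match(1, 27), local:match(2, 50)
  }</fts:allMatches>
for $m in local:unaryNotHelper($source/fts:match)
return fn:string-join(
  (for $sm in $m/*
   return fn:local-name($sm) || "@" || $sm/fts:tokenInfo/@startPos),
  " ")
(: evaluates to "stringExclude@1 stringExclude@27 stringExclude@50" :)
```

Because every source Match here holds a single StringInclude, exactly one resulting Match is produced, and it excludes all three positions.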
The parameters of the ApplyFTMildNot function are the two AllMatches parameters corresponding to the results of the two nested FTSelections. The semantics is given below.
declare function fts:CoveredIncludePositions ( $match as element(fts:match) ) as xs:integer* { for $strInclude in $match/fts:stringInclude return $strInclude/fts:tokenInfo/@startPos to $strInclude/fts:tokenInfo/@endPos }; declare function fts:ApplyFTMildNot ( $allMatches1 as element(fts:allMatches), $allMatches2 as element(fts:allMatches) ) as element(fts:allMatches) { if (fn:count($allMatches1//fts:stringExclude) gt 0) then fn:error(fn:QName('http://www.w3.org/2005/xqt-errors', 'FTDY0017'), "Invalid expression on the left-hand side of a not-in") else if (fn:count($allMatches2//fts:stringExclude) gt 0) then fn:error(fn:QName('http://www.w3.org/2005/xqt-errors', 'FTDY0017'), "Invalid expression on the right-hand side of a not-in") else if (fn:count($allMatches2//fts:stringInclude) eq 0) then $allMatches1 else <fts:allMatches stokenNum="{$allMatches1/@stokenNum}"> { $allMatches1/fts:match[ every $matches2 in $allMatches2/fts:match satisfies let $posSet1 := fts:CoveredIncludePositions(.) let $posSet2 := fts:CoveredIncludePositions($matches2) return some $pos in $posSet1 satisfies fn:not($pos = $posSet2) ] } </fts:allMatches> };
The resulting AllMatches contains those Matches of the first operand whose StringInclude components cover, for every Match of the second operand, at least one token position that is not covered by the StringInclude components of that Match.
For example, consider the FTSelection ("Ford" not in "Ford Mustang"). The source AllMatches for the left-hand side argument is given below.
The source AllMatches for the right-hand side argument is given below.
The FTMildNot will transform these to an empty AllMatches because the TokenInfos at position 1 and position 27 in the first AllMatches are both covered by StringInclude components of the second AllMatches.
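The filtering step can be illustrated with the following non-normative sketch. It restates the core of fts:ApplyFTMildNot over untyped elements and omits the FTDY0017 error checks. The namespace URI and the local: names are assumptions, as is the third occurrence of "Ford" at position 40, which is added here to show a Match that survives.

```xquery
xquery version "3.1";
(: Placeholder namespace URI, assumed for this sketch only. :)
declare namespace fts = "http://example.org/fts-sketch";

(: Hypothetical helper: a Match with one StringInclude spanning $start to $end. :)
declare function local:match($start as xs:integer, $end as xs:integer)
  as element(fts:match) {
  <fts:match>
    <fts:stringInclude queryPos="1" isContiguous="true">
      <fts:tokenInfo startPos="{$start}" endPos="{$end}"/>
    </fts:stringInclude>
  </fts:match>
};

(: Token positions covered by the StringIncludes of a Match. :)
declare function local:covered($match as element(fts:match)) as xs:integer* {
  for $ti in $match/fts:stringInclude/fts:tokenInfo
  return xs:integer($ti/@startPos) to xs:integer($ti/@endPos)
};

(: Keep a left-hand Match only if, against every right-hand Match,
   it covers at least one position the right-hand Match does not. :)
declare function local:mildNot(
  $am1 as element(fts:allMatches),
  $am2 as element(fts:allMatches)
) as element(fts:match)* {
  $am1/fts:match[
    let $posSet1 := local:covered(.)
    return
      every $m2 in $am2/fts:match
      satisfies (some $pos in $posSet1
                 satisfies fn:not($pos = local:covered($m2)))
  ]
};

let $ford :=
  <fts:allMatches stokenNum="1">{
    local:match(1, 1), local:match(27, 27), local:match(40, 40)
  }</fts:allMatches>
let $fordMustang :=
  <fts:allMatches stokenNum="2">{
    local:match(1, 2), local:match(27, 28)
  }</fts:allMatches>
return
  local:mildNot($ford, $fordMustang)
    /fts:stringInclude/fts:tokenInfo/@startPos ! fn:string(.)
(: evaluates to "40": only the occurrence outside "Ford Mustang" remains :)
```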
The ApplyFTOrder function has one AllMatches parameter corresponding to the result of the nested FTSelections. The semantics is given below.
declare function fts:ApplyFTOrder ( $allMatches as element(fts:allMatches) ) as element(fts:allMatches) { <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> { for $match in $allMatches/fts:match where every $stringInclude1 in $match/fts:stringInclude, $stringInclude2 in $match/fts:stringInclude satisfies (($stringInclude1/fts:tokenInfo/@startPos <= $stringInclude2/fts:tokenInfo/@startPos) and ($stringInclude1/@queryPos <= $stringInclude2/@queryPos)) or (($stringInclude1/fts:tokenInfo/@startPos>= $stringInclude2/fts:tokenInfo/@startPos) and ($stringInclude1/@queryPos >= $stringInclude2/@queryPos)) return <fts:match> { $match/fts:stringInclude, for $stringExcl in $match/fts:stringExclude where every $stringIncl in $match/fts:stringInclude satisfies (($stringExcl/fts:tokenInfo/@startPos <= $stringIncl/fts:tokenInfo/@startPos) and ($stringExcl/@queryPos <= $stringIncl/@queryPos)) or (($stringExcl/fts:tokenInfo/@startPos >= $stringIncl/fts:tokenInfo/@startPos) and ($stringExcl/@queryPos >= $stringIncl/@queryPos)) return $stringExcl } </fts:match> } </fts:allMatches> };
The resulting AllMatches contains the Matches for which the starting positions in the StringInclude elements are in the order of the query positions of their query strings. StringExcludes that preserve the order (with respect to their starting positions) are also retained.
For example, consider the FTSelection ("great" ftand "condition") ordered. The source AllMatches is given below.
The AllMatches for FTOrder are given below.
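The ordering test can be traced with the following non-normative sketch, an untyped restatement of fts:ApplyFTOrder. The namespace URI, the local: names, and the token positions are assumptions made for this example.

```xquery
xquery version "3.1";
(: Placeholder namespace URI, assumed for this sketch only. :)
declare namespace fts = "http://example.org/fts-sketch";

(: Hypothetical helper: a single-token StringInclude. :)
declare function local:si($queryPos as xs:integer, $pos as xs:integer)
  as element(fts:stringInclude) {
  <fts:stringInclude queryPos="{$queryPos}" isContiguous="true">
    <fts:tokenInfo startPos="{$pos}" endPos="{$pos}"/>
  </fts:stringInclude>
};

(: True if start positions and query positions are ordered the same way. :)
declare function local:inOrder($a as element(), $b as element()) as xs:boolean {
  let $pa := xs:integer($a/fts:tokenInfo/@startPos),
      $pb := xs:integer($b/fts:tokenInfo/@startPos),
      $qa := xs:integer($a/@queryPos),
      $qb := xs:integer($b/@queryPos)
  return ($pa le $pb and $qa le $qb) or ($pa ge $pb and $qa ge $qb)
};

declare function local:applyFTOrder($am as element(fts:allMatches))
  as element(fts:match)* {
  for $m in $am/fts:match
  where every $s1 in $m/fts:stringInclude, $s2 in $m/fts:stringInclude
        satisfies local:inOrder($s1, $s2)
  return
    <fts:match>{
      $m/fts:stringInclude,
      $m/fts:stringExclude[
        every $si in $m/fts:stringInclude satisfies local:inOrder(., $si)
      ]
    }</fts:match>
};

(: "great" has query position 1, "condition" has query position 2. :)
let $source :=
  <fts:allMatches stokenNum="2">
    <fts:match>{ local:si(1, 10), local:si(2, 11) }</fts:match>
    <fts:match>{ local:si(1, 20), local:si(2, 5) }</fts:match>
  </fts:allMatches>
for $m in local:applyFTOrder($source)
return fn:string-join($m/fts:stringInclude/fts:tokenInfo/@startPos ! fn:string(.), "-")
(: evaluates to "10-11": the Match with "condition" before "great" is dropped :)
```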
The parameters of the ApplyFTScope function are 1) the type of the scope (same or different), 2) the linguistic unit (sentence or paragraph), and 3) one AllMatches parameter corresponding to the result of the nested FTSelections. The function definitions depend on the type of the scope (same, different) and the linguistic unit (sentence, paragraph).
The semantics of same sentence is given below.
declare function fts:ApplyFTScopeSameSentence ( $allMatches as element(fts:allMatches) ) as element(fts:allMatches) { <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> { for $match in $allMatches/fts:match where every $stringInclude1 in $match/fts:stringInclude, $stringInclude2 in $match/fts:stringInclude satisfies $stringInclude1/fts:tokenInfo/@startSent = $stringInclude2/fts:tokenInfo/@startSent and $stringInclude1/fts:tokenInfo/@startSent = $stringInclude1/fts:tokenInfo/@endSent and $stringInclude2/fts:tokenInfo/@startSent = $stringInclude2/fts:tokenInfo/@endSent and $stringInclude1/fts:tokenInfo/@startSent > 0 and $stringInclude2/fts:tokenInfo/@startSent > 0 return <fts:match> { $match/fts:stringInclude, for $stringExcl in $match/fts:stringExclude where $stringExcl/fts:tokenInfo/@startSent = 0 or ($stringExcl/fts:tokenInfo/@startSent = $stringExcl/fts:tokenInfo/@endSent and (every $stringIncl in $match/fts:stringInclude satisfies $stringIncl/fts:tokenInfo/@startSent = $stringExcl/fts:tokenInfo/@startSent) ) return $stringExcl } </fts:match> } </fts:allMatches> };
An AllMatches returned by the scope same sentence
contains those Matches whose StringIncludes span only a single
sentence and all span the same sentence. In these Matches only
those StringExcludes are retained that also only span a single
sentence, which is, in case there are StringIncludes in that Match,
the same as the one spanned by the StringIncludes.
The semantics of different sentence
is given below.
declare function fts:ApplyFTScopeDifferentSentence ( $allMatches as element(fts:allMatches) ) as element(fts:allMatches) { <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> { for $match in $allMatches/fts:match where count($match/fts:stringInclude) > 1 and ( every $stringInclude1 in $match/fts:stringInclude, $stringInclude2 in $match/fts:stringInclude satisfies $stringInclude1 is $stringInclude2 or ( ( $stringInclude1/fts:tokenInfo/@startSent != $stringInclude2/fts:tokenInfo/@startSent or $stringInclude1/fts:tokenInfo/@startSent != $stringInclude1/fts:tokenInfo/@endSent or $stringInclude2/fts:tokenInfo/@startSent != $stringInclude2/fts:tokenInfo/@endSent ) and $stringInclude1/fts:tokenInfo/@startSent > 0 and $stringInclude2/fts:tokenInfo/@endSent > 0 ) ) return <fts:match> { $match/fts:stringInclude, for $stringExcl in $match/fts:stringExclude where every $stringIncl in $match/fts:stringInclude satisfies ($stringIncl/fts:tokenInfo/@startSent != $stringExcl/fts:tokenInfo/@startSent or $stringIncl/fts:tokenInfo/@startSent != $stringIncl/fts:tokenInfo/@endSent or $stringExcl/fts:tokenInfo/@startSent != $stringExcl/fts:tokenInfo/@endSent ) and $stringIncl/fts:tokenInfo/@startSent > 0 and $stringExcl/fts:tokenInfo/@endSent > 0 return $stringExcl } </fts:match> } </fts:allMatches> };
An AllMatches returned by the scope different sentence
contains those Matches that have
at least two StringIncludes,
no two of which begin and end all in the same sentence.
In these Matches only those StringExcludes are retained that do not
conflict with any of the StringIncludes.
The semantics of same paragraph is analogous to same sentence and is given below.
declare function fts:ApplyFTScopeSameParagraph ( $allMatches as element(fts:allMatches) ) as element(fts:allMatches) { <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> { for $match in $allMatches/fts:match where every $stringInclude1 in $match/fts:stringInclude, $stringInclude2 in $match/fts:stringInclude satisfies $stringInclude1/fts:tokenInfo/@startPara = $stringInclude2/fts:tokenInfo/@startPara and $stringInclude1/fts:tokenInfo/@startPara = $stringInclude1/fts:tokenInfo/@endPara and $stringInclude2/fts:tokenInfo/@startPara = $stringInclude2/fts:tokenInfo/@endPara and $stringInclude1/fts:tokenInfo/@startPara > 0 and $stringInclude2/fts:tokenInfo/@endPara > 0 return <fts:match> { $match/fts:stringInclude, for $stringExcl in $match/fts:stringExclude where $stringExcl/fts:tokenInfo/@startPara = 0 or ($stringExcl/fts:tokenInfo/@startPara = $stringExcl/fts:tokenInfo/@endPara and (every $stringIncl in $match/fts:stringInclude satisfies $stringIncl/fts:tokenInfo/@startPara = $stringExcl/fts:tokenInfo/@startPara) ) return $stringExcl } </fts:match> } </fts:allMatches> };
The semantics of different paragraph is analogous to different sentence and is given below.
declare function fts:ApplyFTScopeDifferentParagraph ( $allMatches as element(fts:allMatches) ) as element(fts:allMatches) { <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> { for $match in $allMatches/fts:match where count($match/fts:stringInclude) > 1 and ( every $stringInclude1 in $match/fts:stringInclude, $stringInclude2 in $match/fts:stringInclude satisfies $stringInclude1 is $stringInclude2 or ( ( $stringInclude1/fts:tokenInfo/@startPara != $stringInclude2/fts:tokenInfo/@startPara or $stringInclude1/fts:tokenInfo/@startPara != $stringInclude1/fts:tokenInfo/@endPara or $stringInclude2/fts:tokenInfo/@startPara != $stringInclude2/fts:tokenInfo/@endPara ) and $stringInclude1/fts:tokenInfo/@startPara > 0 and $stringInclude2/fts:tokenInfo/@endPara > 0 ) ) return <fts:match> { $match/fts:stringInclude, for $stringExcl in $match/fts:stringExclude where every $stringIncl in $match/fts:stringInclude satisfies ($stringIncl/fts:tokenInfo/@startPara != $stringExcl/fts:tokenInfo/@startPara or $stringIncl/fts:tokenInfo/@startPara != $stringIncl/fts:tokenInfo/@endPara or $stringExcl/fts:tokenInfo/@startPara != $stringExcl/fts:tokenInfo/@endPara ) and $stringIncl/fts:tokenInfo/@startPara > 0 and $stringExcl/fts:tokenInfo/@endPara > 0 return $stringExcl } </fts:match> } </fts:allMatches> };
The semantics for the general case is given below.
declare function fts:ApplyFTScope (
    $type as fts:scopeType,
    $selector as fts:scopeSelector,
    $allMatches as element(fts:allMatches) )
  as element(fts:allMatches)
{
  if ($type eq "same" and $selector eq "sentence") then
    fts:ApplyFTScopeSameSentence($allMatches)
  else if ($type eq "different" and $selector eq "sentence") then
    fts:ApplyFTScopeDifferentSentence($allMatches)
  else if ($type eq "same" and $selector eq "paragraph") then
    fts:ApplyFTScopeSameParagraph($allMatches)
  else
    fts:ApplyFTScopeDifferentParagraph($allMatches)
};
For example, consider the FTSelection ("Mustang" ftand "Honda") same paragraph. The source AllMatches is given below.
The FTScope returns an empty AllMatches because neither Match contains TokenInfos from a single paragraph.
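The same paragraph test on StringIncludes can be illustrated with the following non-normative sketch. It condenses the pairwise comparison of fts:ApplyFTScopeSameParagraph into a check that all start and end paragraphs of the StringIncludes share one value greater than zero, and it leaves out the filtering of StringExcludes. The namespace URI, the local: names, and the positions and paragraph numbers are assumptions made for this example.

```xquery
xquery version "3.1";
(: Placeholder namespace URI, assumed for this sketch only. :)
declare namespace fts = "http://example.org/fts-sketch";

(: Hypothetical helper: a single-token StringInclude inside one paragraph. :)
declare function local:si($pos as xs:integer, $para as xs:integer)
  as element(fts:stringInclude) {
  <fts:stringInclude queryPos="1" isContiguous="true">
    <fts:tokenInfo startPos="{$pos}" endPos="{$pos}"
                   startPara="{$para}" endPara="{$para}"/>
  </fts:stringInclude>
};

(: Keep Matches whose StringIncludes all lie in one and the same paragraph. :)
declare function local:sameParagraph($am as element(fts:allMatches))
  as element(fts:match)* {
  for $m in $am/fts:match
  let $tis := $m/fts:stringInclude/fts:tokenInfo
  let $paras := fn:distinct-values(($tis/@startPara, $tis/@endPara) ! xs:integer(.))
  where fn:count($paras) le 1 and fn:not($paras = 0)
  return $m
};

(: "Mustang" in paragraphs 1 and 2, "Honda" in paragraph 3. :)
let $apart :=
  <fts:allMatches stokenNum="2">
    <fts:match>{ local:si(1, 1), local:si(50, 3) }</fts:match>
    <fts:match>{ local:si(27, 2), local:si(50, 3) }</fts:match>
  </fts:allMatches>
(: Both tokens in paragraph 3. :)
let $together :=
  <fts:allMatches stokenNum="2">
    <fts:match>{ local:si(48, 3), local:si(50, 3) }</fts:match>
  </fts:allMatches>
return (fn:count(local:sameParagraph($apart)),
        fn:count(local:sameParagraph($together)))
(: evaluates to 0, 1 :)
```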
The parameters of the ApplyFTContent function are 1) the search context, 2) the type of the content match (at start, at end, or entire content), and 3) one AllMatches parameter corresponding to the result of the nested FTSelections.
The evaluation of ApplyFTContent depends on the type of the content match:
entire content retains those Matches such that for every token position in the search context, some contiguous StringInclude in the Match covers that token position.
at start retains those Matches that contain a StringInclude that covers the lowest token position in the search context.
at end retains those Matches that contain a StringInclude that covers the highest token position in the search context.
The semantics is given below.
declare function fts:ApplyFTContent ( $searchContext as item(), $type as fts:contentMatchType, $allMatches as element(fts:allMatches) ) as element(fts:allMatches) { <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> { $allMatches/fts:match[ let $start_pos := fts:getLowestTokenPosition($searchContext), $end_pos := fts:getHighestTokenPosition($searchContext), $match := . return if ($type eq "entire content") then every $pos in $start_pos to $end_pos satisfies some $si in $match/fts:stringInclude[data(@isContiguous)] satisfies fts:TokenInfoCoversTokenPosition($si/fts:tokenInfo, $pos) else let $pos := if ($type eq "at start") then $start_pos else (: $type eq "at end" :) $end_pos return some $ti in $match/fts:stringInclude/fts:tokenInfo satisfies fts:TokenInfoCoversTokenPosition($ti, $pos) ] } </fts:allMatches> };
ApplyFTContent depends on the helper function fts:TokenInfoCoversTokenPosition, which ascertains whether the given $tokenInfo covers a particular $tokenPosition.
declare function fts:TokenInfoCoversTokenPosition(
    $tokenInfo as element(fts:tokenInfo),
    $tokenPosition as xs:integer )
  as xs:boolean
{
  ($tokenPosition >= $tokenInfo/@startPos) and
  ($tokenPosition <= $tokenInfo/@endPos)
};
ApplyFTContent also depends on two functions whose definitions are implementation-dependent: getLowestTokenPosition and getHighestTokenPosition return (respectively) the first and last token positions of the item $searchContext.
declare function fts:getLowestTokenPosition(
    $searchContext as item() )
  as xs:integer external;

declare function fts:getHighestTokenPosition(
    $searchContext as item() )
  as xs:integer external;
Note that the way @isContiguous is calculated in joinIncludes and used in ApplyFTContent can lead to counter-intuitive results. For example, consider the following query:
"one two three four" contains text ("one" ftand "three" window 3 words) ftand ("two" ftand "four" window 3 words) entire content
Even though the four query tokens do cover all of the search context's token positions, the query yields false, because the Match that ApplyFTContent receives as input has two StringIncludes, each of which is non-contiguous.
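This outcome can be reproduced with the following non-normative sketch of the "entire content" test. The namespace URI, the local: names, and the hand-built Matches are assumptions: the first Match models the two joined, non-contiguous StringIncludes described above, and the second models a single contiguous StringInclude for comparison.

```xquery
xquery version "3.1";
(: Placeholder namespace URI, assumed for this sketch only. :)
declare namespace fts = "http://example.org/fts-sketch";

(: Untyped restatement of fts:TokenInfoCoversTokenPosition. :)
declare function local:covers($ti as element(fts:tokenInfo), $pos as xs:integer)
  as xs:boolean {
  $pos ge xs:integer($ti/@startPos) and $pos le xs:integer($ti/@endPos)
};

(: "entire content": every position must be covered by a contiguous StringInclude. :)
declare function local:entireContent(
  $match as element(fts:match),
  $lo as xs:integer,
  $hi as xs:integer
) as xs:boolean {
  every $pos in $lo to $hi satisfies
    (some $si in $match/fts:stringInclude[@isContiguous = "true"]
     satisfies local:covers($si/fts:tokenInfo, $pos))
};

(: Two windows, each joined into a non-contiguous StringInclude. :)
let $windowed :=
  <fts:match>
    <fts:stringInclude queryPos="1" isContiguous="false">
      <fts:tokenInfo startPos="1" endPos="3"/>
    </fts:stringInclude>
    <fts:stringInclude queryPos="3" isContiguous="false">
      <fts:tokenInfo startPos="2" endPos="4"/>
    </fts:stringInclude>
  </fts:match>
(: One contiguous StringInclude over positions 1 to 4. :)
let $contiguous :=
  <fts:match>
    <fts:stringInclude queryPos="1" isContiguous="true">
      <fts:tokenInfo startPos="1" endPos="4"/>
    </fts:stringInclude>
  </fts:match>
return (local:entireContent($windowed, 1, 4),
        local:entireContent($contiguous, 1, 4))
(: evaluates to false, true :)
```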
Before we define the semantic functions of the FTWindow and FTDistance operations, we introduce the auxiliary function joinIncludes that will be used in their definitions. joinIncludes takes the sequence of StringIncludes of a Match and transforms it into either the empty sequence, if the input sequence is empty, or otherwise a single StringInclude representing the span from the first position of the Match to the last. So that an "entire content" operator further up in the tree can be evaluated, we pre-evaluate whether all positions between the first and the last are covered by the input StringIncludes and store that boolean in the attribute "isContiguous".
declare function fts:joinIncludes(
    $strIncls as element(fts:stringInclude)* )
  as element(fts:stringInclude)?
{
  if (fn:empty($strIncls))
  then $strIncls
  else
    let $posSet := fts:CoveredIncludePositions(
                     <fts:match>{$strIncls}</fts:match>),
        $minPos := fn:min($strIncls/fts:tokenInfo/@startPos),
        $maxPos := fn:max($strIncls/fts:tokenInfo/@endPos),
        (: Contiguous only if every position in the span is covered
           and every input StringInclude is itself contiguous :)
        $isContiguous :=
          ( every $pos in $minPos to $maxPos
            satisfies ($pos = $posSet) )
          and
          ( every $strIncl in $strIncls
            satisfies fn:data($strIncl/@isContiguous) )
    return
      <fts:stringInclude
          queryPos="{$strIncls[1]/@queryPos}"
          isContiguous="{$isContiguous}">
        <fts:tokenInfo
            startPos ="{$minPos}"
            endPos   ="{$maxPos}"
            startSent="{fn:min($strIncls/fts:tokenInfo/@startSent)}"
            endSent  ="{fn:max($strIncls/fts:tokenInfo/@endSent)}"
            startPara="{fn:min($strIncls/fts:tokenInfo/@startPara)}"
            endPara  ="{fn:max($strIncls/fts:tokenInfo/@endPara)}"/>
      </fts:stringInclude>
};
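As a non-normative illustration of the join, the following sketch restates joinIncludes over untyped elements, tracking token positions only (sentence and paragraph attributes are left out). The namespace URI, the local: names, and the sample positions are assumptions made for this example.

```xquery
xquery version "3.1";
(: Placeholder namespace URI, assumed for this sketch only. :)
declare namespace fts = "http://example.org/fts-sketch";

(: Hypothetical helper: a single-token, contiguous StringInclude. :)
declare function local:si($queryPos as xs:integer, $pos as xs:integer)
  as element(fts:stringInclude) {
  <fts:stringInclude queryPos="{$queryPos}" isContiguous="true">
    <fts:tokenInfo startPos="{$pos}" endPos="{$pos}"/>
  </fts:stringInclude>
};

(: Position-only restatement of fts:joinIncludes. :)
declare function local:joinIncludes($sis as element(fts:stringInclude)*)
  as element(fts:stringInclude)? {
  if (fn:empty($sis)) then ()
  else
    let $covered :=
      for $ti in $sis/fts:tokenInfo
      return xs:integer($ti/@startPos) to xs:integer($ti/@endPos)
    let $min := fn:min($covered), $max := fn:max($covered)
    let $contiguous :=
      (every $p in $min to $max satisfies $p = $covered)
      and (every $si in $sis satisfies $si/@isContiguous = "true")
    return
      <fts:stringInclude queryPos="{$sis[1]/@queryPos}" isContiguous="{$contiguous}">
        <fts:tokenInfo startPos="{$min}" endPos="{$max}"/>
      </fts:stringInclude>
};

for $joined in (
  local:joinIncludes((local:si(1, 1), local:si(2, 3))),  (: gap at position 2 :)
  local:joinIncludes((local:si(1, 1), local:si(2, 2)))   (: no gap :)
)
return $joined/fts:tokenInfo/@startPos || "-" || $joined/fts:tokenInfo/@endPos
       || " " || $joined/@isContiguous
(: evaluates to "1-3 false", "1-2 true" :)
```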
The parameters of the ApplyFTWindow function are 1) the unit of type fts:distanceType, 2) a size, and 3) one AllMatches parameter corresponding to the result of the nested FTSelections. For each unit type a function is defined as follows.
The semantics of window N words is given below.
declare function fts:ApplyFTWordWindow ( $allMatches as element(fts:allMatches), $n as xs:integer ) as element(fts:allMatches) { <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> { for $match in $allMatches/fts:match let $minpos := fn:min($match/fts:stringInclude/fts:tokenInfo/@startPos), $maxpos := fn:max($match/fts:stringInclude/fts:tokenInfo/@endPos) for $windowStartPos in ($maxpos - $n + 1 to $minpos) let $windowEndPos := $windowStartPos + $n - 1 return <fts:match> { fts:joinIncludes($match/fts:stringInclude), for $stringExclude in $match/fts:stringExclude where $stringExclude/fts:tokenInfo/@startPos >= $windowStartPos and $stringExclude/fts:tokenInfo/@endPos <= $windowEndPos return $stringExclude } </fts:match> } </fts:allMatches> };
The semantics of window N sentences
is given below.
declare function fts:ApplyFTSentenceWindow ( $allMatches as element(fts:allMatches), $n as xs:integer ) as element(fts:allMatches) { <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> { for $match in $allMatches/fts:match let $minpos := fn:min($match/fts:stringInclude/fts:tokenInfo/@startSent), $maxpos := fn:max($match/fts:stringInclude/fts:tokenInfo/@endSent) for $windowStartPos in ($maxpos - $n + 1 to $minpos) let $windowEndPos := $windowStartPos + $n - 1 return <fts:match> { fts:joinIncludes($match/fts:stringInclude), for $stringExclude in $match/fts:stringExclude where $stringExclude/fts:tokenInfo/@startSent >= $windowStartPos and $stringExclude/fts:tokenInfo/@endSent <= $windowEndPos return $stringExclude } </fts:match> } </fts:allMatches> };
The semantics of window N paragraphs
is given below.
declare function fts:ApplyFTParagraphWindow ( $allMatches as element(fts:allMatches), $n as xs:integer ) as element(fts:allMatches) { <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> { for $match in $allMatches/fts:match let $minpos := fn:min($match/fts:stringInclude/fts:tokenInfo/@startPara), $maxpos := fn:max($match/fts:stringInclude/fts:tokenInfo/@endPara) for $windowStartPos in ($maxpos - $n + 1 to $minpos) let $windowEndPos := $windowStartPos + $n - 1 return <fts:match> { fts:joinIncludes($match/fts:stringInclude), for $stringExclude in $match/fts:stringExclude where $stringExclude/fts:tokenInfo/@startPara >= $windowStartPos and $stringExclude/fts:tokenInfo/@endPara <= $windowEndPos return $stringExclude } </fts:match> } </fts:allMatches> };
The resulting AllMatches contains Matches of the operand that satisfy the condition that there exists a sequence of the specified number of consecutive (token, sentence, or paragraph) positions, such that all StringIncludes are within that window, and the StringExcludes retained are also within that window. For each Match that satisfies the window condition, the StringIncludes are joined into a single StringInclude. This enables further window or distance operations to treat the result as a single entity.
The semantics for the general function is given below.
declare function fts:ApplyFTWindow (
    $type as fts:distanceType,
    $size as xs:integer,
    $allMatches as element(fts:allMatches) )
  as element(fts:allMatches)
{
  if ($type eq "word") then
    fts:ApplyFTWordWindow($allMatches, $size)
  else if ($type eq "sentence") then
    fts:ApplyFTSentenceWindow($allMatches, $size)
  else
    fts:ApplyFTParagraphWindow($allMatches, $size)
};
For example, consider the FTWindow selection ("Ford Mustang" ftand "excellent") window 10 words. The Matches of the source AllMatches for ("Ford Mustang" ftand "excellent") are given below.
The result for the FTWindow selection consists of only the first, the fifth, and the sixth Matches because their respective window sizes are 5, 4, and 9.
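The window condition can be illustrated with the following non-normative sketch, which computes the candidate window start positions that fts:ApplyFTWordWindow iterates over; a Match is retained only if at least one such position exists. The namespace URI, the local: names, and the token positions are assumptions made for this example.

```xquery
xquery version "3.1";
(: Placeholder namespace URI, assumed for this sketch only. :)
declare namespace fts = "http://example.org/fts-sketch";

(: Hypothetical helper: a StringInclude spanning $start to $end. :)
declare function local:si($start as xs:integer, $end as xs:integer)
  as element(fts:stringInclude) {
  <fts:stringInclude queryPos="1" isContiguous="true">
    <fts:tokenInfo startPos="{$start}" endPos="{$end}"/>
  </fts:stringInclude>
};

(: Start positions of all windows of $n words that contain every StringInclude. :)
declare function local:windowStarts($match as element(fts:match), $n as xs:integer)
  as xs:integer* {
  let $tis := $match/fts:stringInclude/fts:tokenInfo
  let $min := xs:integer(fn:min($tis/@startPos ! xs:integer(.)))
  let $max := xs:integer(fn:max($tis/@endPos ! xs:integer(.)))
  return ($max - $n + 1) to $min
};

(: "Ford Mustang" at 1-2 with "excellent" at 5 (span of 5 words),
   and "Ford Mustang" at 1-2 with "excellent" at 28 (span of 28 words). :)
let $near := <fts:match>{ local:si(1, 2), local:si(5, 5) }</fts:match>
let $far  := <fts:match>{ local:si(1, 2), local:si(28, 28) }</fts:match>
return (fn:count(local:windowStarts($near, 10)),
        fn:count(local:windowStarts($far, 10)))
(: evaluates to 6, 0: only the first Match fits a window of 10 words :)
```

In the functions above, one resulting Match is emitted per candidate window, and each may retain a different set of StringExcludes.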
The parameters of the ApplyFTDistance function are 1) the unit of the distance (words, sentences, or paragraphs), 2) the range specified, and 3) one AllMatches parameter corresponding to the result of the nested FTSelections.
The resulting AllMatches contains Matches of the operand that satisfy the condition that the distance for every pair of consecutive StringIncludes is within the specified interval, where the distance is measured (in tokens, sentences, or paragraphs) from the end of the preceding StringInclude to the start of the next.
An invocation of the ApplyFTDistance function will call one of twelve helper functions, each of which handles a particular unit of distance and type of range.
declare function fts:ApplyFTDistance ( $type as fts:distanceType, $range as element(fts:range), $allMatches as element(fts:allMatches) ) as element(fts:allMatches) { if ($type eq "word") then if ($range/@type eq "exactly") then fts:ApplyFTWordDistanceExactly($allMatches, $range/@n) else if ($range/@type eq "at least") then fts:ApplyFTWordDistanceAtLeast($allMatches, $range/@n) else if ($range/@type eq "at most") then fts:ApplyFTWordDistanceAtMost( $allMatches, $range/@n) else fts:ApplyFTWordDistanceFromTo( $allMatches, $range/@m, $range/@n) else if ($type eq "sentence") then if ($range/@type eq "exactly") then fts:ApplyFTSentenceDistanceExactly($allMatches, $range/@n) else if ($range/@type eq "at least") then fts:ApplyFTSentenceDistanceAtLeast($allMatches, $range/@n) else if ($range/@type eq "at most") then fts:ApplyFTSentenceDistanceAtMost( $allMatches, $range/@n) else fts:ApplyFTSentenceDistanceFromTo( $allMatches, $range/@m, $range/@n) else if ($range/@type eq "exactly") then fts:ApplyFTParagraphDistanceExactly($allMatches, $range/@n) else if ($range/@type eq "at least") then fts:ApplyFTParagraphDistanceAtLeast($allMatches, $range/@n) else if ($range/@type eq "at most") then fts:ApplyFTParagraphDistanceAtMost( $allMatches, $range/@n) else fts:ApplyFTParagraphDistanceFromTo( $allMatches, $range/@m, $range/@n) };
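The four range types differ only in the comparison applied to a measured distance. The following non-normative sketch captures that comparison in a single predicate; it is not how the helper functions are organized. The namespace URI, the local:inRange function, and the sample ranges are assumptions made for this example.

```xquery
xquery version "3.1";
(: Placeholder namespace URI, assumed for this sketch only. :)
declare namespace fts = "http://example.org/fts-sketch";

(: Hypothetical predicate: does distance $d satisfy the given range? :)
declare function local:inRange($d as xs:integer, $range as element(fts:range))
  as xs:boolean {
  switch (fn:string($range/@type))
    case "exactly"  return $d eq xs:integer($range/@n)
    case "at least" return $d ge xs:integer($range/@n)
    case "at most"  return $d le xs:integer($range/@n)
    (: remaining case: from M to N :)
    default return $d ge xs:integer($range/@m) and $d le xs:integer($range/@n)
};

(
  local:inRange(3, <fts:range type="at most" n="5"/>),
  local:inRange(3, <fts:range type="from to" m="4" n="6"/>)
)
(: evaluates to true, false :)
```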
Word Distance
The semantics of word distance exactly N is given below.
declare function fts:ApplyFTWordDistanceExactly( $allMatches as element(fts:allMatches), $n as xs:integer ) as element(fts:allMatches) { <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> { for $match in $allMatches/fts:match let $sorted := for $si in $match/fts:stringInclude order by $si/fts:tokenInfo/@startPos ascending, $si/fts:tokenInfo/@endPos ascending return $si where if (fn:count($sorted) le 1) then fn:true() else every $idx in 1 to fn:count($sorted) - 1 satisfies fts:wordDistance( $sorted[$idx]/fts:tokenInfo, $sorted[$idx+1]/fts:tokenInfo ) = $n return <fts:match> { fts:joinIncludes($match/fts:stringInclude), for $stringExcl in $match/fts:stringExclude where some $stringIncl in $match/fts:stringInclude satisfies fts:wordDistance( $stringIncl/fts:tokenInfo, $stringExcl/fts:tokenInfo ) = $n return $stringExcl } </fts:match> } </fts:allMatches> };
The semantics of word distance at least N
is given
below.
declare function fts:ApplyFTWordDistanceAtLeast ( $allMatches as element(fts:allMatches), $n as xs:integer ) as element(fts:allMatches) { <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> { for $match in $allMatches/fts:match let $sorted := for $si in $match/fts:stringInclude order by $si/fts:tokenInfo/@startPos ascending, $si/fts:tokenInfo/@endPos ascending return $si where if (fn:count($sorted) le 1) then fn:true() else every $index in (1 to fn:count($sorted) - 1) satisfies fts:wordDistance( $sorted[$index]/fts:tokenInfo, $sorted[$index+1]/fts:tokenInfo ) >= $n return <fts:match> { fts:joinIncludes($match/fts:stringInclude), for $stringExcl in $match/fts:stringExclude where some $stringIncl in $match/fts:stringInclude satisfies fts:wordDistance( $stringIncl/fts:tokenInfo, $stringExcl/fts:tokenInfo ) >= $n return $stringExcl } </fts:match> } </fts:allMatches> };
The semantics of word distance at most N
is given
below.
declare function fts:ApplyFTWordDistanceAtMost ( $allMatches as element(fts:allMatches), $n as xs:integer ) as element(fts:allMatches) { <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> { for $match in $allMatches/fts:match let $sorted := for $si in $match/fts:stringInclude order by $si/fts:tokenInfo/@startPos ascending, $si/fts:tokenInfo/@endPos ascending return $si where if (fn:count($sorted) le 1) then fn:true() else every $index in (1 to fn:count($sorted) - 1) satisfies fts:wordDistance( $sorted[$index]/fts:tokenInfo, $sorted[$index+1]/fts:tokenInfo ) <= $n return <fts:match> { fts:joinIncludes($match/fts:stringInclude), for $stringExcl in $match/fts:stringExclude where some $stringIncl in $match/fts:stringInclude satisfies fts:wordDistance( $stringIncl/fts:tokenInfo, $stringExcl/fts:tokenInfo ) <= $n return $stringExcl } </fts:match> } </fts:allMatches> };
The semantics of word distance from M to N
is given
below.
declare function fts:ApplyFTWordDistanceFromTo ( $allMatches as element(fts:allMatches), $m as xs:integer, $n as xs:integer ) as element(fts:allMatches) { <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> { for $match in $allMatches/fts:match let $sorted := for $si in $match/fts:stringInclude order by $si/fts:tokenInfo/@startPos ascending, $si/fts:tokenInfo/@endPos ascending return $si where if (fn:count($sorted) le 1) then fn:true() else every $index in (1 to fn:count($sorted) - 1) satisfies fts:wordDistance( $sorted[$index]/fts:tokenInfo, $sorted[$index+1]/fts:tokenInfo ) >= $m and fts:wordDistance( $sorted[$index]/fts:tokenInfo, $sorted[$index+1]/fts:tokenInfo ) <= $n return <fts:match> { fts:joinIncludes($match/fts:stringInclude), for $stringExcl in $match/fts:stringExclude where some $stringIncl in $match/fts:stringInclude satisfies fts:wordDistance( $stringIncl/fts:tokenInfo, $stringExcl/fts:tokenInfo ) >= $m and fts:wordDistance( $stringIncl/fts:tokenInfo, $stringExcl/fts:tokenInfo ) <= $n return $stringExcl } </fts:match> } </fts:allMatches> };
The preceding four helper functions all rely on fts:wordDistance, which returns the number of token positions that occur between two TokenInfos. For example, two tokens with consecutive positions have a distance of 0 tokens, and two overlapping tokens have a distance of -1 tokens.
declare function fts:wordDistance (
    $tokenInfo1 as element(fts:tokenInfo),
    $tokenInfo2 as element(fts:tokenInfo) )
  as xs:integer
{
  (: Ensure tokens are in order :)
  let $sorted :=
    for $ti in ($tokenInfo1, $tokenInfo2)
    order by $ti/@startPos ascending, $ti/@endPos ascending
    return $ti
  return
    (: -1 because we count starting at 0 :)
    $sorted[2]/@startPos - $sorted[1]/@endPos - 1
};
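The distance values mentioned above can be checked with the following non-normative sketch, an untyped restatement of fts:wordDistance with explicit integer casts. The namespace URI, the local: names, and the sample positions are assumptions made for this example.

```xquery
xquery version "3.1";
(: Placeholder namespace URI, assumed for this sketch only. :)
declare namespace fts = "http://example.org/fts-sketch";

(: Hypothetical helper: a TokenInfo spanning $start to $end. :)
declare function local:ti($start as xs:integer, $end as xs:integer)
  as element(fts:tokenInfo) {
  <fts:tokenInfo startPos="{$start}" endPos="{$end}"/>
};

(: Untyped restatement of fts:wordDistance. :)
declare function local:wordDistance(
  $t1 as element(fts:tokenInfo),
  $t2 as element(fts:tokenInfo)
) as xs:integer {
  let $sorted :=
    for $ti in ($t1, $t2)
    order by xs:integer($ti/@startPos), xs:integer($ti/@endPos)
    return $ti
  return xs:integer($sorted[2]/@startPos) - xs:integer($sorted[1]/@endPos) - 1
};

(
  local:wordDistance(local:ti(3, 3), local:ti(4, 4)),  (: consecutive tokens :)
  local:wordDistance(local:ti(7, 7), local:ti(3, 3)),  (: argument order does not matter :)
  local:wordDistance(local:ti(3, 3), local:ti(3, 3))   (: overlapping tokens :)
)
(: evaluates to 0, 3, -1 :)
```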
Sentence Distance
The semantics of sentence distance exactly N
is given below.
declare function fts:ApplyFTSentenceDistanceExactly ( $allMatches as element(fts:allMatches), $n as xs:integer ) as element(fts:allMatches) { <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> { for $match in $allMatches/fts:match let $sorted := for $si in $match/fts:stringInclude order by $si/fts:tokenInfo/@startSent ascending, $si/fts:tokenInfo/@endSent ascending return $si where if (fn:count($sorted) le 1) then fn:true() else every $index in (1 to fn:count($sorted) - 1) satisfies fts:sentenceDistance( $sorted[$index]/fts:tokenInfo, $sorted[$index+1]/fts:tokenInfo ) = $n return <fts:match> { fts:joinIncludes($match/fts:stringInclude), for $stringExcl in $match/fts:stringExclude where some $stringIncl in $match/fts:stringInclude satisfies fts:sentenceDistance( $stringIncl/fts:tokenInfo, $stringExcl/fts:tokenInfo ) = $n return $stringExcl } </fts:match> } </fts:allMatches> };
The semantics of sentence distance at least N
is given below.
declare function fts:ApplyFTSentenceDistanceAtLeast ( $allMatches as element(fts:allMatches), $n as xs:integer ) as element(fts:allMatches) { <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> { for $match in $allMatches/fts:match let $sorted := for $si in $match/fts:stringInclude order by $si/fts:tokenInfo/@startSent ascending, $si/fts:tokenInfo/@endSent ascending return $si where if (fn:count($sorted) le 1) then fn:true() else every $index in (1 to fn:count($sorted) - 1) satisfies fts:sentenceDistance( $sorted[$index]/fts:tokenInfo, $sorted[$index+1]/fts:tokenInfo ) >= $n return <fts:match> { fts:joinIncludes($match/fts:stringInclude), for $stringExcl in $match/fts:stringExclude where some $stringIncl in $match/fts:stringInclude satisfies fts:sentenceDistance( $stringIncl/fts:tokenInfo, $stringExcl/fts:tokenInfo ) >= $n return $stringExcl } </fts:match> } </fts:allMatches> };
The semantics of sentence distance at most N is given below.
declare function fts:ApplyFTSentenceDistanceAtMost (
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startSent ascending,
                  $si/fts:tokenInfo/@endSent ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:sentenceDistance(
                         $sorted[$index]/fts:tokenInfo,
                         $sorted[$index+1]/fts:tokenInfo ) <= $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:sentenceDistance(
                               $stringIncl/fts:tokenInfo,
                               $stringExcl/fts:tokenInfo ) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};
The semantics of sentence distance from M to N is given below.
declare function fts:ApplyFTSentenceDistanceFromTo (
      $allMatches as element(fts:allMatches),
      $m as xs:integer,
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startSent ascending,
                  $si/fts:tokenInfo/@endSent ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:sentenceDistance(
                         $sorted[$index]/fts:tokenInfo,
                         $sorted[$index+1]/fts:tokenInfo ) >= $m
                      and
                      fts:sentenceDistance(
                         $sorted[$index]/fts:tokenInfo,
                         $sorted[$index+1]/fts:tokenInfo ) <= $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:sentenceDistance(
                               $stringIncl/fts:tokenInfo,
                               $stringExcl/fts:tokenInfo ) >= $m
                            and
                            fts:sentenceDistance(
                               $stringIncl/fts:tokenInfo,
                               $stringExcl/fts:tokenInfo ) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};
The preceding four helper functions all rely on fts:sentenceDistance, which returns the number of sentences between two TokenInfos.
declare function fts:sentenceDistance (
      $tokenInfo1 as element(fts:tokenInfo),
      $tokenInfo2 as element(fts:tokenInfo) )
   as xs:integer
{
   (: Ensure tokens are in order :)
   let $sorted :=
      for $ti in ($tokenInfo1, $tokenInfo2)
      order by $ti/@startPos ascending, $ti/@endPos ascending
      return $ti
   return
      (: -1 because we count starting at 0 :)
      $sorted[2]/@startSent - $sorted[1]/@endSent - 1
};
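As a non-normative illustration of the arithmetic in fts:sentenceDistance, the following sketch evaluates the same formula for tokens in the same sentence, in adjacent sentences, and with two sentences in between. The function local:sentenceDistance, the unprefixed tokenInfo elements, and all position values are stand-ins introduced here for illustration; attribute values are cast explicitly because the sample elements are not schema-validated.

```xquery
xquery version "3.1";

(: Hypothetical stand-in for fts:sentenceDistance, with explicit casts. :)
declare function local:sentenceDistance(
  $t1 as element(tokenInfo),
  $t2 as element(tokenInfo)
) as xs:integer {
  (: Put the two tokens in document order by position. :)
  let $sorted :=
    for $ti in ($t1, $t2)
    order by xs:integer($ti/@startPos), xs:integer($ti/@endPos)
    return $ti
  return xs:integer($sorted[2]/@startSent) - xs:integer($sorted[1]/@endSent) - 1
};

(: Illustrative tokens only; the positions are not taken from the sample document. :)
let $a := <tokenInfo startPos="3"  endPos="3"  startSent="1" endSent="1"/>
let $b := <tokenInfo startPos="7"  endPos="7"  startSent="1" endSent="1"/>
let $c := <tokenInfo startPos="12" endPos="12" startSent="2" endSent="2"/>
let $d := <tokenInfo startPos="30" endPos="30" startSent="4" endSent="4"/>
return (
  local:sentenceDistance($a, $b),  (: same sentence :)
  local:sentenceDistance($c, $a),  (: adjacent sentences, arguments reversed :)
  local:sentenceDistance($a, $d)   (: two sentences in between :)
)
```

Under these assumptions the query yields the sequence (-1, 0, 2): the formula gives -1 for tokens in the same sentence and 0 for tokens in adjacent sentences, and the order of the arguments does not matter.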
Paragraph Distance
The semantics of paragraph distance exactly N is given below.
declare function fts:ApplyFTParagraphDistanceExactly (
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPara ascending,
                  $si/fts:tokenInfo/@endPara ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:paraDistance(
                         $sorted[$index]/fts:tokenInfo,
                         $sorted[$index+1]/fts:tokenInfo ) = $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:paraDistance(
                               $stringIncl/fts:tokenInfo,
                               $stringExcl/fts:tokenInfo ) = $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};
The semantics of paragraph distance at least N is given below.
declare function fts:ApplyFTParagraphDistanceAtLeast (
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPara ascending,
                  $si/fts:tokenInfo/@endPara ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:paraDistance(
                         $sorted[$index]/fts:tokenInfo,
                         $sorted[$index+1]/fts:tokenInfo ) >= $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:paraDistance(
                               $stringIncl/fts:tokenInfo,
                               $stringExcl/fts:tokenInfo ) >= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};
The semantics of paragraph distance at most N is given below.
declare function fts:ApplyFTParagraphDistanceAtMost (
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPara ascending,
                  $si/fts:tokenInfo/@endPara ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:paraDistance(
                         $sorted[$index]/fts:tokenInfo,
                         $sorted[$index+1]/fts:tokenInfo ) <= $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:paraDistance(
                               $stringIncl/fts:tokenInfo,
                               $stringExcl/fts:tokenInfo ) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};
The semantics of paragraph distance from M to N is given below.
declare function fts:ApplyFTParagraphDistanceFromTo (
      $allMatches as element(fts:allMatches),
      $m as xs:integer,
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPara ascending,
                  $si/fts:tokenInfo/@endPara ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:paraDistance(
                         $sorted[$index]/fts:tokenInfo,
                         $sorted[$index+1]/fts:tokenInfo ) >= $m
                      and
                      fts:paraDistance(
                         $sorted[$index]/fts:tokenInfo,
                         $sorted[$index+1]/fts:tokenInfo ) <= $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:paraDistance(
                               $stringIncl/fts:tokenInfo,
                               $stringExcl/fts:tokenInfo ) >= $m
                            and
                            fts:paraDistance(
                               $stringIncl/fts:tokenInfo,
                               $stringExcl/fts:tokenInfo ) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};
The preceding four helper functions all rely on fts:paraDistance, which returns the number of paragraphs between two TokenInfos.
declare function fts:paraDistance (
      $tokenInfo1 as element(fts:tokenInfo),
      $tokenInfo2 as element(fts:tokenInfo) )
   as xs:integer
{
   (: Ensure tokens are in order :)
   let $sorted :=
      for $ti in ($tokenInfo1, $tokenInfo2)
      order by $ti/@startPos ascending, $ti/@endPos ascending
      return $ti
   return
      (: -1 because we count starting at 0 :)
      $sorted[2]/@startPara - $sorted[1]/@endPara - 1
};
For example, consider the FTDistance selection ("Ford Mustang" ftand "excellent") distance at most 3 words.
The Matches of the source AllMatches for ("Ford Mustang" ftand "excellent") are given below.
The result for the FTDistance selection consists of only the first Match (with positions 1, 2, and 5) and the fifth Match (with positions 25, 27, and 28), because only for these Matches is the word distance between consecutive TokenInfos always less than or equal to 3. For the first Match, the word distance between the two TokenInfos is 2 (startPos 5 - endPos 2 - 1), and for the fifth Match, it is 1 (startPos 27 - endPos 25 - 1).
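The two distances quoted above can be reproduced with a small, non-normative sketch. The function local:wordDistance mirrors the formula of fts:wordDistance, but it and the unprefixed tokenInfo elements are stand-ins introduced here; the explicit casts are needed only because the sample elements are not schema-validated.

```xquery
xquery version "3.1";

(: Hypothetical stand-in for fts:wordDistance, with explicit casts. :)
declare function local:wordDistance(
  $t1 as element(tokenInfo),
  $t2 as element(tokenInfo)
) as xs:integer {
  (: Ensure the tokens are in order of position. :)
  let $sorted :=
    for $ti in ($t1, $t2)
    order by xs:integer($ti/@startPos), xs:integer($ti/@endPos)
    return $ti
  return xs:integer($sorted[2]/@startPos) - xs:integer($sorted[1]/@endPos) - 1
};

(
  (: first Match: "Ford Mustang" at positions 1-2, "excellent" at 5 :)
  local:wordDistance(<tokenInfo startPos="1" endPos="2"/>,
                     <tokenInfo startPos="5" endPos="5"/>),
  (: fifth Match: "excellent" at position 25, "Ford Mustang" at 27-28 :)
  local:wordDistance(<tokenInfo startPos="27" endPos="28"/>,
                     <tokenInfo startPos="25" endPos="25"/>)
)
```

The query returns (2, 1), matching the values given for the first and fifth Match; both are at most 3, so both Matches are retained.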
The parameters of the ApplyFTTimes function are 1) an FTRange specification, and 2) a parameter corresponding to the result of the nested FTWords.
The function definitions depend on the range specification FTRange to limit the number of occurrences.
The general semantics is given below.
declare function fts:FormCombinations (
      $sms as element(fts:match)*,
      $k as xs:integer )
   as element(fts:match)*
(:
   Find all combinations of exactly $k elements from $sms, and
   for each such combination, construct a match whose children are
   copies of all the children of all the elements in the combination.
   Return the sequence of all such matches.
:)
{
   if ($k eq 0) then <fts:match/>
   else if (fn:count($sms) lt $k) then ()
   else if (fn:count($sms) eq $k) then <fts:match>{$sms/*}</fts:match>
   else
      let $first := $sms[1],
          $rest := fn:subsequence($sms, 2)
      return (
         (: all the combinations that don't involve $first :)
         fts:FormCombinations($rest, $k),
         (: and all the combinations that do involve $first :)
         for $combination in fts:FormCombinations($rest, $k - 1)
         return
            <fts:match>
            {
               $first/*,
               $combination/*
            }
            </fts:match>
      )
};

declare function fts:FormCombinationsAtLeast (
      $sms as element(fts:match)*,
      $times as xs:integer )
   as element(fts:match)*
(:
   Find all combinations of $times or more elements from $sms, and
   for each such combination, construct a match whose children are
   copies of all the children of all the elements in the combination.
   Return the sequence of all such matches.
:)
{
   for $k in $times to fn:count($sms)
   return fts:FormCombinations($sms, $k)
};

declare function fts:FormRange (
      $sms as element(fts:match)*,
      $l as xs:integer,
      $u as xs:integer,
      $stokenNum as xs:integer )
   as element(fts:allMatches)
{
   if ($l > $u) then <fts:allMatches stokenNum="0" />
   else
      let $am1 :=
             <fts:allMatches stokenNum="{$stokenNum}">
                {fts:FormCombinationsAtLeast($sms, $l)}
             </fts:allMatches>
      let $am2 :=
             <fts:allMatches stokenNum="{$stokenNum}">
                {fts:FormCombinationsAtLeast($sms, $u+1)}
             </fts:allMatches>
      return fts:ApplyFTAnd($am1, fts:ApplyFTUnaryNot($am2))
};
The semantics of occurs exactly N times is given below.
declare function fts:ApplyFTTimesExactly (
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   fts:FormRange($allMatches/fts:match, $n, $n, $allMatches/@stokenNum)
};
The semantics of occurs at least N times is given below.
declare function fts:ApplyFTTimesAtLeast (
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
      {fts:FormCombinationsAtLeast($allMatches/fts:match, $n)}
   </fts:allMatches>
};
The semantics of occurs at most N times is given below.
declare function fts:ApplyFTTimesAtMost (
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   fts:FormRange($allMatches/fts:match, 0, $n, $allMatches/@stokenNum)
};
The semantics of occurs from M to N times is given below.
declare function fts:ApplyFTTimesFromTo (
      $allMatches as element(fts:allMatches),
      $m as xs:integer,
      $n as xs:integer )
   as element(fts:allMatches)
{
   fts:FormRange($allMatches/fts:match, $m, $n, $allMatches/@stokenNum)
};
The way to ensure that there are at least N different matches of an FTSelection is to ensure that at least N of its Matches occur simultaneously. This is similar to forming their conjunction by combining N or more distinct Matches into one simple match. Therefore, the AllMatches for the selection condition specifying the range qualifier at least N contains the possible combinations of N or more simple matches of the operand. This operation is performed in the function fts:FormCombinationsAtLeast.
The range [L, U] is represented by the condition at least L and not at least U+1. This transformation is performed in the function fts:FormRange.
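To make the combination step concrete, the following non-normative sketch enumerates the combinations of "at least 2" out of three matches. The functions local:combinations and local:atLeast follow the recursion of fts:FormCombinations and fts:FormCombinationsAtLeast (without the shortcut branch for fn:count($sms) eq $k), but they, the element names m and c, and the position labels are hypothetical and introduced here only for illustration.

```xquery
xquery version "3.1";

(: All combinations of exactly $k elements of $ms, each wrapped in a <c>. :)
declare function local:combinations(
  $ms as element(m)*,
  $k as xs:integer
) as element(c)* {
  if ($k eq 0) then <c/>
  else if (count($ms) lt $k) then ()
  else
    let $first := $ms[1],
        $rest := subsequence($ms, 2)
    return (
      (: combinations that do not involve $first :)
      local:combinations($rest, $k),
      (: combinations that do involve $first :)
      for $c in local:combinations($rest, $k - 1)
      return <c>{ $first, $c/* }</c>
    )
};

(: All combinations of $n or more elements of $ms. :)
declare function local:atLeast(
  $ms as element(m)*,
  $n as xs:integer
) as element(c)* {
  for $k in $n to count($ms)
  return local:combinations($ms, $k)
};

(: Three hypothetical matches, labelled by token position. :)
let $ms := (<m>5</m>, <m>10</m>, <m>20</m>)
for $c in local:atLeast($ms, 2)
return string-join($c/m, " ")
```

With three input matches this yields four combinations: "10 20", "5 20", "5 10", and "5 10 20". A range such as [2, 2] would then keep the combinations of at least 2 while negating those of at least 3, which is the role of fts:FormRange.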
The semantics for the general case is given below.
declare function fts:ApplyFTTimes (
      $range as element(fts:range),
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   if (fn:count($allMatches//fts:stringExclude) gt 0) then
      fn:error(fn:QName('http://www.w3.org/2005/xqt-errors', 'XPST0003'))
   else if ($range/@type eq "exactly") then
      fts:ApplyFTTimesExactly($allMatches, $range/@n)
   else if ($range/@type eq "at least") then
      fts:ApplyFTTimesAtLeast($allMatches, $range/@n)
   else if ($range/@type eq "at most") then
      fts:ApplyFTTimesAtMost($allMatches, $range/@n)
   else
      fts:ApplyFTTimesFromTo($allMatches, $range/@m, $range/@n)
};
The above function performs a sanity check to ensure that the nested AllMatches is a result of the evaluation of FTWords as defined in the grammar rule for FTPrimary. Otherwise, an error [err:XPST0003]XP30 is raised.
For example, consider the FTTimes selection "Mustang" occurs at least 2 times. The source AllMatches of the FTWords selection "Mustang" is given below.
The result consists of the pairs of the Matches.
Consider an FTContainsExpr expression of the form SearchContext contains text FTSelection, where SearchContext is an XQuery 3.1 expression that returns a sequence of items. The FTContainsExpr returns true if and only if at least one of those items satisfies the FTSelection.
If the FTContainsExpr is of the form SearchContext contains text FTSelection without content IgnoreExpr for some XQuery 3.1 expression IgnoreExpr, then any nodes returned by IgnoreExpr are (notionally) pruned from each search context item before attempting to satisfy the FTSelection.
More formally, evaluation of an FTContainsExpr proceeds according to the following steps. Where appropriate, the explanation includes references to arcs labelled "FTn" in the processing model diagram (Figure 1) in 2.1 Processing Model.
1. For each XQuery/XPath expression nested within the FTContainsExpr, evaluate it with respect to the same dynamic context as the FTContainsExpr (FT1). Specifically:
   a. Evaluate the search context expression (SearchContext), resulting in the sequence of search context items.
   b. Evaluate the ignore option (IgnoreExpr), if any, resulting in the set of ignored nodes.
   c. At each FTWordsValue, evaluate the literal/expression and convert the result to xs:string*.
   d. At each weight specification, evaluate the expression and convert the result to xs:double.
   e. At each FTWindow and FTRange, evaluate the AdditiveExpr(s) and convert each to xs:integer.
2. Using the settings of the match option components in the FTContainsExpr's static context, construct an element(fts:matchOptions) structure.
3. Based on the parse-tree of the FTContainsExpr's FTSelection and the results of steps 1c-1e, construct an element(*,fts:ftSelection) structure. We refer to this as the "operator tree" below. In this process:
   a. Construct the operator tree from the top down, propagating FTMatchOptions down to FTWordsValues.
   b. Tokenize the query string(s) obtained at 1c. (FT2.1)
4. Call the function fts:FTContainsExpr (see declaration below), passing the following arguments to its parameters:
   $searchContextItems: The sequence of items returned by SearchContext, calculated in step 1a.
   $ignoreNodes: The sequence of items returned by IgnoreExpr (in 1b), if that expression is present, or the empty sequence otherwise.
   $ftSelection: The XML node representation of FTSelection (the element(*,fts:ftSelection) structure constructed in step 3).
   $defOptions: The XML representation of the match options in the FTContainsExpr's static context (the element(fts:matchOptions) structure constructed in step 2).
   Within the function, for each search context item:
   a. Delete the ignored nodes from the search context item. [fts:FTContainsExpr calls fts:reconstruct.]
   b. Traverse the operator tree from the top down, propagating FTMatchOptions down to FTWordsValues. [fts:evaluate calls itself and fts:replaceMatchOptions.]
   c. At each FTWordsValue, using the prevailing FTMatchOptions:
      i. Tokenize the search context obtained at 4a. (FT2.2) (Whether this pays any attention to FTMatchOptions is up to the implementation.) [This happens within fts:matchTokenInfos.]
      ii. Match the search context tokens and the query tokens, yielding an element(fts:tokenInfo)* structure. [This happens within fts:matchTokenInfos.]
      iii. Convert that into an element(fts:allMatches). (FT3) [This happens in fts:applyQueryTokensAsPhrase.]
   d. Traverse the operator tree from the bottom up. At each point, the AllMatches instances produced by subtrees are taken as input, and a new AllMatches instance is obtained as output. (FT4) [This is most of the section 4 code.]
   e. If the topmost AllMatches instance contains a Match with no StringExcludes, then the search context item satisfies the full-text condition given by the FTSelection, and the call to fts:FTContainsExpr returns true. [This is handled by the QuantifiedExpr in fts:FTContainsExpr.]
   [Note that the section 4 code doesn't implement 4b-4d as three sequential steps. Instead, they are different aspects of a single traversal of the operator tree.]
   If none of the topmost AllMatches provides a successful match, then fts:FTContainsExpr returns false.
5. The boolean value returned by the call to fts:FTContainsExpr is the value of the FTContainsExpr. (FT5)
declare function fts:FTContainsExpr (
      $searchContextItems as item()*,
      $ignoreNodes as node()*,
      $ftSelection as element(*,fts:ftSelection),
      $defOptions as element(fts:matchOptions) )
   as xs:boolean
{
   some $searchContext in $searchContextItems
   satisfies
      let $newSearchContext := fts:reconstruct( $searchContext, $ignoreNodes )
      return
         if (fn:empty($newSearchContext)) then fn:false()
         else
            let $allMatches := fts:evaluate($ftSelection,
                                            $newSearchContext,
                                            $defOptions,
                                            0)
            return
               some $match in $allMatches/fts:match
               satisfies fn:count($match/fts:stringExclude) eq 0
};

declare function fts:reconstruct (
      $n as item(),
      $ignore as node()* )
   as item()?
{
   typeswitch ($n)
      case node() return
         if (some $i in $ignore satisfies $n is $i) then ()
         else if ($n instance of element()) then
            let $nodeName := fn:node-name($n)
            let $nodeContent := for $nn in $n/node()
                                return fts:reconstruct($nn,$ignore)
            return element {$nodeName} {$nodeContent}
         else if ($n instance of document-node()) then
            document {
               for $nn in $n/node()
               return fts:reconstruct($nn, $ignore)
            }
         else $n
      default return $n
};
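The pruning performed by fts:reconstruct can be tried in isolation. The sketch below copies its logic into a hypothetical local:reconstruct function and applies it to an invented p element with a footnote child; the sample markup and names are assumptions made for illustration and are not part of the specification's sample document.

```xquery
xquery version "3.1";

(: Hypothetical local copy of the fts:reconstruct logic. :)
declare function local:reconstruct(
  $n as item(),
  $ignore as node()*
) as item()? {
  typeswitch ($n)
    case node() return
      (: drop any node that is identical to an ignored node :)
      if (some $i in $ignore satisfies $n is $i) then ()
      else if ($n instance of element()) then
        element { node-name($n) } {
          for $nn in $n/node() return local:reconstruct($nn, $ignore)
        }
      else if ($n instance of document-node()) then
        document {
          for $nn in $n/node() return local:reconstruct($nn, $ignore)
        }
      else $n
    default return $n
};

(: Invented search context item with one ignored subtree. :)
let $para := <p>Great car<footnote>rust reported</footnote> overall</p>
return string(local:reconstruct($para, $para/footnote))
```

The string value of the pruned copy is "Great car overall", so a later search for "rust" would find nothing in this item, while the original node is left untouched.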
This section addresses the semantics of scoring variables in XQuery 3.1 for and let clauses and XPath 3.1 for expressions.
Scoring variables associate a numeric score with the result of the evaluation of XQuery 3.1 and XPath 3.1 expressions. This numeric score estimates the value of a result item with respect to the user's information need, as expressed by the XQuery 3.1 or XPath 3.1 expression. The numeric score is computed using an implementation-dependent scoring algorithm.
There are numerous scoring algorithms used in practice. Most of the scoring algorithms take as inputs a query and a set of results to the query. In computing the score, these algorithms rely on the structure of the query to estimate the relevance of the results.
In the context of defining the semantics of XQuery and XPath Full Text, passing the structure of the query poses a problem. The query may contain XQuery 3.1 and XPath 3.1 expressions and XQuery and XPath Full Text 3.1 expressions in particular. The semantics of XQuery 3.1 and XPath 3.1 expressions is defined using (among other things) functions that take as arguments sequences of items and return sequences of items. They are not aware of what expression produced a particular sequence, i.e., they are not aware of the expression structure.
To define the semantics of scoring in XQuery and XPath Full Text 3.1 using XQuery 3.1, expressions that produce the query result (or the functions that implement the expressions) must be passed as arguments. In other words, second-order functions are necessary. Currently XQuery 3.1 and XPath 3.1 do not provide such functions.
Nevertheless, in the interest of the exposition, assume that such second-order functions are present. In particular, assume that there are two semantic second-order functions, fts:score and fts:scoreSequence, that take one argument (an expression) and return, respectively, the score value of this expression and a sequence of score values, one for each item to which the expression evaluates. The scores must satisfy scoring properties.
A for clause containing a score variable
for $result score $score in Expr ...
is evaluated as though it is replaced by the following set of clauses.
let $scoreSeq := fts:scoreSequence(Expr)
for $result at $i in Expr
let $score := $scoreSeq[$i]
...
Here, $scoreSeq and $i are new variables, not appearing elsewhere, and fts:scoreSequence is the second-order function.
Similarly, a let clause containing a score variable
let score $score := Expr ...
is evaluated as though it is replaced by the following clause.
let $score := fts:score(Expr) ...
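The shape of the rewritten for clause can be sketched in plain XQuery by replacing the second-order fts:scoreSequence with an ordinary function that receives the already evaluated items. Everything in this sketch is an assumption made for illustration: local:scoreSequence, its $word parameter, the term-frequency formula, and the offer elements are invented, and a real scoring algorithm is implementation-dependent.

```xquery
xquery version "3.1";

(: Hypothetical stand-in for fts:scoreSequence: scores each item by the
   share of its tokens equal to $word, giving values in [0, 1]. :)
declare function local:scoreSequence(
  $items as item()*,
  $word as xs:string
) as xs:double* {
  for $item in $items
  let $tokens := tokenize(lower-case(string($item)), "\W+")[. ne ""]
  return
    if (empty($tokens)) then xs:double(0)
    else xs:double(count($tokens[. eq $word]) div count($tokens))
};

(: Invented result items. :)
let $offers := (<offer>excellent value</offer>, <offer>some rust</offer>)
(: Rewritten form of: for $result score $score in $offers ... :)
let $scoreSeq := local:scoreSequence($offers, "excellent")
for $result at $i in $offers
let $score := $scoreSeq[$i]
order by $score descending
return <hit score="{$score}">{ string($result) }</hit>
```

The positional variable $i pairs each result item with its score, which is the only coupling the rewrite relies on; the first hit receives the score 0.5 and the second 0.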
This section presents a more complex example for the evaluation of FTContainsExpr. This example uses the same sample document fragment and assigns it to $doc.
Consider the following FTContainsExpr.
$doc contains text ( ( "Mustang" ftand ({("great", "excellent")} any word occurs at least 2 times) window 11 words ) ftand ftnot "rust" ) same paragraph
Begin by evaluating the FTSelection to AllMatches.
( ( "Mustang" ftand ({("great", "excellent")} any word occurs at least 2 times) window 11 words ) ftand ftnot "rust" ) same paragraph
Step 1: Evaluate the FTWords "Mustang".
Step 2: Evaluate the FTWords {("great", "excellent")} any word.
Step 2.1: Match the token "great".
Step 2.2: Match the token "excellent".
Step 2.3: Combine the above AllMatches as if FTOr is used, i.e., by forming a union of the Matches.
Step 3: Apply the FTTimes {("great", "excellent")} any word occurs at least 2 times, forming two pairs of Matches.
Step 4: Apply the FTAnd "Mustang" ftand ({("great", "excellent")} any word occurs at least 2 times), forming all possible pairs of StringMatches.
Step 5: Apply the FTWindow ( "Mustang" ftand ({("great", "excellent")} any word occurs at least 2 times) window 11 words ), filtering out Matches for which the window is larger than 11 tokens.
Step 6: Evaluate the FTWords "rust".
Step 7: Apply the FTUnaryNot ftnot "rust", transforming the StringInclude into a StringExclude.
Step 8: Apply the FTAnd ( ( "Mustang" ftand ({("great", "excellent")} any word occurs at least 2 times) window 11 words ) ftand ftnot "rust" ), forming all possible combinations of three StringMatches from the first AllMatches and one StringMatch from the second AllMatches.
Step 9: Apply the FTScope, filtering out Matches whose TokenInfos are not within the same paragraph (assuming the <offer> elements determine paragraph boundaries).
The resulting AllMatches contains a Match that does not contain a StringExclude. Therefore, the sample FTContainsExpr returns true.
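The final decision in this example is the quantified test at the end of fts:FTContainsExpr. The sketch below applies that test to a hand-written AllMatches; the unprefixed element names and the two Matches are hypothetical placeholders rather than the actual result of Step 9.

```xquery
xquery version "3.1";

(: Placeholder AllMatches: one Match still carries a StringExclude,
   the other consists of StringIncludes only. :)
let $allMatches :=
  <allMatches stokenNum="4">
    <match>
      <stringInclude/><stringInclude/><stringInclude/>
      <stringExclude/>
    </match>
    <match>
      <stringInclude/><stringInclude/><stringInclude/>
    </match>
  </allMatches>
return
  (: true as soon as one Match has no StringExclude :)
  some $match in $allMatches/match
  satisfies count($match/stringExclude) eq 0
```

The query returns true because the second Match has no StringExclude; if every Match contained one, the result would be false.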
This section defines the conformance criteria for an XQuery and XPath Full Text 3.1 processor.
In this section, the following terms are used to indicate the requirement levels defined in [RFC 2119]. [Definition: MUST means that the item is an absolute requirement of the specification.] [Definition: MAY means that an item is truly optional.] [Definition: SHOULD means that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.]
An XQuery and XPath Full Text 3.1 processor that claims to conform to this specification MUST include a claim of Minimal Conformance as defined in 5.1 Minimal Conformance. In addition to a claim of Minimal Conformance, it MAY claim conformance to one or more optional features defined in 5.2 Optional Features.
Minimal Conformance to this specification MUST include all of the following items:
Minimal support for XQuery [XQuery 3.1: An XML Query Language] or XPath [XML Path Language (XPath) 3.1]. The optional features of XQuery [XQuery 3.1: An XML Query Language] or XPath [XML Path Language (XPath) 3.1] MAY be supported.
Support for everything specified in this document except those operators and match options specified in 5.2 Optional Features to be optional. If an implementation does not provide a given optional operator or match option, it MUST implement any requirements specified in 5.2 Optional Features for implementations that do not provide that operator or match option.
A definition of every item specified to be implementation-defined in I Checklist of Implementation-Defined Features.
Note:
Implementations are not required to define items specified to be implementation-dependent.
It is optional whether the implementation supports the FTMildNot. If it does not support FTMildNot and encounters one in a full-text query, then it MUST raise an error [err:FTST0001].
The unrestricted form of negation in FTUnaryNot, that can negate every kind of FTSelection, is optional. Implementations may choose to support the negation operation in a restricted form, enforcing one or both of the following restrictions.
[Definition: Negation Restriction 1. An FTUnaryNot expression may only appear as a direct right operand of an "ftand" (FTAnd) operation.]
[Definition: Negation Restriction 2. An FTUnaryNot expression may not appear as a descendant of an FTOr that is modified by an FTPosFilter. (An FTOr is modified by an FTPosFilter, if it is derived using the production for FTSelection together with that FTPosFilter.)]
Consider the following example FTSelections.
1. ftnot "web" 2. "web" ftand ( ftnot "information" ftor "retrieval" ) 3. "web" ftand ftnot("information" ftand "retrieval") 4. "web" ftand ftnot("information" ftand "retrieval" window 5 words) 5. "web" ftand ("information" ftand ftnot "retrieval" window 5 words)
The first two FTSelections both violate restriction 1, while the third and the fourth conform to both restrictions. The fifth one violates restriction 2, while obeying restriction 1. Note that in the last example the FTSelection to which the window operation is applied is "information" ftand ftnot "retrieval", which contains an FTUnaryNot expression.
If the implementation does enforce one or both of these restrictions on FTUnaryNot and encounters a full-text query that does not obey the restriction then it MUST raise an error [err:FTST0002].
Support for the "sentences" alternative of FTUnit and the "sentence" alternative of FTBigUnit is optional. Similarly, support for the "paragraphs" alternative of FTUnit and the "paragraph" alternative of FTBigUnit is optional. If an implementation does not support one or more choices of FTUnit or FTBigUnit and encounters an unsupported FTUnit or FTBigUnit in a full-text query, then it MUST raise an error [err:FTST0003].
The unrestricted form of the FTOrder postfix operator, that can be applied to any kind of FTSelection, is optional. Implementations may choose to enforce the following restriction on the use of FTOrder.
[Definition: Order Operator Restriction. FTOrder may only appear directly succeeding an FTWindow or an FTDistance operator.]
If the implementation does enforce this restriction and encounters a full-text query that does not obey the restriction then it MUST raise an error [err:FTST0010].
It is optional whether the implementation supports the FTScope operator. If it does not support FTScope and encounters one in a full-text query, then it MUST raise an error [err:FTST0004].
The unrestricted form of the FTWindow postfix operator, that can be applied to any kind of FTSelection, is optional. Implementations may choose to enforce the following restriction on the use of FTWindow.
[Definition: Window Operator Restriction. FTWindow can only be applied to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor.]
If the implementation does enforce this restriction and encounters a full-text query that does not obey the restriction then it MUST raise an error [err:FTST0011].
The unrestricted form of the FTDistance postfix operator, that can be applied to any kind of FTSelection, is optional. Implementations may choose to enforce the following restriction on the use of FTDistance.
[Definition: Distance Operator Restriction. FTDistance can only be applied to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor.]
If the implementation does enforce this restriction and encounters a full-text query that does not obey the restriction then it MUST raise an error [err:FTST0011].
It is optional whether the implementation supports the FTTimes operator. If it does not support FTTimes and encounters one in a full-text query, then it MUST raise an error [err:FTST0005].
It is optional whether the implementation supports the FTContent operator. If it does not support FTContent and encounters one in a full-text query, then it MUST raise an error [err:FTST0012].
It is optional whether the implementation supports the "lowercase" and "uppercase" choices for the FTCaseOption. If it does not support these choices for the FTCaseOption and encounters an unsupported choice in a full-text query, then it MUST raise an error [err:FTST0015].
It is optional whether the implementation supports the FTStopWordOption. If it does not support FTStopWordOption and encounters one in a full-text query, then it MUST raise an error [err:FTST0006].
It is optional whether the implementation supports the FTStopWordOption in the body of the query. If it supports FTStopWordOption in the prolog, but not in the body of a query, and encounters one in the body of a query it MUST raise an error [err:FTST0006].
It is optional whether the implementation supports the StringLiteral alternative of FTStopWords in the FTStopWordOption. If it does not support the StringLiteral alternative of FTStopWords and encounters such an alternative in a full-text query, then it MUST raise an error [err:FTST0006].
It is optional whether the implementation supports the unrestricted form of FTLanguageOption. Implementations may choose to enforce the following restriction on the use of FTLanguageOption.
[Definition: Single Language Restriction. If a full-text query contains more than one FTLanguageOption in its body and the prolog, then the languages specified must be the same.]
If the implementation does enforce this restriction and encounters a full-text query that does not obey the restriction then it MUST raise an error [err:FTST0013].
The implementation may constrain the set of ignored nodes. If the operand of FTIgnoreOption violates the implementation-defined restriction on that operand, it MUST raise an error [err:FTST0007].
The implementation may restrict the allowable expressions used to compute scores. The restrictions are implementation-defined.
If the implementation does enforce such restrictions and encounters a full-text query that does not obey the restriction then it MUST raise an error [err:FTST0014].
An implementation may constrain the range of valid weights to non-negative values. If an implementation does enforce this restriction and encounters a full-text query that uses a negative weight, it MUST raise an error [err:FTDY0016].
The EBNF in this document and in this section is aligned with the current XML Query 3.1 grammar (see http://www.w3.org/TR/xquery-31/).
[1] Module ::= VersionDecl? (LibraryModule | MainModule)
[2] VersionDecl ::= "xquery" (("encoding" StringLiteral) | ("version" StringLiteral ("encoding" StringLiteral)?)) Separator
[3] MainModule ::= Prolog QueryBody
[4] LibraryModule ::= ModuleDecl Prolog
[5] ModuleDecl ::= "module" "namespace" NCName "=" URILiteral Separator
[6] Prolog ::= ((DefaultNamespaceDecl | Setter | NamespaceDecl | Import | FTOptionDecl) Separator)* ((ContextItemDecl | AnnotatedDecl | OptionDecl) Separator)*
[7] Separator ::= ";"
[8] Setter ::= BoundarySpaceDecl | DefaultCollationDecl | BaseURIDecl | ConstructionDecl | OrderingModeDecl | EmptyOrderDecl | CopyNamespacesDecl | DecimalFormatDecl
[9] BoundarySpaceDecl ::= "declare" "boundary-space" ("preserve" | "strip")
[10] DefaultCollationDecl ::= "declare" "default" "collation" URILiteral
[11] BaseURIDecl ::= "declare" "base-uri" URILiteral
[12] ConstructionDecl ::= "declare" "construction" ("strip" | "preserve")
[13] OrderingModeDecl ::= "declare" "ordering" ("ordered" | "unordered")
[14] EmptyOrderDecl ::= "declare" "default" "order" "empty" ("greatest" | "least")
[15] CopyNamespacesDecl ::= "declare" "copy-namespaces" PreserveMode "," InheritMode
[16] PreserveMode ::= "preserve" | "no-preserve"
[17] InheritMode ::= "inherit" | "no-inherit"
[18] DecimalFormatDecl ::= "declare" (("decimal-format" EQName) | ("default" "decimal-format")) (DFPropertyName "=" StringLiteral)*
[19] DFPropertyName ::= "decimal-separator" | "grouping-separator" | "infinity" | "minus-sign" | "NaN" | "percent" | "per-mille" | "zero-digit" | "digit" | "pattern-separator" | "exponent-separator"
[20] Import ::= SchemaImport | ModuleImport
[21] SchemaImport ::= "import" "schema" SchemaPrefix? URILiteral ("at" URILiteral ("," URILiteral)*)?
[22] SchemaPrefix ::= ("namespace" NCName "=") | ("default" "element" "namespace")
[23] ModuleImport ::= "import" "module" ("namespace" NCName "=")? URILiteral ("at" URILiteral ("," URILiteral)*)?
[24] NamespaceDecl ::= "declare" "namespace" NCName "=" URILiteral
[25] DefaultNamespaceDecl ::= "declare" "default" ("element" | "function") "namespace" URILiteral
[26] FTOptionDecl ::= "declare" "ft-option" FTMatchOptions
[27] AnnotatedDecl ::= "declare" Annotation* (VarDecl | FunctionDecl)
[28] Annotation ::= "%" EQName ("(" Literal ("," Literal)* ")")?
[29] VarDecl ::= "variable" "$" VarName TypeDeclaration? ((":=" VarValue) | ("external" (":=" VarDefaultValue)?))
[30] VarValue ::= ExprSingle
[31] VarDefaultValue ::= ExprSingle
[32] ContextItemDecl ::= "declare" "context" "item" ("as" ItemType)? ((":=" VarValue) | ("external" (":=" VarDefaultValue)?))
[33] FunctionDecl ::= "function" EQName "(" ParamList? ")" ("as" SequenceType)? (FunctionBody | "external") /* xgc: reserved-function-namesXQ31 */
[34] ParamList ::= Param ("," Param)*
[35] | Param | ::= | "$" EQName
TypeDeclaration? | |
[36] | FunctionBody | ::= |
EnclosedExpr
| |
[37] | EnclosedExpr | ::= | "{" Expr "}" | |
[38] | OptionDecl | ::= | "declare" "option" EQName
StringLiteral
| |
[39] | QueryBody | ::= |
Expr
| |
[40] | Expr | ::= |
ExprSingle ("," ExprSingle)* | |
[41] | ExprSingle | ::= |
FLWORExpr
| |
[42] | FLWORExpr | ::= |
InitialClause
IntermediateClause* ReturnClause
| |
[43] | InitialClause | ::= |
ForClause | LetClause | WindowClause
| |
[44] | IntermediateClause | ::= |
InitialClause | WhereClause | GroupByClause | OrderByClause | CountClause
| |
[45] | ForClause | ::= | "for" ForBinding ("," ForBinding)* | |
[46] | ForBinding | ::= | "$" VarName
TypeDeclaration? AllowingEmpty? PositionalVar? FTScoreVar? "in" ExprSingle
| |
[47] | AllowingEmpty | ::= | "allowing" "empty" | |
[48] | PositionalVar | ::= | "at" "$" VarName
| |
[49] | FTScoreVar | ::= | "score" "$" VarName
| |
[50] | LetClause | ::= | "let" LetBinding ("," LetBinding)* | |
[51] | LetBinding | ::= | (("$" VarName
TypeDeclaration?) | FTScoreVar) ":=" ExprSingle
| |
[52] | WindowClause | ::= | "for" (TumblingWindowClause | SlidingWindowClause) | |
[53] | TumblingWindowClause | ::= | "tumbling" "window" "$" VarName
TypeDeclaration? "in" ExprSingle
WindowStartCondition
WindowEndCondition? | |
[54] | SlidingWindowClause | ::= | "sliding" "window" "$" VarName
TypeDeclaration? "in" ExprSingle
WindowStartCondition
WindowEndCondition
| |
[55] | WindowStartCondition | ::= | "start" WindowVars "when" ExprSingle
| |
[56] | WindowEndCondition | ::= | "only"? "end" WindowVars "when" ExprSingle
| |
[57] | WindowVars | ::= | ("$" CurrentItem)? PositionalVar? ("previous" "$" PreviousItem)? ("next" "$" NextItem)? | |
[58] | CurrentItem | ::= |
EQName
| |
[59] | PreviousItem | ::= |
EQName
| |
[60] | NextItem | ::= |
EQName
| |
[61] | CountClause | ::= | "count" "$" VarName
| |
[62] | WhereClause | ::= | "where" ExprSingle
| |
[63] | GroupByClause | ::= | "group" "by" GroupingSpecList
| |
[64] | GroupingSpecList | ::= |
GroupingSpec ("," GroupingSpec)* | |
[65] | GroupingSpec | ::= |
GroupingVariable (TypeDeclaration? ":=" ExprSingle)? ("collation" URILiteral)? | |
[66] | GroupingVariable | ::= | "$" VarName
| |
[67] | OrderByClause | ::= | (("order" "by") | ("stable" "order" "by")) OrderSpecList
| |
[68] | OrderSpecList | ::= |
OrderSpec ("," OrderSpec)* | |
[69] | OrderSpec | ::= |
ExprSingle
OrderModifier
| |
[70] | OrderModifier | ::= | ("ascending" | "descending")? ("empty" ("greatest" | "least"))? ("collation" URILiteral)? | |
[71] | ReturnClause | ::= | "return" ExprSingle
| |
[72] | QuantifiedExpr | ::= | ("some" | "every") "$" VarName
TypeDeclaration? "in" ExprSingle ("," "$" VarName
TypeDeclaration? "in" ExprSingle)* "satisfies" ExprSingle
| |
[73] | SwitchExpr | ::= | "switch" "(" Expr ")" SwitchCaseClause+ "default" "return" ExprSingle
| |
[74] | SwitchCaseClause | ::= | ("case" SwitchCaseOperand)+ "return" ExprSingle
| |
[75] | SwitchCaseOperand | ::= |
ExprSingle
| |
[76] | TypeswitchExpr | ::= | "typeswitch" "(" Expr ")" CaseClause+ "default" ("$" VarName)? "return" ExprSingle
| |
[77] | CaseClause | ::= | "case" ("$" VarName "as")? SequenceTypeUnion "return" ExprSingle
| |
[78] | SequenceTypeUnion | ::= |
SequenceType ("|" SequenceType)* | |
[79] | IfExpr | ::= | "if" "(" Expr ")" "then" ExprSingle "else" ExprSingle
| |
[80] | TryCatchExpr | ::= |
TryClause
CatchClause+ | |
[81] | TryClause | ::= | "try" "{" TryTargetExpr "}" | |
[82] | TryTargetExpr | ::= |
Expr
| |
[83] | CatchClause | ::= | "catch" CatchErrorList "{" Expr "}" | |
[84] | CatchErrorList | ::= |
NameTest ("|" NameTest)* | |
[85] | OrExpr | ::= |
AndExpr ( "or" AndExpr )* | |
[86] | AndExpr | ::= |
ComparisonExpr ( "and" ComparisonExpr )* | |
[87] | ComparisonExpr | ::= |
FTContainsExpr ( (ValueComp
| |
[88] | FTContainsExpr | ::= |
StringConcatExpr ( "contains" "text" FTSelection
FTIgnoreOption? )? | |
[89] | StringConcatExpr | ::= |
RangeExpr ( "||" RangeExpr )* | |
[90] | RangeExpr | ::= |
AdditiveExpr ( "to" AdditiveExpr )? | |
[91] | AdditiveExpr | ::= |
MultiplicativeExpr ( ("+" | "-") MultiplicativeExpr )* | |
[92] | MultiplicativeExpr | ::= |
UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )* | |
[93] | UnionExpr | ::= |
IntersectExceptExpr ( ("union" | "|") IntersectExceptExpr )* | |
[94] | IntersectExceptExpr | ::= |
InstanceofExpr ( ("intersect" | "except") InstanceofExpr )* | |
[95] | InstanceofExpr | ::= |
TreatExpr ( "instance" "of" SequenceType )? | |
[96] | TreatExpr | ::= |
CastableExpr ( "treat" "as" SequenceType )? | |
[97] | CastableExpr | ::= |
CastExpr ( "castable" "as" SingleType )? | |
[98] | CastExpr | ::= |
ArrowExpr ( "cast" "as" SingleType )? | |
[99] | ArrowExpr | ::= |
UnaryExpr ( "=>" ArrowFunctionSpecifier
ArgumentList )* | |
[100] | UnaryExpr | ::= | ("-" | "+")* ValueExpr
| |
[101] | ValueExpr | ::= |
ValidateExpr | ExtensionExpr | SimpleMapExpr
| |
[102] | GeneralComp | ::= | "=" | "!=" | "<" | "<=" | ">" | ">=" | |
[103] | ValueComp | ::= | "eq" | "ne" | "lt" | "le" | "gt" | "ge" | |
[104] | NodeComp | ::= | "is" | "<<" | ">>" | |
[105] | ValidateExpr | ::= | "validate" (ValidationMode | ("type" TypeName))? "{" Expr "}" | |
[106] | ValidationMode | ::= | "lax" | "strict" | |
[107] | ExtensionExpr | ::= |
Pragma+ "{" Expr? "}" | |
[108] | Pragma | ::= | "(#" S? EQName (S
PragmaContents)? "#)" | /* ws: explicitXQ31 */ |
[109] | PragmaContents | ::= | (Char* - (Char* '#)' Char*)) | |
[110] | SimpleMapExpr | ::= |
PathExpr ("!" PathExpr)* | |
[111] | PathExpr | ::= | ("/" RelativePathExpr?) | /* xgc: leading-lone-slashXQ31 */ |
[112] | RelativePathExpr | ::= |
StepExpr (("/" | "//") StepExpr)* | |
[113] | StepExpr | ::= |
PostfixExpr | AxisStep
| |
[114] | AxisStep | ::= | (ReverseStep | ForwardStep) PredicateList
| |
[115] | ForwardStep | ::= | (ForwardAxis
NodeTest) | AbbrevForwardStep
| |
[116] | ForwardAxis | ::= | ("child" "::") | |
[117] | AbbrevForwardStep | ::= | "@"? NodeTest
| |
[118] | ReverseStep | ::= | (ReverseAxis
NodeTest) | AbbrevReverseStep
| |
[119] | ReverseAxis | ::= | ("parent" "::") | |
[120] | AbbrevReverseStep | ::= | ".." | |
[121] | NodeTest | ::= |
KindTest | NameTest
| |
[122] | NameTest | ::= |
EQName | Wildcard
| |
[123] | Wildcard | ::= | "*" | /* ws: explicitXQ31 */ |
[124] | PostfixExpr | ::= |
PrimaryExpr (Predicate | ArgumentList | Lookup)* | |
[125] | ArgumentList | ::= | "(" (Argument ("," Argument)*)? ")" | |
[126] | PredicateList | ::= |
Predicate* | |
[127] | Predicate | ::= | "[" Expr "]" | |
[128] | Lookup | ::= | "?" KeySpecifier
| |
[129] | KeySpecifier | ::= |
NCName | IntegerLiteral | ParenthesizedExpr | "*" | |
[130] | ArrowFunctionSpecifier | ::= |
EQName | VarRef | ParenthesizedExpr
| |
[131] | PrimaryExpr | ::= |
Literal
| |
[132] | Literal | ::= |
NumericLiteral | StringLiteral
| |
[133] | NumericLiteral | ::= |
IntegerLiteral | DecimalLiteral | DoubleLiteral
| |
[134] | VarRef | ::= | "$" VarName
| |
[135] | VarName | ::= |
EQName
| |
[136] | ParenthesizedExpr | ::= | "(" Expr? ")" | |
[137] | ContextItemExpr | ::= | "." | |
[138] | OrderedExpr | ::= | "ordered" "{" Expr "}" | |
[139] | UnorderedExpr | ::= | "unordered" "{" Expr "}" | |
[140] | FunctionCall | ::= |
EQName
ArgumentList
| /* xgc: reserved-function-namesXQ31 */ |
/* gn: parensXQ31 */ | ||||
[141] | Argument | ::= |
ExprSingle | ArgumentPlaceholder
| |
[142] | ArgumentPlaceholder | ::= | "?" | |
[143] | NodeConstructor | ::= |
DirectConstructor
| |
[144] | DirectConstructor | ::= |
DirElemConstructor
| |
[145] | DirElemConstructor | ::= | "<" QName
DirAttributeList ("/>" | (">" DirElemContent* "</" QName
S? ">")) | /* ws: explicitXQ31 */ |
[146] | DirAttributeList | ::= | (S (QName
S? "=" S? DirAttributeValue)?)* | /* ws: explicitXQ31 */ |
[147] | DirAttributeValue | ::= | ('"' (EscapeQuot | QuotAttrValueContent)* '"') | /* ws: explicitXQ31 */ |
[148] | QuotAttrValueContent | ::= |
QuotAttrContentChar
| |
[149] | AposAttrValueContent | ::= |
AposAttrContentChar
| |
[150] | DirElemContent | ::= |
DirectConstructor
| |
[151] | CommonContent | ::= |
PredefinedEntityRef | CharRef | "{{" | "}}" | EnclosedExpr
| |
[152] | DirCommentConstructor | ::= | "<!--" DirCommentContents "-->" | /* ws: explicitXQ31 */ |
[153] | DirCommentContents | ::= | ((Char - '-') | ('-' (Char - '-')))* | /* ws: explicitXQ31 */ |
[154] | DirPIConstructor | ::= | "<?" PITarget (S
DirPIContents)? "?>" | /* ws: explicitXQ31 */ |
[155] | DirPIContents | ::= | (Char* - (Char* '?>' Char*)) | /* ws: explicitXQ31 */ |
[156] | CDataSection | ::= | "<![CDATA[" CDataSectionContents "]]>" | /* ws: explicitXQ31 */ |
[157] | CDataSectionContents | ::= | (Char* - (Char* ']]>' Char*)) | /* ws: explicitXQ31 */ |
[158] | ComputedConstructor | ::= |
CompDocConstructor
| |
[159] | CompDocConstructor | ::= | "document" "{" Expr "}" | |
[160] | CompElemConstructor | ::= | "element" (EQName | ("{" Expr "}")) "{" ContentExpr? "}" | |
[161] | ContentExpr | ::= |
Expr
| |
[162] | CompAttrConstructor | ::= | "attribute" (EQName | ("{" Expr "}")) "{" Expr? "}" | |
[163] | CompNamespaceConstructor | ::= | "namespace" (Prefix | ("{" PrefixExpr "}")) "{" URIExpr "}" | |
[164] | Prefix | ::= |
NCName
| |
[165] | PrefixExpr | ::= |
Expr
| |
[166] | URIExpr | ::= |
Expr
| |
[167] | CompTextConstructor | ::= | "text" "{" Expr "}" | |
[168] | CompCommentConstructor | ::= | "comment" "{" Expr "}" | |
[169] | CompPIConstructor | ::= | "processing-instruction" (NCName | ("{" Expr "}")) "{" Expr? "}" | |
[170] | FunctionItemExpr | ::= |
NamedFunctionRef | InlineFunctionExpr
| |
[171] | NamedFunctionRef | ::= |
EQName "#" IntegerLiteral
| /* xgc: reserved-function-namesXQ31 */ |
[172] | InlineFunctionExpr | ::= |
Annotation* "function" "(" ParamList? ")" ("as" SequenceType)? FunctionBody
| |
[173] | MapConstructor | ::= | "map" "{" (MapConstructorEntry ("," MapConstructorEntry)*)? "}" | |
[174] | MapConstructorEntry | ::= |
MapKeyExpr ":" MapValueExpr
| |
[175] | MapKeyExpr | ::= |
ExprSingle
| |
[176] | MapValueExpr | ::= |
ExprSingle
| |
[177] | ArrayConstructor | ::= |
SquareArrayConstructor | CurlyArrayConstructor
| |
[178] | SquareArrayConstructor | ::= | "[" (ExprSingle ("," ExprSingle)*)? "]" | |
[179] | CurlyArrayConstructor | ::= | "array" "{" Expr? "}" | |
[180] | UnaryLookup | ::= | "?" KeySpecifier
| |
[181] | SingleType | ::= |
SimpleTypeName "?"? | |
[182] | TypeDeclaration | ::= | "as" SequenceType
| |
[183] | SequenceType | ::= | ("empty-sequence" "(" ")") | |
[184] | OccurrenceIndicator | ::= | "?" | "*" | "+" | /* xgc: occurrence-indicatorsXQ31 */ |
[185] | ItemType | ::= |
KindTest | ("item" "(" ")") | FunctionTest | MapTest | ArrayTest | AtomicOrUnionType | ParenthesizedItemType
| |
[186] | AtomicOrUnionType | ::= |
EQName
| |
[187] | KindTest | ::= |
DocumentTest
| |
[188] | AnyKindTest | ::= | "node" "(" ")" | |
[189] | DocumentTest | ::= | "document-node" "(" (ElementTest | SchemaElementTest)? ")" | |
[190] | TextTest | ::= | "text" "(" ")" | |
[191] | CommentTest | ::= | "comment" "(" ")" | |
[192] | NamespaceNodeTest | ::= | "namespace-node" "(" ")" | |
[193] | PITest | ::= | "processing-instruction" "(" (NCName | StringLiteral)? ")" | |
[194] | AttributeTest | ::= | "attribute" "(" (AttribNameOrWildcard ("," TypeName)?)? ")" | |
[195] | AttribNameOrWildcard | ::= |
AttributeName | "*" | |
[196] | SchemaAttributeTest | ::= | "schema-attribute" "(" AttributeDeclaration ")" | |
[197] | AttributeDeclaration | ::= |
AttributeName
| |
[198] | ElementTest | ::= | "element" "(" (ElementNameOrWildcard ("," TypeName "?"?)?)? ")" | |
[199] | ElementNameOrWildcard | ::= |
ElementName | "*" | |
[200] | SchemaElementTest | ::= | "schema-element" "(" ElementDeclaration ")" | |
[201] | ElementDeclaration | ::= |
ElementName
| |
[202] | AttributeName | ::= |
EQName
| |
[203] | ElementName | ::= |
EQName
| |
[204] | SimpleTypeName | ::= |
TypeName
| |
[205] | TypeName | ::= |
EQName
| |
[206] | FunctionTest | ::= |
Annotation* (AnyFunctionTest
| |
[207] | AnyFunctionTest | ::= | "function" "(" "*" ")" | |
[208] | TypedFunctionTest | ::= | "function" "(" (SequenceType ("," SequenceType)*)? ")" "as" SequenceType
| |
[209] | MapTest | ::= |
AnyMapTest | TypedMapTest
| |
[210] | AnyMapTest | ::= | "map" "(" "*" ")" | |
[211] | TypedMapTest | ::= | "map" "(" AtomicOrUnionType "," SequenceType ")" | |
[212] | ArrayTest | ::= |
AnyArrayTest | TypedArrayTest
| |
[213] | AnyArrayTest | ::= | "array" "(" "*" ")" | |
[214] | TypedArrayTest | ::= | "array" "(" SequenceType ")" | |
[215] | ParenthesizedItemType | ::= | "(" ItemType ")" | |
[216] | URILiteral | ::= |
StringLiteral
| |
[217] | FTSelection | ::= |
FTOr
FTPosFilter* | |
[218] | FTWeight | ::= | "weight" "{" Expr "}" | |
[219] | FTOr | ::= |
FTAnd ( "ftor" FTAnd )* | |
[220] | FTAnd | ::= |
FTMildNot ( "ftand" FTMildNot )* | |
[221] | FTMildNot | ::= |
FTUnaryNot ( "not" "in" FTUnaryNot )* | |
[222] | FTUnaryNot | ::= | ("ftnot")? FTPrimaryWithOptions
| |
[223] | FTPrimaryWithOptions | ::= |
FTPrimary
FTMatchOptions? FTWeight? | |
[224] | FTPrimary | ::= | (FTWords
FTTimes?) | ("(" FTSelection ")") | FTExtensionSelection
| |
[225] | FTWords | ::= |
FTWordsValue
FTAnyallOption? | |
[226] | FTWordsValue | ::= |
StringLiteral | ("{" Expr "}") | |
[227] | FTExtensionSelection | ::= |
Pragma+ "{" FTSelection? "}" | |
[228] | FTAnyallOption | ::= | ("any" "word"?) | ("all" "words"?) | "phrase" | |
[229] | FTTimes | ::= | "occurs" FTRange "times" | |
[230] | FTRange | ::= | ("exactly" AdditiveExpr) | |
[231] | FTPosFilter | ::= |
FTOrder | FTWindow | FTDistance | FTScope | FTContent
| |
[232] | FTOrder | ::= | "ordered" | |
[233] | FTWindow | ::= | "window" AdditiveExpr
FTUnit
| |
[234] | FTDistance | ::= | "distance" FTRange
FTUnit
| |
[235] | FTUnit | ::= | "words" | "sentences" | "paragraphs" | |
[236] | FTScope | ::= | ("same" | "different") FTBigUnit
| |
[237] | FTBigUnit | ::= | "sentence" | "paragraph" | |
[238] | FTContent | ::= | ("at" "start") | ("at" "end") | ("entire" "content") | |
[239] | FTMatchOptions | ::= | ("using" FTMatchOption)+ | |
[240] | FTMatchOption | ::= |
FTLanguageOption
| |
[241] | FTCaseOption | ::= | ("case" "insensitive") | |
[242] | FTDiacriticsOption | ::= | ("diacritics" "insensitive") | |
[243] | FTStemOption | ::= | "stemming" | ("no" "stemming") | |
[244] | FTThesaurusOption | ::= | ("thesaurus" (FTThesaurusID | "default")) | |
[245] | FTThesaurusID | ::= | "at" URILiteral ("relationship" StringLiteral)? (FTLiteralRange "levels")? | |
[246] | FTLiteralRange | ::= | ("exactly" IntegerLiteral) | |
[247] | FTStopWordOption | ::= | ("stop" "words" FTStopWords
FTStopWordsInclExcl*) | |
[248] | FTStopWords | ::= | ("at" URILiteral) | |
[249] | FTStopWordsInclExcl | ::= | ("union" | "except") FTStopWords
| |
[250] | FTLanguageOption | ::= | "language" StringLiteral
| |
[251] | FTWildCardOption | ::= | "wildcards" | ("no" "wildcards") | |
[252] | FTExtensionOption | ::= | "option" EQName
StringLiteral
| |
[253] | FTIgnoreOption | ::= | "without" "content" UnionExpr
| |
[254] | EQName | ::= |
QName | URIQualifiedName
|
[255] | IntegerLiteral | ::= |
Digits
| |
[256] | DecimalLiteral | ::= | ("." Digits) | (Digits "." [0-9]*) | /* ws: explicitXQ31 */ |
[257] | DoubleLiteral | ::= | (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits
| /* ws: explicitXQ31 */ |
[258] | StringLiteral | ::= | ('"' (PredefinedEntityRef | CharRef | EscapeQuot | [^"&])* '"') | ("'" (PredefinedEntityRef | CharRef | EscapeApos | [^'&])* "'") | /* ws: explicitXQ31 */ |
[259] | URIQualifiedName | ::= |
BracedURILiteral
NCName
| /* ws: explicitXQ31 */ |
[260] | BracedURILiteral | ::= | "Q" "{" (PredefinedEntityRef | CharRef | [^&{}])* "}" | /* ws: explicitXQ31 */ |
[261] | PredefinedEntityRef | ::= | "&" ("lt" | "gt" | "amp" | "quot" | "apos") ";" | /* ws: explicitXQ31 */ |
[262] | EscapeQuot | ::= | '""' | |
[263] | EscapeApos | ::= | "''" | |
[264] | ElementContentChar | ::= | (Char - [{}<&]) | |
[265] | QuotAttrContentChar | ::= | (Char - ["{}<&]) | |
[266] | AposAttrContentChar | ::= | (Char - ['{}<&]) | |
[267] | Comment | ::= | "(:" (CommentContents | Comment)* ":)" | /* ws: explicitXQ31 */ |
/* gn: commentsXQ31 */ | ||||
[268] | PITarget | ::= |
[http://www.w3.org/TR/REC-xml#NT-PITarget]XML
| /* xgc: xml-versionXQ31 */ |
[269] | CharRef | ::= |
[http://www.w3.org/TR/REC-xml#NT-CharRef]XML
| /* xgc: xml-versionXQ31 */ |
[270] | QName | ::= |
[http://www.w3.org/TR/REC-xml-names/#NT-QName]Names
| /* xgc: xml-versionXQ31 */ |
[271] | NCName | ::= |
[http://www.w3.org/TR/REC-xml-names/#NT-NCName]Names
| /* xgc: xml-versionXQ31 */ |
[272] | S | ::= |
[http://www.w3.org/TR/REC-xml#NT-S]XML
| /* xgc: xml-versionXQ31 */ |
[273] | Char | ::= |
[http://www.w3.org/TR/REC-xml#NT-Char]XML
| /* xgc: xml-versionXQ31 */ |
The following symbols are used only in the definition of terminal symbols; they are not terminal symbols in the grammar of [A EBNF for XQuery 3.1 Grammar with Full Text extensions].
[274] | Digits | ::= | [0-9]+ |
[275] | CommentContents | ::= | (Char+ - (Char* ('(:' | ':)') Char*)) |
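To relate the productions above to a concrete query, the following sketch shows a small main module that exercises several of them. The production numbers in the comments refer to the XQuery grammar listed above; the document URI and element names are hypothetical.

```xquery
(: VersionDecl [2] :)
xquery version "3.1";

(: AnnotatedDecl [27] with VarDecl [29] :)
declare variable $books := doc("books.xml")//book;

(: FLWORExpr [42]: ForClause [45], then LetBinding [51] using FTScoreVar [49] :)
for $book in $books
let score $s :=
    (: FTContainsExpr [88]; FTSelection [217] = FTOr [219] followed by FTWindow [233] :)
    $book/body contains text "usability" ftand "testing" window 10 words
where $s > 0
order by $s descending
return $book/title
```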
The EBNF in this document and in this section is aligned with the current XPath 3.1 grammar (see http://www.w3.org/TR/xpath-31/).
[1] | XPath | ::= |
Expr
| |
[2] | ParamList | ::= |
Param ("," Param)* | |
[3] | Param | ::= | "$" EQName
TypeDeclaration? | |
[4] | FunctionBody | ::= |
EnclosedExpr
| |
[5] | EnclosedExpr | ::= | "{" Expr "}" | |
[6] | Expr | ::= |
ExprSingle ("," ExprSingle)* | |
[7] | ExprSingle | ::= |
ForExpr
| |
[8] | ForExpr | ::= |
SimpleForClause "return" ExprSingle
| |
[9] | SimpleForClause | ::= | "for" SimpleForBinding ("," SimpleForBinding)* | |
[10] | SimpleForBinding | ::= | "$" VarName
FTScoreVar? "in" ExprSingle
| |
[11] | LetExpr | ::= |
SimpleLetClause "return" ExprSingle
| |
[12] | SimpleLetClause | ::= | "let" SimpleLetBinding ("," SimpleLetBinding)* | |
[13] | SimpleLetBinding | ::= | "$" VarName ":=" ExprSingle
| |
[14] | FTScoreVar | ::= | "score" "$" VarName
| |
[15] | QuantifiedExpr | ::= | ("some" | "every") "$" VarName "in" ExprSingle ("," "$" VarName "in" ExprSingle)* "satisfies" ExprSingle
| |
[16] | IfExpr | ::= | "if" "(" Expr ")" "then" ExprSingle "else" ExprSingle
| |
[17] | OrExpr | ::= |
AndExpr ( "or" AndExpr )* | |
[18] | AndExpr | ::= |
ComparisonExpr ( "and" ComparisonExpr )* | |
[19] | ComparisonExpr | ::= |
FTContainsExpr ( (ValueComp
| |
[20] | FTContainsExpr | ::= |
StringConcatExpr ( "contains" "text" FTSelection
FTIgnoreOption? )? | |
[21] | StringConcatExpr | ::= |
RangeExpr ( "||" RangeExpr )* | |
[22] | RangeExpr | ::= |
AdditiveExpr ( "to" AdditiveExpr )? | |
[23] | AdditiveExpr | ::= |
MultiplicativeExpr ( ("+" | "-") MultiplicativeExpr )* | |
[24] | MultiplicativeExpr | ::= |
UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )* | |
[25] | UnionExpr | ::= |
IntersectExceptExpr ( ("union" | "|") IntersectExceptExpr )* | |
[26] | IntersectExceptExpr | ::= |
InstanceofExpr ( ("intersect" | "except") InstanceofExpr )* | |
[27] | InstanceofExpr | ::= |
TreatExpr ( "instance" "of" SequenceType )? | |
[28] | TreatExpr | ::= |
CastableExpr ( "treat" "as" SequenceType )? | |
[29] | CastableExpr | ::= |
CastExpr ( "castable" "as" SingleType )? | |
[30] | CastExpr | ::= |
ArrowExpr ( "cast" "as" SingleType )? | |
[31] | ArrowExpr | ::= |
UnaryExpr ( "=>" ArrowFunctionSpecifier
ArgumentList )* | |
[32] | UnaryExpr | ::= | ("-" | "+")* ValueExpr
| |
[33] | ValueExpr | ::= |
SimpleMapExpr
| |
[34] | GeneralComp | ::= | "=" | "!=" | "<" | "<=" | ">" | ">=" | |
[35] | ValueComp | ::= | "eq" | "ne" | "lt" | "le" | "gt" | "ge" | |
[36] | NodeComp | ::= | "is" | "<<" | ">>" | |
[37] | Pragma | ::= | "(#" S? EQName (S
PragmaContents)? "#)" | /* ws: explicitXP31 */ |
[38] | PragmaContents | ::= | (Char* - (Char* '#)' Char*)) | |
[39] | SimpleMapExpr | ::= |
PathExpr ("!" PathExpr)* | |
[40] | PathExpr | ::= | ("/" RelativePathExpr?) | /* xgc: leading-lone-slashXP31 */ |
[41] | RelativePathExpr | ::= |
StepExpr (("/" | "//") StepExpr)* | |
[42] | StepExpr | ::= |
PostfixExpr | AxisStep
| |
[43] | AxisStep | ::= | (ReverseStep | ForwardStep) PredicateList
| |
[44] | ForwardStep | ::= | (ForwardAxis
NodeTest) | AbbrevForwardStep
| |
[45] | ForwardAxis | ::= | ("child" "::") | |
[46] | AbbrevForwardStep | ::= | "@"? NodeTest
| |
[47] | ReverseStep | ::= | (ReverseAxis
NodeTest) | AbbrevReverseStep
| |
[48] | ReverseAxis | ::= | ("parent" "::") | |
[49] | AbbrevReverseStep | ::= | ".." | |
[50] | NodeTest | ::= |
KindTest | NameTest
| |
[51] | NameTest | ::= |
EQName | Wildcard
| |
[52] | Wildcard | ::= | "*" | /* ws: explicitXP31 */ |
[53] | PostfixExpr | ::= |
PrimaryExpr (Predicate | ArgumentList | Lookup)* | |
[54] | ArgumentList | ::= | "(" (Argument ("," Argument)*)? ")" | |
[55] | PredicateList | ::= |
Predicate* | |
[56] | Predicate | ::= | "[" Expr "]" | |
[57] | Lookup | ::= | "?" KeySpecifier
| |
[58] | KeySpecifier | ::= |
NCName | IntegerLiteral | ParenthesizedExpr | "*" | |
[59] | ArrowFunctionSpecifier | ::= |
EQName | VarRef | ParenthesizedExpr
| |
[60] | PrimaryExpr | ::= |
Literal
| |
[61] | Literal | ::= |
NumericLiteral | StringLiteral
| |
[62] | NumericLiteral | ::= |
IntegerLiteral | DecimalLiteral | DoubleLiteral
| |
[63] | VarRef | ::= | "$" VarName
| |
[64] | VarName | ::= |
EQName
| |
[65] | ParenthesizedExpr | ::= | "(" Expr? ")" | |
[66] | ContextItemExpr | ::= | "." | |
[67] | FunctionCall | ::= |
EQName
ArgumentList
| /* xgc: reserved-function-namesXP31 */ |
/* gn: parensXP31 */ | ||||
[68] | Argument | ::= |
ExprSingle | ArgumentPlaceholder
| |
[69] | ArgumentPlaceholder | ::= | "?" | |
[70] | FunctionItemExpr | ::= |
NamedFunctionRef | InlineFunctionExpr
| |
[71] | NamedFunctionRef | ::= |
EQName "#" IntegerLiteral
| /* xgc: reserved-function-namesXP31 */ |
[72] | InlineFunctionExpr | ::= | "function" "(" ParamList? ")" ("as" SequenceType)? FunctionBody
| |
[73] | MapConstructor | ::= | "map" "{" (MapConstructorEntry ("," MapConstructorEntry)*)? "}" | |
[74] | MapConstructorEntry | ::= |
MapKeyExpr ":" MapValueExpr
| |
[75] | MapKeyExpr | ::= |
ExprSingle
| |
[76] | MapValueExpr | ::= |
ExprSingle
| |
[77] | ArrayConstructor | ::= |
SquareArrayConstructor | CurlyArrayConstructor
| |
[78] | SquareArrayConstructor | ::= | "[" (ExprSingle ("," ExprSingle)*)? "]" | |
[79] | CurlyArrayConstructor | ::= | "array" "{" Expr? "}" | |
[80] | UnaryLookup | ::= | "?" KeySpecifier
| |
[81] | SingleType | ::= |
SimpleTypeName "?"? | |
[82] | TypeDeclaration | ::= | "as" SequenceType
| |
[83] | SequenceType | ::= | ("empty-sequence" "(" ")") | |
[84] | OccurrenceIndicator | ::= | "?" | "*" | "+" | /* xgc: occurrence-indicatorsXP31 */ |
[85] | ItemType | ::= |
KindTest | ("item" "(" ")") | FunctionTest | MapTest | ArrayTest | AtomicOrUnionType | ParenthesizedItemType
| |
[86] | AtomicOrUnionType | ::= |
EQName
| |
[87] | KindTest | ::= |
DocumentTest
| |
[88] | AnyKindTest | ::= | "node" "(" ")" | |
[89] | DocumentTest | ::= | "document-node" "(" (ElementTest | SchemaElementTest)? ")" | |
[90] | TextTest | ::= | "text" "(" ")" | |
[91] | CommentTest | ::= | "comment" "(" ")" | |
[92] | NamespaceNodeTest | ::= | "namespace-node" "(" ")" | |
[93] | PITest | ::= | "processing-instruction" "(" (NCName | StringLiteral)? ")" | |
[94] | AttributeTest | ::= | "attribute" "(" (AttribNameOrWildcard ("," TypeName)?)? ")" | |
[95] | AttribNameOrWildcard | ::= |
AttributeName | "*" | |
[96] | SchemaAttributeTest | ::= | "schema-attribute" "(" AttributeDeclaration ")" | |
[97] | AttributeDeclaration | ::= |
AttributeName
| |
[98] | ElementTest | ::= | "element" "(" (ElementNameOrWildcard ("," TypeName "?"?)?)? ")" | |
[99] | ElementNameOrWildcard | ::= |
ElementName | "*" | |
[100] | SchemaElementTest | ::= | "schema-element" "(" ElementDeclaration ")" | |
[101] | ElementDeclaration | ::= |
ElementName
| |
[102] | AttributeName | ::= |
EQName
| |
[103] | ElementName | ::= |
EQName
| |
[104] | SimpleTypeName | ::= |
TypeName
| |
[105] | TypeName | ::= |
EQName
| |
[106] | FunctionTest | ::= |
AnyFunctionTest
| |
[107] | AnyFunctionTest | ::= | "function" "(" "*" ")" | |
[108] | TypedFunctionTest | ::= | "function" "(" (SequenceType ("," SequenceType)*)? ")" "as" SequenceType
| |
[109] | MapTest | ::= |
AnyMapTest | TypedMapTest
| |
[110] | AnyMapTest | ::= | "map" "(" "*" ")" | |
[111] | TypedMapTest | ::= | "map" "(" AtomicOrUnionType "," SequenceType ")" | |
[112] | ArrayTest | ::= |
AnyArrayTest | TypedArrayTest
| |
[113] | AnyArrayTest | ::= | "array" "(" "*" ")" | |
[114] | TypedArrayTest | ::= | "array" "(" SequenceType ")" | |
[115] | ParenthesizedItemType | ::= | "(" ItemType ")" | |
[116] | URILiteral | ::= |
StringLiteral
| |
[117] | FTSelection | ::= |
FTOr
FTPosFilter* | |
[118] | FTWeight | ::= | "weight" "{" Expr "}" | |
[119] | FTOr | ::= |
FTAnd ( "ftor" FTAnd )* | |
[120] | FTAnd | ::= |
FTMildNot ( "ftand" FTMildNot )* | |
[121] | FTMildNot | ::= |
FTUnaryNot ( "not" "in" FTUnaryNot )* | |
[122] | FTUnaryNot | ::= | ("ftnot")? FTPrimaryWithOptions
| |
[123] | FTPrimaryWithOptions | ::= |
FTPrimary
FTMatchOptions? FTWeight? | |
[124] | FTPrimary | ::= | (FTWords
FTTimes?) | ("(" FTSelection ")") | FTExtensionSelection
| |
[125] | FTWords | ::= |
FTWordsValue
FTAnyallOption? | |
[126] | FTWordsValue | ::= |
StringLiteral | ("{" Expr "}") | |
[127] | FTExtensionSelection | ::= |
Pragma+ "{" FTSelection? "}" | |
[128] | FTAnyallOption | ::= | ("any" "word"?) | ("all" "words"?) | "phrase" | |
[129] | FTTimes | ::= | "occurs" FTRange "times" | |
[130] | FTRange | ::= | ("exactly" AdditiveExpr) | |
[131] | FTPosFilter | ::= |
FTOrder | FTWindow | FTDistance | FTScope | FTContent
| |
[132] | FTOrder | ::= | "ordered" | |
[133] | FTWindow | ::= | "window" AdditiveExpr
FTUnit
| |
[134] | FTDistance | ::= | "distance" FTRange
FTUnit
| |
[135] | FTUnit | ::= | "words" | "sentences" | "paragraphs" | |
[136] | FTScope | ::= | ("same" | "different") FTBigUnit
| |
[137] | FTBigUnit | ::= | "sentence" | "paragraph" | |
[138] | FTContent | ::= | ("at" "start") | ("at" "end") | ("entire" "content") | |
[139] | FTMatchOptions | ::= | ("using" FTMatchOption)+ | |
[140] | FTMatchOption | ::= |
FTLanguageOption
| |
[141] | FTCaseOption | ::= | ("case" "insensitive") | |
[142] | FTDiacriticsOption | ::= | ("diacritics" "insensitive") | |
[143] | FTStemOption | ::= | "stemming" | ("no" "stemming") | |
[144] | FTThesaurusOption | ::= | ("thesaurus" (FTThesaurusID | "default")) | |
[145] | FTThesaurusID | ::= | "at" URILiteral ("relationship" StringLiteral)? (FTLiteralRange "levels")? | |
[146] | FTLiteralRange | ::= | ("exactly" IntegerLiteral) | |
[147] | FTStopWordOption | ::= | ("stop" "words" FTStopWords
FTStopWordsInclExcl*) | |
[148] | FTStopWords | ::= | ("at" URILiteral) | |
[149] | FTStopWordsInclExcl | ::= | ("union" | "except") FTStopWords
| |
[150] | FTLanguageOption | ::= | "language" StringLiteral
| |
[151] | FTWildCardOption | ::= | "wildcards" | ("no" "wildcards") | |
[152] | FTExtensionOption | ::= | "option" EQName
StringLiteral
| |
[153] | FTIgnoreOption | ::= | "without" "content" UnionExpr
| |
[154] | EQName | ::= |
QName | URIQualifiedName
|
[155] | IntegerLiteral | ::= |
Digits
| |
[156] | DecimalLiteral | ::= | ("." Digits) | (Digits "." [0-9]*) | /* ws: explicitXP31 */ |
[157] | DoubleLiteral | ::= | (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits
| /* ws: explicitXP31 */ |
[158] | StringLiteral | ::= | ('"' (EscapeQuot | [^"])* '"') | ("'" (EscapeApos | [^'])* "'") | /* ws: explicitXP31 */ |
[159] | URIQualifiedName | ::= |
BracedURILiteral
NCName
| /* ws: explicitXP31 */ |
[160] | BracedURILiteral | ::= | "Q" "{" [^{}]* "}" | /* ws: explicitXP31 */ |
[161] | EscapeQuot | ::= | '""' | |
[162] | EscapeApos | ::= | "''" | |
[163] | Comment | ::= | "(:" (CommentContents | Comment)* ":)" | /* ws: explicitXP31 */ |
/* gn: commentsXP31 */ | ||||
[164] | QName | ::= |
[http://www.w3.org/TR/REC-xml-names/#NT-QName]Names
| /* xgc: xml-versionXP31 */ |
[165] | NCName | ::= |
[http://www.w3.org/TR/REC-xml-names/#NT-NCName]Names
| /* xgc: xml-versionXP31 */ |
[166] | S | ::= |
[http://www.w3.org/TR/REC-xml#NT-S]XML
| /* xgc: xml-versionXP31 */ |
[167] | Char | ::= |
[http://www.w3.org/TR/REC-xml#NT-Char]XML
| /* xgc: xml-versionXP31 */ |
The following symbols are used only in the definition of terminal symbols; they are not terminal symbols in the grammar of [B EBNF for XPath 3.1 Grammar with Full-Text extensions].
[168] | Digits | ::= | [0-9]+ |
[169] | CommentContents | ::= | (Char+ - (Char* ('(:' | ':)') Char*)) |
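For comparison with the XQuery grammar, the following sketch is an expression that stays within the XPath productions listed above: it has no prolog, and it uses SimpleForBinding [10] with FTScoreVar [14]. The element names are hypothetical, and the context item is assumed to be a document containing book elements.

```xquery
(: ForExpr [8] with a score variable; FTOrder [132] requires the
   matches to occur in the order written. :)
for $book score $s in //book[title contains text "usability" ftand "testing" ordered]
return $s
```

Note that the XPath StringLiteral production [158] does not include PredefinedEntityRef or CharRef, whereas the XQuery production [258] does, so entity references inside search strings are specific to XQuery.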
The following table describes the full-text components of the static context (as defined in [Section 2.1.1 Static Context]XQ). The following aspects of each component are described:
Default initial value: This is the initial value of the component if it is not overridden or augmented by the implementation or by a query.
Can be overwritten or augmented by implementation: Indicates whether an XQuery implementation is allowed to replace the default initial value of the component by a different, implementation-defined value and/or to augment the default initial value by additional implementation-defined values.
Can be overwritten or augmented by a query: Indicates whether a query is allowed to replace and/or augment the initial value provided by default or by the implementation. If so, indicates how this is accomplished (for example, by a declaration in the prolog, as defined in [Section 4 Modules and Prologs]XQ).
Scope: Indicates where the component is applicable. "Global" indicates that the component applies globally, throughout all the modules used in a query. "Module" indicates that the component applies throughout a module (as defined in [Section 4 Modules and Prologs]XQ). "Lexical" indicates that the component applies within the expression in which it is defined (equivalent to "module" if the component is declared in a prolog).
Consistency Rules: Indicates rules that must be observed in assigning values to the component.
Component | Default initial value | Can be overwritten or augmented by implementation? | Can be overwritten or augmented by a query? | Scope | Consistency rules |
---|---|---|---|---|---|
FTCaseOption | case insensitive | overwriteable | overwriteable by prolog | lexical | Value must be case insensitive, case sensitive, lowercase, or uppercase. |
FTDiacriticsOption | diacritics insensitive | overwriteable | overwriteable by prolog | lexical | Value must be diacritics insensitive or diacritics sensitive. |
FTStemOption | no stemming | overwriteable | overwriteable by prolog | lexical | Value must be stemming or no stemming. |
FTThesaurusOption | no thesaurus | overwriteable | overwriteable by prolog (refer to default to augment) | lexical | Each URI in the value must be found in the statically known thesauri. |
Statically known thesauri | none | augmentable | cannot be augmented or overwritten by prolog | module | Each URI uniquely identifies a thesaurus list. |
FTStopWordOption | no stop words | overwriteable | overwriteable by prolog (refer to default to augment) | lexical | Each URI in the value must be found in the statically known stop word lists. |
Statically known stop word lists | none | augmentable | cannot be augmented or overwritten by prolog | module | Each URI uniquely identifies a stop word list. |
FTLanguageOption | implementation-defined | overwriteable | overwriteable by prolog | lexical | Value must be castable to xs:language. |
Statically known languages | none | augmentable | cannot be augmented or overwritten by prolog | module | Each string uniquely identifies a language. |
FTWildCardOption | no wildcards | no | overwriteable by prolog | lexical | Value must be wildcards or no wildcards. |
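As a non-normative illustration of how a query can overwrite these components, the following sketch sets module-wide defaults in the prolog and then overrides one of them on a single full-text primary. The external variable $doc, the element names book and title, and the listed stop words are assumptions made for this example only; they are not defined by this specification.

```xquery
(: Prolog declaration: replaces the default initial values of
   FTCaseOption and FTStemOption for this module. :)
declare ft-option using case sensitive using stemming;

(: Hypothetical input document, supplied by the host environment. :)
declare variable $doc external;

for $book in $doc//book
where $book/title contains text "usability"
        (: Overrides the prolog setting for this primary only. :)
        using case insensitive
        (: Refers to the default stop word list in order to augment it. :)
        using stop words default union ("the", "of")
return $book/title
```

The statically known thesauri, stop word lists, and languages cannot be changed in this way, since the table marks them as not overwritable by the prolog.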
An implementation that does not support the FTMildNot operator must raise a static error if a full-text query contains a mild not.
An implementation that enforces one of the restrictions on FTUnaryNot must raise a static error if a full-text query does not obey the restriction.
An implementation that does not support one or more of the choices on FTUnit and FTBigUnit must raise a static error if a full-text query contains one of those choices.
An implementation that does not support the FTScope operator must raise a static error if a full-text query contains a scope.
An implementation that does not support the FTTimes operator must raise a static error if a full-text query contains a times.
An implementation that restricts the use of FTStopWordOption must raise a static error if a full-text query contains a stop word option that does not meet the restriction.
An implementation that restricts the use of FTIgnoreOption must raise a static error if a full-text query contains an ignore option that does not meet the restriction.
It is a static error if, during the static analysis phase, the query is found to contain a stop word option that refers to a stop word list that is not found in the statically known stop word lists.
It may be a static error if, during the static analysis phase, the query is found to contain a language identifier in a language option that the implementation does not support. The implementation may choose not to raise this error and instead provide some other implementation-defined behavior.
It is a static error if, during the static analysis phase, an expression is found to use an FTOrder operator that does not appear directly succeeding an FTWindow or an FTDistance operator and the implementation enforces this restriction.
An implementation may restrict the use of FTWindow and FTDistance to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor. It is a static error if, during the static analysis phase, an expression is found that violates this restriction and the implementation enforces this restriction.
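The two restrictions on positional filters can be illustrated with a short, non-normative sketch. The variable $doc and the element names are assumptions made for this example; whether the commented-out forms are actually rejected depends on which restrictions a given implementation chooses to enforce.

```xquery
(: Hypothetical input document, supplied by the host environment. :)
declare variable $doc external;

(: The operand of "window" combines FTWords using only ftand, and
   "ordered" directly follows the window filter, so this form should be
   acceptable even to an implementation that enforces both restrictions. :)
$doc//book[title contains text "web" ftand "usability" window 5 words ordered]

(: Forms that a restricting implementation may reject during static analysis:

     "web" ftand ftnot "site" window 5 words
         the operand of "window" involves ftnot

     "web" ftand "usability" ordered window 5 words
         "ordered" does not directly follow a window or distance filter
:)
```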
An implementation that does not support the FTContent operator must raise a static error if a full-text query contains one.
It is a static error if, during the static analysis phase, an implementation that restricts the use of FTLanguageOption to a single language encounters more than one distinct language option.
An implementation may constrain the form of the expression used to compute scores. It is a static error if, during the static analysis phase, such an implementation encounters a scoring expression that does not meet the restriction.
It is a static error if, during the static analysis phase, an implementation that restricts the choices of FTCaseOption encounters the "lowercase" or "uppercase" option.
It is a dynamic error if a weight value is not within the required range of values; it is also a dynamic error if an implementation that does not support negative weights encounters a negative weight value.
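The following non-normative sketch shows where weight values appear in a query. The weights are deliberately small and non-negative so that they should be acceptable whether or not an implementation supports negative weights; $doc and the element names are assumptions made for this example.

```xquery
(: Hypothetical input document, supplied by the host environment. :)
declare variable $doc external;

(: Each FTPrimaryWithOptions carries its own weight; the weights only
   influence the score bound to $s, not which books are selected. :)
for $book score $s in
    $doc//book[title contains text "usability" weight { 0.8 }
                               ftor "testing"   weight { 0.2 }]
order by $s descending
return <hit score="{ $s }">{ $book/title/string() }</hit>
```

Because the weight is an enclosed expression, its value is in general known only at evaluation time, which is consistent with an out-of-range value being reported as a dynamic rather than a static error.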
It is a dynamic error if an implementation encounters a mild not selection, one of whose operands evaluates to an AllMatches that contains a StringExclude.
It is a static error if, during the static analysis phase, the query is found to contain a thesaurus option that refers to a thesaurus that is not found in the statically known thesauri.
It is a static error if, within a single FTMatchOptions, there is more than one match option of any given match option group.
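A brief, non-normative sketch of this rule follows. The variable $doc and the element name title are assumptions made for this example.

```xquery
(: Hypothetical input document, supplied by the host environment. :)
declare variable $doc external;

(: One option from each of three different match option groups. :)
$doc//title[. contains text "XML"
              using case sensitive
              using diacritics insensitive
              using stemming]

(: By contrast, the following FTMatchOptions names two options from the
   case group and is expected to be rejected during static analysis:

     "XML" using case sensitive using lowercase
:)
```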
It is a dynamic error if, when "wildcards" is in effect, a query string violates wildcard syntax.
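For illustration only, the sketch below shows a query string that follows the wildcard syntax. The variable $doc and the element name para are assumptions made for this example.

```xquery
(: Hypothetical input document, supplied by the host environment. :)
declare variable $doc external;

(: With "wildcards" in effect, a period followed by "*" stands for zero
   or more arbitrary characters, so this matches tokens such as
   "improve", "improving", and "improvement". :)
$doc//para[. contains text "improv.*" using wildcards]

(: A query string with a malformed qualifier, such as an unclosed
   ".{2," sequence, is the kind of input this error is intended for. :)
```

When "no wildcards" is in effect, which is the default, the period in the query string has no special meaning and this error does not arise.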
It is a dynamic error if, in a function invocation, the argument corresponding to the specified function's collation parameter does not identify a supported collation.
It is a static error if an expression is not a valid instance of the grammar defined in A EBNF for XQuery 3.1 Grammar with Full Text extensions or of the grammar defined in B EBNF for XPath 3.1 Grammar with Full-Text extensions.
It is a type error if, during the static analysis phase, an expression is found to have a static type that is not appropriate for the context in which the expression occurs, or during the dynamic evaluation phase, the dynamic type of a value does not match a required type as specified by the matching rules in Section 2.5.4 SequenceType MatchingXP.
It is a static error if an implementation recognizes a pragma but determines that its content is invalid.
It is a static error if an extension expression contains neither a pragma that is recognized by the implementation nor an expression enclosed in curly braces.
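The following non-normative sketch shows an extension selection that supplies both a pragma and a selection in curly braces. The namespace URI and the pragma name exq:use-index are hypothetical, as are $doc and the element names.

```xquery
(: Hypothetical extension namespace; not defined by this specification. :)
declare namespace exq = "http://example.org/ft-extensions";

(: Hypothetical input document, supplied by the host environment. :)
declare variable $doc external;

(: An implementation that does not recognize the pragma can still
   evaluate the selection enclosed in curly braces. :)
$doc//book[title contains text (# exq:use-index #) { "usability" }]
```

If the curly braces were left empty and the pragma were not recognized, the implementation would have nothing to evaluate, which is the situation the last error above describes.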
[XQueryX 3.1] defines an XML representation of [XQuery 3.1: An XML Query Language]. [XQuery and XPath Full Text 3.1 Requirements and Use Cases], section 5.4, XML Syntax, states "XQuery and XPath Full Text MAY have more than one syntax binding. One query language syntax MUST be expressed in XML in a way that reflects the underlying structure of the query. See XML Query Requirements." This appendix specifies XML Schemas that together define the XML representation of XQuery and XPath Full Text 3.1 by representing the abstract syntax found in A EBNF for XQuery 3.1 Grammar with Full Text extensions. Because XQuery and XPath Full Text 3.1 integrates seamlessly with XQuery 3.1, it follows that the XML Syntax for XQuery and XPath Full Text 3.1 must integrate well with the XML Syntax for XQuery 3.1.
The XML Schema specified in this appendix accomplishes integration by importing the XML Schema defined for XQueryX in Section 4 An XML Schema for the XQuery XML Syntax XQX31, incorporating all of its type and element definitions. It then extends that schema by adding definitions of new types and elements in a namespace belonging to the full-text specification.
The semantics of a Full Text XQueryX document are determined by the semantics of the XQuery Full Text expression that results from transforming the XQueryX document into XQuery Full Text syntax using the XSLT stylesheet that appears in section E.2 XQueryX stylesheet for XQuery and XPath Full Text 3.1. The "correctness" of that transformation is determined by asking the following question: Can some Full Text XQueryX processor QX process some Full Text XQueryX document D1 to produce results R1, after which the stylesheet is used to translate D1 into an XQuery Full Text expression E1 that, when processed by some XQuery Full Text processor Q, produces results R2 that are equivalent (under some meaningful definition of "equivalent") to results R1?
The XML Schema that defines the complex types and elements for XQueryX in support of XQuery and XPath Full Text 3.1, including the ftContainsExpr, incorporates a second XML Schema that defines types and elements to support the ftMatchOption. Both XML Schemas are defined in this section.
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xqx="http://www.w3.org/2005/XQueryX" xmlns:xqxft="http://www.w3.org/2007/xpath-full-text" targetNamespace="http://www.w3.org/2007/xpath-full-text" elementFormDefault="qualified" attributeFormDefault="unqualified"> <!-- Initial creation 2006-08-17: Jim Melton --> <!-- Added ftOptionDecl, ftScoreVariableBinding 2006-08-21: Jim Melton --> <!-- First version believed complete 2006-08-29: Jim Melton --> <!-- Cleaned up naming 2007-04-27: Mary Holstege --> <!-- Revised to align with updated syntax 2008-01-14: Jim Melton --> <!-- Moved ftOptionDecl: prolog part two to one 2008-01-24: Jim Melton --> <!-- Revised position of "weight" in grammar 2008-11-12: Jim Melton --> <xsd:import namespace="http://www.w3.org/2005/XQueryX" schemaLocation="http://www.w3.org/2005/XQueryX/xqueryx.xsd"/> <xsd:include schemaLocation="./xpath-full-text-30-xqueryx-ftmatchoption-extensions.xsd"/> <xsd:element name="ftOptionDecl" substitutionGroup="xqx:prologPartOneItem"> <xsd:complexType> <xsd:sequence minOccurs="1" maxOccurs="unbounded"> <xsd:element ref="xqxft:ftMatchOption"/> </xsd:sequence> </xsd:complexType> </xsd:element> <!-- Create a new substitution group for full-text expressions --> <xsd:complexType name="ftExpr"> <xsd:complexContent> <xsd:extension base="xqx:expr"/> </xsd:complexContent> </xsd:complexType> <xsd:element name="ftExpr" type="xqxft:ftExpr" abstract="true" substitutionGroup="xqx:expr"/> <!-- Represents an untyped variable for the "score" clause --> <xsd:element name="ftScoreVariableBinding" type="xqx:QName" substitutionGroup="xqx:forLetClauseItemExtensions"/> <!-- FTContains ("contains text") --> <!-- Represents the following grammar productions: --> <!-- FTContainsExpr ::= --> <!-- StringConcatExpr ( "contains" "text" FTSelection FTIgnoreOption? )? 
--> <xsd:complexType name="ftContainsExpr"> <xsd:complexContent> <xsd:extension base="xqxft:ftExpr"> <xsd:sequence> <xsd:element name="ftRangeExpr" type="xqx:exprWrapper" /> <xsd:sequence minOccurs="0" maxOccurs="1"> <xsd:element name="ftSelectionExpr" type="xqxft:ftSelectionWrapper" /> <xsd:element name="ftIgnoreOption" type="xqxft:ftIgnoreOption" minOccurs="0" maxOccurs="1" /> </xsd:sequence> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="ftContainsExpr" type="xqxft:ftContainsExpr" substitutionGroup="xqxft:ftExpr" /> <!-- FTProximity --> <!-- Represents the following grammar productions: --> <!-- FTPosFilter ::= --> <!-- FTOrder | FTWindow | FTDistance | FTScope | FTContent --> <xsd:complexType name="ftProximity" /> <xsd:element name="ftProximity" type="xqxft:ftProximity" abstract="true"/> <!-- some simple type definitions --> <!-- Represents the following grammar productions: --> <!-- FTUnit ::= "words" | "sentences" | "paragraphs" --> <xsd:simpleType name="ftUnit"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="paragraph"/> <xsd:enumeration value="sentence"/> <xsd:enumeration value="word"/> </xsd:restriction> </xsd:simpleType> <!-- Represents the following grammar productions: --> <!-- FTBigUnit ::= "sentence" | "paragraph" --> <xsd:simpleType name="ftBigUnit"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="paragraph"/> <xsd:enumeration value="sentence"/> </xsd:restriction> </xsd:simpleType> <!-- Represents the following grammar productions: --> <!-- FTContent ::= ("at" "start") | ("at" "end") | ("entire" "content")--> <xsd:simpleType name="contentLocation"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="at start"/> <xsd:enumeration value="at end"/> <xsd:enumeration value="entire content"/> </xsd:restriction> </xsd:simpleType> <!-- Represents the following grammar productions: --> <!-- FTScope ::= ("same" | "different") FTBigUnit --> <xsd:simpleType name="ftScopeType"> 
<xsd:restriction base="xsd:string"> <xsd:enumeration value="same"/> <xsd:enumeration value="different"/> </xsd:restriction> </xsd:simpleType> <!-- range-related definitions --> <xsd:complexType name="unaryRange"> <xsd:sequence> <xsd:element name="value" type="xqx:exprWrapper" /> </xsd:sequence> </xsd:complexType> <xsd:complexType name="binaryRange"> <xsd:sequence> <xsd:element name="lower" type="xqx:exprWrapper" /> <xsd:element name="upper" type="xqx:exprWrapper" /> </xsd:sequence> </xsd:complexType> <!-- Represents the following grammar productions: --> <!-- FTRange ::= ("exactly" AdditiveExpr) --> <!-- | ("at" "least" AdditiveExpr) --> <!-- | ("at" "most" AdditiveExpr) --> <!-- | ("from" AdditiveExpr "to" AdditiveExpr) --> <xsd:complexType name="ftRange"> <xsd:choice> <xsd:element name="atLeastRange" type="xqxft:unaryRange" /> <xsd:element name="atMostRange" type="xqxft:unaryRange" /> <xsd:element name="exactlyRange" type="xqxft:unaryRange" /> <xsd:element name="fromToRange" type="xqxft:binaryRange" /> </xsd:choice> </xsd:complexType> <!-- ftPosFilter alternative: ordered --> <!-- Represents the following grammar productions: --> <!-- FTOrder ::= "ordered" --> <xsd:complexType name="ftOrdered"> <xsd:complexContent> <xsd:extension base="xqxft:ftProximity"> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="ftOrdered" type="xqxft:ftOrdered" substitutionGroup="xqxft:ftProximity"/> <!-- ftPosFilter alternative: window --> <!-- Represents the following grammar productions: --> <!-- FTWindow ::= "window" AdditiveExpr FTUnit --> <xsd:complexType name="ftWindow"> <xsd:complexContent> <xsd:extension base="xqxft:ftProximity"> <xsd:sequence> <xsd:element name="value" type="xqx:exprWrapper" /> <xsd:element name="unit" type="xqxft:ftUnit" /> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="ftWindow" type="xqxft:ftWindow" substitutionGroup="xqxft:ftProximity"/> <!-- ftPosFilter alternative: distance --> <!-- 
Represents the following grammar productions: --> <!-- FTDistance ::= "distance" FTRange FTUnit --> <xsd:complexType name="ftDistance"> <xsd:complexContent> <xsd:extension base="xqxft:ftProximity"> <xsd:sequence> <xsd:element name="ftRange" type="xqxft:ftRange" /> <xsd:element name="unit" type="xqxft:ftUnit" /> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="ftDistance" type="xqxft:ftDistance" substitutionGroup="xqxft:ftProximity"/> <!-- ftPosFilter alternative: scope --> <!-- Represents the following grammar productions: --> <xsd:complexType name="ftScope"> <xsd:complexContent> <xsd:extension base="xqxft:ftProximity"> <xsd:sequence> <xsd:element name="type" type="xqxft:ftScopeType" /> <xsd:element name="unit" type="xqxft:ftBigUnit" /> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="ftScope" type="xqxft:ftScope" substitutionGroup="xqxft:ftProximity"/> <!-- ftPosFilter alternative: FTContent --> <!-- Represents the following grammar productions: --> <xsd:complexType name="ftContent"> <xsd:complexContent> <xsd:extension base="xqxft:ftProximity"> <xsd:sequence> <xsd:element name="location" type="xqxft:contentLocation" /> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="ftContent" type="xqxft:ftContent" substitutionGroup="xqxft:ftProximity"/> <!-- ftPosFilter --> <!-- Represents the following grammar productions: --> <!-- FTPosFilter ::= --> <!-- FTOrder | FTWindow | FTDistance | FTScope | FTContent --> <xsd:complexType name="ftPosFilter"> <xsd:complexContent> <xsd:extension base="xqxft:ftExpr"> <xsd:sequence minOccurs="0" maxOccurs="unbounded"> <xsd:element ref="xqxft:ftProximity" /> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <!-- FTSelection --> <!-- Represents the following grammar productions: --> <!-- FTSelection ::= FTOr FTPosFilter* --> <xsd:complexType name="ftSelection" > <xsd:complexContent> 
<xsd:extension base="xqxft:ftExpr"> <xsd:sequence> <xsd:element name="ftSelectionSource" type="xqx:exprWrapper"/> <xsd:element name="ftPosFilter" type="xqxft:ftPosFilter" minOccurs="0" maxOccurs="1" /> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="ftSelection" type="xqxft:ftSelection" substitutionGroup="xqxft:ftExpr" /> <xsd:complexType name="ftSelectionWrapper"> <xsd:sequence> <xsd:element ref="xqxft:ftSelection"/> </xsd:sequence> </xsd:complexType> <!-- Represents the following grammar productions: --> <!-- FTIgnoreOption ::= "without" "content" UnionExpr --> <xsd:complexType name="ftIgnoreOption"> <xsd:sequence> <xsd:element ref="xqx:expr"/> </xsd:sequence> </xsd:complexType> <!-- Full-Text logical operators --> <xsd:element name="ftLogicalOp" type="xqx:binaryOperatorExpr" abstract="true" substitutionGroup="xqx:operatorExpr"/> <!-- Represents the following grammar productions: --> <!-- FTOr ::= FTAnd ( "ftor" FTAnd )* --> <xsd:element name="ftOr" type="xqx:binaryOperatorExpr" substitutionGroup="xqxft:ftLogicalOp"/> <!-- Represents the following grammar productions: --> <!-- FTAnd ::= FTMildNot ( "ftand" FTMildNot )* --> <xsd:element name="ftAnd" type="xqx:binaryOperatorExpr" substitutionGroup="xqxft:ftLogicalOp"/> <!-- Represents the following grammar productions: --> <!-- FTMildNot ::= FTUnaryNot ( "not" "in" FTUnaryNot )* --> <xsd:element name="ftMildNot" type="xqx:binaryOperatorExpr" substitutionGroup="xqxft:ftLogicalOp"/> <!-- Represents the following grammar productions: --> <xsd:element name="ftLogicalNot" type="xqx:unaryOperatorExpr" abstract="true" substitutionGroup="xqx:operatorExpr"/> <!-- Represents the following grammar productions: --> <!-- FTUnaryNot ::= ("ftnot")? 
FTPrimaryWithOptions --> <xsd:element name="ftUnaryNot" type="xqx:unaryOperatorExpr" substitutionGroup="xqxft:ftLogicalNot"/> <!-- Definitions associated with FTWords --> <!-- Represents the following grammar productions: --> <!-- FTTimes ::= "occurs" FTRange "times" --> <xsd:complexType name="ftTimes"> <xsd:sequence> <xsd:element name="ftRange" type="xqxft:ftRange"/> </xsd:sequence> </xsd:complexType> <!-- Represents the following grammar productions: --> <!-- FTAnyallOption ::= ("any" "word"?) | ("all" "words"?) | "phrase" --> <xsd:simpleType name="ftAnyAllOption"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="any"/> <xsd:enumeration value="all"/> <xsd:enumeration value="any word"/> <xsd:enumeration value="all words"/> <xsd:enumeration value="phrase"/> </xsd:restriction> </xsd:simpleType> <!-- Represents the following grammar productions: --> <!-- FTWordsValue ::= StringLiteral | ("{" Expr "}") --> <xsd:complexType name="ftWordsAlternatives"> <xsd:choice> <xsd:element name="ftWordsLiteral" type="xqx:exprWrapper"/> <xsd:element name="ftWordsExpression" type="xqx:exprWrapper"/> </xsd:choice> </xsd:complexType> <!-- Represents the following grammar productions: --> <!-- FTWords ::= FTWordsValue FTAnyallOption? --> <xsd:complexType name="ftWords"> <xsd:sequence> <xsd:element name="ftWordsValue" type="xqxft:ftWordsAlternatives" /> <xsd:element name="ftAnyAllOption" type="xqxft:ftAnyAllOption" minOccurs="0" maxOccurs="1" /> </xsd:sequence> </xsd:complexType> <!-- Represents the following grammar productions: --> <!-- ... FTWordsValue FTAnyallOption? --> <xsd:group name="ftWordsWithTimes"> <xsd:sequence> <xsd:element name="ftWords" type="xqxft:ftWords" /> <xsd:element name="ftTimes" type="xqxft:ftTimes" minOccurs="0" /> </xsd:sequence> </xsd:group> <!-- Represents the following grammar productions: --> <!-- FTExtensionSelection ::= Pragma+ "{" FTSelection? 
"}" --> <xsd:complexType name="ftExtensionSelection"> <xsd:sequence> <xsd:element name="pragma" type="xqx:pragma" minOccurs="1" maxOccurs="unbounded"/> <xsd:element name="ftSelection" type="xqxft:ftSelection" minOccurs="0" maxOccurs="1"/> </xsd:sequence> </xsd:complexType> <!-- Represents the following grammar productions: --> <!-- FTPrimary ::= (FTWords FTTimes?) --> <!-- | ("(" FTSelection ")") --> <!-- | FTExtensionSelection --> <xsd:complexType name="ftPrimary"> <xsd:complexContent> <xsd:extension base="xqxft:ftExpr" > <xsd:choice> <xsd:element name="parenthesized" type="xqx:exprWrapper"/> <xsd:group ref="xqxft:ftWordsWithTimes" /> <xsd:element name="ftExtensionSelection" type="xqxft:ftExtensionSelection"/> </xsd:choice> </xsd:extension> </xsd:complexContent> </xsd:complexType> <!-- Represents the following grammar productions: --> <!-- FTPrimaryWithOptions ::= FTPrimary FTMatchOptions? FTWeight? --> <xsd:complexType name="ftPrimaryWithOptions"> <xsd:complexContent> <xsd:extension base="xqxft:ftExpr"> <xsd:sequence> <xsd:element name="ftPrimary" type="xqxft:ftPrimary"/> <xsd:element ref="xqxft:ftMatchOptions" minOccurs="0" maxOccurs="1"/> <xsd:element name="weight" type="xqx:exprWrapper" minOccurs="0" maxOccurs="1" /> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="ftPrimaryWithOptions" type="xqxft:ftPrimaryWithOptions" substitutionGroup="xqxft:ftExpr"/> </xsd:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xqx="http://www.w3.org/2005/XQueryX" xmlns:xqxft="http://www.w3.org/2007/xpath-full-text" targetNamespace="http://www.w3.org/2007/xpath-full-text" elementFormDefault="qualified" attributeFormDefault="unqualified"> <!-- Initial creation 2006-08-17: Jim Melton --> <!-- First version believed complete 2006-08-29: Jim Melton --> <!-- Cleaned up naming 2007-04-27: Mary Holstege --> <!-- Revised to align with updated syntax 2008-01-14: Jim Melton --> <!-- Comments added to clarify each element 2008-11-12: Jim Melton --> <!-- Add element decl for ftMatchOptions 2009-07-06: Michael Dyck --> <xsd:import namespace="http://www.w3.org/2005/XQueryX" schemaLocation="http://www.w3.org/2005/XQueryX/xqueryx.xsd"/> <!-- FTMatchOption --> <!-- Represents the following grammar productions: --> <!-- FTMatchOption ::= FTLanguageOption --> <!-- | FTWildCardOption --> <!-- | FTThesaurusOption --> <!-- | FTStemOption --> <!-- | FTCaseOption --> <!-- | FTDiacriticsOption --> <!-- | FTStopWordOption --> <!-- | FTExtensionOption --> <xsd:complexType name="ftMatchOption" /> <xsd:element name="ftMatchOption" type="xqxft:ftMatchOption" abstract="true" /> <!-- Represents the following grammar productions: --> <!-- FTMatchOptions ::= ( "using" FTMatchOption )+ --> <xsd:complexType name="ftMatchOptions"> <xsd:sequence minOccurs="1" maxOccurs="unbounded"> <xsd:element ref="xqxft:ftMatchOption"/> </xsd:sequence> </xsd:complexType> <xsd:element name="ftMatchOptions" type="xqxft:ftMatchOptions"/> <!-- ftMatchOption alternative: case --> <!-- Represents the following grammar productions: --> <!-- FTCaseOption ::= ("case" "insensitive") --> <!-- | ("case" "sensitive") --> <!-- | "lowercase" --> <!-- | "uppercase" --> <xsd:complexType name="ftCaseOption"> <xsd:complexContent> <xsd:extension base="xqxft:ftMatchOption" > <xsd:sequence> <xsd:element name="value"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration 
value="lowercase"/> <xsd:enumeration value="uppercase"/> <xsd:enumeration value="case sensitive"/> <xsd:enumeration value="case insensitive"/> </xsd:restriction> </xsd:simpleType> </xsd:element> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="case" type="xqxft:ftCaseOption" substitutionGroup="xqxft:ftMatchOption" /> <!-- ftMatchOption alternative: diacritics --> <!-- Represents the following grammar productions: --> <!-- FTDiacriticsOption ::= ("diacritics" "insensitive") --> <!-- | ("diacritics" "sensitive") --> <xsd:complexType name="ftDiacriticsOption"> <xsd:complexContent> <xsd:extension base="xqxft:ftMatchOption" > <xsd:sequence> <xsd:element name="value"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="diacritics sensitive"/> <xsd:enumeration value="diacritics insensitive"/> </xsd:restriction> </xsd:simpleType> </xsd:element> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="diacritics" type="xqxft:ftDiacriticsOption" substitutionGroup="xqxft:ftMatchOption" /> <!-- ftMatchOption alternative: stemming --> <!-- Represents the following grammar productions: --> <!-- FTStemOption ::= ("stemming") | ("no" "stemming") --> <xsd:complexType name="ftStemOption"> <xsd:complexContent> <xsd:extension base="xqxft:ftMatchOption" > <xsd:sequence> <xsd:element name="value"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="stemming" /> <xsd:enumeration value="no stemming" /> </xsd:restriction> </xsd:simpleType> </xsd:element> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="stem" type="xqxft:ftStemOption" substitutionGroup="xqxft:ftMatchOption" /> <!-- ftMatchOption alternative: thesaurus --> <!-- Represents the following grammar productions: --> <!-- FTThesaurusID ::= "at" URILiteral ("relationship" StringLiteral)? --> <!-- (FTRange "levels")? 
--> <xsd:complexType name="ftThesaurusID"> <xsd:sequence> <xsd:element name="at" type="xsd:anyURI" /> <xsd:element name="relationship" type="xsd:string" minOccurs="0" /> <xsd:element name="levels" type="xqxft:ftRange" minOccurs="0" /> </xsd:sequence> </xsd:complexType> <!-- Represents the following grammar productions: --> <!-- ... (FTThesaurusID | "default") --> <!-- ... "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")") --> <xsd:complexType name="thesaurusSpecSequence"> <xsd:sequence> <xsd:choice> <xsd:element name="default" /> <xsd:element name="thesaurusID" type="xqxft:ftThesaurusID" /> </xsd:choice> <xsd:element name="thesaurusID" type="xqxft:ftThesaurusID" minOccurs="0" maxOccurs="unbounded" /> </xsd:sequence> </xsd:complexType> <!-- Represents the following grammar productions: --> <!-- FTThesaurusOption ::= --> <!-- ("thesaurus" (FTThesaurusID | "default")) --> <!-- | ("thesaurus" --> <!-- "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")") --> <!-- | ("no" "thesaurus") --> <xsd:complexType name="ftThesaurusOption"> <xsd:complexContent> <xsd:extension base="xqxft:ftMatchOption" > <xsd:choice> <xsd:element name="noThesauri" /> <xsd:element name="thesauri" type="xqxft:thesaurusSpecSequence" /> </xsd:choice> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="thesaurus" type="xqxft:ftThesaurusOption" substitutionGroup="xqxft:ftMatchOption" /> <!-- ftMatchOption alternative: stopwords --> <!-- Represents the following grammar productions: --> <!-- FTStopWords ::= ("at" URILiteral) --> <!-- | ("(" StringLiteral ("," StringLiteral)* ")") --> <xsd:complexType name="ftStopWords"> <xsd:choice> <xsd:element name="ref" type="xsd:anyURI" /> <xsd:element name="list"> <xsd:complexType> <xsd:sequence> <xsd:element ref="xqx:stringConstantExpr" minOccurs="1" maxOccurs="unbounded" /> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:choice> </xsd:complexType> <xsd:element name="ftStopWords" type="xqxft:ftStopWords" /> <!-- 
Represents the following grammar productions: --> <!-- ... "stop" "words" FTStopWords ... --> <!-- ... "stop" "words" "default" ... --> <xsd:group name="baseStopWords"> <xsd:choice> <xsd:element name="default" /> <xsd:element ref="xqxft:ftStopWords" /> </xsd:choice> </xsd:group> <!-- Represents the following grammar productions: --> <!-- FTStopWordsInclExcl ::= ("union" | "except") FTStopWords --> <xsd:complexType name="ftStopWordsInclExcl"> <xsd:choice> <xsd:element name="union" type="xqxft:ftStopWords" /> <xsd:element name="except" type="xqxft:ftStopWords" /> </xsd:choice> </xsd:complexType> <!-- Represents the following grammar productions: --> <!-- ... ("using" "stop" "words" FTStopWords FTStopWordsInclExcl*) ... --> <!-- ... ("using" "default" "stop" "words" FTStopWordsInclExcl*) ... --> <xsd:complexType name="stopWordsSpecSequence"> <xsd:sequence> <xsd:group ref="xqxft:baseStopWords" /> <xsd:element name="ftStopWordsInclExcl" type="xqxft:ftStopWordsInclExcl" minOccurs="0" maxOccurs="unbounded" /> </xsd:sequence> </xsd:complexType> <!-- Represents the following grammar productions: --> <!-- FTStopWordOption ::= --> <!-- ("stop" "words" FTStopWords FTStopWordsInclExcl*) --> <!-- | ("stop" "words" "default" FTStopWordsInclExcl*) --> <!-- | ("no" "stop" "words") --> <xsd:complexType name="ftStopWordOption"> <xsd:complexContent> <xsd:extension base="xqxft:ftMatchOption" > <xsd:choice> <xsd:element name="noStopwords" /> <xsd:element name="stopwords" type="xqxft:stopWordsSpecSequence" /> </xsd:choice> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="stopword" type="xqxft:ftStopWordOption" substitutionGroup="xqxft:ftMatchOption" /> <!-- ftMatchOption alternative: language --> <!-- Represents the following grammar productions: --> <!-- FTLanguageOption ::= "language" StringLiteral --> <xsd:complexType name="ftLanguageOption"> <xsd:complexContent> <xsd:extension base="xqxft:ftMatchOption" > <xsd:sequence> <xsd:element name="value" 
type="xsd:string" /> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="language" type="xqxft:ftLanguageOption" substitutionGroup="xqxft:ftMatchOption" /> <!-- ftMatchOption alternative: wildcards --> <!-- Represents the following grammar productions: --> <!-- FTWildCardOption ::= ("wildcards") --> <!-- | ("no" "wildcards") --> <xsd:complexType name="ftWildCardOption"> <xsd:complexContent> <xsd:extension base="xqxft:ftMatchOption"> <xsd:sequence> <xsd:element name="value"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="wildcards" /> <xsd:enumeration value="no wildcards" /> </xsd:restriction> </xsd:simpleType> </xsd:element> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="wildcard" type="xqxft:ftWildCardOption" substitutionGroup="xqxft:ftMatchOption" /> <!-- Represents the following grammar productions: --> <!-- FTExtensionOption ::= "option" QName StringLiteral --> <xsd:complexType name="ftExtensionOption"> <xsd:complexContent> <xsd:extension base="xqxft:ftMatchOption"> <xsd:sequence> <xsd:element name="ftExtensionName" type="xqx:QName"/> <xsd:element name="ftExtensionValue" type="xsd:string"/> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="ftExtensionOption" type="xqxft:ftExtensionOption" substitutionGroup="xqxft:ftMatchOption" /> </xsd:schema>
The XSLT stylesheet that defines the semantics of XQueryX in support of XQuery and XPath Full Text 3.1 integrates seamlessly with the XQueryX XSLT stylesheet defined in Section B Transforming XQueryX to XQuery XQX31 by importing the XQueryX XSLT stylesheet. It provides additional templates that define the semantics of the XQueryX representation of XQuery and XPath Full Text 3.1 by transforming that XQueryX representation into the human-readable syntax of XQuery and XPath Full Text 3.1.
<?xml version='1.0'?> <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xqxft="http://www.w3.org/2007/xpath-full-text" xmlns:xqx="http://www.w3.org/2005/XQueryX"> <!-- Initial creation 2006-08-17: Jim Melton --> <!-- Added ftOptionDecl, ftScoreVariableBinding 2006-08-21: Jim Melton --> <!-- First version believed complete 2006-08-29: Jim Melton --> <!-- Revised to align with 2008-01-24 draft 2008-02-08: Jim Melton --> <!-- Revised position of "weight" in grammar 2008-11-12: Jim Melton --> <!-- Various bug fixes 2009-07-14: Michael Dyck --> <!-- ftcontains => "contains text", Bug 7247 2009-09-17: Jim Melton --> <!-- with => using, stop words default, Bug 7271 2009-09-17: Jim Melton --> <!-- {} around weight values, around empty selection after pragmas 2010-09-07: Jim Melton --> <xsl:import href="http://www.w3.org/2005/XQueryX/xqueryx.xsl"/> <!-- ftOptionDecl --> <xsl:template match="xqxft:ftOptionDecl"> <xsl:text>declare ft-option </xsl:text> <xsl:apply-templates/> </xsl:template> <!-- ftScoreVariableBinding --> <xsl:template match="xqxft:ftScoreVariableBinding"> <xsl:text> score </xsl:text> <xsl:value-of select="$DOLLAR"/> <xsl:if test="@xqx:prefix"> <xsl:value-of select="@xqx:prefix"/> <xsl:value-of select="$COLON"/> </xsl:if> <xsl:value-of select="."/> </xsl:template> <!-- ftcontains --> <xsl:template match="xqxft:ftContainsExpr"> <xsl:apply-templates select="xqxft:ftRangeExpr"/> <xsl:text> contains text </xsl:text> <xsl:apply-templates select="xqxft:ftSelectionExpr"/> <xsl:apply-templates select="xqxft:ftIgnoreOption"/> </xsl:template> <xsl:template match="xqxft:value"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftRangeExpr"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftSelectionExpr"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftIgnoreOption"> <xsl:text>without content </xsl:text> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftSelection"> 
<xsl:apply-templates select="xqxft:ftSelectionSource"/> <xsl:value-of select="$NEWLINE"/> <xsl:text> </xsl:text> <xsl:apply-templates select="xqxft:ftPosFilter"/> </xsl:template> <xsl:template match="xqxft:ftSelectionSource"> <xsl:apply-templates/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:ftPosFilter"> <xsl:apply-templates/> <xsl:value-of select="$NEWLINE"/> <xsl:text> </xsl:text> </xsl:template> <!-- FTProximity alternative: ordered --> <xsl:template match="xqxft:ftOrdered"> <xsl:text>ordered </xsl:text> <xsl:value-of select="$NEWLINE"/> </xsl:template> <!-- FTProximity alternative: window --> <xsl:template match="xqxft:ftWindow"> <xsl:text>window </xsl:text> <xsl:apply-templates select="xqxft:value"/> <xsl:text> </xsl:text> <xsl:value-of select="xqxft:unit"/> <xsl:text>s</xsl:text> <xsl:value-of select="$NEWLINE"/> </xsl:template> <!-- FTProximity alternative: distance --> <xsl:template match="xqxft:ftDistance"> <xsl:text>distance </xsl:text> <xsl:apply-templates select="xqxft:ftRange"/> <xsl:text> </xsl:text> <xsl:value-of select="xqxft:unit"/> <xsl:text>s</xsl:text> <xsl:value-of select="$NEWLINE"/> </xsl:template> <!-- FTProximity alternative: scope --> <xsl:template match="xqxft:ftScope"> <xsl:value-of select="xqxft:type"/> <xsl:text> </xsl:text> <xsl:value-of select="xqxft:unit"/> <xsl:value-of select="$NEWLINE"/> </xsl:template> <!-- FTProximity alternative: content --> <xsl:template match="xqxft:ftContent"> <xsl:value-of select="xqxft:location"/> <xsl:value-of select="$NEWLINE"/> </xsl:template> <xsl:template match="xqxft:exactlyRange"> <xsl:text>exactly </xsl:text> <xsl:apply-templates select="xqxft:value"/> </xsl:template> <xsl:template match="xqxft:atLeastRange"> <xsl:text>at least </xsl:text> <xsl:apply-templates select="xqxft:value"/> </xsl:template> <xsl:template match="xqxft:atMostRange"> <xsl:text>at most </xsl:text> <xsl:apply-templates select="xqxft:value"/> </xsl:template> <xsl:template match="xqxft:fromToRange"> 
<xsl:text>from </xsl:text> <xsl:apply-templates select="xqxft:lower"/> <xsl:text> to </xsl:text> <xsl:apply-templates select="xqxft:upper"/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:lower"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:upper"> <xsl:apply-templates/> </xsl:template> <!-- ftMatchOption alternative: case --> <xsl:template match="xqxft:case"> <xsl:text> using </xsl:text> <xsl:value-of select="xqxft:value"/> <xsl:value-of select="$NEWLINE"/> </xsl:template> <!-- ftMatchOption alternative: diacritics --> <xsl:template match="xqxft:diacritics"> <xsl:text> using </xsl:text> <xsl:value-of select="xqxft:value"/> <xsl:value-of select="$NEWLINE"/> </xsl:template> <!-- ftMatchOption alternative: stemming --> <xsl:template match="xqxft:stem"> <xsl:text> using </xsl:text> <xsl:value-of select="xqxft:value"/> <xsl:value-of select="$NEWLINE"/> </xsl:template> <!-- ftMatchOption alternative: thesaurus --> <xsl:template match="xqxft:thesaurus"> <xsl:text> using </xsl:text> <xsl:choose> <xsl:when test="xqxft:noThesauri"> <xsl:text>no thesaurus </xsl:text> </xsl:when> <xsl:otherwise> <xsl:apply-templates/> </xsl:otherwise> </xsl:choose> <xsl:value-of select="$NEWLINE"/> </xsl:template> <xsl:template match="xqxft:thesauri"> <xsl:text> </xsl:text> <xsl:text>thesaurus </xsl:text> <xsl:choose> <xsl:when test="child::*[2]"> <xsl:call-template name="parenthesizedList"/> </xsl:when> <xsl:otherwise> <xsl:apply-templates/> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template match="xqxft:default"> <xsl:text>default </xsl:text> </xsl:template> <xsl:template match="xqxft:thesaurusID"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:at"> <xsl:text>at "</xsl:text> <xsl:value-of select="."/> <xsl:text>" </xsl:text> </xsl:template> <xsl:template match="xqxft:relationship"> <xsl:text>relationship "</xsl:text> <xsl:value-of select="."/> <xsl:text>" </xsl:text> </xsl:template> <xsl:template match="xqxft:levels"> 
<xsl:apply-templates/> <xsl:text> levels </xsl:text> </xsl:template> <!-- ftMatchOption alternative: stopword --> <xsl:template match="xqxft:stopword"> <xsl:text>using </xsl:text> <xsl:choose> <xsl:when test="xqxft:noStopwords"> <xsl:text>no stop words </xsl:text> </xsl:when> <xsl:otherwise> <xsl:apply-templates/> </xsl:otherwise> </xsl:choose> <xsl:value-of select="$NEWLINE"/> </xsl:template> <xsl:template match="xqxft:stopwords"> <xsl:text> </xsl:text> <xsl:choose> <xsl:when test="xqxft:default"> <xsl:text>stop words default </xsl:text> </xsl:when> <xsl:otherwise> <xsl:text>stop words </xsl:text> <xsl:apply-templates select="xqxft:ftStopWords"/> </xsl:otherwise> </xsl:choose> <xsl:apply-templates select="xqxft:ftStopWordsInclExcl"/> </xsl:template> <xsl:template match="xqxft:ftStopWords"> <xsl:call-template name="ftStopWords_type"/> </xsl:template> <xsl:template name="ftStopWords_type"> <xsl:choose> <xsl:when test="xqxft:ref"> <xsl:text>at "</xsl:text> <xsl:value-of select="xqxft:ref"/> <xsl:text>" </xsl:text> </xsl:when> <xsl:otherwise> <xsl:apply-templates/> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template match="xqxft:list"> <xsl:call-template name="parenthesizedList"/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:FTStopWordsInclExcl"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:union"> <xsl:text>union </xsl:text> <xsl:call-template name="ftStopWords_type"/> </xsl:template> <xsl:template match="xqxft:except"> <xsl:text>except </xsl:text> <xsl:call-template name="ftStopWords_type"/> </xsl:template> <xsl:template match="xqxft:language"> <xsl:text>using language "</xsl:text> <xsl:apply-templates/> <xsl:text>"</xsl:text> <xsl:value-of select="$NEWLINE"/> </xsl:template> <xsl:template match="xqxft:wildcard"> <xsl:text>using </xsl:text> <xsl:apply-templates/> <xsl:value-of select="$NEWLINE"/> </xsl:template> <xsl:template match="xqxft:ftAnd"> <xsl:apply-templates select="xqx:firstOperand"/> <xsl:text> ftand 
</xsl:text> <xsl:apply-templates select="xqx:secondOperand"/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:ftOr"> <xsl:apply-templates select="xqx:firstOperand"/> <xsl:text> ftor </xsl:text> <xsl:apply-templates select="xqx:secondOperand"/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:ftMildNot"> <xsl:apply-templates select="xqx:firstOperand"/> <xsl:text> not in </xsl:text> <xsl:apply-templates select="xqx:secondOperand"/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:ftUnaryNot"> <xsl:text>ftnot </xsl:text> <xsl:apply-templates select="xqx:operand"/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:ftPrimaryWithOptions"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftPrimary"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:parenthesized"> <xsl:text>( </xsl:text> <xsl:apply-templates/> <xsl:text> ) </xsl:text> </xsl:template> <xsl:template match="xqxft:ftWords"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftWordsValue"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftWordsLiteral"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftWordsExpression"> <xsl:text> { </xsl:text> <xsl:apply-templates/> <xsl:text> } </xsl:text> </xsl:template> <xsl:template match="xqxft:ftAnyAllOption"> <xsl:value-of select="."/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:ftTimes"> <xsl:text>occurs </xsl:text> <xsl:apply-templates/> <xsl:text> times </xsl:text> </xsl:template> <xsl:template match="xqxft:ftExtensionSelection"> <xsl:apply-templates select="xqxft:pragma"/> <xsl:text> { </xsl:text> <xsl:apply-templates select="xqxft:ftSelection"/> <xsl:text> } </xsl:text> </xsl:template> <xsl:template match="xqxft:pragma"> <xsl:value-of select="$PRAGMA_BEGIN"/> <xsl:apply-templates select="xqx:pragmaName"/> <xsl:value-of select="$SPACE"/> <xsl:value-of select="xqx:pragmaContents"/> <xsl:value-of 
select="$PRAGMA_END"/> </xsl:template> <xsl:template match="xqxft:ftExtensionOption"> <xsl:text>using option </xsl:text> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftExtensionName"> <xsl:if test="@xqx:prefix"> <xsl:value-of select="@xqx:prefix"/> <xsl:value-of select="$COLON"/> </xsl:if> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftExtensionValue"> <xsl:text> "</xsl:text> <xsl:apply-templates/> <xsl:text>"</xsl:text> </xsl:template> <xsl:template match="xqxft:weight"> <xsl:text> weight { </xsl:text> <xsl:apply-templates/> <xsl:text> } </xsl:text> </xsl:template> </xsl:stylesheet>
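The way the templates above turn Full Text XQueryX operator elements into keywords can be pictured with a short, non-normative Python sketch. It mirrors only the ftAnd, ftOr, ftMildNot, and ftUnaryNot templates. The `render` function, the `SAMPLE` document, and the double-quoted rendering of `xqx:stringConstantExpr` (which in the real transformation comes from the imported XQueryX stylesheet) are assumptions made for illustration, and whitespace text nodes are ignored.

```python
import xml.etree.ElementTree as ET

XQX = "http://www.w3.org/2005/XQueryX"
XQXFT = "http://www.w3.org/2007/xpath-full-text"

# Keywords emitted by the ftAnd, ftOr and ftMildNot templates.
BINARY_KEYWORDS = {"ftAnd": " ftand ", "ftOr": " ftor ", "ftMildNot": " not in "}


def render(elem):
    """Serialize a small subset of Full Text XQueryX to XQuery Full Text."""
    namespace, _, local = elem.tag[1:].partition("}")
    if namespace == XQXFT and local in BINARY_KEYWORDS:
        first = elem.find(f"{{{XQX}}}firstOperand")
        second = elem.find(f"{{{XQX}}}secondOperand")
        return render(first) + BINARY_KEYWORDS[local] + render(second) + " "
    if namespace == XQXFT and local == "ftUnaryNot":
        return "ftnot " + render(elem.find(f"{{{XQX}}}operand")) + " "
    if namespace == XQX and local == "stringConstantExpr":
        # Assumption: string literals are serialized in double quotes.
        return '"' + elem.findtext(f"{{{XQX}}}value") + '"'
    # Wrapper elements such as ftWords simply render their children.
    return "".join(render(child) for child in elem)


# Hypothetical input: the selection "usability" ftand "testing".
SAMPLE = f"""
<xqxft:ftAnd xmlns:xqx="{XQX}" xmlns:xqxft="{XQXFT}">
  <xqx:firstOperand>
    <xqxft:ftWords><xqxft:ftWordsValue><xqxft:ftWordsLiteral>
      <xqx:stringConstantExpr><xqx:value>usability</xqx:value></xqx:stringConstantExpr>
    </xqxft:ftWordsLiteral></xqxft:ftWordsValue></xqxft:ftWords>
  </xqx:firstOperand>
  <xqx:secondOperand>
    <xqxft:ftWords><xqxft:ftWordsValue><xqxft:ftWordsLiteral>
      <xqx:stringConstantExpr><xqx:value>testing</xqx:value></xqx:stringConstantExpr>
    </xqxft:ftWordsLiteral></xqxft:ftWordsValue></xqxft:ftWords>
  </xqx:secondOperand>
</xqxft:ftAnd>
"""

serialized = render(ET.fromstring(SAMPLE.strip()))
print(serialized)
```

The trailing space in the result corresponds to the `<xsl:text> </xsl:text>` that closes the ftAnd template; the stylesheet remains the only authoritative definition of the transformation.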
The following example is based on the data and queries of one of the use cases in [XQuery and XPath Full Text 3.1 Requirements and Use Cases]. For this example, we show four things: the English description of the query; the XQuery Full Text solution given in [XQuery and XPath Full Text 3.1 Requirements and Use Cases]; a Full Text XQueryX solution; and the XQuery Full Text query that results from applying the stylesheet in E.2 XQueryX stylesheet for XQuery and XPath Full Text 3.1 to the Full Text XQueryX solution. This last XQuery Full Text expression is presented only as a sanity check. The stylesheet is not intended to reproduce the exact XQuery Full Text expression given in [XQuery and XPath Full Text 3.1 Requirements and Use Cases], but to produce a valid XQuery Full Text expression with the same semantics.
Comparison of the results of the Full Text XQueryX-to-XQuery Full Text transformation given in this document with the XQuery Full Text solutions in the [XQuery and XPath Full Text 3.1 Requirements and Use Cases] may be helpful in evaluating the correctness of the Full Text XQueryX solution in the example.
The XQuery Full Text Use Cases solution given for the example is provided only to assist readers of this document in understanding the Full Text XQueryX solution. There is no intent to imply that this document specifies a "compilation" or "transformation" of XQuery Full Text syntax into Full Text XQueryX syntax.
In the following example, note that path expressions are expanded to show their structure. Also note that the prefix syntax for binary operators like "and" makes the precedence explicit. In general, humans find it easier to read an XML representation that does not expand path expressions, but such a representation is less convenient for programmatic processing and manipulation. XQueryX is designed as a language that is convenient for production and modification by software, not as a convenient syntax for humans to read and write.
Finally, please note that white space, including new lines, has been added to some of the Full Text XQueryX documents and XQuery Full Text expressions for readability. That additional white space is not necessarily produced by the Full Text XQueryX-to-XQuery Full Text transformation.
Here is Q4 from the [XQuery and XPath Full Text 3.1 Requirements and Use Cases], use case SCORE: Find all books with parts about "usability testing".
declare function local:filter ( $nodes as node()*, $exclude as element()* ) as node()*
{
  for $node in $nodes except $exclude
  return
    typeswitch ($node)
      case $e as element()
        return
          element {node-name($e)}
          {
            $e/@*,
            local:filter( $e/node() except $exclude, $exclude )
          }
      default
        return $node
};
for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $irrelevantParts :=
   for $part in $book//part
   let score $score := $part contains text "usability test.*" using wildcards
   where $score < 0.5
   return $part
where count($irrelevantParts) < count($book//part)
return local:filter($book, $irrelevantParts)
<?xml version="1.0"?> <xqx:module xmlns:xqxft="http://www.w3.org/2007/xpath-full-text" xmlns:xqx="http://www.w3.org/2005/XQueryX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2007/xpath-full-text http://www.w3.org/2007/xpath-full-text/xpath-full-text-10-xqueryx.xsd http://www.w3.org/2005/XQueryX http://www.w3.org/2005/XQueryX/xqueryx.xsd"> <xqx:mainModule> <xqx:prolog> <xqx:functionDecl> <xqx:functionName xqx:prefix="local">filter</xqx:functionName> <xqx:paramList> <xqx:param> <xqx:varName>nodes</xqx:varName> <xqx:typeDeclaration> <xqx:anyKindTest/><xqx:occurrenceIndicator>*</xqx:occurrenceIndicator> </xqx:typeDeclaration> </xqx:param> <xqx:param> <xqx:varName>exclude</xqx:varName> <xqx:typeDeclaration> <xqx:elementTest/><xqx:occurrenceIndicator>*</xqx:occurrenceIndicator> </xqx:typeDeclaration> </xqx:param> </xqx:paramList> <xqx:typeDeclaration> <xqx:anyKindTest/> </xqx:typeDeclaration> <xqx:functionBody> <xqx:flworExpr> <xqx:forClause> <xqx:forClauseItem> <xqx:typedVariableBinding> <xqx:varName>node</xqx:varName> </xqx:typedVariableBinding> <xqx:forExpr> <xqx:exceptOp> <xqx:firstOperand> <xqx:varRef> <xqx:name>nodes</xqx:name> </xqx:varRef> </xqx:firstOperand> <xqx:secondOperand> <xqx:varRef> <xqx:name>exclude</xqx:name> </xqx:varRef> </xqx:secondOperand> </xqx:exceptOp> </xqx:forExpr> </xqx:forClauseItem> </xqx:forClause> <xqx:returnClause> <xqx:typeswitchExpr> <xqx:argExpr> <xqx:varRef> <xqx:name>node</xqx:name> </xqx:varRef> </xqx:argExpr> <xqx:typeswitchExprCaseClause> <xqx:variableBinding>e</xqx:variableBinding> <xqx:sequenceType> <xqx:elementTest/> </xqx:sequenceType> <xqx:resultExpr> <xqx:computedElementConstructor> <xqx:tagNameExpr> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="fn">node-name</xqx:functionName> <xqx:arguments> <xqx:varRef> <xqx:name>e</xqx:name> </xqx:varRef> </xqx:arguments> </xqx:functionCallExpr> </xqx:tagNameExpr> <xqx:contentExpr> <xqx:sequenceExpr> <xqx:pathExpr> <xqx:stepExpr> 
<xqx:filterExpr> <xqx:varRef> <xqx:name>e</xqx:name> </xqx:varRef> </xqx:filterExpr> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>attribute</xqx:xpathAxis> <xqx:attributeTest> <xqx:attributeName> <xqx:star/> </xqx:attributeName> </xqx:attributeTest> </xqx:stepExpr> </xqx:pathExpr> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="local">filter</xqx:functionName> <xqx:arguments> <xqx:exceptOp> <xqx:firstOperand> <xqx:pathExpr> <xqx:stepExpr> <xqx:filterExpr> <xqx:varRef> <xqx:name>e</xqx:name> </xqx:varRef> </xqx:filterExpr> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:anyKindTest/> </xqx:stepExpr> </xqx:pathExpr> </xqx:firstOperand> <xqx:secondOperand> <xqx:varRef> <xqx:name>exclude</xqx:name> </xqx:varRef> </xqx:secondOperand> </xqx:exceptOp> <xqx:varRef> <xqx:name>exclude</xqx:name> </xqx:varRef> </xqx:arguments> </xqx:functionCallExpr> </xqx:sequenceExpr> </xqx:contentExpr> </xqx:computedElementConstructor> </xqx:resultExpr> </xqx:typeswitchExprCaseClause> <xqx:typeswitchExprDefaultClause> <xqx:resultExpr> <xqx:varRef> <xqx:name>node</xqx:name> </xqx:varRef> </xqx:resultExpr> </xqx:typeswitchExprDefaultClause> </xqx:typeswitchExpr> </xqx:returnClause> </xqx:flworExpr> </xqx:functionBody> </xqx:functionDecl> </xqx:prolog> <xqx:queryBody> <xqx:flworExpr> <xqx:forClause> <xqx:forClauseItem> <xqx:typedVariableBinding> <xqx:varName>book</xqx:varName> </xqx:typedVariableBinding> <xqx:forExpr> <xqx:pathExpr> <xqx:stepExpr> <xqx:filterExpr> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="fn">doc</xqx:functionName> <xqx:arguments> <xqx:stringConstantExpr> <xqx:value>http://bstore1.example.com/full-text.xml</xqx:value> </xqx:stringConstantExpr> </xqx:arguments> </xqx:functionCallExpr> </xqx:filterExpr> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:nameTest>books</xqx:nameTest> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:nameTest>book</xqx:nameTest> </xqx:stepExpr> </xqx:pathExpr> 
</xqx:forExpr> </xqx:forClauseItem> </xqx:forClause> <xqx:letClause> <xqx:letClauseItem> <xqx:typedVariableBinding> <xqx:varName>irrelevantParts</xqx:varName> </xqx:typedVariableBinding> <xqx:letExpr> <xqx:flworExpr> <xqx:forClause> <xqx:forClauseItem> <xqx:typedVariableBinding> <xqx:varName>part</xqx:varName> </xqx:typedVariableBinding> <xqx:forExpr> <xqx:pathExpr> <xqx:stepExpr> <xqx:filterExpr> <xqx:varRef> <xqx:name>book</xqx:name> </xqx:varRef> </xqx:filterExpr> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>descendant-or-self</xqx:xpathAxis> <xqx:nameTest>part</xqx:nameTest> </xqx:stepExpr> </xqx:pathExpr> </xqx:forExpr> </xqx:forClauseItem> </xqx:forClause> <xqx:letClause> <xqx:letClauseItem> <xqxft:ftScoreVariableBinding>score</xqxft:ftScoreVariableBinding> <xqx:letExpr> <xqxft:ftContainsExpr> <xqxft:ftRangeExpr> <xqx:varRef> <xqx:name>part</xqx:name> </xqx:varRef> </xqxft:ftRangeExpr> <xqxft:ftSelectionExpr> <xqxft:ftSelection> <xqxft:ftSelectionSource> <xqxft:ftPrimaryWithOptions> <xqxft:ftPrimary> <xqxft:ftWords> <xqxft:ftWordsValue> <xqxft:ftWordsLiteral> <xqx:stringConstantExpr> <xqx:value>usability test.*</xqx:value> </xqx:stringConstantExpr> </xqxft:ftWordsLiteral> </xqxft:ftWordsValue> </xqxft:ftWords> </xqxft:ftPrimary> <xqxft:wildcard> <xqxft:value>using wildcards</xqxft:value> </xqxft:wildcard> </xqxft:ftPrimaryWithOptions> </xqxft:ftSelectionSource> </xqxft:ftSelection> </xqxft:ftSelectionExpr> </xqxft:ftContainsExpr> </xqx:letExpr> </xqx:letClauseItem> </xqx:letClause> <xqx:whereClause> <xqx:lessThanOp> <xqx:firstOperand> <xqx:varRef> <xqx:name>score</xqx:name> </xqx:varRef> </xqx:firstOperand> <xqx:secondOperand> <xqx:decimalConstantExpr> <xqx:value>0.5</xqx:value> </xqx:decimalConstantExpr> </xqx:secondOperand> </xqx:lessThanOp> </xqx:whereClause> <xqx:returnClause> <xqx:varRef> <xqx:name>part</xqx:name> </xqx:varRef> </xqx:returnClause> </xqx:flworExpr> </xqx:letExpr> </xqx:letClauseItem> </xqx:letClause> <xqx:whereClause> <xqx:lessThanOp> 
<xqx:firstOperand> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="fn">count</xqx:functionName> <xqx:arguments> <xqx:varRef> <xqx:name>irrelevantParts</xqx:name> </xqx:varRef> </xqx:arguments> </xqx:functionCallExpr> </xqx:firstOperand> <xqx:secondOperand> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="fn">count</xqx:functionName> <xqx:arguments> <xqx:pathExpr> <xqx:stepExpr> <xqx:filterExpr> <xqx:varRef> <xqx:name>book</xqx:name> </xqx:varRef> </xqx:filterExpr> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>descendant-or-self</xqx:xpathAxis> <xqx:nameTest>part</xqx:nameTest> </xqx:stepExpr> </xqx:pathExpr> </xqx:arguments> </xqx:functionCallExpr> </xqx:secondOperand> </xqx:lessThanOp> </xqx:whereClause> <xqx:returnClause> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="local">filter</xqx:functionName> <xqx:arguments> <xqx:varRef> <xqx:name>book</xqx:name> </xqx:varRef> <xqx:varRef> <xqx:name>irrelevantParts</xqx:name> </xqx:varRef> </xqx:arguments> </xqx:functionCallExpr> </xqx:returnClause> </xqx:flworExpr> </xqx:queryBody> </xqx:mainModule> </xqx:module>
Application of the stylesheet in E.2 XQueryX stylesheet for XQuery and XPath Full Text 3.1 to the Full Text XQueryX solution results in:
declare function local:filter($nodes as node()*, $exclude as element()*) as node()
{
  ( for $node in ($nodes except $exclude)
    return
      ( typeswitch($node)
          case $e as element() return
            element {fn:node-name($e)}
            {( $e/attribute::attribute(*),
               local:filter( ($e/child::node() except $exclude), $exclude ) )}
          default return $node ) )
};
( for $book in fn:doc("http://bstore1.example.com/full-text.xml")/child::books/child::book
  let $irrelevantParts:= ( for $part in $book/descendant-or-self::part
                           let score $score := $part contains text "usability test.*" using wildcards
                           where ($score < 0.5)
                           return $part )
  where (fn:count($irrelevantParts) < fn:count($book/descendant-or-self::part))
  return local:filter($book, $irrelevantParts) )
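The query token `test.*` in the example relies on the wildcard option: a period stands for an arbitrary character and may be followed by one of the qualifiers `?`, `*`, `+`, or `{n,m}`. The following non-normative Python sketch translates such a query token into a regular expression to show which document tokens it would accept. The function name `wildcard_to_regex` and the case-insensitive comparison are assumptions made for this illustration; an actual implementation applies its own tokenization and match options.

```python
import re


def wildcard_to_regex(token: str) -> re.Pattern:
    """Translate a query token that uses wildcards into a compiled regex."""
    parts = []
    i = 0
    while i < len(token):
        ch = token[i]
        if ch == "\\" and i + 1 < len(token):
            # A backslash escapes the next character, which is taken literally.
            parts.append(re.escape(token[i + 1]))
            i += 2
        elif ch == ".":
            # A period is a wildcard, optionally followed by a qualifier.
            parts.append(".")
            i += 1
            qualifier = re.match(r"[?*+]|\{\d+,\d+\}", token[i:])
            if qualifier:
                parts.append(qualifier.group())
                i += qualifier.end()
        else:
            parts.append(re.escape(ch))
            i += 1
    # Assumption: tokens are compared without regard to case.
    return re.compile("".join(parts), re.IGNORECASE)


pattern = wildcard_to_regex("test.*")
accepted = [t for t in ["test", "tests", "Testing", "contest"] if pattern.fullmatch(t)]
print(accepted)
```

Because `.*` stands for zero or more characters, the token "test" itself is accepted, while "contest" is rejected because the wildcard appears only at the end of the query token.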
We would like to thank the members of the XQuery and XPath Full-Text group for their fruitful discussions.
We would like to thank the following people for their contributions to earlier drafts of this document:
Andrew Cencini, Microsoft - acencini@microsoft.com
Andrew Eisenberg, IBM - andrew.eisenberg@us.ibm.com
Nimish Khanolkar, Microsoft - nimishk@exchange.microsoft.com
Ashok Malhotra, Oracle - ashok.malhotra@oracle.com
Tapas Nayak, Microsoft - tapasnay@exchange.microsoft.com
Roland Seiffert, IBM - seiffert@de.ibm.com
An AllMatches describes the possible results of an FTSelection.
Distance Operator Restriction. FTDistance can only be applied to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor.
Full-text queries are performed on tokens and phrases. Tokens and phrases are produced via tokenization.
Ignored nodes are the set of nodes whose content is ignored.
Each Match describes one result to the FTSelection.
Negation Restriction 1. An FTUnaryNot expression may only appear as a direct right operand of an "ftand" (FTAnd) operation.
Negation Restriction 2. An FTUnaryNot expression may not appear as a descendant of an FTOr that is modified by an FTPosFilter. (An FTOr is modified by an FTPosFilter, if it is derived using the production for FTSelection together with that FTPosFilter.)
Order Operator Restriction. FTOrder may only appear directly succeeding an FTWindow or an FTDistance operator.
A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. A tokenizer is not required to support paragraphs.
A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.
A QueryItem is a sequence of QueryTokenInfos representing the collection of tokens derived from tokenizing one query string.
A QueryTokenInfo is the identity of a token inside a query string.
The score of a full-text query result expresses its relevance to the search conditions.
A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. A tokenizer is not required to support sentences.
Single Language Restriction. If a full-text query contains more than one FTLanguageOption in its body and the prolog, then the languages specified must be the same.
A StringExclude is a StringMatch that describes a TokenInfo that must not be contained in the document.
A StringInclude is a StringMatch that describes a TokenInfo that must be contained in the document.
A StringMatch is a possible match of a sequence of query tokens with a corresponding sequence of tokens in a document. A StringMatch may be a StringInclude or StringExclude.
A token is a non-empty sequence of characters returned by a tokenizer as a basic unit to be searched. Beyond that, tokens are implementation-defined.
A TokenInfo represents a contiguous collection of tokens from an XML document.
Formally, tokenization is the process of converting an XDM item to a collection of tokens, taking any structural information of the item into account to identify token, sentence, and paragraph boundaries. Each token is assigned a starting and ending position.
Scoring may be influenced by adding weight declarations to search tokens, phrases, and expressions.
Window Operator Restriction. FTWindow can only be applied to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor.
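The relationship between tokenization, TokenInfo, StringMatch, Match, and AllMatches in the definitions above can be pictured with a small, non-normative Python sketch. Tokenization is implementation-defined, so the tokenizer used here (runs of word characters, lower-cased, numbered from 1) is purely an assumption, as are the field names and the helper `matches_for_word`. The sketch covers only a single-token query, for which every occurrence in the text yields one Match containing one StringInclude.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TokenInfo:
    word: str
    start_pos: int  # position of the first token covered
    end_pos: int    # position of the last token covered


@dataclass(frozen=True)
class StringMatch:
    kind: str        # "include" (StringInclude) or "exclude" (StringExclude)
    query_pos: int   # position of the query item within the selection
    token_info: TokenInfo


Match = List[StringMatch]   # one possible result of the selection
AllMatches = List[Match]    # every possible result


def tokenize(text: str) -> List[TokenInfo]:
    """Toy tokenizer (an assumption): one token per run of word characters."""
    words = re.findall(r"\w+", text)
    return [TokenInfo(w.lower(), i, i) for i, w in enumerate(words, start=1)]


def matches_for_word(tokens: List[TokenInfo], word: str, query_pos: int = 1) -> AllMatches:
    """Build one Match, holding one StringInclude, per occurrence of `word`."""
    return [
        [StringMatch("include", query_pos, t)]
        for t in tokens
        if t.word == word.lower()
    ]


tokens = tokenize("Usability testing improves usability.")
result = matches_for_word(tokens, "usability")
print([m[0].token_info.start_pos for m in result])
```

Here the text yields four tokens and the query token occurs at positions 1 and 4, so the AllMatches holds two Matches.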
An anchoring selection consists of a full-text selection followed by one of the postfix operators "at start", "at end", or "entire content".
An and-selection combines two full-text selections using the ftand operator.
A cardinality selection consists of an FTWords followed by the FTTimes postfix operator.
A case option modifies the matching of tokens and phrases by specifying how uppercase and lowercase characters are considered.
A diacritics option modifies token and phrase matching by specifying how diacritics are considered.
A distance selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTDistance.
An extension option is a match option that acts in an implementation-defined way.
An extension selection is a full-text selection whose semantics are implementation-defined.
A full-text contains expression is an expression that evaluates a sequence of items against a full-text selection.
A full-text selection specifies the conditions of a full-text search.
Implementation-dependent indicates an aspect that may differ between implementations, is not specified by this or any W3C specification, and is not required to be specified by the implementor for any particular implementation.
Implementation-defined indicates an aspect that may differ between implementations, but must be specified by the implementor for each particular implementation.
A language option modifies token matching by specifying the language of search tokens and phrases.
Match options modify the set of tokens in the query, or how they are matched against tokens in the text.
The order in which effective match options for an FTWords are applied is called the match option application order.
Each of the alternatives of production FTMatchOption other than FTExtensionOption corresponds to one match option group.
MAY means that an item is truly optional.
A mild-not selection combines two full-text selections using the not in operator.
MUST means that the item is an absolute requirement of the specification.
A not-selection is a full-text selection starting with the prefix operator ftnot.
An or-selection combines two full-text selections using the ftor operator.
An ordered selection consists of a full-text selection followed by the postfix operator "ordered".
Positional filters are postfix operators that serve to filter matches based on various constraints on their positional information.
A primary full-text selection is the basic form of a full-text selection. It specifies tokens and phrases as search conditions (FTWords), optionally followed by a cardinality constraint (FTTimes). An FTSelection in parentheses and the FTExtensionSelection are also primary full-text selections.
A scope selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTScope.
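The items produced by evaluating the first operand of a full-text contains expression, that is, the items to be searched, are called the search context.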
SHOULD means that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
A stemming option modifies token and phrase matching by specifying whether stemming is applied or not.
A stop word option controls matching of tokens by specifying whether stop words are used or not. Stop words are tokens in the query that match any token in the text being searched.
A thesaurus option modifies token and phrase matching by specifying whether a thesaurus is used or not.
A wildcard option modifies token and phrase matching by specifying whether or not wildcards are recognized in query strings.
A window selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTWindow.
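The positional filters defined above (ordered, window, and distance selections) all operate on the token positions of a match. The following non-normative Python sketch shows the underlying arithmetic for the unit "words" only. The function names are hypothetical, positions are plain integers assumed to come from some tokenizer, and sentence and paragraph units as well as StringExcludes are left out.

```python
def within_window(positions, size):
    """True if all matched tokens fit inside `size` consecutive token positions."""
    return max(positions) - min(positions) + 1 <= size


def word_distances(positions):
    """Number of tokens lying between each pair of consecutive matched tokens."""
    ordered = sorted(positions)
    return [b - a - 1 for a, b in zip(ordered, ordered[1:])]


def satisfies_distance(positions, at_most):
    """True if no gap between consecutive matched tokens exceeds `at_most`."""
    return all(d <= at_most for d in word_distances(positions))


def is_ordered(pairs):
    """pairs holds (query position, token position) tuples for one match.

    The match is ordered if the tokens occur in the text in the same order
    as the corresponding items occur in the query.
    """
    token_positions = [tok for _, tok in sorted(pairs)]
    return token_positions == sorted(token_positions)


# Hypothetical match: three query tokens found at token positions 3, 5 and 6.
positions = [3, 5, 6]
print(within_window(positions, 4))      # spans positions 3..6, i.e. 4 tokens
print(word_distances(positions))        # one token between 3 and 5, none between 5 and 6
print(satisfies_distance(positions, 1))
print(is_ordered([(1, 3), (2, 5), (3, 6)]))
```

In terms of the selections above, this match would satisfy "window 4 words" and "distance at most 1 words" but not "window 3 words".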
This appendix provides a summary of features defined in this specification whose effect is explicitly implementation-defined. The conformance rules require vendors to provide documentation that explains how these choices have been exercised.
Tokenization, including the definition of the term "tokens", SHOULD be implementation-defined. Implementations SHOULD expose the rules and sample results of tokenization as much as possible to enable users to predict and interpret the results of tokenization.
A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.
A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. A tokenizer is not required to support sentences.
A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. A tokenizer is not required to support paragraphs.
Implementations are free to provide implementation-defined ways to differentiate between markup's effect on token boundaries during tokenization.
The set of expressions (of form ExprSingle) that can be assigned to a score variable in a let-clause is implementation-defined. If an expression not supported by the scoring algorithm is passed to the scoring algorithm, the result is implementation-defined.
When a sequence of query tokens is considered as a phrase, it matches a sequence of tokens in the tokenized form of the text being searched only if the two sequences correspond in an implementation-defined way.
The match option application order, subject to the stated constraints, is implementation-defined.
The "language" option influences tokenization, stemming, and stop words in an implementation-defined way. It MAY influence the behavior of other match options in an implementation-defined way.
The set of valid language identifiers is implementation-defined.
If an invalid language identifier is specified, then the behavior is implementation-defined.
When a processor evaluates text in a document that is governed by an xml:lang attribute and the portion of the full-text query doing that evaluation contains an FTLanguageOption that specifies a different language from the language specified by the governing xml:lang attribute, the language-related behavior of that full-text query is implementation-defined.
It is implementation-defined which thesaurus relationships an implementation supports.
If a query specifies thesaurus relationships not supported by the thesaurus, or does not specify a relationship, the behavior is implementation-defined.
The effect of specifying a particular range of levels in an FTThesaurusID is implementation-defined.
If a query does not specify the number of levels, and the implementation does not follow the default of querying all levels of hierarchical relationships, then the number of levels of hierarchical relationships queried is implementation-defined.
It is implementation-defined what a stem of a token is, and whether stemming is based on an algorithm, dictionary, or mixed approach.
An implementation-defined comparison is used to determine whether a query token appears in the collection of stop words defined by the applicable stop word option.
Normally a stop word matches exactly one token, but there may be implementation-defined conditions under which a stop word matches a different number of tokens.
The "stop words default" option specifies that an implementation-defined collection of stop words is used.
An implementation recognizes an implementation-defined set of namespace URIs used to denote extension options. The effect of each, including its error behavior, is implementation-defined.
An implementation recognizes an implementation-defined set of namespace URIs used to denote extension selection pragmas. The effect of each, including its error behavior, is implementation-defined.
The conditions under which tokenization of two equal items produces different tokens are implementation-defined.
An implementation may impose an implementation-defined restriction on the operand of FTIgnoreOption.
For certain full-text components of the static context (see C Static Context Components), the default initial value of the component can be overwritten or augmented with an implementation-defined value or values.
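Several items in this list concern stop words, whose comparison and matching behavior is implementation-defined. As a non-normative illustration of the common case noted above, in which a stop word in the query matches exactly one arbitrary token in the text, the following Python sketch matches a query phrase against a tokenized text. The function names, the lower-casing comparison, and the sample tokens are assumptions made for this sketch only.

```python
def phrase_matches_at(doc_tokens, query_tokens, stop_words, start):
    """Check whether the query phrase matches the text beginning at index `start`."""
    if start + len(query_tokens) > len(doc_tokens):
        return False
    for offset, query_token in enumerate(query_tokens):
        if query_token.lower() in stop_words:
            # A stop word in the query matches any single token in the text.
            continue
        if doc_tokens[start + offset].lower() != query_token.lower():
            return False
    return True


def find_phrase(doc_tokens, query_tokens, stop_words=frozenset()):
    """Return the 1-based token positions at which the phrase matches.

    `stop_words` is assumed to contain lower-case strings.
    """
    return [
        start + 1
        for start in range(len(doc_tokens))
        if phrase_matches_at(doc_tokens, query_tokens, stop_words, start)
    ]


doc = ["planning", "then", "conducting", "usability", "tests"]
query = ["planning", "and", "conducting"]

with_stop_words = find_phrase(doc, query, stop_words={"and"})
without_stop_words = find_phrase(doc, query)
print(with_stop_words, without_stop_words)
```

With "and" treated as a stop word, the phrase matches at the first token even though the text contains "then"; without the stop word option there is no match.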