Copyright © 2011 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
The document specifies requirements for Full-Text Search for use in XQuery [XQuery 1.0: An XML Query Language] and XPath [XML Path Language (XPath) 2.0].
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a Working Group Note as described in the Process Document. It has been jointly developed by the W3C XML Query Working Group and the W3C XSL Working Group, each of which is part of the XML Activity. This document is being published as a Working Group Note to persistently record the Requirements that guided the development of XQuery and XPath Full Text 1.0 as a W3C Recommendation.
This document includes, for each requirement, a corresponding status, indicating the current situation of the requirement in XQuery and XPath Full Text 1.0 at the time that it was issued as a final Recommendation on 22 February 2011. Organizations and individuals should review this document to determine whether or not the requirements provided meet the needs of the full-text community.
No substantive changes have been made to this specification since its publication as a Last Call Working Draft.
Please report errors in this document using W3C's public Bugzilla system (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery public comments mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string “[FTreq]” in the subject line of your report, whether made in Bugzilla or in email. Please use multiple Bugzilla entries (or, if necessary, multiple email messages) if you have more than one comment to make. Archives of the comments and responses are available at http://lists.w3.org/Archives/Public/public-qt-comments/.
Publication as a Working Group Note does not imply endorsement by the W3C Membership. At the time of publication, work on this document was considered complete and no further revisions are anticipated. It is a stable document and may be used as reference material or cited from another document. However, this document may be updated, replaced, or made obsolete by other documents at any time.
This document was produced by groups operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the XML Query Working Group and also maintains a public list of any patent disclosures made in connection with the deliverables of the XSL Working Group; those pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
1 Introduction
2 Terminology
2.1 Terminology
2.2 SCORE
2.3 Full-Text Search
3 Language Design
3.1 The Data Model
3.2 Side-effects on the data
3.3 Score Function and Full-Text Predicates
3.3.1 Predicate and Score Independence
3.3.2 Score language
3.4 Score algorithm
3.4.1 Return Score
3.4.2 Sort by Score
3.4.3 Type, Range of Score
3.4.4 Score Statistics
3.4.5 Semantics of Score
3.5 Combined score
3.5.1 Score Combination
3.5.2 Score algorithm vendor-provided
3.5.3 Score algorithm overridable
3.5.4 Score influence
3.6 Extensibility
3.6.1 Extensible by vendors
3.6.2 Extensible by users
3.7 First, Future Versions
3.8 End user language
3.9 Searchable query
3.10 Universality
4 Integration
4.1 XPath
4.2 Extensibility Mechanisms
4.2.1 Integration into XQuery/XPath
4.2.2 XQuery and XPath Full Text 1.0 Extensibility
4.3 Composability
4.4 Human-readable
4.5 XML syntax
5 Implementation
5.1 Declarativity
6 Functionality and Scope
6.1 Functionality
6.2 Search Scope
6.2.1 Search within arbitrary structure
6.2.2 Constructed Structures
6.2.3 Return Arbitrary Nodes
6.2.4 Parts of Search Tree
6.3 Attributes
6.3.1 Search within attributes
6.3.2 Search across attributes and content
6.4 Markup
6.5 Element Boundaries
6.5.1 Search across element boundaries
6.5.2 Element as a token boundary
6.6 Score
6.6.1 Score accessible
6.6.2 Implicit ordering
6.6.3 Score extendable
A References
A.1 Non-Normative
B Change Log
"Full-Text Search" (FTS) is a large field which covers a vast array of functionality. In addition, there are many different ways one could combine FTS capabilities with XQuery and XPath.
The requirements are written without reference to any particular solution.
The following key words are used throughout the document to specify the extent to which an item is a requirement for the work of the XML Query Working Group:
MUST: This word means that the item is an absolute requirement.
SHOULD: This word means that there may exist valid reasons not to treat this item as a requirement, but the full implications should be understood and the case carefully weighed before discarding this item.
MAY: This word means that an item deserves attention, but further study is needed to determine whether the item should be treated as a requirement.
When the words MUST, SHOULD, or MAY are used in this technical sense, they occur as a hyperlink to these definitions. These words will also be used with their conventional English meaning, in which case there is no hyperlink. For instance, the phrase "the full implications should be understood" uses the word "should" in its conventional English sense, and therefore occurs without the hyperlink.
Each requirement also includes a status section, indicating its current situation in the XML-Query family of specifications. Three status levels are available:
This indicates that the requirement, according to its original formulation, has been completely met. Optional clarificatory text may follow.
This indicates that the requirement has been partially met according to its original formulation. When this happens, explanatory text is provided to better clarify the current scope of the requirement.
This indicates that the requirement, according to its original formulation, has not been met. If this is the case, explanatory text is provided.
[Definition: SCORE reflects relevance of matched material.]
[Definition: Full-Text Search in this document is an extension to the XQuery and XPath language. It provides a way to query text which has been tokenized, i.e. broken into a sequence of words, units of punctuation, and spaces. Tokenization enables functions and operators which work with the relative positioning of words (e.g., proximity operators). Tokenization also enables functions and operators which operate on a part or the root of the word (e.g., wildcards, stemming).]
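As a non-normative illustration of the definition above, the following Python sketch breaks text into words, units of punctuation, and spaces, and records the position of each word so that position-based operators become possible. The regular expression and the names `tokenize` and `word_positions` are assumptions made for this example; this document does not prescribe any particular tokenizer.

```python
import re

# Hypothetical tokenizer: one alternative per kind of token named in the definition.
TOKEN_RE = re.compile(r"(?P<word>\w+)|(?P<space>\s+)|(?P<punct>[^\w\s])")


def tokenize(text):
    """Split text into (kind, value) pairs, where kind is 'word', 'space' or 'punct'."""
    return [(match.lastgroup, match.group()) for match in TOKEN_RE.finditer(text)]


def word_positions(text):
    """Map each lower-cased word to the list of its 0-based word positions."""
    positions = {}
    words = [value.lower() for kind, value in tokenize(text) if kind == "word"]
    for index, word in enumerate(words):
        positions.setdefault(word, []).append(index)
    return positions
```

With word positions available, a proximity operator reduces to comparing two position lists, which is why the definition ties such operators to tokenization.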
This section covers requirements for XQuery and XPath Full Text language design that are independent from, but related to, integration and scoping requirements.
XQuery and XPath Full Text 1.0 functions MUST operate on instances of the [XQuery 1.0 and XPath 2.0 Data Model].
Status: this requirement has been met.
XQuery and XPath Full Text 1.0 MUST NOT introduce or rely on side-effects.
Status: this requirement has been met.
XQuery and XPath Full Text 1.0 MUST allow the user to return SCORE.
Status: this requirement has been met.
XQuery and XPath Full Text 1.0 MUST allow the user to sort by SCORE.
Status: this requirement has been met.
XQuery and XPath Full Text 1.0 MUST define the type and range of SCORE values. The SCORE SHOULD be a float, in the range 0-1.
Status: this requirement has been partially met. Float has been changed to double because double is the maximal promotion type.
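The three SCORE requirements above (return, sort, type and range) can be pictured with a small non-normative Python sketch. The scoring formula, the fraction of words equal to the search term, is purely an assumption chosen so that the value is a double in the range 0-1; the function names are likewise hypothetical, and real scoring is left to implementations.

```python
def score(text, term):
    """Hypothetical relevance: fraction of words equal to `term`, a float in [0.0, 1.0]."""
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)


def ranked(documents, term):
    """Return (score, document) pairs with a non-zero score, highest score first."""
    scored = [(score(document, term), document) for document in documents]
    matches = [pair for pair in scored if pair[0] > 0.0]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)
```

A Python `float` is a double-precision value, which matches the change from float to double noted in the status above.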
XQuery and XPath Full Text 1.0 MUST be able to generate a SCORE for a combination of full-text predicates.
Status: this requirement has been met.
The algorithm to produce combined SCOREs MUST be vendor-provided.
Status: this requirement has been met.
The algorithm to produce combined SCOREs SHOULD be overridable by users.
Status: this requirement has been partially met. Since SCORE is implementation-dependent, the recommendation is silent on this and all matters relating to implementation of scoring.
Users MUST be able to influence individual components of complex score expressions.
Status: this requirement has been met.
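To make the combined-SCORE requirements above concrete, the following non-normative Python sketch shows one way a vendor-provided algorithm might combine the scores of several full-text predicates, and how a user-supplied weight might influence an individual component. The product and probabilistic-sum formulas, the clamping, and all function names are assumptions for illustration only; since SCORE is implementation-dependent, no particular formula is implied by this document.

```python
import math


def clamp(value):
    """Keep a score inside the range 0-1."""
    return max(0.0, min(1.0, value))


def combine_and(scores):
    """Hypothetical AND combination: the product of the component scores."""
    return math.prod(scores)


def combine_or(scores):
    """Hypothetical OR combination: 1 minus the product of the complements."""
    return 1.0 - math.prod(1.0 - s for s in scores)


def weighted(score, weight):
    """Scale one component by a user-supplied weight, then clamp to 0-1."""
    return clamp(score * weight)
```

For example, `combine_and([weighted(0.8, 0.5), 0.5])` lowers the contribution of the first predicate before the two scores are combined.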
XQuery and XPath Full Text 1.0 MUST be extensible by vendors.
Status: this requirement has been met.
XQuery and XPath Full Text 1.0 MAY be extensible by users.
Status: this requirement has been met.
The first version of XQuery and XPath Full Text 1.0 MUST provide a robust framework for future versions.
Status: this requirement has been met.
It is not a requirement that XQuery and XPath Full Text 1.0 be designed as an end-user UI language.
Status: this requirement has been met.
It SHOULD be possible to search XQuery and XPath Full Text 1.0 queries.
Status: this requirement has been met.
This section specifies requirements for the integration of XQuery and XPath Full Text 1.0 with XQuery and XPath.
Part, but not necessarily all, of XQuery and XPath Full Text 1.0 MUST be usable as part of an XPath expression.
Status: this requirement has been met.
XQuery and XPath Full Text 1.0 SHOULD use the extensibility mechanisms that exist in XQuery and XPath for integration into XQuery and XPath.
Status: this requirement has been partially met. XQuery and XPath Full Text 1.0 did not use functions because they were syntactically burdensome to users. The extensibility mechanisms were used in the XML syntax (XQueryX) for XQuery and XPath Full Text 1.0.
XQuery and XPath Full Text 1.0 MUST use the extensibility mechanisms that exist in XQuery and XPath for its own extensibility.
Status: this requirement has been met.
XQuery and XPath Full Text 1.0 MUST be composable with XQuery, and SHOULD be composable with itself.
Status: this requirement has been met.
XQuery and XPath Full Text 1.0 MAY have more than one syntax binding. One query language syntax MUST be convenient for humans to read and write. See [XML Query (XQuery) Requirements].
Status: this requirement has been met.
XQuery and XPath Full Text 1.0 MAY have more than one syntax binding. One query language syntax MUST be expressed in XML in a way that reflects the underlying structure of the query. See [XML Query (XQuery) Requirements].
Status: this requirement has been met.
This section defines requirements for the functionality in XQuery and XPath Full Text 1.0, and the scope of XQuery and XPath Full Text 1.0 queries.
XQuery and XPath Full Text 1.0 MUST provide, in the first release, the minimum set of full-text functionality that is useful.
single-word search
phrase search
support for stop words
single character suffix
0 or more character suffix
0 or more character prefix
0 or more character infix
proximity searching (unit: words)
specification of order in proximity searching
combination using AND
combination using OR
combination using NOT
word normalization, diacritics
ranking, relevance
Status: this requirement has been met.
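Several items in the list above can be sketched in a few lines of Python. This is a non-normative illustration only: the stop-word list, the shell-style wildcard syntax (`?` for a single character, `*` for zero or more characters), the interpretation of distance as the number of intervening words, and all function names are assumptions made for this example and are not taken from XQuery and XPath Full Text 1.0.

```python
import re
from fnmatch import fnmatchcase

STOP_WORDS = frozenset({"the", "a", "of"})  # hypothetical stop-word list


def words(text, stop_words=frozenset()):
    """Lower-cased words of `text`, with any stop words removed."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in stop_words]


def contains_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively in `text`."""
    haystack, needle = words(text), words(phrase)
    size = len(needle)
    if size == 0:
        return False
    return any(haystack[i:i + size] == needle for i in range(len(haystack) - size + 1))


def matches_wildcard(text, pattern):
    """True if any word of `text` matches the shell-style `pattern`."""
    return any(fnmatchcase(word, pattern.lower()) for word in words(text))


def near(text, first, second, distance, ordered=False):
    """True if at most `distance` words separate `first` and `second`.

    With ordered=True, `first` must also occur before `second`.
    """
    tokens = words(text)
    first_at = [i for i, w in enumerate(tokens) if w == first.lower()]
    second_at = [i for i, w in enumerate(tokens) if w == second.lower()]
    for i in first_at:
        for j in second_at:
            gap = j - i if ordered else abs(j - i)
            if 1 <= gap <= distance + 1:
                return True
    return False
```

Because each helper returns a Boolean, combination using AND, OR, and NOT falls out of the host language, for example `contains_phrase(t, "full text") and not matches_wildcard(t, "draft*")`.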
Additional functionality represented in the [XQuery and XPath Full Text 1.0 Use Cases] MUST be considered, but may be left to a future release.
Status: this requirement has been met.
Additional functionality from other Full-Text Search contexts such as [SQL/MM Full-Text] MUST be considered, but SHOULD be left to a future release.
Status: this requirement has been met.
XQuery and XPath Full Text 1.0 MUST allow search within an arbitrary structure (an arbitrary XPath expression).
Status: this requirement has been met.
XQuery and XPath Full Text 1.0 MUST NOT preclude Full-Text Search within structures constructed during a query.
Status: this requirement has been met.
XQuery and XPath Full Text 1.0 MUST allow a query to return arbitrary nodes.
Status: this requirement has been met.
XQuery and XPath Full Text 1.0 MUST allow the combination of predicates on different parts of the searched document 'tree'.
Status: this requirement has been met.
XQuery and XPath Full Text 1.0 MUST support Full-Text Search within attributes.
Status: this requirement has been met.
XQuery and XPath Full Text 1.0 MAY support Full-Text Search within attributes in conjunction with Full-Text Search within element content.
Status: this requirement has been met.
If XQuery and XPath Full Text 1.0 supports search within names of elements and attributes, then it MUST distinguish between element content and attribute values, on the one hand, and names of elements and attributes, on the other, in any search.
Status: this requirement has been met.
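The distinction required above can be pictured with the following non-normative Python sketch, which labels every hit with the kind of place where the term was found. It uses the standard `xml.etree.ElementTree` module and plain substring matching rather than tokenized search; the function name `search` and the four labels are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def search(xml_text, term):
    """Return (kind, name) pairs telling where `term` occurs in the document.

    kind is one of 'element-name', 'element-content',
    'attribute-name' or 'attribute-value'.
    """
    term = term.lower()
    hits = []
    for element in ET.fromstring(xml_text).iter():
        # Text owned directly by this element: its leading text plus child tails.
        direct_text = (element.text or "") + "".join(child.tail or "" for child in element)
        if term in element.tag.lower():
            hits.append(("element-name", element.tag))
        if term in direct_text.lower():
            hits.append(("element-content", element.tag))
        for name, value in element.attrib.items():
            if term in name.lower():
                hits.append(("attribute-name", name))
            if term in value.lower():
                hits.append(("attribute-value", name))
    return hits
```

A caller interested only in content and values can filter on the first component of each pair, which keeps matches on names from being confused with matches on data.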
XQuery and XPath Full Text 1.0 MUST support search across element boundaries, at least for NEAR.
Status: this requirement has been met.
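As a non-normative illustration of search across element boundaries, the following Python sketch gathers the words of an element and all of its descendants into one sequence before applying a proximity test, so that two words in different child elements can still be found near each other. It relies on the standard `xml.etree.ElementTree` module; the function names and the interpretation of distance as the number of intervening words are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET


def element_words(element):
    """Lower-cased words of all text inside `element`, in document order.

    Text is concatenated directly, so markup alone does not separate words.
    """
    return re.findall(r"\w+", "".join(element.itertext()).lower())


def near_across_elements(xml_text, first, second, distance):
    """True if at most `distance` words separate the two terms, ignoring markup."""
    tokens = element_words(ET.fromstring(xml_text))
    first_at = [i for i, w in enumerate(tokens) if w == first.lower()]
    second_at = [i for i, w in enumerate(tokens) if w == second.lower()]
    return any(1 <= abs(j - i) <= distance + 1 for i in first_at for j in second_at)
```

For the input `<p>The <b>quick</b> brown <i>fox</i> jumps</p>`, the words "quick" and "fox" sit in different elements but are separated by only one word.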
Author | Date | Action | Description |
Stephen Buxton | 2003-03-19 | Added a Change Log | |
Stephen Buxton | 2003-03-19 | Terminology definition changes | Switched the definitions of SHOULD and MAY, to be consistent with [XML Query (XQuery) Requirements]. The rest of the document does not need to change, since the earlier versions of this document, on which the text of the spec is based, referred to the definitions in [XML Query (XQuery) Requirements]. |
Stephen Buxton | 2003-04-18 | Change XML Query Requirements link to external URI | Changed links in the document body to point to external latest copy of XML Query Requirements. |
Pat Case | 2006-11-17 | Recorded that requirements were met | Recorded that the XQuery and XPath Full Text 1.0 Requirements have been met (fully or partially). |
Pat Case | 2007-12-04 | Title | Updated title and title references to remove 1.0, 2.0, and the hyphen. |
Pat Case | 2008-04-04 | Requirement 4.2.1 | Changed the status on 4.2.1 from green to yellow with an explanation. |