W3C

XQuery and XPath Full Text 1.0 Requirements

W3C Working Group Note 25 January 2011

This version:
http://www.w3.org/TR/2011/NOTE-xpath-full-text-10-requirements-20110125/
Latest version:
http://www.w3.org/TR/xpath-full-text-10-requirements/
Previous version:
http://www.w3.org/TR/2008/WD-xpath-full-text-10-requirements-20080516/
Editors:
Stephen Buxton, Oracle Corp
Pat Case, Library of Congress
Michael Rys, Microsoft

Abstract

This document specifies requirements for Full-Text Search for use in XQuery [XQuery 1.0: An XML Query Language] and XPath [XML Path Language (XPath) 2.0].

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a Working Group Note as described in the Process Document. It has been jointly developed by the W3C XML Query Working Group and the W3C XSL Working Group, each of which is part of the XML Activity. This document is being published as a Working Group Note to persistently record the Requirements that guided the development of XQuery and XPath Full Text 1.0 as a W3C Recommendation.

This document includes, for each requirement, a corresponding status, indicating the current situation of the requirement in XQuery and XPath Full Text 1.0 at the time that it was issued as a final Recommendation on 22 February 2011. Organizations and individuals should review this document to determine whether or not the requirements provided meet the needs of the full-text community.

No substantive changes have been made to this specification since its publication as a Last Call Working Draft.

Please report errors in this document using W3C's public Bugzilla system (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery public comments mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string “[FTreq]” in the subject line of your report, whether made in Bugzilla or in email. Please use multiple Bugzilla entries (or, if necessary, multiple email messages) if you have more than one comment to make. Archives of the comments and responses are available at http://lists.w3.org/Archives/Public/public-qt-comments/.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. At the time of publication, work on this document was considered complete and no further revisions are anticipated. It is a stable document and may be used as reference material or cited from another document. However, this document may be updated, replaced, or made obsolete by other documents at any time.

This document was produced by groups operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the XML Query Working Group and also maintains a public list of any patent disclosures made in connection with the deliverables of the XSL Working Group; those pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
2 Terminology
    2.1 Terminology
    2.2 SCORE
    2.3 Full-Text Search
3 Language Design
    3.1 The Data Model
    3.2 Side-effects on the data
    3.3 Score Function and Full-Text Predicates
        3.3.1 Predicate and Score Independence
        3.3.2 Score language
    3.4 Score algorithm
        3.4.1 Return Score
        3.4.2 Sort by Score
        3.4.3 Type, Range of Score
        3.4.4 Score Statistics
        3.4.5 Semantics of Score
    3.5 Combined score
        3.5.1 Score Combination
        3.5.2 Score algorithm vendor-provided
        3.5.3 Score algorithm overridable
        3.5.4 Score influence
    3.6 Extensibility
        3.6.1 Extensible by vendors
        3.6.2 Extensible by users
    3.7 First, Future Versions
    3.8 End user language
    3.9 Searchable query
    3.10 Universality
4 Integration
    4.1 XPath
    4.2 Extensibility Mechanisms
        4.2.1 Integration into XQuery/XPath
        4.2.2 XQuery and XPath Full Text 1.0 Extensibility
    4.3 Composability
    4.4 Human-readable
    4.5 XML syntax
5 Implementation
    5.1 Declarativity
6 Functionality and Scope
    6.1 Functionality
    6.2 Search Scope
        6.2.1 Search within arbitrary structure
        6.2.2 Constructed Structures
        6.2.3 Return Arbitrary Nodes
        6.2.4 Parts of Search Tree
    6.3 Attributes
        6.3.1 Search within attributes
        6.3.2 Search across attributes and content
    6.4 Markup
    6.5 Element Boundaries
        6.5.1 Search across element boundaries
        6.5.2 Element as a token boundary
    6.6 Score
        6.6.1 Score accessible
        6.6.2 Implicit ordering
        6.6.3 Score extendable

Appendices

A References
    A.1 Non-Normative
B Change Log


1 Introduction

"Full-Text Search" (FTS) is a large field which covers a vast array of functionality. In addition, there are many different ways one could combine FTS capabilities with XQuery and XPath.

The requirements are written without reference to any particular solution.

2 Terminology

2.1 Terminology

The following key words are used throughout the document to specify the extent to which an item is a requirement for the work of the XML Query Working Group:

MUST

This word means that the item is an absolute requirement.

SHOULD

This word means that there may exist valid reasons not to treat this item as a requirement, but the full implications should be understood and the case carefully weighed before discarding this item.

MAY

This word means that an item deserves attention, but further study is needed to determine whether the item should be treated as a requirement.

When the words MUST, SHOULD, or MAY are used in this technical sense, they occur as a hyperlink to these definitions. These words will also be used with their conventional English meaning, in which case there is no hyperlink. For instance, the phrase "the full implications should be understood" uses the word "should" in its conventional English sense, and therefore occurs without the hyperlink.

Each requirement also includes a status section, indicating its current situation in the XML-Query family of specifications. Three status levels are available:

"Green" status

This indicates that the requirement, according to its original formulation, has been completely met. Optional clarifying text may follow.

"Yellow" status

This indicates that the requirement, according to its original formulation, has been partially met. In this case, explanatory text is provided to clarify the current scope of the requirement.

"Red" status

This indicates that the requirement, according to its original formulation, has not been met. In this case, explanatory text is provided.

2.2 SCORE

[Definition: SCORE reflects the relevance of matched material.]

2.3 Full-Text Search

[Definition: Full-Text Search in this document is an extension to the XQuery and XPath languages. It provides a way to query text that has been tokenized, i.e. broken into a sequence of words, units of punctuation, and spaces. Tokenization enables functions and operators that work with the relative positions of words (e.g., proximity operators). Tokenization also enables functions and operators that operate on part of a word or on its root (e.g., wildcards, stemming).]
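To make the definition concrete, here is a minimal Python sketch of tokenization. It assumes a deliberately simple, hypothetical rule set (a word is a run of \w characters, a space token is a run of whitespace, and every other character is one unit of punctuation); these rules are for illustration only and are not those of any particular implementation. Numbering only the words is what lets positional operators such as proximity compare plain integers.

```python
import re
from typing import List, Tuple

# Hypothetical token classes, for illustration only:
#   word  = a run of \w characters (letters, digits, underscore)
#   space = a run of whitespace
#   punct = any other single character
TOKEN_RE = re.compile(r"(?P<word>\w+)|(?P<space>\s+)|(?P<punct>[^\w\s])")


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Break text into (kind, value) tokens that together cover the input."""
    # Exactly one named group matches per token, so lastgroup is its kind.
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def word_positions(text: str) -> List[Tuple[int, str]]:
    """Number the words only; punctuation and spaces get no position."""
    words = [value for kind, value in tokenize(text) if kind == "word"]
    return list(enumerate(words))
```

For example, word_positions("Full-text search.") numbers "Full", "text", and "search" as 0, 1, and 2; the hyphen and the full stop are tokens but take no word position.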

3 Language Design

This section covers requirements for XQuery and XPath Full Text language design that are independent of, but related to, integration and scoping requirements.

3.1 The Data Model

XQuery and XPath Full Text 1.0 functions MUST operate on instances of the [XQuery 1.0 and XPath 2.0 Data Model].

   green status  Status: this requirement has been met.

3.2 Side-effects on the data

XQuery and XPath Full Text 1.0 MUST NOT introduce or rely on side-effects.

   green status  Status: this requirement has been met.

3.3 Score Function and Full-Text Predicates

3.3.1 Predicate and Score Independence

XQuery and XPath Full Text 1.0 MUST allow full-text predicates and SCORE functions to be used independently.

   green status  Status: this requirement has been met.

3.3.2 Score language

XQuery and XPath Full Text 1.0 MUST either

  • use the same language for full-text predicates and SCORE functions

or

  • use a language for full-text predicates that is a proper subset of the language for SCORE functions.

   green status  Status: this requirement has been met.

3.4 Score algorithm

3.4.1 Return Score

XQuery and XPath Full Text 1.0 MUST allow the user to return SCORE.

   green status  Status: this requirement has been met.

3.4.2 Sort by Score

XQuery and XPath Full Text 1.0 MUST allow the user to sort by SCORE.

   green status  Status: this requirement has been met.

3.4.3 Type, Range of Score

XQuery and XPath Full Text 1.0 MUST define the type and range of SCORE values. The SCORE SHOULD be a float, in the range 0-1.

   yellow status  Status: this requirement has been partially met. Float has been changed to double because double is the maximal promotion type.

3.4.4 Score Statistics

XQuery and XPath Full Text 1.0 MUST NOT require an explicit definition of the global corpus statistics (statistics, such as word frequency, used in calculating SCORE).

   green status  Status: this requirement has been met.
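Requirements 3.4.3 and 3.4.4 together mean that a query asks only for a score, while the implementation gathers whatever corpus statistics it needs and returns a double between 0 and 1. The Python sketch below shows one way that division of labour could look. The ToyIndex class, its smoothed TF-IDF formula, and the normalization are hypothetical choices made for illustration; since SCORE is implementation-dependent, the Recommendation prescribes no scoring formula. A Python float is a double-precision value, matching the type noted in 3.4.3.

```python
import math
from collections import Counter
from typing import Dict, List


class ToyIndex:
    """Hypothetical index that collects its own corpus statistics."""

    def __init__(self, docs: Dict[str, List[str]]) -> None:
        self.docs = docs
        self.n_docs = len(docs)
        # Document frequency: number of documents containing each word.
        self.df = Counter(w for words in docs.values() for w in set(words))

    def idf(self, word: str) -> float:
        # Smoothed inverse document frequency; always >= 1.0.
        return math.log((self.n_docs + 1) / (self.df[word] + 1)) + 1.0

    def score(self, doc_id: str, query: List[str]) -> float:
        """Return a double in [0, 1]; the caller supplies only words."""
        words = self.docs[doc_id]
        if not words or not query:
            return 0.0
        tf = Counter(words)
        raw = sum(tf[q] / len(words) * self.idf(q) for q in query)
        # tf[q] / len(words) <= 1, so raw never exceeds this bound.
        bound = sum(self.idf(q) for q in query)
        return raw / bound
```

Because the statistics are computed inside the index, the query never mentions them, and an implementation remains free to compute them differently.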

3.4.5 Semantics of Score

XQuery and XPath Full Text 1.0 MAY partially define the semantics of SCORE.

   green status  Status: this requirement has been met.

3.5 Combined score

3.5.1 Score Combination

XQuery and XPath Full Text 1.0 MUST be able to generate a SCORE for a combination of full-text predicates.

   green status  Status: this requirement has been met.

3.5.2 Score algorithm vendor-provided

The algorithm to produce combined SCOREs MUST be vendor-provided.

   green status  Status: this requirement has been met.

3.5.3 Score algorithm overridable

The algorithm to produce combined SCOREs SHOULD be overridable by users.

   yellow status  Status: this requirement has been partially met. Since SCORE is implementation-dependent, the Recommendation is silent on this and on all other matters relating to the implementation of scoring.

3.5.4 Score influence

Users MUST be able to influence individual components of complex score expressions.

   green status  Status: this requirement has been met.
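Requirements 3.5.1 to 3.5.4 describe a combined score produced by a vendor-provided default, ideally overridable by users, with users able to influence individual components. The Python sketch below illustrates that shape under stated assumptions: the product for AND, the probabilistic sum for OR, the weights parameter, and the override hook are all hypothetical choices, not the algorithm of XQuery and XPath Full Text 1.0, which leaves scoring implementation-dependent (hence the partial status of 3.5.3).

```python
from typing import Callable, List, Optional, Sequence

Combiner = Callable[[Sequence[float]], float]


def vendor_and(scores: Sequence[float]) -> float:
    # Hypothetical vendor default for AND: product of component scores.
    result = 1.0
    for s in scores:
        result *= s
    return result


def vendor_or(scores: Sequence[float]) -> float:
    # Hypothetical vendor default for OR: probabilistic sum 1 - prod(1 - s).
    miss = 1.0
    for s in scores:
        miss *= 1.0 - s
    return 1.0 - miss


def apply_weights(scores: Sequence[float],
                  weights: Sequence[float]) -> List[float]:
    # User influence on individual components, clamped back into [0, 1].
    if len(scores) != len(weights):
        raise ValueError("one weight per component score is required")
    return [min(1.0, max(0.0, s * w)) for s, w in zip(scores, weights)]


def combine(op: str, scores: Sequence[float],
            weights: Optional[Sequence[float]] = None,
            override: Optional[Combiner] = None) -> float:
    """Combine component scores for an "and" or "or" of predicates."""
    if weights is not None:
        scores = apply_weights(scores, weights)
    if override is not None:
        # A user-supplied algorithm replaces the vendor default.
        return override(scores)
    if op == "and":
        return vendor_and(scores)
    if op == "or":
        return vendor_or(scores)
    raise ValueError(f"unknown operator: {op!r}")
```

For example, combine("and", [0.5, 0.5]) returns 0.25 under the product default, while combine("and", [0.5, 0.5], override=min) returns 0.5.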

3.6 Extensibility

3.6.1 Extensible by vendors

XQuery and XPath Full Text 1.0 MUST be extensible by vendors.

   green status  Status: this requirement has been met.

3.6.2 Extensible by users

XQuery and XPath Full Text 1.0 MAY be extensible by users.

   green status  Status: this requirement has been met.

3.7 First, Future Versions

The first version of XQuery and XPath Full Text 1.0 MUST provide a robust framework for future versions.

   green status  Status: this requirement has been met.

3.8 End user language

It is not a requirement that XQuery and XPath Full Text 1.0 be designed as an end-user UI language.

   green status  Status: this requirement has been met.

3.9 Searchable query

It SHOULD be possible to search XQuery and XPath Full Text 1.0 queries.

   green status  Status: this requirement has been met.

3.10 Universality

XQuery and XPath Full Text 1.0 SHOULD be universal. As a minimum, XQuery and XPath Full Text 1.0 MUST allow full-text search in any Unicode character set and in all common written natural languages.

   green status  Status: this requirement has been met.

4 Integration

This section specifies requirements for the integration of XQuery and XPath Full Text 1.0 with XQuery and XPath.

4.1 XPath

Part, but not necessarily all, of XQuery and XPath Full Text 1.0 MUST be usable as part of an XPath expression.

   green status  Status: this requirement has been met.

4.2 Extensibility Mechanisms

4.2.1 Integration into XQuery/XPath

XQuery and XPath Full Text 1.0 SHOULD use the extensibility mechanisms that exist in XQuery and XPath for integration into XQuery and XPath.

   yellow status  Status: this requirement has been partially met. XQuery and XPath Full Text 1.0 did not use functions because they were syntactically burdensome to users. The extensibility mechanisms were used in the XML syntax (XQueryX) for XQuery and XPath Full Text 1.0.

4.2.2 XQuery and XPath Full Text 1.0 Extensibility

XQuery and XPath Full Text 1.0 MUST use the extensibility mechanisms that exist in XQuery and XPath for its own extensibility.

   green status  Status: this requirement has been met.

4.3 Composability

XQuery and XPath Full Text 1.0 MUST be composable with XQuery, and SHOULD be composable with itself.

   green status  Status: this requirement has been met.

4.4 Human-readable

XQuery and XPath Full Text 1.0 MAY have more than one syntax binding. One query language syntax MUST be convenient for humans to read and write. See [XML Query (XQuery) Requirements].

   green status  Status: this requirement has been met.

4.5 XML syntax

XQuery and XPath Full Text 1.0 MAY have more than one syntax binding. One query language syntax MUST be expressed in XML in a way that reflects the underlying structure of the query. See [XML Query (XQuery) Requirements].

   green status  Status: this requirement has been met.

5 Implementation

5.1 Declarativity

XQuery and XPath Full Text 1.0 MUST be declarative. Notably, it MUST NOT enforce a particular evaluation strategy.

   green status  Status: this requirement has been met.

6 Functionality and Scope

This section defines requirements for the functionality in XQuery and XPath Full Text 1.0, and the scope of XQuery and XPath Full Text 1.0 queries.

6.1 Functionality

XQuery and XPath Full Text 1.0 MUST provide, in the first release, the minimum useful set of full-text functionality, namely:

  1. single-word search

  2. phrase search

  3. support for stop words

  4. single character suffix

  5. 0 or more character suffix

  6. 0 or more character prefix

  7. 0 or more character infix

  8. proximity searching (unit: words)

  9. specification of order in proximity searching

  10. combination using AND

  11. combination using OR

  12. combination using NOT

  13. word normalization, diacritics

  14. ranking, relevance

   green status  Status: this requirement has been met.
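Several items in the list above (the wildcard forms in items 4 to 7, proximity with optional order in items 8 and 9, and diacritic normalization in item 13) are easier to grasp with an example. The Python sketch below illustrates them on already-tokenized words. It uses glob-style "?" and "*" from the standard fnmatch module purely for illustration; that is not the wildcard syntax of XQuery and XPath Full Text 1.0. Measuring distance as the number of intervening words is likewise an assumption.

```python
import unicodedata
from fnmatch import fnmatchcase
from typing import List


def wildcard_match(pattern: str, word: str) -> bool:
    # Glob-style for illustration: "?" is exactly one character,
    # "*" is zero or more characters (usable as suffix, prefix, or infix).
    return fnmatchcase(word, pattern)


def near(words: List[str], a: str, b: str, distance: int,
         ordered: bool = False) -> bool:
    """True if a and b occur with at most `distance` words between them.

    With ordered=True, an occurrence of a must come before one of b."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [j for j, w in enumerate(words) if w == b]
    for i in pos_a:
        for j in pos_b:
            if i == j or (ordered and j < i):
                continue
            if abs(j - i) - 1 <= distance:
                return True
    return False


def strip_diacritics(word: str) -> str:
    # Decompose characters (NFD) and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))
```

With the words of "the quick brown fox jumps", near(words, "quick", "fox", 1) holds because only "brown" intervenes, but the ordered form near(words, "fox", "quick", 1, ordered=True) does not.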

Additional functionality represented in the [XQuery and XPath Full Text 1.0 Use Cases] MUST be considered, but may be left to a future release.

   green status  Status: this requirement has been met.

Additional functionality from other Full-Text Search contexts such as [SQL/MM Full-Text] MUST be considered, but SHOULD be left to a future release.

   green status  Status: this requirement has been met.

6.2 Search Scope

6.2.1 Search within arbitrary structure

XQuery and XPath Full Text 1.0 MUST allow search within an arbitrary structure (an arbitrary XPath expression).

   green status  Status: this requirement has been met.

6.2.2 Constructed Structures

XQuery and XPath Full Text 1.0 MUST NOT preclude Full-Text Search within structures constructed during a query.

   green status  Status: this requirement has been met.

6.2.3 Return Arbitrary Nodes

XQuery and XPath Full Text 1.0 MUST allow a query to return arbitrary nodes.

   green status  Status: this requirement has been met.

6.2.4 Parts of Search Tree

XQuery and XPath Full Text 1.0 MUST allow the combination of predicates on different parts of the searched document 'tree'.

   green status  Status: this requirement has been met.
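Section 6.2 asks that a search can be scoped by a path expression, work on structures built during the query, return whatever nodes the query selects, and combine predicates on different parts of a document. The Python sketch below approximates this with the standard xml.etree.ElementTree module, whose iterfind accepts only a limited XPath subset. The helpers has_word and search, the sample documents, and the whole-word, case-insensitive matching rule are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def has_word(node: Optional[ET.Element], word: str) -> bool:
    """Whole-word, case-insensitive match over all text below node."""
    if node is None:
        return False
    text = " ".join(node.itertext()).lower()
    return word.lower() in text.split()


def search(root: ET.Element, path: str, word: str) -> List[ET.Element]:
    # Scope the search with a path and return the matching nodes themselves.
    return [n for n in root.iterfind(path) if has_word(n, word)]


library = ET.fromstring(
    "<library>"
    "<book><title>XML Retrieval</title><author>Lee</author></book>"
    "<book><title>Query Languages</title><author>Kim</author></book>"
    "</library>"
)

# Predicates on two different parts of each book (title and author).
both = [b for b in library.iterfind("book")
        if has_word(b.find("title"), "xml") and has_word(b.find("author"), "lee")]

# A structure constructed at run time can be searched the same way.
built = ET.Element("results")
ET.SubElement(built, "note").text = "full text search"
notes = search(built, "note", "text")
```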

6.3 Attributes

6.3.1 Search within attributes

XQuery and XPath Full Text 1.0 MUST support Full-Text Search within attributes.

   green status  Status: this requirement has been met.

6.3.2 Search across attributes and content

XQuery and XPath Full Text 1.0 MAY support Full-Text Search within attributes in conjunction with Full-Text Search within element content.

   green status  Status: this requirement has been met.

6.4 Markup

If XQuery and XPath Full Text 1.0 supports search within names of elements and attributes, then it MUST distinguish between

  • element content and attribute values

and

  • names of elements and attributes

in any search.

   green status  Status: this requirement has been met.
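Sections 6.3 and 6.4 distinguish three kinds of material: element content, attribute values, and the names of elements and attributes. The Python sketch below, again using xml.etree.ElementTree, keeps them in separate lists so that a search over one never matches the others, and also shows a combined search over content and attribute values in the spirit of 6.3.2. The helper names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def element_content(root: ET.Element) -> List[str]:
    # Character data inside elements; element names are not included.
    return [t for t in root.itertext() if t.strip()]


def attribute_values(root: ET.Element) -> List[str]:
    return [v for el in root.iter() for v in el.attrib.values()]


def markup_names(root: ET.Element) -> List[str]:
    # Element and attribute names, kept apart from the searchable text.
    return [name for el in root.iter() for name in [el.tag, *el.attrib]]


def contains_word(values: List[str], word: str) -> bool:
    return any(word.lower() in v.lower().split() for v in values)


doc = ET.fromstring('<book title="Full Text"><author>Lee</author></book>')
```

Searching element_content(doc) for "author" finds nothing, because "author" is only an element name; it appears in markup_names(doc) instead.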

6.5 Element Boundaries

6.5.1 Search across element boundaries

XQuery and XPath Full Text 1.0 MUST support search across element boundaries, at least for NEAR.

   green status  Status: this requirement has been met.

6.5.2 Element as a token boundary

XQuery and XPath Full Text 1.0 MUST treat an element as a token boundary. This MAY be user-defined.

   yellow status  Status: this requirement has been partially met. By default elements create token boundaries, but implementations may override that for certain elements.
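Section 6.5 makes element boundaries token boundaries by default while still letting proximity search run across them. The Python sketch below shows the difference on a small fragment: tokenizing each text node separately splits full<b>text</b> into two words, whereas a hypothetical override that joins text nodes first yields one word. Both functions are illustrative assumptions, not the tokenizer of any implementation.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

WORD_RE = re.compile(r"\w+")


def words_with_boundaries(root: ET.Element) -> List[str]:
    """Tokenize each text node on its own, so every element start or end
    also ends the current token (the default described in 6.5.2)."""
    words: List[str] = []
    for chunk in root.itertext():
        words.extend(WORD_RE.findall(chunk))
    return words


def words_without_boundaries(root: ET.Element) -> List[str]:
    # Hypothetical override: join all text nodes first, so markup inside a
    # word (such as <b> around part of it) no longer splits the word.
    return WORD_RE.findall("".join(root.itertext()))


fragment = ET.fromstring("<p>full<b>text</b> search <i>engine</i></p>")
words = words_with_boundaries(fragment)
# Word positions run across <b> and <i>, so a NEAR-style distance check
# can compare words that live in different elements.
gap = words.index("engine") - words.index("text") - 1
```

Here gap is 1, because only "search" separates "text" (inside b) from "engine" (inside i).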

6.6 Score

6.6.1 Score accessible

SCORE MUST be accessible anywhere in the scope of the query.

   green status  Status: this requirement has been met.

6.6.2 Implicit ordering

SCORE SHOULD NOT be used for implicit ordering.

   green status  Status: this requirement has been met.

6.6.3 Score extendable

SCORE MAY be extendable to a general distance measure.

   green status  Status: this requirement has been met.

A References

A.1 Non-Normative

XQuery 1.0 and XPath 2.0 Data Model
XQuery 1.0 and XPath 2.0 Data Model (XDM) (Second Edition), Norman Walsh, Mary Fernández, Ashok Malhotra, et al., Editors. World Wide Web Consortium, 14 December 2010. This version is http://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/. The latest version is available at http://www.w3.org/TR/xpath-datamodel/.
XQuery 1.0: An XML Query Language
XQuery 1.0: An XML Query Language (Second Edition), Don Chamberlin, Anders Berglund, Scott Boag, et al., Editors. World Wide Web Consortium, 14 December 2010. This version is http://www.w3.org/TR/2010/REC-xquery-20101214/. The latest version is available at http://www.w3.org/TR/xquery/.
XML Path Language (XPath) 2.0
XML Path Language (XPath) 2.0 (Second Edition), Don Chamberlin, Anders Berglund, Scott Boag, et al., Editors. World Wide Web Consortium, 14 December 2010. This version is http://www.w3.org/TR/2010/REC-xpath20-20101214/. The latest version is available at http://www.w3.org/TR/xpath20/.
XML Query (XQuery) Requirements
XML Query (XQuery) Requirements, Don Chamberlin, Peter Fankhauser, Massimo Marchiori, and Jonathan Robie, Editors. World Wide Web Consortium, 23 March 2007. This version is http://www.w3.org/TR/2007/NOTE-xquery-requirements-20070323/. The latest version is available at http://www.w3.org/TR/xquery-requirements/.
XQuery and XPath Full Text 1.0 Use Cases
XQuery and XPath Full Text 1.0 Use Cases, Sihem Amer-Yahia and Pat Case, Editors. World Wide Web Consortium, 25 January 2011. This version is http://www.w3.org/TR/2011/NOTE-xpath-full-text-10-use-cases-20110125/. The latest version is available at http://www.w3.org/TR/xpath-full-text-10-use-cases/.
SQL/MM Full-Text
ISO/IEC 13249-2:2000, Information technology — Database languages — SQL Multimedia and Application Packages — Part 2: Full-Text, International Organization For Standardization, 2000, available from http://www.iso.org/

B Change Log

Author | Date | Action | Description
Stephen Buxton | 2003-03-19 | Added a Change Log |
Stephen Buxton | 2003-03-19 | Terminology definition changes | Switched the definitions of SHOULD and MAY, to be consistent with [XML Query (XQuery) Requirements]. The rest of the document does not need to change, since the earlier versions of this document, on which the text of the spec is based, referred to the definitions in [XML Query (XQuery) Requirements].
Stephen Buxton | 2003-04-18 | Change XML Query Requirements link to external URI | Changed links in the document body to point to the external latest copy of XML Query Requirements.
Pat Case | 2006-11-17 | Recorded that requirements were met | Recorded that the XQuery and XPath Full Text 1.0 Requirements have been met (fully or partially).
Pat Case | 2007-12-04 | Title | Updated title and title references to remove 1.0, 2.0, and the hyphen.
Pat Case | 2008-04-04 | Requirement 4.2.1 | Changed the status of 4.2.1 from green to yellow with an explanation.