Copyright © 2016 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document is governed by the 1 September 2015 W3C Process Document.
This is a Working Group Note as described in the Process Document. It was developed by the W3C XML Query Working Group, which is part of the XML Activity.
These Requirements identify extensions to the XQuery 3.0 Recommendation, published 04 April 2014, that have been requested by WG participants and by reviewers who do not participate in the W3C activities. The XML Query WG has not yet fully reviewed these requirements.
Please report errors in this document using W3C's public Bugzilla system (instructions can be found at https://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery public comments mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string “[XQuery31Req]” in the subject line of your report, whether made in Bugzilla or in email. Please use multiple Bugzilla entries (or, if necessary, multiple email messages) if you have more than one comment to make. Archives of the comments and responses are available at https://lists.w3.org/Archives/Public/public-qt-comments/.
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
The primary goal of XML Query 3.1 is to extend XML Query 3.0 with support for JSON maps and arrays, and to leverage these structures to make XQuery more useful. These data structures are also part of XPath 3.1, and are used in XSLT as well as XQuery.
Other features that improve usability or compatibility will be considered as time permits.
Satisfying these goals may require changes to the set of seven documents that have progressed to Recommendation together (Data Model 3.1, Functions and Operators 3.1, Serialization 3.1, XPath 3.1, XQuery 3.1, XQueryX 3.1, and XSLT 3.0).
The following keywords are used throughout the document to specify the extent to which an item is a requirement for the work of the XML Query Working Group:
MUST: The item is an absolute requirement.
MUST NOT: The item is an absolute prohibition.
SHOULD: There may exist valid reasons not to treat this item as a requirement, but the full implications should be understood and the case carefully weighed before discarding this item.
SHOULD NOT: There may exist valid reasons when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.
MAY: An item deserves attention, but further study is needed to determine whether the item should be treated as a requirement.
When the words MUST, SHOULD, or MAY are used in this technical sense [IETF RFC 2119], they occur as a hyperlink to these definitions. These words will also be used with their conventional English meaning, in which case there is no hyperlink. For instance, the phrase "the full implications should be understood" uses the word "should" in its conventional English sense, and therefore occurs without the hyperlink.
Each requirement also includes a status section, indicating its current situation in the XQuery/XPath/XSLT family of specifications. Three status levels are used:
This requirement has been met: This indicates that the requirement, according to its original formulation, has been completely met. Optional clarifying text may follow.
This requirement has been partially met: This indicates that the requirement has been partially met according to its original formulation. When this happens, explanatory text is provided to better clarify the current scope of the requirement.
This requirement has not been met: This indicates that the requirement, according to its original formulation, has not been met. If this is the case, explanatory text is provided.
XQuery 3.1 MUST be backward compatible with [XQuery 3.0].
Every valid XQuery 3.0 expression MUST be valid in XQuery 3.1 and it MUST evaluate to the same result.
Status: this requirement has been met.
XQuery 3.1 MUST be compatible with XQuery 3.0 extensions developed by the XML Query Working Group, including [XQuery Update Facility 3.0] and [XQuery and XPath Full Text 3.0].
Status: this requirement has been met.
XQuery 3.1 MUST support collections of name/value pairs, which we call maps. (In JSON they are called objects; in other languages they are sometimes called records, structs, dictionaries, hash tables, keyed lists, or associative arrays.)
Status: this requirement has been met.
The map feature MUST provide a convenient syntax for creating maps.
Status: this requirement has been met.
The map feature MUST provide a convenient syntax for returning the value associated with a key.
Status: this requirement has been met.
The map feature MUST provide a convenient way to enumerate the keys in a map.
Status: this requirement has been met (using functions).
The map feature MUST provide a convenient way to create modified copies of maps, e.g. by adding or deleting entries.
Status: this requirement has been met (using functions).
The map feature MUST NOT preclude in-situ updates analogous to updates in the XQuery Update Facility.
Status: this requirement has been met.
A map SHOULD allow any atomic value as a key. The map feature SHOULD allow keys of various types to be used in the same map.
Status: this requirement has been met.
A map SHOULD allow any XDM sequence as a value. A map MUST allow any XDM item, map, or array as a value.
Status: this requirement has been met.
A map MUST be allowed as a member of an XDM sequence.
Status: this requirement has been met.
It MAY be possible to use a map as a function.
Status: this requirement has been met.
For the sake of optimizability, a map SHOULD NOT expose identity via the is, <<, >>, union, intersect, or except operators, or any operation that exposes document order.
Status: this requirement has been met.
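As an illustration of how the map requirements above can be exercised, the following sketch uses the XQuery 3.1 map constructor, the two lookup forms, and the map:put, map:remove, map:keys, and map:contains functions from Functions and Operators 3.1. The keys and values are invented for illustration and do not come from this document.

```xquery
(: Creating a map; keys of different atomic types may be mixed :)
let $m := map {
  "title" : "Emma",
  1816    : "first published",
  "tags"  : ("novel", "fiction")
}

(: Modified copies; $m itself is unchanged :)
let $added   := map:put($m, "author", "Jane Austen")
let $removed := map:remove($added, 1816)

return (
  $m("title"),                   (: lookup by calling the map as a function :)
  $m?title,                      (: lookup with the ? operator :)
  $m(1816),                      (: an xs:integer key :)
  count($m?tags),                (: a value may be any sequence :)
  map:keys($removed),            (: enumerate keys; order is implementation-dependent :)
  map:contains($removed, 1816)   (: the entry is gone from the copy :)
)
```

Because map:put and map:remove return new maps, nothing here relies on map identity, which is in keeping with the optimizability requirement.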
XQuery 3.1 MUST support arrays, which can nest.
Status: this requirement has been met.
XQuery 3.1 MUST provide a convenient syntax for creating arrays.
Status: this requirement has been met.
Arrays MUST provide a convenient syntax for returning the value found in a given position.
Status: this requirement has been met (using function call syntax).
Arrays SHOULD provide a convenient way to create modified copies of an array, e.g. by adding or deleting entries.
Status: this requirement has been met (using functions).
Arrays MUST NOT preclude in-situ updates analogous to updates in the XQuery Update Facility.
Status: this requirement has been met.
An array MUST allow any XDM item, array, or map as a member of an array.
Status: this requirement has been met.
An array MUST be allowed as a member of an XDM sequence.
Status: this requirement has been met.
It MAY be possible to use an array as a function.
Status: this requirement has been met.
For the sake of optimizability, an array SHOULD NOT expose identity via the is, <<, >>, union, intersect, or except operators, or any operation that exposes document order.
Status: this requirement has been met.
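The array requirements above can be illustrated in the same way. This sketch uses the two XQuery 3.1 array constructors, positional lookup, and the array:append, array:remove, and array:size functions; the sample values are illustrative only.

```xquery
(: Square brackets: each comma-separated expression becomes one member :)
let $a := [ "A", ("bride", "NN"), [ 1, 2 ] ]

(: Curly form: each item of the sequence becomes one member :)
let $b := array { 1 to 3 }

(: Modified copies; $b itself is unchanged :)
let $c := array:append($b, 4)
let $d := array:remove($c, 1)

return (
  array:size($a),   (: three members, one of them a nested array :)
  $a(3)(2),         (: position lookup by function call, applied twice :)
  $a?2,             (: position lookup with the ? operator :)
  array:size($d),
  $d?*              (: all members as a flat sequence :)
)
```

The difference between the two constructors matters when a member should hold a sequence or the empty sequence: only the square-bracket form preserves it as a single member.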
XQuery 3.1 MUST support JSON serialization.
Status: this requirement has been met.
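A minimal sketch of JSON serialization, assuming an implementation that honours serialization parameters declared in the query prolog. The output namespace is the one defined by Serialization 3.1; the map content is invented for illustration.

```xquery
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

(: Ask the processor to serialize the query result as JSON :)
declare option output:method "json";
declare option output:indent "yes";

map {
  "department" : "Sales",
  "employees"  : [ "Ada", "Grace" ],   (: arrays serialize as JSON arrays :)
  "open"       : true(),               (: booleans serialize as true / false :)
  "manager"    : ()                    (: the empty sequence serializes as null :)
}
```

Where prolog options are not convenient, the same parameters can be supplied to fn:serialize as a map, for example serialize($value, map { "method" : "json" }).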
XQuery 3.1 MAY support serialization to multiple resources from a single query.
Status: this requirement has not been met. However, the EXPath File Module provides this functionality for implementations that support it.
XQuery 3.1 MUST provide support for numbers in scientific notation.
Status: this requirement has been met.
XQuery 3.1 MAY support aliases for types.
Status: this requirement has not been met.
XQuery 3.1 MUST provide a means to invoke XSLT transformations.
Status: this requirement has been met. fn:transform() invokes an XSLT transformation.
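As a rough sketch of how fn:transform() is called: the function takes a single map of options and returns a map of results. The option names below are those defined in Functions and Operators 3.1; the file names and the "toc" parameter are placeholders, and a processor is not obliged to support this function.

```xquery
let $result := transform(map {
  "stylesheet-location" : "my-stylesheet.xsl",
  "source-node"         : doc("my-doc.xml"),
  (: Stylesheet parameters are keyed by xs:QName :)
  "stylesheet-params"   : map { QName("", "toc") : true() }
})
(: With the default delivery format, the principal result is under the key "output" :)
return $result?output
```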
XQuery 3.1 MAY provide a standard mechanism for referring to collations.
Status: this requirement has been met.
The solutions provided for the following Use Cases use the XQuery 3.1 query language, and frequently create maps rather than XML. In some cases, XQuery 3.0 solutions that create XML are also provided for comparison. Every XQuery 3.0 solution provided is also a valid XQuery 3.1 solution.
These use cases were originally proposed for XSLT 3.0 streaming. In XQuery, they are done using grouping. In these use cases, we assume that the user is using maps as a lightweight structure to represent the results of grouping.
Find the highest earning employee in each department.
Find both the highest earning employee in each department, and the total number of employees by job type across all departments.
for $employee in doc("employees.xml")/*/employee
let $salary := $employee/salary
group by $department := $employee/department
let $max-salary := max($salary)
let $highest-earners := $employee[salary = $max-salary]
return
  <department name="{$department}">{ $highest-earners }</department>,

for $employee in doc("employees.xml")/*/employee
let $salary := $employee/salary
group by $job-type := $employee/job-type
let $totals := count($employee)
return
  <total-by-job-type type="{$job-type}">{ $totals }</total-by-job-type>
for $employee in doc("employees.xml")/*/employee
let $salary := $employee/salary
group by $department := $employee/department
let $max-salary := max($salary)
let $highest-earners := $employee[salary = $max-salary]
return map {
  "department" : $department,
  "highest earners" : $highest-earners
},

for $employee in doc("employees.xml")/*/employee
let $salary := $employee/salary
group by $job-type := $employee/job-type
let $totals := count($employee)
return map {
  "job type" : $job-type,
  "count(employee)" : $totals
}
Calculate the word count by lemma of the verbs in the following document.
The XML document, gnt.xml.
<gnt> <s> <w pos="PP">I</w> <w pos="V" lemma="go">go</w> <pu>.</pu> </s> <s> <w pos="PP">She</w> <w pos="V" lemma="go">went</w> <pu>.</pu> </s> <s> <w pos="PP">He</w> <w pos="V" lemma="go">goes</w> <pu>.</pu> </s> <s> <w pos="PP">I</w> <w pos="V" lemma="see">see</w> <pu>.</pu> </s> <s> <w pos="PP">She</w> <w pos="V" lemma="see">sees</w> <pu>.</pu> </s> <s> <w pos="PP">I</w> <w pos="V" lemma="have">have</w> <pu>.</pu> </s> <s> <w pos="PP">She</w> <w pos="V" lemma="have">has</w> <pu>.</pu> </s> </gnt>
<verb lemma="go" count="3"/> <verb lemma="see" count="2"/> <verb lemma="have" count="2"/>
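One possible query for this use case is sketched below; it is an illustration rather than the Working Group's solution. It groups the verbs of gnt.xml by their lemma attribute, first producing XML and then producing a single map from lemma to count. Note that the order in which groups are returned is implementation-dependent unless an order by clause is added.

```xquery
(: XML result: one verb element per lemma :)
for $w in doc("gnt.xml")/gnt/s/w[@pos = "V"]
group by $lemma := string($w/@lemma)
return <verb lemma="{ $lemma }" count="{ count($w) }"/>,

(: Map result: lemma -> number of occurrences :)
map:merge(
  for $w in doc("gnt.xml")/gnt/s/w[@pos = "V"]
  group by $lemma := string($w/@lemma)
  return map { $lemma : count($w) }
)
```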
Implement a complex number library for XQuery or XSLT 3.0. Complex numbers should be represented as a single item, so they can themselves be manipulated like regular numbers by returning sequences of them etc.
In this library, the complex number 2 + 3i is represented as the map { "true" : 2, "false" : 3 }
declare function i:complex(
  $real as xs:double,
  $imaginary as xs:double
) as map(xs:boolean, xs:double)
{
  (: true() holds the real part, false() the imaginary part :)
  map { true() : $real, false() : $imaginary }
};

declare function i:real(
  $complex as map(xs:boolean, xs:double)
) as xs:double
{
  $complex(true())
};

declare function i:imaginary(
  $complex as map(xs:boolean, xs:double)
) as xs:double
{
  $complex(false())
};

declare function i:add(
  $arg1 as map(xs:boolean, xs:double),
  $arg2 as map(xs:boolean, xs:double)
) as map(xs:boolean, xs:double)
{
  i:complex(i:real($arg1) + i:real($arg2),
            i:imaginary($arg1) + i:imaginary($arg2))
};

declare function i:multiply(
  $arg1 as map(xs:boolean, xs:double),
  $arg2 as map(xs:boolean, xs:double)
) as map(xs:boolean, xs:double)
{
  i:complex(
    i:real($arg1) * i:real($arg2) - i:imaginary($arg1) * i:imaginary($arg2),
    i:real($arg1) * i:imaginary($arg2) + i:imaginary($arg1) * i:real($arg2))
};
Here is a query that uses this library:
i:add(i:complex(2, 3), i:complex(1, -6)), i:multiply(i:complex(2, -1), i:complex(3, 4))
Here is the result of the above query:
{ "true" : 3, "false" : -3 }, { "true" : 10, "false" : 5 }
Build an index to manually optimize retrieval of books in a catalog by their ISBN number.
Construct a list of all authors, and the books they have written.
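No input document is shown for these two tasks, so the following is only a sketch under an assumed structure: a hypothetical catalog.xml whose book elements carry an isbn attribute and contain title and author children. The file name, element names, and the sample ISBN are all assumptions made for illustration.

```xquery
(: Assumed input shape (not given in this document):
   <catalog>
     <book isbn="..."><title>...</title><author>...</author>...</book>
     ...
   </catalog> :)
declare variable $books := doc("catalog.xml")/catalog/book;

(: ISBN index: one entry per book, keyed by its isbn attribute :)
declare variable $by-isbn as map(xs:string, element(book)) :=
  map:merge(
    for $b in $books
    return map { string($b/@isbn) : $b }
  );

(: Author list: one entry per author name, holding all books by that author :)
declare variable $by-author as map(xs:string, element(book)*) :=
  map:merge(
    for $b in $books
    for $a in $b/author
    group by $name := string($a)
    return map { $name : $b }
  );

(: Retrieval is a map lookup instead of a scan over the catalog :)
$by-isbn("0-000-00000-0")/title,
map:keys($by-author)
```

The index is built once per query, and because the map values are the original nodes rather than copies, the result of a lookup can still be navigated with path expressions.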
As in JavaScript, a map whose keys are strings and whose associated values are function items can be used in a similar way to a class in object-oriented programming languages.
Suppose an application needs to handle customer order information that may arrive in three different formats, with different hierarchic arrangement.
An application can isolate itself from these differences by defining a set of functions to navigate the relationships between customers, orders, and products: orders-for-customer, orders-for-product, customer-for-order, product-for-order. These functions can be implemented in different ways for the three different input formats.
Flat structure:
<customer id="c123">...</customer> <product id="p789">...</product> <order customer="c123" product="p789">...</order>
Orders within customer elements:
<customer id="c123"> <order product="p789">...</order> </customer> <product id="p789">...</product>
Orders within product elements:
<customer id="c123">...</customer> <product id="p789"> <order customer="c123">...</order> </product>
For example, with the first format the implementation might be:
declare variable $flat-input-functions := map {
  'orders-for-customer' :
    function($c as element(customer)) as element(order)*
    { $c/../order[@customer=$c/@id] },
  'orders-for-product' :
    function($p as element(product)) as element(order)*
    { $p/../order[@product=$p/@id] },
  'customer-for-order' :
    function($o as element(order)) as element(customer)
    { $o/../customer[@id=$o/@customer] },
  'product-for-order' :
    function($o as element(order)) as element(product)
    { $o/../product[@id=$o/@product] }
};
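For comparison, here is a sketch of what the same four functions might look like for the second format, where orders are nested within customer elements. The variable name $nested-in-customer-functions and the helper local:order-count are hypothetical, and the sketch assumes that customer and product elements are siblings under a common parent, as in the sample data.

```xquery
declare variable $nested-in-customer-functions := map {
  'orders-for-customer' :
    function($c as element(customer)) as element(order)*
    { $c/order },
  'orders-for-product' :
    function($p as element(product)) as element(order)*
    { $p/../customer/order[@product = $p/@id] },
  'customer-for-order' :
    function($o as element(order)) as element(customer)
    { $o/parent::customer },
  'product-for-order' :
    function($o as element(order)) as element(product)
    { $o/../../product[@id = $o/@product] }
};

(: Application code depends only on the map of functions, not on the format :)
declare function local:order-count(
  $f as map(xs:string, function(*)),
  $c as element(customer)
) as xs:integer
{
  count($f('orders-for-customer')($c))
};

local:order-count($nested-in-customer-functions, doc("orders.xml")//customer[1])
```

The document name orders.xml is likewise a placeholder. Switching input formats then amounts to passing a different map to the same application code.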
Create a general interface that takes as input some words, does a full-text search for them, and returns snippets of the top 10 results, ordered by score, where the nodes to search, their structure, how to construct snippets and how to score them differ for different data sets.
Create a template method and use a map of functions to define the implementation of the plug-in points.
(: General interface module :)
module namespace this = "http://example.com/search-interface/";

declare function this:search(
  $words as xs:string*,
  $collection as map(xs:string, function(*)))
{
  (: Each map entry holds a function: look it up by key, then call it :)
  for $d in $collection('select')()[. contains text {$words} any word]
  order by $collection('score')($d, $words) descending
  count $c
  where $c <= 10
  return $collection('snippet')($d, $words)
};

(: Specific implementation example :)
import module namespace s = "http://example.com/search-interface/";

declare variable $twitter as map(xs:string, function(*)) := map {
  'select' : function() as node()* {
    collection("twitter")
  },
  'score' : function($n as node(), $words as xs:string*) as xs:double {
    let score $s1 := $n contains text {$words} any word
    let score $s2 := $n contains text {$words} all words
    return $s1 + $s2
  },
  'snippet' : function($node as node(), $words as xs:string*) as node() {
    $node
  }
};

declare variable $blog as map(xs:string, function(*)) := map {
  'select' : function() as node()* {
    collection("blogs")/body
  },
  'score' : function($n as node(), $words as xs:string*) as xs:double {
    let $s1 := avg(
      for $p score $s in $n/para[. contains text {$words} any word]
      return $s
    )
    let $s2 := avg(
      for $p score $s in $n/comment[. contains text {$words} weight 0.5 any word]
      return $s
    )
    let score $s3 := $n/title contains text {$words} weight 5.0 any word
    return $s1 + $s2 + $s3
  },
  'snippet' : function($node as node(), $words as xs:string*) as node() {
    <result>
    {
      $node/title,
      $node/para[1],
      $node/comment[1]
    }
    </result>
  }
};

declare variable $books as map(xs:string, function(*)) := map {
  'select' : function() as node()* {
    collection()//chapter
  },
  'score' : function($n as node(), $words as xs:string*) as xs:double {
    let score $s1 := $n contains text {$words} any word
    let score $s2 := $n/title contains text {$words} weight 5.0 any word
    return $s1 + $s2
  },
  'snippet' : function($node as node(), $words as xs:string*) as node() {
    <result>
    {
      $node/title,
      ((for $p score $s in $node/p[. contains text {$words} all words]
        order by $s
        return $p),
       (for $p score $s in $node/p[. contains text {$words} any word]
        order by $s
        return $p))[1]
    }
    </result>
  }
};

(: Get top 10 from various sources :)
s:search(("fire","earthquake"),$books),
s:search(("fire","earthquake"),$twitter),
s:search(("fire","earthquake"),$blog)
Provide access to various pieces of metadata to an application, insulating that application code from variations in document structure.
Define the metadata interface through a map of functions.
(: Specific implementations :)
declare namespace xh = "http://www.w3.org/1999/xhtml";

declare variable $xhtml as map(xs:string, function(*)) := map {
  'title' : function($n as document-node()) as xs:string? {
    (: head is a child of the html root element, not of the document node :)
    $n/xh:html/xh:head/xh:title
  },
  'author' : function($n as document-node()) as xs:string? {
    $n/xh:html/xh:head/xh:meta[@name='author']/@content
  },
  'pubdate' : function($n as document-node()) as xs:string? {
    $n/xh:html/xh:head/xh:meta[@name='created']/@content
  },
  'publisher' : function($n as document-node()) as xs:string? {
    ()
  }
};

declare variable $medline-citation as map(xs:string, function(*)) := map {
  'title' : function($n as document-node()) as xs:string? {
    $n/MedlineCitation/Article/ArticleTitle
  },
  'author' : function($n as document-node()) as xs:string? {
    string-join(
      for $a in $n/MedlineCitation//Author
      return concat($a/LastName, ", ", $a/ForeName),
      "; "
    )
  },
  'pubdate' : function($n as document-node()) as xs:string? {
    let $d := $n/MedlineCitation/Article/PubDate
    return string-join(($d/Day, $d/Month, $d/Year), " ")
  },
  'publisher' : function($n as document-node()) as xs:string? {
    $n/MedlineCitation/MedlineJournalInfo/MedlineTA
  }
};
Often library functions may have a large number of optional arguments, which are awkward or impossible to provide using the existing mechanism of variable arity functions.
Pass the list of parameter names and values to the xdmp:xslt-invoke() function, which invokes an XSLT stylesheet.
declare function xdmp:xslt-invoke(
  $path as xs:string,
  $input as node(),
  $params as map(xs:QName, item()*)
) as document-node()* external;

let $params := map {
  xs:QName("toc") : true(),
  xs:QName("index") : doc("index_terms.xml")
}
return xdmp:xslt-invoke("my-stylesheet.xsl", doc("my-doc.xml"), $params)
Provide a mechanism to supply (otherwise defaulted) option values to the my:doc() function, which control aspects of its behaviour, including:
Parsing of external entities
DTD validation
XML Schema validation
Lax (XML Schema) validation
Whitespace stripping
URI resolution
Using maps in this scenario brings benefits over using XML structure, including:
Nodes are not copied; their identity is retained
Atomic items are not serialized, and retain their specific type
Functions can be passed in as options - the relevant example in this case being the URI resolver.
declare function my:doc($uri as xs:string, $options as map(xs:string, item()*)) as document-node()? external; (: Enable lax XML Schema validation :) my:doc("validate-me.xml", map { "schema-validation" : true(), "lax-validation" : true() }), (: Enable whitespace stripping, and a custom URI resolution :) my:doc("../relative-uri.xml", map { "strip-whitespace" : true(), "uri-resolver" : resolve-uri(?, base-uri()) })
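Since my:doc() is external, this document does not show how supplied options are combined with defaults. One plausible approach, sketched here as an assumption rather than as the function's actual behaviour, is to merge the caller's map over a map of defaults with map:merge. The variable $defaults, the function local:effective-options, and the default values are hypothetical; the option names are those used in the calls above.

```xquery
(: Hypothetical defaults: every option is off unless the caller says otherwise :)
declare variable $defaults as map(xs:string, item()*) := map {
  "schema-validation" : false(),
  "lax-validation"    : false(),
  "strip-whitespace"  : false()
};

declare function local:effective-options(
  $options as map(xs:string, item()*)
) as map(xs:string, item()*)
{
  (: Caller-supplied entries come first, so with "use-first" they win :)
  map:merge(($options, $defaults), map { "duplicates" : "use-first" })
};

let $effective := local:effective-options(map { "strip-whitespace" : true() })
return (
  $effective?strip-whitespace,    (: supplied by the caller :)
  $effective?schema-validation    (: falls back to the default :)
)
```

Options that have no sensible default, such as "uri-resolver", are simply absent from the merged map unless the caller provides them, which map:contains can detect.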
Design a language-agnostic game (here just the core), which allows a translation function or map as a parameter.
declare function local:play(
  $secret-number as xs:integer,
  $guessed-number as xs:integer,
  $translator as function(xs:string) as xs:string)
{
  switch (true())
    case $guessed-number eq $secret-number
      return $translator("You won!")
    case $guessed-number lt $secret-number
      return $translator("The secret number is greater.")
    default (: $guessed-number gt $secret-number :)
      return $translator("The secret number is lower.")
};

local:play(76, 86, function($x) { $x }), (: Keep English :)

local:play(76, 86,
  map {
    "You won!" : "Du hast gewonnen!",
    "The secret number is greater." : "Die geheime Zahl ist groesser.",
    "The secret number is lower." : "Die geheime Zahl ist kleiner."
  }
),

local:play(76, 86, $automated-translator-based-on-natural-language-processing)
Software used for natural language processing and text analytics frequently uses data structures like maps and arrays. For instance, the Python Natural Language Toolkit (NLTK) uses lists and tuples extensively. In this use case, we use a library that invokes NLTK to perform simple natural language processing, returning results in a format very similar to that used by NLTK, and perform a variety of simple tasks.
In this use case, we are using the Gutenberg edition of Jane Austen's "Emma", as packaged in NLTK. To return the sentences of a text, we use the nltk:sentences() function, which returns sentences using the same data structures as NLTK.
Here are a few sentences resulting from the function call nltk:sentences('austen-emma.txt'), using arrays to represent Python's list structures:
Sentence Representation:
[ ['I', 'must', 'put', 'on', 'a', 'few', 'ornaments', 'now', ',', 'because', 'it', 'is', 'expected', 'of', 'me', '.'], ['A', 'bride', ',', 'you', 'know', ',', 'must', 'appear', 'like', 'a', 'bride', ',', 'but', 'my', 'natural', 'taste', 'is', 'all', 'for', 'simplicity', ';', 'a', 'simple', 'style', 'of', 'dress', 'is', 'so', 'infinitely', 'preferable', 'to', 'finery', '.'], ['But', 'I', 'am', 'quite', 'in', 'the', 'minority', ',', 'I', 'believe', ';', 'few', 'people', 'seem', 'to', 'value', 'simplicity', 'of', 'dress', ',--', 'show', 'and', 'finery', 'are', 'every', 'thing', '.'] ]
NLTK has multiple representations of sentences. If $s is bound to the second sentence in the above data structure, then nltk:pos-tag($s) returns the following:
Part of Speech Representation:
[['A', 'DT'], ['bride', 'NN'], [',', ','], ['you', 'PRP'], ['know', 'VBP'], [',', ','], ['must', 'MD'], ['appear', 'VB'], ['like', 'IN'], ['a', 'DT'], ['bride', 'NN'], [',', ','], ['but', 'CC'], ['my', 'PRP$'], ['natural', 'JJ'], ['taste', 'NN'], ['is', 'VBZ'], ['all', 'DT'], ['for', 'IN'], ['simplicity', 'NN'], [';', ':'], ['a', 'DT'], ['simple', 'JJ'], ['style', 'NN'], ['of', 'IN'], ['dress', 'NN'], ['is', 'VBZ'], ['so', 'RB'], ['infinitely', 'RB'], ['preferable', 'JJ'], ['to', 'TO'], ['finery', 'VB'], ['.', '.'] ]
If $s is bound to a part of speech representation, we can convert it to an XML format using the following query:
<s> { for $w in $s?* return <w pos="{ $w(2) }">{ $w(1) }</w> } </s>
Or if we prefer to use meaningful names instead of the numeric positions, we can create an index that maps between names and positions and use it as follows:
declare variable $index := map { "pos" : 2, "lemma" : 1 }; <s> { for $w in $s?* return <w pos="{ $w($index("pos")) }">{ $w($index("lemma")) }</w> } </s>
Both queries have the same result:
<s> <w pos="DT">A</w> <w pos="NN">bride</w> <w pos=",">,</w> <w pos="PRP">you</w> <w pos="VBP">know</w> <w pos=",">,</w> <w pos="MD">must</w> <w pos="VB">appear</w> <w pos="IN">like</w> <w pos="DT">a</w> <w pos="NN">bride</w> <w pos=",">,</w> <w pos="CC">but</w> <w pos="PRP$">my</w> <w pos="JJ">natural</w> <w pos="NN">taste</w> <w pos="VBZ">is</w> <w pos="DT">all</w> <w pos="IN">for</w> <w pos="NN">simplicity</w> <w pos=":">;</w> <w pos="DT">a</w> <w pos="JJ">simple</w> <w pos="NN">style</w> <w pos="IN">of</w> <w pos="NN">dress</w> <w pos="VBZ">is</w> <w pos="RB">so</w> <w pos="RB">infinitely</w> <w pos="JJ">preferable</w> <w pos="TO">to</w> <w pos="VB">finery</w> <w pos=".">.</w> </s>
If $s is bound to a sentence in part of speech representation, the following query converts it to an array of maps with meaningful property names:
array { for $w in $s?* return map { "pos" : $w(2), "lemma" : $w(1) } }
Here is the output of the above query:
[ { "pos" : "DT", "lemma" : "A" }, { "pos" : "NN", "lemma" : "bride" }, { "pos" : ",", "lemma" : "," }, { "pos" : "PRP", "lemma" : "you" }, { "pos" : "VBP", "lemma" : "know" }, { "pos" : ",", "lemma" : "," }, { "pos" : "MD", "lemma" : "must" }, { "pos" : "VB", "lemma" : "appear" }, { "pos" : "IN", "lemma" : "like" }, { "pos" : "DT", "lemma" : "a" }, { "pos" : "NN", "lemma" : "bride" }, { "pos" : ",", "lemma" : "," }, { "pos" : "CC", "lemma" : "but" }, { "pos" : "PRP$", "lemma" : "my" }, { "pos" : "JJ", "lemma" : "natural" }, { "pos" : "NN", "lemma" : "taste" }, { "pos" : "VBZ", "lemma" : "is" }, { "pos" : "DT", "lemma" : "all" }, { "pos" : "IN", "lemma" : "for" }, { "pos" : "NN", "lemma" : "simplicity" }, { "pos" : ":", "lemma" : ";" }, { "pos" : "DT", "lemma" : "a" }, { "pos" : "JJ", "lemma" : "simple" }, { "pos" : "NN", "lemma" : "style" }, { "pos" : "IN", "lemma" : "of" }, { "pos" : "NN", "lemma" : "dress" }, { "pos" : "VBZ", "lemma" : "is" }, { "pos" : "RB", "lemma" : "so" }, { "pos" : "RB", "lemma" : "infinitely" }, { "pos" : "JJ", "lemma" : "preferable" }, { "pos" : "TO", "lemma" : "to" }, { "pos" : "VB", "lemma" : "finery" }, { "pos" : ".", "lemma" : "." } ]
If $s is bound to a sentence in part of speech representation, the following query groups words by part of speech, selecting parts of speech particularly illustrative of Jane Austen's writing style.
for $word in $s?*
let $pos := $word(2)
let $lexeme := $word(1)
where $pos = ("JJ", "NN", "RB", "VB")
group by $pos
order by $pos
return
  <pos name="{$pos}">
  {
    for $l in distinct-values($lexeme)
    return <lexeme>{ $l }</lexeme>
  }
  </pos>
Here is the output of the above query:
<pos name="JJ"> <lexeme>natural</lexeme> <lexeme>simple</lexeme> <lexeme>preferable</lexeme> </pos> <pos name="NN"> <lexeme>bride</lexeme> <lexeme>taste</lexeme> <lexeme>simplicity</lexeme> <lexeme>style</lexeme> <lexeme>dress</lexeme> </pos> <pos name="RB"> <lexeme>so</lexeme> <lexeme>infinitely</lexeme> </pos> <pos name="VB"> <lexeme>appear</lexeme> <lexeme>finery</lexeme> </pos>
In corpus linguistics, n-grams are the basis for certain statistical techniques used to explore and compare texts; for instance, they are used to determine authorship of texts. If $s is bound to a sentence in part of speech representation, the following query computes the trigrams of that sentence, ignoring punctuation:
declare function local:words-only($s)
{
  (: Keep only the word of each [word, tag] pair, dropping punctuation :)
  for $w in $s
  where not($w(2) = (".", ",", ";", ":"))
  return $w(1)
};

for sliding window $w in local:words-only($s?*)
  start at $i when true()
  only end at $j when $j - $i eq 2
return array { $w }
Here is the result for a sentence used in an earlier example:
[ "A", "bride", "you" ], [ "bride", "you", "know" ], [ "you", "know", "must" ], [ "know", "must", "appear" ], [ "must", "appear", "like" ], [ "appear", "like", "a" ], [ "like", "a", "bride" ], [ "a", "bride", "but" ], [ "bride", "but", "my" ], [ "but", "my", "natural" ], [ "my", "natural", "taste" ], [ "natural", "taste", "is" ], [ "taste", "is", "all" ], [ "is", "all", "for" ], [ "all", "for", "simplicity" ], [ "for", "simplicity", "a" ], [ "simplicity", "a", "simple" ], [ "a", "simple", "style" ], [ "simple", "style", "of" ], [ "style", "of", "dress" ], [ "of", "dress", "is" ], [ "dress", "is", "so" ], [ "is", "so", "infinitely" ], [ "so", "infinitely", "preferable" ], [ "infinitely", "preferable", "to" ], [ "preferable", "to", "finery" ]
Filters can be used to partition the words of a sentence in a variety of ways. In this simple example, we use filters to distinguish verbs from other parts of speech. In NLTK, parse codes that start with the string VB denote verb forms.
In this example, the variable $s is bound to a sentence in parsed format, e.g.
[ ['A', 'DT'], ['bride', 'NN'], [',', ','], ['you', 'PRP'], ['know', 'VBP'], [',', ','], ['must', 'MD'], ['appear', 'VB'], ['like', 'IN'], ['a', 'DT'], ['bride', 'NN'], [',', ','], ['but', 'CC'], ['my', 'PRP$'], ['natural', 'JJ'], ['taste', 'NN'], ['is', 'VBZ'], ['all', 'DT'], ['for', 'IN'], ['simplicity', 'NN'], [';', ':'], ['a', 'DT'], ['simple', 'JJ'], ['style', 'NN'], ['of', 'IN'], ['dress', 'NN'], ['is', 'VBZ'], ['so', 'RB'], ['infinitely', 'RB'], ['preferable', 'JJ'], ['to', 'TO'], ['finery', 'VB'], ['.', '.'] ]
The filter function takes a sequence of items and a boolean function, and returns one array with those items that satisfy the function, and a second array with those items that do not.
declare function local:filter($s as item()*, $p as function(item()) as xs:boolean)
{
  array { $s[$p(.)] },
  array { $s[not($p(.))] }
};
We can call it with a function based on starts-with() to partition a sentence.
let $f := function($a) { starts-with($a(2), "VB") } return local:filter($s?*, $f)
Here is the output of the query for the sentence shown above.
[ [ "know", "VBP" ], [ "appear", "VB" ], [ "is", "VBZ" ], [ "is", "VBZ" ], [ "finery", "VB" ] ], [ [ "A", "DT" ], [ "bride", "NN" ], [ ",", "," ], [ "you", "PRP" ], [ ",", "," ], [ "must", "MD" ], [ "like", "IN" ], [ "a", "DT" ], [ "bride", "NN" ], [ ",", "," ], [ "but", "CC" ], [ "my", "PRP$" ], [ "natural", "JJ" ], [ "taste", "NN" ], [ "all", "DT" ], [ "for", "IN" ], [ "simplicity", "NN" ], [ ";", ":" ], [ "a", "DT" ], [ "simple", "JJ" ], [ "style", "NN" ], [ "of", "IN" ], [ "dress", "NN" ], [ "so", "RB" ], [ "infinitely", "RB" ], [ "preferable", "JJ" ], [ "to", "TO"], [ ".", "." ] ]
A programmer might choose to represent filter results using a map instead of an array, as shown in the following code.
declare function local:filter($s as item()*, $p as function(item()) as xs:boolean)
{
  (: The map constructor requires the "map" keyword :)
  map {
    true() : array { $s[$p(.)] },
    false() : array { $s[not($p(.))] }
  }
};

let $f := function($a) { starts-with($a(2), "VB") }
return local:filter($s?*, $f)
Here is the output of the above query using the same data.
{ "true" : [ [ "know", "VBP" ], [ "appear", "VB" ], [ "is", "VBZ" ], ["is", "VBZ" ], [ "finery", "VB" ] ], "false" : [ [ "A", "DT" ], ["bride", "NN" ], [ ",", "," ], [ "you", "PRP" ], [ ",", "," ], [ "must", "MD" ], [ "like", "IN" ], [ "a", "DT" ], [ "bride", "NN" ], [ ",", "," ], [ "but", "CC" ], [ "my", "PRP$" ], [ "natural", "JJ" ], [ "taste", "NN" ], [ "all", "DT"], [ "for", "IN" ], [ "simplicity", "NN" ], [ ";", ":" ], [ "a", "DT" ], [ "simple", "JJ" ], [ "style", "NN" ], [ "of", "IN" ], [ "dress", "NN" ], [ "so", "RB" ], [ "infinitely", "RB" ], [ "preferable", "JJ" ], [ "to", "TO" ], [ ".", "." ] ] }
When Rigaudon optical character recognition software is used for multilingual texts, languages are identified by character set if possible, and formatted in hocr format. For instance, the text "the other possible derivation from ἡ ἐπιοῦσα, dies crastinus", which contains English, Greek, and Latin, might be represented as follows in raw OCR output (the format is simplified somewhat for the sake of presentation).
<span class="ocr_word" title="bbox 1388 430 1461 474">the</span> <span class="ocr_word" title="bbox 1514 433 1635 476">other</span> <span class="ocr_word" title="bbox 133 498 317 554">pcssible</span> <span class="ocr_word" title="bbox 354 498 590 541">derivation</span> <span class="ocr_word" title="bbox 631 497 738 538">from</span> <span class="ocr_word" title="bbox 772 495 799 547" lang="grc" xml:lang="grc">ἡ</span> <span class="ocr_word" title="bbox 835 495 1019 538" lang="grc" xml:lang="grc">ἐπιοῦσα</span> <span class="ocr_word" title="bbox 134 567 220 607">dies</span> <span class="ocr_word" title="bbox 257 566 462 607">erastinus</span>
In the above output, two words were not correctly recognized: the English word "possible" and the Latin word "crastinus". Rigaudon uses multilingual spell checkers to find the nearest likely word in one of the languages likely to be used in a given text. For this particular text, we expect to find English, Greek, and Latin.
In this use case, we take the above hocr as input and call the spellcheck function, implemented as an external function, to identify which words are likely in each candidate language. Having done so, we combine the results to construct the most likely text.
The following function extracts the text from the above data.
declare function local:extract-text($spans) { for $s in $spans return string($s) };
Here is the output of the function for the data shown above.
"the", "other", "pcssible", "derivation", "from", "ἡ", "ἐπιοῦσα", "dies", "erastinus"
The following function performs a spellcheck in a set of languages, creating a map that identifies the original and each language.
declare variable $languages := ("English", "Greek", "Latin"); declare function local:spellcheck($languages, $text) { map:merge (( map { "languages" : $languages }, map { "raw" : array { $text } }, for $l in $languages return map { $l : array:join( (: one member per word, so an empty (null) lookup result keeps its position :) for $w in $text return [ ext:sc($l, $w) ] ) } )) }; let $t := local:extract-text($spans) return local:spellcheck($languages, $t)
Here is the output of the above query.
{ "languages" : ( "English", "Greek", "Latin" ), "raw" : [ "the", "other", "pcssible", "derivation", "from", "ἡ", "ἐπιοῦσα", "dies", "erastinus" ], "English" : [ "the", "other", "possible", "derivation", "from", null, null, "dies", null ], "Greek" : [ null, null, null, null, null, "ἡ", "ἐπιοῦσα", null, null ], "Latin" : [ null, null, null, null, null, null, null, "dies", "crastinus" ] }
The following function merges lookup results in the above format. The first parameter lists a set of languages, in preference order. For each word, the function picks the non-null lookup result for the most preferred language available, or the original "raw" word if all lookups return null. In this code, we assume that $m is bound to the data structure shown above.
declare variable $languages := ("English", "Greek", "Latin"); declare function local:merge($languages, $m) { let $size := array:size($m("raw")) for $i in 1 to $size (: a null lookup result is the empty sequence, so it drops out of $candidates :) let $candidates := ($languages ! $m(.)($i), $m("raw")($i)) return $candidates[1] }; local:merge($languages, $m)
Here is the result of the query:
the other possible derivation from ἡ ἐπιοῦσα dies crastinus
This use case uses rotation matrices to rotate a shape in three dimensions.
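For reference, the matrices built by the library that follows are the standard rotations about the x, y, and z axes. The notation below is a summary added here for orientation; the symbols R_x, R_y, R_z and R are not used by the original text.

```latex
R_x(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix},
\quad
R_y(\theta) = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix},
\quad
R_z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}

R(\text{pitch}, \text{yaw}, \text{roll}) = R_x(\text{pitch}) \, R_y(\text{yaw}) \, R_z(\text{roll})
```

The combined matrix R corresponds to local:rotate, which multiplies the pitch, yaw, and roll matrices in that order.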
The following library implements three-dimensional rotation in XQuery:
declare function local:rotate-x( $theta ) { [ [ 1, 0, 0 ], [ 0, math:cos($theta), - math:sin($theta) ], [ 0, math:sin($theta), math:cos($theta) ] ] }; declare function local:rotate-y( $theta ) { [ [ math:cos($theta), 0, math:sin($theta) ], [ 0, 1, 0], [ - math:sin($theta), 0, math:cos($theta) ] ] }; declare function local:rotate-z( $theta ) { [ [ math:cos($theta), - math:sin($theta), 0 ], [ math:sin($theta), math:cos($theta), 0 ], [ 0, 0, 1] ] }; declare function local:rotate($pitch as xs:double, $yaw as xs:double, $roll as xs:double) { let $p := local:rotate-x($pitch) let $y := local:rotate-y($yaw) let $r := local:rotate-z($roll) let $py := local:mult($p, $y) return local:mult($py, $r) }; declare function local:mult( $matrix1, $matrix2 ) { (: the column count of $matrix1 must equal the row count of $matrix2 :) if (array:size($matrix1(1)) != array:size($matrix2)) then error((), "Matrices must be m*n and n*p to multiply!") else array { for $i in 1 to array:size($matrix1) return array { for $j in 1 to array:size($matrix2(1)) return sum ( for $k in 1 to array:size($matrix2) return $matrix1($i)($k) * $matrix2($k)($j) ) } } }; let $rect := [[0, 0, 0], [10, 0, 0], [10, 10, 0], [0, 10, 0], [0, 0, 0]] let $rot := for $r in $rect?* (: wrap each point as a 1*3 matrix before multiplying :) return local:mult([ $r ], local:rotate( 10, 10, 10 )) return img:render( $rot )
JSON is becoming an important data format that many XQuery and XSLT users have to deal with. Tasks performed can include importing JSON, processing it, and exporting JSON.
Import a JSON document and retrieve the mobile phone number from it.
The fn:parse-json() function parses a JSON document into an XDM value as follows:
A JSON object is converted into a map of type map(xs:string, item()?).
A JSON array is converted into an array of type array(item()?).
A JSON string is converted into an xs:string atomic value.
A JSON number is converted into an xs:double atomic value.
A JSON boolean is converted into an xs:boolean atomic value.
A JSON null is converted into the empty sequence.
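The conversion rules can be checked directly with instance of tests. The following is a minimal sketch, assuming the rules of the final XPath and XQuery Functions and Operators 3.1 specification, in which a JSON array becomes an XDM array; the inline JSON string is made up for illustration.

```xquery
(: Illustrative JSON text, not part of the use case data :)
let $v := parse-json('{ "name": "Mildred", "age": 32, "tags": ["a", "b"], "active": true, "spouse": null }')
return (
  $v instance of map(xs:string, item()?),   (: JSON object  -> map          :)
  $v?tags instance of array(*),             (: JSON array   -> array        :)
  $v?name instance of xs:string,            (: JSON string  -> xs:string    :)
  $v?age instance of xs:double,             (: JSON number  -> xs:double    :)
  $v?active instance of xs:boolean,         (: JSON boolean -> xs:boolean   :)
  empty($v?spouse)                          (: JSON null    -> empty sequence :)
)
```

Under these assumptions every test evaluates to true. Note that the key "spouse" is still present in the map; only its value is the empty sequence.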
The JSON document, mildred.json:
{ "firstname": "Mildred", "lastname": "Moore", "age": 32, "address": { "street": "91 High Street", "town": "Biscester", "county": "Oxfordshire", "postcode": "OX6 3PD" }, "phone": [ { "type": "home", "number": "01869 378073" }, { "type": "mobile", "number": "07356 740756" } ] }
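A query for this task can be written with the lookup operator. This is a minimal sketch, assuming the document above is available under the relative URI mildred.json:

```xquery
let $input := json-doc("mildred.json")
(: ?phone selects the array, ?* its members; the predicate keeps the mobile entry :)
return $input?phone?*[?type = "mobile"]?number
```

For the document shown, this returns the string "07356 740756".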
Convert a JSON data file to XML.
The JSON document, employees.json:
{ "accounting" : [ { "firstName" : "John", "lastName" : "Doe", "age" : 23 }, { "firstName" : "Mary", "lastName" : "Smith", "age" : 32 } ], "sales" : [ { "firstName" : "Sally", "lastName" : "Green", "age" : 27 }, { "firstName" : "Jim", "lastName" : "Galley", "age" : 41 } ] }
<department name="accounting"> <employee> <firstName>John</firstName> <lastName>Doe</lastName> <age>23</age> </employee> <employee> <firstName>Mary</firstName> <lastName>Smith</lastName> <age>32</age> </employee> </department> <department name="sales"> <employee> <firstName>Sally</firstName> <lastName>Green</lastName> <age>27</age> </employee> <employee> <firstName>Jim</firstName> <lastName>Galley</lastName> <age>41</age> </employee> </department>
let $input := json-doc('employees.json') for $k in map:keys($input) return <department name="{ $k }"> { let $array := $input($k) for $i in 1 to array:size($array) let $emp := $array($i) return <employee> <firstName>{ $emp('firstName') }</firstName> <lastName>{ $emp('lastName') }</lastName> <age>{ $emp('age') }</age> </employee> } </department>
Update the first name of the author "Dan Suciu" to "John" in the "bookinfo.json" document.
The JSON document, bookinfo.json:
{ "book": { "title": "Data on the Web", "year": 2000, "author": [ { "last": "Abiteboul", "first": "Serge" }, { "last": "Buneman", "first": "Peter" }, { "last": "Suciu", "first": "Dan" } ], "publisher": "Morgan Kaufmann Publishers", "price": 39.95 } }
declare function local:deep-put($input as item()*, $key as xs:string, $value as item()*) as item()* { let $mf := function($k, $v) { if ($k eq $key) then map{$k : $value} else map{$k : local:deep-put($v, $key, $value)} } for $i in $input return if ($i instance of map(*)) then map:merge(map:for-each($i, $mf)) else if ($i instance of array(*)) then array{ local:deep-put($i?*, $key, $value) } else $i }; local:deep-put(json-doc("bookinfo.json"), "first", "John")
Note:
Extending the Update Facility to allow updating maps would allow a simpler solution.
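The local:deep-put function replaces every "first" entry, not only the one belonging to Dan Suciu. A more targeted variant is sketched below; the function name local:rename-first is hypothetical, and the sketch assumes that "last" and "first" entries, where present, hold atomic values (as they do in bookinfo.json).

```xquery
declare function local:rename-first($input as item()*) as item()* {
  for $i in $input
  return
    if ($i instance of map(*)) then
      (: only the author map for Dan Suciu is changed :)
      if ($i?last = "Suciu" and $i?first = "Dan")
      then map:put($i, "first", "John")
      else map:merge(
        map:for-each($i, function($k, $v) { map { $k : local:rename-first($v) } })
      )
    else if ($i instance of array(*)) then
      (: array:for-each keeps each member in place, including empty (null) members :)
      array:for-each($i, local:rename-first#1)
    else $i
};

local:rename-first(json-doc("bookinfo.json"))
```

Like local:deep-put, this builds a modified copy; the stored document is not changed.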
The following queries are based on a social media site that allows users to interact with their friends. collection("users") contains data on users and their friends:
{ "name" : "Sarah", "age" : 13, "gender" : "female", "friends" : [ "Jim", "Mary", "Jennifer"] } { "name" : "Jim", "age" : 13, "gender" : "male", "friends" : [ "Sarah" ] }
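As an illustration of a query over this data, the following sketch joins Sarah's friend list against the collection to return the record of each of her friends. It assumes, as the text does, that collection("users") returns one map per user.

```xquery
for $sarah in collection("users")[?name = "Sarah"],
    $friend in collection("users")
(: general comparison: true if the friend's name matches any entry in Sarah's list :)
where $friend?name = $sarah?friends?*
return $friend
```

If the collection holds only the two records shown, the query returns Jim's record, because Mary and Jennifer have no records of their own in the excerpt.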
Note:
These queries are based on similar queries in the XQuery 3.0 Use Cases.
The input is a sequence (whose order is of no concern) that contains the following sales data, represented here in JSON notation:
{ "product" : "broiler", "store number" : 1, "quantity" : 20 }, { "product" : "toaster", "store number" : 2, "quantity" : 100 }, { "product" : "toaster", "store number" : 2, "quantity" : 50 }, { "product" : "toaster", "store number" : 3, "quantity" : 50 }, { "product" : "blender", "store number" : 3, "quantity" : 100 }, { "product" : "blender", "store number" : 3, "quantity" : 150 }, { "product" : "socks", "store number" : 1, "quantity" : 500 }, { "product" : "socks", "store number" : 2, "quantity" : 10 }, { "product" : "shirt", "store number" : 3, "quantity" : 10 }
We want to group sales by product, across stores.
We assume a function collection("sales") that returns a sequence of items representing the rows in this table.
Query:
map:merge(( for $sales in collection("sales") let $pname := $sales("product") group by $pname return map { $pname : sum(for $s in $sales return $s("quantity")) } ))
Now let's do a more complex grouping query, showing sales by category within each state. We need further data to describe the categories of products and the location of stores.
collection("products") contains the following data:
{ "name" : "broiler", "category" : "kitchen", "price" : 100, "cost" : 70 }, { "name" : "toaster", "category" : "kitchen", "price" : 30, "cost" : 10 }, { "name" : "blender", "category" : "kitchen", "price" : 50, "cost" : 25 }, { "name" : "socks", "category" : "clothes", "price" : 5, "cost" : 2 }, { "name" : "shirt", "category" : "clothes", "price" : 10, "cost" : 3 }
collection("stores") contains the following data:
{ "store number" : 1, "state" : "CA" }, { "store number" : 2, "state" : "CA" }, { "store number" : 3, "state" : "MA" }, { "store number" : 4, "state" : "MA" }
[ { "CA" : [ {"kitchen" : { "broiler" : 20, "toaster" : 150 }}, {"clothes" : { "socks" : 510 }} ] }, { "MA" : [ { "kitchen" : { "blender" : 250, "toaster" : 50 }}, { "clothes" : { "shirt" : 10 }} ] } ]
The following query groups by state, then by category, then lists individual products and the sales associated with each.
Query:
array { for $store in json-doc('stores.json') ? * let $state := $store?state group by $state return map { $state : array { for $product in json-doc('products.json') ? * let $category := $product?category group by $category return map { $category : map:merge(( for $sales in json-doc('sales.json') ? * where $sales?("store number") = $store?("store number") and $sales?product = $product?name let $pname := $sales?product group by $pname return map { $pname : sum(for $s in $sales return $s?quantity)} )) } } } }
The following query takes satellite data, and summarizes which satellites are visible. The data for the query is a simplified version of a Stellarium file that contains this information.
{ "creator" : "Satellites plugin version 0.6.4", "satellites" : { "AAU CUBESAT" : { "tle1" : "1 27846U 03031G 10322.04074654 .00000056 00000-0 45693-4 0 8768", "visible" : false }, "AJISAI (EGS)" : { "tle1" : "1 16908U 86061A 10321.84797408 -.00000083 00000-0 10000-3 0 3696", "visible" : true }, "AKARI (ASTRO-F)" : { "tle1" : "1 28939U 06005A 10321.96319841 .00000176 00000-0 48808-4 0 4294", "visible" : true } } }
We want to query this data to return a summary that looks like this.
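One plausible shape for such a summary is a map that lists the names of the visible and the invisible satellites. The sketch below works under that assumption; the key names "visible" and "invisible" and the file name satellites.json are illustrative choices, not taken from the original.

```xquery
let $sats := json-doc("satellites.json")?satellites
return map {
  (: keep the satellite names whose entry has "visible" set to true :)
  "visible"   : array { map:keys($sats)[$sats(.)?visible] },
  (: and those whose entry has it set to false :)
  "invisible" : array { map:keys($sats)[not($sats(.)?visible)] }
}
```

The order of names within each array is implementation-dependent, because map:keys does not guarantee an order.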
JSON programmers frequently need to convert XML to JSON. The following query is based on a Wikipedia XML export format, using data from the category "Origami". Here is an excerpt of this data:
<mediawiki> <siteinfo> <sitename>Wikipedia</sitename> <page> <title>Kawasaki's theorem</title> <id>14511776</id> <revision> <id>435519187</id> <timestamp>2011-06-21T20:08:56Z</timestamp> <contributor> <username>Some jerk on the Internet</username> <id>6636894</id> </contributor> !!! SNIP !!! <page> <title>Origami techniques</title> <id>193590</id> <revision> <id>447687387</id> <timestamp>2011-08-31T17:21:49Z</timestamp> <contributor> <username>Dmcq</username> <id>3784322</id> </contributor> !!! SNIP !!! <page> <title>Mathematics of paper folding</title> <id>232840</id> <revision> <id>440970828</id> <timestamp>2011-07-23T09:10:42Z</timestamp> <contributor> <username>Tabletop</username> <id>173687</id> </contributor>
[ { "title" : "Kawasaki's theorem", "id" : "14511776", "timestamp" : "2011-06-21T20:08:56Z", "authors" : ["Some jerk on the Internet" ] }, { "title" : "Origami techniques", "id" : "193590", "timestamp" : "2011-08-31T17:21:49Z", "authors" : ["Dmcq" ] }, { "title" : "Mathematics of paper folding", "id" : "232840", "timestamp" : "2011-07-23T09:10:42Z", "authors" : ["Tabletop" ] } ]
The following query converts this data to JSON:
Query:
array { for $page in doc("Wikipedia-Origami.xml")//page return map { "title": string($page/title), "id" : string($page/id), "timestamp" : string($page/revision[1]/timestamp), "authors" : array { for $a in $page/revision/contributor/username return string($a) } } }
Suppose a JavaScript implementation provides an interface for queries, and a JavaScript program contains the following data [1]:
var data = { "color" : "blue", "closed" : true, "points" : [[10,10], [20,10], [20,20], [10,20]] };
This data can be converted to SVG by placing the text of a query in a JavaScript variable and calling the appropriate JavaScript function to invoke the query:
var query = "declare variable $stroke := attribute stroke { $data('color') }; declare variable $points := attribute points { $data('points')?*?* }; if ($data('closed')) then <svg><polygon>{ $stroke, $points }</polygon></svg> else <svg><polyline>{ $stroke, $points }</polyline></svg>";
This query can be invoked with a JavaScript API call:
jsoniq(data, query)
Here is the result of the above query:
<svg><polygon stroke="blue" points="10 10 20 10 20 20 10 20" /></svg>
The data in a JSON array is frequently displayed using HTML tables. The following query shows how to transform from the former to the latter.
The following Object contains the labels desired for columns and rows, as well as the data for the table.
{ "col labels" : ["singular", "plural"], "row labels" : ["1p", "2p", "3p"], "data" : [ ["spinne", "spinnen"], ["spinnst", "spinnt"], ["spinnt", "spinnen"] ] }
The following query creates an HTML table, using the column headings and row labels as well as the data in the Object shown above.
<html> <body> <table> <tr> { (: Column headings :) <th> </th>, for $th in json-doc("table.json")("col labels")?* return <th>{ $th }</th> } </tr> { (: Data for each row :) for $r at $i in json-doc("table.json")("data")?* return <tr> { <th>{ json-doc("table.json")("row labels")($i) }</th>, for $c in $r?* return <td>{ $c }</td> } </tr> } </table> </body> </html>
XQuery provides support for both sliding windows and tumbling windows, frequently used to analyze event streams or other sequential data. This simple windowing example converts a sequence of items to a table with three columns (using as many rows as necessary), and assigns a row number to each row.
[ { "color" : "Green" }, { "color" : "Pink" }, { "color" : "Lilac" }, { "color" : "Turquoise" }, { "color" : "Peach" }, { "color" : "Opal" }, { "color" : "Champagne" } ]
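A tumbling window can produce the rows. The following is a minimal sketch, assuming the data above is stored in a hypothetical file colors.json and that an HTML-style table is the desired output; the first cell of each row carries the row number.

```xquery
<table>{
  for tumbling window $w in json-doc("colors.json")?*
    (: every item may start a window; a window closes after its third item :)
    start at $s when true()
    end at $e when $e - $s eq 2
  count $row
  return
    <tr>
      <td>{ $row }</td>
      { for $c in $w return <td>{ $c?color }</td> }
    </tr>
}</table>
```

Because the end condition is written without "only", the final window is kept even though it is incomplete, so the seven colors yield three rows, the last of which contains only "Champagne".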
This example assumes a middleware system that presents relational tables as JSON arrays. The following two tables are used as sample data.
userid | firstname | lastname |
W0342 | Walter | Denisovich |
M0535 | Mick | Goulish |
The JSON representation this particular implementation provides for the above table looks like this:
[ { "userid" : "W0342", "firstname" : "Walter", "lastname" : "Denisovich" }, { "userid" : "M0535", "firstname" : "Mick", "lastname" : "Goulish" } ]
userid | ticker | shares |
W0342 | DIS | 153212312 |
M0535 | DIS | 10 |
M0535 | AIG | 23412 |
The JSON representation this particular implementation provides for the above table looks like this:
[ { "userid" : "W0342", "ticker" : "DIS", "shares" : 153212312 }, { "userid" : "M0535", "ticker" : "DIS", "shares" : 10 }, { "userid" : "M0535", "ticker" : "AIG", "shares" : 23412 } ]
The following query uses the fictitious vendor's vendor:table() function to retrieve the values from a table, and creates an Object for each user, with a list of the user's holdings in the value of that Object.
array { for $u in vendor:table("Users") order by $u("userid") return map { "userid" : $u("userid"), "first" : $u("firstname"), "last" : $u("lastname"), "holdings" : array { for $h in vendor:table("Holdings") where $h("userid") = $u("userid") order by $h("ticker") return map { "ticker" : $h("ticker"), "share" : $h("shares") } } } }
The XQuery Update Facility allows XML data to be updated. These use cases explore what it means to update JSON in the same way. They are based on use cases for JSONiq's updating functions.
Suppose an application receives an order that contains a credit card number, and needs to put the user on probation.
Data for an order:
{ "user" : "Deadbeat Jim", "credit card" : "VISA 4111 1111 1111 1111", "product" : "lottery tickets", "quantity" : 243 }
collection("users") contains the data for each individual user:
{ "name" : "Deadbeat Jim", "address" : "1 E 161st St, Bronx, NY 10451", "risk tolerance" : "high" }
The following query adds the pair "status" : "credit card declined" to the user's record.
let $dbj := collection("users")[ .("name") = "Deadbeat Jim" ] return insert map { "status" : "credit card declined" } into $dbj
After the update is finished, the user's record looks like this:
{ "name" : "Deadbeat Jim", "address" : "1 E 161st St, Bronx, NY 10451", "status" : "credit card declined", "risk tolerance" : "high" }
Many applications need to modify data before forwarding it to another source. The XQuery Update Facility provides an expression called a transform expression that can be used to create modified copies. The transform expression uses updating expressions to perform a transformation.
Suppose an application makes videos available using feeds from YouTube. The following data comes from one such feed:
{ "encoding" : "UTF-8", "feed" : { "author" : [ { "name" : { "$t" : "YouTube" }, "uri" : { "$t" : "http://www.youtube.com/" } } ], "category" : [ { "scheme" : "http://schemas.google.com/g/2005#kind", "term" : "http://gdata.youtube.com/schemas/2007#video" } ], "entry" : [ { "app$control" : { "yt$state" : { "$t" : "Syndication of this video was restricted by its owner.", "name" : "restricted", "reasonCode" : "limitedSyndication" } }, "author" : [ { "name" : { "$t" : "beyonceVEVO" }, "uri" : { "$t" : "http://gdata.youtube.com/feeds/api/users/beyoncevevo" } } ] !!! SNIP !!!
This example is based on an example on Stefan Goessner's JSONT site (http://goessner.net/articles/jsont/).