Copyright © 2014 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a First Public Working Draft as described in the Process Document. It was developed by the W3C XML Query Working Group, which is part of the XML Activity. The group does not expect this document to become a W3C Recommendation, but to eventually publish this document as a W3C Working Group Note.
These Requirements identify extensions to the XQuery 3.0 Recommendation, published 04 April 2014, that have been requested by WG participants and by reviewers who do not participate in the W3C activities. The XML Query WG has not yet fully reviewed these requirements.
Please report errors in this document using W3C's public Bugzilla system (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery public comments mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string “[XQuery31Req]” in the subject line of your report, whether made in Bugzilla or in email. Please use multiple Bugzilla entries (or, if necessary, multiple email messages) if you have more than one comment to make. Archives of the comments and responses are available at http://lists.w3.org/Archives/Public/public-qt-comments/.
Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
1 Goals
2 Requirements
2.1 Terminology
2.2 General Requirements
2.2.1 Backward compatibility
2.2.2 Extension compatibility
2.3 Maps, Arrays, Nulls, and JSON
2.3.1 Maps
2.3.2 Arrays
2.3.3 Nulls
2.3.4 Serialization
2.4 Usability Features
2.4.1 Scientific Notation
2.4.2 Type Aliases
2.4.3 Invoking XSLT Transformations
2.4.4 Collations
3 Use Cases
3.1 Streaming
3.1.1 Simple Grouping
3.1.1.1 Solution in XQuery 3.0
3.1.1.2 Solution in XQuery 3.0 with XSLT Maps
3.1.1.3 Solution in XSLT 3.0
3.1.2 Simultaneous Grouping
3.1.2.1 Solution in XQuery 3.0
3.1.2.2 Solution in XQuery 3.0 with XSLT Maps
3.1.3 Word Count by Lemma
3.1.3.1 Input Data
3.1.3.2 Result
3.1.3.3 Solution in XQuery 3.0 with XSLT Maps:
3.1.3.4 Alternative Solution in XQuery 3.0 with XSLT Maps:
3.1.3.5 Solution Using Grouping in XQuery 3.0:
3.1.3.6 Solution in XSLT 3.0:
3.2 Compound Values
3.2.1 Complex Number Library
3.2.1.1 Solution in XQuery 3.0 with XSLT Maps:
3.2.1.2 Solution in XSLT 3.0 (using type-alias proposal, still in discussion):
3.3 Manual Indexing
3.3.1 Simple Manual Join
3.3.1.1 Input Data
3.3.1.2 Solution in XQuery 3.0 with XSLT Maps:
3.3.1.3 Solution in XSLT 3.0:
3.4 Interface / Implementation Pattern
3.4.1 Data Variety
3.4.1.1 Input Data
3.4.1.2 Solution in XQuery 3.0 with XSLT Maps:
3.4.1.3 Solution in XSLT 3.0:
3.4.2 Search and Snippeting
3.4.2.1 Solution in XQuery Full Text 3.0 with XSLT Maps:
3.4.3 Abstracting Document Structure
3.4.3.1 Solution in XQuery 3.0 with XSLT Maps:
3.5 Parameter Passing
3.5.1 XSLT Stylesheet Parameters
3.5.1.1 Solution in XQuery 3.0 with XSLT Maps:
3.5.2 Function Options
3.5.2.1 Solution in XQuery 3.0 with XSLT Maps:
3.5.2.2 Solution in XQuery 3.0 with XSLT Maps enhanced with stronger typing:
3.5.3 Translation
3.5.3.1 Solution in XQuery 3.0 with XSLT Maps:
3.5.4 Cipher Functions
3.5.4.1 Solution in XQuery 3.0 with XSLT Maps:
3.6 Natural Language Processing
3.6.1 Input Data
3.6.2 Convert Part of Speech Data to XML
3.6.3 Converting arrays to maps
3.6.4 Group by Part of Speech
3.6.5 Trigrams
3.6.6 Partitioning using filters
3.7 Comparing Sequences in Optical Character Recognition
3.8 Transforms for Graphics
3.9 JSON
3.9.1 Information Retrieval
3.9.1.1 Input Data
3.9.1.2 Result
3.9.1.3 Solution in XQuery 3.0 with XSLT Maps:
3.9.1.4 Alternative Solution in XQuery 3.0 with XSLT Maps:
3.9.1.5 Solution in JSONiq:
3.9.2 Converting JSON to XML
3.9.2.1 Input Data
3.9.2.2 Result
3.9.2.3 Solution in XQuery 3.0 with XSLT Maps:
3.9.2.4 Solution in JSONiq:
3.9.2.5 Solution in XSLT 3.0:
3.9.3 Update by Copying
3.9.3.1 Input Data
3.9.3.2 Solution in XQuery 3.0 with XSLT Maps:
3.9.3.3 Solution in XSLT 3.0:
3.9.4 Joins
3.9.4.1 Input Data
3.9.4.2 Solution in JSONiq:
3.9.4.3 Solution in XSLT 3.0:
3.9.5 Grouping Queries for JSON
3.9.5.1 Input Data
3.9.5.2 Result
3.9.5.3 Solution in JSONiq:
3.9.5.4 Solution in XSLT 3.0:
3.9.6 More Complex Grouping Queries for JSON
3.9.6.1 Input Data
3.9.6.2 Result
3.9.6.3 Solution in JSONiq:
3.9.6.4 Solution in XSLT 3.0:
3.9.7 JSON to JSON Transformations
3.9.7.1 Input Data
3.9.7.2 Result
3.9.7.3 Solution in JSONiq:
3.9.7.4 Solution in XSLT 3.0:
3.9.8 Converting XML to JSON
3.9.8.1 Input Data
3.9.8.2 Result
3.9.8.3 Solution in JSONiq:
3.9.9 Transforming JSON to SVG
3.9.9.1 Input Data
3.9.9.2 Solution in JSONiq:
3.9.10 Transforming Arrays to HTML Tables
3.9.10.1 Input Data
3.9.10.2 Solution in JSONiq:
3.9.11 Windowing Queries
3.9.11.1 Input Data
3.9.11.2 Result
3.9.11.3 Solution in JSONiq:
3.9.12 JSON views in middleware
3.9.12.1 Input Data
3.9.12.2 Solution in JSONiq:
3.9.13 In-Place Updates
3.9.13.1 Input Data
3.9.13.2 Solution in JSONiq:
3.9.14 Data Transformations
3.9.14.1 Input Data
3.9.14.2 Solution in JSONiq:
The primary goal of XML Query 3.1 is to extend XML Query 3.0 with support for JSON maps and arrays, and to leverage these structures to make XQuery more useful. These data structures are also part of XPath 3.1, and are used in XSLT as well as XQuery.
Other features that improve usability or compatibility will be considered as time permits.
Satisfying these goals may require changes to the set of seven documents that have progressed to Recommendation together (Data Model 3.1, Functions and Operators 3.1, Serialization 3.1, XPath 3.1, XQuery 3.1, XQueryX 3.1, and XSLT 3.0).
The following keywords are used throughout the document to specify the extent to which an item is a requirement for the work of the XML Query Working Group:
MUST: The item is an absolute requirement.
MUST NOT: The item is an absolute prohibition.
SHOULD: There may exist valid reasons not to treat this item as a requirement, but the full implications should be understood and the case carefully weighed before discarding this item.
SHOULD NOT: There may exist valid reasons when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.
MAY: An item deserves attention, but further study is needed to determine whether the item should be treated as a requirement.
When the words MUST, SHOULD, or MAY are used in this technical sense [IETF RFC 2119], they occur as a hyperlink to these definitions. These words will also be used with their conventional English meaning, in which case there is no hyperlink. For instance, the phrase "the full implications should be understood" uses the word "should" in its conventional English sense, and therefore occurs without the hyperlink.
Each requirement also includes a status section, indicating its current situation in the XQuery/XPath/XSLT family of specifications. Three status levels are used:
This indicates that the requirement, according to its original formulation, has been completely met. Optional clarifying text may follow.
This indicates that the requirement has been partially met according to its original formulation. When this happens, explanatory text is provided to better clarify the current scope of the requirement.
This indicates that the requirement, according to its original formulation, has not been met. If this is the case, explanatory text is provided.
XQuery 3.1 MUST be backward compatible with [XQuery 3.0].
Every valid XQuery 3.0 expression MUST be valid in XQuery 3.1 and it MUST evaluate to the same result.
Status: this requirement has been met.
XQuery 3.1 MUST be compatible with XQuery 3.0 extensions developed by the XML Query Working Group, including [XQuery Update Facility 3.0] and [XQuery and XPath Full Text 3.0].
Status: this requirement has been met.
XQuery 3.1 MUST support collections of name / value pairs, which we call maps. (In JSON they are called objects; in other languages they are sometimes called records, structs, dictionaries, hash tables, keyed lists, or associative arrays.)
Status: this requirement has been met.
The map feature MUST provide a convenient syntax for creating maps.
Status: this requirement has been met.
The map feature MUST provide a convenient syntax for returning the value associated with a key.
Status: this requirement has been met.
The map feature MUST provide a convenient way to enumerate the keys in a map.
Status: this requirement has been met (using functions).
The map feature MUST provide a convenient way to create modified copies of maps, e.g. by adding or deleting entries.
Status: this requirement has been met (using functions).
The map feature MUST NOT preclude in-situ updates analogous to updates in the XQuery Update Facility.
Status: this requirement has been met.
A map SHOULD allow any atomic value as a key. The map feature SHOULD allow keys of different types to be used in the same map.
Status: this requirement has been met.
A map SHOULD allow any XDM sequence as a value. A map MUST allow any XDM item, map, or array as a value.
Status: this requirement has been met.
A map MUST be allowed as a member of an XDM sequence.
Status: this requirement has been met.
It MAY be possible to use a map as a function.
Status: this requirement has been met.
For the sake of optimizability, a map SHOULD NOT expose identity via the is, <<, >>, union, intersect, or except operators, or any operation that exposes document order.
Status: this requirement has been met.
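The map requirements above can be illustrated with a minimal sketch. It assumes the map constructor syntax and the map: function library of the later XQuery 3.1 Recommendation, which differ from the XSLT Maps drafts used in the use cases in this document; the variable names and data are hypothetical.

```xquery
xquery version "3.1";

(: Create a map; keys of different atomic types may share one map,
   and a value may be any sequence. :)
let $m := map {
  "title": "Emma",
  1815: "year of publication",
  "tags": ("novel", "classic")
}

(: Look up a value by calling the map as a function. :)
let $title := $m("title")

(: Create modified copies; $m itself is never changed. :)
let $added   := map:put($m, "author", "Jane Austen")
let $removed := map:remove($added, 1815)

return (
  $title,
  map:size($m),
  map:contains($removed, 1815),
  (: Enumerate keys; map:keys returns them in implementation-dependent
     order, so sort explicitly when order matters. :)
  for $k in map:keys($removed)
  order by string($k)
  return $k
)
```

Because every operation returns a new map, nothing in the sketch depends on map identity, which is consistent with the requirement that maps not expose identity.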
XQuery 3.1 MUST support arrays, which can nest.
Status: this requirement has been met.
XQuery 3.1 MUST provide a convenient syntax for creating arrays.
Status: this requirement has been met.
Arrays MUST provide a convenient syntax for returning the value found in a given position.
Status: this requirement has been met (using function call syntax).
Arrays SHOULD provide a convenient way to create modified copies of an array, e.g. by adding or deleting entries.
Status: this requirement has been met (using functions).
Arrays MUST NOT preclude in-situ updates analogous to updates in the XQuery Update Facility.
Status: this requirement has been met.
An array MUST allow any XDM item, array, or map as a member of an array.
Status: this requirement has been met.
An array MUST be allowed as a member of an XDM sequence.
Status: this requirement has been met.
It MAY be possible to use an array as a function.
Status: this requirement has been met.
For the sake of optimizability, an array SHOULD NOT expose identity via the is, <<, >>, union, intersect, or except operators, or any operation that exposes document order.
Status: this requirement has been met.
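A corresponding sketch for the array requirements follows. It assumes the array constructors and the array: function library of the later XQuery 3.1 Recommendation; the values are hypothetical.

```xquery
xquery version "3.1";

(: Square-bracket constructor: each comma-separated expression is one
   member, so arrays and maps can nest. :)
let $a := ["go", ["nested", "array"], map { "k": 1 }]

(: Curly constructor: each item of the sequence becomes one member. :)
let $b := array { 1 to 3 }

(: Positional access uses function-call syntax, counting from 1. :)
let $first := $a(1)
let $inner := $a(2)(2)

(: Modified copies via functions; $b is unchanged. :)
let $appended := array:append($b, 4)
let $shorter  := array:remove($appended, 1)

return (
  $first,
  $inner,
  array:size($a),
  array:size($shorter),
  (: The ?* lookup flattens one level, yielding the members as a sequence. :)
  $shorter?*
)
```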
XQuery 3.1 MUST provide support for numbers in scientific notation.
Status: this requirement has been met.
XQuery 3.1 MAY support aliases for types.
Status: this requirement has not been met.
XQuery 3.1 MUST provide a means to invoke XSLT transformations.
Status: this requirement has not been met.
XQuery 3.1 MAY provide a standard mechanism for referring to collations.
Status: this requirement has been met (via fn:put()).
The solutions provided for the following Use Cases include solutions in the following languages:
XSLT Maps: the maps provided in the current Working Draft of XSLT. See [XSLT 3.0].
XQuery 3.0: the XQuery 3.0 language, without maps. See [XQuery 3.0].
JSONiq: the JSONiq proposal. See [JSONiq].
Note:
None of these solutions are in the XQuery 3.1 language. These solutions are shown in languages we used to investigate the requirements for XQuery 3.1. The next publication of these use cases will replace the current set of solutions with XQuery 3.1 solutions.
In a streaming application you only get one chance to look at each piece of data in the source file. Therefore, if the output is not a pure event-to-event function of the input, you have to selectively remember some of the things you have seen in the input for use later. This sometimes creates a need for data structures to hold working data in memory. This is an important motivating use case for maps in XSLT. Some of the motivating examples for XSLT can be solved in other ways in XQuery; because XQuery does not have a streaming facility, it's unclear whether maps would be the best solution for these examples in a streaming XQuery processor.
Note:
This is solved in XSLT 3.0 using the streaming facility.
Find the highest earning employee in each department.
for $e in doc("employees.xml")/employees/employee, $d in $e/department group by $d return <department name="{$d}"> { let $max := max($e/salary) return $e[salary=$max] } </department>
declare function local:search-employees( $employees as element(employee)*, $highest-earners as map(xs:string, element(employee)) ) { if(empty($employees)) then $highest-earners else let $this := head($employees) let $existing := $highest-earners($this/department) let $new-earners := if ($existing/salary gt $this/salary) then $highest-earners else map:new(($highest-earners, map:entry($this/department, $this))) return local:search-employees(tail($employees), $new-earners) }; let $highest-earners := local:search-employees(doc("employees.xml")/*/employee, map:new()) for $department in map:keys($highest-earners) return <department name="{$department}">{ $highest-earners($department) }</department>
<xsl:stream href="employees.xml"> <xsl:iterate select="*/employee"> <xsl:param name="highest-earners" as="map(xs:string, element(employee))" select="map:new()"/> <xsl:variable name="this" select="copy-of(.)" as="element(employee)"/> <xsl:next-iteration> <xsl:with-param name="highest-earners" select="let $existing := $highest-earners($this/department) return if ($existing/salary gt $this/salary) then $highest-earners else map:new(($highest-earners, map:entry($this/department, $this)))"/> </xsl:next-iteration> <xsl:on-completion> <xsl:for-each select="map:keys($highest-earners)"> <department name="{.}"> <xsl:copy-of select="$highest-earners(.)"/> </department> </xsl:for-each> </xsl:on-completion> </xsl:iterate> </xsl:stream>
Find both the highest earning employee in each department, and the total number of employees by job type across all departments.
for $employee in doc("employees.xml")/*/employee let $salary := $employee/salary group by $department := $employee/department let $max-salary := max($salary) let $highest-earners := $employee[salary = $max-salary] return <department name="{$department}">{ $highest-earners }</department>, for $employee in doc("employees.xml")/*/employee let $salary := $employee/salary group by $job-type := $employee/job-type let $totals := count($employee) return <total-by-job-type type="{$job-type}">{ $totals }</total-by-job-type>
declare function local:search-employees(
  $employees as element(employee)*,
  $highest-earners as map(xs:string, element(employee)),
  $totals as map(xs:string, xs:double)
) {
  if (empty($employees)) then ($highest-earners, $totals)
  else
    let $this := head($employees)
    let $existing := $highest-earners($this/department)
    let $new-earners :=
      if ($existing/salary gt $this/salary) then $highest-earners
      else map:new(($highest-earners, map:entry($this/department, $this)))
    let $job-type := $this/job-type/string()
    (: A job type not yet in the map starts from a count of 0. :)
    let $new-totals := map:new(($totals, map { $job-type := ($totals($job-type), 0)[1] + 1 }))
    return local:search-employees(tail($employees), $new-earners, $new-totals)
};

(: Both accumulators start as empty maps. :)
let $results := local:search-employees(doc("employees.xml")/*/employee, map:new(), map:new())
let $highest-earners := $results[1]
let $totals := $results[2]
return (
  for $department in map:keys($highest-earners)
  return <department name="{$department}">{ $highest-earners($department) }</department>,
  for $job-type in map:keys($totals)
  return <total-by-job-type type="{$job-type}">{ $totals($job-type) }</total-by-job-type>
)
Calculate the word count by lemma of the verbs in the following document.
The XML document, gnt.xml.
<gnt> <s> <w pos="PP">I</w> <w pos="V" lemma="go">go</w> <pu>.</pu> </s> <s> <w pos="PP">She</w> <w pos="V" lemma="go">went</w> <pu>.</pu> </s> <s> <w pos="PP">He</w> <w pos="V" lemma="go">goes</w> <pu>.</pu> </s> <s> <w pos="PP">I</w> <w pos="V" lemma="see">see</w> <pu>.</pu> </s> <s> <w pos="PP">She</w> <w pos="V" lemma="see">sees</w> <pu>.</pu> </s> <s> <w pos="PP">I</w> <w pos="V" lemma="have">have</w> <pu>.</pu> </s> <s> <w pos="PP">She</w> <w pos="V" lemma="have">has</w> <pu>.</pu> </s> </gnt>
<verb lemma="go" count="3"/> <verb lemma="see" count="2"/> <verb lemma="have" count="2"/>
declare function local:word-count($words, $result) {
  if (empty($words)) then $result
  else
    let $word := head($words)
    return local:word-count(tail($words), map:new(($result, map { $word/@lemma := ($result($word/@lemma), 0)[1] + 1 })))
};

let $counts := local:word-count(doc("gnt.xml")//w[m:is-verb(.)], map{})
for $lemma in map:keys($counts)
let $count := $counts($lemma)
(: Most frequent lemma first, matching the expected result. :)
order by $count descending
return <verb lemma="{ $lemma }" count="{ $count }"/>
let $counts := fold-left(
  doc("gnt.xml")//w[m:is-verb(.)],
  map{},
  (: $map is the accumulator; $word is the current verb. :)
  function($map, $word) {
    map:new(($map, map { $word/@lemma := ($map($word/@lemma), 0)[1] + 1 }))
  })
for $lemma in map:keys($counts)
let $count := $counts($lemma)
order by $count descending
return <verb lemma="{ $lemma }" count="{ $count }"/>
A solution just using grouping, without maps.
for $word in doc("gnt.xml")//w let $lemma := $word/@lemma where m:is-verb($word) group by $lemma order by count($word) descending return <verb lemma="{ $lemma }" count="{count($word)}" />
<xsl:iterate select="doc('gnt.xml')//w[m:is-verb(.)]">
  <xsl:param name="result" select="map{}"/>
  <xsl:next-iteration>
    <xsl:with-param name="result" select="map:new(($result, map { string(@lemma) := ($result(string(@lemma)), 0)[1] + 1 }))"/>
  </xsl:next-iteration>
  <xsl:on-completion>
    <xsl:for-each select="map:keys($result)">
      <xsl:sort select="$result(.)" order="descending"/>
      <verb lemma="{ . }" count="{ $result(.) }"/>
    </xsl:for-each>
  </xsl:on-completion>
</xsl:iterate>
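For comparison, the word count can be sketched as a single left fold over the verbs. This sketch assumes the map syntax and map:put function of the later XQuery 3.1 Recommendation, not the XSLT Maps draft used in the solutions above. The input is an abridged inline copy of gnt.xml, and the predicate @pos = "V" is an assumed stand-in for the undefined m:is-verb function.

```xquery
xquery version "3.1";

(: Abridged inline copy of gnt.xml (punctuation elements omitted),
   so that the sketch is self-contained. :)
declare variable $gnt :=
  <gnt>
    <s><w pos="PP">I</w><w pos="V" lemma="go">go</w></s>
    <s><w pos="PP">She</w><w pos="V" lemma="go">went</w></s>
    <s><w pos="PP">He</w><w pos="V" lemma="go">goes</w></s>
    <s><w pos="PP">I</w><w pos="V" lemma="see">see</w></s>
    <s><w pos="PP">She</w><w pos="V" lemma="see">sees</w></s>
    <s><w pos="PP">I</w><w pos="V" lemma="have">have</w></s>
    <s><w pos="PP">She</w><w pos="V" lemma="have">has</w></s>
  </gnt>;

(: Fold the verbs into a map from lemma to count. :)
let $counts := fold-left(
  $gnt//w[@pos = "V"],
  map {},
  function($acc as map(*), $w as element(w)) as map(*) {
    let $lemma := string($w/@lemma)
    (: A lemma not yet in the map starts from a count of 0. :)
    return map:put($acc, $lemma, ($acc($lemma), 0)[1] + 1)
  })
for $lemma in map:keys($counts)
let $count := $counts($lemma)
(: Ties are broken alphabetically to make the output order deterministic. :)
order by $count descending, $lemma
return <verb lemma="{ $lemma }" count="{ $count }"/>
```

The explicit tie-break is a design choice of this sketch: map:keys returns keys in implementation-dependent order, so lemmas with equal counts would otherwise appear in no guaranteed order.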
Implement a complex number library for XQuery or XSLT 3.0. Each complex number should be represented as a single item, so that complex numbers can be manipulated like ordinary numbers, for example by returning sequences of them.
declare function i:complex(
  $real as xs:double,
  $imaginary as xs:double
) as map(xs:boolean, xs:double) {
  map{ true() := $real, false() := $imaginary }
};

declare function i:real(
  $complex as map(xs:boolean, xs:double)
) as xs:double {
  $complex(true())
};

declare function i:imaginary(
  $complex as map(xs:boolean, xs:double)
) as xs:double {
  $complex(false())
};

declare function i:add(
  $arg1 as map(xs:boolean, xs:double),
  $arg2 as map(xs:boolean, xs:double)
) as map(xs:boolean, xs:double) {
  i:complex(i:real($arg1)+i:real($arg2), i:imaginary($arg1)+i:imaginary($arg2))
};

declare function i:multiply(
  $arg1 as map(xs:boolean, xs:double),
  $arg2 as map(xs:boolean, xs:double)
) as map(xs:boolean, xs:double) {
  i:complex(
    i:real($arg1)*i:real($arg2) - i:imaginary($arg1)*i:imaginary($arg2),
    i:real($arg1)*i:imaginary($arg2) + i:imaginary($arg1)*i:real($arg2))
};
<xsl:type-alias name="i:complex" as="map(xs:boolean, xs:double)"/> <xsl:function name="i:complex" as="i:complex"> <xsl:param name="real" as="xs:double"/> <xsl:param name="imaginary" as="xs:double"/> <xsl:sequence select="map{ true() := $real, false() := $imaginary }"/> </xsl:function> <xsl:function name="i:real" as="xs:double"> <xsl:param name="complex" as="i:complex"/> <xsl:sequence select="$complex(true())"/> </xsl:function> <xsl:function name="i:imaginary" as="xs:double"> <xsl:param name="complex" as="i:complex"/> <xsl:sequence select="$complex(false())"/> </xsl:function> <xsl:function name="i:add" as="i:complex"> <xsl:param name="arg1" as="i:complex"/> <xsl:param name="arg2" as="i:complex"/> <xsl:sequence select="i:complex(i:real($arg1)+i:real($arg2), i:imaginary($arg1)+i:imaginary($arg2))"/> </xsl:function> <xsl:function name="i:multiply" as="i:complex"> <xsl:param name="arg1" as="i:complex"/> <xsl:param name="arg2" as="i:complex"/> <xsl:sequence select="i:complex( i:real($arg1)*i:real($arg2) - i:imaginary($arg1)*i:imaginary($arg2), i:real($arg1)*i:imaginary($arg2) + i:imaginary($arg1)*i:real($arg2))"/> </xsl:function>
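The same idea can be sketched with named keys instead of boolean keys. This sketch assumes the map syntax and the ? lookup operator of the later XQuery 3.1 Recommendation; the namespace URI and the key names "re" and "im" are hypothetical choices made for illustration.

```xquery
xquery version "3.1";
declare namespace i = "http://example.com/complex";

(: A complex number is a single map item with two named parts. :)
declare function i:complex(
  $re as xs:double,
  $im as xs:double
) as map(xs:string, xs:double) {
  map { "re": $re, "im": $im }
};

declare function i:multiply(
  $a as map(xs:string, xs:double),
  $b as map(xs:string, xs:double)
) as map(xs:string, xs:double) {
  i:complex(
    $a?re * $b?re - $a?im * $b?im,
    $a?re * $b?im + $a?im * $b?re)
};

(: Each complex number is one item, so a sequence of them stays flat. :)
let $zs := (i:complex(1, 2), i:complex(3, -1))
let $product := i:multiply($zs[1], $zs[2])
return ($product?re, $product?im)
```

Named keys make the representation easier to read than true() and false(), at the cost of a string comparison on each lookup.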
Build an index to manually optimize retrieval of books in a catalog by their ISBN number.
Construct a list of all authors, and the books they have written.
Book elements of the form:
<book> <isbn>0470192747</isbn> <publisher>Wiley</publisher> <title>XSLT 2.0 and XPath 2.0 Programmer's Reference</title> </book>
Author elements of the form:
<author> <name>Michael H. Kay</name> <isbn>0470192747</isbn> <isbn>...</isbn> </author>
declare variable $index := map:new(//book ! map{isbn := .}); <table>{ for $a in //author return <tr> <td>{ $a/name/string() }</td> <td>{ string-join($a/isbn ! $index(.)/title/string(), ", ") }</td> </tr> }</table>
XSLT has the xsl:key functionality, which is preferable. However, a straightforward translation from the XQuery solution follows:
<xsl:variable name="index" select="map:new(//book ! map{isbn := .})"/>
<table>
  <xsl:for-each select="//author">
    <tr>
      <td><xsl:value-of select="name"/></td>
      <td><xsl:value-of select="string-join(isbn ! $index(.)/title/string(), ', ')"/></td>
    </tr>
  </xsl:for-each>
</table>
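The manual join can also be sketched with the map:merge function of the later XQuery 3.1 Recommendation, which this sketch assumes. The inline $catalog variable and its wrapper element are hypothetical, added only so that the query is self-contained.

```xquery
xquery version "3.1";

(: Hypothetical inline catalog holding one book and one author. :)
declare variable $catalog :=
  <catalog>
    <book>
      <isbn>0470192747</isbn>
      <publisher>Wiley</publisher>
      <title>XSLT 2.0 and XPath 2.0 Programmer's Reference</title>
    </book>
    <author>
      <name>Michael H. Kay</name>
      <isbn>0470192747</isbn>
    </author>
  </catalog>;

(: Build the index once: one single-entry map per book, merged into
   one map keyed by the ISBN string. :)
declare variable $index as map(xs:string, element(book)) :=
  map:merge($catalog/book ! map { string(isbn): . });

<table>{
  for $a in $catalog/author
  return
    <tr>
      <td>{ string($a/name) }</td>
      <td>{ string-join($a/isbn ! $index(string(.))/title, ", ") }</td>
    </tr>
}</table>
```

The map values are the original book elements, not copies, so paths such as /title continue to work on the result of a lookup.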
As in JavaScript, a map whose keys are strings and whose associated values are function items can be used in a similar way to a class in object-oriented programming languages.
Suppose an application needs to handle customer order information that may arrive in three different formats, with different hierarchic arrangement.
An application can isolate itself from these differences by defining a set of functions to navigate the relationships between customers, orders, and products: orders-for-customer, orders-for-product, customer-for-order, product-for-order. These functions can be implemented in different ways for the three different input formats.
Flat structure:
<customer id="c123">...</customer> <product id="p789">...</product> <order customer="c123" product="p789">...</order>
Orders within customer elements:
<customer id="c123"> <order product="p789">...</order> </customer> <product id="p789">...</product>
Orders within product elements:
<customer id="c123">...</customer> <product id="p789"> <order customer="c123">...</order> </product>
For example, with the first format the implementation might be:
let $flat-input-functions as map(xs:string, function(*)) := map {
  'orders-for-customer' := function($c as element(customer)) as element(order)* {
    $c/../order[@customer=$c/@id]
  },
  'orders-for-product' := function($p as element(product)) as element(order)* {
    $p/../order[@product=$p/@id]
  },
  'customer-for-order' := function($o as element(order)) as element(customer) {
    $o/../customer[@id=$o/@customer]
  },
  'product-for-order' := function($o as element(order)) as element(product) {
    $o/../product[@id=$o/@product]
  }
}
return $flat-input-functions
<xsl:variable name="flat-input-functions" as="map(xs:string, function(*))*" select="map { 'orders-for-customer' := function($c as element(customer)) as element(order)* {$c/../order[@customer=$c/@id]}, 'orders-for-product' := function($p as element(product)) as element(order)* {$p/../order[@product=$p/@id]}, 'customer-for-order' := function($o as element(order)) as element(customer) {$o/../customer[@id=$o/@customer]}, 'product-for-order' := function($o as element(order)) as element(product) {$o/../product[@id=$o/@product]} } "/>
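To show how application code stays independent of the input format, the following sketch defines one function of the interface for two of the formats and calls both through the same application function. It assumes the map syntax of the later XQuery 3.1 Recommendation; the wrapper element data, the variable names, and local:count-orders are hypothetical.

```xquery
xquery version "3.1";

(: Flat structure: orders refer to customers by attribute. :)
declare variable $flat :=
  <data>
    <customer id="c123"/>
    <product id="p789"/>
    <order customer="c123" product="p789"/>
  </data>;

(: Nested structure: orders are children of their customer. :)
declare variable $nested :=
  <data>
    <customer id="c123"><order product="p789"/></customer>
    <product id="p789"/>
  </data>;

declare variable $flat-functions as map(xs:string, function(*)) := map {
  "orders-for-customer": function($c as element(customer)) as element(order)* {
    $c/../order[@customer = $c/@id]
  }
};

declare variable $nested-functions as map(xs:string, function(*)) := map {
  "orders-for-customer": function($c as element(customer)) as element(order)* {
    $c/order
  }
};

(: Application code sees only the interface, never the document layout. :)
declare function local:count-orders(
  $customer as element(customer),
  $impl as map(xs:string, function(*))
) as xs:integer {
  count($impl("orders-for-customer")($customer))
};

local:count-orders($flat/customer, $flat-functions),
local:count-orders($nested/customer, $nested-functions)
```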
Create a general interface that takes as input some words, does a full-text search for them, and returns snippets of the top 10 results, ordered by score, where the nodes to search, their structure, how to construct snippets and how to score them differ for different data sets.
Create a template method and use a map of functions to define the implementation of the plug-in points.
(: General interface module :)
module namespace this="http://example.com/search-interface/";

declare function this:search(
  $words as xs:string*,
  $collection as map(xs:string, function(*))
) {
  (: Look up each plug-in function in the map, then call it. :)
  (for $d in $collection('select')()[. contains text {$words} any word]
   order by $collection('score')($d, $words) descending
   return $collection('snippet')($d, $words))[position() <= 10]
};

(: Specific implementation example :)
import module namespace s="http://example.com/search-interface/";

declare variable $twitter as map(xs:string, function(*)) := map {
  'select' := function() as node()* { collection("twitter") },
  'score' := function($n as node(), $words as xs:string*) as xs:double {
    let score $s1 := $n contains text {$words} any word
    let score $s2 := $n contains text {$words} all words
    return $s1 + $s2
  },
  'snippet' := function($node as node(), $words as xs:string*) as node() { $node }
};

declare variable $blog as map(xs:string, function(*)) := map {
  'select' := function() as node()* { collection("blogs")/body },
  'score' := function($n as node(), $words as xs:string*) as xs:double {
    let $s1 := avg(
      for $p score $s in $n/para[. contains text {$words} any word]
      return $s)
    let $s2 := avg(
      for $p score $s in $n/comment[. contains text {$words} weight 0.5 any word]
      return $s)
    let score $s3 := $n/title contains text {$words} weight 5.0 any word
    return $s1 + $s2 + $s3
  },
  'snippet' := function($node as node(), $words as xs:string*) as node() {
    <result>{$node/title, $node/para[1], $node/comment[1]}</result>
  }
};

declare variable $books as map(xs:string, function(*)) := map {
  'select' := function() as node()* { collection()//chapter },
  'score' := function($n as node(), $words as xs:string*) as xs:double {
    let score $s1 := $n contains text {$words} any word
    let score $s2 := $n/title contains text {$words} weight 5.0 any word
    return $s1 + $s2
  },
  'snippet' := function($node as node(), $words as xs:string*) as node() {
    <result>{$node/title,
      ((for $p score $s in $node/p[. contains text {$words} all words]
        order by $s
        return $p),
       (for $p score $s in $node/p[. contains text {$words} any word]
        order by $s
        return $p))[1]
    }</result>
  }
};

(: Get top 10 from various sources :)
s:search(("fire","earthquake"),$books),
s:search(("fire","earthquake"),$twitter),
s:search(("fire","earthquake"),$blog)
Provide access to various pieces of metadata to an application, insulating the application code from variations in document structure.
Define the metadata interface through a map of functions.
(: Specific implementations :)
declare namespace xh="http://www.w3.org/1999/xhtml";

declare variable $xhtml as map(xs:string, function(*)) := map {
  'title' := function($n as document-node()) as xs:string? {
    $n/xh:html/xh:head/xh:title
  },
  'author' := function($n as document-node()) as xs:string? {
    $n/xh:html/xh:head/xh:meta[@name='author']/@content
  },
  'pubdate' := function($n as document-node()) as xs:string? {
    $n/xh:html/xh:head/xh:meta[@name='created']/@content
  },
  'publisher' := function($n as document-node()) as xs:string? { () }
};

declare variable $medline-citation as map(xs:string, function(*)) := map {
  'title' := function($n as document-node()) as xs:string? {
    $n/MedlineCitation/Article/ArticleTitle
  },
  'author' := function($n as document-node()) as xs:string? {
    string-join(
      for $a in $n/MedlineCitation//Author
      return concat($a/LastName, ", ", $a/ForeName),
      "; ")
  },
  'pubdate' := function($n as document-node()) as xs:string? {
    let $d := $n/MedlineCitation/Article/PubDate
    return string-join(($d/Day,$d/Month,$d/Year), " ")
  },
  'publisher' := function($n as document-node()) as xs:string? {
    $n/MedlineCitation/MedlineJournalInfo/MedlineTA
  }
};
Often library functions may have a large number of optional arguments, which are awkward or impossible to provide using the existing mechanism of variable arity functions.
Pass the list of parameter names and values to the xdmp:xslt-invoke() function, which invokes an XSLT stylesheet.
declare function xdmp:xslt-invoke($path as xs:string, $input as node(), $params as map(xs:QName, item()*)) as document-node()* external; xdmp:xslt-invoke("my-stylesheet.xsl", doc("my-doc.xml"), map { xs:QName("toc") := true(), xs:QName("index") := doc("index_terms.xml") })
Provide a mechanism to supply (otherwise defaulted) option values to the fn:doc() function, which control aspects of its behaviour, including:
Parsing of external entities
DTD validation
XML Schema validation
Lax (XML Schema) validation
Whitespace stripping
URI resolution
Using maps in this scenario brings benefits over using XML structure, including:
Nodes are not copied; identity is retained
Atomic items are not serialized, and retain their specific type
Functions can be passed in as options - the relevant example in this case being the URI resolver.
declare function fn:doc($uri as xs:string, $options as map(xs:string, item()*)) as document-node()? external; (: Enable lax XML Schema validation :) doc("validate-me.xml", map { "schema-validation" := true(), "lax-validation" := true() }), (: Enable whitespace stripping, and a custom URI resolution :) doc("../relative-uri.xml", map { "strip-whitespace" := true(), "uri-resolver" := resolve-uri(?, base-uri()) })
declare function fn:doc( $uri as xs:string, $options as strong-map( external-entities as xs:boolean?, dtd-validation as xs:boolean?, schema-validation as xs:boolean?, lax-validation as xs:boolean?, strip-whitespace as xs:boolean?, uri-resolver as function(xs:string) as xs:string ) ) as document-node()? external; (: Enable lax XML Schema validation :) doc("validate-me.xml", map { xs:QName("schema-validation") := true(), xs:QName("lax-validation") := true() }), (: Enable whitespace stripping, and a custom URI resolution :) doc("../relative-uri.xml", map { xs:QName("strip-whitespace") := true(), xs:QName("uri-resolver") := resolve-uri(?, base-uri()) })
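How a function might combine caller-supplied options with its defaults can be sketched as follows. The sketch assumes the map:merge function and its "duplicates" option from the later XQuery 3.1 Recommendation; the option names come from the example above, while $defaults and local:effective-options are hypothetical and not part of fn:doc().

```xquery
xquery version "3.1";

(: Hypothetical default values for two of the options listed above. :)
declare variable $defaults as map(xs:string, item()*) := map {
  "strip-whitespace": false(),
  "lax-validation": false()
};

(: Caller-supplied options come first, so with "use-first" they
   override any default that has the same key. :)
declare function local:effective-options(
  $options as map(xs:string, item()*)
) as map(xs:string, item()*) {
  map:merge(($options, $defaults), map { "duplicates": "use-first" })
};

let $opts := local:effective-options(map { "strip-whitespace": true() })
return ($opts?strip-whitespace, $opts?lax-validation)
```

A caller therefore names only the options it wants to change, which is the main advantage over a long list of positional arguments.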
Design a language-agnostic game (here just the core), which allows a translation function or map as a parameter.
declare function local:play(
  $secret-number as xs:integer,
  $guessed-number as xs:integer,
  $translator as function(xs:string) as xs:string
) {
  switch (true())
    case $guessed-number eq $secret-number
      return $translator("You won!")
    case $guessed-number lt $secret-number
      return $translator("The secret number is greater.")
    default (: $guessed-number gt $secret-number :)
      return $translator("The secret number is lower.")
};

local:play(76, 86, function($x) { $x }), (: Keep English :)
local:play(76, 86, map {
  "You won!" := "Du hast gewonnen!",
  "The secret number is greater." := "Die geheime Zahl ist groesser.",
  "The secret number is lower." := "Die geheime Zahl ist kleiner."
}),
local:play(76, 86, $automated-translator-based-on-natural-language-processing)
Provide an encryption function which will encode some input according to a cipher that can be a codebook implemented as a map or an explicit algorithm.
declare function local:encode(
  $input as xs:string,
  $cipher as function(xs:integer) as xs:integer
) {
  (: Apply the cipher to each codepoint in turn. :)
  codepoints-to-string(string-to-codepoints($input) ! $cipher(.))
};

let $code := map {
  string-to-codepoints("a") := string-to-codepoints("z"),
  string-to-codepoints("b") := string-to-codepoints("e"),
  ...
}
return (
  local:encode("Message", $code),
  local:encode("Message", function($c) { $c + 3 (: Caesar's cipher :) })
)
Software used for natural language processing and text analytics frequently uses data structures like maps and arrays. For instance, the Python Natural Language Toolkit (NLTK) uses lists and tuples extensively. In this use case, we use a library that invokes NLTK to perform simple natural language processing, returning results in a format very similar to that used by NLTK, and perform a variety of simple tasks.
In this use case, we are using the Gutenberg edition of Jane Austen's "Emma", as packaged in NLTK. To return the sentences of a text, we use the nltk:sentences() function, which returns sentences using the same data structures as NLTK. Here are a few sentences resulting from the function call nltk:sentences('austen-emma.txt'), using arrays to represent Python's list structures:
Sentence Representation:
[ ['I', 'must', 'put', 'on', 'a', 'few', 'ornaments', 'now', ',', 'because', 'it', 'is', 'expected', 'of', 'me', '.'], ['A', 'bride', ',', 'you', 'know', ',', 'must', 'appear', 'like', 'a', 'bride', ',', 'but', 'my', 'natural', 'taste', 'is', 'all', 'for', 'simplicity', ';', 'a', 'simple', 'style', 'of', 'dress', 'is', 'so', 'infinitely', 'preferable', 'to', 'finery', '.'], ['But', 'I', 'am', 'quite', 'in', 'the', 'minority', ',', 'I', 'believe', ';', 'few', 'people', 'seem', 'to', 'value', 'simplicity', 'of', 'dress', ',--', 'show', 'and', 'finery', 'are', 'every', 'thing', '.'] ]
NLTK has multiple representations of sentences. If $s is bound to the second sentence in the above data structure, then nltk:pos-tag($s) returns the following:
Part of Speech Representation:
[['A', 'DT'], ['bride', 'NN'], [',', ','], ['you', 'PRP'], ['know', 'VBP'], [',', ','], ['must', 'MD'], ['appear', 'VB'], ['like', 'IN'], ['a', 'DT'], ['bride', 'NN'], [',', ','], ['but', 'CC'], ['my', 'PRP$'], ['natural', 'JJ'], ['taste', 'NN'], ['is', 'VBZ'], ['all', 'DT'], ['for', 'IN'], ['simplicity', 'NN'], [';', ':'], ['a', 'DT'], ['simple', 'JJ'], ['style', 'NN'], ['of', 'IN'], ['dress', 'NN'], ['is', 'VBZ'], ['so', 'RB'], ['infinitely', 'RB'], ['preferable', 'JJ'], ['to', 'TO'], ['finery', 'VB'], ['.', '.'] ]
If $s is bound to a part of speech representation, we can convert it to an XML format using the following query:
<s> { for $w in $s() return <w pos="{ $w(2) }">{ $w(1) }</w> } </s>
Or if we prefer to use meaningful names instead of the numeric positions, we can create an index that maps between names and positions and use it as follows:
declare variable $index := { "pos" : 2, "lemma" : 1 }; <s> { for $w in $s() return <w pos="{ $w($index("pos")) }">{ $w($index("lemma")) }</w> } </s>
Both queries have the same result:
<s> <w pos="DT">A</w> <w pos="NN">bride</w> <w pos=",">,</w> <w pos="PRP">you</w> <w pos="VBP">know</w> <w pos=",">,</w> <w pos="MD">must</w> <w pos="VB">appear</w> <w pos="IN">like</w> <w pos="DT">a</w> <w pos="NN">bride</w> <w pos=",">,</w> <w pos="CC">but</w> <w pos="PRP$">my</w> <w pos="JJ">natural</w> <w pos="NN">taste</w> <w pos="VBZ">is</w> <w pos="DT">all</w> <w pos="IN">for</w> <w pos="NN">simplicity</w> <w pos=":">;</w> <w pos="DT">a</w> <w pos="JJ">simple</w> <w pos="NN">style</w> <w pos="IN">of</w> <w pos="NN">dress</w> <w pos="VBZ">is</w> <w pos="RB">so</w> <w pos="RB">infinitely</w> <w pos="JJ">preferable</w> <w pos="TO">to</w> <w pos="VB">finery</w> <w pos=".">.</w> </s>
If $s is bound to a sentence in part of speech representation, the following query converts it to an array of maps with meaningful property names:
[ for $w in $s() return { "pos" : $w(2), "lemma" : $w(1) } ]
Here is the output of the above query:
[ { "pos" : "DT", "lemma" : "A" }, { "pos" : "NN", "lemma" : "bride" }, { "pos" : ",", "lemma" : "," }, { "pos" : "PRP", "lemma" : "you" }, { "pos" : "VBP", "lemma" : "know" }, { "pos" : ",", "lemma" : "," }, { "pos" : "MD", "lemma" : "must" }, { "pos" : "VB", "lemma" : "appear" }, { "pos" : "IN", "lemma" : "like" }, { "pos" : "DT", "lemma" : "a" }, { "pos" : "NN", "lemma" : "bride" }, { "pos" : ",", "lemma" : "," }, { "pos" : "CC", "lemma" : "but" }, { "pos" : "PRP$", "lemma" : "my" }, { "pos" : "JJ", "lemma" : "natural" }, { "pos" : "NN", "lemma" : "taste" }, { "pos" : "VBZ", "lemma" : "is" }, { "pos" : "DT", "lemma" : "all" }, { "pos" : "IN", "lemma" : "for" }, { "pos" : "NN", "lemma" : "simplicity" }, { "pos" : ":", "lemma" : ";" }, { "pos" : "DT", "lemma" : "a" }, { "pos" : "JJ", "lemma" : "simple" }, { "pos" : "NN", "lemma" : "style" }, { "pos" : "IN", "lemma" : "of" }, { "pos" : "NN", "lemma" : "dress" }, { "pos" : "VBZ", "lemma" : "is" }, { "pos" : "RB", "lemma" : "so" }, { "pos" : "RB", "lemma" : "infinitely" }, { "pos" : "JJ", "lemma" : "preferable" }, { "pos" : "TO", "lemma" : "to" }, { "pos" : "VB", "lemma" : "finery" }, { "pos" : ".", "lemma" : "." } ]
If $s is bound to a sentence in part of speech representation, the following query groups words by part of speech, selecting parts of speech particularly illustrative of Jane Austen's writing style.
for $word in $s() let $pos := $word(2) let $lexeme := $word(1) where $pos = ("JJ", "NN", "RB", "VB") group by $pos order by $pos return <pos name="{$pos}"> { for $l in distinct-values($lexeme) return <lexeme>{ $l }</lexeme> } </pos>
Here is the output of the above query:
<pos name="JJ"> <lexeme>natural</lexeme> <lexeme>simple</lexeme> <lexeme>preferable</lexeme> </pos> <pos name="NN"> <lexeme>bride</lexeme> <lexeme>taste</lexeme> <lexeme>simplicity</lexeme> <lexeme>style</lexeme> <lexeme>dress</lexeme> </pos> <pos name="RB"> <lexeme>so</lexeme> <lexeme>infinitely</lexeme> </pos> <pos name="VB"> <lexeme>appear</lexeme> <lexeme>finery</lexeme> </pos>
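The grouping step can be checked against the same tagged sentence in Python, where NLTK represents each tagged word as a (word, tag) tuple. This is only a sketch of the grouping logic; the variable names are illustrative and the tagged data is copied from the part of speech representation shown earlier.

```python
tagged = [
    ("A", "DT"), ("bride", "NN"), (",", ","), ("you", "PRP"), ("know", "VBP"),
    (",", ","), ("must", "MD"), ("appear", "VB"), ("like", "IN"), ("a", "DT"),
    ("bride", "NN"), (",", ","), ("but", "CC"), ("my", "PRP$"),
    ("natural", "JJ"), ("taste", "NN"), ("is", "VBZ"), ("all", "DT"),
    ("for", "IN"), ("simplicity", "NN"), (";", ":"), ("a", "DT"),
    ("simple", "JJ"), ("style", "NN"), ("of", "IN"), ("dress", "NN"),
    ("is", "VBZ"), ("so", "RB"), ("infinitely", "RB"), ("preferable", "JJ"),
    ("to", "TO"), ("finery", "VB"), (".", "."),
]

wanted = ("JJ", "NN", "RB", "VB")
groups = {}
for word, pos in tagged:
    if pos in wanted:
        lexemes = groups.setdefault(pos, [])
        if word not in lexemes:  # keep distinct values, in order of appearance
            lexemes.append(word)

for pos in sorted(groups):
    print(pos, groups[pos])
```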
In corpus linguistics, n-grams are the basis for certain statistical techniques used to explore and compare texts; for instance, they are used to determine authorship of texts. If $s is bound to a sentence in part of speech representation, the following query drops the punctuation and computes the trigrams of that sentence:
declare function local:words-only($s) { for $w in $s where not($w(2) = (".", ",", ";", ":")) return $w(1) }; for sliding window $w in local:words-only($s()) start at $i when true() only end at $j when $j - $i eq 2 return [ $w ]
Here is the result for a sentence used in an earlier example:
[ "A", "bride", "you" ], [ "bride", "you", "know" ], [ "you", "know", "must" ], [ "know", "must", "appear" ], [ "must", "appear", "like" ], [ "appear", "like", "a" ], [ "like", "a", "bride" ], [ "a", "bride", "but" ], [ "bride", "but", "my" ], [ "but", "my", "natural" ], [ "my", "natural", "taste" ], [ "natural", "taste", "is" ], [ "taste", "is", "all" ], [ "is", "all", "for" ], [ "all", "for", "simplicity" ], [ "for", "simplicity", "a" ], [ "simplicity", "a", "simple" ], [ "a", "simple", "style" ], [ "simple", "style", "of" ], [ "style", "of", "dress" ], [ "of", "dress", "is" ], [ "dress", "is", "so" ], [ "is", "so", "infinitely" ], [ "so", "infinitely", "preferable" ], [ "infinitely", "preferable", "to" ], [ "preferable", "to", "finery" ]
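The sliding window with "only end" keeps just the complete windows of three words. A small Python sketch of the same computation, run on the first few tagged words of the sentence, looks like this; the function names are illustrative.

```python
PUNCTUATION_TAGS = {".", ",", ";", ":"}


def words_only(tagged):
    """Drop tokens whose tag marks them as punctuation."""
    return [word for word, pos in tagged if pos not in PUNCTUATION_TAGS]


def trigrams(words):
    """Return every complete window of three consecutive words."""
    return [words[i:i + 3] for i in range(len(words) - 2)]


tagged = [
    ("A", "DT"), ("bride", "NN"), (",", ","), ("you", "PRP"),
    ("know", "VBP"), (",", ","), ("must", "MD"), ("appear", "VB"),
]

for gram in trigrams(words_only(tagged)):
    print(gram)
```

A sentence with fewer than three words yields no trigrams, which matches the behavior of discarding incomplete windows.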
Filters can be used to partition the words of a sentence in a variety of ways. In this simple example, we use filters to distinguish verbs from other parts of speech. In NLTK, parse codes that start with the string VB denote verb forms. In this example, the variable $s is bound to a sentence in parsed format, e.g.
[ ['A', 'DT'], ['bride', 'NN'], [',', ','], ['you', 'PRP'], ['know', 'VBP'], [',', ','], ['must', 'MD'], ['appear', 'VB'], ['like', 'IN'], ['a', 'DT'], ['bride', 'NN'], [',', ','], ['but', 'CC'], ['my', 'PRP$'], ['natural', 'JJ'], ['taste', 'NN'], ['is', 'VBZ'], ['all', 'DT'], ['for', 'IN'], ['simplicity', 'NN'], [';', ':'], ['a', 'DT'], ['simple', 'JJ'], ['style', 'NN'], ['of', 'IN'], ['dress', 'NN'], ['is', 'VBZ'], ['so', 'RB'], ['infinitely', 'RB'], ['preferable', 'JJ'], ['to', 'TO'], ['finery', 'VB'], ['.', '.'] ]
The filter function takes a sequence and a boolean function, and returns one array with the items that satisfy the function and a second array with the items that do not.
declare function local:filter($s as item()*, $p as function(item()) as xs:boolean) { [ $s[$p(.)] ], [ $s[not($p(.))] ] };
We can call it with the starts-with() function to partition a sentence.
let $f := function($a) { starts-with($a(2), "VB") } return local:filter($s(), $f)
Here is the output of the query for the sentence shown above.
[ [ "know", "VBP" ], [ "appear", "VB" ], [ "is", "VBZ" ], [ "is", "VBZ" ], [ "finery", "VB" ] ], [ [ "A", "DT" ], [ "bride", "NN" ], [ ",", "," ], [ "you", "PRP" ], [ ",", "," ], [ "must", "MD" ], [ "like", "IN" ], [ "a", "DT" ], [ "bride", "NN" ], [ ",", "," ], [ "but", "CC" ], [ "my", "PRP$" ], [ "natural", "JJ" ], [ "taste", "NN" ], [ "all", "DT" ], [ "for", "IN" ], [ "simplicity", "NN" ], [ ";", ":" ], [ "a", "DT" ], [ "simple", "JJ" ], [ "style", "NN" ], [ "of", "IN" ], [ "dress", "NN" ], [ "so", "RB" ], [ "infinitely", "RB" ], [ "preferable", "JJ" ], [ "to", "TO"], [ ".", "." ] ]
A programmer might choose to represent filter results using a map instead of an array, as shown in the following code.
declare function local:filter($s as item()*, $p as function(item()) as xs:boolean) { { true() : [ $s[$p(.)] ], false() : [ $s[not($p(.))] ] } }; let $f := function($a) { starts-with($a(2), "VB") } return local:filter($s(), $f)
Here is the output of the above query using the same data.
{ "true" : [ [ "know", "VBP" ], [ "appear", "VB" ], [ "is", "VBZ" ], ["is", "VBZ" ], [ "finery", "VB" ] ], "false" : [ [ "A", "DT" ], ["bride", "NN" ], [ ",", "," ], [ "you", "PRP" ], [ ",", "," ], [ "must", "MD" ], [ "like", "IN" ], [ "a", "DT" ], [ "bride", "NN" ], [ ",", "," ], [ "but", "CC" ], [ "my", "PRP$" ], [ "natural", "JJ" ], [ "taste", "NN" ], [ "all", "DT"], [ "for", "IN" ], [ "simplicity", "NN" ], [ ";", ":" ], [ "a", "DT" ], [ "simple", "JJ" ], [ "style", "NN" ], [ "of", "IN" ], [ "dress", "NN" ], [ "so", "RB" ], [ "infinitely", "RB" ], [ "preferable", "JJ" ], [ "to", "TO" ], [ ".", "." ] ] }
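The map-based variant of the filter can be sketched in Python with a dictionary keyed by the two boolean values. The function name partition and the shortened sample sentence are choices made for this sketch only.

```python
def partition(items, predicate):
    """Split items into those that satisfy the predicate and those that do not."""
    result = {True: [], False: []}
    for item in items:
        result[bool(predicate(item))].append(item)
    return result


tagged = [
    ("A", "DT"), ("bride", "NN"), (",", ","), ("you", "PRP"),
    ("know", "VBP"), (",", ","), ("must", "MD"), ("appear", "VB"),
]

# Tags that start with "VB" denote verb forms.
parts = partition(tagged, lambda pair: pair[1].startswith("VB"))
print(parts[True])
print(parts[False])
```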
When Rigaudon optical character recognition software is used for multilingual texts, languages are identified by character set if possible, and the output is formatted as hOCR. For instance, the text "the other possible derivation from ἡ ἐπιοῦσα, dies crastinus", which contains English, Greek, and Latin, might be represented as follows in raw OCR output (the format is simplified somewhat for the sake of presentation).
<span class="ocr_word" title="bbox 1388 430 1461 474">the</span> <span class="ocr_word" title="bbox 1514 433 1635 476">other</span> <span class="ocr_word" title="bbox 133 498 317 554">pcssible</span> <span class="ocr_word" title="bbox 354 498 590 541">derivation</span> <span class="ocr_word" title="bbox 631 497 738 538">from</span> <span class="ocr_word" title="bbox 772 495 799 547" lang="grc" xml:lang="grc">ἡ</span> <span class="ocr_word" title="bbox 835 495 1019 538" lang="grc" xml:lang="grc">ἐπιοῦσα</span> <span class="ocr_word" title="bbox 134 567 220 607">dies</span> <span class="ocr_word" title="bbox 257 566 462 607">erastinus</span>
In the above output, two words were not correctly recognized: the English word "possible" and the Latin word "crastinus". Rigaudon uses multilingual spell checkers to find the nearest likely word in one of the languages likely to be used in a given text. For this particular text, we expect to find English, Greek, and Latin.
In this use case, we take the above hOCR as input and call the spellcheck function, implemented as an external function, to identify which words are likely in each candidate language. Having done so, we combine the results to construct the most likely text.
The following function extracts the text from the above data.
declare function local:extract-text($spans) { for $s in $spans return string($s) };
Here is the output of the function for the data shown above.
"the", "other", "pcssible", "derivation", "from", "ἡ", "ἐπιοῦσα", "dies", "erastinus"
The following function performs a spellcheck in a set of languages, creating a map that identifies the original and each language.
declare variable $languages := ("English", "Greek", "Latin"); declare function local:spellcheck($languages, $text) { {| { "languages" : $languages }, { "raw" : $text }, for $l in $languages return { $l : [ for $w in $text return ext:sc($l, $w) ] } |} }; let $t := local:extract-text($spans) return local:spellcheck($languages, $t)
Here is the output of the above query.
{ "languages" : ( "English", "Greek", "Latin" ), "raw" : [ "the", "other", "pcssible", "derivation", "from", "ἡ", "ἐπιοῦσα", "dies", "erastinus" ], "English" : [ "the", "other", "possible", "derivation", "from", null, null, "dies", null ], "Greek" : [ null, null, null, null, null, "ἡ", "ἐπιοῦσα", null, null ], "Latin" : [ null, null, null, null, null, null, null, "dies", "crastinus" ] }
The following function merges lookup results in the above format. The first parameter lists a set of languages, in preference order. For each word, the function picks the non-null lookup result for the most preferred language available, or the original "raw" word if all lookups return null. In this code, we assume that $m is bound to the data structure shown above.
declare variable $languages := ("English", "Greek", "Latin"); declare function local:merge($languages, $m) { let $size := count($m("raw")()) for $i in 1 to $size let $candidates := ($languages ! $m(.)($i)[ . ne null] , $m("raw")($i)) return $candidates[1] }; local:merge($languages, $m)
Here is the result of the query:
the other possible derivation from ἡ ἐπιοῦσα dies crastinus
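The merge rule (first non-null candidate in preference order, otherwise the raw word) can be sketched in Python, with None standing in for null. The sample data assumes that the Latin spell checker returns "crastinus" for the misrecognized last word, as the merged result implies.

```python
def merge(languages, lookup):
    """Pick, per word, the first non-null result in language preference order."""
    merged = []
    for i, raw_word in enumerate(lookup["raw"]):
        candidates = [
            lookup[language][i]
            for language in languages
            if lookup[language][i] is not None
        ]
        merged.append(candidates[0] if candidates else raw_word)
    return merged


lookup = {
    "raw": ["the", "other", "pcssible", "derivation", "from",
            "ἡ", "ἐπιοῦσα", "dies", "erastinus"],
    "English": ["the", "other", "possible", "derivation", "from",
                None, None, "dies", None],
    "Greek": [None, None, None, None, None, "ἡ", "ἐπιοῦσα", None, None],
    # Assumption: the Latin checker suggests the corrected word.
    "Latin": [None, None, None, None, None, None, None, "dies", "crastinus"],
}

print(" ".join(merge(["English", "Greek", "Latin"], lookup)))
```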
This use case uses rotation matrices to rotate a shape in three dimensions.
The following library implements three-dimensional rotation in XQuery:
declare function local:rotate-x( $theta ) { [ [ 1, 0, 0 ], [ 0, cosine($theta), - sine($theta) ], [ 0, sine($theta), cosine($theta) ] ] }; declare function local:rotate-y( $theta ) { [ [ cosine($theta), 0, sine($theta) ], [ 0, 1, 0], [ - sine($theta), 0, cosine($theta) ] ] }; declare function local:rotate-z( $theta ) { [ [ cosine($theta), - sine($theta), 0 ], [ sine($theta), cosine($theta), 0 ], [ 0, 0, 1] ] }; declare function local:rotate($pitch as xs:double, $yaw as xs:double, $roll as xs:double) { let $p := local:rotate-x($pitch) let $y := local:rotate-y($yaw) let $r := local:rotate-z($roll) let $py := local:mult($p, $y) return local:mult($py, $r) }; declare function local:mult( $matrix1, $matrix2 ) { (: the column count of $matrix1 must equal the row count of $matrix2 :) if (length($matrix1(1)) != length($matrix2)) then error("Matrices must be m*n and n*p to multiply!") else [ for $i in 1 to length($matrix1) return [ for $j in 1 to length($matrix2(1)) return sum ( for $k in 1 to length($matrix2) return $matrix1($i)($k) * $matrix2($k)($j) ) ] ] }; let $rect := [[0, 0, 0], [10, 0, 0], [10, 10, 0], [0, 10, 0], [0, 0, 0]] (: each point is wrapped as a 1*3 matrix before multiplying :) let $rot := for $r in $rect() return local:mult([ $r ], local:rotate( 10, 10, 10 )) return img:render( $rot )
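To make the matrix arithmetic concrete, here is a minimal Python sketch of the same library using nested lists. Angles are taken to be radians, which the use case does not specify, and the rendering step is left out because img:render() is not defined here.

```python
import math


def rotate_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rotate_y(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def rotate_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def mult(a, b):
    """Multiply an m*n matrix by an n*p matrix."""
    if len(a[0]) != len(b):
        raise ValueError("Matrices must be m*n and n*p to multiply!")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def rotate(pitch, yaw, roll):
    return mult(mult(rotate_x(pitch), rotate_y(yaw)), rotate_z(roll))


rect = [[0, 0, 0], [10, 0, 0], [10, 10, 0], [0, 10, 0], [0, 0, 0]]
rotation = rotate(0.1, 0.1, 0.1)
# Each point is treated as a 1*3 row matrix and multiplied by the rotation.
rotated = [mult([point], rotation)[0] for point in rect]
print(rotated)
```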
JSON is becoming an important data format that many XQuery and XSLT users have to deal with. Tasks performed can include importing JSON, processing it, and exporting JSON.
Import a JSON document and retrieve the mobile phone number from it.
The fn:parse-json() function parses a JSON document into an XDM value as follows:
A JSON object is converted into a map of type map(xs:string, item()?).
A JSON array is converted into a map of type map(xs:integer, item()?).
A JSON string is converted into an xs:string atomic value.
A JSON number is converted into an xs:double atomic value.
A JSON boolean is converted into an xs:boolean atomic value.
A JSON null is converted into the empty sequence.
The JSON document, mildred.json:
{ "firstname": "Mildred", "lastname": "Moore", "age": 32, "address": { "street": "91 High Street", "town": "Biscester", "county": "Oxfordshire", "postcode": "OX6 3PD" }, "phone": [ { "type": "home", "number": "01869 378073" }, { "type": "mobile", "number": "07356 740756" } ] }
let $phoneArray := parse-json(unparsed-text("mildred.json"))("phone") for $n in map:keys($phoneArray) let $entry := $phoneArray($n) where $entry("type") = "mobile" return $entry("number")
declare function map:entries($map as map(*)) as map(*)* { for $k in map:keys($map) return map { "key" := $k, "value" := $map($k) } }; parse-json(unparsed-text("mildred.json")) ("phone")!map:entries(.)[.("value")("type") = "mobile"]("value")("number")
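For comparison, the same lookup in Python uses the standard json module. Note that Python maps a JSON array to a list rather than to an integer-keyed map, and a JSON null to None rather than to an empty sequence, so this is an analogy to the conversion rules listed above rather than an implementation of them.

```python
import json

mildred = json.loads("""
{
  "firstname": "Mildred",
  "lastname": "Moore",
  "age": 32,
  "address": {
    "street": "91 High Street",
    "town": "Biscester",
    "county": "Oxfordshire",
    "postcode": "OX6 3PD"
  },
  "phone": [
    { "type": "home", "number": "01869 378073" },
    { "type": "mobile", "number": "07356 740756" }
  ]
}
""")

# Keep the number of every phone entry whose type is "mobile".
mobile_numbers = [
    entry["number"] for entry in mildred["phone"] if entry["type"] == "mobile"
]
print(mobile_numbers)
```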
Convert a JSON data file to XML.
The JSON document, employees.json:
{ "accounting" : [ { "firstName" : "John", "lastName" : "Doe", "age" : 23 }, { "firstName" : "Mary", "lastName" : "Smith", "age" : 32 } ], "sales" : [ { "firstName" : "Sally", "lastName" : "Green", "age" : 27 }, { "firstName" : "Jim", "lastName" : "Galley", "age" : 41 } ] }
<department name="accounting"> <employee> <firstName>John</firstName> <lastName>Doe</lastName> <age>23</age> </employee> <employee> <firstName>Mary</firstName> <lastName>Smith</lastName> <age>32</age> </employee> </department> <department name="sales"> <employee> <firstName>Sally</firstName> <lastName>Green</lastName> <age>27</age> </employee> <employee> <firstName>Jim</firstName> <lastName>Galley</lastName> <age>41</age> </employee> </department>
let $input := parse-json(unparsed-text('employees.json')) for $k in map:keys($input) return <department name="{$k}">{ let $array := $input($k) for $i in map:keys($array) let $emp := $array($i) return <employee> <firstName>{ $emp('firstName') }</firstName> <lastName>{ $emp('lastName') }</lastName> <age>{ $emp('age') }</age> </employee> }</department>
for $dept in pairs(json("employees.json")) return <department name="{ name($dept) }"> { for $employee in members(value($dept)) return <employee> <firstName>{ $employee('firstName') }</firstName> <lastName>{ $employee('lastName') }</lastName> <age>{ $employee('age') }</age> </employee> }</department>
<xsl:template name="main"> <xsl:variable name="input" as="map(xs:string, map(xs:string, xs:anyAtomicType)*)" select="parse-json(unparsed-text('employees.json'))"/> <xsl:for-each select="map:keys($input)"> <department name="{.}"> <xsl:for-each select="$input(.)"> <employee> <firstName><xsl:value-of select=".('firstName')"/></firstName> <lastName><xsl:value-of select=".('lastName')"/></lastName> <age><xsl:value-of select=".('age')"/></age> </employee> </xsl:for-each> </department> </xsl:for-each> </xsl:template>
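The structure of this conversion (one element per department, one per employee, one per property) can also be sketched in Python with the standard json and xml.etree.ElementTree modules. The function name departments_to_xml is illustrative.

```python
import json
import xml.etree.ElementTree as ET

employees = json.loads("""
{
  "accounting": [
    { "firstName": "John", "lastName": "Doe", "age": 23 },
    { "firstName": "Mary", "lastName": "Smith", "age": 32 }
  ],
  "sales": [
    { "firstName": "Sally", "lastName": "Green", "age": 27 },
    { "firstName": "Jim", "lastName": "Galley", "age": 41 }
  ]
}
""")


def departments_to_xml(data):
    """Build one <department> element per key of the JSON object."""
    departments = []
    for name, staff in data.items():
        department = ET.Element("department", {"name": name})
        for person in staff:
            employee = ET.SubElement(department, "employee")
            for field in ("firstName", "lastName", "age"):
                ET.SubElement(employee, field).text = str(person[field])
        departments.append(department)
    return departments


for department in departments_to_xml(employees):
    print(ET.tostring(department, encoding="unicode"))
```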
Update the first name of the author "Dan Suciu" to "John" in the "bookinfo.json" document.
The JSON document, bookinfo.json:
{ "book": { "title": "Data on the Web", "year": 2000, "author": [ { "last": "Abiteboul", "first": "Serge" }, { "last": "Buneman", "first": "Peter" }, { "last": "Suciu", "first": "Dan" } ], "publisher": "Morgan Kaufmann Publishers", "price": 39.95 } }
declare function local:map-transform($arg as item()*) { typeswitch($arg) case $map as map(*) return map:new(( for $k in map:keys($map) let $v := $map($k) return map { $k := local:map-transform($v) }, if($map('last')='Suciu') then map { 'first' := "John" } else () )) default return $arg }; local:map-transform(parse-json(unparsed-text("bookinfo.json")))
The following XSLT solution assumes a function map:entries() that returns the entries in a map as a sequence of singleton maps.
<xsl:template match="~map(*)" mode="john" as="map(*)"> <xsl:variable name="entries" as="map(*)*"> <xsl:apply-templates select="map:entries(.)" mode="john"/> </xsl:variable> <xsl:sequence select="map:new($entries)"/> </xsl:template> <xsl:template match="~map(*)[.('last')='Suciu']" mode="john"> <xsl:sequence select="map:new((., map{'first':='John'}))"/> </xsl:template>
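Because maps are immutable values, the update is expressed as a recursive copy that replaces one entry where the condition holds. A Python sketch of that copy-and-replace pattern, applied to the parsed bookinfo.json data, is shown here; the function name transform is illustrative.

```python
def transform(value):
    """Return a copy of value in which the author Suciu has first name John."""
    if isinstance(value, dict):
        updated = {key: transform(item) for key, item in value.items()}
        if updated.get("last") == "Suciu":
            updated["first"] = "John"
        return updated
    if isinstance(value, list):
        return [transform(item) for item in value]
    return value  # atomic values are returned unchanged


bookinfo = {
    "book": {
        "title": "Data on the Web",
        "year": 2000,
        "author": [
            {"last": "Abiteboul", "first": "Serge"},
            {"last": "Buneman", "first": "Peter"},
            {"last": "Suciu", "first": "Dan"},
        ],
        "publisher": "Morgan Kaufmann Publishers",
        "price": 39.95,
    }
}

updated = transform(bookinfo)
print(updated["book"]["author"][2])
```

The original structure is left untouched, which mirrors the functional style of the XQuery and XSLT solutions.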
The following queries are based on a social media site that allows users to interact with their friends. collection("users") contains data on users and their friends:
{ "name" : "Sarah", "age" : 13, "gender" : "female", "friends" : [ "Jim", "Mary", "Jennifer"] } { "name" : "Jim", "age" : 13, "gender" : "male", "friends" : [ "Sarah" ] }
The following query performs a join on Sarah's friend list to return the Object representing each of her friends:
for $sarah in collection("users"), $friend in collection("users") where $sarah("name") = "Sarah" and values($sarah("friends")) = $friend("name") return $friend
The query can be simplified using a filter. In the following expression, [.("name") eq "Sarah"] is a filter that restricts the set of users to the one named "Sarah":
let $sarah := collection("users")[.("name") eq "Sarah"] for $friend in values($sarah("friends")) return collection("users")[.("name") eq $friend]
Solution using the XSLT maps proposal: essentially the same as the above, assuming (a) the existence of some mechanism similar to collection() to get a collection of JSON inputs and parse them using the parse-json() function, and (b) the existence of a (potentially user-written) function values() to extract the values of the map representing a JSON array. This function might be written:
<xsl:function name="values" as="item()*"> <xsl:param name="array" as="map(xs:integer, item())"/> <xsl:for-each select="map:keys($array)"> <xsl:sequence select="$array(.)"/> </xsl:for-each> </xsl:function>
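The friend lookup is a join of a user's friend list against the collection of users. A Python sketch of that join over the two sample users follows; indexing the users by name first is a choice of this sketch, and friends who have no record in the collection are simply skipped.

```python
users = [
    {"name": "Sarah", "age": 13, "gender": "female",
     "friends": ["Jim", "Mary", "Jennifer"]},
    {"name": "Jim", "age": 13, "gender": "male", "friends": ["Sarah"]},
]

# Index users by name so each friend can be looked up directly.
by_name = {user["name"]: user for user in users}

sarah = by_name["Sarah"]
friends = [by_name[name] for name in sarah["friends"] if name in by_name]
print(friends)
```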
Note: These queries are based on similar queries in the XQuery 3.0 Use Cases.
The input is a sequence (whose order is of no concern) that contains the following sales data, represented here in JSON notation:
{ "product" : "broiler", "store number" : 1, "quantity" : 20 }, { "product" : "toaster", "store number" : 2, "quantity" : 100 }, { "product" : "toaster", "store number" : 2, "quantity" : 50 }, { "product" : "toaster", "store number" : 3, "quantity" : 50 }, { "product" : "blender", "store number" : 3, "quantity" : 100 }, { "product" : "blender", "store number" : 3, "quantity" : 150 }, { "product" : "socks", "store number" : 1, "quantity" : 500 }, { "product" : "socks", "store number" : 2, "quantity" : 10 }, { "product" : "shirt", "store number" : 3, "quantity" : 10 }
We want to group sales by product, across stores.
We assume a function collection("sales") that returns a sequence of items representing the rows in this table.
Query:
{ for $sales in collection("sales") let $pname := $sales("product") group by $pname return $pname : sum(for $s in $sales return $s("quantity")) }
Solution using the XSLT maps proposal: assuming that collection("sales") delivers a sequence of unparsed JSON texts, and that the result is to be serialized as a JSON text:
<xsl:variable name="entries" as="map(xs:string, xs:integer)"> <xsl:for-each-group select="collection('sales')!parse-json(.)" group-by=".('product')"> <xsl:sequence select="map{ current-grouping-key() := sum(current-group()('quantity')) }"/> </xsl:for-each-group> </xsl:variable> <xsl:sequence select="serialize-json($entries)"/>
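The grouping amounts to summing quantities per product name. A short Python sketch over the sample sales data makes the expected totals easy to verify.

```python
from collections import defaultdict

sales = [
    {"product": "broiler", "store number": 1, "quantity": 20},
    {"product": "toaster", "store number": 2, "quantity": 100},
    {"product": "toaster", "store number": 2, "quantity": 50},
    {"product": "toaster", "store number": 3, "quantity": 50},
    {"product": "blender", "store number": 3, "quantity": 100},
    {"product": "blender", "store number": 3, "quantity": 150},
    {"product": "socks", "store number": 1, "quantity": 500},
    {"product": "socks", "store number": 2, "quantity": 10},
    {"product": "shirt", "store number": 3, "quantity": 10},
]

# Sum the quantity sold for each product, across all stores.
totals = defaultdict(int)
for sale in sales:
    totals[sale["product"]] += sale["quantity"]

print(dict(totals))
```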
Now let's do a more complex grouping query, showing sales by category within each state. We need further data to describe the categories of products and the location of stores.
collection("products") contains the following data:
{ "name" : "broiler", "category" : "kitchen", "price" : 100, "cost" : 70 }, { "name" : "toaster", "category" : "kitchen", "price" : 30, "cost" : 10 }, { "name" : "blender", "category" : "kitchen", "price" : 50, "cost" : 25 }, { "name" : "socks", "category" : "clothes", "price" : 5, "cost" : 2 }, { "name" : "shirt", "category" : "clothes", "price" : 10, "cost" : 3 }
collection("stores") contains the following data:
{ "store number" : 1, "state" : "CA" }, { "store number" : 2, "state" : "CA" }, { "store number" : 3, "state" : "MA" }, { "store number" : 4, "state" : "MA" }
[ { "CA" : [ {"kitchen" : { "broiler" : 20, "toaster" : 150 }}, {"clothes" : { "socks" : 510 }} ] }, { "MA" : [ { "kitchen" : { "blender" : 250, "toaster" : 50 }}, { "clothes" : { "shirt" : 10 }} ] } ]
The following query groups by state, then by category, then lists individual products and the sales associated with each.
Query:
{ for $store in collection("stores") let $state := $store("state") group by $state return $state : { for $product in collection("products") let $category := $product("category") group by $category return $category : { for $sales in collection("sales") where $sales("store number") = $store("store number") and $sales("product") = $product("name") let $pname := $sales("product") group by $pname return $pname : sum( for $s in $sales return $s("quantity") ) } } }
An equivalent XSLT solution is given below. This uses the syntax of the proposed maps facility in XSLT.
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:map="http://www.w3.org/2005/xpath-functions/map" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="map xs"> <xsl:output method="text"/> <xsl:variable name="sales" as="map(*)*" select=' map{ "product" := "broiler", "store number" := 1, "quantity" := 20 }, map{ "product" := "toaster", "store number" := 2, "quantity" := 100 }, map{ "product" := "toaster", "store number" := 2, "quantity" := 50 }, map{ "product" := "toaster", "store number" := 3, "quantity" := 50 }, map{ "product" := "blender", "store number" := 3, "quantity" := 100 }, map{ "product" := "blender", "store number" := 3, "quantity" := 150 }, map{ "product" := "socks", "store number" := 1, "quantity" := 500 }, map{ "product" := "socks", "store number" := 2, "quantity" := 10 }, map{ "product" := "shirt", "store number" := 3, "quantity" := 10 }'/> <xsl:variable name="products" as="map(*)*" select=' map{ "name" := "broiler", "category" := "kitchen", "price" := 100, "cost" := 70 }, map{ "name" := "toaster", "category" := "kitchen", "price" := 30, "cost" := 10 }, map{ "name" := "blender", "category" := "kitchen", "price" := 50, "cost" := 25 }, map{ "name" := "socks", "category" := "clothes", "price" := 5, "cost" := 2 }, map{ "name" := "shirt", "category" := "clothes", "price" := 10, "cost" := 3 }'/> <xsl:variable name="stores" as="map(*)*" select=' map{ "store number" := 1, "state" := "CA" }, map{ "store number" := 2, "state" := "CA" }, map{ "store number" := 3, "state" := "MA" }, map{ "store number" := 4, "state" := "MA" }'/> <xsl:template name="main"> <xsl:variable name="state-maps" as="map(*)*"> <xsl:for-each-group select="$stores" group-by=".('state')"> <xsl:variable name="state" select="current-grouping-key()" as="xs:string"/> <xsl:variable name="stores-in-state" select="current-group()!.('store number')" as="xs:integer*"/> <xsl:variable name="state-map-entry" as="map(*)*"> <xsl:for-each-group 
select="$products" group-by=".('category')"> <xsl:variable name="category" select="current-grouping-key()" as="xs:string"/> <xsl:variable name="products-in-category" select="current-group()" as="map(*)*"/> <xsl:variable name="totals-map" as="map(*)*"> <xsl:variable name="totals-map-entries" as="map(*)*"> <xsl:for-each select="$products-in-category"> <xsl:variable name="product-name" select=".('name')"/> <xsl:variable name="product-sales" select="$sales[.('product') = $product-name and .('store number') = $stores-in-state]"/> <xsl:if test="exists($product-sales)"> <xsl:sequence select="map{ $product-name := sum($product-sales!.('quantity')) }"/> </xsl:if> </xsl:for-each> </xsl:variable> <xsl:sequence select="map:new($totals-map-entries)"/> </xsl:variable> <xsl:sequence select="map{ $category := $totals-map }"/> </xsl:for-each-group> </xsl:variable> <xsl:sequence select=" map { $state := $state-map-entry }"/> </xsl:for-each-group> </xsl:variable> <xsl:value-of select="serialize-json($state-maps, map{ 'indent' := true()} )"/> </xsl:template> </xsl:stylesheet>
Note that this example appears to suffer badly from the lack of composability between the XPath map{} construct and the XSLT xsl:for-each-group instruction. For such use cases, an XSLT instruction to construct maps could be a better approach.
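For comparison, the same nested grouping (state, then category, then product) can be sketched in Python with nested dictionaries. The result is a single nested map rather than the array-of-maps layout shown earlier, which is a simplification made for this sketch.

```python
sales = [
    {"product": "broiler", "store number": 1, "quantity": 20},
    {"product": "toaster", "store number": 2, "quantity": 100},
    {"product": "toaster", "store number": 2, "quantity": 50},
    {"product": "toaster", "store number": 3, "quantity": 50},
    {"product": "blender", "store number": 3, "quantity": 100},
    {"product": "blender", "store number": 3, "quantity": 150},
    {"product": "socks", "store number": 1, "quantity": 500},
    {"product": "socks", "store number": 2, "quantity": 10},
    {"product": "shirt", "store number": 3, "quantity": 10},
]
products = [
    {"name": "broiler", "category": "kitchen"},
    {"name": "toaster", "category": "kitchen"},
    {"name": "blender", "category": "kitchen"},
    {"name": "socks", "category": "clothes"},
    {"name": "shirt", "category": "clothes"},
]
stores = [
    {"store number": 1, "state": "CA"},
    {"store number": 2, "state": "CA"},
    {"store number": 3, "state": "MA"},
    {"store number": 4, "state": "MA"},
]

category_of = {p["name"]: p["category"] for p in products}
state_of = {s["store number"]: s["state"] for s in stores}

# state -> category -> product -> total quantity
summary = {}
for sale in sales:
    state = state_of[sale["store number"]]
    category = category_of[sale["product"]]
    by_product = summary.setdefault(state, {}).setdefault(category, {})
    by_product[sale["product"]] = by_product.get(sale["product"], 0) + sale["quantity"]

print(summary)
```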
The following query takes satellite data, and summarizes which satellites are visible. The data for the query is a simplified version of a Stellarium file that contains this information.
{ "creator" : "Satellites plugin version 0.6.4", "satellites" : { "AAU CUBESAT" : { "tle1" : "1 27846U 03031G 10322.04074654 .00000056 00000-0 45693-4 0 8768", "visible" : false }, "AJISAI (EGS)" : { "tle1" : "1 16908U 86061A 10321.84797408 -.00000083 00000-0 10000-3 0 3696", "visible" : true }, "AKARI (ASTRO-F)" : { "tle1" : "1 28939U 06005A 10321.96319841 .00000176 00000-0 48808-4 0 4294", "visible" : true } } }
We want to query this data to return a summary that looks like this.
{ "visible" : [ "AJISAI (EGS)", "AKARI (ASTRO-F)" ], "invisible" : [ "AAU CUBESAT" ] }
The following is a JSONiq query that returns the desired result.
Query:
let $sats := json("satellites.json")("satellites") return { "visible" : [ for $sat in pairs($sats) where $sat("visible") return name($sat) ], "invisible" : [ for $sat in pairs($sats) where not($sat("visible")) return name($sat) ] }
Equivalent using the XSLT maps proposal:
<xsl:variable name="sats" select="parse-json(unparsed-text('satellites.json'))('satellites')"/> <xsl:sequence select="map{ 'visible' := array(map:keys($sats)[$sats(.)('visible')]), 'invisible' := array(map:keys($sats)[not($sats(.)('visible'))])}"/>
This assumes the existence of a (potentially user-written) function array() that takes a sequence and turns it into a map with consecutive integer keys:
<xsl:function name="array" as="map(xs:integer, item())"> <xsl:param name="seq" as="item()*"/> <xsl:sequence select="map:new(for $i in 1 to count($seq) return map{$i := $seq[$i]})"/> </xsl:function>
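The summary is a partition of the satellite names by the value of their "visible" property. A Python sketch over the sample data, with the tle1 strings omitted for brevity, is given here.

```python
satellites = {
    "AAU CUBESAT": {"visible": False},
    "AJISAI (EGS)": {"visible": True},
    "AKARI (ASTRO-F)": {"visible": True},
}

summary = {
    "visible": [name for name, sat in satellites.items() if sat["visible"]],
    "invisible": [name for name, sat in satellites.items() if not sat["visible"]],
}
print(summary)
```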
JSON programmers frequently need to convert XML to JSON. The following query is based on a Wikipedia XML export format, using data from the category "Origami". Here is an excerpt of this data:
<mediawiki> <siteinfo> <sitename>Wikipedia</sitename> <page> <title>Kawasaki's theorem</title> <id>14511776</id> <revision> <id>435519187</id> <timestamp>2011-06-21T20:08:56Z</timestamp> <contributor> <username>Some jerk on the Internet</username> <id>6636894</id> </contributor> !!! SNIP !!! <page> <title>Origami techniques</title> <id>193590</id> <revision> <id>447687387</id> <timestamp>2011-08-31T17:21:49Z</timestamp> <contributor> <username>Dmcq</username> <id>3784322</id> </contributor> !!! SNIP !!! <page> <title>Mathematics of paper folding</title> <id>232840</id> <revision> <id>440970828</id> <timestamp>2011-07-23T09:10:42Z</timestamp> <contributor> <username>Tabletop</username> <id>173687</id> </contributor>
[ { "title" : "Kawasaki's theorem", "id" : "14511776", "timestamp" : "2011-06-21T20:08:56Z", "authors" : ["Some jerk on the Internet" ] }, { "title" : "Origami techniques", "id" : "193590", "timestamp" : "2011-08-31T17:21:49Z", "authors" : ["Dmcq" ] }, { "title" : "Mathematics of paper folding", "id" : "232840", "timestamp" : "2011-07-23T09:10:42Z", "authors" : ["Tabletop" ] } ]
The following query converts this data to JSON:
Query:
[ for $page in doc("Wikipedia-Origami.xml")//page return { "title": string($page/title), "id" : string($page/id), "timestamp" : string($page/revision[1]/timestamp), "authors" : [ for $a in $page/revision/contributor/username return string($a) ] } ]
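The same extraction can be sketched in Python with xml.etree.ElementTree and json. The input below is a small well-formed sample built from the first page of the excerpt; the closing tags are filled in as an assumption, since the excerpt is snipped.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: a minimal well-formed document modeled on the excerpt.
xml_text = """
<mediawiki>
  <page>
    <title>Kawasaki's theorem</title>
    <id>14511776</id>
    <revision>
      <id>435519187</id>
      <timestamp>2011-06-21T20:08:56Z</timestamp>
      <contributor>
        <username>Some jerk on the Internet</username>
        <id>6636894</id>
      </contributor>
    </revision>
  </page>
</mediawiki>
"""

root = ET.fromstring(xml_text)
pages = [
    {
        "title": page.findtext("title"),
        "id": page.findtext("id"),
        # findtext returns the text of the first matching element.
        "timestamp": page.findtext("revision/timestamp"),
        "authors": [u.text for u in page.findall("revision/contributor/username")],
    }
    for page in root.iter("page")
]
print(json.dumps(pages, indent=2))
```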
Suppose a JavaScript implementation provides an interface for JSONiq queries, and a JavaScript program contains the following data [1]:
var data = { "color" : "blue", "closed" : true, "points" : [[10,10], [20,10], [20,20], [10,20]] };
This data can be converted to SVG by placing the text of a query in a JavaScript variable and calling the appropriate JavaScript function to invoke the query:
var query = "declare variable stroke := attribute stroke { color }; declare variable points := attribute points { points }; if (closed) then <svg><polygon>{ $stroke, $points }</polygon></svg> else <svg><polyline>{ $stroke, $points }</polyline></svg>"
This query can be invoked with a JavaScript API call:
jsoniq(data, query)
Here is the result of the above query:
<svg><polygon stroke="blue" points="10 10 20 10 20 20 10 20" /></svg>
The data in a JSON array is frequently displayed using HTML tables. The following query shows how to transform from the former to the latter.
The following Object contains the labels desired for columns and rows, as well as the data for the table.
{ "col labels" : ["singular", "plural"], "row labels" : ["1p", "2p", "3p"], "data" : [ ["spinne", "spinnen"], ["spinnst", "spinnt"], ["spinnt", "spinnen"] ] }
The following query creates an HTML table, using the column headings and row labels as well as the data in the Object shown above.
<html> <body> <table> <tr> (: Column headings :) { <th> </th>, for $th in values(json("table.json")("col labels")) return <th>{ $th }</th> } </tr> { (: Data for each row :) for $r at $i in values(json("table.json")("data")) return <tr> { <th>{ values(json("table.json")("row labels"))[$i] }</th>, for $c in values($r) return <td>{ $c }</td> } </tr> } </table> </body> </html>
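The pairing of row labels with data rows is the only non-obvious step in this transformation. A Python sketch that builds the same table as a string is shown here; the function name to_html_table is illustrative, and html.escape is used so that labels and cells are safe to embed.

```python
from html import escape

table = {
    "col labels": ["singular", "plural"],
    "row labels": ["1p", "2p", "3p"],
    "data": [
        ["spinne", "spinnen"],
        ["spinnst", "spinnt"],
        ["spinnt", "spinnen"],
    ],
}


def to_html_table(table):
    """Render the labels and data as an HTML table string."""
    header_cells = "".join(f"<th>{escape(c)}</th>" for c in table["col labels"])
    rows = [f"<tr><th></th>{header_cells}</tr>"]
    # zip pairs the i-th row label with the i-th data row.
    for label, row in zip(table["row labels"], table["data"]):
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        rows.append(f"<tr><th>{escape(label)}</th>{cells}</tr>")
    return "<table>" + "".join(rows) + "</table>"


print(to_html_table(table))
```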
XQuery provides support for both sliding windows and tumbling windows, frequently used to analyze event streams or other sequential data. This simple windowing example converts a sequence of items to a table with three columns (using as many rows as necessary), and assigns a row number to each row.
[ { "color" : "Green" }, { "color" : "Pink" }, { "color" : "Lilac" }, { "color" : "Turquoise" }, { "color" : "Peach" }, { "color" : "Opal" }, { "color" : "Champagne" } ]
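The grouping that a tumbling window performs on this data can be sketched in Python. This is an illustration of the technique only, not the query from this use case: the function name `tumbling_rows` and the `width` parameter are hypothetical. Each window holds up to three consecutive items, windows never overlap, and the last one may be shorter.

```python
colors = [{"color": c} for c in
          ["Green", "Pink", "Lilac", "Turquoise", "Peach", "Opal", "Champagne"]]

def tumbling_rows(items, width=3):
    """Split items into consecutive, non-overlapping windows of `width`."""
    rows = []
    for start in range(0, len(items), width):
        window = items[start:start + width]
        # Row numbers start at 1, matching XQuery's 1-based positions.
        rows.append((start // width + 1, [item["color"] for item in window]))
    return rows

for number, cells in tumbling_rows(colors):
    print(number, cells)
```

With seven items and three columns, the sketch yields two full rows and a third row that holds the single remaining item.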
This example assumes a middleware system that presents relational tables as JSON arrays. The following two tables are used as sample data.
userid | firstname | lastname |
W0342 | Walter | Denisovich |
M0535 | Mick | Goulish |
The JSON representation this particular implementation provides for the above table looks like this:
[ { "userid" : "W0342", "firstname" : "Walter", "lastname" : "Denisovich" }, { "userid" : "M0535", "firstname" : "Mick", "lastname" : "Goulish" } ]
userid | ticker | shares |
W0342 | DIS | 153212312 |
M0535 | DIS | 10 |
M0535 | AIG | 23412 |
The JSON representation this particular implementation provides for the above table looks like this:
[ { "userid" : "W0342", "ticker" : "DIS", "shares" : 153212312 }, { "userid" : "M0535", "ticker" : "DIS", "shares" : 10 }, { "userid" : "M0535", "ticker" : "AIG", "shares" : 23412 } ]
The following query uses the fictitious vendor's vendor:table() function to retrieve the values from a table, and creates an Object for each user, with a list of the user's holdings in the value of that Object.
[ for $u in vendor:table("Users") order by $u("userid") return { "userid" : $u("userid"), "first" : $u("firstname"), "last" : $u("lastname"), "holdings" : [ for $h in vendor:table("Holdings") where $h("userid") = $u("userid") order by $h("ticker") return { "ticker" : $h("ticker"), "share" : $h("shares") } ] } ]
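The join can be sketched in Python over the two JSON arrays shown earlier. The function name `users_with_holdings` is hypothetical, and the lists `users` and `holdings` stand in for what the fictitious vendor:table() function would return. The ticker and share count of each holding are read from the holding record, and holdings are matched to a user by `userid`.

```python
users = [
    {"userid": "W0342", "firstname": "Walter", "lastname": "Denisovich"},
    {"userid": "M0535", "firstname": "Mick", "lastname": "Goulish"},
]
holdings = [
    {"userid": "W0342", "ticker": "DIS", "shares": 153212312},
    {"userid": "M0535", "ticker": "DIS", "shares": 10},
    {"userid": "M0535", "ticker": "AIG", "shares": 23412},
]

def users_with_holdings(users, holdings):
    """Return one Object per user, ordered by userid, with nested holdings."""
    result = []
    for u in sorted(users, key=lambda u: u["userid"]):
        # Keep only this user's holdings, ordered by ticker.
        own = sorted((h for h in holdings if h["userid"] == u["userid"]),
                     key=lambda h: h["ticker"])
        result.append({
            "userid": u["userid"],
            "first": u["firstname"],
            "last": u["lastname"],
            "holdings": [{"ticker": h["ticker"], "share": h["shares"]}
                         for h in own],
        })
    return result

joined = users_with_holdings(users, holdings)
print(joined)
```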
The XQuery Update Facility allows XML data to be updated. JSONiq provides updating functions to allow JSON to be updated.
Suppose an application receives an order that contains a credit card number, and needs to put the user on probation.
Data for an order:
{ "user" : "Deadbeat Jim", "credit card" : "VISA 4111 1111 1111 1111", "product" : "lottery tickets", "quantity" : 243 }
collection("users") contains the data for each individual user:
{ "name" : "Deadbeat Jim", "address" : "1 E 161st St, Bronx, NY 10451", "risk tolerance" : "high" }
The following query adds "status" : "credit card declined" to the user's record.
let $dbj := collection("users")[ .("name") = "Deadbeat Jim" ] return json:insert-into($dbj, "status" : "credit card declined")
After the update is finished, the user's record looks like this:
{ "name" : "Deadbeat Jim", "address" : "1 E 161st St, Bronx, NY 10451", "status" : "credit card declined", "risk tolerance" : "high" }
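The effect of the update can be sketched in Python on an in-memory list that stands in for collection("users"). The helper name `insert_into` is hypothetical, and treating an already-present name as an error is an assumption of this sketch rather than a statement about json:insert-into.

```python
users = [
    {"name": "Deadbeat Jim",
     "address": "1 E 161st St, Bronx, NY 10451",
     "risk tolerance": "high"},
]

def insert_into(target, pairs):
    """Add each name/value pair to target, refusing to overwrite a key."""
    for key in pairs:
        if key in target:
            # Assumption of this sketch: a duplicate name is an error.
            raise KeyError("duplicate key: " + key)
    target.update(pairs)

# Select the matching record, then update it in place.
for user in users:
    if user["name"] == "Deadbeat Jim":
        insert_into(user, {"status": "credit card declined"})

print(users[0])
```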
Many applications need to modify data before forwarding it to another source. The XQuery Update Facility provides an expression called a transform expression that can be used to create modified copies. The transform expression uses updating expressions to perform a transformation. JSONiq defines updating functions for JSON, which can be used in the XQuery transform expression.
Suppose an application makes videos available using feeds from YouTube. The following data comes from one such feed:
{ "encoding" : "UTF-8", "feed" : { "author" : [ { "name" : { "$t" : "YouTube" }, "uri" : { "$t" : "http://www.youtube.com/" } } ], "category" : [ { "scheme" : "http://schemas.google.com/g/2005#kind", "term" : "http://gdata.youtube.com/schemas/2007#video" } ], "entry" : [ { "app$control" : { "yt$state" : { "$t" : "Syndication of this video was restricted by its owner.", "name" : "restricted", "reasonCode" : "limitedSyndication" } }, "author" : [ { "name" : { "$t" : "beyonceVEVO" }, "uri" : { "$t" : "http://gdata.youtube.com/feeds/api/users/beyoncevevo" } } ] !!! SNIP !!!
The following query creates a modified copy of the feed by removing all entries that restrict syndication.
let $feed := json("incoming.json") return copy $out := $feed modify for $entry in $out("feed")("entry") where $entry("app$control")("yt$state")("name") = "restricted" return json:delete($entry) return $out
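The copy, modify, return pattern can be sketched in Python. The function name `drop_restricted` is hypothetical, and `incoming` is an invented two-entry feed trimmed to the fields the filter reads (the `title` values are placeholders, not data from the feed above). The sketch works on a deep copy so the input stays untouched, and it keeps entries that have no `app$control` member at all.

```python
import copy

def drop_restricted(feed):
    """Return a modified copy of feed without syndication-restricted entries."""
    out = copy.deepcopy(feed)  # the "copy" step: the input is not modified
    out["feed"]["entry"] = [
        entry for entry in out["feed"]["entry"]
        # Entries lacking app$control or yt$state are treated as unrestricted.
        if entry.get("app$control", {}).get("yt$state", {}).get("name")
        != "restricted"
    ]
    return out

# Invented sample feed, reduced to the members the filter inspects.
incoming = {"feed": {"entry": [
    {"app$control": {"yt$state": {"name": "restricted"}}, "title": "first"},
    {"title": "second"},
]}}

outgoing = drop_restricted(incoming)
print(len(incoming["feed"]["entry"]), len(outgoing["feed"]["entry"]))
```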
This example is based on an example on Stefan Goessner's JSONT site (http://goessner.net/articles/jsont/).