This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
The specification contains no formal definition of the Unicode codepoint collation, http://www.w3.org/2005/xpath-functions/collation/codepoint. A suitable definition might be:

    (: User-defined functions in a main module need a namespace; local: is predeclared. :)
    declare function local:compare-seq($x as xs:integer*, $y as xs:integer*)
        as xs:integer
    {
      if (empty($x) or empty($y)) then
        (: at least one sequence is exhausted :)
        if (empty($x) and empty($y)) then 0
        else if (empty($x)) then -1
        else +1
      else if ($x[1] eq $y[1]) then
        (: first code points match: compare the remainders :)
        local:compare-seq(remove($x, 1), remove($y, 1))
      else if ($x[1] lt $y[1]) then -1
      else +1
    };

Then compare($X as xs:string, $Y as xs:string) under the Unicode codepoint collation is defined to have the result local:compare-seq(string-to-codepoints($X), string-to-codepoints($Y)).

Problem raised by Patrick Durusau (patrick at durusau dot net) on public-qt-comments, 2 Sept 2009.
The following proposal was accepted by the WG on 2009-09-29.

ACTION A-411-02: MK will produce a textual proposal for resolving Bugzilla #7630 (definition of the Unicode codepoint collation).

For the 1.0/2.0 specification: add a new paragraph after the current fourth paragraph of F+O section 7.3.1:

The Unicode codepoint collation does not perform any normalization on the supplied strings. It is defined as follows. Each of the two strings is converted to a sequence of integers using the fn:string-to-codepoints function. These two sequences $A and $B are then compared as follows:

* If both sequences are empty, the strings are equal.
* If one sequence is empty and the other is not, then the string corresponding to the empty sequence is less than the other string.
* If the first integer in $A is less than the first integer in $B, then the string corresponding to $A is less than the string corresponding to $B.
* If the first integer in $A is greater than the first integer in $B, then the string corresponding to $A is greater than the string corresponding to $B.
* Otherwise (the first pair of integers are equal), the result is obtained by applying the same rules recursively to fn:subsequence($A, 2) and fn:subsequence($B, 2).

For the 1.1/2.1 specification: use the same rules, but create a new section containing the definition of the Unicode codepoint collation and refer to this section from the appropriate places. Also make "Unicode codepoint collation" a defined term, and hyperlink all references to it.
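The rules in the accepted proposal can be sketched in Python as below. This is an illustrative translation under stated assumptions and is not part of any specification. `codepoint_compare` is a hypothetical name. Python's `ord` stands in for fn:string-to-codepoints. A Python `str` is a sequence of code points, so a supplementary character such as U+1F600 yields a single integer, as it does in XPath. The loop walks both sequences in step rather than recursing on fn:subsequence($A, 2). This gives the same result without deep recursion on long strings.

```python
def codepoint_compare(x: str, y: str) -> int:
    """Compare two strings by the codepoint rules; return -1, 0, or 1.

    Hypothetical helper for illustration only. No normalization is applied.
    """
    # Analogue of fn:string-to-codepoints: one integer per Unicode code point.
    a = [ord(ch) for ch in x]
    b = [ord(ch) for ch in y]
    i = 0
    while True:
        a_done = i == len(a)
        b_done = i == len(b)
        if a_done and b_done:
            return 0    # both sequences exhausted: the strings are equal
        if a_done:
            return -1   # $A ran out first: its string is less
        if b_done:
            return 1    # $B ran out first: its string is less
        if a[i] < b[i]:
            return -1
        if a[i] > b[i]:
            return 1
        i += 1          # first pair equal: apply the same rules to the rest


if __name__ == "__main__":
    print(codepoint_compare("abc", "abd"))  # -1
    print(codepoint_compare("ab", "abc"))   # -1: a proper prefix sorts first
    # No normalization: "e" + U+0301 COMBINING ACUTE ACCENT vs precomposed U+00E9.
    # The strings are canonically equivalent but compare unequal (101 < 233).
    print(codepoint_compare("e\u0301", "\u00e9"))  # -1
    # Code points, not UTF-16 code units: U+FFFD < U+1F600 here, although the
    # UTF-16 encoding of U+1F600 starts with the surrogate 0xD83D < 0xFFFD.
    print(codepoint_compare("\ufffd", "\U0001F600"))  # -1
```

Python already orders `str` values by code point, so `(x > y) - (x < y)` returns the same -1/0/1 result. The explicit loop is only there to mirror the rules step by step. Callers who want canonically equivalent strings to compare equal would have to normalize both strings first, for example with `unicodedata.normalize("NFC", s)`, because the collation itself does not.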
I note that the agreed change has been made to the 3.0 draft, but the change for the 1.0/2.0 specification does not appear in the published second edition. I have therefore added a reference to this bug to the list of candidate errata (in the xsl-query-specs CVS area), and am herewith closing the bug.