This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The fn:distinct-values function (Section 15.1.6) eliminates duplicates from an atomized sequence by comparing values with the "eq" operator. However, it says "Values that cannot be compared, i.e. the eq operator is not defined for their types, are considered to be distinct."

This is problematic for the following reasons:

(1) If incomparable values were actually compared by the "eq" operator, an error would result (for example, 7 eq "7" raises error XPTY0004).

(2) An "order by" clause also raises an error (XPTY0004) if it encounters incomparable sort keys.

(3) The aggregation functions fn:avg, fn:min, fn:max, and fn:sum also raise an error (FORG0006) if they encounter incomparable values.

(4) Implementations of fn:distinct-values based on sorting or hashing are not possible under the current definition, because those techniques do not accept heterogeneous input sequences.

In summary, the current specification of fn:distinct-values is inconsistent with the rest of the language and difficult to implement efficiently. The definition of fn:distinct-values should be made consistent with other functions and operators by raising an error if incomparable values are encountered. This will allow "order by" and fn:distinct-values to share a common efficient implementation.

Proposal: In the definition of fn:distinct-values, replace the second sentence with the following: "If the input sequence contains any two values for which the eq operator is not defined, a type error is raised [err:FORG0006]."

Also add an example: fn:distinct-values((1, 2.3, "Hello")) raises err:FORG0006.
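To illustrate the behaviour the proposal asks for, here is a minimal sketch in Python of a sort-based duplicate eliminator that fails on heterogeneous input. It is not taken from any XQuery implementation. The function name distinct_values_strict is hypothetical, Python's built-in ordering stands in for the XPath comparison operators (the two do not agree in every case, for example on booleans versus numbers), and the FORG0006 text in the message is only a label.

```python
def distinct_values_strict(items):
    """Sort-based duplicate elimination that rejects incomparable values.

    Assumption: Python's own comparison rules stand in for XPath "eq"/"lt".
    """
    try:
        # Sorting mixed strings and numbers raises TypeError in Python 3,
        # much as "order by" raises XPTY0004 on incomparable sort keys.
        ordered = sorted(items)
    except TypeError as exc:
        raise TypeError("FORG0006: incomparable values in input") from exc

    result = []
    for item in ordered:
        # After sorting, equal values are adjacent, so one look-back suffices.
        if not result or result[-1] != item:
            result.append(item)
    return result
```

Calling distinct_values_strict([1, 2.3, "Hello"]) raises the error, which mirrors the example in the proposal. The result comes back in sorted order rather than input order.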
I actually attempted to implement the previous specification of distinct-values, when non-comparable values were considered an error, and I found it very difficult to achieve; I found the current specification much easier to implement (my implementation is based on hashing using a simple hash function based on both the value and the type label). So let's base the argument on what's right for users, not on implementation factors, which are likely to vary from one implementor to another.

From a usability point of view, XML Schema supports union types, and the typed value of a collection of nodes can therefore contain a mixture of different atomic types. It seems to me a most unfriendly and unnecessary restriction to tell users that they can't invoke distinct-values() on a collection whose schema definition is a union type.

Note also that although sorting in XQuery disallows mixed types, sorting and grouping in XSLT do not, so the consistency argument works both ways.

Michael Kay
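The hashing approach mentioned in this comment can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the actual implementation referred to above. The names comparison_class and distinct_values are hypothetical, Python types stand in for XML Schema atomic types, and all numeric types are given one shared label so that values such as 1 and 1.0 still compare equal. NaN handling is omitted.

```python
from decimal import Decimal


def comparison_class(value):
    """Return a coarse type label (hypothetical) used as part of the hash key."""
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float, Decimal)):
        return "numeric"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


def distinct_values(items):
    """Drop duplicates; values with different type labels are always distinct."""
    seen = set()
    result = []
    for item in items:
        # The key combines the type label and the value, so 1 and "1"
        # (or 1 and True) never collide, and no cross-type comparison occurs.
        key = (comparison_class(item), item)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result
```

With this sketch, distinct_values([1, "1", 1.0, True, "1", 2.3]) keeps 1, "1", True and 2.3: the numeric duplicate 1.0 and the repeated string are dropped, while values of different kinds are treated as distinct rather than raising an error.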
This was discussed during the joint WG meeting on 5/17/2005 and there was no consensus to make this change. Ashok Malhotra