This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Test case K2-StringLT-1 contains the comparison of two large codepoints. I generate the following XQueryX for this test case:
<?xml version="1.0"?>
<xqx:module xmlns:xqx="http://www.w3.org/2005/XQueryX"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.w3.org/2005/XQueryX http://www.w3.org/2005/XQueryX/xqueryx.xsd">
  <xqx:mainModule>
    <xqx:queryBody>
      <xqx:ltOp>
        <xqx:firstOperand>
          <xqx:stringConstantExpr>
            <xqx:value></xqx:value>
          </xqx:stringConstantExpr>
        </xqx:firstOperand>
        <xqx:secondOperand>
          <xqx:stringConstantExpr>
            <xqx:value>�</xqx:value>
          </xqx:stringConstantExpr>
        </xqx:secondOperand>
      </xqx:ltOp>
    </xqx:queryBody>
  </xqx:mainModule>
</xqx:module>
When I attempt to validate this XQueryX, I see this error:
Character reference "�" is an invalid XML character.
I'm weak on the details of Unicode. I believe that character � is �. I see the following in http://www.unicode.org/Public/UNIDATA/UnicodeData.txt:
D800;<Non Private Use High Surrogate, First>;Cs;0;L;;;;;N;;;;;
DB7F;<Non Private Use High Surrogate, Last>;Cs;0;L;;;;;N;;;;;
Perhaps you could change � to some other character. I've experimented a bit, and 휀 validates just fine.
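For readers who want to check the validation error themselves, here is a minimal Python sketch of the XML 1.0 Char production. It is only an illustration, not part of the test suite or of any validator; the function name `is_xml10_char` is made up for this example. It shows that the surrogate code points listed in UnicodeData.txt fall outside the allowed ranges, while the replacement character tried above falls inside them.

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production.

    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    The gap between #xD7FF and #xE000 is exactly the surrogate block.
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# First and last "Non Private Use High Surrogate" entries quoted in the report.
print(is_xml10_char(0xD800))
print(is_xml10_char(0xDB7F))

# The character that validated fine in the report.
print(is_xml10_char(ord("휀")))
```

A character reference to any code point for which this check fails is rejected by a conforming XML 1.0 parser, which matches the error seen above.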
I think the translation of the query into XQueryX was done incorrectly. From looking at the file at the octet level, the first operand is the octet sequence ee a9 a0, the second is f0 91 85 b0. These are the UTF-8 representations of the characters with codepoints (decimal) 60000 and 70000 respectively. Codepoint 70000 will be represented in UTF-16 as a surrogate pair, and it looks as if your translation has taken the first 16 bits of the surrogate pair as representing the entire character.
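The octet-level reading above can be reproduced with a short Python sketch. It assumes nothing beyond the two octet sequences quoted in this comment; the helper name `utf16_surrogates` is hypothetical and only illustrates the standard UTF-16 arithmetic, not the code of any particular XQueryX generator.

```python
def utf16_surrogates(cp: int) -> tuple:
    """Split a supplementary code point into its UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


# Octet sequences of the two operands as quoted in this comment.
first = bytes.fromhex("eea9a0").decode("utf-8")
second = bytes.fromhex("f09185b0").decode("utf-8")

print(ord(first), ord(second))    # 60000 70000

high, low = utf16_surrogates(ord(second))
print(hex(high), hex(low))        # 0xd804 0xdd70

# Comparing the strings by code point, as the test case does.
print(first < second)
```

Emitting only the high half of the pair as if it were a whole character yields a surrogate code point, which is not a legal XML character, so a generator working from UTF-16 code units needs to recombine both halves before writing the value.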
Mike, your comment helped me pinpoint the bug in the XQueryX generation. I agree that the test case is correct as it is.