9291 – wrong XQueryX tests - double UTF-8 encoding

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 9291 - wrong XQueryX tests - double UTF-8 encoding

Summary: wrong XQueryX tests - double UTF-8 encoding

Status:	RESOLVED FIXED

Alias:	None

Product:	XML Query Test Suite
Classification:	Unclassified
Component:	XML Query Test Suite
Version:	1.0.2
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Andrew Eisenberg
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

URL:	http://zorba-xquery.com
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2010-03-22 15:40 UTC by Daniel Turcanu
Modified:	2010-05-07 14:45 UTC
CC List:	0 users

See Also:

Attachments

Description Daniel Turcanu 2010-03-22 15:40:04 UTC

The XQueryX tests that contain Unicode characters have those characters encoded as UTF-8 twice. That is, one non-ASCII character is encoded in 4 bytes instead of 2.
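For illustration, here is a minimal Python sketch of how a 2-byte character can end up as 4 bytes. The report does not say how the double encoding arose; the sketch assumes the common case where UTF-8 bytes are misread as Latin-1 text and then encoded as UTF-8 a second time. The variable names are hypothetical and nothing here is taken from the test suite itself.

```python
# Hypothetical illustration of double UTF-8 encoding (assumed mechanism:
# UTF-8 bytes misread as Latin-1, then encoded as UTF-8 again).
original = "é"  # U+00E9

# Correct: a single UTF-8 encoding of U+00E9 yields 2 bytes.
encoded_once = original.encode("utf-8")

# Faulty: each of the 2 bytes is treated as a Latin-1 character and re-encoded.
encoded_twice = encoded_once.decode("latin-1").encode("utf-8")

print(encoded_once.hex(" ").upper())   # C3 A9
print(encoded_twice.hex(" ").upper())  # C3 83 C2 A9

# Reversing the faulty step recovers the original character.
repaired = encoded_twice.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired == original)            # True
```

A file affected in this way would show the byte sequence C3 83 C2 A9 wherever a single é was intended.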

The failing tests are:
XQueryX/EncodeURIfunc/K-EncodeURIfunc-4
XQueryX/EscapeHTMLURIFunc/K-EscapeHTMLURIFunc-5
XQueryX/Functions/AllStringFunc/AssDisassStringFunc/StringToCodepointFunc/fn-string-to-codepoints1args-4
XQueryX/Functions/AllStringFunc/EscapingFuncs/EncodeURIfunc/fn-encode-for-uri1args-2
XQueryX/Functions/AllStringFunc/EscapingFuncs/EscapeHTMLURIFunc/fn-escape-html-uri1args-2
XQueryX/Functions/AllStringFunc/EscapingFuncs/IRIToURIfunc/fn-iri-to-uri1args-2
XQueryX/StringToCodepointFunc/K-StringToCodepointFunc-12
XQueryX/StringToCodepointFunc/K-StringToCodepointFunc-19
XQueryX/StringToCodepointFunc/K-StringToCodepointFunc-20
XQueryX/StringToCodepointFunc/K-StringToCodepointFunc-21

The testing was performed using Zorba XQuery 1.1.

Comment 1 Andrew Eisenberg 2010-03-24 20:57:42 UTC

Sorry, but I am not seeing the problem that you describe.

I looked at the first test case you listed, K-EncodeURIfunc-4. The XQuery contains encode-for-uri("~bébé") ... I see the string literal as bytes 7E 62 C3 A9 62 C3 A9. The XQueryX that is generated is:

              <xqx:functionCallExpr>
                <xqx:functionName>encode-for-uri</xqx:functionName>
                <xqx:arguments>
                  <xqx:stringConstantExpr>
                    <xqx:value>~b&#233;b&#233;</xqx:value>
                  </xqx:stringConstantExpr>
                </xqx:arguments>
              </xqx:functionCallExpr>

The two-byte Unicode characters are being replaced by character references (charRefs) in the generated XQueryX.
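The two observations above can be checked with a short Python sketch. It is only an illustration: the `value` element below is a simplified stand-in for `xqx:value` with the namespace prefix dropped so the fragment parses on its own, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The string literal from the XQuery source.
literal = "~bébé"

# Single UTF-8 encoding: each é becomes the two bytes C3 A9.
utf8_hex = literal.encode("utf-8").hex(" ").upper()
print(utf8_hex)  # 7E 62 C3 A9 62 C3 A9

# Simplified stand-in for the generated <xqx:value> element (prefix omitted).
fragment = "<value>~b&#233;b&#233;</value>"

# An XML parser resolves &#233; to U+00E9, so the parsed text matches the literal.
parsed = ET.fromstring(fragment).text
print(parsed == literal)  # True
```

Because the character references are pure ASCII, the XQueryX file carries no multi-byte sequences for these characters at all, which leaves no place for a second UTF-8 encoding to occur.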

Comment 2 Andrew Eisenberg 2010-05-03 22:04:12 UTC

Daniel, if I don't receive any further information from you, then I will have to close this bug report without making any changes.

Comment 3 Daniel Turcanu 2010-05-07 14:45:28 UTC

Ok, I just checked them all and they work fine. They must have been fixed in the latest XQTS.