This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The XQueryX files appear to have incorrectly translated character references. E.g. Constr-elem-curlybr-4.xq has &#x7d; which is encoded as <xqx:stringConstantExpr> <xqx:value>&amp;#x7d;</xqx:value> </xqx:stringConstantExpr> which translates back with the stylesheet to &amp;#x7d;. The translator needs to encode character references by themselves, or indeed by the characters referenced. xq2xqx encodes this test file using <xqx:stringConstantExpr> <xqx:value>}</xqx:value> </xqx:stringConstantExpr> which does translate back to an equivalent query. (This affects several test files.)
still the same in 0.9.4 (this affects lots of files, any ones using & in the XQuery)
(In reply to comment #1) > still the same in 0.9.4 (this affects lots of files, any ones using & in the > XQuery) > still the same in the xqueryx.zip posted to public cvs today affects around 80 files as far as I can see Queries/XQuery/Expressions/PrimaryExpr/Literals/Literals056.xq Queries/XQuery/Expressions/PrimaryExpr/Literals/Literals057.xq Queries/XQuery/Expressions/PrimaryExpr/Literals/Literals058.xq Queries/XQuery/Expressions/PrimaryExpr/Literals/Literals059.xq Queries/XQuery/Expressions/PrimaryExpr/Literals/Literals060.xq Queries/XQuery/Expressions/PrimaryExpr/Literals/Literals061.xq Queries/XQuery/Expressions/PrimaryExpr/Literals/K-Literals-47.xq Queries/XQuery/Expressions/PrimaryExpr/Literals/K-Literals-49.xq Queries/XQuery/Expressions/Construct/DirectConElem/Constr-elem-curlybr-3.xq Queries/XQuery/Expressions/Construct/DirectConElem/Constr-elem-curlybr-4.xq Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemAttr/Constr-attr-ws-3.xq Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemAttr/Constr-attr-ws-4.xq Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemAttr/Constr-attr-ws-5.xq Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemAttr/Constr-attr-charref-1.xq Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemContent/Constr-cont-eol-3.xq Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemContent/Constr-cont-eol-4.xq Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemContent/Constr-cont-charref-1.xq Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemWhitespace/Constr-ws-genchref-1.xq Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemWhitespace/Constr-ws-genchref-2.xq Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemWhitespace/Constr-ws-genchref-3.xq Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemWhitespace/Constr-ws-genchref-4.xq 
Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemWhitespace/Constr-ws-adjchref-1.xq Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemWhitespace/Constr-ws-adjchref-2.xq Queries/XQuery/Expressions/Construct/DirectConElem/DirectConElemWhitespace/Constr-ws-adjchref-3.xq Queries/XQuery/Expressions/Construct/ComputeCon/ComputeConPI/Constr-comppi-space-2.xq Queries/XQuery/Expressions/Construct/ComputeCon/ComputeConPI/Constr-comppi-space-4.xq Queries/XQuery/Expressions/Construct/ComputeCon/ComputeConComment/Constr-compcomment-dash-3.xq Queries/XQuery/Expressions/Construct/ComputeCon/ComputeConComment/Constr-compcomment-doubledash-3.xq Queries/XQuery/Expressions/PrologExpr/BoundarySpaceProlog/boundary-space-005.xq Queries/XQuery/Expressions/PrologExpr/BoundarySpaceProlog/boundary-space-006.xq Queries/XQuery/Expressions/PrologExpr/BoundarySpaceProlog/boundary-space-007.xq Queries/XQuery/Expressions/PrologExpr/BoundarySpaceProlog/boundary-space-008.xq Queries/XQuery/Expressions/PrologExpr/BoundarySpaceProlog/boundary-space-009.xq Queries/XQuery/Expressions/PrologExpr/BoundarySpaceProlog/boundary-space-010.xq Queries/XQuery/Expressions/PrologExpr/BaseURIProlog/base-URI-2.xq Queries/XQuery/Expressions/PrologExpr/BaseURIProlog/base-URI-3.xq Queries/XQuery/Expressions/PrologExpr/BaseURIProlog/base-URI-4.xq Queries/XQuery/Expressions/PrologExpr/BaseURIProlog/base-URI-5.xq Queries/XQuery/Expressions/PrologExpr/BaseURIProlog/base-URI-6.xq Queries/XQuery/Expressions/PrologExpr/BaseURIProlog/base-URI-18.xq Queries/XQuery/Expressions/PrologExpr/NamespaceProlog/namespaceDecl-23.xq Queries/XQuery/Expressions/PrologExpr/VariableProlog/InternalVariablesWithout/VarDecl009.xq Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-9.xq Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-10.xq Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-13.xq Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-16.xq 
Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-17.xq Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-18.xq Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-21.xq Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-22.xq Queries/XQuery/CodepointToStringFunc/K-CodepointToStringFunc-23.xq Queries/XQuery/Functions/AllStringFunc/GeneralStringFunc/NormalizeUnicodeFunc/fn-normalize-unicode1args-3.xq Queries/XQuery/Functions/AllStringFunc/GeneralStringFunc/NormalizeUnicodeFunc/fn-normalize-unicode1args-4.xq Queries/XQuery/Functions/AllStringFunc/GeneralStringFunc/NormalizeUnicodeFunc/fn-normalize-unicode1args-5.xq Queries/XQuery/Functions/AllStringFunc/GeneralStringFunc/NormalizeUnicodeFunc/fn-normalize-unicode1args-6.xq Queries/XQuery/Functions/AllStringFunc/GeneralStringFunc/NormalizeUnicodeFunc/fn-normalize-unicode2args-4.xq Queries/XQuery/Functions/AllStringFunc/GeneralStringFunc/NormalizeUnicodeFunc/fn-normalize-unicode-1.xq Queries/XQuery/Functions/AllStringFunc/GeneralStringFunc/TranslateFunc/fn-translate3args-2.xq Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates01.xq Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates02.xq Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates03.xq Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates04.xq Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates05.xq Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates06.xq Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates07.xq Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates09.xq Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates10.xq Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates11.xq Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates12.xq Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates13.xq Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates14.xq 
Queries/XQuery/Functions/AllStringFunc/Surrogates/surrogates15.xq Queries/XQuery/Functions/AllStringFunc/EscapingFuncs/IRIToURIfunc/fn-iri-to-uri-18.xq Queries/XQuery/Functions/AllStringFunc/EscapingFuncs/EscapeHTMLURIFunc/fn-escape-html-uri-20.xq Queries/XQuery/Functions/AllStringFunc/EscapingFuncs/EscapeHTMLURIFunc/fn-escape-html-uri-21.xq Queries/XQuery/Functions/AllStringFunc/MatchStringFunc/MatchesFunc/caselessmatch04.xq Queries/XQuery/Functions/AllStringFunc/MatchStringFunc/MatchesFunc/caselessmatch05.xq Queries/XQuery/Functions/AllStringFunc/MatchStringFunc/MatchesFunc/caselessmatch06.xq Queries/XQuery/Functions/AllStringFunc/MatchStringFunc/MatchesFunc/caselessmatch07.xq Queries/XQuery/Functions/NodeSeqFunc/SeqDocFunc/fn-doc-1.xq
This is still not fixed in the XqueryX.zip 1.4 posted at the weekend. For example Literals061 encodes "&#8364;" as <xqx:value>&amp;#8364;</xqx:value> rather than <xqx:value>&#8364;</xqx:value> David
Just to note that this problem (first reported on XQTS 0.8.0, and confirmed in 0.8.{2,4,6} and 0.9.{0,4}) is still present in the current.zip file in CVS, even though we're hopefully getting close to 1.0. David
I believe that I have fixed this problem in the most recent version of the XQueryX stylesheet (see Bugzilla bug # 3446, http://www.w3.org/Bugs/Public/show_bug.cgi?id=3446) and the revised stylesheet posted at http://www.w3.org/2005/XQueryX/xqueryx.xsl contains the fix. Please try that out and let me know whether the problem persists.
(In reply to comment #5) > I believe that I have fixed this problem in the most recent version of the > XQueryX stylesheet (see Bugzilla bug # 3446, > http://www.w3.org/Bugs/Public/show_bug.cgi?id=3446) and the revised stylesheet > posted at http://www.w3.org/2005/XQueryX/xqueryx.xsl contains the fix. Please > try that out and let me know whether the problem persists. > The problem wasn't in the stylesheet, the problem was/is that the XQueryX files in the test suite are wrong. David
and just to confirm, the latest stylesheet just downloaded has no effect on these files. (Which is good as the existing stylesheet was working correctly in these cases.) saxon Queries/XQueryX/Expressions/PrimaryExpr/Literals/Literals061.xqx xqueryx.xsl produces declare variable $input-context external ; "&amp;#8364;" which is not equivalent to the xquery test cat Queries/XQuery/Expressions/PrimaryExpr/Literals/Literals061.xq (: Name: Literals061 :) (: Description: Test for string literal containing the character reference '&#8364;' which transaltes into the 'Euro' currency symbol :) (: insert-start :) declare variable $input-context external; (: insert-end :) "&#8364;"
I'm not sure why this bug has remained open for 9 months. I wouldn't have thought it that hard to fix the xqueryx files in the distribution (running sed -e 's/&amp;/\&/g' over them all would do the job). However, in case fixing the generator being used does prove difficult, perhaps I should repeat the standing offer that the xq2xml distribution contains a set of xqueryx versions of the test files, and is all distributed under the W3C software licence, so you are welcome to use any of them. I was planning to wait until the XQTS 1.0 release before updating but have just updated today, so there are currently 15038 xqueryx files available in http://monet.nag.co.uk/xq2xml/xqxtest-20060811.zip which include files that fix this problem as well as files that fix bug #3521 and fill in the gaps where the current test suite has no xqueryx file at all for some reason. If you want to drop some of these files into a 1.0 test suite release, feel free. David
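For anyone who would rather script the one-level unescape than run sed, here is a minimal Python sketch of the same idea. It assumes the generator escaped every ampersand exactly one time too many, so a plain textual replacement is safe; the function name is hypothetical and is not part of XQTS or xq2xml.

```python
def remove_one_escaping_level(xqx_text: str) -> str:
    """Undo one level of ampersand escaping in XQueryX source text.

    Assumption: every '&' in the file was written as '&amp;' one time too
    many by the generator, so a plain textual replacement is enough.
    """
    # str.replace scans left to right and never rescans its own output,
    # so '&amp;amp;' becomes '&amp;' rather than '&'.
    return xqx_text.replace("&amp;", "&")


if __name__ == "__main__":
    broken = "<xqx:value>&amp;#8364;</xqx:value>"
    print(remove_one_escaping_level(broken))
```

If the generator only double-escapes in some places, a blind replacement like this would over-correct, so it should be checked against a round trip through xqueryx.xsl before being trusted.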
The list in comment #1 was for XQTS 0.9.4, the list updated for the current.zip in cvs is the following 60 files: Literals061 K-Literals-47 Constr-elem-curlybr-3 Constr-elem-curlybr-4 Constr-cont-eol-3 Constr-cont-eol-4 Constr-cont-charref-1 Constr-ws-genchref-1 Constr-ws-genchref-2 Constr-ws-genchref-3 Constr-ws-genchref-4 Constr-ws-adjchref-1 Constr-ws-adjchref-2 Constr-ws-adjchref-3 Constr-comppi-space-2 Constr-comppi-space-4 Constr-compcomment-dash-3 Constr-compcomment-doubledash-3 boundary-space-005 boundary-space-006 boundary-space-007 boundary-space-008 boundary-space-009 boundary-space-010 K-CodepointToStringFunc-9 K-CodepointToStringFunc-10 K-CodepointToStringFunc-13 K-CodepointToStringFunc-16 K-CodepointToStringFunc-17 K-CodepointToStringFunc-18 K-CodepointToStringFunc-21 K-CodepointToStringFunc-22 K-CodepointToStringFunc-23 fn-normalize-unicode1args-3 fn-normalize-unicode1args-4 fn-normalize-unicode1args-5 fn-normalize-unicode1args-6 fn-normalize-unicode2args-4 fn-normalize-unicode-1 fn-translate3args-2 surrogates01 surrogates02 surrogates03 surrogates04 surrogates05 surrogates06 surrogates07 surrogates09 surrogates10 surrogates11 surrogates12 surrogates13 surrogates14 surrogates15 fn-escape-html-uri-20 fn-escape-html-uri-21 caselessmatch04 caselessmatch05 caselessmatch06 caselessmatch07
In addition to the files in comment #9 (which use references in strings) the following use them in attribute values Constr-attr-ws-3 Constr-attr-ws-4 Constr-attr-ws-5 Constr-attr-charref-1
I am back to working on XQueryX, and I am looking at this bug along with the others. I'll also need to clarify how entity refs are handled (possibly w/ Jim), since it's not clear what the behaviour should be. Thank you for the offer to use your XQueryX files - that's up to the XQTTF, but it seems to me that for consistency reasons it would be better to have all files come from a single generator.
> since it's not clear what the behaviour should be. I do not see any ambiguity in the current spec; what ambiguity do you see? String literals in XQueryX should just encode the string in XML, not the XML encoding of the XQuery encoding of the string. So the string of length 1 consisting of an ampersand is encoded as &amp; not as &amp;amp;. The implementation in xqueryx.xsl also requires this; it is easy to check by running xqueryx.xsl on any of the xqueryx files listed in comment #9 that the resulting XQuery is not equivalent to the original XQuery file in the test suite. David
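The rule described in this comment (resolve the XQuery references to get the actual string, then XML-escape that string once) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the input is the body of a string literal without its surrounding quotes, and doubled-quote escapes ("" and '') are ignored for brevity.

```python
import re
from xml.sax.saxutils import escape

# The five predefined entity references shared by XML and XQuery.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

# One pattern for hex character refs, decimal character refs and
# predefined entity refs, so every reference is resolved in a single pass.
_REF = re.compile(r"&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|(lt|gt|amp|quot|apos));")


def expand_references(literal_body: str) -> str:
    """Return the string value denoted by an XQuery string literal body."""
    def repl(match: re.Match) -> str:
        hex_digits, dec_digits, name = match.groups()
        if hex_digits:
            return chr(int(hex_digits, 16))
        if dec_digits:
            return chr(int(dec_digits))
        return PREDEFINED[name]
    # re.sub does not rescan replacement text, so '&amp;#1234;' yields the
    # seven characters '&#1234;' and not the character with codepoint 1234.
    return _REF.sub(repl, literal_body)


def to_xqx_value(literal_body: str) -> str:
    """Content for xqx:value: the string value, XML-escaped exactly once."""
    return escape(expand_references(literal_body))


if __name__ == "__main__":
    for body in ["&#x7d;", "&amp;", "&amp;#1234;", "&lt;"]:
        print(repr(body), "->", repr(to_xqx_value(body)))
```

With this approach the literal "&amp;" ends up as &amp; in the XQueryX file rather than &amp;amp;, which is the behaviour the comment argues for.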
(In reply to comment #12) > > since it's not clear what the behaviour should be. > > I do not see any ambiguity in the current spec, what ambiguity do you see? > > string literals in XQueryX should just encode the string in XML not the XML > encoding of the XQuery encoding of the string. so the string of length 1 > consisting of an ampersand is encoded as &amp; not as &amp;amp; > The implementation in xqueryx.xsl also requires this, it is easy to check by > running xqueryx.xsl on any of the xqueryx files listed in comment #9 that the > resulting XQuery is not equivalent to the original XQuery file in the test > suite. > > David > Well, it seems that the resolution of entity references and character references is not clearly defined... XQuery spec 3.1.1 says: "Each predefined entity reference is replaced by the character it represents when the string literal is processed." For example, take surrogates01.xq that you mention as one of the files having the problem. The XQueryX we currently generate for this includes the escaped & character: <xqx:stringConstantExpr> <xqx:value>abc&amp;#x1D156;def</xqx:value> </xqx:stringConstantExpr> The stylesheet converts the XQueryX to: string-length("abc&amp;#x1D156;def") Are string-length("abc&#x1D156;def") and string-length("abc&amp;#x1D156;def") equivalent? That would depend on the rules for resolving entity refs and character refs... at least one XQuery processor I tried resolves these two strings to the same string value, and returns the same answer for both these queries: 7. So what are the exact rules for the resolution of entity refs and character refs?
Also, you say "string literals in XQueryX should just encode the string in XML not the XML encoding of the XQuery encoding of the string. " So, for example, string-length("<"), should in fact be encoded as: <xqx:stringConstantExpr> <xqx:value>&lt;</xqx:value> </xqx:stringConstantExpr> So which characters should be replaced by entity refs when producing XQueryX? Seems that the stylesheet assumes ",',<,> should be replaced but not &?
> Well, it seems that the resolution of entity references and character > references is not clearly defined... I honestly am struggling to see any ambiguity in the current specification and so I'm not sure I can really answer your questions in a helpful way, but I'll try. > "Each predefined entity reference is replaced by the character it represents > when the string literal is processed." There are five predefined entities, including amp, and that means that &amp; gets replaced by an ampersand character. The whole point of writing &amp; rather than & is to _stop_ it being used as markup, so it is absolutely clear that in XQuery as in XML &amp;#1234; is the 7 characters & # 1 2 3 4 ; not a reference to the character with codepoint 1234. It would be absolutely bizarre if XQuery were defined otherwise, as it would be using XML syntax with completely different semantics. > Are string-length("abc&#x1D156;def") and string-length("abc&amp;#x1D156;def")? > at least one XQuery processor I tried resolves these two strings to the same > string value, Bugs happen; report it as a bug to that system's maintainers. That is unquestionably a bug. > So which characters should be replaced by entity refs when producing XQueryX? > Seems that the stylesheet assumes ",',<,> should be replaced but not &? As always when writing XML (or XML-like) syntax, you just need to quote those characters that have special significance in XML, which includes &, and this is what the stylesheet does; see the template name="quote" which has <xsl:with-param name="toBeReplaced">&amp;</xsl:with-param> <xsl:with-param name="replacement">&amp;amp;</xsl:with-param> David
> > "Each predefined entity reference is replaced by the character it represents > > when the string literal is processed." > > There are five predefined entities, including amp and that means that &amp; > gets replaced by an ampersand character. The whole point of writing &amp; > rather than & is to _stop_ it being used as markup so it is absolutely clear > that in XQuery as in XML > &amp;#1234; is the 7 characters & # 1 2 3 4 ; not a reference to the character > with codepoint 1234. It would be absolutely bizarre if XQuery were defined > otherwise, as it would be using XML syntax with completely different semantics. > > > Are string-length("abc&#x1D156;def") and string-length("abc&amp;#x1D156;def")? > > at least one XQuery processor I tried resolves these two strings to the same > > string value, > bugs happen, report it as a bug to that system's maintainers, That is > unquestionably a bug. >>> Ok. Well, I would interpret this in the >>> same way. My point, however, is that >>> this is not stated anywhere in the XQuery spec - that's >>> what I mean by "ambiguity". >>> >>> I am validating w/ Jim whether this is the intended meaning.
I'm puzzled that you don't find the XQuery spec clear on the subject of how predefined entity references are handled. It seems eminently clear to me. There are three places they can occur: in string literals, in attribute content, and in element content. For string literals, section 3.1.1 spells out the rules and seems entirely clear. For attribute content, rule 1 says "Attribute value normalization is then applied to normalize whitespace and expand character references and predefined entity references. " This spells out the rules by reference to the XML specification (which describes the interaction of entity expansion and whitespace normalization): the rules are complicated, but I think they are unambiguous. For element content, section 3.7.1.3 rule 1b gives the rules by reference to the rules in 3.1.1 for string literals. So what exactly is it that you think isn't stated clearly in the XQuery specification? (You alleged that one implementation did double-expansion of entity references, turning &amp;lt; into a less-than sign. I think it's quite clear in the XQuery spec that processors mustn't do that. If you're in element content, for example, no possible reading of section 3.7.1.3 would allow that interpretation. In any case, as David Carlisle points out, common sense should give you the same answer: if an ampersand written as &amp; were treated in the same way as one written as &, why would the specification bother to provide a way of escaping the character in the first place?) Michael Kay
(In reply to comment #17) Michael, Section 3.7.1.3 states: "Predefined entity references and character references are expanded into their referenced strings, as described in 3.1.1 Literals." And section 3.1.1 states: "Each predefined entity reference is replaced by the character it represents when the string literal is processed." It doesn't say anything about how character refs are processed (as far as I can see), but does give some examples of string values with character refs. Given these descriptions, one possible algorithm, for example, is to process a string by first applying all entity ref replacements, and then all the character reference replacements on the resulting string. Which is what at least one processor I tried appears to do. But yes, I agree with the common-sense interpretation David gives. > I'm puzzled that you don't find the XQuery spec clear on the subject of how > predefined entity references are handled. It seems eminently clear to me. > > There are three places they can occur: in string literals, in attribute > content, and in element content. > > For string literals, section 3.1.1 spells out the rules and seems entirely > clear. > > For attribute content, rule 1 says "Attribute value normalization is then > applied to normalize whitespace and expand character references and predefined > entity references. " This spells out the rules by reference to the XML > specification (which describes the interaction of entity expansion and > whitespace normalization): the rules are complicated, but I think they are > unambiguous. > > For element content, section 3.7.1.3 rule 1b gives the rules by reference to > the rules in 3.1.1 for string literals. > > So what exactly is it that you think isn't stated clearly in the XQuery > specification? > > (You alleged that one implementation did double-expansion of entity references, > turning &amp;lt; into a less-than sign. I think it's quite clear in the XQuery > spec that processors mustn't do that. 
If you're in element content, for > example, no possible reading of section 3.7.1.3 would allow that > interpretation. In any case, as David Carlisle points out, common sense should > give you the same answer: if an ampersand written as &amp; were treated in the > same way as one written as &, why would the specification bother to provide a > way of escaping the character in the first place?) > > Michael Kay >
(In reply to comment #8) > I'm not sure why this bug has remained open for 9 months I wouldn't have > thought it that hard to fix the xqueryx files in the distribution (running > sed -e 's/&amp;/\&/g' > over them all would do the job) > > However in case fixing the generator being used does prove difficult Perhaps I > should repeat the standing offer that the xq2xml distribution contains a set of > xqueryx versions of the test files, and is all distributed under the w3c > software licence so you are welcome to use any of them, I was planning to wait > until the XQTS 1.0 release before updating but have just updated today so there > are currently 15038 xqueryx files available in > http://monet.nag.co.uk/xq2xml/xqxtest-20060811.zip > which include files that fix this problem as well as files that fix bug #3521 > and fill in the gaps where the current test suite has no xqueryx file at all > for some reason. If you want to drop some of these files into a 1.0 Test suite > release feel free. > > David > David, it looks like in certain cases the XQueryX implementation should escape &. For example: <!--<?&-&lt;&#x20;><![CDATA[x]]>--> Is currently correctly encoded as: <xqx:value>&lt;?&amp;-&lt;&amp;#x20;>&lt;![CDATA[x]]></xqx:value> So, it doesn't seem to be a blind replace of &amp; with & as suggested above (I am, btw, actually fixing the generator rather than modifying the queries)... right?
(In reply to comment #18) > Given these descriptions, one possible algorithm, for example, is to process a > string > by first applying all entity ref replacements, and then all the character > reference replacements on the resulting string. Which is what at least > one processors I tried appears to do. The spec does not explicitly say that that algorithm is not used, but it cannot list all possible unused algorithms. If such double parsing were to be used, the string "&amp;" would, like the string "&", be a syntax error (unterminated reference), element content of &lt;a/&gt; would generate an element node, etc. There is no way that the spec can be interpreted in that way. David
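To make the difference between the two readings concrete, here is a small Python sketch comparing single-pass expansion with the two-pass algorithm proposed in comment #18. It is an illustration only: the function names are hypothetical, and the two-pass variant models that proposal, not anything the XQuery specification defines.

```python
import re

ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

_ENTITY = re.compile(r"&(lt|gt|amp|quot|apos);")
_CHARREF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")
_ANY_REF = re.compile(r"&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|(lt|gt|amp|quot|apos));")


def _charref_value(hex_digits, dec_digits) -> str:
    # Exactly one of the two groups is set for a character reference.
    return chr(int(hex_digits, 16)) if hex_digits else chr(int(dec_digits))


def single_pass(body: str) -> str:
    """Resolve each reference once; replacement text is never rescanned."""
    def repl(m: re.Match) -> str:
        hex_digits, dec_digits, name = m.groups()
        return ENTITIES[name] if name else _charref_value(hex_digits, dec_digits)
    return _ANY_REF.sub(repl, body)


def two_pass(body: str) -> str:
    """Entity refs first, then character refs on the result (double parsing)."""
    after_entities = _ENTITY.sub(lambda m: ENTITIES[m.group(1)], body)
    return _CHARREF.sub(lambda m: _charref_value(*m.groups()), after_entities)


if __name__ == "__main__":
    escaped = "abc&amp;#x1D156;def"
    print(len(single_pass(escaped)))  # literal '&#x1D156;' kept as text
    print(len(two_pass(escaped)))     # reference expanded a second time
```

Under the two-pass model the escaped and unescaped forms collapse to the same 7-character string, which matches the behaviour reported for the processor in comment #13, and it leaves no way to write a literal ampersand followed by #x1D156;.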
(In reply to comment #19) > David, it looks like in certain cases the XQueryX implementation > should escape &. > .. > So, it doesn't seem to be a blind replace of &amp; with & as suggested above I'd assumed that your convertor was always double escaping and so removing one level would fix it (it certainly fixes most). If your convertor is sometimes double escaping and sometimes not, then clearly you only need to remove the double escaping at those places where it was added. > > For example: > > <!--<?&-&lt;&#x20;><![CDATA[x]]>--> > > Is currently correctly encoded as: > > <xqx:value>&lt;?&amp;-&lt;&amp;#x20;>&lt;![CDATA[x]]></xqx:value> > That encoding is incorrect. Given the XQuery <!--<?&-&lt;&#x20;><![CDATA[x]]>--> xq2xqx produces <xqx:module xmlns:xqx="http://www.w3.org/2005/XQueryX"> <xqx:mainModule> <xqx:queryBody> <xqx:computedCommentConstructor> <xqx:argExpr> <xqx:stringConstantExpr> <xqx:value>&lt;?&amp;-&amp;lt;&amp;#x20;>&lt;![CDATA[x]]></xqx:value> </xqx:stringConstantExpr> </xqx:argExpr> </xqx:computedCommentConstructor> </xqx:queryBody> </xqx:mainModule> </xqx:module> which when processed with the standard stylesheet produces comment{"<?&amp;-&amp;lt;&amp;#x20;><![CDATA[x]]>"} which is an equivalent query; both produce the XML <!--<?&-&lt;&#x20;><![CDATA[x]]>--> If however the xqx:value element is replaced by the element that you suggested, then the standard xqueryx stylesheet produces comment{"<?&amp;-<&amp;#x20;><![CDATA[x]]>"} which is not an equivalent query; when executed it produces <!--<?&-<&#x20;><![CDATA[x]]>--> which is an entirely different XML comment. David
(In reply to comment #21) > > For example: > > > > <!--<?&-&lt;&#x20;><![CDATA[x]]>--> Note also that this thread is about character and entity references, but that example does not have any character or entity references (just as it does not have any PI constructor or CDATA section) so is not really an example of anything discussed here.
(In reply to comment #21) > (In reply to comment #19) > > > David, it looks like in certain cases the XQueryX implementation > > should escape &. > > .. > > So, it doesn't seem to be a blind replace of &amp; with & as suggested above > > I'd assumed that your convertor was always double escaping and so removing one > level would fix it (it certainly fixes most) if your convertor is sometimes > double escaping and sometimes not, then clearly you only need to remove the > double escaping at those places where it was added. > > >>> I see. Well, the convertor is not "sometimes double escaping and >>> sometimes not". >>> It is always replacing "&" with "&amp;" (which, I agree, >>> is likely not correct >>> given the common-sense interpretation of how entity/character refs >>> should be resolved). >>> >>> As far as the query - that's a bug. I copied the text >>> from the wrong query yesterday. Obviously the "&" before "lt;" is >>> "&amp;" in the current encoding, because all "&" are replaced with "&amp;": >>> >>> &lt;?&amp;-&amp;lt;&amp;#x20;>&lt;![CDATA[x]]>
(In reply to comment #20) > (In reply to comment #18) > > > Given these descriptions, one possible algorithm, for example, is to process a > > string > > by first applying all entity ref replacements, and then all the character > > reference replacements on the resulting string. Which is what at least > > one processors I tried appears to do. > > > The spec does not explicitly say that algorithm is not used, but it can not list > all possible non-used algorithm. if such double parsing were to be used the > string "&amp;" would, like the string "&" be a syntax error (unterminated > reference), element content of &lt;a/&gt; would generate an element node, etc. > There is no way that the spec can be interpreted in that way. > > David >>> Not necessarily. I think you're assuming the >>> XQuery rules would apply after the second pass of such an algorithm, >>> but that doesn't have to be the case. >>> >>> I am not asking XQuery to list all possible unused algorithms. >>> It could, very easily and precisely, however, give the *one* >>> algorithm to be used. Which would eliminate potential ambiguities. >
(In reply to comment #22) > (In reply to comment #21) > > > > For example: > > > > > > <!--<?&-&lt;&#x20;><![CDATA[x]]>--> > > Note also that this thread is about character and entity references, but that > example does not have any character or entity references (just as it does not > have any PI constructor or CDATA section) so is not really an example of > anything discussed here. > I think this is definitely relevant. This is about encoding "&" correctly, with the entity reference &amp;. Moreover, I was commenting on the suggestion to " fix the xqueryx files in the distribution (running sed -e 's/&amp;/\&/g' over them all would do the job)", which would not be correct given what the current generator outputs. In any case, thanks for the feedback... I am making a tentative modification to the generator based on the common-sense interpretation of entity/character ref processing, but will await Jim's confirmation before committing.
> In any case, thanks for the feedback... I am making a tentative > modification to the generator > based on the common-sense interpretation of entity/character > ref processing, but will await Jim's confirmation before committing. > I trust that this will be done _before_ any update to XQTS is made. It would be unreasonable to ask any implementors to do any CR testing of an XqueryX implementation before this is fixed.