This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The results for Literals057/058 should use the Fragment comparator rather than the Text comparator, since the reference results contain XML entity/character references. The same applies to Literals062/063/064/065 and 068/069.
Mike, see http://www.w3.org/Bugs/Public/show_bug.cgi?id=2402 which is the same issue (except it was raised on 0.7 so it is phrased in reverse:-) In my own harness I have just decided to treat Text as Fragment always. (I don't see any occasions when it is safe to do as the documentation advises and test the result of an XML serialisation using byte comparison: any character could be arbitrarily serialised using a character ref, which would fail a byte-comparison test), although actually I never serialise the result at all, and do xpath comparison of the result trees. David
With the new requirement that all results be serialized using the XML serialization option, the 'Fragment' and 'Text' comparators become essentially equivalent. We have retained this distinction in the catalog to indicate to the user that a scalar value is being returned ('Text' comparator) as opposed to some sequence of XML nodes. I believe the results as they stand in 0.8 are correct, though, right?
I wasn't aware of a change in this area: and it doesn't seem to have been reflected in the documentation. The "Guidelines for running tests" say that compare="Text" means the results should be compared using "byte comparison", which means that you should report a fail if your results are ["] and the expected results are [&quot;]. I agree it makes sense to get rid of the distinction between Fragment and Text, but such changes need to be properly documented and announced. I've reopened the bug to allow the documentation to be fixed. Michael Kay
The results still need to be compared using 'byte comparison' - it's the way that the results of the query are serialized that has changed. Previously, we said that the user should use 'XML serialization' for 'Fragment' and 'XML' comparators and 'text serialization' for 'Text'. But, it turns out that the only required serialization method in XQuery is 'XML serialization', so we can't rely on implementations having any form of 'text serialization'. This should already be spelled out in the 0.8 draft where we specify the serialization settings used to generate the results. For the scalar results (i.e. those where the verifier is 'Text'), this means that they should serialize their results as though they were a top-level XML text node. The only change this makes to the stored results is that the special XML characters (i.e. <, >, &, " and ') are serialized in their entitized form (i.e. &lt;, &gt;, &amp;, &quot; and &apos; respectively). The de-entitization of these characters is a side-effect of the 'text serialization', so should not be performed when using 'XML serialization'. In your case, if you used a fragment verifier with a scalar result whose expected value was '<' - adding XML elements around this would give you <container><</container> which is invalid XML (the '<' must be entitized), so we really need to store these characters in their entitized forms.
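The wrapping problem described above can be illustrated with a minimal Python sketch. It is not part of the test suite: the helper names `wrap_scalar` and `is_well_formed` are hypothetical, and the `container` element name is simply taken from the example in the comment.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes are added explicitly.
EXTRA_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def wrap_scalar(value: str, wrapper: str = "container") -> str:
    """Entitize a scalar result and wrap it in a container element."""
    return f"<{wrapper}>{escape(value, EXTRA_ENTITIES)}</{wrapper}>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

Under these assumptions, wrapping the raw character `<` without entitizing it yields a document that does not parse, while the entitized form does.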
Given a query whose result sequence is a text node containing a single character, namely a double quotation mark, there are at least four legal outputs of the XML serialization method: the literal character ", the entity reference &quot;, the decimal character reference &#34; and the hexadecimal character reference &#x22;. I would expect most processors to use the first, but the reference results use the second and this is also legal. Since serialization using the XML output method may produce any of these forms and they are all equivalent, comparison of the serialized results byte-for-byte is clearly not an option.
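A short Python sketch of the point being made here, assuming the four forms are the literal character, the named entity, and the decimal and hexadecimal character references (the list `CANDIDATES` and the helper `parsed_text` are illustrative names only):

```python
import xml.etree.ElementTree as ET

# Four serializations that an XML parser reads back as a double quotation mark.
CANDIDATES = ['"', "&quot;", "&#34;", "&#x22;"]


def parsed_text(serialized: str) -> str:
    """Parse a serialized text node inside a wrapper element and return its text."""
    return ET.fromstring(f"<container>{serialized}</container>").text


# All four forms parse to the same character ...
parsed = [parsed_text(s) for s in CANDIDATES]
# ... but as raw strings they are all different, so byte comparison fails.
byte_equal = len(set(CANDIDATES)) == 1
```

Every entry of `parsed` is the same one-character string, while `byte_equal` is false, which is why a harness comparing raw bytes would reject three of the four legal outputs.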
I was under the impression that there was a single, well-defined method for storing these special characters in an XML text node. Apparently, this is not the case and we allow multiple different options. I have updated the test catalog with multiple results for these cases, to handle all the different serialization options.
(In reply to comment #6)
> I was under the impression that there was a single, well-defined method for
> storing these special characters in an XML text node. Apparently, this is not
> the case and we allow multiple different options. I have updated the test
> catalog with multiple results for these cases, to handle all the different
> serialization options.
No, sorry, that is not enough. It is not just "special characters", it is _all_ characters. When the result contains a character such as "1", the XQuery engine is allowed, using the XML serialisation, to serialise it as "1" or "&#49;" or "&#x31;" or anything else that will parse to give a character 1. So it is _never_ safe to do a byte comparison of the XML serialisation with the supplied file. You always need to parse the supplied result file as XML and then serialise it using exactly the same canonical serialisation you are using for your result, and then compare. In other words you need to follow the documentation given for the Fragment comparison, not the documentation for the Text comparison. It would be simpler just to globally replace "Text" by "Fragment" in the catalog file. David
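The parse-then-canonicalise comparison described in this comment could look roughly like the following Python sketch. It is only one possible harness design, not the documented procedure: the function name `fragments_match` and the `container` wrapper are assumptions, and it uses the C14N 2.0 implementation in `xml.etree.ElementTree.canonicalize` (Python 3.8 or later) as the "same canonical serialisation" for both sides.

```python
from xml.etree.ElementTree import canonicalize


def fragments_match(expected: str, actual: str, wrapper: str = "container") -> bool:
    """Compare two serialized results after wrapping and canonicalising both.

    Both strings are wrapped in the same container element so that scalar
    values and node sequences become well-formed documents, then put through
    the same canonical serialisation before the string comparison.
    """
    def canon(fragment: str) -> str:
        return canonicalize(f"<{wrapper}>{fragment}</{wrapper}>")

    return canon(expected) == canon(actual)
```

With this approach `fragments_match("1", "&#49;")` holds even though the two strings differ byte-for-byte, and the same function serves results catalogued as either Text or Fragment. A malformed input raises a parse error rather than returning false, which a real harness would probably want to report separately.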
You are correct - we need to update the test execution guidelines to state that XML canonicalization needs to be applied to values that use the 'Text' comparator too. There is still a valid distinction between 'Text' and 'Fragment' in the test suite though - 'Text' comparator results represent scalar values while 'Fragment' represents XML or sequences of XML. While the test harnesses may choose to implement 'Text' and 'Fragment' with the same semantics (i.e. add container elements and canonicalize for both), we would still like to retain this distinction between the different types of results in the catalog.