This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
A few files in ExpectedTestResults start with a byte order mark (U+FEFF). For example: ExpectedTestResults/Expressions/FLWORExpr/ForExprType/ForExprType056.xml. I'm no Unicode expert, but this seems wrong. A byte order mark only makes sense in UTF-16, where it distinguishes big-endian from little-endian. It does not belong in UTF-8, which these files are. Of course it's easy to just ignore the initial byte order mark, but I wanted to at least put this bug report on the record. The affected files are: ForExprType036.xml, ForExprType055.xml, ForExprType056.xml, ForExprType058.xml, ForExprType059.txt, ForExprType060.txt, ForExprType062.xml.
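For anyone who wants to check a local copy of the test suite, here is a minimal sketch that looks for files beginning with the UTF-8 encoding of U+FEFF (bytes EF BB BF). The function names and the idea of scanning a whole directory tree are assumptions for illustration, not part of the test suite.

```python
import codecs
from pathlib import Path
from typing import List


def starts_with_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes begin with the UTF-8 encoded BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def files_with_bom(root: str) -> List[str]:
    """List regular files under `root` whose first three bytes are the UTF-8 BOM."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            with path.open("rb") as handle:
                # Only the first three bytes are needed for the check.
                if starts_with_utf8_bom(handle.read(3)):
                    hits.append(str(path))
    return sorted(hits)
```

Calling `files_with_bom("ExpectedTestResults")` on a checkout would return the paths that need attention, assuming the directory layout mentioned in the report.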
The XML 1.0 specification (section 4.3.3) states that entities encoded in UTF-8 may begin with a byte order mark and that XML parsers must handle this. However, this was a late change to the spec and many older parsers don't like it. I'd suggest you find a more recent parser.
(In reply to comment #1)
> The XML 1.0 specification (section 4.3.3) states that entities encoded in UTF-8 may begin with a byte order mark and that XML parsers must handle this. However, this was a late change to the spec and many older parsers don't like it. I'd suggest you find a more recent parser.
These output files are "Fragment", so they're not even supposed to be valid XML. For example, ForExprType059.txt is a lone processing instruction. So the framework has to treat the expected file as an external parsed entity, which is another useless complication. The same argument applies as the one I made for bug #3756: if the test suite can use raw textual comparison between the actual output and the expected output, then the test suite is more robust. Anything that requires parsing or massaging the expected output file is another opportunity for not catching a bug. This is not a big deal, since it's trivial to work around, but I think it is a blemish (at least) in the test suite.
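The workaround mentioned above can be sketched roughly as follows, assuming the harness holds the actual output as a string and reads the expected file as raw bytes. The function names are hypothetical. Python's `utf-8-sig` codec skips one leading BOM if present and otherwise behaves like plain `utf-8`.

```python
def decode_expected(raw: bytes) -> str:
    """Decode an expected-result file as UTF-8, dropping one leading BOM if present."""
    # "utf-8-sig" removes a leading EF BB BF; text without a BOM is decoded unchanged.
    return raw.decode("utf-8-sig")


def same_text(actual: str, expected_raw: bytes) -> bool:
    """Raw textual comparison between actual output and the decoded expected file."""
    return actual == decode_expected(expected_raw)
```

This is exactly the kind of extra step the comment argues a test suite should not need: with the BOM removed from the files, a plain `raw.decode("utf-8")` and a string comparison would be enough.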
I think Per is correct that the BOM here is incompatible with the fragment comparison description, which requires that the expected result files be inserted into element content to make them well formed before they are parsed. Actually, I notice that the guidelines say
> For XML fragments, the same root node must be created for both
which isn't very clear, but given the context I think an element node is meant rather than a document node. Certainly that's what I do. I hadn't noticed these BOMs because I read the file using XSLT's unparsed-text function, which uses XML's algorithm and so swallows the BOM, before adding a start and end tag and parsing the result. But I don't think that is required by the guidelines as written, and the BOM should be removed from expected results that are to be read using the fragment comparison. David
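A rough sketch of that fragment comparison in Python, under stated assumptions: the wrapper element name and the function names are made up for illustration, and canonicalization via `xml.etree.ElementTree.canonicalize` (Python 3.8+) stands in for whatever comparison a real harness uses. The BOM is dropped while decoding, before the start and end tags are added; otherwise U+FEFF would end up as character data inside the wrapper element.

```python
import xml.etree.ElementTree as ET


def canonical_fragment(raw: bytes, wrapper: str = "wrapper") -> str:
    """Wrap a fragment in a root element and return its canonical XML form."""
    # "utf-8-sig" swallows a leading BOM so it cannot leak into element content.
    text = raw.decode("utf-8-sig")
    return ET.canonicalize(f"<{wrapper}>{text}</{wrapper}>")


def fragments_match(actual: bytes, expected: bytes) -> bool:
    """Compare two fragments after giving both the same root element."""
    return canonical_fragment(actual) == canonical_fragment(expected)
```

With this approach a lone processing instruction such as the one in ForExprType059.txt becomes a child of the wrapper element, so it can be parsed and compared like any other fragment.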
I lean towards Per's suggestion. If it was a query file or an XML file it would have been a different thing, but in this case it's a fairly exotic file with special treatment. So I would prefer to see the BOMs removed.
All: Removed the offending "BOM" characters. Thanks, Carmelo