This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
The input file for this test is in UTF-16 encoding, and the expected test results can only be achieved if the processor is able to infer that the encoding is UTF-16. There is nothing in the spec that guarantees the processor will be able to make this inference. If the processor doesn't spot that the file is in UTF-16, it will assume UTF-8, and fail with a decoding error.
Also affects test cases -047, -051, -052 in the same test set. (Note that while the processor MAY take account of a BOM in inferring the encoding (under the "implementation-defined heuristics" provision), it is not required to do so.)
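To make the failure mode concrete, here is a minimal Python sketch of the decoding order discussed above: an explicitly supplied encoding wins, then a BOM-based heuristic, then the UTF-8 fallback. The function name `decode_text` and its `encoding` parameter are hypothetical and are not taken from the spec or from any processor; a real processor may apply different heuristics, or none. UTF-32 BOMs are deliberately not handled.

```python
import codecs
from typing import Optional


def decode_text(data: bytes, encoding: Optional[str] = None) -> str:
    """Decode a plain-text resource (hypothetical helper, for illustration)."""
    # 1. An encoding supplied by the caller takes priority.
    if encoding is not None:
        return data.decode(encoding)
    # 2. Optional heuristic: look for a byte order mark.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec consumes the BOM and uses it to pick the byte order.
        return data.decode("utf-16")
    # 3. Fallback: assume UTF-8. UTF-16 input containing non-ASCII characters
    #    will typically raise UnicodeDecodeError here.
    return data.decode("utf-8")


with_bom = "h\u00e9llo".encode("utf-16")        # Python's codec prepends a BOM
without_bom = "h\u00e9llo".encode("utf-16-le")  # same text, no BOM

text_from_bom = decode_text(with_bom)
text_from_argument = decode_text(without_bom, "utf-16-le")
try:
    decode_text(without_bom)
    fallback_failed = False
except UnicodeDecodeError:
    fallback_failed = True
```

A processor that skips step 2 goes straight to the fallback, which is the situation this bug describes.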
We could get around this by including the encoding as an argument in the function. The spec allows this as an alternative. I can make the change if that is OK with Mike and Tim.
Yes, we could include the encoding as an explicit argument. The only question is whether that would spoil the intent of the test. The alternative might be to have two alternative results, one for processors that correctly infer the encoding using implementation-defined heuristics, and one for processors that fall back to UTF-8. (I'm not sure if the spec allows for the possibility that implementation-defined heuristics will be invoked and will give the wrong answer...)
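As a rough illustration of the "two alternative results" idea, the sketch below shows how a test driver might accept either outcome. It is written in Python purely for illustration; `outcome_matches` and the `("value", ...)` / `("error", ...)` tuples are hypothetical and do not reflect the actual test catalog format or any real driver.

```python
def outcome_matches(run, alternatives):
    """Return True if running the test yields any one of the accepted outcomes.

    `run` is a zero-argument callable that performs the test (hypothetical);
    `alternatives` is a list of ("value", text) or ("error", kind) tuples.
    """
    try:
        actual = ("value", run())
    except UnicodeDecodeError:
        actual = ("error", "decoding")
    return actual in alternatives


data = "h\u00e9llo".encode("utf-16-le")  # UTF-16 input without a BOM

accepted = [
    ("value", "h\u00e9llo"),   # processor inferred (or was told) UTF-16
    ("error", "decoding"),     # processor fell back to UTF-8 and failed
]

inferring_processor_ok = outcome_matches(lambda: data.decode("utf-16-le"), accepted)
fallback_processor_ok = outcome_matches(lambda: data.decode("utf-8"), accepted)
```

Both kinds of processor pass under this scheme, while a processor that returns some other string still fails.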
I suspect that I should have included encoding attributes for each of the resource elements. Would that make the reported problem go away?
Adding encoding and/or media-type attributes might provide a solution. It's not clear however that implementations are obliged to provide an API that allows the application (in this case the test driver) to supply the media-type or encoding of a resource in this way, so it's not the whole answer. I would be inclined to add this information, and still add an alternative result for implementations that use the fallback encoding.
Agreed. I was about to quote the text from the XML spec, but of course this is plain text. There might be an argument for requiring similar behaviour for plain text as with XML regarding UTF-8 and UTF-16, but as the spec stands, what you propose is correct.
I've attempted a fix. Please mark as CLOSED if you agree with the resolution. Otherwise, REOPEN.