This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
There seems to be a bug in the new XML parser: it doesn't recognize non-UTF Japanese encodings such as Shift_JIS and EUC-JP. Try validating http://www.mitsue.co.jp/ and you'll see some XML errors. But when I saved the page in an XML format (mitsue.xml) and opened it in Firefox and Internet Explorer, I got no such errors. If you rewrite the source, replacing "shift_jis" with "UTF-8", it validates. So the validator seems to have some encoding detection and handling issues. Many web pages use Shift_JIS, EUC-JP, or some other non-UTF encoding, and I'm afraid that launching the new validator without fixing this issue would cause serious confusion in the Japanese market.
Nice catch, Masataka, thanks a lot. I found out that the problem was with <?xml version="1.0" encoding="Shift_JIS"?>, which causes the XML parser to read the XML content as Shift_JIS, even though the validator systematically transcodes everything to UTF-8 before passing it to the different parsers. I'm looking at whether I can tell the XML parser to ignore the encoding="..." declaration or whether I should rewrite its value to be UTF-8.
Fixed with a regexp, which should cover pretty much all reasonable cases. http://lists.w3.org/Archives/Public/www-validator-cvs/2007Jul/0101.html
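For readers who want a feel for this kind of fix, here is a minimal sketch in Python of rewriting the encoding value in a leading XML declaration once the content has already been transcoded to UTF-8. It is not the regexp from the linked commit (the validator itself is written in Perl); the pattern and the function name force_utf8_declaration are hypothetical and only illustrate the approach described above.

```python
import re

# Hypothetical pattern, NOT the validator's actual regexp. It matches the
# encoding pseudo-attribute only inside an XML declaration that starts at
# the very beginning of the document, with either quote style.
XML_DECL_ENCODING = re.compile(
    r"""\A(<\?xml\s[^?>]*?\bencoding\s*=\s*)(["'])[A-Za-z][A-Za-z0-9._-]*\2"""
)


def force_utf8_declaration(document: str) -> str:
    """Rewrite the declared encoding to UTF-8 in already-transcoded text.

    Documents without an XML declaration, or with a declaration that has
    no encoding pseudo-attribute, are returned unchanged.
    """
    # \1 is everything up to the opening quote, \g<2> is the quote itself.
    return XML_DECL_ENCODING.sub(r"\1\g<2>UTF-8\g<2>", document, count=1)


if __name__ == "__main__":
    sample = '<?xml version="1.0" encoding="Shift_JIS"?>\n<html/>'
    print(force_utf8_declaration(sample))
```

Anchoring the pattern at the start of the document keeps it from touching an encoding="..." attribute that merely appears later in the markup.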