802 – double UTF-8 BOM yields encoding errors in validation result

This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED FIXED

Alias:	None

Product:	Validator
Classification:	Unclassified
Component:	check
Version:	0.6.7
Hardware:	Other other

Importance:	P2 normal
Target Milestone:	---
Assignee:	Terje Bless
QA Contact:	qa-dev tracking

URL:	http://validator.w3.org/check?uri=htt...

Reported:	2004-06-16 13:54 UTC by Bj
Modified:	2005-08-18 03:19 UTC
CC List:	0 users

Description Bj 2004-06-16 13:54:38 UTC

The result page at

http://validator.w3.org/check?uri=http%3A%2F%2Fwww.websitedev.de%2Fmarkup%2Fvalidator%2Ftests%2Fdouble-utf-8-bom.html

is invalid. Validating that result page itself,

http://validator.w3.org/check?uri=http%3A%2F%2Fvalidator.w3.org%2Fcheck%3Furi%3Dhttp%253A%252F%252Fwww.websitedev.de%252Fmarkup%252Fvalidator%252Ftests%252Fdouble-utf-8-bom.html

gives:

  Sorry, I am unable to validate this document because on line 172 it
  contained one or more bytes that I cannot interpret as utf-8 (in other
  words, the bytes found are not valid values in the specified Character 
  Encoding). Please check both the content of the file and the character
  encoding indication.
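
To show what a double BOM looks like at the byte level, here is a minimal sketch in Python (the validator itself is Perl). The input bytes are a hypothetical stand-in for the test document, and the sketch only illustrates that a decoder which strips one BOM leaves the second behind as U+FEFF; it does not claim to reproduce the validator's behaviour.

```python
import codecs

# Hypothetical test input: two UTF-8 BOMs (EF BB BF) followed by markup.
data = codecs.BOM_UTF8 * 2 + b"<!DOCTYPE html>"

# The "utf-8-sig" codec strips at most one leading BOM.
text = data.decode("utf-8-sig")

# The second BOM survives as the character U+FEFF at the start of the text.
leading = text[0]
remaining_boms = text.count("\ufeff")
```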

Comment 1 Bj 2004-09-06 21:13:12 UTC

This is probably a duplicate of the bug that deals with using the UTF-8 flag 
for truncate_line() etc.

Comment 2 Ville Skyttä 2004-09-06 21:33:14 UTC

Off-topic, but BOM-related: this might be interesting at some point:
http://search.cpan.org/dist/File-BOM/

Comment 3 Terje Bless 2004-09-11 13:15:29 UTC

One way to deal with this is to pass our complete output data through the UTF-8 checker (charlint),
possibly modified to tag illegal byte sequences and continue instead of croaking.

BTW, cf. Comment #1, I can't seem to find the bug you're referring to; care to provide a bug number?
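
A rough sketch of the "tag illegal byte sequences and continue" idea, written in Python rather than the validator's Perl. This is not charlint; `tag_illegal_utf8` is a hypothetical helper, and replacing each bad run with U+FFFD is an assumption about how the tagging might be done.

```python
def tag_illegal_utf8(data: bytes):
    """Decode data as UTF-8, recording illegal byte runs instead of raising.

    Returns the decoded text (each illegal run replaced by U+FFFD) and a
    list of (byte offset, offending bytes) pairs.
    """
    pieces = []
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Everything from pos onwards is valid: take it and stop.
            pieces.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice data[pos:].
            good_end = pos + err.start
            bad_end = pos + err.end
            pieces.append(data[pos:good_end].decode("utf-8"))
            problems.append((good_end, data[good_end:bad_end]))
            pieces.append("\ufffd")  # tag the illegal run and carry on
            pos = bad_end
    return "".join(pieces), problems
```

Calling `tag_illegal_utf8(b"ok\xffmore")` returns the repaired text together with one recorded problem at byte offset 2, so the caller can report the location instead of aborting.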

Comment 4 Bj 2004-09-11 23:08:39 UTC

See the relevant comment in the source about Perl 5.8.x. I am not sure 
how your suggestion would help, though. The problem here is that the string is 
considered a byte string and thus substr etc. do not work as expected.
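
The byte-string problem can be illustrated with a small sketch, here in Python rather than Perl. The sample string and variable names are made up for illustration; the point is only that taking a substring by byte offset can cut a multi-byte UTF-8 character in half, whereas the same offset on a character string cannot.

```python
# "caf\u00e9 menu" is "café menu"; the "é" occupies two bytes in UTF-8.
text = "caf\u00e9 menu"        # character string
raw = text.encode("utf-8")     # byte string

# Truncating the character string keeps "é" whole.
chars = text[:4]

# Truncating the byte string at the same index splits "é" (C3 A9) in two.
cut = raw[:4]

try:
    cut.decode("utf-8")
    truncated_is_valid = True
except UnicodeDecodeError:
    # The dangling lead byte C3 is not valid UTF-8 on its own.
    truncated_is_valid = False
```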

Comment 5 Bj 2005-08-18 03:19:59 UTC

Fixed in HEAD.