10656 – The spec says that for scripts the BOM overrides the HTTP charset

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.


Status:	RESOLVED FIXED

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	pre-LC1 HTML5 spec (editor: Ian Hickson)
Version:	unspecified
Hardware:	Other other

Importance:	P3 normal
Target Milestone:	---
Assignee:	Ian 'Hixie' Hickson
QA Contact:	HTML WG Bugzilla archive list

URL:	http://www.whatwg.org/specs/web-apps/...

Reported:	2010-09-20 03:28 UTC by contributor
Modified:	2010-10-04 13:54 UTC
CC List:	6 users

Description contributor 2010-09-20 03:28:56 UTC

Section: http://www.whatwg.org/specs/web-apps/current-work/#script-processing-encoding

Comment:
The spec says that @charset overrides the HTTP charset and doesn't allow
examining the BOM

Comment 1 Boris Zbarsky 2010-09-20 03:34:37 UTC

Specifically, section 4.3.1 has this to say in step 5 of "running a script":

  If the script element has a charset attribute, then let the script block's
  character encoding for this script element be the encoding given by the
  charset attribute.

  Otherwise, let the script block's character encoding for this script element
  be the same as the encoding of the document itself.

Then in step 9 we have:

  Once the resource's Content Type metadata is available, if it ever is, apply
  the algorithm for extracting an encoding from a Content-Type to it. If this
  returns an encoding, and the user agent supports that encoding, then let the
  script block's character encoding be that encoding.

So far so good.  But then step 10 says to fetch the script and execute the script block, and the "executing a script block" algorithm has an "If the script is from an external file and the script block's type is a text-based language" section, which says to examine the BOM and, if one is found, to use the encoding it indicates, no matter what the HTTP headers said.

That's certainly not what Gecko does.  Does some other browser do this?  Is it what we want to happen here?
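
The flow described in this comment can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the spec's algorithm: the function names, the three-entry BOM table, and the use of Python's `email.message.Message` to read `charset` out of a Content-Type value are all choices made for the example, and the spec's own "extracting an encoding from a Content-Type" algorithm differs in detail.

```python
import codecs
from email.message import Message

# Assumed BOM table for this sketch: UTF-8, UTF-16BE, UTF-16LE.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def is_supported(label):
    """Stand-in for 'the user agent supports that encoding'."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


def charset_from_content_type(content_type):
    """Simplified stand-in for extracting an encoding from a Content-Type."""
    if content_type is None:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased charset, or None


def script_encoding_as_specified(body, content_type, charset_attr, document_encoding):
    # Step 5: the charset attribute if present, otherwise the document's encoding.
    encoding = charset_attr or document_encoding

    # Step 9: a supported charset from the Content-Type metadata replaces it.
    from_http = charset_from_content_type(content_type)
    if from_http and is_supported(from_http):
        encoding = from_http

    # "Executing a script block": a BOM, if found, wins over everything above.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    return encoding
```

With a body that starts with a UTF-8 BOM and a header of `text/javascript; charset=ISO-8859-1`, this sketch returns `utf-8`, which is the override of the HTTP header that the comment questions.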

Comment 2 Boris Zbarsky 2010-09-20 03:35:56 UTC

Specifically, Gecko's order here is:

1)  HTTP header
2)  @charset
3)  BOM
4)  Linking document

Comment 3 Ian 'Hixie' Hickson 2010-09-26 01:08:08 UTC

The spec's order for scripts is:

1) BOM
2) HTTP header
3) charset=""
4) Linking document

For documents, it is:

1) User
2) HTTP
3) BOM
4) <meta>
5) History
6) autodetect
7) default

I could see an argument for having the script order be:

1) HTTP header
2) BOM
3) charset=""
4) Linking document

...but why would you put charset="" between the HTTP headers and the BOM?
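
To make the difference between the orders discussed so far concrete, here is a small Python sketch that treats each order as a priority list and picks the first source that supplies an encoding. The source names (`http`, `charset_attr`, `bom`, `document`), the `resolve` helper, and the sample values are hypothetical and exist only for illustration.

```python
# Each order is a tuple of source names, highest priority first.
GECKO_ORDER = ("http", "charset_attr", "bom", "document")
SPEC_ORDER = ("bom", "http", "charset_attr", "document")
ALTERNATIVE_ORDER = ("http", "bom", "charset_attr", "document")


def resolve(order, sources):
    """Return the encoding from the first source in `order` that has one."""
    for name in order:
        value = sources.get(name)
        if value:
            return value
    return None


# Hypothetical script where every source disagrees.
all_sources = {
    "http": "iso-8859-1",
    "charset_attr": "shift_jis",
    "bom": "utf-8",
    "document": "windows-1252",
}

# The same script served without a charset in the HTTP header.
no_http = dict(all_sources, http=None)
```

With `all_sources`, the spec order yields `utf-8` while the other two yield `iso-8859-1`. With `no_http`, the Gecko order yields `shift_jis` while the other two yield `utf-8`, which is exactly the charset=""-versus-BOM question raised above.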

Comment 4 Boris Zbarsky 2010-09-26 19:33:10 UTC

> ...but why would you put charset="" between the HTTP headers and the BOM?

Because it's more reliable in the presence of arbitrary encodings?  In particular, two-byte encodings that are not UTF-16 could give false positives for the BOM.
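
One way such a false positive could arise, sketched in Python: the sample bytes below are an assumption chosen for the example (GBK is not named in the thread). They form valid two-byte GBK text, yet they begin with EF BB BF, the same three bytes as the UTF-8 BOM, so a sniffer that checks the BOM before charset="" would pick the wrong encoding.

```python
import codecs

# Hypothetical sample: two valid two-byte GBK characters (EF BB, then BF C0)
# whose first three bytes coincide with the UTF-8 BOM.
data = b"\xef\xbb\xbf\xc0"


def looks_like_utf8_bom(body: bytes) -> bool:
    """Naive BOM check of the kind a sniffer might perform."""
    return body.startswith(codecs.BOM_UTF8)


# Strict decoding as GBK succeeds and gives two characters.
as_gbk = data.decode("gbk")

# Trusting the apparent BOM instead leaves a stray C0 byte that is not
# valid UTF-8, so it is replaced with U+FFFD.
as_utf8 = data.decode("utf-8", errors="replace")
```

Here `looks_like_utf8_bom(data)` is true even though the bytes decode cleanly as GBK, while decoding them as UTF-8 produces a replacement character.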

In any case, the main weirdness I see here is the BOM overriding the HTTP header (and the inconsistency of this with documents and stylesheets).

What do non-Gecko UAs do here?

Comment 5 Ian 'Hixie' Hickson 2010-09-29 00:34:19 UTC

(For the record, I agree that the spec is wrong to have the BOM override HTTP metadata in this case, since it doesn't anywhere else.)

Comment 6 Ian 'Hixie' Hickson 2010-09-29 01:04:04 UTC

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: Upon further investigation I've gone with what Gecko does here.

Comment 7 contributor 2010-09-29 01:05:13 UTC

Checked in as WHATWG revision r5545.
Check-in comment: Match Gecko for character encoding processing for <script>
http://html5.org/tools/web-apps-tracker?from=5544&to=5545