This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Section: http://www.whatwg.org/specs/web-apps/current-work/#script-processing-encoding Comment: The spec says that @charset overrides the HTTP charset and doesn't allow examining the BOM Posted from: 173.48.34.3
Specifically, section 4.3.1 has this to say in step 5 of "running a script": If the script element has a charset attribute, then let the script block's character encoding for this script element be the encoding given by the charset attribute. Otherwise, let the script block's character encoding for this script element be the same as the encoding of the document itself. Then in step 9 we have: Once the resource's Content Type metadata is available, if it ever is, apply the algorithm for extracting an encoding from a Content-Type to it. If this returns an encoding, and the user agent supports that encoding, then let the script block's character encoding be that encoding. So far so good. But then step 10 says to fetch the script and execute the script block, and the "executing a script block" stuff has a "If the script is from an external file and the script block's type is a text-based language" section which says to examine the BOM and use the resulting value if a BOM is found, no matter what the HTTP headers said. That's certainly not what Gecko does. Does some other browser do this? Is it what we want to happen here?
Specifically, Gecko's order here is: 1) HTTP header 2) @charset 3) BOM 4) Linking document
The spec's order for scripts is: 1) BOM 2) HTTP header 3) charset="" 4) Linking document For documents, it is: 1) User 2) HTTP 3) BOM 4) <meta> 5) History 6) autodetect 7) default I could see an argument for having the script order be: 1) HTTP header 2) BOM 3) charset="" 4) Linking document ...but why would you put charset="" between the HTTP headers and the BOM?
> ...but why would you put charset="" between the HTTP headers and the BOM? Because it's more reliably in the presence of arbitrary encodings? In particular, two-byte encodings that are not UTF-16 could give false positives for the BOM. In any case, the main weirdness I see here is the BOM overriding the HTTP header (and the inconsistency of this with documents and stylesheets). What do non-Gecko UAs do here?
(For the record, I agree that the spec is wrong to have the BOM override HTTP metadata in this case, since it doesn't anywhere else.)
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document: http://dev.w3.org/html5/decision-policy/decision-policy.html Status: Accepted Change Description: see diff given below Rationale: Upon further investigation I've gone with what Gecko does here.
Checked in as WHATWG revision r5545. Check-in comment: Match Gecko for character encoding processing for <script> http://html5.org/tools/web-apps-tracker?from=5544&to=5545