This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
http://mimesniff.spec.whatwg.org/#determining-the-sniffed-media-type-of-a-resource
The MIME Sniffing standard isn't clear on whether the media type sniffing algorithm should preserve the parameters of the supplied media type when it returns a new media type. This is particularly relevant for the charset parameter: as written, the algorithm might throw away the charset (and other parameters) from the supplied media type whenever it returns a new media type.
As far as I can tell, decoding is distinct from sniffing. Sniffing happens first. Then decoding happens, and the decoding layer takes its own look at the HTTP headers that came with the resource. (Obviously, implementations can optimize this if desired.)
(In reply to comment #0)
> http://mimesniff.spec.whatwg.org/#determining-the-sniffed-media-type-of-a-resource
> The MIME Sniffing standard isn't clear on whether the media type sniffing algorithm should preserve parameters in the supplied media type when it returns a new media type. This is particularly relevant for the charset parameter, since the way it's written, the algorithm might throw away the charset (and other parameters) from the supplied media type if it returns a new media type.
Do you have a specific example where the charset parameter (or any other parameter) might continue to apply when the sniffed MIME type is not the same as the supplied MIME type? (I'm just trying to get a handle on the situation.)
(In reply to comment #2)
> Do you have a specific example where the charset parameter (or any other parameter) might continue to apply when the sniffed MIME type is not the same as the supplied MIME type? (I'm just trying to get a handle on the situation.)
Assume that the supplied MIME type is "text/html;charset=windows-1252" and the no-sniff flag is not set. By step 4 of section 9, the algorithm for distinguishing if a resource is a feed or HTML is run. But that algorithm can return only "text/html" or "application/rss+xml" (with no parameters) as the sniffed MIME type.
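The scenario in the comment above can be sketched in a few lines of Python. This is only an illustration: `parse_media_type` is a simplified, hypothetical parser (not the standard's parsing algorithm), and `sniff_feed_or_html_lossy` is a stand-in that merely mimics the behaviour described, returning a bare type regardless of the supplied parameters.

```python
def parse_media_type(value):
    """Split a media type string into (essence, parameters).

    Simplified for illustration; not the standard's parsing algorithm.
    """
    essence, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        if "=" in raw:
            name, _, val = raw.partition("=")
            params[name.strip().lower()] = val.strip().strip('"')
    return essence.lower(), params


def sniff_feed_or_html_lossy(supplied, looks_like_feed):
    """Hypothetical stand-in for the behaviour described in the comment:
    the result is always a bare type, so supplied parameters are dropped."""
    if looks_like_feed:
        return "application/rss+xml"
    return "text/html"


supplied = "text/html;charset=windows-1252"
sniffed = sniff_feed_or_html_lossy(supplied, looks_like_feed=False)

supplied_essence, supplied_params = parse_media_type(supplied)
sniffed_essence, sniffed_params = parse_media_type(sniffed)
```

Here the type portion is unchanged, yet `sniffed_params` is empty while `supplied_params` still holds the charset, which is the loss being reported.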
(In reply to comment #3)
> (In reply to comment #2)
> > Do you have a specific example where the charset parameter (or any other parameter) might continue to apply when the sniffed MIME type is not the same as the supplied MIME type? (I'm just trying to get a handle on the situation.)
> Assume that the supplied MIME type is "text/html;charset=windows-1252" and the no-sniff flag is not set. By step 4 of section 9, the algorithm for distinguishing if a resource is a feed or HTML is run. But that algorithm can return only "text/html" or "application/rss+xml" (with no parameters) as the sniffed MIME type.
Ah, the algorithm should probably return the supplied MIME type instead of "text/html" in all cases where it appears. I'll have to update it. For the cases where it is actually an RSS or Atom file, don't those formats have another way to determine the charset? Do you think it would be worthwhile to preserve the charset parameter in those cases? Do you know of any other areas where the charset parameter would be significant? Or of any other parameters that have to be handled specially? (Do only "text" MIME types make use of the charset parameter?)
I've fixed the specific issue with the rules for distinguishing if a resource is a feed or HTML: https://github.com/whatwg/mimesniff/commit/97dc9eb3f36f2fd3a10909327dbcc00fddfdb846
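As a rough sketch of the direction proposed earlier in this thread (return the supplied MIME type wherever the algorithm would otherwise return "text/html"), the change might look like the following. The function name and its arguments are hypothetical, and this is based only on the discussion above, not on the text of the linked commit.

```python
def sniff_feed_or_html(supplied, detected_feed_type=None):
    """Hypothetical sketch: keep the supplied type (with its parameters)
    when the resource is not identified as a feed.

    detected_feed_type is assumed to be a bare feed type such as
    "application/rss+xml", or None when no feed markers were found.
    """
    if detected_feed_type is not None:
        # Feed case: still a bare type in this sketch; whether to carry
        # the charset over is the open question in the thread.
        return detected_feed_type
    # Non-feed case: return the supplied type unchanged.
    return supplied
```

With this shape, "text/html;charset=windows-1252" comes back unchanged when no feed is detected, so the charset survives sniffing.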
In reply to Comment 4: Suppose the supplied MIME type is "audio/ogg;codecs=vorbis" and the user agent supports that MIME type. Suppose further that the resource header starts with "OggS" followed by null. In this case, the sniffed MIME type returned will be "audio/ogg" with no parameters, even if the Ogg file doesn't use Vorbis audio.
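The signature check mentioned here ("OggS" followed by a null byte) can be illustrated with a minimal sketch. The function name is hypothetical; the returned type follows a later comment in this thread, which states that the algorithm returns "application/ogg" rather than "audio/ogg". Either way, the point stands that the result carries no parameters.

```python
# "OggS" followed by a null byte, as described in the comment above.
OGG_SIGNATURE = b"OggS\x00"


def sniff_ogg(header, supplied="audio/ogg;codecs=vorbis"):
    """Hypothetical sketch: match the Ogg signature at the start of the
    resource header. The supplied type is accepted but deliberately
    unused, to show that its parameters play no part in the result."""
    if header.startswith(OGG_SIGNATURE):
        return "application/ogg"
    return None
```

Because only the leading bytes are inspected, the `codecs` parameter of the supplied type is neither verified nor carried over.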
This issue also occurs if the supplied MIME type is "*/*", "application/unknown", or "unknown/unknown", or if the supplied MIME type is an archive, image, audio, or video type. In all these cases, the parameters, if they have any, are discarded when a different MIME type is sniffed. (Special rules also apply to "text/plain" in certain cases, but since these rules apply only because of a bug in older versions of Apache Web Server, I'm not sure if the parameters should be overridden in these cases.)
(In reply to comment #7)
> I'm not sure if the parameters should be overridden in these cases.)
I mean "I'm not sure if the supplied media type's parameters should be kept in those cases."
(In reply to comment #4)
> For the cases where it is actually an RSS or Atom file, don't those formats have another way to determine the charset? Do you think it would be worthwhile to preserve the charset parameter in those cases?
Both formats use a byte order mark and an encoding declaration to help XML processors determine the encoding. However, merely saying the sniffed MIME type is "application/rss+xml" or "application/atom+xml" with no parameters, regardless of what the supplied MIME type says, would violate RFC 3023 (XML Media Types); see sections 3.6, 8.6, 8.7, and 8.8.
> regardless of what the supplied MIME type says
I mean, "even though the supplied media type may have parameters of its own".
(In reply to comment #6)
> In reply to Comment 4:
> Suppose the supplied MIME type is "audio/ogg;codecs=vorbis" and the user agent supports that MIME type. Suppose further that the resource header starts with "OggS" followed by null. In this case, the sniffed MIME type returned will be "audio/ogg" with no parameters, even if the Ogg file doesn't use Vorbis audio.
Actually, I think it would return "application/ogg" (with no parameters). I recall inquiring about whether I should do any further Ogg parsing to determine more details and being told that the browser could do that after it had determined that it was an Ogg file. So I'm not sure what to do in this particular situation.
(In reply to comment #7)
> This issue also occurs if the supplied MIME type is "*/*", "application/unknown", or "unknown/unknown", or if the supplied MIME type is an archive, image, audio, or video type. In all these cases, the parameters, if they have any, are discarded when a different MIME type is sniffed. (Special rules also apply to "text/plain" in certain cases, but since these rules apply only because of a bug in older versions of Apache Web Server, I'm not sure if the supplied media type's parameters should be kept in those cases.)
I think ignoring parameters associated with an explicitly unknown MIME type is probably the right thing to do, as it is for the specific cases that work around the Apache bug. The specific category types, though, are more debatable: it's difficult to determine when keeping the parameters would be useful and when it would just be confusing. (Maybe only keep them if the MIME type portion is the same? It's hard to say.)
(In reply to comment #9)
> (In reply to comment #4)
> > For the cases where it is actually an RSS or Atom file, don't those formats have other way to determine the charset? Do you think it would be worthwhile to preserve the charset parameter in those cases?
> Both formats use a byte order mark and an encoding declaration to help XML processors determine the encoding. However, merely saying the sniffed MIME type is "application/rss+xml" or "application/atom+xml" with no parameters, even though the supplied media type may have parameters of its own, would violate RFC3023 (XML Media Types); see sections 3.6, 8.6, 8.7, and 8.8.
Ah, good point. I'll have to do more to handle the charset parameter all around. Is there anything else that I have to consider or worry about when dealing with parameters?
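The tentative policy floated in this comment (ignore parameters for explicitly unknown types, and perhaps keep them only when the MIME type portion is unchanged) could be expressed roughly as follows. This is a sketch of one possible reading of the discussion, not what the standard specifies; all names are hypothetical.

```python
# The explicitly unknown types listed earlier in the thread.
UNKNOWN_TYPES = {"*/*", "application/unknown", "unknown/unknown"}


def split_media_type(value):
    """Split 'type/subtype;params' into (essence, raw parameter string)."""
    essence, _, params = value.partition(";")
    return essence.strip().lower(), params.strip()


def resolve_sniffed_type(supplied, sniffed_essence):
    """Hypothetical policy: keep the supplied parameters only when the
    sniffed type portion equals the supplied one; otherwise drop them."""
    supplied_essence, supplied_params = split_media_type(supplied)
    sniffed_essence = sniffed_essence.strip().lower()
    if supplied_essence in UNKNOWN_TYPES:
        # Parameters attached to an unknown type are ignored.
        return sniffed_essence
    if supplied_essence == sniffed_essence and supplied_params:
        return f"{sniffed_essence};{supplied_params}"
    return sniffed_essence
```

Under this reading, "text/html;charset=windows-1252" keeps its charset when the sniffed type is still "text/html", while "audio/ogg;codecs=vorbis" sniffed as "application/ogg" loses its `codecs` parameter. The feed case (RSS or Atom with a supplied charset) would need separate handling, as the RFC 3023 point above suggests.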
A lot of work has been done on these points. If there are any other unaddressed issues, please raise them as GitHub issues.