Copyright © 2002 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
Internet Media Types are an important part of the Web architecture. This finding discusses three aspects of Internet Media Types: registration by W3C Working Groups, consistency between Internet Media Type and content, and consistency in the communication of character encoding information.
This document has been produced by the W3C Technical Architecture Group (TAG). This version includes changes that have not yet been approved by the TAG regarding (1) registration requirements and (2) charset header information.
The TAG approved the previous draft of this finding at its 3 June 2002 teleconference. The TAG originally reached consensus on this issue at its 28 Jan 2002 teleconference and, after its 20 May 2002 teleconference, announced it to www-tag. The TAG notes that Tantek Çelik expressed dissent about this finding. At its 16 Dec 2002 teleconference, the TAG agreed to add a publication date to this document, consistent with the TAG's expectation that findings no longer be modified in place.
These findings were derived from discussion of TAG issues w3cMediaType-1, customMediaType-2, and nsMediaType-3 but in some cases extend beyond the specifics of the issue that was raised.
Additional TAG findings, both approved and in draft state, may also be available. The TAG expects to incorporate this and other findings into a Web Architecture Document that will be published according to the process of the W3C Recommendation Track.
The terms MUST, SHOULD, and SHOULD NOT are used in this document in accordance with RFC 2119 [RFC2119].
Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).
W3C Working Groups engaged in defining a language SHOULD arrange for the registration of an Internet Media Type (defined in RFC 2046 [RFC2046]) for that language; see [IANAREG] for registration instructions. The IETF registration forms MUST be available for review along with the specification no later than Candidate Recommendation (or at last call if the Working Group expects to advance directly to Proposed Recommendation). The IETF registration forms SHOULD be available for review no later than last call.
The conventions and framework established by RFC 3023 [RFC3023] SHOULD be followed when registering an Internet Media Type for a language that uses XML syntax.
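As an illustration of the RFC 3023 conventions, the following minimal sketch shows how a recipient might recognize XML-based media types by the "+xml" suffix or by one of the XML types that RFC 3023 registers. The function name `is_xml_media_type` is hypothetical and is not taken from any specification.

```python
# Media types registered by RFC 3023 itself.
RFC3023_XML_TYPES = {
    "text/xml",
    "application/xml",
    "text/xml-external-parsed-entity",
    "application/xml-external-parsed-entity",
    "application/xml-dtd",
}


def is_xml_media_type(content_type: str) -> bool:
    """Return True if the media type is XML-based per RFC 3023 conventions.

    Parameters such as charset are ignored; only type/subtype is examined.
    """
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence in RFC3023_XML_TYPES or essence.endswith("+xml")
```

A type such as image/svg+xml is recognized through its suffix alone, which is what allows generic XML processing of languages registered under this convention.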
The architecture of the Web depends on applications making dispatching and security decisions for resources based on their Internet Media Types and other MIME headers. It is a serious error for the response body to be inconsistent with the assertions made about it by the MIME headers. Web software SHOULD NOT attempt to recover from such errors by guessing, but SHOULD report the error to the user to allow intelligent corrective action.
An example of incorrect and dangerous behavior is a user-agent that reads some part of the body of a response and decides to treat it as HTML based on its containing a <!DOCTYPE declaration or <title> tag, when it was served as text/plain or some other non-HTML type.
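By contrast, a recipient that respects the declared media type might look like the following minimal sketch. It selects a handler from the Content-Type header alone and reports an error for a type it does not know, without inspecting the body. The handler names, the `HANDLERS` table, and the `UnsupportedMediaType` exception are hypothetical and serve only as illustration.

```python
from email.message import Message

# Hypothetical dispatch table: declared media type -> handler name.
HANDLERS = {
    "text/plain": "render_as_text",
    "text/html": "render_as_html",
}


class UnsupportedMediaType(Exception):
    """Raised so the error can be reported to the user instead of guessed around."""


def declared_media_type(content_type_header: str) -> str:
    """Extract the lowercase type/subtype from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


def dispatch(content_type_header: str, body: bytes) -> str:
    """Choose a handler from the headers only; the body is never sniffed."""
    media_type = declared_media_type(content_type_header)
    try:
        return HANDLERS[media_type]
    except KeyError:
        raise UnsupportedMediaType(media_type) from None
```

With this approach, a body that begins with <!DOCTYPE but is served as text/plain is rendered as plain text, because the decision never depends on the content.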
Examples of such inconsistencies that have been observed on the Web include:
charset parameter in the message headers. See SVG diagram for determining character encoding.
The first example in the preceding section is a particularly troublesome case. Section 7.1 of [RFC3023] states:
The use of the charset parameter is STRONGLY RECOMMENDED, since this information can be used by XML processors to determine authoritatively the charset of the XML MIME entity.
and states that, when used, it is always authoritative. However, a receiving application can, with very high reliability, determine the encoding of an XML document by reading it, without reference to any external headers; this is reflected by RFC 3023 in the following sections:
Thus there is no ambiguity when the charset is omitted, and the STRONGLY RECOMMENDED injunction to use the charset is misplaced for application/xml and for non-text "+xml" types. Consequently, for XML representations, server-side applications SHOULD only supply a charset header when there is complete certainty as to the encoding in use. Otherwise, an error will cause a perfectly usable representation to be rejected by an architecturally sound client.
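The following minimal sketch illustrates both halves of this argument from the client side: the encoding of an XML entity can usually be read from the entity itself, and a charset parameter that contradicts it causes the representation to be rejected. The sketch is deliberately simplified. It assumes an ASCII-compatible encoding declaration or a UTF-8/UTF-16 byte order mark, and it does not implement the full detection procedure of the XML specification (for example, UTF-32 or UTF-16 without a byte order mark). The names `xml_encoding_from_body`, `effective_charset`, and `CharsetMismatch` are hypothetical.

```python
import codecs
import re

# Byte order marks recognized by this simplified sketch.
_BOMS = [(b"\xef\xbb\xbf", "utf-8"), (b"\xff\xfe", "utf-16"), (b"\xfe\xff", "utf-16")]

# Encoding declaration inside an XML declaration at the very start of the entity.
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


class CharsetMismatch(Exception):
    """The charset parameter contradicts what the XML entity says about itself."""


def xml_encoding_from_body(body: bytes) -> str:
    """Determine the encoding from the entity alone (simplified)."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    match = _ENCODING_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML default when there is no BOM and no declaration


def effective_charset(header_charset, body: bytes) -> str:
    """Use the charset parameter if present, but reject an inconsistent message."""
    detected = xml_encoding_from_body(body)
    if header_charset is None:
        return detected  # charset omitted: the entity is self-describing
    # codecs.lookup normalizes aliases such as "utf8" and "UTF-8".
    if codecs.lookup(header_charset).name != codecs.lookup(detected).name:
        raise CharsetMismatch(f"header says {header_charset}, entity says {detected}")
    return header_charset.lower()
```

When the parameter is omitted, the client loses nothing; when it is supplied and wrong, an otherwise usable document is rejected, which is the failure the recommendation above is meant to avoid.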
We recommend that section 7.1 of [RFC3023] be amended to something like the following:
The use of the charset parameter, when the charset is reliably known and agrees with the encoding declaration, is RECOMMENDED, since this information can be used by non-XML processors to determine authoritatively the charset of the XML MIME entity.
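On the server side, the proposed wording could be applied roughly as in the following minimal sketch, which emits a charset parameter only when the server has been told the encoding and that encoding agrees with the document's encoding declaration. The function name `content_type_for_xml` and its parameters are hypothetical. The sketch also makes two simplifying assumptions: a missing declaration is treated as UTF-8, and encoding names are compared as case-insensitive strings without alias resolution.

```python
import re

# Encoding declaration inside an XML declaration at the very start of the entity.
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def content_type_for_xml(body: bytes, known_charset=None,
                         media_type: str = "application/xml") -> str:
    """Build a Content-Type value, adding charset only when it is safe to do so.

    known_charset is the encoding the server is certain of, or None if unknown.
    """
    if known_charset is None:
        return media_type  # not reliably known: say nothing
    match = _ENCODING_DECL.match(body)
    # Assumption: no declaration means the XML default of UTF-8.
    declared = match.group(1).decode("ascii") if match else "utf-8"
    if declared.lower() != known_charset.lower():
        return media_type  # disagreement: omit rather than assert a wrong charset
    return f"{media_type}; charset={known_charset}"
```

Omitting the parameter in the doubtful cases leaves XML-aware recipients to read the encoding from the entity, while non-XML processors still receive the parameter whenever it can be trusted.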
Last modified: 2002/12/17 13:06:11 by ijacobs. Revision: 1.33