5) Section 4.3.6.1

I find no discussion of, or reference to, the issue of character encodings in the document. Transforming content across different charsets is a minefield and affects a number of aspects:

a) Content may rely upon widely different character encodings, depending on the targeted devices and markets. In particular, the China - Japan - Korea (CJK) trio continues to rely on a number of encodings (such as Shift_JIS, Big5, etc.) whose handling is a complex matter; for instance, there are not necessarily bijective mappings between these encodings and others, including UTF-8.

b) Documents may have multi-encoding representations. Different encodings may be associated with external entities through the charset attribute (see HTML 4.01). How transformation proxies deal with such a situation is left undefined.

c) Similarly, the draft does not explain what happens when a server associates an accept-charset attribute with a form, and whether proxies respect or manipulate such information.

d) In i-Mode, and at least in the Softbank environment (Japan), unreserved code points in the character encoding space are used to represent pictograms. Any attempt to convert these characters directly will fail; they should therefore not be transformed, but preserved, taking into account the fact that the code points thus referred to differ between Unicode and Shift_JIS, and that DoCoMo and Softbank do not use the same code points for the same pictograms.

A consequence of all this is that if a proxy does not operate natively with the character encoding of the content returned by the server, or is not able to ensure a bijective mapping between this encoding and the other encodings it deals with, recurrent and irrecoverable problems will creep in.
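To illustrate the bijectivity concern, here is a minimal Python sketch of a check a proxy could run before touching a response body. It is an illustration only, not part of the draft or of any existing proxy: the function name `survives_round_trip` and the choice of UTF-8 as the proxy's internal working encoding are assumptions made for the example. It relies solely on Python's built-in codecs, whose mapping tables may differ from those used by a given operator or handset.

```python
def survives_round_trip(raw: bytes, source: str, working: str = "utf-8") -> bool:
    """Return True if `raw` can pass through the working encoding unchanged.

    Hypothetical helper: decodes the server's bytes, pushes the text through
    the proxy's working encoding, and re-encodes to the original charset.
    Any failure or byte-level difference means the mapping is not lossless
    for this particular content.
    """
    try:
        text = raw.decode(source)                        # server bytes -> text
        via_working = text.encode(working).decode(working)
        return via_working.encode(source) == raw         # back to server charset
    except (UnicodeError, LookupError):
        # UnicodeError: a character has no mapping in one direction.
        # LookupError: the charset label is unknown to this runtime.
        return False


if __name__ == "__main__":
    sjis_page = "日本語".encode("shift_jis")
    print(survives_round_trip(sjis_page, "shift_jis"))             # True
    print(survives_round_trip("日本語".encode("utf-8"), "utf-8",
                              working="latin-1"))                   # False
```

A check of this kind only covers the bytes actually present in one response. It says nothing about pictograms carried in operator-specific code points, which would need operator-specific tables.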
A simple measure that would go some way towards alleviating this risk is to forbid any transformation if the server announces (via the charset parameter of the HTTP Content-Type header field, the XML declaration, or a meta tag) an encoding other than ASCII or perhaps UTF-8.
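The proposed rule could be sketched as follows in Python. This is a rough illustration under stated assumptions, not a proposal for normative text: the names `declared_charsets`, `may_transform` and `SAFE_CHARSETS` are hypothetical, the regular expressions are deliberately simplistic (a real proxy would use a proper header and markup parser), and treating a response with no declared charset as transformable is an assumption of this sketch.

```python
import codecs
import re
from typing import List, Optional

# Encodings for which transformation stays allowed (canonical codec names).
SAFE_CHARSETS = {"ascii", "utf-8"}

# Simplistic patterns for the three places a charset may be announced.
_HEADER_RE = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)
_XML_RE = re.compile(r'<\?xml[^>]*encoding\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
_META_RE = re.compile(r'<meta[^>]*charset\s*=\s*["\']?([^\s"\'/>;]+)', re.IGNORECASE)


def declared_charsets(content_type: Optional[str], body: bytes) -> List[str]:
    """Collect charset labels from the header, XML declaration and meta tag."""
    # latin-1 maps every byte to a character, so this decode cannot fail.
    head = body[:1024].decode("latin-1")
    found = []
    if content_type:
        match = _HEADER_RE.search(content_type)
        if match:
            found.append(match.group(1))
    for pattern in (_XML_RE, _META_RE):
        match = pattern.search(head)
        if match:
            found.append(match.group(1))
    return found


def may_transform(content_type: Optional[str], body: bytes) -> bool:
    """Return False as soon as any announced charset is not ASCII or UTF-8."""
    for label in declared_charsets(content_type, body):
        try:
            name = codecs.lookup(label).name   # normalises aliases such as US-ASCII
        except LookupError:
            return False                       # unknown label: do not transform
        if name not in SAFE_CHARSETS:
            return False
    return True                                # assumption: nothing declared is allowed
```

For example, `may_transform("text/html; charset=Shift_JIS", b"<html>")` returns `False`, while the same call with `charset=UTF-8` returns `True`.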