IRC log of i18n on 2007-11-09

Timestamps are in UTC.

16:13:30 [RRSAgent]
RRSAgent has joined #i18n
16:13:30 [RRSAgent]
logging to http://www.w3.org/2007/11/09-i18n-irc
16:13:37 [amit]
amit has joined #i18n
16:13:44 [aphillip_]
http://www.w3.org/html/wg/html5/#determining0
16:13:49 [anne]
http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html
16:13:56 [aphillip_]
http://lists.w3.org/Archives/Public/public-i18n-core/2007OctDec/0088.html
16:13:58 [fantasai]
16:13 -!- Irssi: Join to #i18n was synced in 0 secs
16:13:58 [fantasai]
16:13 < Hixie> http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html#parsing
16:14:01 [fantasai]
16:13 < Hixie> http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html#the-input0
16:14:57 [fantasai]
ScribeNick: fantasai
16:15:11 [fantasai]
Addison: THere was a badly-titled thread saying something about making windows-1252 the default encoding.
16:15:29 [fantasai]
Addison: Our first reaction was, wouldn't it be nice if that were something else, say utf-8
16:15:47 [fsasaki]
s/THere/There/
16:15:56 [fantasai]
Addison: At the same time we recognize that there's a legacy encoding issue, since previous versions of HTML required iso-????
16:16:14 [apppp]
apppp has joined #i18n
16:16:16 [hsivonen]
hsivonen has joined #i18n
16:16:23 [hsivonen]
http://hsivonen.iki.fi/charmod-checking/
16:16:28 [hsivonen]
http://hsivonen.iki.fi/charmod-norm-checking/
16:16:51 [smedero]
smedero has joined #i18n
16:16:56 [fantasai]
Addison: If you actually look at the sections, 8.2 and ....
16:17:09 [fantasai]
Addison: It does not in fact say that the default encoding of the universe at large is windows-1252
16:17:23 [fantasai]
Addison: In the sequence there's looking at byte sequences, then using heuristics, etc.
16:17:33 [Philip]
Philip has joined #i18n
16:17:37 [fantasai]
Addison: at the end of that sequence there's a paragraph that says
16:17:52 [MikeSmith]
MikeSmith has joined #i18n
16:17:55 [fantasai]
Addison: if all else fails, you have to supply some implementation-defined default and we recommend you do these things.
16:18:21 [fantasai]
Addison: And windows-1252 just appears out of nowhere.
16:18:48 [fantasai]
Addison: One thought we had was for us to provide some information on why windows-1252 is preferable and how it differs from the standard ISO encodings.
16:19:21 [plh]
plh has joined #i18n
16:19:27 [Hixie]
"
16:19:28 [Hixie]
When a user agent would otherwise use the ISO-8859-1 encoding, it must instead use the Windows-1252 encoding."
16:19:46 [fantasai]
Henri: that part is a violation of charmod
16:19:54 [fantasai]
Addison doesn't consider that a violation of charmod
16:20:35 [fantasai]
Addison: There are superset encodings and they're often tagged with the subset encodings.
16:21:13 [fantasai]
Addison: using the superset interpretation doesn't conflict with using the subset interpretation
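(Editorial illustration of the superset point above: a minimal Python sketch, assuming Python's built-in "latin-1" and "cp1252" codecs as stand-ins for ISO-8859-1 and windows-1252. The function name differing_bytes is hypothetical. It lists the byte values the two encodings decode differently; only the 0x80-0x9F range differs, where ISO-8859-1 has C1 control characters and windows-1252 mostly has printable characters such as curly quotes.)

```python
def differing_bytes():
    """Return the byte values that latin-1 and cp1252 decode differently."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("latin-1")
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            # A few bytes (e.g. 0x81) are unassigned in Python's cp1252 codec.
            cp1252 = None
        if latin1 != cp1252:
            diffs.append(b)
    return diffs


if __name__ == "__main__":
    diffs = differing_bytes()
    print(f"{len(diffs)} differing bytes, from {min(diffs):#x} to {max(diffs):#x}")
    # Text labelled ISO-8859-1 that contains these bytes is usually
    # windows-1252 in practice, e.g. curly quotes:
    print(b"\x93quoted\x94".decode("cp1252"))
```

(Every other byte value decodes identically, which is why reading ISO-8859-1-labelled content as windows-1252 does not change the result for content that stays within the printable ISO-8859-1 range.)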
16:21:31 [fantasai]
Addison: We're not proposing a substantive change, just providing more justification for what you're doing.
16:21:48 [fantasai]
Addison: We also looked at the structure of the paragraph, and had some concerns.
16:21:56 [fantasai]
Addison: one was the phrasing of "western demographics" etc
16:22:09 [fantasai]
Addison: We had several reactions.
16:22:22 [fantasai]
Addison: One, it's not clear what a western demographic is and how you tell when you're talking to one on the internet.
16:22:36 [fantasai]
Addison: We proposed 2 things, one of which was to turn two things around.
16:22:51 [fantasai]
Addison: We have a love of utf-8, and we'd like you to mention that one first and then the legacy thing
16:22:51 [amit]
amit has joined #i18n
16:23:13 [fantasai]
Addison: We also think the wording could be changed somewhat on the windows-1252 to say that "in a legacy context, if you have to guess, you should guess this one"
16:23:59 [fantasai]
Ian: I haven't gotten to that issue yet, haven't looked at it in detail, sounds ok
16:24:07 [fantasai]
Richard: Is it purely editorial?
16:24:18 [fantasai]
Addison: It doesn't change the result, it just changes how you explain the result.
16:24:32 [fantasai]
Ian: Do you have any recommendation for dealing with say Japan and other parts of East Asia?
16:25:15 [fantasai]
Addison: There are a variety of things in step #7 that allow for various heuristics and sniffing.
16:25:36 [fantasai]
Ian: windows-1252 is fine for US and UK, but what about other places?
16:25:41 [fantasai]
Felix: Depends on what device.
16:26:18 [fantasai]
Addison: Most implementations use information in the browser, e.g. what the browser uses or if a narrower auto-detect is set (as for Japanese)
16:26:44 [fantasai]
Ian: So in the Japanese cases, you expect that the rest of the steps would take care of it?
16:26:57 [fantasai]
Addison: I think you'd trap those encodings before you get to step 7(?)
16:27:30 [fantasai]
Addison: Might want to mention that in some cases, when you get a subset encoding, you should use the superset encoding.
16:27:38 [fantasai]
Addison: I think we can provide that information.
16:28:07 [fantasai]
Ian: I believe when I wrote that section that I checked a browser and that was the only mapping they had.
16:28:24 [fantasai]
Addison: Most browsers don't just do GBK, but do ????
16:28:46 [fantasai]
Addison: There are some cases, such as in Japan, where the byte patterns are completely different.
16:28:57 [fantasai]
Addison: where the encoding schemes are different even though the charset is the same
16:29:06 [fantasai]
Addison: that kind of autodetection is a separate thing
16:29:11 [fantasai]
Addison: I think this is still valid.
16:29:36 [fantasai]
Addison: The only question I have is, if you're thinking "what should happen in step 7" is some language-dependent or context-dependent thing ...
16:30:01 [fantasai]
Hixie: In this final step, you don't have any information from the content
16:30:19 [fantasai]
Addison: You might want to think about splitting step 7 and doing a utf-8 detection first
16:30:50 [fantasai]
Addison: UTF-8 has recognizable byte patterns, it would be great to put that first before saying "use your favorite legacy encoding"
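(Editorial illustration of the "UTF-8 first" suggestion above: a minimal Python sketch, not the algorithm in the HTML5 draft. The names looks_like_utf8 and guess_encoding are hypothetical, and using a strict decode as the detector, with windows-1252 as the legacy fallback, is an assumption made for the example.)

```python
def looks_like_utf8(data: bytes) -> bool:
    """True if the bytes form a well-formed UTF-8 sequence.

    Pure ASCII also passes, since ASCII is a subset of UTF-8.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def guess_encoding(data: bytes, legacy_default: str = "windows-1252") -> str:
    """Try UTF-8 first, then fall back to an implementation-defined default."""
    return "utf-8" if looks_like_utf8(data) else legacy_default


if __name__ == "__main__":
    print(guess_encoding("café".encode("utf-8")))    # bytes are valid UTF-8
    print(guess_encoding("café".encode("latin-1")))  # lone 0xE9 is not valid UTF-8
```

(Legacy single-byte text with non-ASCII characters rarely happens to be well-formed UTF-8, which is what makes this check useful before falling back to a legacy default.)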
16:31:13 [fantasai]
Hixie: The concern is what happens if the user enters some bytes into the form and then submits it?
16:31:24 [fantasai]
Addison: We were just looking at that in the i18n working group
16:31:44 [fantasai]
Hixie: We'd have to make sure that that's what the server was expecting.
16:31:54 [fantasai]
Felix asks a question
16:32:11 [fantasai]
Hixie: Typically different localizations of the browser have different default encodings.
16:32:41 [fsasaki]
s/asks a question/what information are you looking at to guess what encoding the user applies?/
16:33:21 [fantasai]
Hixie: well, the email's in my pile. I don't know when I'll get to it.
16:33:41 [fantasai]
Addison: We'll look at superset encodings and try to write up a document that you can reference.
16:34:13 [fantasai]
Introductions
16:34:21 [fantasai]
Richard Ishida: W3C Internationalization Lead
16:34:28 [fantasai]
Anne van Kesteren: Opera Software
16:34:42 [fantasai]
Elika: fantasai, CSSWG Invited Expert, works on international text layout
16:34:56 [fantasai]
Addison Phillips: Yahoo, i18n wg
16:35:13 [fantasai]
Amit Parashar: something-or-other chair
16:35:23 [fantasai]
Henri Sivonen: working on HTML5 conformance checker
16:35:27 [fantasai]
Ian Hickson: HTML5 editor
16:35:51 [fantasai]
Felix Sasaki: i18n Core, i18n ITS and Web Services Policy WG [W3C]
16:36:32 [plh]
Philippe Le Hegaret: W3C, Architecture Domain (XML, Web Services, i18n), and Video
16:36:49 [fantasai]
Ishida: Can you explain the alt text issue?
16:36:50 [najib]
Najib Tounsi, W3C Morocco Office Mgr.
16:37:08 [fantasai]
Ishida: We believe that you should never put human-readable text in an attribute value because you can't put markup in it
16:37:30 [fantasai]
Ishida: which is important for various i18n reasons: bidi, language annotation, ruby, etc.
16:38:00 [fantasai]
Hixie: We still have the <img> element; we can't get rid of it. It still has alt attr, because it's had that.
16:38:12 [fantasai]
Hixie: We can't give it content because HTML parsers all close it right after the start tag.
16:38:25 [fantasai]
Hixie: We also have the <object> tag, which has full fallback capabilities.
16:38:37 [fantasai]
Ishida: Would the group advise the <object> tag then?
16:39:08 [fantasai]
Hixie: I don't think we'll have a recommendation one way or another; if your fallback content needs element content, then you'll have to use <object>
16:39:27 [fantasai]
Hixie: We've been doing some work, e.g. Acid2, on making sure the <object> tag works properly in various browsers.
16:39:49 [fantasai]
Ishida asks about some XHTML2 stuff
16:40:12 [fantasai]
Hixie: The XHTML2 group did two things, one was switching some attributes into elements, e.g. title attributes.
16:40:27 [fantasai]
Hixie: Then they also went and started using RDF for everything: we are certainly not going to do that.
16:40:46 [fantasai]
Hixie: For the first one, I'm not convinced that the benefits of using an element for these things outweigh the costs
16:41:06 [fantasai]
Hixie: We can try not to do things like that in the future though
16:41:33 [fantasai]
Hixie: This problem comes up in many places, e.g. in DOM APIs that take a string.
16:41:49 [fantasai]
Hixie: There are also places where we can't make such changes, such as the <title> element
16:42:03 [fantasai]
Hixie: whose content winds up in places like filenames where you can't have structured markup anyway
16:42:29 [fantasai]
Ishida: Can you use bidi in filenames?
16:43:05 [fantasai]
Hixie: probably, but I'm not going to recommend it
16:44:17 [fantasai]
Ishida: We might need to start thinking about how to convert text from markup to strings with bidi control characters.
16:44:35 [anne]
(I think HTML 5 should get &rlo;, &lro;, and &pdf; (or something in that direction) for BiDi. These are already in IE.)
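(Editorial illustration of converting directional markup to plain strings with bidi control characters: a minimal Python sketch. The code points are the standard Unicode bidi formatting characters; the function name to_plaintext_bidi and its parameters are hypothetical. It treats a dir attribute as an embedding, LRE/RLE, and a bdo-style override as LRO/RLO, each closed with PDF.)

```python
# Unicode bidi formatting characters (explicit embeddings and overrides).
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE


def to_plaintext_bidi(text: str, direction: str, override: bool = False) -> str:
    """Wrap text in bidi controls so its direction survives without markup.

    direction is "ltr" or "rtl"; override=True mimics <bdo dir=...>,
    override=False mimics an element carrying a dir attribute.
    """
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    if override:
        opener = RLO if direction == "rtl" else LRO
    else:
        opener = RLE if direction == "rtl" else LRE
    return opener + text + PDF


if __name__ == "__main__":
    wrapped = to_plaintext_bidi("abc", "rtl", override=True)
    print([f"U+{ord(ch):04X}" for ch in wrapped])
```

(A string built this way could be placed in an attribute value or a DOM string, where elements such as bdo are not available.)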
16:44:58 [fantasai]
Hixie: We did consider having a DOM attribute that would pull out e.g. bidi control characters from the markup and alt text from images
16:45:13 [fantasai]
Hixie: not sure where that's going
16:45:34 [fantasai]
Hixie: I would recommend finding solutions for plaintext, since that will work for both
16:46:58 [fantasai]
Discussion of that
16:47:18 [fantasai]
language tags are in Unicode, but were deprecated as soon as they were added: they were added as deprecated and should never be used
16:48:37 [anne]
(even though the characters they map to are apparently deprecated)
16:49:13 [fantasai]
discussion of markup-plaintext thing
16:49:44 [apppp]
reference RFC 3066 should point to BCP 47
16:51:01 [fantasai]
Addison notes that the i18n group needs to review the date parsing things
16:51:06 [najib]
+1 to adding &rle;, ..., &pdf; in HTML
16:51:52 [fantasai]
Henri notes that it's using ISO dates anyway
16:52:00 [jgraham_]
jgraham_ has joined #i18n
16:52:09 [fantasai]
najib, if we're adding more entities I want &zwsp;
16:52:11 [fantasai]
:)
16:52:56 [najib]
It depends on usage frequencies. :-)
16:53:32 [fantasai]
Topic: Validator checking entity reqs
16:53:40 [smedero]
smedero has left #i18n
16:54:51 [fantasai]
Henri: I don't check that character entities are only used for characters that are unclear.
16:55:02 [fantasai]
Henri: because I can't tell mechanically whether the character is unclear
16:57:05 [anne]
fantasai, I think &zwsp; is also supported by IE
16:57:23 [fantasai]
cool
16:57:30 [fantasai]
let's add it :P
16:57:40 [fantasai]
all the characters next to it have names,
16:57:44 [fantasai]
zwnj, zwj etc
16:57:54 [najib]
I don't have IE on MacOS :-( & :-)
16:58:59 [fantasai]
Ishida explains that this part of charmod is about best practices
16:59:16 [fantasai]
it's not "should" in the normative sense
17:00:48 [fantasai]
Elika: Maybe you should go through the document and change the wording of should sentences that don't match RFC2119 to something else
17:01:13 [fantasai]
Ishida: Well, we mean it that way for authors. Maybe we need to create different classes and explain which recommendations apply to which
17:01:45 [fsasaki]
http://hsivonen.iki.fi/charmod-norm-checking/
17:02:04 [amit]
amit has joined #i18n
17:02:08 [fantasai]
Henri: I documented which constructs in HTML5 result in a continuous string
17:02:48 [fantasai]
Henri: I don't have any other comment there except that I wrote this and it is available :)
17:03:44 [fantasai]
Henri: I have another comment, but it's targeted at the Unicode/ICU specs
17:03:58 [fantasai]
Ishida: Might want to psot to the unicode list
17:04:02 [fantasai]
s/psot/post/
17:04:20 [apppp]
RRSAgent, set logs world-visible
17:04:25 [apppp]
RRSAgent, make minutes
17:04:25 [RRSAgent]
I have made the request to generate http://www.w3.org/2007/11/09-i18n-minutes.html apppp
17:05:16 [apppp]
Title: I18N / HTML5 break out session
17:05:29 [apppp]
Scribe: fantasai
17:05:35 [apppp]
ScribeNick: fantasai
17:05:45 [apppp]
RRSAgent, make minutes
17:05:45 [RRSAgent]
I have made the request to generate http://www.w3.org/2007/11/09-i18n-minutes.html apppp
17:08:42 [apppp]
Present: Anne, Addison, Richard, Hixix, Fantasai, Najib, Amit, Philippe, Hsivonen, J.Graham
17:08:49 [apppp]
RRSAgent, make minutes
17:08:49 [RRSAgent]
I have made the request to generate http://www.w3.org/2007/11/09-i18n-minutes.html apppp
17:09:18 [apppp]
s/Hixix/Hixie
17:09:24 [apppp]
RRSAgent, make minutes
17:09:24 [RRSAgent]
I have made the request to generate http://www.w3.org/2007/11/09-i18n-minutes.html apppp
17:09:44 [apppp]
Chair: Addison Phillips (I18N)
17:09:53 [apppp]
Regrets: none
17:09:59 [apppp]
RRSAgent, make minutes
17:09:59 [RRSAgent]
I have made the request to generate http://www.w3.org/2007/11/09-i18n-minutes.html apppp
17:10:55 [apppp]
Meeting: I18N / HTML5 Break-Out
17:11:01 [apppp]
RRSAgent, make minutes
17:11:01 [RRSAgent]
I have made the request to generate http://www.w3.org/2007/11/09-i18n-minutes.html apppp
17:11:31 [jgraham_]
jgraham_ has joined #i18n
17:11:49 [jgraham_]
jgraham_ has left #i18n
17:13:03 [apppp]
apppp has left #i18n
17:24:39 [Philip]
Philip has left #i18n
18:28:48 [najib]
najib has joined #I18N
18:42:26 [najib]
najib has joined #I18N
18:42:42 [r12a]
r12a has joined #i18n
18:42:54 [r12a]
we're on our way
18:43:10 [r12a]
just talking with Ian Jacobs about note vs Rec stuff
18:45:31 [anne]
anne has left #i18n
18:46:03 [anne]
anne has joined #i18n
18:55:37 [fsasaki]
fsasaki has joined #i18n
19:10:34 [aphillip_]
aphillip_ has joined #i18n
19:10:48 [fsasaki]
http://lists.w3.org/Archives/Public/public-i18n-core/2007OctDec/index.html
19:11:07 [fsasaki]
http://lists.w3.org/Archives/Public/public-i18n-core/2007OctDec/0071.html
19:11:14 [fsasaki]
http://www.w3.org/2007/10/30-core-minutes.html#item08
19:11:32 [fsasaki]
<scribe> ACTION: Addison to write a note for consumption by XML Core expressing our support for extending XML 1.0 [recorded in http://www.w3.org/2007/10/30-core-minutes.html#action03]
19:14:30 [fsasaki]
Dear XML Core Working Group,
19:14:31 [fsasaki]
the i18n Core Working Group looked at your discussion about XML 1.0 and XML 1.1. [1]. We discussed this topic recently [2] and hereby would like to express our support for extending XML 1.0 in the way you described it at [1].
19:14:33 [fsasaki]
On behalf of the i18n Core Working Group,
19:14:34 [fsasaki]
Felix Sasaki
19:14:36 [fsasaki]
[1] http://lists.w3.org/Archives/Public/public-i18n-core/2007OctDec/0071.html
19:14:37 [fsasaki]
[2] http://www.w3.org/2007/10/30-core-minutes.html#item08
19:43:00 [aphillip_]
aphillip_ has left #i18n
19:49:42 [hsivonen]
hsivonen has left #i18n
20:33:23 [aphillip_]
aphillip_ has joined #i18n
20:46:18 [chaals]
chaals has joined #i18n
20:46:56 [aphillip_]
we're around
20:47:00 [aphillip_]
in our room
20:47:07 [aphillip_]
shall we wander past?
20:48:00 [chaals]
in 20 minutes?
20:48:19 [aphillip_]
sure... us come to you? you're in the webapi room, right?
20:49:11 [chaals]
yes please. cambridge a
20:49:16 [aphillip_]
done see you in 20
20:49:28 [chaals]
thanks
21:08:37 [fsasaki]
fsasaki has joined #i18n
22:53:12 [chaals]
chaals has left #i18n