Character Model for the World Wide Web 1.0
W3C Working Draft 26 January 2001
Public Part of Disposition of Comments

LCC-221		Should search engines normalize? -- Responsibility for Normalization: "[S] [I] A text-processing component that receives suspect text MUST NOT perform any normalization-sensitive operations unless it has first successfully validated the text for normalization, and MUST NOT normalize the suspect text." I understand that some application such as XML processor MUST NOT normalize the suspect text because the normalization can turn a well-formed document to ill-formed. On the other hand, some application such as search engine SHOULD normalize text so that it can find canonically equivalent text.	LCI-197
LCC-220		Delimiters for character escaping -- Character Escaping: "[S] Explicit end delimiters MUST be provided. Escapes such as \uABCD where the end delimiter is a space or any character other than [01-9A-F] SHOULD be avoided." MUST and SHOULD are mixed here. If the first requirement is MUST, the second must be also MUST.	LCI-196
LCC-219		"character data" vs. "text data" -- Character Escaping: In the first paragraph, two terms "character data" and "text data" appear, which seem to mean the same thing. It would be better to use either one of the term consistently.	LCI-195
LCC-218		Would specifications "determine" the encoding of data? -- Character Encoding Identification: "[S] Specifications MUST NOT use heuristics to determine the encoding of data." In what situation, would specifications "determine" the encoding of data?	LCI-194
LCC-217		"There is also no ambiguity if data is transferred non-electronically ..." -- Mandating a unique character encoding: "There is also no ambiguity if data is transferred non-electronically and later has to be converted back to a digital representation." If "transferred non-electronically" means that characters are written on paper, there are a lot of ambiguity to determine characters from glyph, like if this space is SPACE U+0020 or NO-BREAK SPACE U+00A0.	LCI-193
LCC-216		XML doesn't allow use of full range of Unicode code points and doesn't justify exceptions -- Reference Processing Model: In the first Note in this section, it says "All specifications that derive from the XML 1.0 specification [XML 1.0] automatically inherit this Reference Processing Model." But XML 1.0 is not very good example because it doesn't allow the use of the full range of Unicode code points and it doesn't justify the exceptions.	LCI-192
LCC-214		Issue 5: Responsibilities "Proxy" versus "Recipient"	LCI-188
LCC-213		Issue 4: Full Normalization as a Web Content requirement	LCI-181
LCC-212		Issue 3: Full Normalization as document syntax dependent	LCI-187
LCC-211		Issue 2: XPath string-value	LCI-186
LCC-210		Issue 1: XML comments	LCI-55
LCC-208		Characters above U+10FFFF	LCI-184
LCC-207		IURIs, URIs, CHARMOD -- See also subsequent mail	LCI-183
LCC-203	Björn Höhrmann	This comment has been merged with LCC-202	-
LCC-202	Björn Höhrmann	Normalization vs. encoding layers -- See also subsequent mail	LCI-180
LCC-144		"... all applicable requirements MUST be satisfied." (this item deals with IJ's 2nd point)	LCI-121
LCC-126	Robert Chilton	U+0Fnn (Tibetan Block) characters	LCI-98
LCC-125	Microsoft	Critique of reliance on early normalization	LCI-104
LCC-123	Library of Congress	Concerns about early normalization -- The originator asked us to follow up with Randy Barry	LCI-102
LCC-122	Library of Congress	Concerns about NFC -- The originator asked us to follow up with Randy Barry	LCI-101
LCC-121		Section 8 (Character Encoding in URI References): Discussion of DSig approach	LCI-100
LCC-120		Section 4.3 (Responsibility for Normalization): Discussion of DSig approach	LCI-99
LCC-119		Section 4 (Early Uniform Normalization): Discussion of DSig approach	LCI-97
LCC-118		Section 3.7 (Character Escaping): "There SHOULD be only one way to escape a character."	LCI-96
LCC-117		Section 3.6.2 (Private Use Code Points): Disagreement with our approach	LCI-95
LCC-116		Section 3.6.1 (Character Encoding Identification): Discussion of DSig approach	LCI-94
LCC-115		Section 3.6 (Choice and Identification of Character Encodings): UTF-16 vs UTF-8	LCI-93
LCC-114		Section 3.5 (Reference Processing Model): What does "arbitrarily restrict the range of characters that can be used" mean?	LCI-59
LCC-113		Section 3.2 (Digital Representation of Characters): "the distinction between CEF and CES is not very clear and might merit an example"	LCI-92
LCC-112		Section 3.1.5 (Units of Collation): "Software developers MUST NOT merely use a one-to-one mapping as their string-compare function ..."	LCI-91
LCC-111		Section 3.1.3 (Units of Visual Rendering): Define "logical order"	LCI-90
LCC-110		Section 3.1.2 (Units of a Writing System, and Units of Aural Rendering): Define phoneme and syllabaries	LCI-89
LCC-109		Section 1.1 (Goals and Scope): "All W3C specifications have to conform ... other specifications ... are strongly encouraged ..."	LCI-15
LCC-99		Comments on Section 8 (Character Encoding in URI References)	LCI-8
LCC-98		"turning marked-up W3C-normalised text into plain text may produce non-NFC results"	LCI-55
LCC-97		Section 4.2.2 (W3C-normalized Text): General comments	LCI-47
LCC-96		Section 4.2.2 (W3C-normalized Text): General comments	LCI-47
LCC-95		The number of bits in a byte may not be equal to 8	LCI-6
LCC-94		List the allowed meaning of 'character'	LCI-39
LCC-93	François Richard	"For a specification to use the Reference Processing Model does not require that implementations actually use Unicode."	LCI-32
LCC-84		Section 4.2.2 (W3C-normalized Text): General comments	LCI-47
LCC-82		W3C specs need commas between maturity level and date	LCI-73
LCC-81		"Eve Maler Eds." -> "Eve Maler, Eds."	LCI-72
LCC-80		In the References section, W3C specs could all have publication dates	LCI-71
LCC-79		"developers and software that tags" -> "developers and software that tag"	LCI-70
LCC-78		"Text is then defined as" -> "Text is then defined as"	LCI-69
LCC-77		"The Unicode Standard" -> "the Unicode Standard"	LCI-68
LCC-76		"target audience of this document are" -> "target audience of this document is"	LCI-67
LCC-75		"Universal Access" -> "universal access"	LCI-66
LCC-74		"Hiragana and Katakana" vs "katakana and hiragana"	LCI-65
LCC-73		Unicode Consortium's instructions on how to refer to Unicode	LCI-64
LCC-72		"Since its early days, the Web has seen the development of a Reference Processing Model."	LCI-63
LCC-71		Conformance checklist	LCI-62
LCC-68	Steve Tolkin	Please clarify how to handle "control characters"	LCI-59
LCC-66	Joyce Nakada	"For a specification to use the Reference Processing Model does not require that implementations actually use Unicode."	LCI-32
LCC-65	Miles Whitehead	"For a specification to use the Reference Processing Model does not require that implementations actually use Unicode."	LCI-32
LCC-64		Example A.3 "appears oversimplified"	LCI-57
LCC-63		"conversion a legal" -> "conversion to a legal"	LCI-56
LCC-62		Comments on Section 8 (Character Encoding in URI References)	LCI-8
LCC-61		Expand GI or avoid it	LCI-29
LCC-60		"turning marked-up W3C-normalised text into plain text may produce non-NFC results"	LCI-55
LCC-59		Discussion of: "Note: Legacy text is always normalized unless it contains escapes which, once expanded, denormalize it."	LCI-54
LCC-58		Impact of versioning on normalisation	LCI-53
LCC-57		Section 4.2.2 (W3C-normalized Text): General comments	LCI-47
LCC-56		Section 4.2.2 (W3C-normalized Text): "the parenthetical definition should be removed, along with its application."	LCI-51
LCC-54		Recommend against unnecessary use of escapes	LCI-49
LCC-53		Use of character escapes in identifiers	LCI-48
LCC-52		Standard named character entities and normalization	LCI-47
LCC-51		Recommend the use of hex rather than decimal NCRs	LCI-46
LCC-50		Comments on Section 3.6.1 (Character Encoding Identification)	LCI-45
LCC-49		"say that XML uses a pseudo-attribute called 'encoding' rather than 'charset'"	LCI-44
LCC-48		"Transfer Encoding Syntax is missing" [from Section 3.2 (Digital Representation of Characters)]	LCI-43
LCC-47		"code point" vs "code position"	LCI-42
LCC-46		The number of bits in a byte may not be equal to 8	LCI-6
LCC-45		"Terms such as 'byte' and 'wyde' are left for the reader to guess, likewise for 'octet' ..."	LCI-41
LCC-44		"There is no definition of terms in the document."	LCI-40
LCC-43		List the allowed meaning of 'character'	LCI-39
LCC-42		"when is multiple 'characters' stored in a single 'physical unit of storage'?"	LCI-38
LCC-41		"All...specification" -> "All...specifications"	LCI-37
LCC-40		"The terminology (SHALL, ..., OPTIONAL, ...) should come before the conformity clause"	LCI-36
LCC-39		"The phrase 'MUST NOT' reflects in itself a lack of internationalisation"	LCI-35
LCC-38		"Conformance" -> "Conformity"	LCI-34
LCC-37		Section 1.3 (Notation)	LCI-33
LCC-35	François Richard	"For a specification to use the Reference Processing Model does not require that implementations actually use Unicode."	LCI-32
LCC-33		Section 8 (Character Encoding in URI References) (various comments)	LCI-8
LCC-32		"APIs in addition SHOULD NOT specify single character or single encoding-unit arguments."	LCI-28
LCC-31		Is DOM Range spec an example of non-numeric substring identification?	LCI-27
LCC-30		"translation of a document from one language to another"	LCI-30
LCC-29		String Indexing needs more examples	LCI-190
LCC-28		"Conversion to a common encoding of UCS"	LCI-26
LCC-26		Section 4.2.3 Examples (clarify)	LCI-25
LCC-25		normalizing-transcoders	LCI-24
LCC-24		What is the definition of "legacy text"? "Legacy encoding"?	LCI-23
LCC-23		"Unicode encoding form"	LCI-22
LCC-22		"Where specifications need to allow the transmission of symbols not in Unicode ..., they MAY define markup for this purpose."	LCI-21
LCC-21		"Receiving software MUST determine the encoding from available information. It MAY recognize as many encodings ... as appropriate. When no charset is provided the receiving software MUST adhere to the default encoding(s) ..."	LCI-20
LCC-20		Section 3.2 (Digital Representation of Characters): Use of "units of encoding" and "unit"	LCI-19
LCC-19		Section 3.2 (Digital Representation of Characters): Use of "encoding" and "character encoding"	LCI-18
LCC-18		Collation units vs characters	LCI-17
LCC-17		"it is not the case that keystrokes and input characters correspond one-to-one"	LCI-16
LCC-16		"All W3C specifications [have to | MUST] conform", "all applicable requirements MUST be satisfied" (this item deals with IJ's 1st and 3rd points)	LCI-15
LCC-15		Need more examples, and more explanations	LCI-14
LCC-9	Tim Moore	Directionality of numbers	LCI-9
LCC-8	Mike Brown	Section 8 (Character Encoding in URI References) is a mess	LCI-8
LCC	From	Description	LCI
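
The comment on delimiters for character escaping (LCC-220) turns on why an escape needs an explicit end delimiter. The following is a minimal Python sketch of that point, not anything taken from the specification: the variable-length `\u` syntax and the function names `expand_undelimited` and `expand_delimited` are assumptions made up for illustration. Without an end delimiter, the escape absorbs any hexadecimal digits that happen to follow it.

```python
import re

HEX = r"[0-9A-Fa-f]{1,6}"


def _to_char(match):
    """Turn the captured hex digits into a character; leave out-of-range values untouched."""
    code_point = int(match.group(1), 16)
    return chr(code_point) if code_point <= 0x10FFFF else match.group(0)


def expand_undelimited(text):
    # Hypothetical syntax: backslash, "u", then 1 to 6 hex digits and NO end delimiter.
    return re.sub(r"\\u(" + HEX + ")", _to_char, text)


def expand_delimited(text):
    # XML-style numeric character reference: the ";" marks the end explicitly.
    return re.sub(r"&#x(" + HEX + ");", _to_char, text)


# The author means U+00E9 followed by the letters "cole".
undelimited = expand_undelimited(r"\uE9cole")   # "c" is a hex digit, so U+0E9C is produced
delimited = expand_delimited("&#xE9;cole")      # the ";" stops the escape after "E9"
```

In this sketch the undelimited form yields U+0E9C followed by "ole", while the delimited form yields U+00E9 followed by "cole" as intended.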


Last Call Issues (see also Last Call Comments)

The three Status columns are:

A: Accepted in principle,
M: Made the change (only relevant if the value of Accepted in principle is "Y"),
C: Closed.

The possible values of Accepted in principle are:

Y: Yes,
y: yes, though implementation in version 1.0 is subject to time constraints,
P: Partly,
N: No,
-: not applicable.

The possible values of Type (column T) are:

E: Editorial,
N: Note (e.g. "this is what XYZ WG did"),
O: Out of scope (a proposal that the spec cover additional matters which are out of scope),
Q: Question (not to be used if the question requires any action other than an answer),
S: Substantive,
T: Typo,
X: eXtension (a proposal that the spec cover additional matters which are in scope).


LCI	A	M	C	T	Ref	Description	LCC	Comment
LCI-197	N	-	Y	S	4.3	Should search engines normalize? -- Responsibility for Normalization: "[S] [I] A text-processing component that receives suspect text MUST NOT perform any normalization-sensitive operations unless it has first successfully validated the text for normalization, and MUST NOT normalize the suspect text." I understand that some application such as XML processor MUST NOT normalize the suspect text because the normalization can turn a well-formed document to ill-formed. On the other hand, some application such as search engine SHOULD normalize text so that it can find canonically equivalent text.	LCC-221	comment-221
LCI-196	P	Y	Y	E	3.7	Delimiters for character escaping -- Character Escaping: "[S] Explicit end delimiters MUST be provided. Escapes such as \uABCD where the end delimiter is a space or any character other than [01-9A-F] SHOULD be avoided." MUST and SHOULD are mixed here. If the first requirement is MUST, the second must be also MUST.	LCC-220	comment-220
LCI-195	Y	Y	Y	E	3.7	"character data" vs. "text data" -- Character Escaping: In the first paragraph, two terms "character data" and "text data" appear, which seem to mean the same thing. It would be better to use either one of the term consistently.	LCC-219	comment-219
LCI-194	Y	Y	Y	E	3.6.2	Would specifications "determine" the encoding of data? -- Character Encoding Identification: "[S] Specifications MUST NOT use heuristics to determine the encoding of data." In what situation, would specifications "determine" the encoding of data?	LCC-218	comment-218
LCI-193	Y	Y	Y	E	3.6.1	"There is also no ambiguity if data is transferred non-electronically ..." -- Mandating a unique character encoding: "There is also no ambiguity if data is transferred non-electronically and later has to be converted back to a digital representation." If "transferred non-electronically" means that characters are written on paper, there are a lot of ambiguity to determine characters from glyph, like if this space is SPACE U+0020 or NO-BREAK SPACE U+00A0.	LCC-217	comment-217
LCI-192	Y	Y	Y	E	3.5	XML doesn't allow use of full range of Unicode code points and doesn't justify exceptions -- Reference Processing Model: In the first Note in this section, it says "All specifications that derive from the XML 1.0 specification [XML 1.0] automatically inherit this Reference Processing Model." But XML 1.0 is not very good example because it doesn't allow the use of the full range of Unicode code points and it doesn't justify the exceptions.	LCC-216	comment-216
LCI-190	y	Y	Y	E	7	String Indexing needs more examples	LCC-29	comment-29
LCI-188	Y	Y	Y	S	4.3	Issue 5: Responsibilities "Proxy" versus "Recipient"	LCC-214	comment-214
LCI-187	Y	Y	Y	S	4.2.2	Issue 3: Full Normalization as document syntax dependent	LCC-212	comment-212
LCI-186	Y	Y	Y	S	4.3	Issue 2: XPath string-value	LCC-211	comment-211
LCI-185	Y	Y	Y	E	3.3 3.5	Transcoding	LCC-209	comment-209
LCI-184	Y	Y	Y	E	3.5	Characters above U+10FFFF	LCC-208	comment-208
LCI-183	P	Y	Y	S	8	IURIs, URIs, CHARMOD -- See also subsequent mail	LCC-207	comment-207
LCI-181	Y	Y	Y	S	4.2 4.3	State that W3C N11N is required	LCC-204 LCC-213	comment-204 comment-213
LCI-180	N	-	Y	S	4	Normalization vs. encoding layers -- See also subsequent mail	LCC-202	comment-202
LCI-179	-	-	Y	S	8	This issue has been merged with LCI-8	-	-
LCI-121	Y	Y	Y	E	2	"... all applicable requirements MUST be satisfied."	LCC-144	comment-144
LCI-104	Y	Y	Y	S	4.3	Critique of reliance on early normalization	LCC-125	comment-125
LCI-102	N	-	Y	S	4.3	Concerns about impact of early normalization	LCC-123	comment-123
LCI-101	N	-	Y	S	4.2	Concerns about NFC	LCC-122	comment-122
LCI-100	-	-	Y	N	8	(Character Encoding in URI References): Discussion of DSig approach	LCC-121	comment-121
LCI-99	-	-	Y	N	4.3	(Responsibility for Normalization): Discussion of DSig approach	LCC-120	comment-120
LCI-98	-	-	Y	O	4.2	U+0Fnn (Tibetan Block) characters	LCC-126	comment-126
LCI-97	-	-	Y	N	4.3	(Early Uniform Normalization): Discussion of DSig approach	LCC-119	comment-119
LCI-96	Y	Y	Y	S	3.7	(Character Escaping): "There SHOULD be only one way to escape a character." -- See miscellany 40	LCC-118	comment-118
LCI-95	N	-	Y	S	3.6.2	(Private Use Code Points): Disagreement with our approach	LCC-117	comment-117
LCI-94	-	-	Y	N	3.6.1	(Character Encoding Identification): Discussion of DSig approach	LCC-116	comment-116
LCI-93	Y	Y	Y	E	3.6	(Choice and Identification of Character Encodings): Why do we say "For APIs, UTF-16 is more appropriate"?	LCC-115	comment-115
LCI-92	N	-	Y	E	3.2	(Digital Representation of Characters): "the distinction between CEF and CES is not very clear and might merit an example"	LCC-113	comment-113
LCI-91	Y	Y	Y	E	3.1.5	(Units of Collation): "Software developers MUST NOT merely use a one-to-one mapping as their string-compare function ..."	LCC-112	comment-112
LCI-90	Y	Y	Y	E	3.1.3	(Units of Visual Rendering): Define "logical order"	LCC-111	comment-111
LCI-89	Y	Y	Y	E	3.1.2	(Units of a Writing System, and Units of Aural Rendering): Define phoneme and syllabaries -- See also mail from Richard Ishida	LCC-110	comment-110
LCI-88	-	-	Y	S	4.3	This issue has been merged with LCI-85	-	-
LCI-75	-	-	Y	-	4.2	This issue has been merged with LCI-47	-	-
LCI-74	-	-	Y	-	8	This issue has been merged with LCI-8	-	-
LCI-73	Y	Y	Y	T	Refs	W3C specs need commas between maturity level and date	LCC-82	comment-82
LCI-72	Y	Y	Y	T	Refs	"Eve Maler Eds." -> "Eve Maler, Eds."	LCC-81	comment-81
LCI-71	N	-	Y	E	Refs	In the References section, W3C specs could all have publication dates	LCC-80	comment-80
LCI-70	Y	Y	Y	T	3.6.1	"developers and software that tags" -> "developers and software that tag"	LCC-79	comment-79
LCI-69	Y	Y	Y	E	3.1.7	"Text is then defined as" -> "Text is then defined as"	LCC-78	comment-78
LCI-68	N	-	Y	E	1.1	"The Unicode Standard" -> "the Unicode Standard"	LCC-77	comment-77
LCI-67	Y	Y	Y	T	1.1	"target audience of this document are" -> "target audience of this document is"	LCC-76	comment-76
LCI-66	N	-	Y	E	1.1	"Universal Access" -> "universal access"	LCC-75	comment-75
LCI-65	Y	Y	Y	T	3.1.2, 5	"Hiragana and Katakana" vs "katakana and hiragana"	LCC-74	comment-74
LCI-64	Y	Y	Y	E	9	Unicode Consortium's instructions on how to refer to Unicode	LCC-73	comment-73
LCI-63	Y	Y	Y	E	3.5	"Since its early days, the Web has seen the development of a Reference Processing Model."	LCC-72	comment-72
LCI-62	y	Y	Y	E	2	Provide a conformance checklist	LCC-71	comment-71
LCI-59	Y	Y	Y	E	3.5	Please clarify how to handle "control characters" -- See miscellany 8	LCC-68 LCC-114	comment-68 comment-114
LCI-57	Y	Y	Y	E	A.3	This example "appears oversimplified"	LCC-64	comment-64
LCI-56	Y	Y	Y	T	8	"conversion a legal" -> "conversion to a legal"	LCC-63	comment-63
LCI-55	Y	Y	Y	S	4.2	"turning marked-up W3C-normalised text into plain text may produce non-NFC results"	LCC-60 LCC-98 LCC-134 LCC-210	comment-60 comment-98 comment-134 comment-210
LCI-54	P	Y	Y	E	4.2	Discussion of: "Note: Legacy text is always normalized unless it contains escapes which, once expanded, denormalize it." -- See miscellany 9	LCC-59	comment-59
LCI-53	N	-	Y	E	4.2	Impact of versioning on normalisation	LCC-58	comment-58
LCI-52	-	-	Y	?	4.2	This issue has been merged with LCI-47	-	-
LCI-51	P	Y	Y	E	4.2.2	(W3C-normalized Text): "the parenthetical definition should be removed, along with its application."	LCC-56	comment-56
LCI-50	N	-	Y	E	4.1	Referencing UTR #15	LCC-55	comment-55
LCI-49	Y	Y	Y	E	3.7	Recommend against unnecessary use of escapes	LCC-54	comment-54
LCI-48	N	-	Y	E	3.7	Use of character escapes in identifiers	LCC-53	comment-53
LCI-47	Y	Y	Y	S	4.2	Entities and normalization	LCC-52 LCC-57 LCC-84 LCC-85 LCC-91 LCC-96 LCC-97 LCC-206	comment-52 comment-57 comment-84 comment-85 comment-91 comment-96 comment-97 comment-206
LCI-46	Y	Y	Y	E	3.7	Recommend the use of hex rather than decimal NCRs -- See note 2	LCC-51	comment-51
LCI-45	N	-	Y	E	3.6.1	(Character Encoding Identification) Comments	LCC-50	comment-50
LCI-44	Y	Y	Y	E	3.6.1	"say that XML uses a pseudo-attribute called 'encoding' rather than 'charset'"	LCC-49	comment-49
LCI-43	N	-	Y	E	3.2	(Digital Representation of Characters) "Transfer Encoding Syntax is missing"	LCC-48	comment-48
LCI-42	Y	Y	Y	E	3	"code point" vs "code position"	LCC-47	comment-47
LCI-41	N	-	Y	E	3	"Terms such as 'byte' and 'wyde' are left for the reader to guess, likewise for 'octet' ..."	LCC-45	comment-45
LCI-40	N	-	Y	E	Gen	"There is no definition of terms in the document."	LCC-44	comment-44
LCI-39	N	-	Y	E	3.1.7	List the allowed meaning of 'character'	LCC-43 LCC-94	comment-43 comment-94
LCI-38	Y	Y	Y	E	3.1.6	"when is multiple 'characters' stored in a single 'physical unit of storage'?" -- See note 3	LCC-42	comment-42
LCI-37	Y	Y	Y	T	2	"All...specification" -> "All...specifications"	LCC-41	comment-41
LCI-36	N	-	Y	E	2	"The terminology (SHALL, ..., OPTIONAL, ...) should come before the conformity clause"	LCC-40	comment-40
LCI-35	N	-	Y	E	2	"The phrase 'MUST NOT' reflects in itself a lack of internationalisation"	LCC-39	comment-39
LCI-34	N	-	Y	E	2	"Conformance" -> "Conformity"	LCC-38	comment-38
LCI-33	N	-	Y	E	1.3	(Notation): Denoting Unicode code points in this specification	LCC-37	comment-37
LCI-32	Y	Y	Y	E	3.5	"For a specification to use the Reference Processing Model does not require that implementations actually use Unicode."	LCC-35 LCC-65 LCC-66 LCC-93	comment-35 comment-65 comment-66 comment-93
LCI-31	Y	Y	Y	E	9	"in synchronism"	LCC-34	comment-34
LCI-30	Y	Y	Y	E	7	"translation of a document from one language to another"	LCC-30	comment-30
LCI-29	Y	Y	Y	E	5	Use of the acronym "GI"	LCC-27 LCC-61	comment-27 comment-61
LCI-28	Y	Y	Y	E	6	"APIs in addition SHOULD NOT specify single character or single encoding-unit arguments."	LCC-32	comment-32
LCI-27	-	-	Y	Q	6	Is DOM Range spec an example of non-numeric substring identification?	LCC-31	comment-31
LCI-26	-	-	Y	Q	6	"Conversion to a common encoding of UCS"	LCC-28	comment-28
LCI-25	Y	Y	Y	E	4.2.3	(Examples) Clarify -- See miscellany 26, miscellany 27, miscellany 28	LCC-26	comment-26
LCI-24	Y	Y	Y	E	4.2	Normalizing-transcoders -- See miscellany 25	LCC-25	comment-25
LCI-23	Y	Y	Y	E	4.2	What is the definition of "legacy text"? "Legacy encoding"? -- See miscellany 24	LCC-24	comment-24
LCI-22	Y	Y	Y	E	4.2	"Unicode encoding form" -- See miscellany 23	LCC-23	comment-23
LCI-21	Y	Y	Y	E	3.6.2	"Where specifications need to allow the transmission of symbols not in Unicode ..., they MAY define markup for this purpose." -- See miscellany 22	LCC-22	comment-22
LCI-20	Y	Y	Y	E	3.6	"Receiving software MUST determine the encoding from available information. It MAY recognize as many encodings ... as appropriate. When no charset is provided the receiving software MUST adhere to the default encoding(s) ..." -- See miscellany 20, miscellany 21	LCC-21	comment-21
LCI-19	Y	Y	Y	E	3.2	Use of "units of encoding" and "unit" -- See miscellany 15	LCC-20	comment-20
LCI-18	Y	Y	Y	E	3.2	Use of "encoding" and "character encoding" -- See miscellany 17, miscellany 18, miscellany 19	LCC-19	comment-19
LCI-17	Y	Y	Y	E	3.1.5	Collation units vs characters -- See miscellany 14	LCC-18	comment-18
LCI-16	Y	Y	Y	E	3.1.4	"it is not the case that keystrokes and input characters correspond one-to-one"	LCC-17	comment-17
LCI-15	N	-	Y	S	2	"All W3C specifications [have to | MUST] conform", "all applicable requirements MUST be satisfied" -- See miscellany 13 -- Other part(s) of this item are now in LCI-121	LCC-16 LCC-109	comment-16 comment-109
LCI-14	y	Y	Y	E	Gen	Need more examples, and more explanations	LCC-15	comment-15
LCI-9	-	-	Y	O	3.1.3	Directionality of numbers	LCC-9	comment-9
LCI-8	P	Y	Y		8	(Character Encoding in URI References) General issues	LCC-7 LCC-8 LCC-33 LCC-62 LCC-83 LCC-99 LCC-201	comment-7 comment-8 comment-33 comment-62 comment-83 comment-99 comment-201
LCI-7	-	-	Y	E	8	This issue has been merged with LCI-8	-	-
LCI-6	Y	Y	Y	E	3.1.6	Use of "bytes"	LCC-6 LCC-46 LCC-95	comment-6 comment-46 comment-95
LCI-5	Y	Y	Y	E	3.2	Use of "ISO 8859-1"	LCC-5	comment-5
LCI-1	-	-	Y	E	Gen	This collection of editorial points has been moved elsewhere	-	-
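
LCI-197 argues that a search engine should normalize text so that it can find canonically equivalent strings. The sketch below illustrates only that commenter's point, using Python's standard `unicodedata` module; the helper name `contains_normalized` and the sample strings are assumptions introduced here, and nothing in it reflects how the specification assigns responsibility for normalization.

```python
import unicodedata


def contains_normalized(haystack, needle):
    """Substring test after bringing both strings to NFC (a hypothetical helper)."""
    return unicodedata.normalize("NFC", needle) in unicodedata.normalize("NFC", haystack)


# Stored text uses "e" + U+0301 COMBINING ACUTE ACCENT (decomposed form).
document = "un cafe\u0301 noir"
# The query uses the precomposed U+00E9.
query = "caf\u00e9"

naive_hit = query in document                          # code point comparison misses the match
normalized_hit = contains_normalized(document, query)  # comparison after NFC finds it
```

The two strings are canonically equivalent, yet the plain comparison fails because their code point sequences differ; comparing after normalization to NFC succeeds.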
