This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 27675 - U+FFFD in euc_kr index
Summary: U+FFFD in euc_kr index
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: Encoding
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+encodingspec
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-12-19 16:49 UTC by public+w3@mearie.org
Modified: 2014-12-20 09:42 UTC
CC List: 2 users

Description public+w3@mearie.org 2014-12-19 16:49:47 UTC
The updated euc_kr table now has the following entries:

---8<---
 5916	0xFFFD	� (REPLACEMENT CHARACTER)
 5917	0xFFFD	� (REPLACEMENT CHARACTER)
 5918	0xFFFD	� (REPLACEMENT CHARACTER)
 5919	0xFFFD	� (REPLACEMENT CHARACTER)
 5920	0xFFFD	� (REPLACEMENT CHARACTER)
 5921	0xFFFD	� (REPLACEMENT CHARACTER)
[snip]
 5948	0xFFFD	� (REPLACEMENT CHARACTER)
 5949	0xFFFD	� (REPLACEMENT CHARACTER)
 5950	0xFFFD	� (REPLACEMENT CHARACTER)
 5951	0xFFFD	� (REPLACEMENT CHARACTER)
 5952	0xFFFD	� (REPLACEMENT CHARACTER)
 5953	0xFFFD	� (REPLACEMENT CHARACTER)
---8<---

They correspond to byte sequences A0 5B..60 and A0 7B..80, which are gaps between UHC ranges. I don't think Bug 16691 intended this (as they are the only occurrences of U+FFFD throughout the indices at the moment). This causes an otherwise valid decoder to accept those sequences even when fatal mode is in use.
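
For reference, here is a minimal sketch of how the pointers listed above map back to those byte sequences, and how such entries could be detected. It assumes the Encoding Standard's euc-kr pointer formula, pointer = (lead - 0x81) * 190 + (trail - 0x41). The helper names and the toy index dictionary are hypothetical and only for illustration; they are not taken from the spec or from any implementation.

```python
# Assumed formula (from the euc-kr decoder in the Encoding Standard):
#   pointer = (lead - 0x81) * 190 + (trail - 0x41)

def bytes_to_pointer(lead, trail):
    """Compute the index pointer for a two-byte sequence."""
    return (lead - 0x81) * 190 + (trail - 0x41)

def pointer_to_bytes(pointer):
    """Invert the formula above: return (lead, trail) for a pointer."""
    row, offset = divmod(pointer, 190)
    return row + 0x81, offset + 0x41

def replacement_pointers(index):
    """Return the sorted pointers whose code point is U+FFFD.

    `index` is a hypothetical dict mapping pointer -> code point.
    """
    return sorted(p for p, cp in index.items() if cp == 0xFFFD)

if __name__ == "__main__":
    # The two pointer ranges quoted in the report.
    for first, last in [(5916, 5921), (5948, 5953)]:
        lead_a, trail_a = pointer_to_bytes(first)
        lead_b, trail_b = pointer_to_bytes(last)
        print(f"{first}..{last}: "
              f"{lead_a:02X} {trail_a:02X} .. {lead_b:02X} {trail_b:02X}")

    # Toy index with one bad entry, to show the detection helper.
    toy_index = {5915: 0xD3AD, 5916: 0xFFFD}
    print(replacement_pointers(toy_index))
```

A check along the lines of `replacement_pointers` over each generated index would flag entries like these before publication, since a decoder that finds a code point in the index returns it without signalling an error.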
Comment 1 Anne 2014-12-19 19:13:19 UTC
Should I remove the space from your Korean name? Other contributors with a Korean name do not have a space there. See https://encoding.spec.whatwg.org/#acknowledgments for context.

Anyway, that's an aside, thanks a lot for spotting this! This was a stupid mistake when regenerating the index.

https://github.com/whatwg/encoding/commit/7991e7b9add2a6d1ccd34e637d2c4e15ae7bbf7c
Comment 2 public+w3@mearie.org 2014-12-20 02:30:16 UTC
That is intended (i.e. I personally prefer that), but if you want to be consistent I don't mind normalizing that.
Comment 3 Anne 2014-12-20 09:42:49 UTC
Thanks, I added a note so I don't forget about your preference. It'll stay as is.