This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The hz-gb-2312 encoder shifts to two-byte mode (i.e., emits the shift sequence ~{, bytes 7E 7B) whenever it sees a non-ASCII character while not already in two-byte mode, without first checking whether the character is actually encodable (i.e., part of GB2312). If it is not, the encoder emits an error next, which means that 1) for a terminating encoder, the output ends with a useless shift sequence, and 2) for a non-terminating encoder, the two-byte shift has to be followed immediately by a one-byte (ASCII) shift (~}, bytes 7E 7D) before the ASCII representation of the unrepresentable character. It seems better not to output shift sequences that serve no purpose. This issue also applies to the encoders for ISO-2022-JP and ISO-2022-KR.
Okay, so for the hz-gb-2312 encoder we could swap steps 7 and 8 and add to the new step 8 the additional condition that pointer is not null. A similar type of fix works for the other encoders as far as I can tell.
Yes, that should work. Moving step 7 to after step 9 would be slightly simpler and might give the same result.
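To make the effect of the reordering concrete, here is a rough Python sketch and not the spec algorithm itself. It assumes Python's built-in gb2312 codec as a stand-in for the index lookup, models a non-terminating encoder by writing unencodable characters as decimal numeric character references, and closes two-byte mode at end of input. The names gb2312_pair, hz_encode and check_before_shift are hypothetical and exist only for this illustration.

```python
def gb2312_pair(ch):
    """Return the 7-bit (lead, trail) pair for ch, or None if ch is not in GB2312.

    Assumption: Python's gb2312 codec (EUC-CN) stands in for the index lookup.
    """
    try:
        raw = ch.encode("gb2312")
    except UnicodeEncodeError:
        return None
    if len(raw) != 2:
        return None
    # HZ carries the same two bytes as EUC-CN with the high bit cleared.
    return raw[0] - 0x80, raw[1] - 0x80


def hz_encode(text, check_before_shift=True):
    """Toy HZ encoder; check_before_shift=False mimics the reported behaviour."""
    out = bytearray()
    two_byte = False
    for ch in text:
        if ord(ch) < 0x80:
            if two_byte:
                out += b"~}"  # back to one-byte (ASCII) mode
                two_byte = False
            out += b"~~" if ch == "~" else ch.encode("ascii")
            continue
        pair = gb2312_pair(ch)
        if not check_before_shift and not two_byte:
            out += b"~{"  # reported behaviour: shift before knowing ch is encodable
            two_byte = True
        if pair is None:
            # Error path (assumed non-terminating): fall back to ASCII output.
            if two_byte:
                out += b"~}"
                two_byte = False
            out += b"&#%d;" % ord(ch)
            continue
        if not two_byte:
            out += b"~{"  # shift only once the pair is known to exist
            two_byte = True
        out += bytes(pair)
    if two_byte:
        out += b"~}"  # assumption: leave the stream in ASCII mode
    return bytes(out)


if __name__ == "__main__":
    sample = "a\ud55cb"  # U+D55C (Hangul) is not part of GB2312
    print(hz_encode(sample, check_before_shift=False))
    print(hz_encode(sample, check_before_shift=True))
```

With check_before_shift=False the unencodable character is preceded by an empty ~{~} pair; with the check moved ahead of the shift, no shift sequence is written for it at all.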
https://github.com/whatwg/encoding/commit/488c13a91d75c6f7314076ffd861a48972ac7f6d