20923 – Need a crazy mode for the %-decoder

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Summary: Need a crazy mode for the %-decoder

Status:	RESOLVED FIXED

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	HTML
Version:	unspecified
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	Unsorted
Assignee:	Ian 'Hixie' Hickson
QA Contact:	contributor

Reported:	2013-02-09 01:26 UTC by Ian 'Hixie' Hickson
Modified:	2013-04-09 19:24 UTC
CC List:	2 users

Description Ian 'Hixie' Hickson 2013-02-09 01:26:28 UTC

Fragment identifier parsing in the HTML spec has this crazy thing:

  Let decoded fragid be the result of expanding any sequences of percent-encoded 
  octets in fragid that are valid UTF-8 sequences into Unicode characters as 
  defined by UTF-8. If any percent-encoded octets in that string are not valid 
  UTF-8 sequences (e.g. they expand to surrogate code points), then skip this step 
  and the next one.

Any chance of getting an algorithm for this somehow?

http://www.whatwg.org/specs/web-apps/current-work/#the-indicated-part-of-the-document

Comment 1 Anne 2013-02-12 11:52:15 UTC

So the algorithm you want is:

1. Percent decode /input/ into /bytes/.

2. Run utf-8's decoder on /bytes/. If that emits a decoder error, return /input/; otherwise, return the decoder's output.

Isn't that simple enough to just put in the HTML specification?
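
For illustration, the two steps in Comment 1 could be sketched in Python roughly as follows. This is a minimal sketch, not the text that ended up in the spec: the function name decode_fragid is hypothetical, Python's strict "utf-8" codec stands in for the Encoding spec's utf-8 decoder (the two may differ in details such as BOM handling), and the input is assumed to be a string without lone surrogates.

```python
from urllib.parse import unquote_to_bytes


def decode_fragid(fragid: str) -> str:
    """Percent-decode fragid and read the result as UTF-8.

    Hypothetical helper: returns the input unchanged when the
    percent-decoded bytes are not valid UTF-8.
    """
    # Step 1: percent-decode the input into raw bytes.
    # Malformed escapes such as "%zz" are left as literal characters.
    raw = unquote_to_bytes(fragid)
    try:
        # Step 2: strict UTF-8 decoding. Any malformed sequence,
        # including an encoded surrogate code point, raises an error.
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # The "crazy mode": on a decoder error, fall back to the
        # original, still percent-encoded, input.
        return fragid
```

Under these assumptions, decode_fragid("caf%C3%A9") returns "café", while decode_fragid("%ED%A0%80") (an encoded surrogate) returns the input unchanged.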

Comment 2 Ian 'Hixie' Hickson 2013-02-13 00:25:57 UTC

Sure, I can do it in the HTML spec if you like.

Comment 3 contributor 2013-04-09 19:24:07 UTC

Checked in as WHATWG revision r7796.
Check-in comment: Update integration with URL spec and Encoding spec.
http://html5.org/tools/web-apps-tracker?from=7795&to=7796