27289 – Host parsing and utf8PercentDecode

This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Summary: Host parsing and utf8PercentDecode

Status:	RESOLVED INVALID

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	URL
Version:	unspecified
Hardware:	PC Linux

Importance:	P2 normal
Target Milestone:	Unsorted
Assignee:	Sam Ruby
QA Contact:	sideshowbarker+urlspec

URL:
Whiteboard:
Keywords:

Depends on:	25946
Blocks:

Reported:	2014-11-10 00:23 UTC by Sam Ruby
Modified:	2014-11-20 19:56 UTC
CC List:	2 users

Description Sam Ruby 2014-11-10 00:23:44 UTC

Consider the following URLs:

1) http://intertwingly.net/projects/pegurl/liveview.html#http://Â¡/
2) http://intertwingly.net/projects/pegurl/liveview.html#http://¡/
3) http://intertwingly.net/projects/pegurl/liveview.html#http://%C2%A1/

If you UTF-8 encode the third URL, percent decode the result, and then perform a UTF-8 decode without BOM, you will end up with the second URL.  And indeed browsers treat these two URLs the same.

Unfortunately, this is also true for the first URL, because it contains no percent signs to decode.  I say unfortunately as browsers do not treat the first and second URLs the same.

I propose that the spec text in step 3 of https://url.spec.whatwg.org/#host-parsing be changed to reference a utf8PercentDecode function, one that only UTF-8 decodes bytes that were produced by percent decoding.  One possible implementation of such a function can be found in:

http://intertwingly.net/projects/pegurl/url.js
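
For illustration only, here is a minimal sketch of what a function with the proposed behavior might look like. It is not the code from the file linked above. The regular-expression approach, the use of the standard TextDecoder API, and the choice to leave malformed escapes untouched are all assumptions made for this sketch.

```javascript
// Hypothetical sketch of the proposed utf8PercentDecode: only bytes that
// come from %XX escapes are UTF-8 decoded; every other code point in the
// input is copied through unchanged.
function utf8PercentDecode(input) {
  // ignoreBOM: true keeps a leading BOM in the output, which is what
  // "UTF-8 decode without BOM" means in the Encoding Standard.
  const decoder = new TextDecoder("utf-8", { ignoreBOM: true });

  // Match maximal runs of well-formed escapes so that multi-byte
  // sequences such as %C2%A1 are decoded together.
  return input.replace(/(?:%[0-9A-Fa-f]{2})+/g, (run) => {
    const bytes = new Uint8Array(run.length / 3);
    for (let i = 0; i < bytes.length; i++) {
      bytes[i] = parseInt(run.slice(i * 3 + 1, i * 3 + 3), 16);
    }
    return decoder.decode(bytes);
  });
}

// The three hosts from the URLs above, written with escapes for clarity.
const host1 = "\u00C2\u00A1"; // the two code points U+00C2 U+00A1
const host2 = "\u00A1";       // the single code point U+00A1
const host3 = "%C2%A1";       // percent-encoded UTF-8 bytes of U+00A1

console.log(utf8PercentDecode(host3) === host2); // true
console.log(utf8PercentDecode(host1) === host2); // false
```

With this sketch, the third host maps to the second, while the first host is returned as-is because it contains no escapes.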

Comment 1 Anne 2014-11-10 09:08:56 UTC

I don't understand why 1) and 2) would be treated the same. Could you elaborate?

Comment 2 Sam Ruby 2014-11-20 19:56:07 UTC

After a more careful reading of the spec, I've come to the conclusion that this bug is invalid.
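
The comment above does not spell out the reasoning. One reading consistent with it can be sketched as follows, assuming the host parsing step works as described in the report: UTF-8 encode the input, percent decode the resulting bytes, then UTF-8 decode without BOM. The function names percentDecodeBytes and specHostDecode are hypothetical and chosen for this sketch; they are not taken from the spec or from url.js.

```javascript
// Percent decode a byte sequence: "%" followed by two hex digits becomes
// one byte; everything else is copied through.
function percentDecodeBytes(bytes) {
  const isHex = (b) =>
    (b >= 0x30 && b <= 0x39) || // 0-9
    (b >= 0x41 && b <= 0x46) || // A-F
    (b >= 0x61 && b <= 0x66);   // a-f
  const out = [];
  for (let i = 0; i < bytes.length; i++) {
    if (
      bytes[i] === 0x25 &&
      i + 2 < bytes.length &&
      isHex(bytes[i + 1]) &&
      isHex(bytes[i + 2])
    ) {
      out.push(parseInt(String.fromCharCode(bytes[i + 1], bytes[i + 2]), 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      out.push(bytes[i]);
    }
  }
  return Uint8Array.from(out);
}

// Hypothetical helper: the three-step pipeline described in the report.
function specHostDecode(host) {
  const encoded = new TextEncoder().encode(host);   // UTF-8 encode
  const decoded = percentDecodeBytes(encoded);      // percent decode
  return new TextDecoder("utf-8", { ignoreBOM: true }).decode(decoded);
}

const host1 = "\u00C2\u00A1"; // U+00C2 U+00A1, as in URL 1
const host2 = "\u00A1";       // U+00A1, as in URL 2
const host3 = "%C2%A1";       // as in URL 3

console.log(specHostDecode(host3) === host2); // true
console.log(specHostDecode(host1) === host1); // true
console.log(specHostDecode(host1) === host2); // false
```

Under these assumptions, the first host UTF-8 encodes to the bytes C3 82 C2 A1, passes through percent decoding untouched, and decodes back to itself, so it never becomes the second host.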