20602 – [QT3TS] fn-resolve-uri-32

This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Functions and Operators 3.1
Version:	Working drafts
Hardware:	PC Windows NT

Importance:	P2 normal
Target Milestone:	---
Assignee:	Michael Kay
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Blocks:	20672

Reported:	2013-01-08 16:29 UTC by Tim Mills
Modified:	2016-12-16 19:55 UTC
CC List:	2 users

Description Tim Mills 2013-01-08 16:29:08 UTC

I believe that 

  <assert-string-value>http://www.example.com/this%20doc.html</assert-string-value>

may also be a possible expected result.

Comment 1 Michael Kay 2013-01-08 18:54:57 UTC

OK. The LEIRI spec says spaces SHOULD NOT be %-encoded, but it doesn't say MUST NOT, so I will allow this result.

Comment 2 Tim Mills 2013-01-08 21:45:57 UTC

Thanks. Are you basing your response on the text

"Conversion from a LEIRI to an IRI or a URI must be performed only when absolutely necessary and as late as possible in a processing chain. In particular, neither the process of converting a relative LEIRI to an absolute one nor the process of passing a LEIRI to a process or software component responsible for dereferencing it should trigger percent-encoding."

or something else?

Comment 3 Michael Kay 2013-01-08 22:26:28 UTC

Yes, that's the text I was relying on. Given that we are dealing with a relative LEIRI that is not an IRI, the advice

the process of converting a relative LEIRI to an absolute one ... should [not] trigger percent-encoding

seems to cover this situation rather precisely.

Comment 4 Tim Mills 2013-01-10 10:31:00 UTC

Confirmed fixed.  Thanks.

I'd argue that because resolve-uri returns a new URI, the implementation is at liberty to return a URI, IRI or LEIRI regardless of the input.  Perhaps this could be clarified in the specification?  It's not _just_ converting from a relative to an absolute URI.

Comment 5 Michael Kay 2015-07-31 09:54:12 UTC

I'm re-opening this as a spec bug, because the final comment suggests that clarifications to the spec are needed. Also, the agreed resolution for this test contradicts the expected result of various XSLT tests including type-functions-0304 and resolve-uri-022, and I don't want to argue for a change to those tests unless and until the spec is clarified.

The spec for resolve-uri currently says 

<quote>
The function resolves the relative IRI reference $relative against the base IRI $base using the algorithm defined in [RFC 3986], adapted by treating any ·character· that would not be valid in an RFC3986 URI or relative reference in the same way that RFC3986 treats unreserved characters. No percent-encoding takes place.
</quote>

This seems to unequivocally say that if there is a space in the input, and if the implementation chooses to accept this (as a LEIRI), then there must be a space in the output, not a "%20". The argument in this bugzilla thread that LEIRI permits escaping seems vacuous, because the resolve-uri() spec does not reference LEIRI for how relative URI resolution is performed. Perhaps it should.
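
To illustrate the reading in Comment 5, here is a minimal Python sketch of resolution that performs no percent-encoding. It uses the standard library's `urljoin` as a stand-in for the RFC 3986 algorithm; it is not an implementation of `fn:resolve-uri`. The inputs are assumptions chosen to match the expected result discussed in this thread, since the actual arguments of test fn-resolve-uri-32 are not quoted here.

```python
from urllib.parse import urljoin

# Hypothetical inputs: the real arguments of fn-resolve-uri-32 are not
# quoted in this thread, so these are chosen only to match its result.
base = "http://www.example.com/"
relative = "this doc.html"  # a relative LEIRI: the space is not legal in an IRI

# urljoin merges the relative reference into the base without touching
# the characters of the path segments, so the space passes through as-is.
resolved = urljoin(base, relative)

print(resolved)
```

Under this reading the result keeps the literal space, and the "%20" form proposed in the original description would be a different string.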

Comment 6 Tim Mills 2015-09-03 09:25:06 UTC

On the one hand, I'm happy to withdraw my original request in Comment #0.

On the other hand, the specification says 

"In addition, the implementation may accept some or all strings that conform to the rules for (absolute or relative) Legacy Extended IRI references as defined in [Legacy extended IRIs for XML resource identification]."

One way in which an implementation might accept a LEIRI is to convert it to an IRI as per "4 Conversion of Legacy Extended IRIs to IRIs" [1] before proceeding with the rules 1 to 4 which are always stated as operating on IRIs.


[1] http://www.w3.org/TR/leiri/
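
The alternative described in Comment 6, converting the LEIRI to an IRI before resolving, could look roughly like the following Python sketch. The function names and the input values are hypothetical. The character set is a simplified assumption: it covers only the printable ASCII characters that the LEIRI note is understood to allow beyond IRI syntax, and omits control characters, private-use characters and the other ranges the note lists.

```python
from urllib.parse import urljoin

# Assumption: simplified subset of the characters allowed in a LEIRI but
# not in an IRI (space, delimiters, "unwise" characters). Control and
# private-use characters are deliberately left out of this sketch.
LEIRI_ONLY_ASCII = ' <>"{}|\\^`'


def leiri_to_iri(leiri: str) -> str:
    """Percent-encode the LEIRI-only characters via their UTF-8 bytes."""
    out = []
    for ch in leiri:
        if ch in LEIRI_ONLY_ASCII:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)  # non-ASCII IRI characters are left untouched
    return "".join(out)


def resolve_after_conversion(relative: str, base: str) -> str:
    """Convert both arguments to IRIs first, then resolve (hypothetical helper)."""
    return urljoin(leiri_to_iri(base), leiri_to_iri(relative))


early = resolve_after_conversion("this doc.html", "http://www.example.com/")
print(early)
```

With conversion done up front, the resolved value contains "%20" rather than a space, which is the result originally requested in the description.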

Comment 7 Josh Spiegel 2015-09-08 15:46:48 UTC

DECISION: The way URI resolution is done for a LEIRI is implementation-defined and may involve use of percent-encoding to translate the LEIRI into a valid IRI.

Comment 8 Michael Kay 2015-09-09 11:06:33 UTC

I'm going to push back on that decision. 

(1) We currently define (implicitly) an algorithm for resolving a relative LEIRI: "using the algorithm defined in [RFC 3986], adapted by treating any ·character· that would not be valid in an RFC3986 URI or relative reference in the same way that RFC3986 treats unreserved characters". (This directly reflects the way RFC 3987 defines URI resolution for IRIs).

(2) This algorithm does not trigger percent-encoding.

(3) This is consistent with what the LEIRI spec says (see http://www.w3.org/TR/leiri/ (section 4)): 

In particular, neither the process of converting a relative LEIRI to an absolute one nor the process of passing a LEIRI to a process or software component responsible for dereferencing it should trigger percent-encoding.

Tim suggests "One way in which an implementation might accept a LEIRI is to convert it to an IRI as per '4 Conversion of Legacy Extended IRIs to IRIs' [1] before proceeding with the rules 1 to 4 which are always stated as operating on IRIs." - but that is inconsistent with the LEIRI spec, which says "Conversion from a LEIRI to an IRI or a URI must be performed only when absolutely necessary and as late as possible in a processing chain."


I don't think the algorithm is completely implementation-defined, and the LEIRI spec, while it doesn't define an algorithm for URI resolution, says that percent-encoding SHOULD NOT take place.

So my proposal is instead to add a Note as follows:

RFC3986 defines an algorithm for resolving relative references in the context of the URI syntax defined in that RFC. RFC3987 describes a modification to that algorithm to make it applicable to IRIs (specifically: additional characters permitted in an IRI are handled the same way that RFC3986 handles unreserved characters). The LEIRI specification does not explicitly define a resolution algorithm, but suggests that it SHOULD NOT be done by converting the LEIRI to a URI, and SHOULD NOT involve percent-encoding. This specification fills this gap by defining resolution for LEIRIs in the same way that RFC3986 defines resolution for IRIs, that is by specifying that additional characters are handled as unreserved characters.

Comment 9 Michael Kay 2015-09-09 11:42:25 UTC

Correction, in the last paragraph replace the last occurrence of RFC3986 with RFC3987.
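
As a rough illustration of the Note proposed in Comment 8 (as corrected in Comment 9), the Python sketch below resolves a relative reference containing spaces, including dot-segment removal, without any percent-encoding, and converts to a URI only at the point of dereferencing. `urljoin` stands in for the RFC 3986 algorithm, `to_uri_for_dereference` is a hypothetical helper, and the inputs are invented for the example.

```python
from urllib.parse import quote, urljoin

# Hypothetical inputs, chosen so that ".." has to be removed during merging.
base = "http://www.example.com/a/b/"
relative = "../my docs/this doc.html"

# Resolution step: the space is carried along like any unreserved
# character, so the dot-segment logic never needs to encode it.
resolved = urljoin(base, relative)


def to_uri_for_dereference(leiri: str) -> str:
    """Late conversion (hypothetical helper): percent-encode only when fetching.

    Reserved characters and "%" are kept as-is so that existing escapes
    and URI delimiters survive; everything else outside the unreserved
    set is encoded via its UTF-8 bytes.
    """
    return quote(leiri, safe=":/?#[]@!$&'()*+,;=%")


print(resolved)
print(to_uri_for_dereference(resolved))
```

The value returned by resolution keeps its spaces; percent-encoding appears only in the string handed to the component that dereferences it, which matches the "as late as possible" advice quoted from the LEIRI spec.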

Comment 10 Michael Kay 2015-09-15 17:30:33 UTC

The change in comments 8/9 was accepted, and has been applied to the spec. The test case has been modified accordingly.