ISSUE-75: Valid plain literals containing #x0 are ill-typed in RDF 1.1
#x0
- State:
- CLOSED
- Product:
- RDF Concepts
- Raised by:
- Richard Cyganiak
- Opened on:
- 2011-08-19
- Description:
- The lexical space of xsd:string doesn't cover all Unicode strings.
I assume we will end up referring to XSD 1.1 for the definition of xsd:string [1]. That document leaves it up to implementations whether they support XML 1.0 or XML 1.1; accordingly, the set of characters allowed in an xsd:string is defined by [2] or [3], respectively.
The more permissive one from XML 1.1:
Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
This excludes #x0, Unicode codepoint U+0000. XML 1.0 also excludes a number of other control codes in the #x0-#x1F range.
The definition of "lexical form" in RDF 2004 [4] says "Unicode string", which according to [5] includes *all* codepoints including the control codes.
So, any string that includes #x0 was a valid untagged plain literal in RDF 2004. In RDF 1.1, it will be typed as an xsd:string, and thus will be an ill-typed literal.
(On the other hand, such strings could never be serialized in RDF/XML or XHTML+RDFa; they were serializable only in N-Triples and Turtle.)
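As a rough illustration of the character ranges discussed above, the following Python sketch checks whether a string falls inside the lexical space of xsd:string under either XML version. The function names are hypothetical, and the XML 1.0 ranges are taken from the Char production in [2] rather than from the text of this issue.

```python
def is_xml11_char(cp: int) -> bool:
    # XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml10_char(cp: int) -> bool:
    # XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def in_xsd_string_lexical_space(value: str, xml_version: str = "1.1") -> bool:
    # Hypothetical helper: True if every codepoint of `value` matches the
    # Char production of the chosen XML version.
    check = is_xml11_char if xml_version == "1.1" else is_xml10_char
    return all(check(ord(ch)) for ch in value)


# A literal containing U+0000 is outside the lexical space either way.
print(in_xsd_string_lexical_space("a\u0000b", "1.1"))
print(in_xsd_string_lexical_space("a\u0000b", "1.0"))
# U+0001 is allowed by XML 1.1 but not by XML 1.0.
print(in_xsd_string_lexical_space("\u0001", "1.1"))
print(in_xsd_string_lexical_space("\u0001", "1.0"))
```

Under this sketch, a string such as "a\u0000b" is rejected for both XML versions, which corresponds to the ill-typed case described above.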
Is this a problem? Can we go ahead with the new literal design despite this restriction? Should we acknowledge it in the RDF Concepts spec?
[1] http://www.w3.org/TR/2005/WD-xmlschema11-2-20050224/datatypes.html#string
[2] http://www.w3.org/TR/REC-xml/#dt-character
[3] http://www.w3.org/TR/xml11/#NT-Char
[4] http://www.w3.org/TR/rdf-concepts/#dfn-lexical-form
[5] http://www.unicode.org/versions/Unicode6.0.0/UnicodeStandard-6.0.pdf
- Related Actions Items:
- No related actions
- Related emails:
- RE: Agenda: JSON-LD Telecon - Tuesday, July 2nd 2013 (from markus.lanthaler@gmx.net on 2013-07-02)
- RE: sandro's review of json-ld-api (from markus.lanthaler@gmx.net on 2013-03-29)
- Re: Status update on LC comments and post-LC changes to R2RML (from richard@cyganiak.de on 2011-11-07)
- Status update on LC comments and post-LC changes to R2RML (from richard@cyganiak.de on 2011-11-07)
- Re: RDF-ISSUE-75 (#x0): Valid plain literals containing #x0 are no longer valid in RDF 1.1 (from richard@cyganiak.de on 2011-08-21)
- Re: RDF-ISSUE-75 (#x0): Valid plain literals containing #x0 are no longer valid in RDF 1.1 (from ivan@w3.org on 2011-08-20)
- RDF-ISSUE-75 (#x0): Valid plain literals containing #x0 are no longer valid in RDF 1.1 (from sysbot+tracker@w3.org on 2011-08-19)
Related notes:
Proposed to resolve as a sentence in RDF Concepts Section 5 and possibly in the RDF Primer or another document motivating changes from RDF 1.0.
David Wood, 13 Oct 2011, 18:01:49
Richard will put a statement about this into RDF Concepts.
Also:
<Scott_Bauer> sandro: we put it in rdf concepts now?
<AlexHall> how many implementors validate xsd:strings right now?
<iand> we could write a negative test case: :x :y "\u0000" .
<iand> ask implementors to try that test and see if they handle it
<Scott_Bauer> letting cygri create the action item?
<cygri> ACTION: cygri to add a note to RDF Concepts re ISSUE-75
* trackbot noticed an ACTION. Trying to create it.
* RRSAgent records action 10
<trackbot> Created ACTION-107 - Add a note to RDF Concepts re ISSUE-75 [on Richard Cyganiak - due 2011-10-20].
<sandro> gavin_: This wasn't a problem pre-turtle because no syntax could express it.
<Scott_Bauer> davidwood: Ian says it should be a test case
<Scott_Bauer> gavinc: it can't be expressed in n-triples
<Scott_Bauer> sandro: it's a syntax error -- you expect it to fail
<iand> it can be expressed in ntriples (as above) but it is just datatype invalid
<Scott_Bauer> topic: issue 76
<cygri> sandro++
<Scott_Bauer> sandro: close issue 75 first
<iand> If i can write "x"^^xsd:int then I can write "\u0000"^^xsd:string