StringLiterals/LanguageTaggedStringDatatypeProposal
This is a proposal for addressing the following time-permitting item from the charter:
Reconcile various forms of string literals: at the moment we have plain literals, rdf:plainLiteral, and xsd:string literals. They are very very close to one another but they are officially different. In practice this means that, eg, SPARQL queries have to have a three branch UNION to handle all of these. Worth looking at some sort of a reconciliation of these.
This is ISSUE-12.
Short summary
- Abolish plain literals
- Use xsd:string instead of untagged ones
- Use a new “special datatype” rdf:LanguageTaggedString for tagged ones
- Unlike for normal datatypes, a lexical form of rdf:LanguageTaggedString is not a string but a 〈string,langtag〉 pair
- "foo" and "foo"@en, and the corresponding forms in other concrete syntaxes, are syntactic sugar for the above and are the preferred forms
Details
1. Untagged plain literals are removed from the abstract syntax; an xsd:string typed literal is used instead.
2. In concrete syntaxes, the "foo" form SHOULD be used instead of "foo"^^xsd:string. (This is a “SHOULD” rather than a “MUST” for backward compatibility.)
3. Tagged plain literals are removed from the abstract syntax as well.
4. Instead, a new “special datatype” is introduced for tagged string literals only.
5. Let's provisionally call it rdf:LanguageTaggedString. A shorter name should be found.
6. Unlike for normal datatypes, the lexical space of rdf:LanguageTaggedString does not consist of "lexicalform" strings but of 〈string,langtag〉 pairs. Its value space is also the set of 〈string,langtag〉 pairs, and its L2V mapping is the identity mapping.
7. In concrete syntaxes, the "foo"@en form MUST be used for literals of type rdf:LanguageTaggedString.
8. rdf:PlainLiteral remains as it is -- not to be used as syntax (concrete or abstract).
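Items 1 to 8 can be illustrated with a minimal Python sketch. The names `Literal` and `make_literal` are hypothetical and not part of the proposal, and datatypes are written as prefixed names for brevity. The sketch only shows how the sugared concrete forms might map onto the abstract syntax, assuming rdf:LanguageTaggedString as the provisional datatype name.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

XSD_STRING = "xsd:string"
RDF_LANG_STRING = "rdf:LanguageTaggedString"  # provisional name from the proposal
RDF_PLAIN_LITERAL = "rdf:PlainLiteral"


@dataclass(frozen=True)
class Literal:
    """Hypothetical abstract-syntax literal: every literal has a datatype."""
    # For normal datatypes the lexical form is a string; for
    # rdf:LanguageTaggedString it is a (string, langtag) pair.
    lexical: Union[str, Tuple[str, str]]
    datatype: str


def make_literal(text: str,
                 lang: Optional[str] = None,
                 datatype: Optional[str] = None) -> Literal:
    """Map the concrete forms "foo", "foo"@en and "foo"^^dt to a Literal."""
    if datatype in (RDF_LANG_STRING, RDF_PLAIN_LITERAL):
        # Neither datatype may be written explicitly in a concrete syntax.
        raise ValueError(f"{datatype} cannot be used as syntax")
    if lang is not None:
        if datatype is not None:
            raise ValueError("a literal cannot carry both a tag and a datatype")
        # "foo"@en is sugar for the pair-valued special datatype.
        return Literal((text, lang), RDF_LANG_STRING)
    # "foo" is sugar for "foo"^^xsd:string.
    return Literal(text, datatype or XSD_STRING)
```

Under these assumptions, `make_literal("foo")` and `make_literal("foo", datatype="xsd:string")` produce the same abstract literal, which is the reconciliation the proposal is after.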
Some corollaries
9. It's ok to use rdf:LanguageTaggedString and rdf:PlainLiteral in rdfs:range statements. This should probably be documented somewhere, at least in the RDFS spec.
10. In SPARQL, datatype("foo") is now xsd:string without the need for an exception in the spec.
11. In SPARQL, datatype("foo"@en) is now rdf:LanguageTaggedString (with a note that legacy implementations might return an error).
12. The value space of rdf:PlainLiteral is the union of the value spaces of xsd:string and rdf:LanguageTaggedString.
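Corollaries 10 to 12 can be sketched in Python as follows. The function names are hypothetical, literals are passed as their parts rather than parsed, and the membership checks ignore the character restrictions of xsd:string; this is an illustration of the intended behaviour, not an implementation of SPARQL.

```python
from typing import Optional, Tuple, Union

Value = Union[str, Tuple[str, str]]


def datatype(text: str, lang: Optional[str] = None,
             dt: Optional[str] = None) -> str:
    """What datatype() would return under the proposal (hypothetical helper)."""
    if lang is not None:
        return "rdf:LanguageTaggedString"  # provisional name
    # An untagged literal without an explicit datatype is an xsd:string.
    return dt or "xsd:string"


def in_xsd_string(value: object) -> bool:
    """Simplified membership test for the xsd:string value space."""
    return isinstance(value, str)


def in_lang_tagged_string(value: object) -> bool:
    """Membership test for the set of (string, langtag) pairs."""
    return (isinstance(value, tuple) and len(value) == 2
            and all(isinstance(part, str) for part in value))


def in_plain_literal(value: object) -> bool:
    """rdf:PlainLiteral value space as the union of the two spaces above."""
    return in_xsd_string(value) or in_lang_tagged_string(value)
```

No special case is needed for untagged strings: the same code path that handles "1"^^xsd:integer also handles "foo".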
Comparison of current RDF and proposal
Literals in current RDF | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Kind of literal | Concrete syntaxes | Abstract syntax | Value | ||||||||
Concrete syntax form | Allowed? | Ttl | NT | Spq | SRX | RDFa | R/X | Abstract syntax form | Allowed? | ||
Strings without language tag |
"foo" | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo" | Unicode string | ||
"foo"^^xsd:string | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo"^^xsd:string | ||||
"foo@"^^rdf:PlainLiteral | MUST NOT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo@"^^rdf:PlainLiteral | MUST NOT | ||
Strings with language tag |
"foo"@en | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo"@en | <Unicode string, language tag> | ||
"foo@en"^^rdf:PlainLiteral | MUST NOT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo@en"^^rdf:PlainLiteral | MUST NOT | ||
Integer numbers | 1 | ✓ | ✓ | "1"^^xsd:integer | Number | ||||||
"1"^^xsd:integer | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
Decimal numbers | 1.0 | ✓ | ✓ | "1.0"^^xsd:decimal | |||||||
"1.0"^^xsd:decimal | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
Booleans | true | ✓ | ✓ | "true"^^xsd:boolean | Boolean value | ||||||
"true"^^xsd:boolean | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
Other literals | "lexical"^^datatype | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "lexical"^^datatype | Depends on L2V mapping of datatype |
Blue italics indicate changes between current RDF and new proposal.
Literals in the new proposal | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Kind of literal | Concrete syntaxes | Abstract syntax | Value | ||||||||
Concrete syntax form | Allowed? | Ttl | NT | Spq | SRX | RDFa | R/X | Abstract syntax form | Allowed? | ||
Strings without language tag |
"foo" | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo"^^xsd:string | Unicode string | ||
"foo"^^xsd:string | SHOULD NOT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
"foo@"^^rdf:PlainLiteral | MUST NOT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo@"^^rdf:PlainLiteral | MUST NOT | ||
Strings with language tag |
"foo"@en | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | <"foo",@en>^^rdf:LangTaggedString | <Unicode string, language tag> | ||
"???"^^rdf:LangTaggedString | impossible, no lexical form defined | ||||||||||
"foo@en"^^rdf:PlainLiteral | MUST NOT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo@en"^^rdf:PlainLiteral | MUST NOT | ||
Integer numbers | 1 | ✓ | ✓ | "1"^^xsd:integer | Number | ||||||
"1"^^xsd:integer | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
Decimal numbers | 1.0 | ✓ | ✓ | "1.0"^^xsd:decimal | |||||||
"1.0"^^xsd:decimal | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
Booleans | true | ✓ | ✓ | "true"^^xsd:boolean | Boolean value | ||||||
"true"^^xsd:boolean | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
Other literals | "lexical"^^datatype | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "lexical"^^datatype | Depends on L2V mapping of datatype |
Discussion etc
- Naming proposals: rdf:LanguageTaggedString, rdf:Text, …
- …
There should be some language to the effect that "foo" is preferred, simply for ergonomic reasons. I phrased this as a SHOULD in the proposal. Weaker language might be sufficient in the general case. Or maybe expressing this preference is altogether unnecessary.
Some syntaxes, mainly N-Triples and SPARQL Results XML/JSON, have use cases that are hampered by the variability introduced by syntactic sugar. I think these syntaxes should make a stronger statement in their respective syntax specs, perhaps forbidding one of the two forms when serializing. Which one doesn't really matter.
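As a rough illustration of what forbidding one of the forms when serializing could look like, here is a Python sketch of an N-Triples-style literal writer that always emits the short "foo" form. The function names are hypothetical, the escaping is deliberately minimal rather than complete, and the choice of the short form over the typed form is an assumption; the opposite choice would work equally well.

```python
from typing import Optional

XSD_STRING_IRI = "http://www.w3.org/2001/XMLSchema#string"


def escape(text: str) -> str:
    """Minimal escaping for illustration; a real serializer needs more."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def serialize_literal(text: str, lang: Optional[str] = None,
                      datatype_iri: Optional[str] = None) -> str:
    """Emit exactly one concrete form per abstract literal."""
    quoted = '"' + escape(text) + '"'
    if lang is not None:
        # Language-tagged strings have only the "foo"@en form.
        return quoted + "@" + lang
    if datatype_iri is None or datatype_iri == XSD_STRING_IRI:
        # Never write the explicit xsd:string datatype; use the short form.
        return quoted
    return quoted + "^^<" + datatype_iri + ">"
```

With a single output form per literal, line-based comparison of N-Triples files no longer depends on which sugar a given writer happened to use.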