rdf:PlainLiteral: A Datatype for RDF Plain Literals

W3C Candidate Recommendation 11 June 2009

This version: http://www.w3.org/TR/2009/CR-rdf-plain-literal-20090611/
Latest version: http://www.w3.org/TR/rdf-plain-literal/
Previous version: http://www.w3.org/TR/2009/WD-rdf-text-20090421/

Editors: Jie Bao, Rensselaer Polytechnic Institute; Sandro Hawke, W3C/MIT; Boris Motik, Oxford University; Peter F. Patel-Schneider, Bell Labs Research, Alcatel-Lucent; Axel Polleres, DERI Galway at the National University of Ireland

This document is also available in these non-normative formats: PDF version.

Abstract

This document presents the specification of a primitive datatype for the plain literals of RDF.

Status of this Document

May Be Superseded

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

Summary of Changes

This document has undergone several changes since the previous version of 21 April 2009.

The name of the datatype was changed from rdf:text to rdf:PlainLiteral, to clarify the role and purpose of the datatype.
The names of the builtins and their namespace were changed to match the change in the name of the datatype.
The introduction and section 4 were rewritten to reframe this datatype as having a special relationship to RDF plain literals.
The notion of an entailment relationship between plain literals and rdf:PlainLiteral typed literals was removed, since rdf:PlainLiteral literals are now more clearly understood to not occur in RDF graph syntaxes.
The characters used to delimit pairs were changed, since problems were reported with &lang; and &rang; in some browsers.

Please Comment By 30 July 2009

The OWL Working Group and the Rule Interchange Format (RIF) Working Group seek to gather experience from implementations in order to increase confidence in the language and meet specific exit criteria. This document will remain a Candidate Recommendation until at least 30 July 2009. After that date, when and if the exit criteria are met, the group intends to request Proposed Recommendation status.

Please send reports of implementation experience, and other feedback, to public-owl-comments@w3.org (public archive). Reports of any success or difficulty with the test cases are encouraged. Open discussion among developers is welcome at public-owl-dev@w3.org (public archive).

No Endorsement

Publication as a Candidate Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Patents

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1 Introduction
2 Preliminaries
3 Definition of the rdf:PlainLiteral Datatype
4 Syntax for rdf:PlainLiteral Literals
5 Functions on rdf:PlainLiteral Data Values
6 Acknowledgments
7 References
8 Changes Since Last Call

1 Introduction

The Resource Description Framework [RDF] is defined to have an extensible system of typed literals, based on XML Schema datatypes [XSD], and also to have plain literals. In the RDF specification, plain literals differ from typed literals in that plain literals have no datatype and can optionally have a language tag, indicating the natural language of the content. (See Tags for Identifying Languages [BCP 47].) These options for expressing RDF literals complicate specifications which interact with RDF, such as RIF and OWL. Furthermore, RDF does not provide a name for the set of all plain literals, which, for example, prevents one from stating in RDFS or OWL that the range of some property must be a plain literal.

In response, this specification introduces a datatype called rdf:PlainLiteral. The datatype is in the "rdf:" namespace because it refers to parts of the conceptual model of RDF. This extension, however, does not change that conceptual model, and thus does not affect specifications that depend on it such as SPARQL [SPARQL]. The value space of rdf:PlainLiteral consists of all data values assigned to RDF plain literals, which allows RDF applications to explicitly refer to this set (e.g., in rdfs:range assertions).

Because RDF plain literals are already a part of RDF and SPARQL syntaxes, rdf:PlainLiteral literals are written as RDF plain literals in RDF and SPARQL syntaxes.

As with plain literals, this datatype can associate language tags with Unicode strings, but it does not provide its own facilities for representing natural language utterances. Unicode bidirectional control characters [BIDI] may be used within these literals, like all other Unicode characters. (Richer, XML-based representations such as XHTML [XHTML] and Ruby annotations [RUBY] can be expressed using the rdf:XMLLiteral datatype.)

2 Preliminaries

A character is an atomic unit of text. Each character has a Universal Character Set (UCS) code point [ISO/IEC 10646] (or, equivalently, a Unicode code point [UNICODE]) that MUST match the Char production from XML [XML] thus ensuring compatibility with XML Schema Datatypes, version 1.1 [XML Schema Datatypes]. Code points are sometimes represented in this document as U+ followed by a four-digit hexadecimal value of the code point.

A string is a finite sequence of zero or more characters. The length of a string is the number of characters in it. Strings are written in this specification by enclosing them in double quotes. Two strings are identical if and only if they contain exactly the same characters in exactly the same sequence.

UCS [ISO/IEC 10646] and Unicode [UNICODE] provide for 1,114,112 different code points. The Char production from XML [XML], however, excludes the surrogate code points and the code points U+FFFE and U+FFFF. Thus, rdf:PlainLiteral provides a total of 1,112,033 different characters. This number is important, as it can affect the satisfiability of an OWL 2 ontology. Consider the following example:

Functional-Style Syntax:
ClassAssertion( a:i MinCardinality( n a:property DatatypeRestriction( xs:string xs:length 1 ) ) )

This OWL 2 axiom states that the individual a:i is connected by the property a:property to at least n different strings of length one. The number of such strings is limited to 1,112,033 by the above definitions, so this ontology is satisfiable if and only if n is smaller than or equal to 1,112,033.
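The figure of 1,112,033 can be checked by summing the ranges of the Char production. The following non-normative sketch assumes the XML 1.0 form of that production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]); the names CHAR_RANGES and count_chars are illustrative and not part of this specification. Under that assumption, the ranges omit not only the surrogates, U+FFFE, and U+FFFF, but also the C0 control characters other than U+0009, U+000A, and U+000D.

```python
# Inclusive code point ranges of the Char production (assumed: XML 1.0).
CHAR_RANGES = [
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def count_chars(ranges=CHAR_RANGES):
    """Return the number of code points covered by the inclusive ranges."""
    return sum(high - low + 1 for low, high in ranges)


if __name__ == "__main__":
    print(count_chars())  # 1112033
```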

A language tag is a string matching the langtag production from BCP 47 [BCP 47]. Note that this definition corresponds to the well-formed rather than the valid class of conformance in BCP 47. A language tag MAY contain subtags that are not registered in the IANA Language Subtag Registry, although an rdf:PlainLiteral implementation MAY also choose to reject such invalid language tags.

The language tag "en-fubar" is not registered with the IANA Language Subtag Registry, so an rdf:PlainLiteral implementation is allowed to reject it. This string, however, matches the langtag production from BCP 47, so it is a perfectly valid language tag for the purpose of this specification. Consequently, the value space of rdf:PlainLiteral (see Section 3 for its definition) contains, say, the pair < "some string" , "en-fubar" >.

This specification uses Uniform Resource Identifiers (URIs) for naming datatypes and their components, which are defined in RFC 3986 [RFC 3986]. For readability, URI prefixes are often abbreviated by a short prefix name according to the convention of RDF [RDF]. The following prefix names are used throughout this document:

the prefix name xs: stands for http://www.w3.org/2001/XMLSchema#
the prefix name rdf: stands for http://www.w3.org/1999/02/22-rdf-syntax-ns#

The names of the built-in functions defined in Section 5 are QNames, as defined in the XML namespaces specification [XML Namespaces]. The following namespace abbreviations are used in Section 5:

fn stands for the http://www.w3.org/2005/xpath-functions namespace
plfn stands for the http://www.w3.org/2009/rdf-PlainLiteral-functions namespace

Whether an expression of the form pr:ln denotes an abbreviated URI or a QName should be clear from the context: only the names of the built-in functions in Section 5 are QNames; all other such expressions denote abbreviated URIs.

Datatypes are defined in this document along the lines of XML Schema Datatypes [XML Schema Datatypes]. Each datatype is identified by a URI and is described by the following components:

The value space is a set determining the set of values of the datatype. Elements of the value space are called data values.
The lexical space is a set of strings that can be used to refer to data values. Each member of the lexical space is called a lexical form, and it is mapped to a particular data value.
The facet space is a set of facet pairs of the form ( F v ), where F is a URI called a constraining facet, and v is an arbitrary data value called a constraining value. Each such facet pair is mapped to a subset of the value space of the datatype.

A plain literal is a string with an optional language tag [RDF]. A plain literal without a language tag is interpreted in an RDF interpretation by itself. A plain literal with a language tag can be written as "abc"@langTag, and is interpreted in an RDF interpretation as a pair < "abc" , "langTag" >.

A typed literal consists of a string and a datatype URI [RDF] and can be written as "abc"^^datatypeURI. Given an RDF datatype identified by datatypeURI, an RDF datatyped-interpretation that includes the datatype interprets the typed literal as the data value that the datatype assigns to the lexical form "abc".

The italicized keywords MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY specify certain aspects of the normative behavior of tools implementing this specification, and are interpreted as specified in RFC 2119 [RFC 2119].

3 Definition of the `rdf:PlainLiteral` Datatype

The datatype identified by the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral (abbreviated rdf:PlainLiteral) is defined as follows.

Value Space. The value space of rdf:PlainLiteral consists of

all strings, and
all pairs of the form < "abc" , "lc-langtag" > where "abc" is a string and "lc-langtag" is a lowercase language tag.

Lexical Space. An rdf:PlainLiteral lexical form is a string of the form "abc@langTag" where "abc" is an arbitrary (possibly empty) string, and "langTag" is either the empty string or a (not necessarily lowercase) language tag. Each such lexical form is mapped to a data value dv as follows:

If "langTag" is empty, then dv is equal to the string "abc", and
If "langTag" is not empty, then dv is equal to the pair < "abc", "lc-langtag" > where "lc-langtag" is "langTag" normalized to lowercase.
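The mapping from lexical forms to data values can be sketched as follows. This is a non-normative illustration: the function name lexical_to_value is hypothetical, data values are modelled as a Python str or a (string, language tag) tuple, and the regular expression is only a simplified stand-in for the langtag production of BCP 47, not a full implementation of it.

```python
import re

# Simplified stand-in for the BCP 47 langtag production (an assumption of
# this sketch): a 2-8 letter primary subtag followed by alphanumeric subtags.
_LANGTAG = re.compile(r"[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*")


def lexical_to_value(lexical_form):
    """Map an rdf:PlainLiteral lexical form to its data value.

    Returns a str when the language tag is empty, and a
    (string, lowercase language tag) tuple otherwise.
    """
    if "@" not in lexical_form:
        raise ValueError("a lexical form must contain at least one '@'")
    # The language tag is whatever follows the last '@' (U+0040).
    text, _, lang_tag = lexical_form.rpartition("@")
    if lang_tag == "":
        return text
    if _LANGTAG.fullmatch(lang_tag) is None:
        raise ValueError(f"not a language tag: {lang_tag!r}")
    return (text, lang_tag.lower())


if __name__ == "__main__":
    print(lexical_to_value("Family Guy@EN"))    # ('Family Guy', 'en')
    print(lexical_to_value("Family Guy@FOX@"))  # Family Guy@FOX
```

Splitting at the last "@" is what lets the string part itself contain "@" characters.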

The following table shows several rdf:PlainLiteral lexical forms and their corresponding data values.

Lexical form	Corresponding data value
`"Family Guy@en"`	`< "Family Guy" , "en" >`
`"Family Guy@EN"`	`< "Family Guy" , "en" >`
`"Family Guy@FOX@en"`	`< "Family Guy@FOX" , "en" >`
`"Family Guy@"`	`"Family Guy"`
`"Family Guy@FOX@"`	`"Family Guy@FOX"`

The following table shows several strings that are not rdf:PlainLiteral lexical forms.

String	The reason for not being an `rdf:PlainLiteral` lexical form
`"Family Guy"`	does not contain at least one `@` (U+0040) character
`"Family Guy@12"`	`"12"` is not a language tag according to BCP 47

Facet Space. The facet space of rdf:PlainLiteral is defined as shown in Table 1.

Table 1. The Facet Space of `rdf:PlainLiteral`
A facet pair `( F v )` is in the facet space of `rdf:PlainLiteral` if...	Each such facet pair is mapped to the subset of the value space of `rdf:PlainLiteral` containing...
...`F` is `xs:length`, `xs:minLength`, `xs:maxLength`, `xs:pattern`, `xs:enumeration`, or `xs:assertions` and `( F v )` is in the facet space of `xs:string`.	...all strings of the form `"abc"` and all pairs of the form `< "abc" , "lc-langtag" >` such that `"abc"` is contained in the subset of the value space of `xs:string` determined by `( F v )` as specified by XML Schema Datatypes [XML Schema Datatypes].
...`F` is `rdf:langRange` and `v` is an extended language range as specified in Section 2.2 of [RFC4647].	...all pairs of the form `< "abc" , "lc-langtag" >` such that `"lc-langtag"` matches `v` under extended filtering as specified in Section 3.3.2 of [RFC4647].

The facet xs:length can be used to refer to a subset of strings of a particular length regardless of whether they have a language tag or not. Thus, the subset of the value space of rdf:PlainLiteral corresponding to the facet pair ( xs:length 3 ) contains the string "abc", as well as the pairs < "abc" , "en" > and < "abc" , "de" >.

The facet rdf:langRange can be used to refer to a subset of strings containing the language tag. Note that the language range need not be in lowercase, and that the matching algorithm is case-insensitive. Thus, the subset of the value space of rdf:PlainLiteral corresponding to the facet pair ( rdf:langRange "de-DE" ) contains the pairs < "abc" , "de-de" >, < "abc" , "de-de-1996" >, and < "abc" , "de-latn-de" > (because these match the language range "de-DE" under the extended filtering of RFC 4647, which skips over intervening subtags such as the script "latn"), but not the string "abc" (because it is not a pair with a language tag) or the pair < "abc" , "de-deva" > (because it does not match the language range "de-DE" according to RFC 4647).

The facet pair ( rdf:langRange "*" ) is mapped to the subset of the value space of rdf:PlainLiteral containing all pairs of the form < "abc" , "lc-langtag" >. In languages such as OWL 2, this can be used to specify that a data value must contain a language tag.
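As a non-normative illustration of how facet pairs select subsets of the value space, the sketch below checks membership for ( xs:length n ) and for ( rdf:langRange "*" ). The function names are hypothetical, data values are modelled as a Python str or a (string, language tag) tuple, and other facets and language ranges are not covered.

```python
def string_part(value):
    """Return the string part of a data value (a str or a (str, lang) tuple)."""
    return value[0] if isinstance(value, tuple) else value


def in_length_facet(value, length):
    """True if value is in the subset selected by ( xs:length length ).

    The language tag, if any, plays no role in the test.
    """
    return len(string_part(value)) == length


def in_any_language_facet(value):
    """True if value is in the subset selected by ( rdf:langRange "*" ).

    Only pairs qualify; plain strings carry no language tag.
    """
    return isinstance(value, tuple)


if __name__ == "__main__":
    for value in ("abc", ("abc", "en"), ("abc", "de"), "abcd"):
        print(value, in_length_facet(value, 3), in_any_language_facet(value))
```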

4 Syntax for rdf:PlainLiteral Literals

It follows from the above that in datatyped interpretations that include the rdf:PlainLiteral datatype, the value space of rdf:PlainLiteral contains exactly all data values assigned to plain literals (with or without a language tag). The rdf:PlainLiteral datatype thus provides an explicit way of referring to this set.

To eliminate another source of syntactic redundancy and to retain a large degree of interoperability with applications that do not understand the rdf:PlainLiteral datatype, the form of rdf:PlainLiteral literals in syntaxes for RDF graphs and for SPARQL is the already existing syntax for the corresponding plain literal, not the syntax for a typed literal. Therefore, typed literals with rdf:PlainLiteral as the datatype are considered by this specification to be not valid in syntaxes for RDF graphs or SPARQL.

To implement this design and provide this interoperability, applications that employ this datatype MUST use plain literals (instead of rdf:PlainLiteral typed literals) whenever a syntax for plain literals is provided, such as in existing syntaxes for RDF graphs and SPARQL results.

Additionally, systems may need similar restrictions for non-syntactic public interfaces. For instance, in extended SPARQL basic graph matching, the results of matching SPARQL basic graph patterns in an entailment regime that understands rdf:PlainLiteral MUST provide variable bindings in existing RDF plain literal form.
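The requirement to emit plain literals rather than rdf:PlainLiteral typed literals can be illustrated with a small serialization sketch. It is non-normative: the function name to_plain_literal_syntax is hypothetical, data values are modelled as a Python str or a (string, language tag) tuple, and the escaping shown is a deliberately minimal assumption rather than the complete rules of any particular RDF syntax.

```python
def to_plain_literal_syntax(value):
    """Write a data value in plain literal syntax, never as a typed literal.

    Escaping is deliberately minimal (backslash, double quote, newline,
    carriage return) and is an assumption of this sketch.
    """
    text, lang = value if isinstance(value, tuple) else (value, None)
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    quoted = f'"{escaped}"'
    # A pair becomes "abc"@langTag; a plain string becomes "abc".
    return quoted if lang is None else f"{quoted}@{lang}"


if __name__ == "__main__":
    print(to_plain_literal_syntax(("Family Guy", "en")))  # "Family Guy"@en
    print(to_plain_literal_syntax("Family Guy"))          # "Family Guy"
```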

5 Functions on `rdf:PlainLiteral` Data Values

This section defines functions that construct and operate on rdf:PlainLiteral data values. The terminology used and the way in which these functions are described are in accordance with the XQuery 1.0 and XPath 2.0 Functions and Operators [XPathFunc]. Each function is identified by a QName [XML Namespaces]. The error codes used in this section are given in Appendix G of the XPath 2.0 specification [XPath20] and Appendix C of the XQuery and XPath functions specification [XPathFunc].

5.1 Functions for Assembling and Disassembling `rdf:PlainLiteral` Data Values

5.1.1 `plfn:PlainLiteral-from-string-lang`

plfn:PlainLiteral-from-string-lang( $arg1 as xs:string, $arg2 as xs:string ) as rdf:PlainLiteral

Summary: returns the data value < $arg1, lowercase($arg2) > if $arg2 is present, and returns the data value $arg1 otherwise. Both arguments must be of type xs:string or one of its subtypes, and $arg2, if present, must be a (nonempty) language tag; otherwise, this function raises type error err:FORG0006. Note that, since language tags in the value space of rdf:PlainLiteral are in lowercase, this function converts $arg2 to lowercase.

5.1.2 `plfn:string-from-PlainLiteral`

plfn:string-from-PlainLiteral( $arg as rdf:PlainLiteral ) as xs:string

Summary: returns the string part s if $arg is an rdf:PlainLiteral data value of the form < s, l > or of the form s. If $arg is not of type rdf:PlainLiteral, this function raises type error err:FORG0006.

5.1.3 `plfn:lang-from-PlainLiteral`

plfn:lang-from-PlainLiteral( $arg as rdf:PlainLiteral ) as xs:language

Summary: returns the language tag l if $arg is an rdf:PlainLiteral data value of the form < s, l >, and returns the empty string if $arg is an rdf:PlainLiteral data value of the form s. If $arg is not of type rdf:PlainLiteral, this function raises type error err:FORG0006.
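The three functions of this section can be sketched as follows. This is a non-normative illustration under stated assumptions: the Python names are hypothetical, data values are modelled as a str or a (string, language tag) tuple, the type error err:FORG0006 is represented by a Python TypeError, and the syntax of the language tag is not validated.

```python
def plain_literal_from_string_lang(string, lang=None):
    """Sketch of plfn:PlainLiteral-from-string-lang."""
    if not isinstance(string, str):
        raise TypeError("err:FORG0006: $arg1 must be a string")
    if lang is None:
        return string
    if not isinstance(lang, str) or lang == "":
        raise TypeError("err:FORG0006: $arg2 must be a nonempty language tag")
    # Language tags in the value space are lowercase.
    return (string, lang.lower())


def _check(value):
    """Raise TypeError unless value models an rdf:PlainLiteral data value."""
    ok = isinstance(value, str) or (
        isinstance(value, tuple)
        and len(value) == 2
        and all(isinstance(part, str) for part in value)
    )
    if not ok:
        raise TypeError("err:FORG0006: not an rdf:PlainLiteral data value")


def string_from_plain_literal(value):
    """Sketch of plfn:string-from-PlainLiteral."""
    _check(value)
    return value[0] if isinstance(value, tuple) else value


def lang_from_plain_literal(value):
    """Sketch of plfn:lang-from-PlainLiteral; '' when there is no tag."""
    _check(value)
    return value[1] if isinstance(value, tuple) else ""


if __name__ == "__main__":
    value = plain_literal_from_string_lang("Family Guy", "EN")
    print(value)                             # ('Family Guy', 'en')
    print(string_from_plain_literal(value))  # Family Guy
    print(lang_from_plain_literal("abc") == "")  # True
```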

5.2 The Comparison of `rdf:PlainLiteral` Data Values

The notion of collations used in this section is taken from Section 7.3.1 of XPath and XQuery function specification [XPathFunc].

5.2.1 `plfn:compare`

plfn:compare( $comparand1 as rdf:PlainLiteral?, $comparand2 as rdf:PlainLiteral? ) as xs:integer?

plfn:compare( $comparand1 as rdf:PlainLiteral?, $comparand2 as rdf:PlainLiteral?, $collation as xs:string ) as xs:integer?

Summary: if either $comparand1 or $comparand2 is not of type rdf:PlainLiteral, or if $collation is specified but is not of type xs:string, this function raises type error err:FORG0006. Otherwise, the function returns the empty sequence if one of the arguments is empty, if one of $comparand1 and $comparand2 has a language tag and the other one does not, or if the language parts of $comparand1 and $comparand2 are unequal; otherwise, this function returns -1, 0, or 1 depending on whether the value of the string-part of $comparand1 (or $comparand1 itself, if it has no language tag) is respectively less than, equal to, or greater than the value of the string-part of $comparand2 (or $comparand2 itself, if it has no language tag). The collation used by the invocation of this function is determined according to the rules in Section 7.3.1 of the XPath and XQuery functions specification [XPathFunc].

The first version of this function backs up the XQuery operators "eq", "ne", "gt", "lt", "le", and "ge" on rdf:PlainLiteral values.

Feature At Risk #1: plfn:compare

The final version of this specification might not include plfn:compare, or it might contain an alternative solution: since xs:string values are rdf:PlainLiteral data values, the fn:compare function from XPath/XQuery might be extended to cover rdf:PlainLiteral values.

Please send feedback to public-owl-comments@w3.org.

The two functions may be viewed as declared XQuery functions with the following definitions:

declare function plfn:compare( $comparand1 as rdf:PlainLiteral?, $comparand2 as rdf:PlainLiteral? ) as xs:integer?
 {
    (: the empty sequence results if a comparand is empty or the language tags differ :)
    if ( fn:empty($comparand1) ) then $comparand1
    else if ( fn:empty($comparand2) ) then $comparand2
    else if ( fn:compare( plfn:lang-from-PlainLiteral( $comparand1 ), plfn:lang-from-PlainLiteral( $comparand2 ) ) = 0 ) then
       fn:compare( plfn:string-from-PlainLiteral( $comparand1 ), plfn:string-from-PlainLiteral( $comparand2 ) )
    else ()
 };

declare function plfn:compare( $comparand1 as rdf:PlainLiteral?, $comparand2 as rdf:PlainLiteral?, $collation as xs:string ) as xs:integer?
 {
    (: as above, but the string parts are compared under the given collation :)
    if ( fn:empty($comparand1) ) then $comparand1
    else if ( fn:empty($comparand2) ) then $comparand2
    else if ( fn:compare( plfn:lang-from-PlainLiteral( $comparand1 ), plfn:lang-from-PlainLiteral( $comparand2 ) ) = 0 ) then
       fn:compare( plfn:string-from-PlainLiteral( $comparand1 ), plfn:string-from-PlainLiteral( $comparand2 ), $collation )
    else ()
 };
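For readers less familiar with XQuery, the two-argument form can also be sketched in Python. This is a non-normative illustration: the function name compare is hypothetical, data values are modelled as a str or a (string, language tag) tuple, the empty sequence is modelled as None, and Python's code point ordering of strings stands in for the default collation, which is an assumption of this sketch.

```python
def compare(comparand1, comparand2):
    """Return -1, 0, or 1, or None (the empty sequence) when undefined."""
    if comparand1 is None or comparand2 is None:
        return None

    def parts(value):
        # A plain string is treated as having the empty language tag.
        if isinstance(value, tuple):
            return value
        if isinstance(value, str):
            return (value, "")
        raise TypeError("err:FORG0006: not an rdf:PlainLiteral data value")

    string1, lang1 = parts(comparand1)
    string2, lang2 = parts(comparand2)
    if lang1 != lang2:
        # Covers both differing tags and tag versus no tag.
        return None
    # Python compares str values by code point.
    return (string1 > string2) - (string1 < string2)


if __name__ == "__main__":
    print(compare(("abc", "en"), ("abd", "en")))  # -1
    print(compare(("abc", "en"), ("abc", "de")))  # None
    print(compare("abc", "abc"))                  # 0
```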

5.3 Other Functions on `rdf:PlainLiteral` Data Values

5.3.1 `plfn:length`

plfn:length( $arg as rdf:PlainLiteral ) as xs:integer

Summary: returns the number of characters in the string part s if $arg is an rdf:PlainLiteral data value of the form < s, l > or of the form s. If $arg is not of type rdf:PlainLiteral, this function raises type error err:FORG0006.

Feature At Risk #2: plfn:length

The final version of this specification might not include plfn:length, or it might contain an alternative solution: since xs:string values are rdf:PlainLiteral data values, the fn:string-length function from XPath/XQuery might be extended to cover rdf:PlainLiteral values.

Please send feedback to public-owl-comments@w3.org.

This function may be viewed as a declared XQuery function with the following definition:

declare function plfn:length( $arg as rdf:PlainLiteral? ) as xs:integer
 {
    (: the length is that of the string part; the language tag is ignored :)
    fn:string-length( plfn:string-from-PlainLiteral( $arg ) )
 };

5.3.2 `plfn:matches-language-range`

plfn:matches-language-range( $arg as rdf:PlainLiteral?, $range as xs:string ) as xs:boolean

Summary: This function is only defined if $arg is a sequence of length 0 or 1 of literals of type rdf:PlainLiteral and $range is of type xs:string; if the parameters do not satisfy these typing conditions, the function raises a type error err:FORG0006. If the typing conditions are fulfilled, the function returns true in case $arg is an rdf:PlainLiteral data value of the form < s, l > with l a language tag that matches the extended language range $range as specified by the extended filtering algorithm for "Matching of Language Tags" [BCP 47]; otherwise, it returns false. This means that the function returns false if the argument is a string rdf:PlainLiteral data value. An empty input sequence is treated as an rdf:PlainLiteral data value consisting of the empty string, and accordingly on such input this function also returns false.
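The behaviour described above can be sketched in Python. This is a non-normative illustration: the function names are hypothetical, data values are modelled as a str or a (string, language tag) tuple, the empty sequence is modelled as None, err:FORG0006 is represented by TypeError, and extended_filter_matches is one reading of the extended filtering steps of RFC 4647, Section 3.3.2, that does not check the range or the tag for well-formedness.

```python
def extended_filter_matches(language_range, tag):
    """Case-insensitive extended filtering of a tag against a range."""
    range_subtags = language_range.lower().split("-")
    tag_subtags = tag.lower().split("-")
    # The first subtags must agree unless the range starts with a wildcard.
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False
    r, t = 1, 1
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            r += 1                      # wildcards match zero or more subtags
        elif t >= len(tag_subtags):
            return False                # tag exhausted before the range
        elif range_subtags[r] == tag_subtags[t]:
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:
            return False                # singletons may not be skipped
        else:
            t += 1                      # skip a non-matching tag subtag
    return True


def matches_language_range(arg, language_range):
    """Sketch of plfn:matches-language-range."""
    is_pair = (
        isinstance(arg, tuple)
        and len(arg) == 2
        and all(isinstance(part, str) for part in arg)
    )
    if arg is not None and not isinstance(arg, str) and not is_pair:
        raise TypeError("err:FORG0006: $arg is not an rdf:PlainLiteral")
    if not isinstance(language_range, str):
        raise TypeError("err:FORG0006: $range is not an xs:string")
    # Strings without a language tag and the empty sequence never match.
    return is_pair and extended_filter_matches(language_range, arg[1])


if __name__ == "__main__":
    print(matches_language_range(("abc", "de-de-1996"), "de-DE"))  # True
    print(matches_language_range(("abc", "de-deva"), "de-DE"))     # False
    print(matches_language_range("abc", "de-DE"))                  # False
```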

6 Acknowledgments

The RIF and OWL Working Groups made parallel efforts to support strings with associated language tags, as found in RDF. This specification is the outcome of a collaboration between the two groups, and it is based on the work on the datatypes rif:text and owl:internationalizedString.

In addition to members and chairs of both Working Groups, the editors would like to thank Addison Phillips, C. Michael Sperberg-McQueen, Eric Prud'hommeaux, Andy Seaborne, and Pat Hayes, along with other participants of the public-rdf-text mailing list, for their assistance in working out the details of this specification.

7 References

[BCP 47]: BCP 47 - Tags for Identifying Languages. A. Phillips and M. Davis, eds. IETF, September 2006. http://www.rfc-editor.org/rfc/bcp/bcp47.txt
[BIDI]: Unicode controls vs. markup for bidi support, Richard Ishida. Retrieved 11 June 2009 from http://www.w3.org/International/questions/qa-bidi-controls.
[ISO/IEC 10646]: ISO/IEC 10646-1:2000. Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane and ISO/IEC 10646-2:2001. Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 2: Supplementary Planes, as, from time to time, amended, replaced by a new edition or expanded by the addition of new parts. [Geneva]: International Organization for Standardization. ISO (International Organization for Standardization).
[RDF Concepts]: Resource Description Framework (RDF): Concepts and Abstract Syntax. Graham Klyne and Jeremy J. Carroll, eds. W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/. Latest version available as http://www.w3.org/TR/rdf-concepts/.
[RDF Semantics]: RDF Semantics. Patrick Hayes, ed., W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-mt-20040210/. Latest version available as http://www.w3.org/TR/rdf-mt/.
[RFC 2119]: RFC 2119: Key words for use in RFCs to Indicate Requirement Levels. Network Working Group, S. Bradner. IETF, March 1997, http://www.ietf.org/rfc/rfc2119.txt
[RFC 3986]: RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. T. Berners-Lee, R. Fielding, and L. Masinter. IETF, January 2005, http://www.ietf.org/rfc/rfc3986.txt
[RFC 4647]: RFC 4647 - Matching of Language Tags. A. Phillips and M. Davis, IETF, September 2006.
[Ruby]: Ruby Annotation, M. Sawicki, M. Ishikawa, M. J. Dürst, T. Texin, M. Suignard, Editors, W3C Recommendation, 31 May 2001, http://www.w3.org/TR/2001/REC-ruby-20010531 . Latest version available at http://www.w3.org/TR/ruby/ .
[SPARQL]: SPARQL Query Language for RDF. Eric Prud'hommeaux and Andy Seaborne, eds. W3C Recommendation, 15 January 2008, http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/. Latest version available as http://www.w3.org/TR/rdf-sparql-query/.
[UNICODE]: The Unicode Standard. The Unicode Consortium, Version 5.1.0, ISBN 0-321-48091-0, as updated from time to time by the publication of new versions. (See http://www.unicode.org/unicode/standard/versions for the latest version and additional information on versions of the standard and of the Unicode Character Database).
[XHTML]: XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition), S. Pemberton, Editor, W3C Recommendation, 1 August 2002, http://www.w3.org/TR/2002/REC-xhtml1-20020801 . Latest version available at http://www.w3.org/TR/xhtml1 .
[XML Namespaces]: Namespaces in XML 1.0 (Second Edition). Tim Bray, Dave Hollander, Andrew Layman, and Richard Tobin, eds. W3C Recommendation, 16 August 2006, http://www.w3.org/TR/2006/REC-xml-names-20060816/. Latest version available as http://www.w3.org/TR/REC-xml-names/.
[XML Schema Datatypes]: W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes. David Peterson, Shudi Gao, Ashok Malhotra, C. M. Sperberg-McQueen, and Henry S. Thompson, eds. W3C Candidate Recommendation, 30 April 2009, http://www.w3.org/TR/2009/CR-xmlschema11-2-20090430/. Latest version available as http://www.w3.org/TR/xmlschema11-2/.
[XML]: Extensible Markup Language (XML) 1.0 (Fifth Edition). Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, and François Yergeau, eds. W3C Recommendation, 26 November 2008, http://www.w3.org/TR/2008/REC-xml-20081126/. Latest version available as http://www.w3.org/TR/xml/.
[XPathFunc]: XQuery 1.0 and XPath 2.0 Functions and Operators. Ashok Malhotra, Jim Melton, and Norman Walsh, eds. W3C Recommendation 23 January 2007.
[XPath20]: XML Path Language (XPath) 2.0. Anders Berglund, Scott Boag, Don Chamberlin, Mary F. Fernández, Michael Kay, Jonathan Robie, and Jérôme Siméon, eds. W3C Recommendation 23 January 2007.

8 Changes Since Last Call

Since the last call draft of 21 April 2009, the following changes have been made:

The name of the datatype was changed from rdf:text to rdf:PlainLiteral, to clarify the role and purpose of the datatype.
The names of the builtins and their namespace were changed to match the change in the name of the datatype.
The introduction and section 4 were rewritten to reframe this datatype as having a special relationship to RDF plain literals.
The notion of an entailment relationship between plain literals and rdf:PlainLiteral typed literals was removed, since rdf:PlainLiterals are now more clearly understood to not occur in RDF graph syntaxes.
The characters used to delimit pairs were changed, since problems were reported with &lang; and &rang; in some browsers.
