RDF/XML Syntax Specification (Revised)

This W3C Working Draft revises the specification of the XML syntax of RDF as originally described in RDF Model & Syntax. This document presents the syntax as amended and clarified by the RDF Core Working Group with the specification now based on the XML Information Set along with mapping rules for creating RDF models as described in the RDF Model Theory W3C Working Draft.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.

This is a W3C Working Draft for the RDF Core Working Group produced as part of the W3C Semantic Web Activity. It incorporates decisions made by the Working Group updating the XML syntax for RDF from the original RDF Model & Syntax ([RDFMS]) document and includes a re-representation of the syntax in terms of the XML Information Set with rules for generation of RDF models.

This document is being released for review by W3C members and other interested parties to encourage feedback and comments, especially with regard to how the changes affect existing implementations. This is the current state of ongoing work on the syntax and mapping process and may not yet record all of the work in the grammar section of the original document.

This is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use it as reference material or to cite as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.

1 Introduction

This document describes the XML ([XML]) syntax for RDF as originally defined in the RDF Model & Syntax ([RDFMS]) W3C Recommendation. Subsequent implementations of this syntax and comparison of the resulting RDF models have shown that there was ambiguity - implementations generated different models and certain syntax forms were not widely implemented. These issues were generally raised either as feedback to www-rdf-comments@w3.org (archive) or in discussions on the RDF Interest Group list www-rdf-interest@w3.org (archive).

The RDF Core Working Group is chartered to respond to the need for a number of fixes, clarifications and improvements to the specification of RDF's abstract model and XML syntax. The working group invites feedback from the developer community on the effects of its proposals on existing implementations and documents.

Several decisions including amendments and deletions to the grammar are referred to below. The definitive record of the decisions is the RDF Core WG issues list.

This document re-represents the original EBNF grammar in terms of XML Information Set ([INFOSET]) items, which moves away from rather low-level details such as particular forms of empty elements. This allows the grammar to be recorded more precisely and the mapping from the XML syntax to the RDF model to be shown more clearly. The mapping to the RDF model (a graph) is done by emitting statements in the form defined in the N-Triples section of the RDF Test Cases ([RDF-TESTS]) Working Draft; these statements create an RDF model whose semantics are defined by the RDF Model Theory ([RDF-MODEL]) Working Draft.

This document illustrates one way to create triples from the XML - any other method that results in the same RDF graph may be used.

2 An XML syntax for RDF

The RDF Model Theory ([RDF-MODEL]) provides a formal description of RDF. This can be thought of as a graph consisting of nodes and arcs. The nodes describe resources and are labelled with URIs or string literals, or are blank. The arcs connect the nodes and are all labelled with URIs. This graph is more precisely called a directed edge-labelled graph; each edge is an arc with a direction (an arrow) connecting two nodes. These edges can be described as triples of a subject node (at the blunt end of the arrow/arc), a property arc and an object node (at the sharp end of the arrow/arc). The property arc is also interpreted as an attribute, relationship or predicate of the resource with a value given by the object node content.
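
As an informal illustration of this triple view, the sketch below models a graph as a list of (subject, property, object) tuples. The Triple type, the arcs_from helper and all http://example.org/ URIs are hypothetical names chosen for this example; they are not defined by this specification.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str  # node at the blunt end of the arc (URI or blank node label)
    prop: str     # URI labelling the arc
    obj: str      # node at the sharp end of the arc (URI, blank node label or literal)


# Hypothetical graph: a document has an editor, and the editor has a name.
graph: List[Triple] = [
    Triple("http://example.org/doc", "http://example.org/terms/editor", "_:editor"),
    Triple("_:editor", "http://example.org/terms/fullName", "Dave Beckett"),
]


def arcs_from(graph: List[Triple], node: str) -> List[Tuple[str, str]]:
    """Return the (property, object) pairs of all arcs leaving the given node."""
    return [(t.prop, t.obj) for t in graph if t.subject == node]
```

Here the blank editor node is given the local label _:editor purely so that the two arcs can refer to the same node.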

In order to encode the graph in XML, the nodes and arcs are turned into XML elements, attributes, element content and attribute values. The URI labels for properties and object nodes are written in XML via XML Namespaces ([XML-NS]), which gives a namespace URI for a short prefix along with namespace-qualified element and attribute names called local names. The (namespace URI, local name) pair is chosen such that concatenating them forms the original node URI. The URIs labelling subject nodes are stored in XML attribute values. The nodes labelled by string literals (which are always object nodes) become element text content or attribute values.

This transformation turns paths in the graph of the form Node, Arc, Node, Arc, Node, Arc, ... into sequences of elements inside elements. This results in a striping when the elements are written down; alternating between node elements and property elements. The Node at the start of the sequence is always a subject node and turns into a containing element called an rdf:Description that is written at the top level of RDF/XML, below the XML document element (in this case rdf:RDF). So the chains of stripes start at the top of an RDF/XML document and always begin with nodes.

For example, here is a graph written as ASCII saying "there exists a document (this one) with a title, RDF/XML Syntax Specification (Revised)" and "this document has an editor, the editor has a name "Dave Beckett" and a home page http://purl.org/net/dajobe/". [URI] is used for a node with a URI, [] for a blank node, and --[property]--> is used for an arc.

Which consists of some nodes with known URIs that can be filled in and others that remain blank:

There are several abbreviations that can be used to make very common uses easier to write down. It is typical for the same resource to be described with multiple properties and values at the same time, so multiple child elements can be put inside rdf:Description, all of which are properties of that node.

When the property value is a string it can be encoded more simply as an XML attribute and value, as an attribute of the node element. This is known as a property attribute.

Another very common use is when a node is an instance of a class with an rdf:type relationship, usually called a typed node. This shorthand is done by replacing the rdf:Description element name with the namespaced element corresponding to the URI of the value of the type relationship.

The above forms the basis of the RDF/XML syntax, although there are some other abbreviated forms, such as those for generating the RDF list properties and for omitting a blank node element. The latter breaks the striping but is useful for, amongst other uses, encoding properties with multiple values.

3 Data Model

This syntax operates on an XML document as a sequence of nodes in document order, in the style of the [XPATH] Information Set Mapping. The resulting nodes are intended to be similar to the events that are produced by the [SAX2] XML API. This model is conceptual only and does not mandate any implementation method; in particular [XPATH] is not required.

The syntax does not support non-well-formed XML documents, nor documents that otherwise do not have an XML Information Set; for example, documents that do not conform to the XML Namespaces W3C Recommendation ([XML-NS]).

This specification requires an information set as defined in [INFOSET] which supports at least the following information items and properties:

This specification does not require any destructive alterations to the input information set; no items are added, removed or modified.

This section is intended to satisfy the requirements for Conformance in the [INFOSET] specification.

There are six types of node defined in the following subsections. Most nodes are constructed from an Infoset information item (except for Identifier). The effect of a node constructor is to create a new node with a unique identity, distinct from all other nodes. Nodes have properties, and all have the string-value property that may be part of the node or computed from the string-value of contained nodes.

3.1 Root Node

3.2 Element Node

Created from an Element Information Item and takes the following properties and their values from the element information item: local-name, namespace-name, children, attributes and parent. When this node is created from such values, the URI property is defined with a string value of the concatenation of the value of the namespace-name property and the value of the local-name property. On creation the li-counter property is added with initial integer value 1.

The subject property may be added and takes the value of an Identifier node. This is used on elements that deal with one node in the RDF model, this generally being the subject of a statement.

3.3 End Element Node

Takes no properties but marks the end of the containing element in the sequence.

3.4 Attribute Node

Created from an Attribute Information Item and takes the properties local-name, namespace-name and owner element and their values from the respective attribute information item properties. When this node is created from such values, two properties and values are defined. Firstly the string-value property is defined with the normalized value as specified by [XML]. An attribute whose normalized value is a zero-length string is not treated specially: it results in an attribute node whose string-value is a zero-length string. Secondly the URI property is defined with a string value of the concatenation of the value of the namespace-name property and the value of the local-name property.

3.5 Text Node

Created from a sequence of one or more consecutive Character Information Items. Has the single property string-value which has the value of the string made from concatenating the character code property of each of the character information items. [NOTE: Identical to XPath.]

3.6 Identifier Node

The string-value property is defined from the other properties as follows: If identifier-type is "URI" then the value is the concatenation of "<", the value of the identifier property and ">". If identifier-type is "bnodeID" then the value is the concatenation of "_:" and the value of the identifier property.
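
The rule above can be sketched as a small function. This is only an illustration: the function name is hypothetical, and raising an error for any other identifier-type is an assumption of the sketch, since only the "URI" and "bnodeID" cases are described here.

```python
def identifier_string_value(identifier: str, identifier_type: str) -> str:
    """Compute the string-value of an Identifier node from its other properties."""
    if identifier_type == "URI":
        return "<" + identifier + ">"
    if identifier_type == "bnodeID":
        return "_:" + identifier
    # Assumption of this sketch: other identifier types are not handled here.
    raise ValueError("unsupported identifier-type: " + identifier_type)
```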

3.7 Information Set Mapping

To transform the Infoset into the sequence of nodes, each information item is transformed as described above to generate a tree of nodes with properties and values. Each element node is then replaced as described below to turn the tree of nodes into a sequence in document order.
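
One possible way to picture this flattening is the sketch below, which walks a tree built by Python's xml.etree.ElementTree and emits start element, text and end element entries in document order. It is not the replacement procedure of this specification: the tuple representation and the function name are assumptions of the sketch, and attribute nodes are left out for brevity.

```python
import xml.etree.ElementTree as ET


def to_node_sequence(element):
    """Flatten an element tree into a list of nodes in document order."""
    # ElementTree writes namespaced names as "{namespace-name}local-name".
    if element.tag.startswith("{"):
        namespace_name, local_name = element.tag[1:].split("}", 1)
    else:
        namespace_name, local_name = "", element.tag
    # The URI is the concatenation of namespace-name and local-name.
    nodes = [("start_element", namespace_name + local_name)]
    if element.text:
        nodes.append(("text", element.text))
    for child in element:
        nodes.extend(to_node_sequence(child))
        if child.tail:  # character data that follows the child element
            nodes.append(("text", child.tail))
    nodes.append(("end_element",))
    return nodes


doc = ET.fromstring('<a xmlns="http://example.org/ns#"><b>hi</b></a>')
sequence = to_node_sequence(doc)
```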

3.8 The RDF Namespace

The RDF Namespace URI is http://www.w3.org/1999/02/22-rdf-syntax-ns# and is typically used in XML with the prefix rdf although this is not required. The namespace contains the following names only:

Throughout this document the terminology rdf:name will be used to indicate name is from the RDF namespace and it has a URI of the concatenation of the RDF Namespace URI and name. For example, rdf:type has the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#type

3.9 Identifiers

The RDF model uses three types of identifiers (or labels) for nodes and arcs in the graph - absolute URI references, literals and unlabelled or blank nodes. The latter are given local identifiers in the N-Triples serialisation of the model in order to represent the graph correctly. These identifiers can be generated and must match the name production in N-Triples.
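
A minimal sketch of generating such local identifiers follows. It assumes that the N-Triples name production is a letter followed by letters and digits; that assumption should be checked against [RDF-TESTS]. The "genid" prefix and the function name are arbitrary choices made for this example.

```python
import itertools
import re

# Assumed shape of the N-Triples name production: a letter, then letters or digits.
NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*\Z")


def blank_node_ids(prefix="genid"):
    """Yield an endless series of distinct local identifiers for blank nodes."""
    for n in itertools.count(1):
        yield prefix + str(n)


ids = blank_node_ids()
first, second = next(ids), next(ids)
```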

The URI references can be given as absolute URIs, given as relative URIs that have to be resolved against the document URI, or constructed. The constructed URIs in RDF are made either from XML Namespace-qualified element or attribute names (QNames) or from rdf:ID or rdf:bagID attribute values.

XML QNames give URIs by concatenating the namespace URI and the XML local name. For example, if the XML Namespace prefix foo has URI http://example.org/somewhere/ then the QName foo:bar would correspond to the URI http://example.org/somewhere/bar. Note that this restricts which URIs can be made and the same URI can be given in multiple ways.
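
The concatenation can be sketched as follows. The qname_to_uri function and the prefix-to-namespace dictionary are hypothetical helpers for illustration; a real parser would take the namespace bindings from the XML document itself.

```python
def qname_to_uri(qname: str, namespaces: dict) -> str:
    """Concatenate the namespace URI bound to the prefix with the local name."""
    # An unprefixed name yields prefix "", which looks up the default namespace.
    prefix, _, local_name = qname.rpartition(":")
    return namespaces[prefix] + local_name


namespaces = {
    "foo": "http://example.org/somewhere/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
```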

The rdf:ID and rdf:bagID values generate URIs by considering them as equivalent to the relative URI "#" concatenated with the attribute value. This can then be resolved relative to the document URI to give the absolute URI.
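
As a sketch of this resolution step, Python's urllib.parse.urljoin can resolve the relative URI against a document URI. The function name and the example document URI are assumptions made for illustration.

```python
from urllib.parse import urljoin


def id_to_uri(document_uri: str, id_value: str) -> str:
    """Treat an rdf:ID or rdf:bagID value as "#" + value and resolve it."""
    return urljoin(document_uri, "#" + id_value)
```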

4 Notation

4.1 Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 ([KEYWORDS]).

4.2 Grammar Notation

Notation for nodes and grammar EBNF.
Notation	Meaning
property=value	A node property with a given value
node.property	Returns the value of the given node property
root(prop1=value1, prop2=value2, ...)	A root node with properties
start_element(prop1=value1, prop2=value2, ...) children end_element()	A sequence of an element node with properties, a possibly empty list of nodes as element content and an end element node
attribute(prop1=value1, prop2=value2, ...)	An attribute node with properties
identifier(prop1=value1, prop2=value2, ...)	An identifier node with properties
text()	A text node
base-uri	The value of the base-uri property of the root node
list(item1, item2, ...); list()	An ordered list of items in document order; an empty list
set(item1, item2, ...); set()	An unordered set of items; an empty set
*	Zero or more of preceding term
?	Zero or one of preceding term
+	One or more of preceding term
A | B | ...	The A, B, ... terms are alternatives.
A - B	The term A but not the term B
"ABC"	A string of characters A, B, C in order.
concat(A, B, ..)	A string created by concatenating the terms in order.
anyURI	Any legal URI.
anyString	Any string.
rdf:X	See section 3.8

4.3 Notation Forms

5 RDF/XML Grammar

5.1 Grammar start

If the RDF/XML is standalone XML content, then the grammar starts with Root Node doc.

If the content is known to be RDF/XML by context, such as when RDF/XML is embedded inside other XML content, then the grammar can either start at Element Node RDF (only when an element is legal at that point in the XML) or at production nodeElementList (only when element content is legal, since this is a list of elements). For such embedded RDF/XML, the base-uri value must be initialised from the containing XML since no Root Node will be available. Note that if such embedding occurs, the grammar may be entered several times but no state is expected to be preserved.

5.2 Production doc

5.3 Production RDF

5.4 Production nodeElementList

5.5 Production nodeElement

The processing of some of the attributes has to be done before other work such as dealing with child nodes or other attributes. These can be processed in any order:

If an attribute a with a.URI = rdf:bagID is present, create a new node n = identifier(identifier=concat(base-uri, "#", a.string-value), identifier-type="URI") and add the following statement to the model:

Then all statements generated above (except the immediately preceding statement) are reified with node n using the reification rules in section 5.26.

5.6 Production ws

5.7 Production propertyEltList

5.8 Production propertyElt

If element e has e.URI = rdf:li then apply the list expansion rules on element e.parent in section 5.27 to give a new URI u and set the value of e.URI to be u.

5.9 Production resourcePropertyElt

For element e, and the single contained nodeElement n the following statement is added to the model:

5.10 Production literalPropertyElt

For element e, and the text node t the following statement is added to the model:

5.11 Production parseTypeLiteralPropertyElt

For element e and the literal l, if l is empty then the statement object value is "" and the following statement is added to the model:

5.12 Production parseTypeResourcePropertyElt

Generate a local blank node identifier i and use it to create a new node n with the value of identifier(identifier=i, identifier-type="bnodeID").

If the element content c is not empty, then use node n to create a new sequence of nodes as follows:

5.13 Production parseTypeOtherPropertyElt

The processing of rdf:parseType string values other than "Resource" or "Literal" is currently to treat the content as if it were "Literal". Processing MUST then continue at production parseTypeLiteralPropertyElt.

5.14 Production emptyPropertyElt

Choose one of the following combinations of allowed attributes. Note in particular that rdf:ID and rdf:resource are alternatives (or both can be omitted), and furthermore that rdf:bagID cannot be used when no propertyAttr is given.

5.15 Production idAttr

Note that the names used as values of rdf:ID and rdf:bagID attributes must be unique in a single RDF/XML document since they come from the same set of names.
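
A rough check for this uniqueness constraint might look like the sketch below, which collects the rdf:ID and rdf:bagID values in a document and reports any name used more than once. The function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# ElementTree writes namespaced attribute names as "{namespace}local".
ID_ATTRIBUTES = ("{%s}ID" % RDF_NS, "{%s}bagID" % RDF_NS)


def duplicate_id_names(rdfxml_text):
    """Return the sorted names used more than once as rdf:ID or rdf:bagID values."""
    root = ET.fromstring(rdfxml_text)
    counts = Counter(
        element.attrib[name]
        for element in root.iter()
        for name in ID_ATTRIBUTES
        if name in element.attrib
    )
    return sorted(name for name, count in counts.items() if count > 1)


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:ID="a" rdf:bagID="b"/>
  <rdf:Description rdf:ID="b"/>
</rdf:RDF>"""
```

In SAMPLE the name b is used once as an rdf:bagID and once as an rdf:ID, so it is reported even though no single attribute repeats it.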

5.16 Production aboutAttr

5.17 Production bagIdAttr

Note that the names used as values of rdf:ID and rdf:bagID attributes must be unique in a single RDF/XML document since they come from the same set of names.

5.18 Production propertyAttr

5.19 Production resourceAttr

5.20 Production parseLiteral

5.21 Production parseResource

5.22 Production parseOther

5.23 Production URI-reference

5.24 Production literal

5.25 Production rdf-id

5.26 Reification Rules

5.27 List Expansion Rules

For the given element e, generate a new URI u with value concat("http://www.w3.org/1999/02/22-rdf-syntax-ns#_", e.li-counter), increment the value of the e.li-counter property by 1 and return u.
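
The rule can be sketched as follows. The ElementNode class is a hypothetical stand-in for an element node that carries only the li-counter property, with the initial value 1 given in section 3.2.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class ElementNode:
    """Minimal stand-in for an element node; only li-counter is modelled."""

    def __init__(self):
        self.li_counter = 1  # initial value on creation


def expand_li(e: ElementNode) -> str:
    """Generate the next rdf:_n URI for element e and advance its counter."""
    u = RDF_NS + "_" + str(e.li_counter)
    e.li_counter += 1
    return u
```

Because the counter lives on the element passed in, successive rdf:li children of the same parent receive rdf:_1, rdf:_2 and so on.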

6 Serialising an RDF Graph to RDF/XML

Not all graphs that can be expressed in the RDF Model Theory ([RDF-MODEL]) can be encoded in this syntax. A round trip from RDF/XML to an RDF graph and back to RDF/XML preserves the meaning, but the resulting RDF/XML should not be expected to be identical to the input.

The basic serialisation is recommended for applications in which the output RDF/XML is to be used only in further RDF processing. Where the intent is for the output RDF/XML file to be read by people, the basic serialisation proves unsatisfactory. The basic serialisation also does not conform to more restricted sub-dialects of RDF, such as RSS ([RSS]) or CC/PP ([CC/PP]); it is therefore not appropriate for such applications, for which dialect-specific serialisers are needed.

If more human readable output is needed the following factors should be considered:

It is not possible to use the RDF/XML serialisation for serialising an RDF graph in which any triple has a property label which cannot be expressed as an XML namespace-qualified name (QName).
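
A serialiser therefore needs some way to split each property URI into a namespace URI and a local name, and to detect when no split exists. The sketch below is one simplified approach: it accepts only ASCII local names, which is narrower than the full XML name rules, and the function name is hypothetical.

```python
import re

# Simplified, ASCII-only approximation of an XML local name.
LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def split_property_uri(uri: str):
    """Return (namespace URI, local name) with the longest usable local name,
    or None if no suffix of the URI is a valid local name."""
    for i in range(1, len(uri)):
        # The match must start at position i and run to the end of the URI.
        if LOCAL_NAME.match(uri, i):
            return uri[:i], uri[i:]
    return None
```

A URI such as http://example.org/123 has no suffix that starts with a letter or underscore, so under this sketch it cannot be written as a QName.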

An approach to serialising RDF/XML using the full grammar in a top-down recursive descent fashion is discussed in [UNPARSING].

7 Acknowledgments (Informative)

8 References

Normative References

Informational References

Appendix A: Issues affecting RDF/XML Syntax (Non-Normative)

This section records local issues to be resolved and issues that were reported to the RDF Core WG related to the XML syntax and their disposition. This section is not the definitive list or description of the latter - see the RDF Core WG issues list. Decided issues may also have associated test cases which can be found in the RDF Test Cases W3C Working Draft.

A.1: Document Issues / Tasks (Non-Normative)

A.2: RDF Core WG Open Issues affecting RDF/XML Syntax (Non-Normative)

The resolution texts here are suggestions only and not agreed by the working group.

A.3: RDF Core WG Decided Issues affecting RDF/XML Syntax (Non-Normative)

A.4: RDF Core WG Postponed Issues affecting RDF/XML Syntax (Non-Normative)

B Syntax Schemas (Non-Normative)

Two schema language authors submitted schemas for RDF/XML based on the revised grammar in the previous version of this draft. We include pointers to these schemas for information purposes and an example schema; they are not part of this specification.

B.1 RELAX NG Schema - Non XML (Non-Normative)

RELAX NG Schema (Non-XML) for RDF/XML

#
# RELAX NG Schema (non-XML) for RDF/XML Syntax
#
# This schema is for information only and NON-NORMATIVE
#
# It is based on one originally written by James Clark in
# http://lists.w3.org/Archives/Public/www-rdf-comments/2001JulSep/0248.html
# and updated with later changes.
#

namespace local = ""
namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"

start = doc
doc = 
  RDF

RDF =
  element rdf:RDF { nodeElementList }

nodeElementList = 
  nodeElement*

  # Should be something like:
  #  ws* , (  nodeElement , ws* )*
  # but RELAXNG does this by default, ignoring whitespace separating tags.

nodeElement =
  element * - (local:*
               |rdf:RDF
               |rdf:ID|rdf:about
               |rdf:bagID|rdf:parseType|rdf:resource
               |rdf:li ) {
      (idAttr | aboutAttr )?, bagIdAttr?, propertyAttr*, propertyEltList
  }

  # FIXME: Not sure if it is possible to say "and not things
  # beginning with _ in the rdf: namespace".

ws = 
  " "

  # Not used in this RELAX NG schema; but should be any legal XML
  # whitespace defined by http://www.w3.org/TR/2000/REC-xml-20001006#NT-S


propertyEltList = 
  propertyElt*

  # Should be something like:
  #  ws* , ( propertyElt , ws* )*
  # but RELAXNG does this by default, ignoring whitespace separating tags.

propertyElt = 
  resourcePropertyElt | 
  literalPropertyElt | 
  parseTypeLiteralPropertyElt |
  parseTypeResourcePropertyElt |
  parseTypeOtherPropertyElt |
  emptyPropertyElt

resourcePropertyElt = 
  element * - (local:*
               |rdf:RDF|rdf:Description
               |rdf:ID|rdf:about
               |rdf:bagID|rdf:parseType|rdf:resource) {
      idAttr?, nodeElement
  }

literalPropertyElt =
  element * - (local:*
               |rdf:RDF|rdf:Description
               |rdf:ID|rdf:about
               |rdf:bagID|rdf:parseType|rdf:resource) {
      idAttr?, text 
  }

parseTypeLiteralPropertyElt = 
  element * - (local:*
               |rdf:RDF|rdf:Description
               |rdf:ID|rdf:about
               |rdf:bagID|rdf:parseType|rdf:resource) {
      idAttr?, parseLiteral, literal 
  }

parseTypeResourcePropertyElt = 
  element * - (local:*
               |rdf:RDF|rdf:Description
               |rdf:ID|rdf:about
               |rdf:bagID|rdf:parseType|rdf:resource) {
      idAttr?, parseResource, propertyEltList
  }

parseTypeOtherPropertyElt = 
  element * - (local:*
               |rdf:RDF|rdf:Description
               |rdf:ID|rdf:about
               |rdf:bagID|rdf:parseType|rdf:resource) {
      idAttr?, parseOther, any
  }

emptyPropertyElt =
   element * - (local:*
                |rdf:RDF|rdf:Description
                |rdf:ID|rdf:about
                |rdf:bagID|rdf:parseType|rdf:resource) {
       (idAttr | resourceAttr)?, bagIdAttr?, propertyAttr* 
   }

idAttr = 
  attribute rdf:ID { 
      IDsymbol 
  }

aboutAttr = 
  attribute rdf:about { 
      URI-reference 
  }

bagIdAttr = 
  attribute rdf:bagID {
      IDsymbol
  }

propertyAttr = 
  attribute * - (local:* 
                 |rdf:RDF|rdf:Description
                 |rdf:ID|rdf:about
                 |rdf:bagID|rdf:parseType|rdf:resource
                 |rdf:li) {
      string
  }

resourceAttr = 
  attribute rdf:resource {
      URI-reference 
  }

parseLiteral = 
  attribute rdf:parseType {
      "Literal" 
  }

parseResource = 
  attribute rdf:parseType {
      "Resource" 
  }

parseOther = 
  attribute rdf:parseType {
      text
  }

URI-reference = 
  string

literal =
  any

IDsymbol = 
  xsd:NMTOKEN

any =
  mixed { element * { attribute * { text }*, any }* }

B.2 Other Syntax Schemas (Non-Normative)

Two schema language authors submitted schemas for RDF/XML based on the new grammar in the previous version of this draft. We include pointers to these schemas for information purposes; they are not part of this specification.

C Original Grammar

This section contains the EBNF grammar of the RDF/XML syntax from the Formal Grammar for RDF section of RDF Model & Syntax. The only changes made here were to make it legal XHTML via tidy and to change the links to the productions to point to those in the original document.

(Note: there are EBNF bugs in the 6.30 production where the </rdf:li> tags are not fully enclosed in quotes as '</rdf:li>')

D Updated Grammar after RDF Core decisions

This section updates the original grammar in Appendix C by amending and deleting various productions according to the recorded RDF Core WG decisions. Some productions are also removed since they are no longer needed, once the above changes are made.

Key:
This text should be added. If it is not, your browser will not display this section properly.
~~This text should be deleted. If it is not, your browser will not display this section properly.~~

Production Number	Production Name	Definition
6.1	RDF	"<rdf:RDF>" ~~obj~~ description* "</rdf:RDF>" | description
~~6.2~~	~~obj~~	~~description | container~~
6.3	description	"<rdf:Description" idAboutAttr? bagIdAttr? propAttr* "/>" | "<rdf:Description" idAboutAttr? bagIdAttr? propAttr* ">" propertyElt* "</rdf:Description>" | typedNode
~~6.4~~	~~container~~	~~sequence | bag | alternative~~
6.5	idAboutAttr	idAttr | aboutAttr | aboutEachAttr
6.6	idAttr	" rdf:ID=\"" IDsymbol "\""
6.7	aboutAttr	" rdf:about=\"" URI-reference "\""
6.8	aboutEachAttr	" rdf:aboutEach=\"" URI-reference "\"" ~~| " aboutEachPrefix=\"" string "\""~~
6.9	bagIdAttr	" rdf:bagID=\"" IDsymbol "\""
6.10	propAttr	typeAttr | propName "=\"" string "\"" (with embedded quotes escaped)
6.11	typeAttr	" rdf:type=\"" URI-reference "\""
6.12	propertyElt	"<" propName idAttr? ">" value "</" propName ">" | "<" propName idAttr? parseLiteral ">" literal "</" propName ">" | "<" propName idAttr? parseResource ">" propertyElt* "</" propName ">" | "<" propName idRefAttr? bagIdAttr? propAttr* "/>"
6.13	typedNode	"<" typeName idAboutAttr? bagIdAttr? propAttr* "/>" | "<" typeName idAboutAttr? bagIdAttr? propAttr* ">" propertyElt* "</" typeName ">"
6.14	propName	Qname
6.15	typeName	Qname
6.16	idRefAttr	idAttr | resourceAttr
6.17	value	~~obj~~ description | string
6.18	resourceAttr	" rdf:resource=\"" URI-reference "\""
6.19	Qname	[ NSprefix ":" ] name
6.20	URI-reference	string, interpreted per [URI]
6.21	IDsymbol	any legal XML name symbol
6.22	name	any legal XML name symbol
6.23	NSprefix	any legal XML namespace prefix
6.24	string	any XML text, with "<", ">", and "&" escaped
~~6.25~~	~~sequence~~	~~"<rdf:Seq" idAttr? ">" member* "</rdf:Seq>" | "<rdf:Seq" idAttr? memberAttr* "/>"~~
~~6.26~~	~~bag~~	~~"<rdf:Bag" idAttr? ">" member* "</rdf:Bag>" | "<rdf:Bag" idAttr? memberAttr* "/>"~~
~~6.27~~	~~alternative~~	~~"<rdf:Alt" idAttr? ">" member+ "</rdf:Alt>" | "<rdf:Alt" idAttr? memberAttr? "/>"~~
~~6.28~~	~~member~~	~~referencedItem | inlineItem~~
~~6.29~~	~~referencedItem~~	~~"<rdf:li" resourceAttr "/>"~~
~~6.30~~	~~inlineItem~~	~~"<rdf:li" ">" value </rdf:li>" | "<rdf:li" parseLiteral ">" literal </rdf:li>" | "<rdf:li" parseResource ">" propertyElt* </rdf:li>"~~
~~6.31~~	~~memberAttr~~	~~" rdf:_n=\"" string "\"" (where n is an integer)~~
6.32	parseLiteral	" rdf:parseType=\"Literal\""
6.33	parseResource	" rdf:parseType=\"Resource\""
6.34	literal	any well-formed XML