Steven Pemberton, W3C/CWI
Version date: 2009-05-14
RDFa is a thin layer of markup you can add to your web pages that makes them understandable for machines as well as people. You could describe it as a CSS for meaning. Once it is added, browsers, search engines, and other software can understand more about the pages, and in so doing offer more services or better results for the user. For instance, if a browser knows that a page is about an event such as a conference, it can offer to add it to your calendar, show it on a map, locate hotels or flights, or any number of other things.
This document introduces RDFa and gives examples of its use.
If you know HTML markup, you will know that you can add metadata to an HTML document by adding <meta> and <link> elements in the head. For instance:
<meta name="description" content="A site about fish" />
gives a description of the current document. You could say that the current page has a description property, whose value is "A site about fish".
Similarly you can say:
<link rel="next" href="thecod.html" />
which says that if you consider this page as one in a series of pages, the next one is thecod.html. In other words, this page has a next relation to thecod.html.
There is a smattering of other places in HTML where you can add some metadata, such as the title element and attribute in places, and the cite attribute on <blockquote> and others, but that is about it.
In passing, you might wonder why you can't say
<meta name="description">A site about fish</meta>
and the answer is simply that at the time this feature was added to HTML, some browsers would incorrectly have displayed the text in the meta element, even though it was in the <head>, and so to prevent that happening the content was put in an attribute instead (this, by the way, is being fixed in XHTML2).
Typically the metadata in a document is used for several purposes:
- the title element can provide a title for windows and bookmarks,
- the title attribute can provide a hover-over information pop-up,
- the description property can be shown in search results,
- next, prev and other properties can be used so that you can navigate between the different pages, or see the copyright information about a page,
and so on.
In the time since the meta element was added to HTML, a generalised way of representing metadata has been defined at W3C. This is called RDF, the Resource Description Framework ('resource' roughly speaking means 'document' here, but you'll see examples of things other than documents later).
RDF is a very simple framework. Essentially all knowledge is gathered as assertions of the form:
URI — property — value
where 'URI' is the URI of the thing being described, 'property' is (the URI of) a property, and 'value' is the value that that property can take, either another URI, a literal string, or a chunk of XML.
So assuming the example document above has a URL of http://www.example.com/home.html, then the RDF assertion, or triple as it is often called, for the description property is
http://www.example.com/home.html — [html:description] — "A site about fish"
and the RDF triple for the next relation would be
http://www.example.com/home.html — [html:next] — http://www.example.com/thecod.html
The value [html:next] means here "the URI that represents the HTML next property", and is expressed here as a Compact URI, or CURIE for short. More on those later.
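To make the triple model concrete, here is a minimal Python sketch (not part of RDFa itself) that stores the two assertions above as subject, predicate, value tuples. The names Triple and values_of are illustrative assumptions, and the predicates are kept as CURIE-like strings rather than full URIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # the property (written here in compact form)
    obj: str        # another URI or a literal string


PAGE = "http://www.example.com/home.html"

# The two assertions from the text above.
triples = [
    Triple(PAGE, "html:description", "A site about fish"),
    Triple(PAGE, "html:next", "http://www.example.com/thecod.html"),
]


def values_of(subject, predicate, store):
    """Return every value asserted for the given subject and property."""
    return [t.obj for t in store
            if t.subject == subject and t.predicate == predicate]


next_pages = values_of(PAGE, "html:next", triples)
```

Every piece of knowledge is one flat assertion, so looking something up is just a matter of filtering on subject and property.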
RDFa extends the possibilities of metadata in XHTML by generalising the attributes on meta and link and allowing them to be used on any element, not just meta and link (so that you may now have metadata in the body of the page as well as the head), and then defining how those attributes can be interpreted as RDF.
To take a simple example, many people add a number of so-called Dublin Core properties to their pages, such as title and author (which is called creator in Dublin Core, since the properties can be used with other things, such as paintings):
<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://dublincore.org/documents/dcq-html/">
  <title>John Smith's Fish of the World</title>
  <meta name="DC.title" content="Fish of the World"/>
  <meta name="DC.creator" content="John Smith"/>
</head>
<body>
  <h1>Fish of the World</h1>
  <p>by John Smith</p>
The Dublin Core Metadata Initiative defined these and other properties for describing metadata about books, works of art and so on. You can see in the example above that they duplicate information in the document itself. A nice thing about RDFa is that you can attach the properties to the document text instead:
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
<head>
  <title>John Smith's Fish of the World</title>
</head>
<body>
  <h1 property="dc:title">Fish of the World</h1>
  <p>by <span property="dc:creator">John Smith</span></p>
What this does is declare that we are going to use the Dublin Core properties, and prefix them with dc:. It then attaches the Dublin Core properties title and creator to the relevant parts of the text. Of course, a major advantage of this is that the visible versions in the text don't get out of sync with the metadata versions.
In the last example we had property="dc:title". This says "the property called title from the vocabulary identified by dc:". But we also said earlier that a property was kept as a URI. A form such as dc:title is called a Compact URI, or CURIE for short. The URI it represents is just the concatenation of the URI in the declaration of the prefix (in this case xmlns:dc="http://purl.org/dc/elements/1.1/") and whatever follows the colon. So in this case dc:title is a short form of the full URI http://purl.org/dc/elements/1.1/title. (You can now probably see why CURIEs are nice to have.)
When there is no prefix (as in rel="index"), a default prefix is used. For XHTML that default is http://www.w3.org/1999/xhtml/vocab#.
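The concatenation rule is simple enough to sketch in a few lines of Python. This is only an illustration of the rule as described here, not a conforming CURIE processor; the function name expand_curie and the prefixes dictionary are assumptions made for the example.

```python
# Default vocabulary used for unprefixed values in XHTML, as stated above.
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


def expand_curie(curie, prefixes, default=XHTML_VOCAB):
    """Expand a CURIE by concatenating the prefix URI and the reference.

    Values without a colon fall back to the default vocabulary.
    An undeclared prefix raises KeyError.
    """
    if ":" not in curie:
        return default + curie
    prefix, reference = curie.split(":", 1)
    return prefixes[prefix] + reference


# Plays the role of xmlns:dc="http://purl.org/dc/elements/1.1/".
prefixes = {"dc": "http://purl.org/dc/elements/1.1/"}

title_uri = expand_curie("dc:title", prefixes)
index_uri = expand_curie("index", prefixes)
```

Because expansion is plain string concatenation, the prefix URI has to end exactly where the reference should begin, which is why vocabulary URIs usually end in / or #.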
Using the property attribute like this gives you an equivalent of the meta element, but in the text of your page. To get the equivalent of a link element, you use the rel attribute. For instance, pages often have a clickable "Next" link to take you to the next page:
<a href="thecod.html">Next</a>
It can be expressed like this:
<a href="thecod.html" rel="next">Next</a>
Similarly,
<a href="page2.html">Back</a>
can be written
<a href="page2.html" rel="prev">Back</a>
Another typical use of rel is to point to the copyright or licensing information of a page. Instead of:
<a href="copyright.html">Copyright</a>
You can write
<a href="copyright.html" rel="copyright">Copyright</a>
(By the way, it doesn't matter what order you put the href and rel in.)
Of course, you could already do this in HTML. What is new is that it is now defined how to interpret this as RDF, and, as you will later see, you can apply it to more than just <a> elements.
Most of the metadata in HTML only allows you to talk about the document itself, and in all the examples we have given so far, we have been giving metadata about the page in question. But you may want to be able to talk about other things than just the current document (and you will see more examples of this shortly).
For this you can use the about attribute to specify what it is the information applies to. For instance, suppose you link to some data:
Here is a plot of the data: <img src="plot.png" alt="Rainfall 1900-1999"/>. The <a href="rainfall.csv">raw data</a> is available.
and you want to include the licensing conditions of that data:
The data is available under <a href="license.html">these conditions</a>.
then you can say this:
The data is available under <a about="rainfall.csv" rel="license" href="license.html">these conditions</a>.
If you use about on a container element, like a <p>, then the about applies to all the contained relations:
<p about="rainfall.csv">
  The data <strong property="dc:title">Rainfall 1900-1999</strong>
  is the property of <em property="dc:creator">Data Be We, Inc</em>
  and is available under <a rel="license" href="license.html">these conditions</a>.
</p>
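How the about on the container carries down to the contained elements can be sketched with Python's standard XML parser. This is a deliberately simplified walk and not a conforming RDFa processor: it handles only about, property, content, rel and href, does not expand CURIEs or resolve relative URIs, and the function name collect is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<p about="rainfall.csv">
  The data <strong property="dc:title">Rainfall 1900-1999</strong> is the
  property of <em property="dc:creator">Data Be We, Inc</em> and is available
  under <a rel="license" href="license.html">these conditions</a>.
</p>"""


def collect(element, subject, out):
    """Walk the tree, passing the closest enclosing about down as the subject."""
    subject = element.get("about", subject)
    if "property" in element.attrib:
        # The content attribute, when present, overrides the element text.
        text = element.get("content")
        if text is None:
            text = "".join(element.itertext())
        out.append((subject, element.get("property"), text))
    if "rel" in element.attrib and "href" in element.attrib:
        out.append((subject, element.get("rel"), element.get("href")))
    for child in element:
        collect(child, subject, out)
    return out


# "" stands for the current page when no about is in scope.
triples = collect(ET.fromstring(FRAGMENT), "", [])
```

All three statements end up with rainfall.csv as their subject, even though only the enclosing <p> mentions it.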
Note that the about attribute contains a URI. It can point to anything on the Web:
<p about="http://www.w3.org/TR/rdfa-syntax">The title of the RDFa specification is <em property="dc:title">RDFa in XHTML: Syntax and Processing</em>...</p>
Occasionally you may want to use a CURIE instead of a URI in about (as you will see shortly), and so to distinguish a CURIE from a URI in those cases, you enclose a CURIE in square brackets. For instance, suppose you had defined xmlns:tr="http://www.w3.org/TR/", then you could write the above in the following way:
<p about="[tr:rdfa-syntax]">The title of the RDFa specification is <em property="dc:title">RDFa in XHTML: Syntax and Processing</em>...</p>
Up to now we have been talking about assigning properties to things with URIs. But there is a problem: not everything that you might want to talk about has a URI. The city of Amsterdam doesn't have a URI. Nor does a person, or an object like a car, or a concept like love. Of course, these things have pages about them, but that is different. It is important not to confuse a website about something with that thing itself.
To take an example to explain the difference, suppose we want to say that T.S. Eliot is the author of the poem The Waste Land. Well, we might do a search for the poem, and find http://en.wikipedia.org/wiki/The_Waste_Land. You might then be tempted to say:
<span about="http://en.wikipedia.org/wiki/The_Waste_Land" property="dc:creator">T.S. Eliot</span>
Unfortunately, this says that T.S. Eliot created the Wikipedia page, which is patently not true. So what do we do?
Well, RDFa has a notation that allows you to create a local name for something that doesn't have a URI (or that has a URI that you don't know), and say something about it anyway:
<link about="[_:TheWasteLand]" rel="foaf:isPrimaryTopicOf" href="http://en.wikipedia.org/wiki/The_Waste_Land" />
The "_:" is a reserved prefix for this notation. You can put any identifier after the colon. What this says is "There is something (which we shall call 'TheWasteLand') which is the primary topic of the page at http://en.wikipedia.org/wiki/The_Waste_Land."
Now that we have uniquely identified the poem we can record that its creator was 'T.S. Eliot':
<span about="[_:TheWasteLand]" property="dc:creator">T.S. Eliot</span>
(By the way, the foaf properties are identified by xmlns:foaf="http://xmlns.com/foaf/0.1/".)
In this way we can mint all sorts of names for people, places, organizations and other things that haven't got URIs, and uniquely identify them. A person:
<link about="[_:StevenPemberton]" rel="foaf:isPrimaryTopicOf" href="http://www.cwi.nl/~steven/" />
A place:
<link about="[_:Amsterdam]" rel="foaf:isPrimaryTopicOf" href="http://www.amsterdam.nl/" />
An organization:
<link about="[_:W3C]" rel="foaf:isPrimaryTopicOf" href="http://www.w3.org/" />
And then we can use those names in order to talk about them:
<a about="[_:W3C]" rel="foaf:homepage" href="http://www.w3.org/">W3C</a>
These special CURIEs beginning "_:" are called blank nodes or bnodes. Note that they are local to a document, so you have to redeclare them in each document in which you use them.
By the way, the important thing with blank nodes is to uniquely identify them by some means if you can. foaf:isPrimaryTopicOf is one way, but any property that is unique will work. For instance:
<link about="[_:StevenPemberton]" rel="foaf:mbox" href="mailto:steven@w3.org" />
is just as good, since there is only one person who has that email address, and so we have uniquely identified that person.
Note that since an empty URI "" means 'the current page', on your own home page you can add code like
<link about="[_:StevenPemberton]" rel="foaf:isPrimaryTopicOf" href=""/>
which says "The thing we call StevenPemberton is the primary topic of this page".
Sometimes the content contains information that needs to be tagged, but it is not in the form you need. For instance:
<p>Amsterdam is located at latitude 52°22'23"N and longitude 4°53'32"E</p>
While there are properties for recording latitude and longitude, they expect the values to be decimal numbers. Well we can write this:
<p about="[_:Amsterdam]">
  <span property="dc:name">Amsterdam</span> is located at
  latitude <span property="geo:lat" content="52.373">52°22'23"N</span> and
  longitude <span property="geo:long" content="4.892">4°53'32"E</span>
</p>
This is of course the same content attribute you know from the meta element. Its value overrides whatever is in the content of the element.
(The geo properties are at xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#".)
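The decimal values in the content attributes come from ordinary degrees, minutes and seconds arithmetic: degrees + minutes/60 + seconds/3600. A small Python sketch of that conversion follows; the function name dms_to_decimal and the choice to round to three places (to match the values used above) are assumptions made for this example.

```python
import re


def dms_to_decimal(dms):
    """Convert a string such as 52°22'23"N to decimal degrees.

    South and west are returned as negative numbers.
    """
    match = re.fullmatch(r"""(\d+)°(\d+)'(\d+)"([NSEW])""", dms)
    if match is None:
        raise ValueError("not a degrees/minutes/seconds value: " + repr(dms))
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + int(seconds) / 3600
    if hemisphere in "SW":
        value = -value
    return round(value, 3)


latitude = dms_to_decimal("""52°22'23"N""")
longitude = dms_to_decimal("""4°53'32"E""")
```

The human-readable form stays in the element content, and the computed number goes in the content attribute.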
A lesser-used but nevertheless useful relationship in HTML is the reverse relationship rev. This attribute is like rel, but reverses the direction of the relationship. For instance, if a document doc.html is indexed by the page index.html, then doc.html can record this fact with the link:
<link rel="index" href="index.html"/>
However, index.html can also record the relationship:
<link rev="index" href="doc.html"/>
which says "this page is the index for doc.html".
You can use rev similarly in RDFa. All it does is swap the subject (the 'about') with the object (the 'href'). For instance, suppose we have a set of data about a person:
<p about="[_:StevenPemberton]">
  Name: <span property="foaf:name">Steven Pemberton</span>
  Mail: <a rel="foaf:mbox" href="mailto:steven@w3.org">steven@w3.org</a>
</p>
Now, foaf has a property depicts that says that a particular image is a picture of some thing, such as a person. But the relationship is from the picture to the person. What we would like to say is:
<link about="Steven.jpg" rel="foaf:depicts" href="[_:StevenPemberton]"/>
except that at the moment we are talking about the person, and not the image. So if we want to add this information to the block above, we just reverse the relationship with rev:
<p about="[_:StevenPemberton]">
  Name: <span property="foaf:name">Steven Pemberton</span>
  Mail: <a rel="foaf:mbox" href="mailto:steven@w3.org">steven@w3.org</a>
  Mugshot: <a rev="foaf:depicts" href="Steven.jpg">Photo</a>
</p>
Note that you can have (if you want) both rel and rev on an element:
<a rel="next" rev="prev" href="page2.html">Next</a>
(Not that this example gives you very much in terms of extra information!)
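The swap that rev performs can be sketched as a tiny Python function that produces the triples for one link. It is an illustration only: the function name triples_for_link is an assumption, page1.html is a made-up name for the page carrying the Next link, and relation names are left unexpanded.

```python
def triples_for_link(subject, href, rel=None, rev=None):
    """Return the (subject, property, object) triples for a single link."""
    triples = []
    if rel is not None:
        # rel: the relationship runs from the subject to the link target.
        triples.append((subject, rel, href))
    if rev is not None:
        # rev: the same kind of statement with subject and object swapped.
        triples.append((href, rev, subject))
    return triples


# <link rel="index" href="index.html"/> written in doc.html
from_doc = triples_for_link("doc.html", "index.html", rel="index")

# <link rev="index" href="doc.html"/> written in index.html
from_index = triples_for_link("index.html", "doc.html", rev="index")

# <a rel="next" rev="prev" href="page2.html">Next</a> written in page1.html
both = triples_for_link("page1.html", "page2.html", rel="next", rev="prev")
```

Both of the index links yield the same triple, which is the point of rev: the same fact can be recorded from either end of the relationship.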
You now know enough to use RDFa for day-to-day use, but there are a few extras you might find useful.
Alongside the href attribute, there is also a resource attribute with the same purpose, but usable when you don't want the link to be clickable, or you want to use a CURIE (since you can't use a CURIE in href):
The photo is entitled <em about="Steven.jpg" rel="foaf:depicts" resource="[_:StevenPemberton]">Steven in London</em>
Note in passing that you may have more than one relation on an element. So we could also say:
The photo is entitled <em about="Steven.jpg" rel="foaf:depicts" resource="[_:StevenPemberton]" property="dc:title">Steven in London</em>
Often a group of properties together make up a whole. For instance an event can have a title, a description, a location, and a start and end date. If you want to say that a section of markup contains such a group of properties, you can use the typeof attribute. For instance, to mark up a conference:
<div xmlns:event="http://www.w3.org/2002/12/cal#" typeof="event:Vevent">
  <h3 property="event:summary">WWW 2009</h3>
  <p property="event:description">18th International World Wide Web Conference</p>
  <p>To be held from <span property="event:dtstart" content="2009-04-20">20th April 2009</span>
  until <span property="event:dtend" content="2009-04-24">24th April</span>,
  in <span property="event:location">Madrid, Spain</span>.</p>
</div>
or a TV program:
<div typeof="event:Vevent">
  <h3 property="event:summary">Have I Got Old News For You</h3>
  <p property="event:location">BBC2</p>
  <p><span property="event:dtstart" content="2008-06-28T21:00:00">Saturday 28 June, 9pm</span>-<span property="event:dtend" content="2008-06-28T21:30:00">9.30pm</span></p>
  <p property="event:description">Team captains Paul Merton and Ian Hislop are joined by returning guest host Jeremy Clarkson and panellists Danny Baker and Germaine Greer for the topical news quiz. <abbr title="in stereo">[S]</abbr></p>
</div>
Note the use of content here to get the dates and times into a machine-readable format.
Occasionally you may want to specify that a particular property is of a certain data type. The datatype attribute is precisely for this purpose:
<span property="event:dtend" datatype="xsd:date" content="2009-04-24">24th April</span>
This would need the declaration xmlns:xsd="http://www.w3.org/2001/XMLSchema#", so that xsd:date expands to http://www.w3.org/2001/XMLSchema#date.
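What a consumer gains from a datatype can be sketched in Python: instead of an opaque string, the value can be turned into a proper date object. This is a hypothetical illustration of how software might use the information, not something RDFa defines; the names PARSERS and typed_value are assumptions, and it takes xsd:date to expand to http://www.w3.org/2001/XMLSchema#date.

```python
from datetime import date, datetime

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical mapping from datatype URI to a Python parser.
PARSERS = {
    XSD + "date": date.fromisoformat,
    XSD + "dateTime": datetime.fromisoformat,
}


def typed_value(lexical, datatype_uri=None):
    """Convert a literal to a Python value; unknown types stay plain strings."""
    parser = PARSERS.get(datatype_uri, str)
    return parser(lexical)


end = typed_value("2009-04-24", XSD + "date")
start = typed_value("2008-06-28T21:00:00", XSD + "dateTime")
untyped = typed_value("24th April")
```

With the type known, a calendar application could compare or sort the dates directly rather than guessing at the format of the text.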
If you want to make sure your page validates correctly, you should ensure your pages have the following at the top of the document (before the <html> element):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
The validator at http://validator.w3.org/ will check your pages.
There are a number of online services willing to extract all the properties from an RDFa-enabled page, and tell you what they are. One example is the RDFa Distiller at http://www.w3.org/2007/08/pyRdfa/.
The object of a rel or rev relationship is specified by href or resource. The subject is either specified by the closest about or src attribute, [...]

- property: attaches a property to some text. The text is either a string (if the content attribute is present) or a piece of markup otherwise (the content of the element that the property attribute is on).
- content: overrides the content of the element when used with the property attribute.
- href: gives the object of the rev and rel attributes. Takes precedence over the resource attribute.
- resource: gives the object of the rev and rel attributes if href is not present.
- datatype: gives the data type of the value of the property attribute (either in the content attribute, or the content of the element that the datatype attribute is on). By default, data in the content attribute is of type string, and data in the content of an element has type xml:Literal. If datatype="" is used, then for the RDF the element content is stripped of markup, and is of type string.

There are many vocabularies available across the web (called taxonomies by some), and there are more being created all the time. Here is a selection:
See the RDFa Wiki list of vocabularies and RDFa examples in the wild for some more.
RDFa Specification - not written for beginners, and therefore hard going, but the final arbiter on RDFa
RDFa Primer - another introduction to RDFa
rdfa.info - news and information about developments.
RDFa Wiki - community meeting place for RDFa.