Copyright © 2003, 2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document presents GRDDL, a mechanism for encoding RDF statements in XHTML and XML to be extracted by programs such as XSLT transformations.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
As part of the work of the W3C Semantic Web Activity, the Semantic Web Coordination Group (Member-only) and the HTML Working Group started a task force on RDF in XHTML. This draft is a snapshot of one of the designs discussed in that task force.
Please send review comments, implementation experience reports, etc. to public-rdf-in-xhtml-tf@w3.org, a mailing list with public archive.
The EmbeddingRDFinHTML wiki topic is also available as a shared space for collected wisdom on related topics.
A related design history and rationale discusses contribution of this draft to RDF issues such as faq-html-compliance and rdfms-validating-embedded-rdf and Web Architecture issues such as RDFinXHTML-35 and namespaceDocument-8.
This is something of a design sketch, but it is backed by running code. We provide a pair of online services, one demo for XHTML and one demo for generic XML, on an experimental, best-effort basis.
The editors are aware of a few remaining issues, marked up like this @@@.
A log of changes is appended.
Publication as a Coordination Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
An article by J. Kunze in 1999, Encoding Dublin Core Metadata in HTML, explains one way that the Dublin Core community encodes its metadata in HTML documents. This metadata can also be expressed in the Resource Description Framework (RDF).
The mapping between the HTML encoding and the RDF encoding can be represented as an XSLT transformation, dc-extract.xsl:
If the HTML author understood and agreed to these encoding conventions, then their HTML document will conform to the syntactic conventions. In this case, the mapping preserves the author's meaning. But an author may have accidentally conformed to the syntactic conventions without any knowledge of Dublin Core at all. In that case, the mapping most likely does not preserve the author's meaning.
The HTML specification, in section 7.4.4.3, Meta data profiles, provides a mechanism for authors to use particular metadata vocabularies and thereby indicate the author's intent to use those terms in accordance with the conventions of the community that originated the terms.
Authors may wish to define additional link types not described in this specification. If they do so, they should use a profile to cite the conventions used to define the link types.
GRDDL is such a profile; it's a mechanism for Gleaning Resource Descriptions from Dialects of Languages. Use of the http://www.w3.org/2003/g/data-view profile indicates that RDF statements that result from transformation of the HTML document to RDF by designated algorithms are part of the document's meaning.
In this profile, the transformation link relationship relates a document to an algorithm for gleaning resource descriptions from the dialect the document is written in.
@@@ Should we namespace-qualify the token used in rel? cf. Profiles attribute: A format to be defined, Karl Dubost, 15 Jan 2004.
For example:
<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation"
          href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
    <meta name="DC.Subject"
          content="ADAM; Simple Search; Index+; prototype" />
    ...
  </head>
  ...
</html>
The following RDF statement is part of the meaning of this document:
<rdf:RDF
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
  <rdf:Description rdf:about="">
    <dc:subject>ADAM; Simple Search; Index+; prototype</dc:subject>
  </rdf:Description>
</rdf:RDF>
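The mapping in this example can be illustrated with a minimal Python sketch. It is a simplified stand-in for what a transformation such as dc-extract.xsl does, not a port of that style sheet: it assumes that each meta element whose name starts with "DC." maps to the Dublin Core property with the same local name (first letter lowercased), and that the subject is the document itself, written as the empty relative URI reference as in rdf:about="". The function name glean_dc and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample modelled on the example above (elided parts left out).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation"
          href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
    <meta name="DC.Subject"
          content="ADAM; Simple Search; Index+; prototype" />
  </head>
  <body />
</html>"""


def glean_dc(xhtml_text, subject=""):
    """Return (subject, predicate, literal) triples for DC.* meta elements."""
    root = ET.fromstring(xhtml_text)
    triples = []
    for meta in root.iter("{%s}meta" % XHTML_NS):
        name = meta.get("name", "")
        content = meta.get("content")
        if name.startswith("DC.") and content is not None:
            # Assumed convention: DC.Subject -> dc:subject.
            local = name[len("DC."):]
            predicate = DC_NS + local[:1].lower() + local[1:]
            triples.append((subject, predicate, content))
    return triples


for triple in glean_dc(SAMPLE):
    print(triple)
```

The single triple produced for the sample corresponds to the dc:subject statement in the RDF/XML above.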
Transformation algorithms should be represented in XSLT. While JavaScript, C, or any other programming language technically expresses the relevant information, XSLT is specifically designed to express XML-to-XML transformations and has some good safety characteristics. Other representations may be used by prior agreement of all concerned parties.
Transformation algorithms should be well-defined functions whose only input is the source document. The use of the XSLT document() function to incorporate other data at transformation time is an error.
Limitations on xsl:import?
Note that an XHTML document may conform to a number of dialects simultaneously and link to more than one decoding algorithm. For example, the fictional Joe Lambda's Homepage demonstrates a mixture of Dublin Core, Creative Commons, RSS, FOAF, and geoURL dialects.
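How a client might discover every decoding algorithm a document links to can be sketched in Python as follows. This is an illustration under stated assumptions rather than a normative algorithm: it treats both the profile and rel attributes as whitespace-separated lists, compares link types case-insensitively, and only honours transformation links when the GRDDL profile is present. The function name find_transformations and the example.org URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

# Hypothetical document linking to two transformations.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Joe Lambda's Homepage</title>
    <link rel="transformation" href="http://www.example.org/dc-extract.xsl" />
    <link rel="transformation" href="http://www.example.org/foaf-extract.xsl" />
    <link rel="stylesheet" href="style.css" />
  </head>
  <body />
</html>"""


def find_transformations(xhtml_text):
    """Return the href of every rel="transformation" link, in document order."""
    root = ET.fromstring(xhtml_text)
    head = root.find("{%s}head" % XHTML_NS)
    if head is None:
        return []
    # Without the GRDDL profile, the author has not licensed any transformation.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    hrefs = []
    for link in head.iter("{%s}link" % XHTML_NS):
        rel_tokens = link.get("rel", "").lower().split()
        href = link.get("href")
        if "transformation" in rel_tokens and href:
            hrefs.append(href)
    return hrefs


print(find_transformations(SAMPLE))
```

Each returned URI would then be dereferenced and applied to the document independently, and the resulting RDF statements merged.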
The GRDDL profile mechanism is a special case of GRDDL designed to fit within the syntax of XHTML 1.0. The general form of GRDDL is an attribute suitable for use with a wide variety of XML dialects.
Use of the interpreter attribute in the http://www.w3.org/2003/g/data-view# namespace on the root element of an XML document indicates that RDF statements that result from transformation of the document to RDF by designated algorithms are part of the document's meaning.
The value of the grddl:interpreter attribute designates a list of algorithms by URI reference. @@@IRI reference?
For example: update to P3Q example?
<svg xmlns="http://www.w3.org/2000/svg"
xmlns:data-view="http://www.w3.org/2003/g/data-view#"
data-view:interpreter="http://www.example.org/2004/01/svg2dc.xsl"
width="4cm" height="8cm"
version="1.1" baseProfile="tiny" >
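Reading the interpreter attribute described above can be sketched in Python as follows. The draft says the value is a list of URI references but does not state the separator; this sketch assumes whitespace separation. The function name find_interpreters and the complete sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

DATA_VIEW_NS = "http://www.w3.org/2003/g/data-view#"

# Hypothetical complete document based on the SVG start tag above.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:data-view="http://www.w3.org/2003/g/data-view#"
     data-view:interpreter="http://www.example.org/2004/01/svg2dc.xsl"
     width="4cm" height="8cm"
     version="1.1" baseProfile="tiny" />"""


def find_interpreters(xml_text):
    """Return the URI references listed in the root element's interpreter attribute."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{namespace}local".
    value = root.get("{%s}interpreter" % DATA_VIEW_NS, "")
    # Assumption: the list is whitespace-separated.
    return value.split()


print(find_interpreters(SAMPLE))
```

Because the lookup is by namespace name, the prefix chosen in the document (data-view, grddl, or any other) does not matter.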
The RDF property http://www.w3.org/2003/g/data-view#namespaceTransformation links an XML Namespace to an interpreter that may be applied to any document which has its root element in that namespace, such that the output of the interpreter will be an RDF/XML form of some (or all) of the information content of the document.
For instance, given the XML Namespace http://www.example.net/fooML,
<rdf:Description rdf:about="http://www.example.net/fooML">
<namespaceTransformation xmlns='http://www.w3.org/2003/g/data-view#'
rdf:resource='http://www.example.net/fooML2rdf.xsl' />
</rdf:Description>
asserts that if an XML document has its root element in the http://www.example.net/fooML namespace and is run through the XSLT style sheet http://www.example.net/fooML2rdf.xsl, then the result will be valid RDF/XML conveying information that can be considered to have been expressed by the document.
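The lookup this property enables can be sketched in Python as follows. This is a rough illustration only: it recognises just the one RDF/XML pattern used in the example (an rdf:Description with rdf:about and a namespaceTransformation child carrying rdf:resource), whereas a real client would use a proper RDF parser. How the RDF description of a namespace is obtained is outside the sketch. All function names and the wrapped sample are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DATA_VIEW_NS = "http://www.w3.org/2003/g/data-view#"

# The description from the example above, wrapped so that it parses standalone.
NAMESPACE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://www.example.net/fooML">
    <namespaceTransformation xmlns="http://www.w3.org/2003/g/data-view#"
        rdf:resource="http://www.example.net/fooML2rdf.xsl" />
  </rdf:Description>
</rdf:RDF>"""


def load_namespace_transformations(rdf_text):
    """Map each namespace URI to the interpreters asserted for it."""
    table = {}
    for desc in ET.fromstring(rdf_text).iter("{%s}Description" % RDF_NS):
        about = desc.get("{%s}about" % RDF_NS)
        for prop in desc.findall("{%s}namespaceTransformation" % DATA_VIEW_NS):
            resource = prop.get("{%s}resource" % RDF_NS)
            if about and resource:
                table.setdefault(about, []).append(resource)
    return table


def root_namespace(xml_text):
    """Return the namespace name of the root element, or None if it has none."""
    tag = ET.fromstring(xml_text).tag  # "{namespace}local" when namespaced
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


def interpreters_for(xml_text, table):
    """Return the interpreters applicable to a document, by root namespace."""
    return table.get(root_namespace(xml_text), [])


table = load_namespace_transformations(NAMESPACE_RDF)
print(interpreters_for('<foo xmlns="http://www.example.net/fooML" />', table))
```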
RFC 2046, in section 9, Security Considerations, says:
Implementors should pay special attention to the security implications of any media types that can cause the remote execution of any actions in the recipient's environment. In such cases, the discussion of the "application/postscript" type may serve as a model for considering other media types with remote execution capabilities.
Given the expressive power of XSLT, and the possibility of accessing external resources from an XSLT style sheet (e.g. through the document function or the xsl:import mechanism), implementors should take the appropriate measures to prevent malicious usage of this mechanism.
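One such measure is to inspect a style sheet before running it. The Python sketch below is a heuristic, not a complete safeguard: it flags xsl:import and xsl:include elements and any attribute value that appears to call document(). It can report false positives (for example, the text "document(" inside a string literal) and does not cover processor-specific extension functions. The function name external_access_points and the sample style sheet are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
IMPORT_TAGS = {
    "{%s}import" % XSL_NS: "xsl:import",
    "{%s}include" % XSL_NS: "xsl:include",
}
# "document(" not preceded by a name character, so "my-document(" is ignored.
DOCUMENT_CALL = re.compile(r"(?<![\w.-])document\s*\(")

# Hypothetical style sheet that reaches out to two external resources.
SAMPLE = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="http://www.example.org/other.xsl"/>
  <xsl:template match="/">
    <xsl:copy-of select="document('http://www.example.org/data.xml')"/>
  </xsl:template>
</xsl:stylesheet>"""


def external_access_points(xslt_text):
    """Return (kind, value) pairs for constructs that may fetch external data."""
    findings = []
    for elem in ET.fromstring(xslt_text).iter():
        if elem.tag in IMPORT_TAGS:
            findings.append((IMPORT_TAGS[elem.tag], elem.get("href")))
        # Check every attribute, since XPath can appear in select, test,
        # and in attribute value templates on literal result elements.
        for value in elem.attrib.values():
            if DOCUMENT_CALL.search(value):
                findings.append(("document()", value))
    return findings


for finding in external_access_points(SAMPLE):
    print(finding)
```

A client could refuse to run a style sheet with any findings, or run it only with network access disabled in the XSLT processor, where the processor offers such a setting.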
The Nov 2003 draft is a predecessor of this spec.
An editor's working draft is also available; v1.11 was announced in a message of 16 Jan.