Copyright © 2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document contains information about embedding metadata in W3C Technical Reports (TR) using RDFa.
This document is for review by the Semantic Web Deployment Working Group (SWD) and is subject to change without notice. This document has no formal standing within W3C. Please consult the group's home page and the W3C technical reports index for information about the latest publications by this group.
W3C publishes a number of Technical Reports (TR). Prior to publication, these documents are checked against some strict publication rules ("pubrules"). Once published, these documents are indexed at http://www.w3.org/TR/.
In their current version, pubrules do not require documents to carry explicit, comprehensive, machine-readable metadata. However, pubrules dictate that the documents themselves must contain a notable amount of self-descriptive data in their headers and their first paragraphs. These pieces of information must be formatted and edited according to certain conventions.
The W3C internal "TR Automation" Project aims to simplify the publication of Technical Reports. It has produced an XSLT style sheet [XSLT2] that exploits the strict formatting rules of Technical Reports to generate metadata about them in RDF [RDFPrimer]. This style sheet is used at W3C to keep an up-to-date RDF document containing descriptions of all the documents published under http://www.w3.org/TR/. The present document discusses a different approach, based on making the metadata explicit in the document using RDFa [RDFaPrimer].
A combination of some W3C and third-party vocabularies can be used to formally capture the Technical Reports metadata in RDF. The following list summarizes these vocabularies:
Editor's note: This is the new SKOS namespace, but it is a feature at risk. It might need to be changed.
Note that some of these vocabularies are published by W3C, but they have no formal standing (they are not W3C Recommendations).
The rest of this document assumes that the following namespace aliases are defined:
Prefix | Namespace |
---|---|
rec: | http://www.w3.org/2001/02pd/rec54# |
org: | http://www.w3.org/2001/04/roadmap/org# |
mat: | http://www.w3.org/2002/05/matrix/vocab# |
doc: | http://www.w3.org/2000/10/swap/pim/doc# |
con: | http://www.w3.org/2000/10/swap/pim/contact# |
dct: | http://purl.org/dc/terms/ |
skos: | http://www.w3.org/2008/05/skos# Editor's note: This is the new SKOS namespace, but it is a feature at risk. It might need to be changed. |
xsd: | http://www.w3.org/2001/XMLSchema# Editor's note: Not sure about the final hash |
xhtml: | http://www.w3.org/1999/xhtml |
An analysis of W3C Technical Reports and their associated publication process shows that there are several pieces of metadata which could usefully be associated with the documents. The following table is a non-exhaustive list of these metadata items. For each item, a suggestion is made on which RDF properties can be used to encode it:
Metadata item | Suggested properties | Use notes |
---|---|---|
Document title and subtitle | dct:title | In addition to the title, some documents have a subtitle. Due to the lack of a widely-used property to encode subtitles, the title and subtitle can be concatenated and captured with the dct:title property. The @content attribute from RDFa may be useful to specify the full title of the document. |
Abstract | dct:abstract | |
Maturity level of the document: Working Draft, Note, Recommendation... | See use notes. | The maturity levels of a W3C TR are defined as classes in the rec: namespace: rec:REC, rec:NOTE, rec:WD... RDFa's @typeof attribute can be used to declare the document as an instance of one of these classes. |
Name, affiliation and contact address of the editors / authors | rec:editor, con:fullName, con:mailbox | Each editor should be described as a different resource. The FOAF vocabulary [FOAF] may be used to create expressive descriptions. |
Publication date | dct:date | The datatype xsd:date from XML Schema Datatypes [XMLSchema2] may be used to format the date. |
Link to previous published version | doc:obsoletes | Editor's Note: add example of how to distinguish from supersedes |
Link to previous documents that are obsoleted or superseded by the present version (i.e.: "replaces") | rec:supersedes | Editor's Note: add example of how to distinguish from obsoletes |
Link to the most up-to-date published version of the current document | doc:versionOf | |
Link to the implementation report | mat:hasImplReport | |
Link to the errata | mat:hasErrata | |
Link to translated versions | mat:hasTranslations | |
Link to the W3C Activity that has produced the document | rec:cites | |
Link to the W3C Working Group that has produced the document | org:deliveredBy, con:homePage | The WG should be described as a different resource. If the URI of the WG is not known, an anonymous resource can be used. |
Link to the patent policy | org:patentRules | The patent policy is a property of the working group that produces the document, and not a property of the document itself. Formally, the domain of org:patentRules is org:Group. Therefore, this property should be used to describe the WG resource (see previous row). |
Deadline for feedback (e.g., for comments to Last Call documents, implementation feedback, etc.) | rec:lastCallFeedBackDue, rec:implementationFeedbackDue | |
Links to / full citations of referenced documents, which can be normative and non-normative | dct:references | |
Links to companion documents, for documents which are released as part of a series, such as the RDF specifications. | dct:isPartOf | Editor's Note: add info about needing a URI for the series. |
Name of the series editor | rec:editor | Editor's Note: add info about needing a URI for the series. |
Link to license | xhtml:license | Editor's Note: would dct:license work better? |
Link to the diff/changelog | skos:changeNote | Note that the domain of the SKOS documentation properties is not restricted, therefore, they can be used to annotate any resource [SKOSRef]. |
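As a rough illustration of how a few rows of the table could be expressed in RDFa, the following sketch links a report to its errata, implementation report and translations, and describes the Working Group as an anonymous resource that carries the patent policy. Every example.org URI and the group name are hypothetical placeholders, the wrapper div is an assumption of this sketch, and the prefixes are those of the namespace table above, assumed to be declared on an enclosing element.

```xml
<!-- Sketch only: every example.org URI below is a hypothetical placeholder. -->
<div about="http://www.example.org/TR/2008/WD-example-20081002/">
  <!-- Simple links: the report is the subject, @href supplies the object -->
  <p>
    See the
    <a rel="mat:hasErrata" href="http://www.example.org/2008/example-errata">errata</a>,
    the
    <a rel="mat:hasImplReport" href="http://www.example.org/2008/example-implementation">implementation report</a>
    and the
    <a rel="mat:hasTranslations" href="http://www.example.org/2008/example-translations">translations</a>.
  </p>
  <!-- The Working Group is an anonymous resource typed as org:Group.
       The patent policy is attached to the group, not to the report. -->
  <p rel="org:deliveredBy">
    <span typeof="org:Group">
      This document was produced by the
      <a rel="con:homePage" href="http://www.example.org/wg/">Example Working Group</a>,
      which operates under the
      <a rel="org:patentRules" href="http://www.example.org/patent-policy">patent policy</a>.
    </span>
  </p>
</div>
```

In this sketch the @rel on the second paragraph has no @href of its own, so the triple is completed by the typed span nested inside it. This keeps org:patentRules on the group resource, as the table recommends.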
The list above is a superset of the metadata that is extracted by the style sheet of the TR automation process. The latter can be easily obtained by means of the W3C Online XSLT 2.0 service. For instance, the RDF metadata extracted by the style sheet for this document follows:
Editors' Note: Replace this mock example with more realistic data. The XSLT fails to extract some of these triples because the headings of this document are not complete.
<?xml version="1.0" encoding="utf-8"?> <rdf:RDF xmlns:doc="http://www.w3.org/2000/10/swap/pim/doc#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rec="http://www.w3.org/2001/02pd/rec54#" xmlns:org="http://www.w3.org/2001/04/roadmap/org#" xmlns="http://www.w3.org/2001/02pd/rec54#" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dct="http://purl.org/dc/terms/"> <TRPub rdf:about=""> <dct:date>0001-01-01</dct:date> <dct:title>Adding Metadata to W3C Technical Reports</dct:title> <doc:versionOf rdf:resource=""/> <editor rdf:parseType="Resource"> <contact:fullName>UNKNOWN Diego Berrueta</contact:fullName> </editor> </TRPub> </rdf:RDF>
Note that the XSLT style sheet simply extracts the full name of the editors/authors and their contact address. As part of the W3C internal process to automate the listing of TR documents, this information is later matched against a manually-maintained list of "known" people. The insufficient mark-up in the original documents makes it impossible to fully automate the extraction of people's data.
Editor's Note: discuss how the internal structure of the document can be described with RDFa, for instance, to indicate which sections are normative and which are just informative. The SALT ontologies can be useful for this purpose.
Although the RDFa technology [RDFaSyntax] has not yet reached W3C Recommendation status, the pubrules allow Technical Reports (except for Recommendations) to use XHTML+RDFa (see June 24, 2008 announcement and current TR pubrules concerning normative representations).
RDFa can be used to enrich Technical Reports with comprehensive metadata. Moreover, the strict structure enforced by the pubrules makes it easy to decorate the markup with RDFa attributes. In many cases, there is no need to introduce redundant mark-up or data, although fine-grained annotation may require auxiliary mark-up.
At the moment, RDFa has only been specified for XHTML 1.1. Technical Reports using HTML4 or XHTML 1.0 cannot include RDFa attributes, because their mark-up would no longer validate. Similarly, those TR editors who use non-HTML formats in their documents (e.g., XML Spec), and later convert them to (X)HTML, must wait until RDFa support becomes available in the tools they use.
The use of RDFa to add metadata to a W3C Technical Report is illustrated by this document, which has been augmented with RDFa markup. It successfully passes the W3C markup validator, and its metadata can be extracted with the W3C RDFa Distiller service. Check the HTML source of this document for details, or read the example below.
Some steps to add RDFa to a Technical Report are described below. Note, however, that the authoritative sources of information on RDFa usage are the RDFa Syntax [RDFaSyntax] and the RDFa Primer [RDFaPrimer]. The present document is not a substitute for either of these sources.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> ... </html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:rec="http://www.w3.org/2001/02pd/rec54#" xmlns:org="http://www.w3.org/2001/04/roadmap/org#" xmlns:mat="http://www.w3.org/2002/05/matrix/vocab#" xmlns:doc="http://www.w3.org/2000/10/swap/pim/doc#" xmlns:con="http://www.w3.org/2000/10/swap/pim/contact#" xmlns:dct="http://purl.org/dc/terms/" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > ... </html>
<html ... about="http://www.example.org/tr-metadata-20081002" > ... </html>
<body typeof="rec:WD" > ... </body>
Editors' Note: The class of "Editor's drafts" is not defined in the rec: ontology. Therefore, this document (and this example) uses the rec:WD class, although at this point, the document is not a WD, but an ED.
<h1 id="title" property="dct:title"> Adding Metadata to W3C Technical Reports </h1> <h2 id="w3c-doctype"> W3C Working Draft <span property="dct:date" datatype="xsd:date" content="2008-08-31"> 31 August 2008 </span> </h2>
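Where a report has a subtitle, the @content attribute can carry the concatenated title while the visible headings stay separate. The sketch below is only an illustration: the subtitle text and the id "subtitle" are invented for this example and are not taken from pubrules.

```xml
<!-- Sketch only: the subtitle text and the id "subtitle" are invented. -->
<!-- @content overrides the element text, so dct:title receives the
     concatenated title and subtitle -->
<h1 id="title" property="dct:title"
    content="Adding Metadata to W3C Technical Reports: An Illustrative Subtitle">
  Adding Metadata to W3C Technical Reports
</h1>
<h2 id="subtitle">An Illustrative Subtitle</h2>
```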
<dl> ... <dt>Previous version:</dt> <dd> <a rel="doc:obsoletes" href="http://www.example.org/TR/2006/WD-20060314/" >http://www.example.org/TR/2006/WD-20060314/</a> </dd> ... </dl>
<dl> ... <dt>Editors:</dt> <dd rel="rec:editor"> <span typeof="con:Person"> <span property="con:firstName"> Diego </span> <span property="con:familyName"> Berrueta </span> <span rel="owl:sameAs" resource="http://berrueta.net/foaf.rdf#me"/> </span> , Fundación CTIC </dd> ... </dl>
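The table of metadata items suggests con:fullName and con:mailbox for editors, with each editor described as a different resource. A variant of the markup above using those properties might look as follows. The names and mail addresses are hypothetical, and the subject (the document) is assumed to be inherited from the enclosing markup.

```xml
<dl>
  <dt>Editors:</dt>
  <!-- Each dd links the document to the anonymous con:Person nested in it -->
  <dd rel="rec:editor">
    <span typeof="con:Person">
      <span property="con:fullName">Alice Example</span>
      (<a rel="con:mailbox" href="mailto:alice@example.org">alice@example.org</a>)
    </span>
  </dd>
  <dd rel="rec:editor">
    <span typeof="con:Person">
      <span property="con:fullName">Bob Example</span>
      (<a rel="con:mailbox" href="mailto:bob@example.org">bob@example.org</a>)
    </span>
  </dd>
</dl>
```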
The elaboration of W3C Technical Reports follows a formal process. As part of this process, many revisions (iterations) of a single document are produced. All the revisions of a document, even the obsolete ones, are archived, and are always available at a "dated" URI, i.e., a URI that contains the date of publication in its path component. "Dated" URIs allow you to make unambiguous references to particular revisions of the document. For instance, the SKOS Reference Working Draft dated 29 August 2008 is (and will always be) available by dereferencing the following "dated" URI: http://www.w3.org/TR/2008/WD-skos-reference-20080829/.
However, many readers are interested in just the latest version of the document. For their convenience, W3C offers a URI for each document that identifies the latest version. For instance, the latest published revision of the SKOS Reference is available at http://www.w3.org/TR/skos-reference/. In the following, this kind of "non-dated" URI is called the "latest version" URI.
When a web agent retrieves the "latest version" URI, it is not redirected to the "dated" URI, but it is directly served the most up-to-date revision available. Therefore, the latest version of a document is available at two different URIs (the "dated" one and the "latest version" one).
As the "latest version" URI is a moving target, it should not be used as the subject of any metadata statement that may change in an upcoming revision, i.e., almost every statement. The "dated" URI must be used instead. If no subject is set explicitly, the URI that was used to retrieve the document is used by default. In the case of a "latest version" URI, this would result in statements about multiple versions of a document being merged together, producing a nonsensical mishmash of assertions.
Consequently, to ensure that the "dated" URI is always used in metadata descriptions, the @about RDFa attribute must be used to explicitly set the URI of the document that is being described.
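Using the SKOS Reference URIs quoted above, this could be sketched as follows: @about pins the subject to the "dated" URI, and doc:versionOf points from there to the "latest version" URI. The wrapper div and the dl layout are assumptions of this example, and the rec: and doc: prefixes are assumed to be declared on an enclosing element.

```xml
<!-- Sketch only: the wrapper div and the dl layout are assumptions. -->
<div about="http://www.w3.org/TR/2008/WD-skos-reference-20080829/" typeof="rec:WD">
  <dl>
    <dt>This version:</dt>
    <dd>http://www.w3.org/TR/2008/WD-skos-reference-20080829/</dd>
    <dt>Latest version:</dt>
    <dd>
      <!-- Subject: the dated URI set by @about; object: the latest version URI -->
      <a rel="doc:versionOf" href="http://www.w3.org/TR/skos-reference/">http://www.w3.org/TR/skos-reference/</a>
    </dd>
  </dl>
</div>
```

With this markup, both statements (the rec:WD type and the doc:versionOf link) are about the "dated" URI, regardless of which of the two URIs was used to retrieve the document.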
GRDDL [GRDDL] is a W3C Recommendation of a mechanism for declaring that a document contains RDF-compatible data and for linking to algorithms that can extract these data from the document. Typically, these algorithms are codified in XSLT [XSLT2].
Unfortunately, the XSLT style sheet produced by the TR automation project cannot be directly used with GRDDL due to its internal modular structure.
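For reference, an XHTML document normally opts into GRDDL by naming the GRDDL profile in its head element and linking to a transformation. The sketch below shows that general pattern only. The style sheet URI is a hypothetical placeholder and does not refer to the TR automation style sheet.

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
  <!-- The profile URI announces that rel="transformation" links
       identify GRDDL transformations -->
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Example Technical Report</title>
    <!-- Hypothetical style sheet that would turn this document into RDF/XML -->
    <link rel="transformation" href="http://www.example.org/2008/tr-metadata.xsl" />
  </head>
  <body>...</body>
</html>
```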
Editor's Note: should we mention Expressing Dublin Core metadata using HTML/XHTML meta and link elements
(To be completed).