This service has been discontinued and the underlying software is no longer maintained. The software is publicly available, should anyone be interested in re-establishing the service elsewhere.
Warning: This version implements RDFa 1.1 Core, including the handling of the Role Attribute. The distiller can also run in XHTML+RDFa 1.0 mode (if the incoming XHTML content uses the RDFa 1.0 DTD and/or sets the version attribute).
The package is available for download, although it may be slightly out of sync with the code running this service.
If you intend to use this service regularly and at large scale, consider downloading the package and using it locally. Storing a (conceptually) “cached” version of the generated RDF, instead of referring to the live service, is another way to avoid overloading this server.
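The caching idea can be sketched as follows. This is a minimal illustration in present-day Python, not part of pyRdfa: the function name `cached_distill`, the `distill` callable, and the cache layout are all assumptions made for the example. `distill` stands for whatever actually produces the RDF (a local pyRdfa installation or a request to a distiller).

```python
import hashlib
import os


def cached_distill(uri, distill, cache_dir):
    """Return the RDF for `uri`, calling `distill` only on a cache miss.

    `distill` is a hypothetical callable (uri -> serialized RDF string).
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the URI so that it becomes a safe, fixed-length file name.
    key = hashlib.sha256(uri.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".ttl")
    if os.path.exists(path):
        # Cache hit: reuse the stored serialization.
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    # Cache miss: generate the RDF once and store it for later calls.
    rdf = distill(uri)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(rdf)
    return rdf
```

The sketch never expires entries; a real setup would probably also want to refresh the stored copy when the source page changes.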
RDFa 1.1 is a specification for attributes to be used with XML languages or with HTML5 to express structured data. The rendered, hypertext data of XML or HTML is reused by the RDFa markup, so that publishers don’t need to repeat significant data in the document content. The underlying abstract representation is RDF, which lets publishers build their own vocabulary, extend others, and evolve their vocabulary with maximal interoperability over time.
pyRdfa is a distiller that generates RDF triples, in various RDF serialization formats, from an XML or HTML5 file annotated with RDFa. It can be used either directly from the command line or via a CGI service. It corresponds to the RDFa 1.1 Core document, the XHTML+RDFa and HTML+RDFa specifications, and, for the SVG version, the SVG Tiny 1.2 Recommendation.
The forms above can be used to start the service installed at this site. To learn more about RDFa, please consult the RDFa 1.1 Core Document. See below for how to download the package.
As installed, this service is a server-side implementation of RDFa. This also means that pages that generate their (X)HTML content dynamically (e.g., using AJAX) will not be properly processed by this distiller.
format (values: turtle, xml, json, nt; default: turtle)
rdfa_lite (values: true, false; default: false): if true, a warning will be issued if RDFa 1.1 Core attributes that are not part of the RDFa 1.1 Lite specification are used. The separate rdfagraph option should be used to make these warnings visible.
host_language (values: xhtml, html, svg, atom, xml; default: html)
rdfagraph (values: output, processor, processor,output; default: output): … processor is set, then those triples are returned, too. See the RDFa 1.1 Core document for further details.
vocab_expansion (values: true, false; default: false): … vocab attribute, i.e., to retrieve the corresponding RDF file and follow the possible subclass and subproperty relationships. See the RDFa 1.1 Core document for further details.
embedded_rdf (values: true, false; default: true)
space_preserve (values: true, false; default: true)
vocab_cache (values: true, false; default: true)
vocab_cache_report (values: true, false; default: false): … rdfagraph option to processor or processor,default (depending on the original setting of rdfagraph).
vocab_cache_refresh (values: true, false; default: false)
When the RDFa resource is accessed through HTTP, the host language is determined based on the content type of the return header as follows:
… metadata element is also extracted and added to the output.
If you use Firefox, Safari, Chrome, or Opera, you can also drag the following bookmarklets to your browser bar and use them to distill the current page: “RDFa it (Turtle)!”, “RDFa it (RDF/XML)!”, “RDFa it (N triples)!”.
When using the distiller URI directly, options left at their default values can be omitted. Some examples:
http://www.example.com/rdfa.html, with whitespace preservation and without warnings, serialized in Turtle:
http://www.w3.org/2012/pyRdfa/extract?uri=http://www.example.com/rdfa.html
http://www.example.com/rdfa.html, with whitespace preservation and without warnings, serialized in RDF/XML:
http://www.w3.org/2012/pyRdfa/extract?format=xml&uri=http://www.example.com/rdfa.html
http://www.example.com/rdfa.html, with whitespace preservation and including warnings, serialized in Turtle:
http://www.w3.org/2012/pyRdfa/extract?graph=default,processor&uri=http://www.example.com/rdfa.html
http://www.w3.org/2012/pyRdfa/extract?uri=referer
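Request URLs of the kind shown in the examples can also be assembled programmatically. The following is only a sketch in present-day Python: the helper name `distiller_url` is made up for the illustration, the base address is copied from the examples, and the service behind it is discontinued, so the function only builds the string and does not contact anything.

```python
from urllib.parse import urlencode

# Base address copied from the examples above (the service is discontinued).
DISTILLER = "http://www.w3.org/2012/pyRdfa/extract"


def distiller_url(uri, **options):
    """Build a distiller request URL; options that are left out keep their defaults."""
    params = dict(options)
    params["uri"] = uri
    # safe=":/," leaves the target URI and comma-separated values unescaped,
    # matching the style of the examples; other characters are percent-encoded.
    return DISTILLER + "?" + urlencode(params, safe=":/,")
```

For instance, `distiller_url("http://www.example.com/rdfa.html", format="xml")` produces the RDF/XML example URL given above.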
The distiller adds error, warning, or information triples to the processor graph. Some of these are defined by the RDFa Core document; additional messages are generated by the distiller itself. The latter category includes, e.g., HTTP 404 errors; these are reported using the same error structure as the ones defined by the standard.
The underlying software, pyRdfa, is implemented as a Python package and is available for download from GitHub. The package is based on the standard Python 2.x.y distribution, where 'x' should be 5 or higher. (It has been tested on version 2.7.2, which is the highest, and probably the last, stable release in Python 2.x; if possible, use that one.) The module does not (yet) run on the Python 3.x family. The documentation of the package can be consulted online (and is also part of the distribution).
The core package relies on the RDFLib package. It has been tested with RDFLib 3.1.0, but it also runs with the RDFLib 2.x versions. RDFLib 3.x is preferred: its serialization modules are superior in quality. (Note, however, that the JSON serialization does not run on RDFLib 2.x versions!) The Python HTML5 parser is used to process HTML5. The general package also relies on a slightly modified version of Deron Meranda’s httpheader module. Finally, for reasons that I do not really understand, in some cases the RDFLib distribution generates an import error on a module called isodate, which then has to be installed manually. (The HTML5 parser, the httpheader, and the isodate modules are included in the distribution to make installation easier.)
For the JSON-LD serialization, two more external packages are used: Armin Ronacher’s Ordered Dictionary (odict) package, as well as Bob Ippolito’s simplejson package. odict is needed unless Python 2.7.x is used (an ordered dictionary module has been added to the standard distribution of Python 2.7.x); simplejson is needed for Python 2.5 (json has been added to the standard Python 2.6.x distribution).
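The version-dependent choice between these modules is usually expressed with guarded imports. The snippet below is a sketch of that pattern, not code taken from pyRdfa; the helper `ordered_dump` is hypothetical, and the assumption that the odict package exposes a class named `odict` should be checked against the copy shipped with the distribution.

```python
# JSON support: the standard library provides `json` from Python 2.6 on;
# simplejson serves as the fallback on Python 2.5.
try:
    import json
except ImportError:
    import simplejson as json

# Ordered dictionary: part of `collections` from Python 2.7 on.
try:
    from collections import OrderedDict
except ImportError:
    # Assumption: the odict package exposes a class named `odict`.
    from odict import odict as OrderedDict


def ordered_dump(pairs):
    """Serialize (key, value) pairs as JSON, keeping the given key order."""
    return json.dumps(OrderedDict(pairs))
```

On any interpreter where the standard modules exist, the fallback branches are simply never taken.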
To install the package, download the distribution file from GitHub and either move the pyRdfa directory to your PYTHONPATH or modify your PYTHONPATH to include that directory. Alternatively, you can use the standard 'setup.py' script. The odict and httpheader modules (each consisting of a single Python file) have been added to the pyRdfa package under ‘extras’; you do not have to do anything special to install these. The HTML5 parser must be installed independently; to make this step easier, the compressed tar file has been added to the pyRdfa distribution file. The same is true for the simplejson package although, if you run Python 2.6.x or higher, that module can be ignored.
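As an alternative to editing PYTHONPATH in the shell, the module search path can be extended from within a script. This is a generic Python sketch; the helper name `add_package_dir` is an assumption of the example, and the directory passed in is presumably the one that contains the unpacked pyRdfa package directory.

```python
import os
import sys


def add_package_dir(directory):
    """Put `directory` at the front of the module search path, at most once."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory
```

After a call such as `add_package_dir("/hypothetical/path/to/download")`, a subsequent `import pyRdfa` would be resolved from that location, provided the package and its dependencies are actually there.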
This software is available for use under the W3C® SOFTWARE NOTICE AND LICENSE