Part 3
Publishing Triples on the Web
- The Mechanics of Publication
- Various Platforms
- Data Changes
- Catalogs
- The Politics of Publication
- Aligning Governance
- Continuity Policies
- Maintaining Provenance
Patterns for Publishing
- Harvestable RDF
- RDFa embedded in web pages
- XHTML or XML and GRDDL
- provide XSLT stylesheet that translates XML to RDF/XML
- RDF formats generated from underlying database
- Queryable RDF
- store RDF within triplestore
- provide SPARQL endpoint
- layer user-friendly APIs on top of endpoint
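As a minimal sketch of what "provide SPARQL endpoint" means for a client: the SPARQL Protocol lets a query be sent as an HTTP GET with the query text in a "query" parameter. The endpoint URL and the School class below are hypothetical placeholders, not part of any real service.

```python
from urllib.parse import urlencode

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "http://example.org/sparql"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# List ten resources typed as a (hypothetical) School class.
QUERY = "SELECT ?school WHERE { ?school a <http://example.org/def/School> } LIMIT 10"

url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

A friendlier API layered on top would hide this URL and the query text from its users.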
Mechanics of Publication
What do you need to publish your triples?
- Hardware
- Software
- Expertise
?
Platforms
- Static Documents
- Web Platforms
- SQL-Based
- Triplestores
- Custom Code
Static Documents
http://www.w3.org/TR/swbp-vocab-pub/
Generate by hand, or output from existing systems.
Web Platforms
Drupal 6, Drupal 7
Semantic MediaWiki
Some RDFa is easy.
SQL-Based
D2R Server
Maybe built into MySQL, Oracle, ...
RDB2RDF Working Group
Triplestores
http://www.w3.org/2001/sw/wiki/Category:Triple_Store
Custom Code
Jena, rdflib, Redland, SWI-Prolog (swipl)
http://www.w3.org/2001/sw/wiki/Category:Programming_Environment
Linked Data API
- Easy-to-use APIs built on linked data
- queryable through URI parameters
- return simple JSON or XML
For example:
- /doc/school => list of schools
- /doc/school?_page=2 => second page of schools
- /doc/school?constituency.code=142 => list of schools in Dulwich and West Norwood
- /doc/school/constituency/142 => list of schools in Dulwich and West Norwood
- /doc/school/constituency/142?min-highAge=7&max-lowAge=7 => list of schools in Dulwich and West Norwood that accept seven year olds
Note:
- Easy to implement (existing implementations in PHP, Java)
- API 'meta' tells you the SPARQL generated
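To illustrate how URI parameters like those above could map onto SPARQL, here is a simplified sketch. It is not the Linked Data API's actual algorithm: the vocabulary namespace, the page size, the fixed School class, and the treatment of every filter value as an integer are all assumptions made for the example.

```python
from urllib.parse import urlparse, parse_qsl

VOCAB = "http://example.org/def/"  # hypothetical vocabulary namespace
PAGE_SIZE = 10                     # assumed number of results per page


def to_sparql(url: str) -> str:
    """Translate a list URL with filter parameters into a SPARQL query string."""
    patterns = [f"?item a <{VOCAB}School> ."]
    filters = []
    page = 1
    counter = 0
    for name, value in parse_qsl(urlparse(url).query):
        if name == "_page":
            page = int(value)
            continue
        # min-/max- prefixes become range comparisons; anything else is equality.
        op = "="
        if name.startswith("min-"):
            name, op = name[4:], ">="
        elif name.startswith("max-"):
            name, op = name[4:], "<="
        # A dotted name is read as a property path, one triple pattern per hop.
        subject = "?item"
        for prop in name.split("."):
            counter += 1
            var = f"?v{counter}"
            patterns.append(f"{subject} <{VOCAB}{prop}> {var} .")
            subject = var
        # Assumption: values are integers; int() raises ValueError otherwise.
        filters.append(f"FILTER ({subject} {op} {int(value)})")
    offset = (page - 1) * PAGE_SIZE
    body = "\n".join("  " + line for line in patterns + filters)
    return f"SELECT ?item WHERE {{\n{body}\n}}\nLIMIT {PAGE_SIZE} OFFSET {offset}"


print(to_sparql("/doc/school?constituency.code=142&_page=2"))
```

Returning the generated query alongside the results is one way an API could expose the kind of 'meta' information mentioned in the note.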
See project slides
Data Changes
This is an API. Every change affects someone.
Design for change.
The World Changes
A set of triples should be true for some time range
Suggestion: use dc:temporal to declare that time range.
One URL for archival copy:
- schools_2010_01
- schools_2010_02
- schools_2010_03
- ...
Another URL for "latest":
- schools_latest
- which will be the same as schools_2010_05 for a few more days
This is good practice for many kinds of web pages.
Link among the versions.
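The archive/latest pattern can be sketched as a small generator of per-month metadata. This is only an illustration: the base URL is hypothetical, the time range is written with Dublin Core Terms (dcterms:temporal, using the DCMI Period encoding), and the choice of dcterms:isVersionOf and dcterms:replaces to link versions is one option among several.

```python
import calendar
from datetime import date
from typing import Optional

BASE = "http://example.org/data/"  # hypothetical base URL
LATEST = BASE + "schools_latest"
PREFIX = "@prefix dcterms: <http://purl.org/dc/terms/> .\n"


def archive_name(month: date) -> str:
    """Name of the archival copy for a month, e.g. schools_2010_05."""
    return f"schools_{month.year}_{month.month:02d}"


def describe_version(month: date, previous: Optional[date] = None) -> str:
    """Return Turtle stating the covered time range and links to other versions."""
    last_day = calendar.monthrange(month.year, month.month)[1]
    start = date(month.year, month.month, 1).isoformat()
    end = date(month.year, month.month, last_day).isoformat()
    statements = [
        f'dcterms:temporal "start={start}; end={end};"^^dcterms:Period',
        f"dcterms:isVersionOf <{LATEST}>",
    ]
    if previous is not None:
        # Link back to the archival copy this one supersedes.
        statements.append(f"dcterms:replaces <{BASE}{archive_name(previous)}>")
    subject = f"<{BASE}{archive_name(month)}>"
    return PREFIX + subject + "\n    " + " ;\n    ".join(statements) + " ."


print(describe_version(date(2010, 5, 1), previous=date(2010, 4, 1)))
```

The "latest" URL itself would serve (or redirect to) whichever archival copy is current.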
Corrections
Similar archive/latest mechanism, but different reasons.
Example: "restated financial statements" for some time period.
Metadata can indicate what changed and why.
Push and Pull Feeds
Dataset Dynamics
- enable efficient local mirroring
- news of changes
Catalogs
dcat Data Catalog Vocabulary
- metadata to catalog
- metadata from catalogs
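A catalog record in dcat might look roughly like the Turtle produced below. This is a sketch under assumptions: the catalog, dataset, and download URLs are made up, the W3C dcat namespace is used, and the title is assumed to contain no quote characters.

```python
def catalog_entry(catalog: str, dataset: str, title: str, download_url: str) -> str:
    """Return Turtle listing one dataset, with one distribution, in a catalog."""
    return "\n".join([
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{catalog}> a dcat:Catalog ;",
        f"    dcat:dataset <{dataset}> .",
        "",
        f"<{dataset}> a dcat:Dataset ;",
        # Assumption: the title needs no escaping.
        f'    dcterms:title "{title}" ;',
        "    dcat:distribution [",
        "        a dcat:Distribution ;",
        f"        dcat:accessURL <{download_url}>",
        "    ] .",
    ])


# All three URLs are hypothetical.
entry = catalog_entry(
    "http://example.org/catalog",
    "http://example.org/data/schools",
    "Schools",
    "http://example.org/data/schools_latest",
)
print(entry)
```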
Politics of Publication
Tim Berners-Lee's five stars:
- Publish the data on the Web in any format (eg .pdf)
- Publish in a machine-readable format (eg .xls)
- Publish in a non-proprietary format (eg .csv)
- Publish as RDF Linked Data (eg .rdf)
- Establish useful links between resources
Maybe you're already at 2 or 3 stars.
Jumping straight to 5 might be easiest.
Aligning Governance
- Government data is usually created and governed by someone
- Try to use existing governance structures for Linked Data publishing
- Operates at different levels
- Who can have a .gov domain?
- How to mint URIs?
- Who should mint URIs?
- Which URIs should I use?
- What URIs are promoted for wider use within government?
Continuity Policies
Who will serve the URI if the agency changes names?
Who will serve the URI if the agency is shut down?
Redirections vs Content
Role of Archives Organizations
Maintaining Provenance
- Important for government data and a key part of responsible publishing
- Helps data consumers know what they are dealing with
- Operates at different levels
- Organisational level - who made this data, how and when?
- File level - what processing was done to make this file, when?
- Can be done simply (eg Dublin Core Terms) or with more sophistication (eg using OPMV specialisations)
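The simple end of that range can be sketched with a few Dublin Core Terms statements written out as N-Triples. The file URL, agency URL, source URL, and date below are invented for illustration; dcterms:creator, dcterms:created, and dcterms:source are the only vocabulary terms used.

```python
DCT = "http://purl.org/dc/terms/"


def ntriple(subject: str, predicate: str, obj: str, literal: bool = False) -> str:
    """Serialize one statement as an N-Triples line (URIs or plain literals only)."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        return f'<{subject}> <{predicate}> "{escaped}" .'
    return f"<{subject}> <{predicate}> <{obj}> ."


# Hypothetical file and the facts recorded about it.
FILE = "http://example.org/data/schools_2010_05"
provenance = [
    # Organisational level: who made this data, and when?
    ntriple(FILE, DCT + "creator", "http://example.org/agency/education"),
    ntriple(FILE, DCT + "created", "2010-05-01", literal=True),
    # File level: which source was this file derived from?
    ntriple(FILE, DCT + "source", "http://example.org/raw/schools.csv"),
]
print("\n".join(provenance))
```

Describing the processing steps themselves would need a richer vocabulary, such as the OPMV specialisations mentioned above.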
Next Steps
Local Semantic Web Meetups
Participate in W3C eGov Interest Group
Email sandro@w3.org subject "tutorial"