Mission Possible: Deploying Government Linked Data (Pt3)

Sandro Hawke, (sandro@w3.org), W3C/MIT, @sandhawke
John L. Sheridan, @johnlsheridan
gov 2.0 expo, May 25-26, 2010, Washington DC
http://www.w3.org/2010/Talks/0525-publishing-triples (wiki)

Part 3

Publishing Triples on the Web

The Mechanics of Publication
- Various Platforms
- Data Changes
- Catalogs
The Politics of Publication
- Aligning Governance
- Continuity Policies
- Maintaining Provenance

Patterns for Publishing

Harvestable RDF
- RDFa embedded in web pages
- XHTML or XML and GRDDL
  - provide XSLT stylesheet that translates XML to RDF/XML
- RDF formats generated from underlying database
Queryable RDF
- store RDF within triplestore
- provide SPARQL endpoint
- layer user-friendly APIs on top of endpoint

Mechanics of Publication

What do you need to publish your triples?

Hardware
Software
Expertise

?

Platforms

Static Documents
Web Platforms
SQL-Based
Triplestores
Custom Servers

Static Documents

http://www.w3.org/TR/swbp-vocab-pub/

Generate by hand, or output from existing systems.
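
As a rough illustration of "output from existing systems", the sketch below turns a CSV export into N-Triples that could be served as a static file. The base URI, the column names (id, name) and the choice of rdfs:label are assumptions made for the example, not part of the tutorial.

```python
import csv
import io

# Hypothetical URI space for the example; substitute your own.
BASE = "http://example.gov/id/school/"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape the characters that N-Triples requires inside a quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def rows_to_ntriples(csv_text):
    """Emit one label triple per CSV row; expects 'id' and 'name' columns."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + BASE + row["id"] + ">"
        label = '"' + escape_literal(row["name"]) + '"'
        lines.append(f"{subject} <{LABEL}> {label} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(rows_to_ntriples("id,name\n101,Dulwich Village School\n"), end="")
```

The output can be written to a file and served by any web server with the appropriate content type.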

Web Platforms

Drupal 6, Drupal 7

Semantic MediaWiki

Some RDFa is easy.

SQL-Based

D2R Server

Maybe built into MySQL, Oracle, ...

RDB2RDF Working Group

Triplestores

http://www.w3.org/2001/sw/wiki/Category:Triple_Store

Custom Servers

Jena (Java), rdflib (Python), Redland (C), SWI-Prolog

http://www.w3.org/2001/sw/wiki/Category:Programming_Environment

Linked Data API

Easy-to-use APIs built on linked data
- queryable through URI parameters
- return simple JSON or XML

For example:

/doc/school => list of schools
/doc/school?_page=2 => second page of schools
/doc/school?constituency.code=142 => list of schools in Dulwich and West Norwood
/doc/school/constituency/142 => list of schools in Dulwich and West Norwood
/doc/school/constituency/142?min-highAge=7&max-lowAge=7 => list of schools in Dulwich and West Norwood that accept seven year olds

Note:

Easy to implement (existing implementations in PHP, Java)
API 'meta' tells you the SPARQL generated
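
To give a feel for how URI parameters can map onto SPARQL, here is a minimal sketch of a translator for the query-string style shown above. It is not the Linked Data API implementation: the vocabulary namespace, the page size, the 1-based `_page` and the exact query shape are all assumptions for illustration, and the path style (/doc/school/constituency/142) is not handled.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical vocabulary namespace for the example.
VOCAB = "http://example.gov/def/school/"
NAME = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")


def api_url_to_sparql(url, page_size=10):
    """Translate e.g. /doc/school?constituency.code=142 into a SPARQL query."""
    parts = urlsplit(url)
    type_name = parts.path.rstrip("/").rsplit("/", 1)[-1].capitalize()
    patterns = [f"?item a <{VOCAB}{type_name}> ."]
    page = 1
    counter = 0
    for key, value in parse_qsl(parts.query):
        if key == "_page":
            page = int(value)
            continue
        op = "="
        if key.startswith("min-"):
            key, op = key[4:], ">="      # min-x=7 means x >= 7
        elif key.startswith("max-"):
            key, op = key[4:], "<="      # max-x=7 means x <= 7
        subject = "?item"
        for name in key.split("."):      # a.b becomes a chain of two triples
            if not NAME.match(name):
                raise ValueError(f"unsupported property name: {name!r}")
            counter += 1
            obj = f"?v{counter}"
            patterns.append(f"{subject} <{VOCAB}{name}> {obj} .")
            subject = obj
        if value.isdigit():
            term = value
        else:
            term = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        patterns.append(f"FILTER ({subject} {op} {term})")
    offset = (page - 1) * page_size
    body = "\n  ".join(patterns)
    return (f"SELECT DISTINCT ?item WHERE {{\n  {body}\n}}\n"
            f"ORDER BY ?item LIMIT {page_size} OFFSET {offset}")


if __name__ == "__main__":
    print(api_url_to_sparql("/doc/school?min-highAge=7&max-lowAge=7"))
```

Returning the generated query alongside the results, as the API 'meta' does, lets users graduate from the simple API to the SPARQL endpoint.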

See project slides

Data Changes

This is an API. Every change affects someone.

Design for change.

The World Changes

A set of triples should be true for some time range

Suggestion: use dc:temporal to declare that time range.
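
One possible way to write that declaration is sketched below. It assumes the property is taken from the DCMI Terms namespace (dcterms:temporal) and that the range is written with the DCMI Period encoding; the dataset URI is hypothetical.

```python
import calendar
from datetime import date


def month_period(year, month):
    """Return the first and last day of a calendar month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def temporal_turtle(dataset_uri, start, end):
    """Build Turtle stating the time range a set of triples is meant to cover."""
    period = f"start={start.isoformat()}; end={end.isoformat()};"
    return (
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
        f'<{dataset_uri}> dcterms:temporal "{period}"^^dcterms:Period .\n'
    )


if __name__ == "__main__":
    start, end = month_period(2010, 5)
    # Hypothetical dataset URI.
    print(temporal_turtle("http://example.gov/data/schools_2010_05", start, end), end="")
```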

One URL for archival copy:

schools_2010_01
schools_2010_02
schools_2010_03
...

Another URL for "latest":

schools_latest
- which will be the same as schools_2010_05 for a few more days

This is good practice for many kinds of web pages.

Link among the versions.
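
A small sketch of the naming and linking scheme follows. The base URL and the choice of dcterms:replaces and dcterms:isVersionOf as the linking properties are assumptions; other vocabularies would work as well.

```python
DCTERMS = "http://purl.org/dc/terms/"
BASE = "http://example.gov/data/"   # hypothetical base URL


def snapshot_name(year, month, stem="schools"):
    """Name of the archival copy for one month, e.g. schools_2010_05."""
    return f"{stem}_{year:04d}_{month:02d}"


def previous_month(year, month):
    """Step back one calendar month, wrapping January to December."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def version_links(year, month, stem="schools"):
    """Triples linking a snapshot to its predecessor and to the 'latest' URL."""
    this = BASE + snapshot_name(year, month, stem)
    prev_year, prev_month = previous_month(year, month)
    prev = BASE + snapshot_name(prev_year, prev_month, stem)
    latest = BASE + stem + "_latest"
    return [
        (this, DCTERMS + "replaces", prev),
        (this, DCTERMS + "isVersionOf", latest),
    ]


if __name__ == "__main__":
    for s, p, o in version_links(2010, 5):
        print(f"<{s}> <{p}> <{o}> .")
```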

Corrections

Similar archive/latest mechanism, but different reasons.

"restated financial statements" for some time period.

Metadata can indicate what changed and why.

Push and Pull Feeds

Dataset Dynamics

enable efficient local mirroring
news of changes

Catalogs

dcat Data Catalog Vocabulary

- metadata to catalog
- metadata from catalogs
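
As a sketch of "metadata to catalog", the function below writes a minimal dcat description of one dataset. It uses the W3C dcat namespace as later standardized, which may differ from the draft namespace in use at the time of this talk; all URIs passed in are hypothetical.

```python
def catalog_entry(catalog_uri, dataset_uri, title, download_url):
    """Build Turtle that lists one dataset, with one distribution, in a catalog."""
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{catalog_uri}> a dcat:Catalog ;",
        f"    dcat:dataset <{dataset_uri}> .",
        "",
        f"<{dataset_uri}> a dcat:Dataset ;",
        f'    dcterms:title "{safe_title}" ;',
        "    dcat:distribution [",
        "        a dcat:Distribution ;",
        f"        dcat:accessURL <{download_url}>",
        "    ] .",
        "",
    ])


if __name__ == "__main__":
    print(catalog_entry(
        "http://example.gov/catalog",                  # hypothetical
        "http://example.gov/data/schools",             # hypothetical
        "Schools",
        "http://example.gov/data/schools_latest",      # hypothetical
    ), end="")
```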

Politics of Publication

Tim Berners-Lee's five stars:

1. Publish the data on the Web in any format (eg .pdf)
2. Publish in a machine-readable format (eg .xls)
3. Publish in a non-proprietary format (eg .csv)
4. Publish as RDF Linked Data (eg .rdf)
5. Establish useful links between resources

Maybe you're already at 2 or 3.

Jumping in at 5 might be easiest.

Aligning Governance

Government data is usually created and governed by someone
Try to use existing governance structures for Linked Data publishing
Operates at different levels
- Who can have a .gov domain?
- How to mint URIs?
- Who should mint URIs?
- Which URIs should I use?
- What URIs are promoted for wider use within government?

Continuity Policies

Who will serve the URI if the agency changes names?

Who will serve the URI if the agency is shut down?

Redirections vs Content

Role of Archives Organizations

Maintaining Provenance

Important for government data and a key part of responsible publishing
Helps data consumers know what they are dealing with
Operates at different levels
- Organisational level - who made this data, how and when?
- File level - what processing was done to make this file, when?
Can be done simply (eg Dublin Core Terms) or with more sophistication (eg using OPMV specialisations)
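
The simple end of that range might look like the sketch below, which records who made a file, what it was derived from, and when, using Dublin Core Terms. The particular properties chosen (creator, source, created) and all URIs are assumptions for illustration; OPMV would allow a richer account of the processing steps.

```python
from datetime import date

DCTERMS = "http://purl.org/dc/terms/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def provenance_triples(file_uri, publisher_uri, source_uri, created):
    """Return N-Triples lines: who made the file, from what source, and when."""
    stamp = created.isoformat()
    return [
        f"<{file_uri}> <{DCTERMS}creator> <{publisher_uri}> .",
        f"<{file_uri}> <{DCTERMS}source> <{source_uri}> .",
        f'<{file_uri}> <{DCTERMS}created> "{stamp}"^^<{XSD_DATE}> .',
    ]


if __name__ == "__main__":
    # All URIs below are hypothetical.
    for line in provenance_triples(
        "http://example.gov/data/schools_2010_05",
        "http://example.gov/id/agency/education",
        "http://example.gov/data/schools_2010_05.csv",
        date(2010, 5, 25),
    ):
        print(line)
```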

Next Steps

Local Semantic Web Meetups

Participate in W3C eGov Interest Group

Email sandro@w3.org subject "tutorial"