Why Graphs 6.1

From RDF Working Group Wiki


This page shows the Why Graphs use cases solved by Proposal 6.1.

1 Shared Web Crawler

Several systems want to use the data gathered by one RDF crawler. They don't need previous versions of the data.

1.1 Simple Solution

The crawler publishes its data as TriG where the graph label is the URL from which the data was fetched.

For example, Example Corporation might run a crawler that publishes its accumulated data at http://example.com/all. A GET of that location would return a document like this:

Original:

 <http://www.w3.org/People/Berners-Lee/card> {
   # ... the triples recently fetched from that URL
 }
 <http://www.dbpedia.org/resource/Tim_Berners-Lee> {
   # ... the triples recently fetched from that URL
 }

Per 6.1:

 { <http://www.w3.org/People/Berners-Lee/card> a rdf:GraphStateResource.
   <http://www.dbpedia.org/resource/Tim_Berners-Lee> a rdf:GraphStateResource. }
 <http://www.w3.org/People/Berners-Lee/card> {
   # ... the triples recently fetched from that URL
 }
 <http://www.dbpedia.org/resource/Tim_Berners-Lee> {
   # ... the triples recently fetched from that URL
 }

Note that declaring the locations to be GraphStateResources is required to tell clients unambiguously how the TriG document is to be understood. In a closed environment where clients already know this, these triples are not necessary.
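The publishing step can be sketched in a few lines of Python. This is only an illustration: the function name to_trig, the dictionary layout, and the example.org data are assumptions made here, and rdf:GraphStateResource is the class proposed in 6.1, not an existing term in the RDF namespace.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def to_trig(fetched, declare=True):
    """Serialize {url: [(s, p, o), ...]} as TriG, one named graph per URL.

    Terms are assumed to be already written in TriG syntax,
    e.g. "<http://example.org/x>" or '"a literal"'.
    """
    lines = []
    if declare and fetched:
        # Default-graph block declaring every graph label to be a
        # GraphStateResource (the Proposal 6.1 form). A closed
        # environment could pass declare=False and omit it.
        lines.append(f"@prefix rdf: <{RDF_NS}> .")
        lines.append("{")
        for url in fetched:
            lines.append(f"  <{url}> a rdf:GraphStateResource .")
        lines.append("}")
    for url, triples in fetched.items():
        # The graph label is the URL the triples were fetched from.
        lines.append(f"<{url}> {{")
        for s, p, o in triples:
            lines.append(f"  {s} {p} {o} .")
        lines.append("}")
    return "\n".join(lines)


# Hypothetical crawl result, for illustration only.
crawl = {
    "http://example.org/page": [
        ("<http://example.org/page#me>",
         "<http://xmlns.com/foaf/0.1/name>",
         '"Example Person"'),
    ],
}
print(to_trig(crawl))
```

Calling to_trig(crawl, declare=False) yields the "Original" form without the default-graph declarations.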

2 Archiving Web Crawler

Several systems want to use the data gathered by one RDF crawler. They want the crawler to keep previous versions of the data. This might be used for showing users when particular parts of the data changed or for providing a consistent view of the crawled data at some point in the past.

2.1 Simple Solution

Use TriG with the graph label being some new identifier created at the time the retrieval was done. Some other data, in the default graph, connects that identifier with the URL used to fetch the content.

Example:

Original:

<http://crawler.example.org/r8571> { ... triples fetched in retrieval 8571 }
<http://crawler.example.org/r8572> { ... triples fetched in retrieval 8572 }
{ 
   <http://crawler.example.org/r8571> eg:source <http://example.org> ;
                                      eg:date "2011-01-04T00:03:11"^^xs:dateTime .
   <http://crawler.example.org/r8572> eg:source <http://example.org> ;
                                      eg:date "2011-01-05T00:04:18"^^xs:dateTime .
}

Per 6.1:

_:r8571                      { ... triples fetched in retrieval 8571 }
<tag:eric@w3.org,2012:r8572> { ... triples fetched in retrieval 8572 }
{ 
   _:r8571                      a rdf:SnappedResource ;
                                eg:source <http://example.org> ;
                                eg:date "2011-01-04T00:03:11"^^xs:dateTime .
   <tag:eric@w3.org,2012:r8572> a rdf:SnappedResource ;
                                eg:source <http://example.org> ;
                                eg:date "2011-01-05T00:04:18"^^xs:dateTime .
}
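
The bookkeeping behind this use case (a fresh identifier per retrieval, plus source and date recorded alongside it) might look like the following sketch. The class RetrievalLog and its method names are hypothetical; the identifier scheme and dates simply mirror the example above.

```python
from datetime import datetime
from itertools import count


class RetrievalLog:
    """Keeps every retrieval as its own graph, labelled by a new identifier."""

    def __init__(self, base="http://crawler.example.org/r", first=8571):
        self._base = base
        self._numbers = count(first)
        self.graphs = {}    # label -> triples fetched in that retrieval
        self.metadata = {}  # label -> (source URL, fetch time)

    def record(self, source, triples, fetched_at):
        # Mint a new identifier at retrieval time; never reuse a label.
        label = f"{self._base}{next(self._numbers)}"
        self.graphs[label] = list(triples)
        self.metadata[label] = (source, fetched_at)
        return label

    def as_of(self, source, moment):
        """Label of the latest retrieval of `source` at or before `moment`."""
        candidates = [
            (fetched_at, label)
            for label, (src, fetched_at) in self.metadata.items()
            if src == source and fetched_at <= moment
        ]
        return max(candidates)[1] if candidates else None


log = RetrievalLog()
first = log.record("http://example.org", [], datetime(2011, 1, 4, 0, 3, 11))
second = log.record("http://example.org", [], datetime(2011, 1, 5, 0, 4, 18))
```

The as_of lookup is what would support "a consistent view of the crawled data at some point in the past": it selects, per source, the retrieval that was current at the requested time.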

3 Endorsement

A system wants to convey to another system in RDF that some person agrees with or disagrees with certain RDF triples.

3.1 Simple Solution

Use TriG with the graph label being an identifier for an RDF Graph (g-snap), so that it can be referred to in the default graph.

For example:

{ eg:sandro eg:endorses <http://a.example/g1> }
<http://a.example/g1> { ... the triples I'm endorsing ... }

endorse all

{ eg:sandro eg:endorses <http://a.example/g1> .
  <http://a.example/g1> a rdf:GraphStateResource }
<http://a.example/g1> { ... the triples I'm endorsing ... }

endorse part

{ eg:sandro eg:endorses _:s2 .
  _:s1 a rdf:SnappedResource ;
       eg:source <http://a.example/g1> ;
       rdf:hasSubgraph _:s2 }
_:s2 { ... the triples I'm endorsing ... }

4 Separation of Inference

People run forward-chaining inference rules on their RDF data, and they want to be able to keep the inferred triples separable from the given ones. This allows them, among other things, to delete and regenerate the inferred triples after the underlying data changes.

This is a simpler use case than keeping the full derivation information, but may be enough for these design purposes.

4.1 Simple Solution

TriG or N-Quads where bnodes are allowed to be shared.

For example:

Alice knows Dan Brickley, who sometimes likes to not have a URI:

 AlicezPage {
   eg:Alice foaf:knows _:u1 .
   _:u1 foaf:mbox_sha1sum "70c053d15de49ff03a1bcc374e4119b40798a66e" ;
        foaf:name "Dan Brickley" }

From the FOAF spec and namespace document:

 foaf: {
   foaf:mbox_sha1sum rdfs:domain foaf:Agent
 }

By RDFS semantics, we can infer:

 _:u1 rdf:type foaf:Agent .

Per 6.1:

 { _:u2 a rdf:PartialInference ;
        eg:regime rdfs: ;
        eg:sources ( AlicezPage: foaf: ) }
 _:u2 { eg:Alice a rdfs:Resource .
        _:u1 a foaf:Agent … }

Then a query for the foaf:name of every agent will show "Dan Brickley", whereas it would not without the inference. (The domain of foaf:name is owl:Thing.)

Now, how do we keep the inferred triple separate from the givens? In SPARQL, we could put the givens in one arbitrarily named graph and the conclusions in another. But if we dump and restore, we'll need to use a format that allows the bnode to be shared between those graphs.
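
A minimal sketch of this separation, assuming triples are held as plain tuples of strings and that only the RDFS domain rule is applied. The function name and the store layout are invented for illustration. The bnode label _:u1 is shared between the given and inferred sets, which is exactly what a dump format would have to preserve.

```python
def infer_domain_types(given, schema):
    """RDFS domain rule: (p rdfs:domain C) and (s p o) entail (s rdf:type C).

    Returns only the triples that are not already among the givens.
    """
    domains = {}
    for s, p, o in schema:
        if p == "rdfs:domain":
            domains.setdefault(s, set()).add(o)
    inferred = set()
    for s, p, o in given:
        for cls in domains.get(p, ()):
            triple = (s, "rdf:type", cls)
            if triple not in given:
                inferred.add(triple)
    return inferred


store = {
    "given": {
        ("eg:Alice", "foaf:knows", "_:u1"),
        ("_:u1", "foaf:mbox_sha1sum",
         '"70c053d15de49ff03a1bcc374e4119b40798a66e"'),
        ("_:u1", "foaf:name", '"Dan Brickley"'),
    },
    "schema": {("foaf:mbox_sha1sum", "rdfs:domain", "foaf:Agent")},
}

# Inferred triples live in their own graph, so they can be dropped and
# regenerated whenever the givens change.
store["inferred"] = infer_domain_types(store["given"], store["schema"])
```

After an edit to store["given"], rerunning the last line replaces the inferred graph wholesale, with no need to track which conclusions depended on which givens.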

5 SPARQL Backup and Restore

People running SPARQL systems want to be able to dump the contents of their database to a file and restore it later. The format needs to be standardized so that they can load the dump on a different vendor's SPARQL system, or give it to someone else to load on theirs.

5.1 Simple Solution

TriG, with no additional semantics:

{ eg:s eg:p eg:o . }
eg:g { eg:s eg:p eg:o. }

5.2 Other Designs

5.2.1 N-Quads, with no additional semantics

eg:s eg:p eg:o .
eg:s eg:p eg:o eg:g .
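
As a rough illustration of a quad-per-line dump and restore, the sketch below round-trips a small dataset. It is not a conforming N-Quads parser: it assumes full IRIs in angle brackets, simple quoted literals, and a space before the final dot, and the names dump and restore are made up here. Because blank node labels are shared across the whole file, a bnode used in two graphs survives the round trip.

```python
import re

# Matches an IRI, a blank node label, or a quoted literal with an optional
# datatype or language tag. Deliberately simplified.
TERM = re.compile(
    r'<[^>]*>|_:[^\s]+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'
)


def dump(dataset):
    """Write {graph label or None: set of (s, p, o)} as one quad per line."""
    lines = []
    for graph in sorted(dataset, key=lambda g: g or ""):
        for s, p, o in sorted(dataset[graph]):
            terms = [s, p, o] + ([graph] if graph else [])
            lines.append(" ".join(terms) + " .")
    return "\n".join(lines)


def restore(text):
    """Rebuild the dataset; three terms mean the default graph (key None)."""
    dataset = {}
    for line in text.splitlines():
        terms = TERM.findall(line)
        if len(terms) not in (3, 4):
            continue  # blank line or something this sketch does not handle
        graph = terms[3] if len(terms) == 4 else None
        dataset.setdefault(graph, set()).add(tuple(terms[:3]))
    return dataset


data = {
    None: {("<http://example.org/s>", "<http://example.org/p>", "_:b1")},
    "<http://example.org/g>": {
        ("_:b1", "<http://example.org/name>", '"Dan Brickley"'),
    },
}
assert restore(dump(data)) == data
```

Note that an empty named graph produces no lines and so is not restored; whether that matters depends on the system being backed up.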

5.2.2 SPARQL Update Subset

As people do with SQL, have the dump format be a sequence of SPARQL Update statements which reconstruct the database. That is:

INSERT DATA { eg:s eg:p eg:o. }
INSERT DATA { GRAPH eg:g { eg:s eg:p eg:o. } }
  • Does not preserve bNode relationships across graphs, or within the same graph when it is split across { } blocks.

5.3 Potential Complications

  1. Some SPARQL systems (e.g. 4store) maintain the default graph as a merge of the named graphs. Should the dump include every triple twice to show this? Or should there be an indicator that it is a dump of this kind of system? What happens if you try to restore a dump with content in the default graph into this kind of system?
  2. It seems important to keep the solution semantics-free, like the named graphs in SPARQL. Otherwise a SPARQL dump will be asserting things not asserted by the SPARQL database it conveys.
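
To make the first complication concrete, here is one possible policy, sketched under the assumption that a dataset is a dictionary from graph label (None for the default graph) to a set of triples. It is not a resolution of the open questions above. The helper names are hypothetical.

```python
def default_is_union(dataset):
    """True if the default graph equals the union of all named graphs."""
    union = set()
    for label, triples in dataset.items():
        if label is not None:
            union |= triples
    return dataset.get(None, set()) == union


def graphs_to_dump(dataset):
    """Skip the default graph when it only repeats the named graphs.

    A real dump would still need some indicator that this was done, so
    that the restoring system knows to rebuild the merge.
    """
    if default_is_union(dataset):
        return {g: t for g, t in dataset.items() if g is not None}
    return dict(dataset)


triple = ("eg:s", "eg:p", "eg:o")
merged_store = {None: {triple}, "eg:g": {triple}}
plain_store = {None: {triple}, "eg:g": set()}
```

With these inputs, graphs_to_dump(merged_store) drops the default graph, while graphs_to_dump(plain_store) keeps it because its content is not derivable from the named graphs.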