This document attempts to provide a minimal set of concepts around the vague term “Graph Identification” that may serve as the basis for consensus in the RDF Working Group. By concentrating on that minimal level and on the issues listed in this document, the RDF WG may be able to move past its current deadlock on this subject.
The goal is to rely on and reuse the corresponding notions in the SPARQL 1.1 specification, introducing new notions only when necessary and for completeness. In particular, this specification introduces the notion of RDF spaces: modifiable places to store RDF triples. Examples of RDF spaces include an HTML page with embedded RDFa or microdata, a file containing RDF/XML or Turtle data, and a SQL database viewable as RDF using R2RML. RDF spaces provide a mutable counterpart of SPARQL’s named graphs appearing in datasets. Figure 1 gives an overview of the relationships among the different concepts as described in this document.
There is no intended destination document for the material in this section; it is presented solely to facilitate discussion within the RDF Working Group. Document editors may pull from this material as they see fit.
The Resource Description Framework (RDF) provides a simple declarative way to store and transmit information. It also provides a trivial but effective way to combine information from multiple sources: graph merging. This allows information from different people, different organizations, different units within an organization, different servers, different algorithms, etc., to be combined and used together, without any special processing or understanding of the relationships among the providers.
For some applications, the basic RDF merge operation is overly simplistic, as extra processing and an understanding of the relationships among the providers may be useful. This document specifies a way to conveniently handle information coming from multiple sources, by modeling each one as a separate space, and using RDF to express information about these spaces. In addition to this important concept, we provide a pair of languages (extensions to existing RDF syntaxes) which can be used to store or transmit in one document the contents of multiple spaces as well as information about them.
The RDF WG recognises that many existing implementations include the notion of modifiable places to store RDF triples for eminently practical reasons. Implementations using SPARQL 1.1, the SPARQL Protocol, the Linked Data API, Linked Open Data and various evolving forms of Linked Data for enterprises have created names for mutable RDF graphs that are coincident with their operational URLs. The RDF WG is thus encouraged to discover a formalisation of graph identification concepts that aligns with implementation experience.
The intended destination document for this material is RDF 1.1 Concepts.
Figure 1 gives an overview of the relationships among the different concepts as described in this document.
The term "space" might change. The final terminology has not yet been selected by the Working Group. Other candidates include "g-box", "data space", "graph space", "(data) surface", "(data) layer", "sheet", and "(data) page". The contributors also note that the term “resource” was considered, and could be used but for possible ambiguities with other, partially overlapping, uses of that term. The term “RDF space” is intended to be synonymous with the term “g-box”, as defined by the RDF Working Group.
This document is only concerned with resources that have state, and does not take a particular stance on the question of what kinds of resources can have state. For more on this, see URI/Resource Relationships in AWWW.
An RDF space is anything that can reasonably be said to explicitly contain zero or more RDF triples and has an identity distinct from the triples it contains. Therefore, an RDF space is a mutable container, like a “set” data structure in programming. It may hold some RDF triples. Two spaces can happen to have the same contents (right now) while being distinct from each other. Spaces’ contents may change: today a particular space might contain the triples { my:a my:b _:x. my:a my:c _:x }, and tomorrow it might instead contain { my:a my:b _:x. my:a my:c2 _:x }.
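The distinction between a space and its contents can be illustrated with a minimal Python sketch. The class name `RDFSpace`, its methods, and the representation of triples as tuples of strings are assumptions made for illustration only; blank nodes are treated as plain labels, so graph isomorphism is ignored.

```python
# Hypothetical model: a triple is a (subject, predicate, object) tuple of
# strings, an RDF graph is an immutable frozenset of triples, and an RDF
# space is a mutable object whose current state is such a graph.

class RDFSpace:
    """A mutable container of triples with an identity of its own."""

    def __init__(self, triples=()):
        self._triples = set(triples)

    def add(self, triple):
        self._triples.add(triple)

    def remove(self, triple):
        self._triples.discard(triple)

    def state(self):
        """Return the current contents as an immutable graph value."""
        return frozenset(self._triples)


today = {("my:a", "my:b", "_:x"), ("my:a", "my:c", "_:x")}

space1 = RDFSpace(today)
space2 = RDFSpace(today)

# Same contents right now, yet two distinct spaces.
same_contents = space1.state() == space2.state()
same_space = space1 is space2

# Tomorrow: the contents of space1 change, its identity does not.
snapshot_before = space1.state()
space1.remove(("my:a", "my:c", "_:x"))
space1.add(("my:a", "my:c2", "_:x"))
snapshot_after = space1.state()
```

In this sketch the snapshots returned by `state()` play the role of RDF graphs: they are compared by value and never change, while the `RDFSpace` objects are compared by identity.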
The term “RDF space” is intended to be synonymous with the term “slot” used in SPARQL 1.1 Update (in place of the immutable RDF Graph currently used in that document) when used in the context of a SPARQL Graph Store and its contents. However, an RDF space is intended to be a more broadly applicable term to be used whenever referring to a mutable RDF container. The state of an RDF Space at any time is an RDF Graph.
Examples of an RDF space include but are not limited to the following:
...provided that the requirement for mutability is maintained. That is, none of the above examples would be a space if it only met the definition of an RDF Graph.
Examples of things that are not spaces:
A dataset is defined by SPARQL 1.1 as a structure consisting of:
This definition forms the basis of the SPARQL Query semantics; each query is performed against the information in a specific dataset.
Although the term is sometimes used more loosely, a dataset is a pure mathematical structure, like an RDF Graph or a set of integers, with no identity apart from its contents. Two datasets with the same contents are in fact the same dataset, and one dataset cannot change over time.
The word “default” in the term “default graph” refers to the fact that, in SPARQL, this is the graph a server uses to perform a query when the client does not specify which graph to use. The term is not related to the idea of a graph containing default (overridable) information. The role and purpose of the default graph in a dataset varies with application.
SPARQL formally defines a named graph to be any of the (name, graph) pairs in a dataset.
In practice, the term is often used to refer to the graph part of those pairs. This is the usage we follow in this document, saying that a graph is a named graph in some dataset if and only if it appears as the graph part of a (name, graph) pair in that dataset. Note that “named graph” is a relation, not a class: we say that something is a named graph of a dataset, not simply that it is a named graph.
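As a rough illustration of these two points (a dataset is a pure value, and “named graph” is a relation between a graph and a dataset), consider the following Python sketch. The `Dataset` type, the `is_named_graph_of` helper, and the tuple-of-strings triples are hypothetical names chosen here; the sketch assumes a dataset consists of a default graph plus (name, graph) pairs, and it does not enforce uniqueness of names.

```python
from typing import FrozenSet, NamedTuple, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


class Dataset(NamedTuple):
    """An immutable value: a default graph plus (name, graph) pairs."""
    default_graph: Graph
    named: FrozenSet[Tuple[str, Graph]]


def is_named_graph_of(graph: Graph, dataset: Dataset) -> bool:
    """True if `graph` is the graph part of some (name, graph) pair."""
    return any(g == graph for _name, g in dataset.named)


g1 = frozenset({("ex:s", "ex:p", "ex:o")})

# Two datasets built independently from the same contents.
ds_a = Dataset(frozenset(), frozenset({("ex:g1", g1)}))
ds_b = Dataset(
    frozenset(),
    frozenset({("ex:g1", frozenset({("ex:s", "ex:p", "ex:o")}))}),
)

# Equality is by contents only, so these are the same dataset.
datasets_equal = ds_a == ds_b
```

Because the relation takes both a graph and a dataset, the same graph value may be a named graph of one dataset and not of another.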
SPARQL 1.1 Update defines a mutable (time-dependent) structure corresponding to a dataset, called a Graph Store. It is defined as:
SPARQL's notion of a Graph Store is a “mutable container of RDF graphs managed by a single service” that can be manipulated through the SPARQL Update language and/or through the SPARQL HTTP Graph Store Protocol.
The definition in SPARQL 1.1 Update clearly refers to a mutable graph for a “slot”; in other words, a “slot” in this definition is actually an RDF space. The “distinguished slot” corresponds to the default graph of a dataset.
A dataset can be thought of as the state of a Graph Store, just like an RDF graph can be thought of as the state of an RDF space.
Note that the term “named graph” is also sometimes used to refer to the slot part of the (name, slot) pairs in a Graph Store. For example, the text of SPARQL 1.1 Update says, “This example copies triples from one named graph to another named graph”. For clarity, we avoid calling these “named graphs” (which refer to immutable content) and instead call them “named slots”, or RDF spaces, of the Graph Store.
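The relationship between a Graph Store and a dataset can be sketched in Python as follows. This is only an illustration under stated assumptions: the `GraphStore` class, its `insert` and `dataset` methods, and the representation of a dataset as a pair (default graph, set of (name, graph) pairs) are hypothetical and are not taken from SPARQL 1.1 Update.

```python
class GraphStore:
    """Hypothetical mutable store: one distinguished slot plus named slots."""

    def __init__(self):
        self.default_slot = set()   # the distinguished slot
        self.named_slots = {}       # name -> mutable set of triples

    def insert(self, triple, name=None):
        """Add a triple to the distinguished slot or to a named slot."""
        if name is None:
            slot = self.default_slot
        else:
            slot = self.named_slots.setdefault(name, set())
        slot.add(triple)

    def dataset(self):
        """Freeze the current state into an immutable dataset-like value."""
        return (
            frozenset(self.default_slot),
            frozenset((n, frozenset(s)) for n, s in self.named_slots.items()),
        )


store = GraphStore()
store.insert(("ex:s", "ex:p", "ex:o"), name="ex:g1")
before = store.dataset()

# The store changes over time; the earlier dataset value does not.
store.insert(("ex:s", "ex:p", "ex:o2"), name="ex:g1")
after = store.dataset()
```

Here `before` and `after` are two different datasets, each describing the state of the same Graph Store at a different moment.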
Figure 1 gives an overview of the relationships among the different concepts.
The intended destination document for this material is RDF 1.1 Semantics.
The interpretation of an RDF dataset is the interpretation of its default graph. The presence or absence of named graphs does not affect the truth of a dataset.
This semantics can also be referred to as “quoting” semantics, because an interpretation has no relevance to the triples inside the individual named graphs, only to the triples in the Default Graph. This quoting behavior is considered to be important; it avoids the “superman” effects that plagued RDF reification.
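The quoting behavior can be made concrete with a deliberately simplified Python sketch. It assumes ground triples only and models an interpretation as a plain set of triples taken to be true; the function name `dataset_is_true` and the example IRIs are hypothetical.

```python
def dataset_is_true(dataset, facts):
    """Truth of a dataset under quoting semantics (simplified).

    `dataset` is a pair (default_graph, named_pairs); `facts` is a
    hypothetical set of ground triples considered true. Only the
    default graph is consulted; named graphs are never inspected.
    """
    default_graph, _named_pairs = dataset
    return all(triple in facts for triple in default_graph)


facts = {("ex:clark", "ex:worksAt", "ex:dailyPlanet")}

default = frozenset({("ex:clark", "ex:worksAt", "ex:dailyPlanet")})
quoted = frozenset({("ex:superman", "ex:worksAt", "ex:moon")})

with_named = (default, frozenset({("ex:g1", quoted)}))
without_named = (default, frozenset())

# The triple in the named graph is not among the facts,
# yet it has no effect on the truth of the dataset.
truth_with_named = dataset_is_true(with_named, facts)
truth_without_named = dataset_is_true(without_named, facts)

# Moving the same triple into the default graph does change the result.
asserted = (default | quoted, frozenset())
truth_asserted = dataset_is_true(asserted, facts)
```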
A semantic issue related to datasets, and not reflected by the statement above, is whether a “name” can be a blank node or not. This is a decision to be taken by the Working Group.
This section needs revision by experts in formal semantics. It is
intended to express the same interpretation as the preceding section, but may
require more work to indeed do so. If no suitable mathematical formalism can be
used, or if the resulting formalism would become too complicated, the Working
Group may decide not to add anything more than the formal sentence above to the
RDF Semantics.
This section suggests an interpretation of RDF Datasets, as a possible extension to the various RDF and RDFS interpretations defined in the RDF Semantics document.
In this section the “equality” of graphs in a dataset means that they are
mutually inferable through simple entailment.
Let DS = (DG, (u₁, G₁), …, (uₙ, Gₙ)) be a dataset. The vocabulary for the dataset is defined as V(DS) = V(DG) ∪ {uᵢ : i = 1, …, n} ∪ rdfV, where V(DG) is the vocabulary set of DG, and rdfV is the RDF Vocabulary (as defined in the RDF Semantics document). The following conditions on V(DG) also hold:
Let I be an RDF interpretation on V(DS) for which the following conditions also hold:
then I is also an interpretation of the RDF Dataset. Replacing rdfV by the corresponding RDFS or OWL Vocabulary, the same definition automatically extends to these (in the case of OWL, that means the RDF Compatible Semantics of OWL).
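The vocabulary definition given for a dataset, V(DS) = V(DG) ∪ {graph names} ∪ rdfV, can be sketched in Python as follows. Several simplifications are assumed: terms are strings, blank nodes are recognized by a `_:` prefix and excluded from the vocabulary, and `RDF_V_EXCERPT` is a small finite stand-in for the RDF Vocabulary. All names in the sketch are hypothetical.

```python
# A finite excerpt standing in for rdfV (assumption for illustration).
RDF_V_EXCERPT = frozenset({"rdf:type", "rdf:Property", "rdf:nil"})


def graph_vocabulary(graph):
    """V(G): the names occurring in a graph, excluding blank nodes."""
    return frozenset(
        term for triple in graph for term in triple
        if not term.startswith("_:")
    )


def dataset_vocabulary(default_graph, named_pairs):
    """V(DS) = V(DG) united with the graph names and the RDF vocabulary."""
    names = frozenset(name for name, _graph in named_pairs)
    return graph_vocabulary(default_graph) | names | RDF_V_EXCERPT


dg = frozenset({("ex:a", "ex:p", "_:x")})
named = [("ex:g1", frozenset({("ex:hidden", "ex:q", "ex:o")}))]

vocab = dataset_vocabulary(dg, named)
```

Note that, following the formula, the terms used inside the named graphs do not contribute to V(DS); only the graph names do.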
There have been discussions in the group on (slightly) more complex semantics for datasets (see, e.g., one of the proposals). An earlier discussion occurred around a possible extension point that would give different communities and/or applications the possibility to define their own semantics. If the group finds a consensus on this (or a similar mechanism), then it could end up in the final documents; otherwise the group may stay silent on this.
A possible extension point for the Semantics is to assign types to graph names. By default, in case of a named graph pair (n,G), the additional
n rdf:type rdf:Graph .
triple also holds (this must be added to the semantic constraints of the interpretation function). Further classes can be defined by communities; for example, a community may define
ex:nonQuote rdfs:subClassOf rdf:Graph . n rdf:type ex:nonQuote .
which signals a reasoner that the content of G should be merged with the default graph for the purpose of graph interpretation and inference. Another example is
ex:GetSemantics rdfs:subClassOf rdf:Graph . n rdf:type ex:GetSemantics .
which signals the RDF environment that doing an HTTP GET operation on 'n' should result in a serialization of the graph 'G'.
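How a reasoner might act on the ex:nonQuote signal can be sketched in Python. This is a hypothetical illustration, not a proposed algorithm: it assumes the typing triple is found in the default graph, it treats merging as a plain set union (ignoring the blank node renaming of a true RDF merge), and the function name `effective_default_graph` is invented here.

```python
NON_QUOTE = "ex:nonQuote"


def effective_default_graph(default_graph, named_pairs):
    """Merge into the default graph every named graph typed ex:nonQuote.

    Graphs whose names carry no such type remain quoted and are ignored.
    """
    merged = set(default_graph)
    for name, graph in named_pairs:
        if (name, "rdf:type", NON_QUOTE) in default_graph:
            merged |= graph
    return frozenset(merged)


default = frozenset({
    ("ex:g1", "rdf:type", "ex:nonQuote"),
    ("ex:a", "ex:p", "ex:b"),
})
named = [
    ("ex:g1", frozenset({("ex:c", "ex:p", "ex:d")})),   # merged
    ("ex:g2", frozenset({("ex:e", "ex:p", "ex:f")})),   # stays quoted
]

merged = effective_default_graph(default, named)
```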
The RDF Working Group has not decided whether to define any of these additional classes. By default, the definition of these classes is intended to be left to communities.
The intended destination documents for this material are the individual syntax specification documents.
This section contains specifications of languages for serializing datasets. Dataset information may also be conveyed and manipulated using SPARQL or using RDF triple-based tools and languages.
Specification of TriG is possibly the subject of a separate
Recommendation or Note to be published by the Working Group.
The current TriG grammar, slightly reformulated to link to the current Turtle Grammar, is as follows:
[1g] trigDoc ::= statement*
[2g] statement ::= directive "." | namedGraph | wrappedDefault
[3g] namedGraph ::= iri "="? "{" triples "}" "."? | "{" "}" "."?
[4g] wrappedDefault ::= "{" triples "}" "."? | "{" "}" "."?
Where the grammar symbols directive, triples, and iri are defined in the Turtle Grammar.
Some notes on this grammar:
The grammar allows an optional “=” character between the name and the graph, and an optional “.” after the graph.
An issue with the current grammar is its incompatibility with the SPARQL grammar. Turtle has been brought together with SPARQL as a result of the WG resolution of ISSUE-1, and a similar argument can be made for the TriG grammar: ensure, as much as possible, compatibility with SPARQL. The corresponding alternative syntax may therefore be:
[1g] trigDoc ::= statement*
[2g] statement ::= directive "." | triples | namedGraph | wrappedDefault
[3g] namedGraph ::= "GRAPH"? iri "="? "{" triples "}" "."? | "{" "}" "."?
[4g] wrappedDefault ::= "{" triples "}" "."? | "{" "}" "."?
This syntax:
allows a GRAPH keyword preceding the graph name.
Note that the usage of “=” remains a possible source of incompatibility, but maintaining it ensures that deployed TriG content remains valid. (It is unclear how widely that particular idiom is used, i.e., how much deployed material would break if it were removed from the grammar.)
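The practical difference between the two grammars can be illustrated with a small, hypothetical Python serializer that writes a dataset either with or without the GRAPH keyword. The function name `serialize`, its `graph_keyword` flag, and the `urn:` IRIs are assumptions; joining several triples with “.” inside the braces is also an assumption borrowed from deployed TriG, since the grammars above delegate that detail to the Turtle `triples` production. The optional “=” and trailing “.” are never emitted.

```python
def serialize(default_graph, named_pairs, graph_keyword=False):
    """Write a dataset as TriG-like text (illustrative sketch only).

    Triples are tuples of already-serialized terms such as "<urn:a>".
    """
    def block(graph):
        body = " . ".join(" ".join(triple) for triple in sorted(graph))
        return "{ " + body + " }" if body else "{ }"

    lines = []
    if default_graph:
        # Default graph wrapped in braces (the wrappedDefault production).
        lines.append(block(default_graph))
    for name, graph in sorted(named_pairs, key=lambda pair: pair[0]):
        prefix = "GRAPH " if graph_keyword else ""
        lines.append(prefix + name + " " + block(graph))
    return "\n".join(lines)


default = frozenset({("<urn:a>", "<urn:p>", "<urn:b>")})
named = [("<urn:g1>", frozenset({("<urn:s>", "<urn:p>", "<urn:o>")}))]

current_style = serialize(default, named)
sparql_style = serialize(default, named, graph_keyword=True)
```

With the example data, the two outputs differ only in the GRAPH keyword placed before `<urn:g1>`.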
The Working Group has to decide whether the SPARQL-compatible syntax should be chosen over the current TriG syntax, and whether the usage of the "=" character should remain if the SPARQL-compatible syntax is chosen.
The current syntax allows for an empty graph to be expressed in TriG. That detail has to be reinforced or invalidated by a WG resolution.
Should we call this something other than TriG, since it’s a bit different? Also, to avoid confusion, it may be useful to refer to this language explicitly as an extension to Turtle. Qurtle? Mugr (multi-graph-rdf)? Turtle2? Turtle Full?
Are blank node labels scoped to the document, the curly-brace expression, or the graph name? Assuming document-scope for now. This is Issue-21.
If TriG is to be published as a document by the RDF Working Group, the Working Group should register a media type for TriG that is different from the media type of Turtle.
Several possible extensions to the TriG syntax were considered, but rejected because they would break compatibility with both SPARQL 1.1 and deployed TriG content. Some of these are:
g1 { ... }; :lastModified ....
JSON-LD already has a syntax for datasets; this section is just a placeholder for further synchronization between the current JSON-LD terminology and the RDF Working Group's evolving notions.
There are no plans to extend the RDF/XML syntax to include named graphs.
This document takes no position on syntactical changes to N-Quads, or on whether N-Quads should be standardized separately or published as a WG Note. This has to be decided by the Working Group.