In an earlier article, I introduced the RDF and SQL datamodels. That article explored a simple addressbook to describe at a high-level the structure of SPARQL and SQL queries. This article compares more of the language features of SPARQL and SQL, comparing two representations of a database of space craft. The first is the Talis Space Database and the second is an SQL database derived from it. This article will cover:
The relational database used in the examples is described by this UML diagram:
Property selection is similar in SPARQL and SQL; both use the keyword SELECT
to introduce a list if "columns" to return to the querier.
I'm using "columns" to correspond to the information slots in SQL and SPARQL.
In SQL, these are named attributes in the tables listed in the later FROM clause.
In SPARQL, these are variables bound in the later WHERE clause.
For both SPARQL and SQL, the symbol *
calls for the return of all columns.
The trivial SQL way to investigate the properties of a record is to use SELECT * FROM someTable WHERE someAttribute=someValue
.
Let's use Apollo 7's launch for example:
SELECT * FROM Launch WHERE launchword="1968-089";
-- ask for all attributes -- from the records in theLaunch
table -- which have a particularlaunchword
.
id | date | launchword | site | vehicDesc |
---|---|---|---|---|
764 | 1968-10-11 | 1968-089 | 94 | Saturn 1B |
The analog in SPARQL is similar, though the launchword
above is embedded in a URL used for the subject of some triples:
PREFIX craft: <http://nasa.dataincubator.org/launch/> SELECT * { launch:1968-089 ?p ?o }
# Prefixes shorten the body of the query. # Return tbe predicate?p
and object?o
# for all triples with the given subject.
?p | ?o |
---|---|
<http://purl.org/net/schemas/space/launched> | "1968-10-11"^^<http://www.w3.org/2001/XMLSchema#date> |
<http://purl.org/net/schemas/space/launchsite> | <http://nasa.dataincubator.org/launchsite/capecanaveral> |
<http://purl.org/net/schemas/space/launchvehicle> | "Saturn 1B" |
<http://purl.org/net/schemas/space/spacecraft> | <http://nasa.dataincubator.org/spacecraft/1968-089A> |
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | <http://purl.org/net/schemas/space/Launch> |
The first thing we notice is that the axes are different; factoids come in triples in RDF and in n-ary units in relational databases. We see also that SPARQL gave us URLs where SQL gave us the integers used in foreign keys. These integers are more compact, and, if you know the schema and have access to this particular database, can be used to join against other tables to get the attributes of the refrenced items as well. By contrast, the Semantic Web identifiers reported by SPARQL are in a global information space. Authors and custodians of other data are invited to re-use the same identifiers, promoting connectivity between databases and greatly enhancing data integration.
The queries in the last example returned the attributes of a launch where, if we wanted to see some attributes of a launch, we'd have to look in the LaunchSite
table, which in turn references the PlaceName
and CountryName
tables.
SELECT Spacecraft.* FROM Spacecraft INNER JOIN Launch ON Launch.id=Spacecraft.launch INNER JOIN CountryName ON CountryName.site=Launch.site WHERE CountryName.label REGEXP "Spain";
-- All attributes of Spacecraft. -- Step fromSpacecraft
-- throughSpacecraft
.launch -- toLaunch
.id . --Launch
's foreign key toLaunchSite
-- =CountryName
's foreign key toLaunchSite
. -- Launched by Spain.
id | launch | mission | designator | name | agency | mass | homepage | craftDesc |
---|---|---|---|---|---|---|---|---|
4500 | 3923 | NULL | 1997-018A | Minisat 01 | Spain | 200 | <htt…8A> | Minis…es. |
As it turns out, the agency that oversaw the launch happened to be "Spain" as well, but that's certainly not a rule.
Also, because this database only has "mission" data for Apolly missions, the mission
attribute is NULL
.
For comparison, here's that same query in SPARQL:
PREFIX space: <http://purl.org/net/schemas/space/>
SELECT ?id ?p ?o {
?site space:country ?country
FILTER REGEX (?country, "Spain")
?launch space:launchsite ?site .
?id space:launch ?launch .
?id ?p ?o .
}
(This is this same as
PREFIX space: <http://purl.org/net/schemas/space/> SELECT ?id ?p ?o { ?id ?p ?o . ?id space:launch [ space:launchsite [ space:country ?country ] ] FILTER REGEX (?country, "Spain") }
# Similar to previous SPARQL query # but looking for a spacecraft.
, rewritten for speed.)
?p | ?o |
---|---|
<http://purl.org/dc/elements/1.1/description> | "Minis…es. " |
<http://purl.org/net/schemas/space/agency> | "Spain" |
<http://purl.org/net/schemas/space/alternateName> | "24779" |
<http://purl.org/net/schemas/space/discipline> | <http://nasa.dataincubator.org/discipline/engineering> |
<http://purl.org/net/schemas/space/discipline> | <http://nasa.dataincubator.org/discipline/astronomy> |
<http://purl.org/net/schemas/space/internationalDesignator> | "1997-018A" |
<http://purl.org/net/schemas/space/launch> | <http://nasa.dataincubator.org/launch/1997-018> |
<http://purl.org/net/schemas/space/mass> | "200.0" |
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | <http://purl.org/net/schemas/space/Spacecraft> |
<http://xmlns.com/foaf/0.1/homepage> | <http://nssdc.gsfc.nasa.gov/database/MasterCatalog?sc=1997-018A> |
<http://xmlns.com/foaf/0.1/name> | "Minisat 01" |
There are lot of attibutes to sort through. To make it easy, here they are lined up:
row identifier | discipline | discipline | designator | launch | mission | designator | name | agency | mass | homepage | craftDesc |
---|---|---|---|---|---|---|---|---|---|---|---|
4500 | 3923 | NULL | 1997-018A | Minisat 01 | Spain | 200 | <htt…8A> | Minis…es. | |||
<htt…1997-018A> | engineering | astronomy | "24779" | <htt…1997-018> | "1997-018A" | "Minisat 01" | "Spain" | 200 | <htt…8A> | Minis…es. |
We can see that where SQL returned a NULL
mission
, SPARQL didn't return any triple at all.
We also see that SPARQL gave us many more properties than did the SQL query.
That's because all of the many-to-many properties in SQL are normalized out into their own tables.
A more appropriate analog for the above SPARQL query is:
SELECT Spacecraft.*, disc.labelword FROM Spacecraft INNER JOIN Launch ON Launch.id=Spacecraft.launch INNER JOIN CountryName ON CountryName.site=Launch.site LEFT OUTER JOIN AltCraftName ON AltCraftName.spacecraft=Spacecraft.id LEFT OUTER JOIN ( SELECT SpacecraftDiscipline.spacecraft, Discipline.labelword FROM SpacecraftDiscipline INNER JOIN Discipline ON Discipline.id=SpacecraftDiscipline.discipline ) AS disc ON disc.spacecraft=Spacecraft.id WHERE CountryName.label REGEXP "Spain";
-- -- -- -- -- -- -- LEFT OUTER JOIN (described below) so the query -- succeeds even without correspondingAltCraftName
s. -- -- The ()s demark a subquery described below. -- Match against 0 or more disciplines -- by joining throughSpacecraftDiscipline
-- toDiscipline
. -- -- -- --
id | craftword | launch | mission | designator | name | agency | mass | homepage | craftDesc | labelword |
---|---|---|---|---|---|---|---|---|---|---|
4500 | 1997-018A | 3923 | NULL | 1997-018A | Minisat 01 | Spain | 200 | <ht…18A> | Min…es. | astronomy |
4500 | 1997-018A | 3923 | NULL | 1997-018A | Minisat 01 | Spain | 200 | <ht…18A> | Min…es. | engineering |
Constraints in SPARQL and SQL are somewhat similar.
In SPARQL, they expressed in a FILTER
clause.
We've seen an example of this in the previous section.
In SQL, constraints in the WHERE
and ON
clauses additionally capture the foreign keys connecting the tables.
Examining all of the constraints in the previous SQL query:
… ON Launch.id=Spacecraft.launch … ON CountryName.site=Launch.site … ON Discipline.id=SpacecraftDiscipline.discipline … ON disc.spacecraft=Spacecraft.id … ON AltCraftName.spacecraft=Spacecraft.id … WHERE CountryName.label REGEXP "Spain"
-- foreign key -- foreign key -- foreign key -- foreign key -- solution restriction criteria
A couple more minor differences between SQL and SPARQL constraints
&&
and ||
where SQL uses AND
and OR
.are illustrated in this query:
SELECT agency, name, mass FROM Spacecraft WHERE (agency="United States" AND mass>100000) OR (agency="U.S.S.R" AND mass>70000);
PREFIX space: <http://purl.org/net/schemas/space/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> SELECT ?agency ?name ?mass { ?craft space:agency ?agency ; space:mass ?mass ; foaf:name ?name FILTER ( (?agency="United States" && xsd:float(?mass)>100000) ||(?agency="U.S.S.R" && xsd:float(?mass)>70000)) }
?agency | ?name | ?mass |
---|---|---|
"U.S.S.R" | "Buran" | 73000.0 |
"United States" | "STS 35/Astro 1" | 102420.0 |
"United States" | "STS 95" | 103069.0 |
The last SPARQL query invoked a function, xsd:float(?mstr)
, in the SELECT clause and the FILTER clause.
This was required because the database has strings where it should have floats.
Most of the SPARQL 1.1 functions and operators are inspired by SQL.
SPARQL's functions are extensible directives, like xsd:float(X)
, and the operators built in features of the language, e.g. X || Y
or ?mass > 70000
.
While we're on the subject of functions, and hurling heavy things into space, let's compare some minor stats for the US and USSR.
Above, we stumbled across the largest vehicles the two superpowers launched.
SPARQL and SQL have functions like min
, max
, avg
, sum
and count
for this.
What are the heaviest payloads that each agency has launched?
SELECT agency, MAX(mass) AS m FROM Spacecraft GROUP BY agency ORDER BY m
PREFIX space: <http://purl.org/net/schemas/space/> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> SELECT ?agency (MAX(xsd:float(?mstr)) AS ?m) { ?craft space:agency ?agency ; space:mass ?mstr } GROUP BY ?agency ORDER BY ?m
?agency | ?m |
---|---|
"Denmark" | 3.0 |
"The Netherlands" | 6.5 |
…31 rows elided… | … |
"Thailand" | 6505.0 |
"Peoples Republic of China" | 7600.0 |
"International" | 11000.0 |
"Russia" | 19640.0 |
"Luxembourg" | 23000.0 |
"U.S.S.R" | 73000.0 |
"United States" | 103069.0 |
Apparently Luxembourg launched Luxembourg.
In both SPARQL and SQL, a UNION is equivalent to performing the queries on either side of the UNION and concatenating the results. The Largest US and USSR Spacecraft query we performed earlier can be re-expressed as a UNION to illustrate this point:
SELECT agency, name, mass FROM Spacecraft WHERE agency="United States" AND mass>100000 UNION SELECT agency, name, mass FROM Spacecraft WHERE agency="U.S.S.R" AND mass>70000;
Note the requirements for type consistency in UNIONs described previous article.
SPARQL results carry bindings of variables, insulating the user from errors in ordering SELECTed variables. They also carry the type in every returned term, so even if a variable is bound to terms of different type in different rows, the results include those types.
Below is a SPARQL equivalent to the SQL UNION:
PREFIX space: <http://purl.org/net/schemas/space/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> SELECT ?agency ?name ?mass { { ?craft space:agency ?agency ; space:mass ?mass ; foaf:name ?name FILTER ((?agency="United States" && xsd:float(?mass)>100000)) } UNION { ?craft space:agency ?agency ; space:mass ?mass ; foaf:name ?name FILTER ((?agency="U.S.S.R" && xsd:float(?mass)>70000)) } }
A common use of UNION
in SPARQL is to find all of the arcs connecting to an RDF node.
Find all of the arcs with <http://nasa.dataincubator.org/launch/1997-018>
as the subject or object:
SELECT * { { ?sIn ?pIn <http://nasa.dataincubator.org/launch/1997-018> } UNION { <http://nasa.dataincubator.org/launch/1997-018> ?pOut ?oOut } }
?sIn | ?pIn | ?pOut | ?oOut |
---|---|---|---|
<http://nasa.dataincubator.org/spacecraft/1997-018A> | <http://purl.org/net/schemas/space/launch> |
NULL
|
NULL
|
<http://nasa.dataincubator.org/spacecraft/1997-018B> | <http://purl.org/net/schemas/space/launch> |
NULL
|
NULL
|
NULL
|
NULL
|
<http://purl.org/net/schemas/space/launched> | "1997-04-21"^^<http://www.w3.org/2001/XMLSchema#date> |
NULL
|
NULL
|
<http://purl.org/net/schemas/space/launchsite> | <http://nasa.dataincubator.org/launchsite/canaryislands> |
NULL
|
NULL
|
<http://purl.org/net/schemas/space/launchvehicle> | "Pegasus XL" |
NULL
|
NULL
|
<http://purl.org/net/schemas/space/spacecraft> | <http://nasa.dataincubator.org/spacecraft/1997-018B> |
NULL
|
NULL
|
<http://purl.org/net/schemas/space/spacecraft> | <http://nasa.dataincubator.org/spacecraft/1997-018A> |
NULL
|
NULL
|
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | <http://purl.org/net/schemas/space/Launch> |
These sorts of queries are very common in getting to know a dataset.
In RDF, the graph of data is explicit while in SQL the connections within the data are implicitly defined via the foreign keys.
While in principle, SQL's INFORMATION_SCHEMA
provides users with a way to query for foreign keys, the implementation varies wildly across databases.
Even if it were consistent, it is syntactically impossible in SQL to derive an attribute name and use it in a selection or constraint.
For example, in order to get data analogous to the results of the SPARQL query, we'd need query the INFORMATION_SCHEMA
to compose the joins against referenced tables.
SELECT Spacecraft.designator AS spacecraft, Launch.launchword, Launch.date, LaunchSite.labelword AS launchsite, Launch.vehicDesc FROM Launch INNER JOIN Spacecraft ON Spacecraft.launch=Launch.id INNER JOIN LaunchSite ON LaunchSite.id=Launch.site WHERE Launch.launchword="1997-018";
The second SQL query for attributes of spacecraft launched from Spain, we saw an example of a subquery.
In SPARQL, subqueries are nested inside a pair of { }
s, i.e. { SELECT … { … } }
.
Subqueries are less common in SPARQL than in SQL as idioms
id |
---|
764 |
@@@
@@@
?p |
---|
<http://purl.org/dc/elements/1.1/description> |
Another difference is that in SQL, those table headings you see are like smoke; you cannot calculate them.
id |
---|
764 |
@@@
@@@
?p |
---|
<http://purl.org/dc/elements/1.1/description> |
$Revision: 1.5 $ of $Date: 2012/06/03 21:54:14 $ by $Author: eric $