Comparing SPARQL and SQL by Example

Property Selection

Property selection is similar in SPARQL and SQL; both use the keyword SELECT to introduce a list if "columns" to return to the querier. I'm using "columns" to correspond to the information slots in SQL and SPARQL. In SQL, these are named attributes in the tables listed in the later FROM clause. In SPARQL, these are variables bound in the later WHERE clause. For both SPARQL and SQL, the symbol * calls for the return of all columns.

All Properties of a Launch

The trivial SQL way to investigate the properties of a record is to use SELECT * FROM someTable WHERE someAttribute=someValue. Let's use Apollo 7's launch for example:

SELECT *
  FROM Launch
 WHERE launchword="1968-089";

-- ask for all attributes
-- from the records in the Launch table
-- which have a particular launchword.

id	date	launchword	site	vehicDesc
764	1968-10-11	1968-089	94	Saturn 1B

The analog in SPARQL is similar, though the launchword above is embedded in a URL used for the subject of some triples:

PREFIX craft: <http://nasa.dataincubator.org/launch/>
SELECT * {
  launch:1968-089 ?p ?o
}

# Prefixes shorten the body of the query.
# Return tbe predicate ?p and object ?o
#   for all triples with the given subject.

?p	?o
<http://purl.org/net/schemas/space/launched>	"1968-10-11"^^<http://www.w3.org/2001/XMLSchema#date>
<http://purl.org/net/schemas/space/launchsite>	<http://nasa.dataincubator.org/launchsite/capecanaveral>
<http://purl.org/net/schemas/space/launchvehicle>	"Saturn 1B"
<http://purl.org/net/schemas/space/spacecraft>	<http://nasa.dataincubator.org/spacecraft/1968-089A>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>	<http://purl.org/net/schemas/space/Launch>

The first thing we notice is that the axes are different; factoids come in triples in RDF and in n-ary units in relational databases. We see also that SPARQL gave us URLs where SQL gave us the integers used in foreign keys. These integers are more compact, and, if you know the schema and have access to this particular database, can be used to join against other tables to get the attributes of the refrenced items as well. By contrast, the Semantic Web identifiers reported by SPARQL are in a global information space. Authors and custodians of other data are invited to re-use the same identifiers, promoting connectivity between databases and greatly enhancing data integration.

SQL Joins

The queries in the last example returned the attributes of a launch where, if we wanted to see some attributes of a launch, we'd have to look in the LaunchSite table, which in turn references the PlaceName and CountryName tables.

Attributes of Spacecraft Launched from Spain

SELECT Spacecraft.*
  FROM Spacecraft
  INNER JOIN Launch
     ON Launch.id=Spacecraft.launch
  INNER JOIN CountryName
     ON CountryName.site=Launch.site
  WHERE CountryName.label REGEXP "Spain";

-- All attributes of Spacecraft.
-- Step from Spacecraft
--  through Spacecraft.launch
--  to Launch.id .
-- Launch's foreign key to LaunchSite
--  = CountryName's foreign key to LaunchSite .
-- Launched by Spain.

id	launch	mission	designator	name	agency	mass	homepage	craftDesc
4500	3923	`NULL`	1997-018A	Minisat 01	Spain	200	<htt…8A>	Minis…es.

As it turns out, the agency that oversaw the launch happened to be "Spain" as well, but that's certainly not a rule. Also, because this database only has "mission" data for Apolly missions, the mission attribute is NULL.

For comparison, here's that same query in SPARQL:

PREFIX space: <http://purl.org/net/schemas/space/>
SELECT ?id ?p ?o {
  ?site space:country ?country
  FILTER REGEX (?country, "Spain")
  ?launch space:launchsite ?site .
  ?id space:launch ?launch .
  ?id ?p ?o .
}

(This is this same as

PREFIX space: <http://purl.org/net/schemas/space/>
SELECT ?id ?p ?o {
  ?id ?p ?o .
  ?id space:launch [ space:launchsite [ space:country ?country ] ]
  FILTER REGEX (?country, "Spain")
}

# Similar to previous SPARQL query
# but looking for a spacecraft.

, rewritten for speed.)

?p	?o
<http://purl.org/dc/elements/1.1/description>	"Minis…es. "
<http://purl.org/net/schemas/space/agency>	"Spain"
<http://purl.org/net/schemas/space/alternateName>	"24779"
<http://purl.org/net/schemas/space/discipline>	<http://nasa.dataincubator.org/discipline/engineering>
<http://purl.org/net/schemas/space/discipline>	<http://nasa.dataincubator.org/discipline/astronomy>
<http://purl.org/net/schemas/space/internationalDesignator>	"1997-018A"
<http://purl.org/net/schemas/space/launch>	<http://nasa.dataincubator.org/launch/1997-018>
<http://purl.org/net/schemas/space/mass>	"200.0"
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>	<http://purl.org/net/schemas/space/Spacecraft>
<http://xmlns.com/foaf/0.1/homepage>	<http://nssdc.gsfc.nasa.gov/database/MasterCatalog?sc=1997-018A>
<http://xmlns.com/foaf/0.1/name>	"Minisat 01"

There are lot of attibutes to sort through. To make it easy, here they are lined up:

row identifier	discipline	discipline	designator	launch	mission	designator	name	agency	mass	homepage	craftDesc
4500				3923	`NULL`	1997-018A	Minisat 01	Spain	200	<htt…8A>	Minis…es.
<htt…1997-018A>	engineering	astronomy	"24779"	<htt…1997-018>		"1997-018A"	"Minisat 01"	"Spain"	200	<htt…8A>	Minis…es.

We can see that where SQL returned a NULL mission, SPARQL didn't return any triple at all. We also see that SPARQL gave us many more properties than did the SQL query. That's because all of the many-to-many properties in SQL are normalized out into their own tables. A more appropriate analog for the above SPARQL query is:

SELECT Spacecraft.*, disc.labelword
  FROM Spacecraft
  INNER JOIN Launch
     ON Launch.id=Spacecraft.launch
  INNER JOIN CountryName
     ON CountryName.site=Launch.site
   LEFT OUTER JOIN AltCraftName
     ON AltCraftName.spacecraft=Spacecraft.id
   LEFT OUTER JOIN
     (
      SELECT SpacecraftDiscipline.spacecraft,
             Discipline.labelword
        FROM SpacecraftDiscipline
       INNER JOIN Discipline
          ON Discipline.id=SpacecraftDiscipline.discipline
     ) AS disc ON disc.spacecraft=Spacecraft.id
  WHERE CountryName.label REGEXP "Spain";

-- 
-- 
-- 
-- 
-- 
-- 
-- LEFT OUTER JOIN (described below) so the query
-- succeeds even without corresponding AltCraftNames.
-- 
-- The ()s demark a subquery described below.
-- Match against 0 or more disciplines
--  by joining through SpacecraftDiscipline
--  to Discipline.
-- 
-- 
-- 
--

id	craftword	launch	mission	designator	name	agency	mass	homepage	craftDesc	labelword
4500	1997-018A	3923	NULL	1997-018A	Minisat 01	Spain	200	<ht…18A>	Min…es.	astronomy
4500	1997-018A	3923	NULL	1997-018A	Minisat 01	Spain	200	<ht…18A>	Min…es.	engineering

Constraints

Constraints in SPARQL and SQL are somewhat similar. In SPARQL, they expressed in a FILTER clause. We've seen an example of this in the previous section. In SQL, constraints in the WHERE and ON clauses additionally capture the foreign keys connecting the tables. Examining all of the constraints in the previous SQL query:

…    ON Launch.id=Spacecraft.launch
…    ON CountryName.site=Launch.site
…         ON Discipline.id=SpacecraftDiscipline.discipline
…              ON disc.spacecraft=Spacecraft.id
…    ON AltCraftName.spacecraft=Spacecraft.id
… WHERE CountryName.label REGEXP "Spain"

-- foreign key
-- foreign key
-- foreign key
-- foreign key
-- solution restriction criteria

A couple more minor differences between SQL and SPARQL constraints

Most SPARQL FILTERs must be enclosed ()s.
SPARQL uses && and ||where SQL uses AND and OR.

are illustrated in this query:

Largest US and USSR Spacecraft

SELECT agency, name, mass
  FROM Spacecraft
 WHERE
    (agency="United States" AND mass>100000)
 OR (agency="U.S.S.R"       AND mass>70000);

PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?agency ?name ?mass {
  ?craft space:agency ?agency ; space:mass ?mass ; foaf:name ?name
  FILTER (
    (?agency="United States" && xsd:float(?mass)>100000)
  ||(?agency="U.S.S.R"       && xsd:float(?mass)>70000))
}

?agency	?name	?mass
"U.S.S.R"	"Buran"	73000.0
"United States"	"STS 35/Astro 1"	102420.0
"United States"	"STS 95"	103069.0

Aggregates

While we're on the subject of functions, and hurling heavy things into space, let's compare some minor stats for the US and USSR. Above, we stumbled across the largest vehicles the two superpowers launched. SPARQL and SQL have functions like min, max, avg, sum and count for this.

Max Vehicle Weight by Agency

What are the heaviest payloads that each agency has launched?

SELECT agency, MAX(mass) AS m
 FROM Spacecraft
GROUP BY agency
ORDER BY m

PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?agency (MAX(xsd:float(?mstr)) AS ?m) {
  ?craft space:agency ?agency ; space:mass ?mstr
} GROUP BY ?agency ORDER BY ?m

?agency	?m
"Denmark"	3.0
"The Netherlands"	6.5
…31 rows elided…	…
"Thailand"	6505.0
"Peoples Republic of China"	7600.0
"International"	11000.0
"Russia"	19640.0
"Luxembourg"	23000.0
"U.S.S.R"	73000.0
"United States"	103069.0

Apparently Luxembourg launched Luxembourg.

Unions

In both SPARQL and SQL, a UNION is equivalent to performing the queries on either side of the UNION and concatenating the results. The Largest US and USSR Spacecraft query we performed earlier can be re-expressed as a UNION to illustrate this point:

UNION of US and USSR Largest Spacecraft

 SELECT agency, name, mass
   FROM Spacecraft
  WHERE agency="United States" AND mass>100000
UNION
 SELECT agency, name, mass
   FROM Spacecraft
  WHERE agency="U.S.S.R"       AND mass>70000;

Note the requirements for type consistency in UNIONs described previous article.

SPARQL results carry bindings of variables, insulating the user from errors in ordering SELECTed variables. They also carry the type in every returned term, so even if a variable is bound to terms of different type in different rows, the results include those types.

Below is a SPARQL equivalent to the SQL UNION:

PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?agency ?name ?mass {
  {
  ?craft space:agency ?agency ; space:mass ?mass ; foaf:name ?name
  FILTER ((?agency="United States" && xsd:float(?mass)>100000))
  }
UNION
  {
  ?craft space:agency ?agency ; space:mass ?mass ; foaf:name ?name
  FILTER ((?agency="U.S.S.R"       && xsd:float(?mass)>70000))
  }
}

A common use of UNION in SPARQL is to find all of the arcs connecting to an RDF node.

Arcs In and Out

Find all of the arcs with <http://nasa.dataincubator.org/launch/1997-018> as the subject or object:

SELECT * {
  { ?sIn ?pIn <http://nasa.dataincubator.org/launch/1997-018> }
UNION
  { <http://nasa.dataincubator.org/launch/1997-018> ?pOut ?oOut }
}

?sIn	?pIn	?pOut	?oOut
<http://nasa.dataincubator.org/spacecraft/1997-018A>	<http://purl.org/net/schemas/space/launch>	`NULL`	`NULL`
<http://nasa.dataincubator.org/spacecraft/1997-018B>	<http://purl.org/net/schemas/space/launch>	`NULL`	`NULL`
`NULL`	`NULL`	<http://purl.org/net/schemas/space/launched>	"1997-04-21"^^<http://www.w3.org/2001/XMLSchema#date>
`NULL`	`NULL`	<http://purl.org/net/schemas/space/launchsite>	<http://nasa.dataincubator.org/launchsite/canaryislands>
`NULL`	`NULL`	<http://purl.org/net/schemas/space/launchvehicle>	"Pegasus XL"
`NULL`	`NULL`	<http://purl.org/net/schemas/space/spacecraft>	<http://nasa.dataincubator.org/spacecraft/1997-018B>
`NULL`	`NULL`	<http://purl.org/net/schemas/space/spacecraft>	<http://nasa.dataincubator.org/spacecraft/1997-018A>
`NULL`	`NULL`	<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>	<http://purl.org/net/schemas/space/Launch>

These sorts of queries are very common in getting to know a dataset. In RDF, the graph of data is explicit while in SQL the connections within the data are implicitly defined via the foreign keys. While in principle, SQL's INFORMATION_SCHEMA provides users with a way to query for foreign keys, the implementation varies wildly across databases. Even if it were consistent, it is syntactically impossible in SQL to derive an attribute name and use it in a selection or constraint. For example, in order to get data analogous to the results of the SPARQL query, we'd need query the INFORMATION_SCHEMA to compose the joins against referenced tables.

SELECT Spacecraft.designator AS spacecraft, Launch.launchword,
       Launch.date, LaunchSite.labelword AS launchsite, Launch.vehicDesc
  FROM Launch
 INNER JOIN Spacecraft
    ON Spacecraft.launch=Launch.id
 INNER JOIN LaunchSite
    ON LaunchSite.id=Launch.site
 WHERE Launch.launchword="1997-018";