SPARQL/Extensions/Aggregates
SQL provides aggregate functions that compute a single value from multiple result values after grouping query solutions in a certain way. The SPARQL specification contains no machinery for dealing with aggregates, though there are ways to query for some universal aggregates like MIN or MAX.
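One way to obtain MIN or MAX without aggregate support is to order the solutions on the variable of interest and keep only the first one, which is what ORDER BY combined with LIMIT 1 does. The sketch below models that idea in Python rather than SPARQL so it can be run without a query engine; the solution list, the variable names and the helper `first_ordered` are all made up for illustration.

```python
# Hypothetical query solutions: each dict maps a variable name to its binding.
solutions = [
    {"item": "ex:a", "price": 12},
    {"item": "ex:b", "price": 7},
    {"item": "ex:c", "price": 30},
]

def first_ordered(solutions, var, descending=False):
    """Emulate ORDER BY ?var (or DESC(?var)) followed by LIMIT 1."""
    # Simplification (an assumption of this sketch): solutions where the
    # variable is unbound are dropped instead of being ordered.
    bound = [s for s in solutions if var in s]
    ordered = sorted(bound, key=lambda s: s[var], reverse=descending)
    return ordered[0] if ordered else None

cheapest = first_ordered(solutions, "price")                   # plays the role of MIN
dearest = first_ordered(solutions, "price", descending=True)   # plays the role of MAX
print(cheapest["price"], dearest["price"])
```

This trick only covers aggregates that pick one existing value; COUNT, SUM and AVG cannot be expressed this way.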
Several SPARQL implementations support aggregates:
- OpenLink Virtuoso supports COUNT, COUNT DISTINCT, MAX, MIN and AVG in queries and subqueries. Virtuoso does not implement an explicit GROUP BY clause, instead implicitly grouping solution results by all variables appearing in aggregate functions in a query projection. Virtuoso does not implement a HAVING clause, but the functionality can be emulated via subqueries. Virtuoso allows aggregate functions to be used as arguments to other projected expressions and allows the arguments to aggregate functions to be arbitrary expressions.
- ARQ supports COUNT and COUNT DISTINCT. ARQ implements a GROUP BY clause that can act on either variables or expressions. Expressions in a GROUP BY can be named and then selected from the query, providing a way of selecting arbitrary expressions. If GROUP BY is omitted, then ARQ groups on all variables in the query pattern. ARQ implements a HAVING clause that can filter the result set after grouping.
- ARC supports COUNT, MAX, MIN, AVG, and SUM. ARC requires that aggregate functions in a query's projection be named with the AS keyword. ARC implements a GROUP BY clause that must be present if anything other than a single aggregate is selected. ARC only allows variables (not expressions) in aggregate functions or GROUP BY conditions.
- Glitter, part of Open Anzo, supports COUNT and COUNT DISTINCT. Glitter implements a GROUP BY clause that can only contain variables.
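To make the terms used in the list above concrete, here is a minimal Python sketch of what grouping, COUNT, COUNT DISTINCT and a HAVING-style filter do to a set of query solutions. It does not reproduce the behaviour of any of the implementations listed; the data, the variable names and the helpers `group_by` and `count` are hypothetical.

```python
from collections import defaultdict

# Hypothetical solutions, one dict per solution (variable name -> binding).
solutions = [
    {"author": "alice", "doc": "d1", "topic": "rdf"},
    {"author": "alice", "doc": "d2", "topic": "rdf"},
    {"author": "alice", "doc": "d3", "topic": "sql"},
    {"author": "bob", "doc": "d4", "topic": "rdf"},
]

def group_by(solutions, keys):
    """Partition solutions by the values bound to the grouping variables."""
    groups = defaultdict(list)
    for s in solutions:
        groups[tuple(s.get(k) for k in keys)].append(s)
    return groups

def count(group, var, distinct=False):
    """COUNT(?var) or COUNT(DISTINCT ?var) over one group, skipping unbound."""
    values = [s[var] for s in group if var in s]
    return len(set(values)) if distinct else len(values)

groups = group_by(solutions, ["author"])
rows = [
    {
        "author": key[0],
        "topics": count(g, "topic"),
        "distinct_topics": count(g, "topic", distinct=True),
    }
    for key, g in groups.items()
]

# HAVING-style filter: applied after grouping, on the aggregated rows.
having = [r for r in rows if r["topics"] > 1]
print(rows)
print(having)
```

The difference between the two counts shows up for "alice", who has three topic bindings but only two distinct topics; the HAVING-style filter then keeps only her row.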
A paper on RAP's SPARQL DB engine discusses aggregates. Open question: does RAP implement aggregates?
Design Questions
What happens when aggregate functions are applied to results with unbound values or mixed data types?
Does anyone have an answer?
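The question is open, and the following Python sketch does not settle it. It only makes the available choices concrete for MAX: unbound values are skipped, and a mixed-type group either raises an error or is reduced to its numeric values. Both policies, the function `aggregate_max` and its `on_mixed` parameter are assumptions made for illustration, not the behaviour of any implementation mentioned on this page.

```python
def kind(value):
    """Coarse type class: ints and floats compare with each other, strings do not."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "number"
    return type(value).__name__

def aggregate_max(solutions, var, on_mixed="error"):
    """MAX over ?var under one possible policy (hypothetical, for discussion)."""
    # Policy choice 1: solutions where the variable is unbound are skipped.
    values = [s[var] for s in solutions if s.get(var) is not None]
    if not values:
        return None  # nothing bound: the aggregate itself stays unbound
    if len({kind(v) for v in values}) > 1:
        # Policy choice 2: what to do with mixed data types.
        if on_mixed == "error":
            raise TypeError(f"mixed types for ?{var}")
        values = [v for v in values if kind(v) == "number"]
        if not values:
            return None
    return max(values)

mixed = [{"x": 3}, {"x": "seven"}, {}, {"x": 10.5}]
print(aggregate_max(mixed, "x", on_mixed="numeric_only"))
```

Other policies are equally defensible, for example ordering mixed values the way ORDER BY does, or making the whole aggregate unbound as soon as one input is unbound.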
Fundamentals
SQL is a very old language, and the meaning of all but the simplest aggregation statements in SQL is opaque because of the notation, and is also highly implementation-dependent.
- "This is not true," wrote Chimezie...
Chimezie -- If you are interested, I can supply an example SQL query on which Oracle and MySQL return different results, both of which are intuitively wrong to most people. The different results are concrete evidence of a shortcoming. The general problem is that there is no model theory or other implementation-independent standard that specifies what the results of any query should be. -- Adrian
This raises the question -- Why stick with 1970s style SQL-like syntax for SPARQL aggregation?
adriandwalker-at-gmail-dot-com suggests that, instead of the 1970s-style SQL aggregation notation, it would benefit SPARQL to use a rule-based notation similar to the examples in www.reengineeringllc.com/demo_agents/Aggregation.agent