RDF Stream Processors Implementation
In this wiki page we post shared knowledge on systems and techniques to process RDF Stream Models.
C-SPARQL and C-SPARQL Engine
Continuous SPARQL (C-SPARQL) [1] is a language for continuous queries over streams of RDF data that extends SPARQL 1.1 [1]. For instance, hereafter, we represent in RDF the tweet that Tim Berners-Lee posted live from the middle of the Olympic stadium during the opening ceremony of London 2012 Olympic Games:
[] sioc:content "This is for everyone #london2012 #oneweb #openingceremony" ;
   sioc:has_creator :timberners_lee ;
   sioc:topic :london2012, :oneweb, :openingceremony .
C-SPARQL queries consider windows, i.e., the most recent triples of such streams, observed while data is continuously flowing. The following C-SPARQL query, for instance, counts for each hashtag the number of tweets in a time window of 15 minutes that slides every minute.
1. REGISTER STREAM HashtagAnalysis AS
2. CONSTRUCT { [] sld:about ?tag ; sld:count ?n . }
3. FROM STREAM <http://.../London2012> [RANGE 15m STEP 1m]
4. WHERE { { SELECT ?tag (COUNT(?tweet) AS ?n)
5.           WHERE { ?tweet sioc:topic ?tag . }
6.           GROUP BY ?tag }
7. }
The REGISTER STREAM clause, at Line 1, asks to register the continuous query that follows the AS keyword. The query considers a sliding window of 15 minutes that slides every minute (see clause [RANGE 15m STEP 1m], at Line 3) and opens on the RDF stream of tweets about the Olympic games (see clause FROM STREAM at Line 3). The WHERE clause, at Line 5, matches the hashtags of each tweet in the window. Line 6 asks to group the matches by hashtag. Line 4 projects, for each hashtag, the number of tweets that contain it. Finally, Line 2 constructs the RDF triples that are streamed out for further down-stream analyses.
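The window semantics described above can be illustrated outside any RSP engine. The following is a minimal Python sketch, not the C-SPARQL Engine's implementation. It assumes that tweets arrive as (timestamp in seconds, tweet id, hashtags) tuples and that a window covers the interval (now - RANGE, now]; the function names and the sample data are hypothetical.

```python
from collections import Counter

RANGE = 15 * 60  # window width in seconds (RANGE 15m)
STEP = 60        # slide in seconds (STEP 1m)

def hashtag_counts(tweets, now):
    """Count tweets per hashtag in the window (now - RANGE, now]."""
    counts = Counter()
    for ts, _tweet_id, tags in tweets:
        if now - RANGE < ts <= now:
            counts.update(set(tags))  # a tweet counts once per hashtag
    return dict(counts)

def evaluate(tweets, start, end):
    """Re-evaluate the query every STEP seconds between start and end."""
    return [(t, hashtag_counts(tweets, t)) for t in range(start, end + 1, STEP)]

# Hypothetical sample stream: (timestamp, tweet id, hashtags)
tweets = [
    (10, "t1", ["london2012", "oneweb"]),
    (70, "t2", ["london2012"]),
    (950, "t3", ["openingceremony"]),
]
results = evaluate(tweets, 60, 960)
```

Each element of `results` pairs an evaluation instant with the per-hashtag counts for the window that closes at that instant, which mirrors what Line 2 of the query streams out as RDF.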
The figure aside presents a high-level description of the C-SPARQL Engine architecture.
The engine is capable of registering queries and running them continuously, according to the configuration of the engine. To this end, the C-SPARQL Engine uses two sub-components: a Data Stream Management System (DSMS) and a SPARQL Engine.
The former is responsible for executing continuous queries over RDF Streams, producing temporal RDF Snapshots. The latter takes an RDF Snapshot as input and runs a standard SPARQL query against it, producing a continuous result formatted as specified in the REGISTER clause of the continuous query: if the SELECT or ASK form is used, the result is a Data Stream; if the CONSTRUCT form is used, the result is an RDF Stream. Both the SPARQL Engine and the DSMS are plug-ins of the C-SPARQL Engine. The binaries of the C-SPARQL Engine use Esper and Apache Jena-ARQ.
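The division of labour between the two sub-components can be sketched as a two-stage loop. The Python below is only an illustration under stated assumptions: `dsms_snapshot` and `run_continuous` are hypothetical stand-ins for the DSMS and the engine loop, the query is a plain Python function rather than SPARQL, and none of this reflects the Esper or Jena-ARQ APIs.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
TimedTriple = Tuple[int, Triple]  # (timestamp, triple)

def dsms_snapshot(stream: Iterable[TimedTriple], now: int, width: int) -> Set[Triple]:
    """Stand-in for the DSMS: keep triples with timestamp in (now - width, now]."""
    return {triple for ts, triple in stream if now - width < ts <= now}

def run_continuous(stream: List[TimedTriple], query: Callable, form: str,
                   width: int, instants: Iterable[int]) -> list:
    """Stand-in for the engine loop: one snapshot and one one-shot query per instant."""
    out = []
    for now in instants:
        answer = query(dsms_snapshot(stream, now, width))
        # SELECT/ASK answers form a data stream; CONSTRUCT answers form an RDF stream.
        kind = "RDF stream" if form == "CONSTRUCT" else "data stream"
        out.append((now, kind, answer))
    return out

def topics(snapshot: Set[Triple]) -> List[str]:
    """Hypothetical one-shot query: the objects of all sioc:topic triples."""
    return sorted(o for _s, p, o in snapshot if p == "sioc:topic")

stream = [
    (1, ("t1", "sioc:topic", ":london2012")),
    (5, ("t2", "sioc:topic", ":oneweb")),
]
output = run_continuous(stream, topics, "SELECT", width=3, instants=[2, 6])
```

Because the query function only ever sees a finite snapshot, any one-shot evaluator could in principle be plugged in at that point, which is the idea behind the plug-in design described above.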
For more information, see the references below.
References
[1] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: Querying RDF streams with C-SPARQL. SIGMOD Record 39(1): 20-26 (2010)
CQELS and CQELS-QL
CQELS-QL [2] is a declarative query language built on the SPARQL 1.1 grammar. Like C-SPARQL, it extends SPARQL with operators to query streams. Unlike C-SPARQL, which adds the RDF stream as a new type of dataset, CQELS-QL adds a query pattern to the GraphPatternNotTriples production to represent window operators on RDF streams.
GraphPatternNotTriples ::= GroupOrUnionGraphPattern | OptionalGraphPattern | MinusGraphPattern | GraphGraphPattern | StreamGraphPattern | ServiceGraphPattern | Filter | Bind
Assuming that each stream is identified by an IRI, the StreamGraphPattern pattern is defined as follows.
StreamGraphPattern ::= ‘STREAM’ ‘[’ Window ‘]’ VarOrIRIref ‘{’ TriplesTemplate ‘}’
Window ::= Range | Triple | ‘NOW’ | ‘ALL’
Range ::= ‘RANGE’ Duration (‘SLIDE’ Duration | ‘TUMBLING’)?
Triple ::= ‘TRIPLES’ INTEGER
Duration ::= (INTEGER (‘d’ | ‘h’ | ‘m’ | ‘s’ | ‘ms’ | ‘ns’))+
where VarOrIRIref and TriplesTemplate are the SPARQL 1.1 patterns for a variable/IRI and a triple template, respectively. Range corresponds to a time-based window, while Triple corresponds to a triple-based window. The keyword SLIDE specifies the sliding parameter of a time-based window, whose time interval is given by Duration. The [NOW] window indicates that only the triples at the current timestamp are kept. The [ALL] window indicates that all the triples are kept in the window. Note that the VarOrIRIref fragment allows queries with variables in place of stream URIs; this is used in run-time stream discovery scenarios.
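To make the Window and Duration productions concrete, here is a small Python sketch of a parser for the text between the square brackets. It is not part of CQELS: the function names and the returned dictionaries are hypothetical, durations are assumed to be written without spaces (as in `[RANGE 3s]`), and treating TUMBLING as a slide equal to the range is the usual definition of a tumbling window rather than something stated on this page.

```python
import re

# Nanoseconds per unit; the units come from the Duration production.
UNIT_NS = {
    "d": 86_400 * 10**9, "h": 3_600 * 10**9, "m": 60 * 10**9,
    "s": 10**9, "ms": 10**6, "ns": 1,
}
# Two-letter units are tried first so that "5ms" is not read as "5m" + "s".
TOKEN = re.compile(r"(\d+)(ms|ns|d|h|m|s)")
DURATION = re.compile(r"(?:\d+(?:ms|ns|d|h|m|s))+")

def parse_duration(text):
    """Return the duration in nanoseconds, e.g. '1m30s' -> 90 * 10**9."""
    if not DURATION.fullmatch(text):
        raise ValueError(f"not a Duration: {text!r}")
    return sum(int(n) * UNIT_NS[u] for n, u in TOKEN.findall(text))

def parse_window(spec):
    """Parse a window specification such as 'NOW' or 'RANGE 1m SLIDE 10s'."""
    words = spec.split()
    if words in (["NOW"], ["ALL"]):
        return {"kind": words[0]}
    if len(words) == 2 and words[0] == "TRIPLES":
        return {"kind": "TRIPLES", "count": int(words[1])}
    if len(words) >= 2 and words[0] == "RANGE":
        window = {"kind": "RANGE", "range_ns": parse_duration(words[1])}
        if len(words) == 4 and words[2] == "SLIDE":
            window["slide_ns"] = parse_duration(words[3])
        elif len(words) == 3 and words[2] == "TUMBLING":
            window["slide_ns"] = window["range_ns"]  # assumption: slide == range
        elif len(words) != 2:
            raise ValueError(f"not a Window: {spec!r}")
        return window
    raise ValueError(f"not a Window: {spec!r}")
```

For example, `parse_window("RANGE 3s")` yields a time-based window of three seconds with no explicit slide, and `parse_window("TRIPLES 5")` yields a triple-based window of five triples.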
For example, the following query finds pairs of people who can reach each other from two different and directly connected locations. Note the two stream graph patterns at Lines 6 and 7. The one at Line 6 determines where people are now. The one at Line 7 determines where people have been in the last 3 seconds. The rest of the query is plain SPARQL 1.1. The FILTER clause at Line 8 excludes the cases in which two sensors have detected the same person in two different locations, while the graph pattern at Line 5 checks that the locations are connected. Note that this query uses two windows on a single stream <http://deri.org/streams/rfid> (Lines 6 and 7).
1. PREFIX lv: <http://deri.org/floorplan/>
2. SELECT ?person1 ?person2
3. FROM NAMED <http://deri.org/floorplan/>
4. WHERE {
5.   GRAPH <http://deri.org/floorplan/> {?loc1 lv:connected ?loc2}
6.   STREAM <http://deri.org/streams/rfid> [NOW] {?person1 lv:detectedAt ?loc1}
7.   STREAM <http://deri.org/streams/rfid> [RANGE 3s] {?person2 lv:detectedAt ?loc2}
8.   FILTER (?person1!=?person2)
9. }
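The way the two windows interact with the static floor plan can be mimicked in a few lines of Python. This is a hedged sketch of the query's logic at a single evaluation instant, not CQELS code: the function `reachable_pairs`, the tuple layouts, and the sample data are hypothetical, and the [RANGE 3s] window is assumed to cover (now - 3, now].

```python
def reachable_pairs(connected, detections, now, range_s=3):
    """Evaluate the query logic once at time `now`.

    connected:  set of (loc1, loc2) pairs from the floor plan graph
    detections: list of (timestamp, person, location) from the RFID stream
    """
    # [NOW] window: detections at the current timestamp only.
    now_window = [(p, loc) for ts, p, loc in detections if ts == now]
    # [RANGE 3s] window: detections in the last range_s seconds.
    range_window = [(p, loc) for ts, p, loc in detections
                    if now - range_s < ts <= now]
    return {
        (p1, p2)
        for p1, loc1 in now_window
        for p2, loc2 in range_window
        if (loc1, loc2) in connected and p1 != p2  # GRAPH pattern and FILTER
    }

# Hypothetical floor plan and detections
connected = {("hall", "lab")}
detections = [(5, "carol", "lab"), (8, "bob", "lab"), (10, "alice", "hall")]
pairs = reachable_pairs(connected, detections, now=10)
```

With this data, carol's detection at time 5 has already left the 3-second window, so only alice (seen now in the hall) and bob (seen recently in the lab) are paired.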
Apart from allowing variables on stream URIs and multiple windows on a single stream, another difference between C-SPARQL and CQELS-QL is in the supported R2S (relation-to-stream) operators. CQELS-QL supports only Istream, whereas C-SPARQL supports only Rstream.
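The difference between the R2S operators is easiest to see on two consecutive query results. The sketch below follows the usual CQL-style definitions and is not taken from either engine; the function names are hypothetical, and Dstream, which this page does not name, is included only for completeness.

```python
def rstream(previous, current):
    """Rstream: emit everything that is in the current result."""
    return set(current)

def istream(previous, current):
    """Istream: emit only what is new with respect to the previous result."""
    return set(current) - set(previous)

def dstream(previous, current):
    """Dstream: emit what was in the previous result but is no longer there."""
    return set(previous) - set(current)

# Hypothetical results of the same query at two consecutive evaluations
previous = {("alice", "bob")}
current = {("alice", "bob"), ("alice", "carol")}
```

Here Rstream would re-emit both pairs, Istream would emit only the newly found pair, and Dstream would emit nothing because no pair disappeared.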
Unlike the C-SPARQL Engine, which uses a “black box” approach that delegates the processing to other engines, CQELS (Continuous Query Evaluation over Linked Streams) [2] proposes a “white box” approach (see Figure aside) and implements the required query operators natively to avoid the overhead and limitations of closed system regimes. CQELS provides a flexible query execution framework in which the query processor dynamically adapts to changes in the input data. During query execution, it continuously reorders operators according to heuristics that improve query execution in terms of delay and complexity. Moreover, external disk access on large Linked Data collections is reduced through data encoding and caching of intermediate query results. It returns either data streams or RDF streams, depending on the query form used.
References
[2] Danh Le Phuoc, Minh Dao-Tran, Josiane Xavier Parreira, Manfred Hauswirth: A Native and Adaptive Approach for Unified Processing of Linked Streams and Linked Data. International Semantic Web Conference (1) 2011: 370-388
SPARQLstream and Morph-streams processor
SPARQLstream [4] is an extension of SPARQL that supports operators over RDF streams such as time windows. It is very similar to C-SPARQL. The two main differences are:
- SPARQLstream provides syntax to express all the R2S operators, while C-SPARQL supports only the Rstream one, and
- C-SPARQL has the timestamp function, which allows basic temporal comparisons on the streamed triples, while SPARQLstream does not have it.
Morph-streams is an RDF stream query processor that uses ontology-based data access techniques [5] for the continuous execution of SPARQLstream queries against virtual RDF streams that logically represent concrete data streams. Morph-streams uses R2RML [6] to define mappings between ontologies and data streams. SPARQLstream queries are rewritten into streaming queries over the data streams. The rewriting process does not translate a SPARQLstream query directly into a continuous query in the target language of the underlying DSMS. Instead, it represents the query as a relational algebra expression extended with time window constructs. This allows performing logical optimizations (including pushing down projections and selections, and distributing joins and unions) and translating the algebraic representation into a target language or REST API request. For instance, the SPARQLstream query below asks for observations whose value is larger than 10 in the last 5 seconds.
SELECT ?proximity
FROM STREAM <http://streamreasoning.org/SensorReadings.srdf> [NOW-5 S]
WHERE {
  ?obs a ssn:ObservationValue;
       qudt:numericalValue ?proximity;
  FILTER (?proximity>10)
}
Morph-streams translates this query into the algebraic representation $\pi_{time,prox}(\sigma_{prox>10}(\omega_{5\ \mathrm{seconds}}(O)))$, where $O$ is the stream of observations, $\omega$ is the window operator, and $\sigma$ and $\pi$ are the selection and projection operators of relational algebra. The algebraic representation can be translated into both DSMS continuous queries, e.g., for SNEE or Esper, and sensor middleware REST API invocations, e.g., for GSN or Cosm/Xively. The results of the translation process are shown in Table 1.
Language | Query |
---|---|
SNEE | SELECT prox FROM sensors [FROM NOW-5 MINUTES TO NOW] WHERE prox >10 |
Esper | SELECT prox FROM sensors.win:time(5 minute) WHERE prox >10 |
GSN | http://…/multidata?vs[0]=sensors&field[0]=proximity_field&c_min[0]=10&from=15/05/2012+05:00:00&to=15/05/2012+10:00:00 |
Cosm/Xively | http://api.cosm.com/v2/feeds/…/datastreams/…?start=2012-05-15T05:00:00Z&end=2012-05-15T10:00:00Z |
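The translation from an algebraic representation to a target language can be sketched in Python for the two DSMS rows of Table 1. This is an illustration only, not Morph-streams code: the classes `Window`, `Selection` and `Projection` and the functions `to_snee` and `to_esper` are hypothetical, and the output strings simply follow the patterns shown in Table 1.

```python
from dataclasses import dataclass

@dataclass
class Window:        # window operator: time window over a named stream
    stream: str
    size: int
    unit: str        # e.g. "minute"

@dataclass
class Selection:     # selection operator: keep tuples satisfying a condition
    condition: str
    child: Window

@dataclass
class Projection:    # projection operator: keep the listed attributes
    attributes: list
    child: Selection

def to_snee(q: Projection) -> str:
    """Render the expression in the SNEE-style syntax of Table 1."""
    w = q.child.child
    return (f"SELECT {', '.join(q.attributes)} FROM {w.stream} "
            f"[FROM NOW-{w.size} {w.unit.upper()}S TO NOW] "
            f"WHERE {q.child.condition}")

def to_esper(q: Projection) -> str:
    """Render the expression in the Esper-style syntax of Table 1."""
    w = q.child.child
    return (f"SELECT {', '.join(q.attributes)} FROM {w.stream}"
            f".win:time({w.size} {w.unit}) WHERE {q.child.condition}")

# The expression behind the SNEE and Esper rows of Table 1
query = Projection(["prox"], Selection("prox >10", Window("sensors", 5, "minute")))
```

Keeping the query as a tree of operators, rather than as text, is what makes it possible to apply logical optimizations once and then emit several target syntaxes from the same structure.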
Of course, limitations in the target language may prevent a complete translation. Table 2 reports which SPARQLstream features can be translated into each target language.
Features | Esper | SNEE | GSN | Cosm/Xively |
---|---|---|---|---|
Projection | Ok | Ok | Ok | Fixed |
Proj expression | Ok | Ok | No | No |
Joins | Ok | Ok (only window) | No | No |
Union | No | Ok (not windows) | Ok | No |
Selection | Ok | Ok | Ok | Limited |
Aggregates | Ok | Ok | No | No |
Time window | Ok | Ok | Ok | Ok |
Tuple window | Ok | Ok | Ok | No |
R2S | Ok | Ok | No | No |
Conjunction, disjunction | Ok | No | No | No |
Repetition pattern | Ok | No | No | No |
Sequence | Ok | No | No | No |
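Table 2 can be used as a lookup to decide which targets could host a given query. The Python sketch below encodes the table as data; the function `candidate_targets` and its `strict` flag are hypothetical, and treating qualified cells such as "Fixed", "Limited" or "Ok (only window)" as unsupported in strict mode is an assumption made for illustration.

```python
TARGETS = ["Esper", "SNEE", "GSN", "Cosm/Xively"]

# One row per feature, cells in the column order of Table 2.
SUPPORT = {
    "Projection": ["Ok", "Ok", "Ok", "Fixed"],
    "Proj expression": ["Ok", "Ok", "No", "No"],
    "Joins": ["Ok", "Ok (only window)", "No", "No"],
    "Union": ["No", "Ok (not windows)", "Ok", "No"],
    "Selection": ["Ok", "Ok", "Ok", "Limited"],
    "Aggregates": ["Ok", "Ok", "No", "No"],
    "Time window": ["Ok", "Ok", "Ok", "Ok"],
    "Tuple window": ["Ok", "Ok", "Ok", "No"],
    "R2S": ["Ok", "Ok", "No", "No"],
    "Conjunction, disjunction": ["Ok", "No", "No", "No"],
    "Repetition pattern": ["Ok", "No", "No", "No"],
    "Sequence": ["Ok", "No", "No", "No"],
}

def candidate_targets(features, strict=True):
    """Return the targets whose cells allow every required feature.

    strict=True accepts only plain "Ok"; strict=False accepts anything but "No".
    """
    result = []
    for i, target in enumerate(TARGETS):
        cells = [SUPPORT[f][i] for f in features]
        if strict:
            ok = all(c == "Ok" for c in cells)
        else:
            ok = all(c != "No" for c in cells)
        if ok:
            result.append(target)
    return result
```

For instance, a query needing only projection, selection and a time window has three strict candidates, while one needing joins and aggregates is left with Esper alone in strict mode.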
References
- [4] Jean-Paul Calbimonte, Óscar Corcho, Alasdair J. G. Gray: Enabling Ontology-Based Access to Streaming Data Sources. International Semantic Web Conference (1) 2010: 96-111
- [5] Jean-Paul Calbimonte, Hoyoung Jeung, Óscar Corcho, Karl Aberer: Enabling Query Technologies for the Semantic Sensor Web. Int. J. Semantic Web Inf. Syst. 8(1): 43-63 (2012)
- [1] Short description
EP-SPARQL and ETALIS
EP-SPARQL extends the SPARQL language with Event Processing (EP) capabilities.
ETALIS is a rule-based engine for Event Processing and Stream Reasoning.
ETALIS (Event TrAnsaction Logic Inference System) [7] is a Complex Event Processing system (CEPs) and a Stream Reasoner. As input data format it uses RDF statements annotated with two timestamps (see also Section 1.2). Users can specify event processing tasks in ETALIS using two declarative rule-based languages: ETALIS Language for Events (ELE) and Event Processing SPARQL (EP-SPARQL) [8]. Both languages have the same semantics: complex events are derived from simpler events by means of deductive Prolog rules. ETALIS not only supports typical event processing constructs (e.g., sequence, concurrent conjunction, disjunction, negation), but also supports reasoning about events. For example, like other CEPs, it can check for A -> B (an event of type A followed by an event of type B). However, it also allows stating that C is a subclass of A. The condition A -> B will therefore also be matched if an event of type C is followed by an event of type B, because all events of type C are also events of type A. Moreover, the two languages support: all operators from Allen's interval algebra (e.g., during, meets, starts, finishes); count-based sliding windows; event aggregation for count, avg, sum, min, max; event filtering, enrichment, projection, translation, and multiplication; processing of out-of-order events (i.e., events that are delayed due to circumstances such as network anomalies); and event retraction (revision). ETALIS is a pluggable system that can use multiple Prolog engines such as YAP, SWI, SICStus, XSB, tuProlog and LPA Prolog.
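The A -> B example with subclass reasoning can be sketched in Python. This is not how ETALIS is implemented (it compiles rules to Prolog); the taxonomy `SUBCLASS_OF`, the function names, and the event tuples are hypothetical, and the sequence is assumed to hold when the first event ends before the second one starts.

```python
SUBCLASS_OF = {"C": "A"}  # hypothetical taxonomy: C is a subclass of A

def is_a(event_type, wanted):
    """True if event_type equals wanted or is a (transitive) subclass of it."""
    while event_type is not None:
        if event_type == wanted:
            return True
        event_type = SUBCLASS_OF.get(event_type)
    return False

def sequences(events, first, second):
    """Find pairs of events matching `first -> second`.

    events: list of (type, start, end), i.e. events with two timestamps.
    """
    return [
        (e1, e2)
        for e1 in events if is_a(e1[0], first)
        for e2 in events if is_a(e2[0], second) and e1[2] < e2[1]
    ]

# Hypothetical event stream: a C event, then a B event, then an A event
events = [("C", 1, 2), ("B", 3, 4), ("A", 5, 6)]
matches = sequences(events, "A", "B")
```

The pattern A -> B is matched by the C event followed by the B event, even though no event of type A precedes B, because the type check walks up the subclass hierarchy.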
References
- [7] Darko Anicic, Paul Fodor, Sebastian Rudolph, Roland Stühmer, Nenad Stojanovic, Rudi Studer. ETALIS: Rule-Based Reasoning in Event Processing, Chapter in Reasoning in Event-based Distributed Systems, Series in Studies in Computational Intelligence, Sven Helmer, Alex Poulovassilis and Fatos Xhafa. 2010. Springer.
- [8] Darko Anicic, Paul Fodor, Sebastian Rudolph, Nenad Stojanovic. EP-SPARQL: A Unified Language for Event Processing and Stream Reasoning, WWW 2011: Proceedings of the Twentieth International World Wide Web Conference, India.
- [2] Brief overview on EP-SPARQL
- [3] ETALIS-related presentations
- [4] ETALIS homepage
INSTANS
INSTANS (Incremental eNgine for STANding Sparql) has the following characteristics:
- Multiple interconnected SPARQL queries and rules can be compiled into a Rete-like structure
- Supports selected features from SPARQL Query and Update v. 1.1
- Requires no extensions to RDF or SPARQL
- Performs continuous evaluation of incoming RDF data against the compiled set of queries, storing intermediate results
- When all the conditions of a query are matched, the result is instantly available
Even though there are no language extensions for time windows, the same results can be computed as needed. An example of a generic SPARQL-encoded window library processing both tumbling and non-tumbling windows with parameters specified as RDF can be found here. Examples of all the eight types of event processing agents described by Etzion, Niblett and Luckham in "Event Processing in Action" (Manning Publications, July 2010) are described in this paper (presentation also available) with the complete set of queries available in INSTANS github.
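The idea of continuous evaluation with stored intermediate results can be illustrated with a single incremental join. The Python below is a simplified sketch in the spirit of a Rete-like network, not INSTANS code: the class `IncrementalJoin` and the `ex:` predicates are hypothetical, and it handles only two triple patterns with distinct predicates that share their subject variable.

```python
from collections import defaultdict

class IncrementalJoin:
    """Join (?x pred_a ?y) with (?x pred_b ?z) on ?x, one triple at a time."""

    def __init__(self, pred_a, pred_b):
        self.pred_a, self.pred_b = pred_a, pred_b
        self.left = defaultdict(list)   # stored matches of the first pattern, by ?x
        self.right = defaultdict(list)  # stored matches of the second pattern, by ?x

    def add(self, s, p, o):
        """Insert one triple and return the results it completes."""
        results = []
        if p == self.pred_a:
            self.left[s].append(o)
            results += [(s, o, z) for z in self.right[s]]
        elif p == self.pred_b:
            self.right[s].append(o)
            results += [(s, y, o) for y in self.left[s]]
        return results

# Hypothetical usage: results appear as soon as both patterns are matched
join = IncrementalJoin("ex:type", "ex:value")
first = join.add("e1", "ex:type", "Temperature")   # nothing complete yet
second = join.add("e1", "ex:value", "21")          # completes a result for e1
third = join.add("e2", "ex:value", "5")            # no ex:type for e2 yet
```

Each incoming triple is matched only against the stored partial matches, so no query is re-run from scratch and a result is available the moment its last condition is satisfied.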
References
[5] INSTANS homepage
[6] INSTANS Introduction Slides
[7] Instructions for installation and use
[8] Rinne, M., Nuutila, E., Törmä, S.: INSTANS: High-Performance Event Processing with Standard RDF and SPARQL. Poster in ISWC2012.
RSP query features
The table below summarizes some of the main query features present in RSPs. For completeness we include time-aware RDF approaches, which are not, strictly speaking, RSPs. The table may be incomplete or out of date with respect to recent changes or additions in RSPs, and people are invited to modify it.