Abstract

This document describes how dense geospatial raster data can be represented using the W3C RDF Data Cube (QB) ontology [vocab-data-cube] in concert with other popular ontologies including the W3C/OGC Semantic Sensor Network ontology (SSN) [vocab-ssn], the W3C/OGC Time ontology (Time) [owl-time], the W3C Simple Knowledge Organisation System (SKOS) [skos-reference], W3C PROV-O [prov-o] and the W3C/OGC QB4ST [qb4st]. It offers general methods supported by worked examples that focus on Earth observation imagery. Current triple stores, as the default database architecture for RDF, are not suitable for storing voluminous data like gridded coverages derived from Landsat satellite sensors. However, we show here how SPARQL queries can be served through an OGC Discrete Global Grid System for observational data, coupled with a triple store for observational metadata. While the approach may also be suitable for other forms of coverage, we leave the application to such data as an exercise for the reader.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This is expected to be the final release of this document by the Spatial Data on the Web Working Group.

For OGC, this is a Public Draft of a document prepared by the Spatial Data on the Web Working Group (SDWWG), a joint W3C-OGC project (see charter). The document is prepared following W3C conventions. The document is released at this time to solicit public comment.

This document was published by the Spatial Data on the Web Working Group as a Working Group Note. Comments regarding this document are welcome. Please send them to public-sdw-comments@w3.org (subscribe, archives).

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 March 2017 W3C Process Document.

1. Introduction

Publishing data on the Web using Linked Data technologies makes it more accessible, easier to discover, and machine-readable. In the context of the rapidly growing availability and importance of earth observation data, this work aims to leverage the Linked Data approach to data publishing to make such data both much more easily usable by non-specialists and much more easily integrated with other Web data in applications. Linked Data has worked well for multi-dimensional statistical data using the RDF Data Cube [vocab-data-cube]. Following this success, Earth Observation imagery can be readily modelled as a Data Cube with the three dimensions of latitude, longitude, and time. This simple conceptualisation and its encoding as Linked Data may be convenient for scientists and consumer app developers everywhere, and especially to statisticians such as those in National Statistics Organisations.

Satellite imagery is commonly modelled as a multidimensional grid coverage, as discussed in [sdw-bp]. The large number of data points that is typical of coverage data such as Landsat imagery means that publishers may be justifiably reluctant to address the size explosion that accompanies converting data to RDF. While such a conversion provides maximum machine-readability, many benefits of Linked Data can be realized with a compromise approach where only the metadata is directly expressed in RDF. Further benefits can be realized by storing voluminous gridded coverage data in more efficient storage representations and using specialised middleware to generate an RDF representation on-the-fly to respond to service requests.

This document illustrates that approach showing how Earth Observation imagery can be published as Linked Data using the RDF Data Cube vocabulary [vocab-data-cube] in concert with other relevant ontologies including the W3C/OGC Semantic Sensor Network ontology (SSN) [vocab-ssn], the W3C/OGC Time ontology (Time) [owl-time], the W3C Simple Knowledge Organisation System (SKOS) [skos-reference], W3C PROV-O [prov-o] and the W3C/OGC QB4ST [qb4st]. We show how SPARQL queries can be served through a scalable OGC Discrete Global Grid System for observation data, coupled with a triple store for observational metadata.

Throughout the document we refer to relevant Use Cases and Requirements of the Spatial Data on the Web Working Group (UCR) [sdw-ucr] and Best Practices of the Spatial Data on the Web Working Group (BP) [sdw-bp]. Those references may be helpful to provide real-world applications and further rationale for the approach described here. We refer to extracts from a small example for illustration. The complete source file for the example is available as the ANU-LED example.

2. The RDF Data Cube

The RDF Data Cube [vocab-data-cube] is a standard for representing multidimensional data as RDF. It is typically used for numerical data that is associated with geographic regions (e.g. suburbs) and classifications (e.g. age, industry, or time periods). Common practice includes using the SKOS vocabulary to define the concepts being reported [Observed property in coverage]. The RDF Data Cube vocabulary allows the publisher to define all the relevant components of their data and the concepts they quantify, including dimensions (which identify an observation, such as its location and time), measures (the observed values) and attributes (which qualify the observed values, such as their resolution).

These techniques can be easily adapted to coverages, as the data model is flexible enough to define the appropriate attributes. Here we follow BP Choose the coordinate reference system to suit your user's applications and BP State how coordinate values are encoded. By assigning a temporal dimension to the datacube, BP Describe properties that change over time (option 3) is straightforward.

Example 1
:lat a qb:DimensionProperty ;
    rdfs:subPropertyOf geo:lat .

:long a qb:DimensionProperty ;
    rdfs:subPropertyOf geo:long .

:time a qb:DimensionProperty ;
    rdfs:range xsd:dateTime ;
    qb:concept sdmx-concept:timePeriod .

:dataPixelValue a qb:MeasureProperty ;
    rdfs:range xsd:integer ;
    qb:concept :reflectance ;
    qb:concept sdmx-concept:obsValue .

# in pixels per degree
:resolution a qb:AttributeProperty ;
    rdfs:range xsd:double .

The QB4ST ontology [qb4st] extends the Data Cube for extra power and consistency when describing spatio-temporal aspects of data [Georeferenced spatial data]. Any number of such dimensions can be defined, allowing for 1D, 2D, 3D or 4D coverages [Support for 3D, Time series, 4D model of space-time].

Example 2
:lat a qb4st:SpatialDimension ;
    rdfs:subPropertyOf geo:lat ;
    qb4st:crs <http://epsg.io/4326> ;
    qb4st:crslabel "WGS84" .

:long a qb4st:SpatialDimension ;
    rdfs:subPropertyOf geo:long ;
    qb4st:crs <http://epsg.io/4326> ;
    qb4st:crslabel "WGS84" .

:time a qb:DimensionProperty, qb4st:TemporalProperty ;
    rdfs:range xsd:dateTime ;
    qb:concept sdmx-concept:timePeriod .

2.1 Metadata and data

Traditionally, there is a distinction between data, that is, the observations proper, such as Landsat pixels, and metadata, which adds context to the observations, such as resolution. In Linked Data modelling, this distinction is not strict. However, it is possible to separate the two in a typical Data Cube.

The value of an RDF Data Cube component can be attached to each individual observation or to the dataset as a whole. Dataset-wide metadata can therefore be distinguished from the rest of the dataset, because it is attached to the qb:DataSet object. This makes it easy to fetch the metadata alone with a simple SPARQL query. This dataset-wide description alone is already a useful (and web-of-data friendly) approach to publishing spatial data [ Spatial metadata].

Here we demonstrate BP Describe the positional accuracy of spatial data, BP Include spatial metadata in dataset metadata, and BP Provide geometries on the Web in a usable way. Further, BP Use globally unique persistent HTTP URIs for spatial things is applied at the level of image pixels. We can also see an example of using the PROV ontology [prov-o] for Earth observation imagery provenance. Alternatively, a lineage ontology that extends the PROV ontology to reflect the lineage and lineage-extended components of ISO 19115 metadata is available.

Example 3
:exampleDataset a qb:DataSet, prov:Entity ;
    qb:structure :exampleStructure ;
    :instrument :OLI ;
    :satellite :landsat-8 ;
    :band "4" ;
    :coverageSpatialDomain "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral ;
    :coverageTemporalDomain :timeDomain ;
    prov:wasGeneratedBy :ANU-led-resampling .

:p1 a :Pixel ;
    qb:dataSet :exampleDataset ;
    :lat "90.5556";
    :long "41.2444";
    :time "2001-10-26T21:32:52"^^xsd:dateTime ;
    :dataPixelValue "15"^^xsd:integer ;
    :resolution "2.7"^^xsd:double ;
    :dggsCell "R00004" ;
    :bounds  "POLYGON((90.37 41.45, 90.74 41.45, 90.74 41.04, 90.37 41.04, 90.37 41.45))"^^ogc:wktLiteral ;
    prov:wasDerivedFrom :example-tile .

The RDF Data Cube also enables much more detailed metadata, like separate provenance for each observation. While it is not practical to serve Landsat imagery with such detailed metadata attached to each pixel, it may be reasonable to attach such metadata to aggregated tiles of pixels. In this case, each qb:Observation will be a whole tile (:GridSquare) rather than an individual pixel [Support for tiling]. Note that this technique applies BP Use globally unique persistent HTTP URIs for spatial things at the level of image tiles.

Example 4
:dataImageValue a qb:MeasureProperty ;
    rdfs:range xsd:anyURI ;
    qb:concept :reflectance ;
    qb:concept sdmx-concept:obsValue .

:R000 a :GridSquare ;
    qb:dataSet :exampleDataset ;
    :lat "91.6667";
    :long "40.0270";
    :time "2001-10-26T21:32:52"^^xsd:dateTime ;
    :dataImageValue <http://www.example.org/led-example-image-R000> ;
    :resolution "0.9"^^xsd:double ;
    :dggsCell "R000" ;
    :dggsLevelSquare "3" ;
    :dggsLevelPixel "4" ;
    :bounds  "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral ;
    prov:wasDerivedFrom :example-tile .

3. A spectrum of linkiness

In the ideal web of data, every single observation has a unique URI, can be queried using SPARQL, and has metadata attached to it. Upon hearing this, anyone familiar with Landsat data would be forgiven for rejecting the whole enterprise as entirely impractical. But all is not lost! Most of the benefits of Linked Data (namely, linkability, enhanced discoverability, machine-readability) can be realized by just publishing the dataset-wide metadata in this format. More 'linkiness' provides diminishing returns along with increasing costs. Publishers must decide on the appropriate compromise position for their data.

To characterize the spectrum, we can broadly define three applications of RDF for coverages. From most to least costly, these are: to store a coverage dataset, to serve a coverage (“serialization”), and to describe the metadata of a coverage (“description”).

3.1 Storing a coverage

RDF data is typically stored natively in a triple store. The Data Cube, and RDF in general, are too verbose to be viable for storing large coverages.

3.2 Serving a coverage

In this model, coverage data is physically stored in some more appropriate format (such as HDF5). Specialized middleware implements a virtual triple store by receiving SPARQL queries from a client and responding with dynamically-generated RDF. Such a response may be verbose, but the cost is much lower than physically storing the whole coverage as RDF. Query optimization is also necessary for this to be viable. Furthermore, we suggest using tiles for each qb:Observation in the RDF Data Cube, rather than individual pixels [Support for tiling]. This significantly reduces the blowup that comes from encoding data as RDF [Compressible].
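To get a feel for the saving from tiling, the following back-of-envelope sketch counts triples under the two designs. The scene size, tile size and triples-per-observation figure are illustrative assumptions, not measurements from the ANU-LED example.

```python
import math


def triple_count(width_px, height_px, obs_size_px, triples_per_obs):
    """Triples needed when each qb:Observation covers a square block of
    obs_size_px x obs_size_px pixels."""
    cols = math.ceil(width_px / obs_size_px)
    rows = math.ceil(height_px / obs_size_px)
    return cols * rows * triples_per_obs


# Assumed figures: a 4000 x 4000 pixel scene and 10 triples per
# observation (roughly the size of the :Pixel description in Example 3).
per_pixel = triple_count(4000, 4000, 1, 10)    # one observation per pixel
per_tile = triple_count(4000, 4000, 100, 10)   # one observation per 100 x 100 tile

print(per_pixel)
print(per_tile)
print(per_pixel // per_tile)  # reduction factor
```

Under these assumptions the number of triples shrinks by the number of pixels per tile, because the per-observation description is paid once per tile rather than once per pixel.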

The key advantage of serving a coverage in RDF is that the entire coverage, and individual tiles within it, become linkable [Linkability]; this could be a major contribution to the Linked Data Web. With sufficiently advanced middleware, SPARQL queries over the dataset can be served just as if the data were stored in RDF, but for a fraction of the storage cost. Not only that, but it is possible to make direct SPARQL queries performant through use of spatial data structures and assumptions about data layout, as explained in Implementation. Hence, it is still possible for publishers of dense spatial data to leverage much of the power of linked data.

It is common to want only a chunk of the data available, for example, all observations within 10 km of Canberra in the past year, as required for BP Expose spatial data through 'convenience APIs'. Regardless of the format chosen, an ability to assign persistent identifiers to these sorts of queries is essential to publishers of coverages. Although the RDF Data Cube offers predefined chunks of observations called slices (qb:Slice) for this purpose, coverage applications typically demand a greater degree of flexibility. Our approach is to let the publisher define appropriate chunks [Reference data chunks] using SPARQL queries. For example, FILTERs with inequalities can be used to return all tiles of a particular resolution within a particular spatial rectangle. If using this method to denote chunks, publishers should make it easy for a user to select chunks without the use of SPARQL directly, e.g. by providing an interface to generate the appropriate query using a few predefined operators.
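As one possible shape for such an interface, the sketch below generates a chunk query from a bounding box and a resolution. The function name, the query template and the prefix IRI are hypothetical. The property names follow the examples in this document, and the query assumes that :lat, :long and :resolution carry numeric typed literals.

```python
from string import Template

# Hypothetical query template. The PREFIX IRI is a placeholder.
CHUNK_QUERY = Template("""\
PREFIX : <http://www.example.org/led#>
SELECT ?tile ?image WHERE {
    ?tile a :GridSquare ;
        :lat ?lat ;
        :long ?long ;
        :resolution ?resolution ;
        :dataImageValue ?image .
    FILTER (?resolution = $resolution
        && ?lat >= $south && ?lat <= $north
        && ?long >= $west && ?long <= $east)
}
""")


def chunk_query(south, west, north, east, resolution):
    """Return a SPARQL query selecting tiles of one resolution whose
    centroid lies inside the given rectangle."""
    # Coercing to float keeps arbitrary strings out of the query text.
    south, west, north, east = (float(v) for v in (south, west, north, east))
    if south > north or west > east:
        raise ValueError("empty bounding box")
    return CHUNK_QUERY.substitute(
        south=south, west=west, north=north, east=east,
        resolution=float(resolution))


print(chunk_query(38.18, 90.0, 41.87, 93.33, 0.9))
```

A publisher could mint a persistent identifier for each parameter combination and resolve it to the generated query, so that users never need to write SPARQL themselves.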

3.3 Describing a coverage

A large portion of the benefits of Linked Data may be realized by describing only the metadata of a coverage in RDF. Such a dataset can be linked to [Linkability], and its essential properties are naturally machine-readable [Discoverability, Machine to machine]. The coverage itself can remain in whatever efficient format the publisher prefers, while following BP Include spatial metadata in dataset metadata and BP Encoding spatial data. Here, BP Use globally unique persistent HTTP URIs for spatial things is applied at the level of a qb:DataSet.

Whatever approach is taken, it should be as easy as possible for the user to grab just the metadata, without having to figure out how to write an appropriate query. The definition of a qb:DataSet and the associated qb:DataStructureDefinition can serve this role, but it is still up to the publisher to make it easy for the user to download those definitions.

It is also helpful if the user can easily identify the domain of a coverage, that is, the spatial and temporal area where measurements are made [ Spatial metadata]. QB4ST [qb4st] does not currently have a term for that, but it might in the future.
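Once the spatial domain is published as a WKT literal, as with the :coverageSpatialDomain attribute used in this document, a client can test cheaply whether a location of interest is covered before requesting any data. The sketch below is a minimal illustration that handles only a single-ring POLYGON and compares against its bounding box. The function names are hypothetical, and coordinates are treated simply as (x, y) in the order they appear in the WKT.

```python
import re


def wkt_bbox(wkt):
    """Return (min_x, min_y, max_x, max_y) for a single-ring WKT POLYGON."""
    m = re.fullmatch(r"\s*POLYGON\s*\(\(([^()]*)\)\)\s*", wkt)
    if m is None:
        raise ValueError("expected a POLYGON with a single ring")
    points = [tuple(float(v) for v in pair.split())
              for pair in m.group(1).split(",")]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_contains(bbox, x, y):
    """True if (x, y) lies inside or on the edge of the bounding box."""
    min_x, min_y, max_x, max_y = bbox
    return min_x <= x <= max_x and min_y <= y <= max_y


# Spatial domain literal taken from the example dataset description.
DOMAIN = "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"

bbox = wkt_bbox(DOMAIN)
print(bbox)
print(bbox_contains(bbox, 91.0, 40.0))
print(bbox_contains(bbox, 149.12, -35.3))
```

A production client would more likely rely on a GeoSPARQL-aware store or a geometry library than on hand-written parsing.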

Example 5
:exampleDataset a qb:DataSet, prov:Entity ;
    qb:structure :exampleStructure ;
    :instrument :OLI ;
    :satellite :landsat-8 ;
    :band "4" ;
    :coverageSpatialDomain "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral ;
    :coverageTemporalDomain :timeDomain ;
    prov:wasGeneratedBy :ANU-led-resampling .



:exampleStructure a qb4st:SpatioTemporalDSD ;
    qb:component :spatialDomainComponent ,
                 :temporalDomainComponent ,
                 :latitudeComponent ,
                 :longitudeComponent ,
                 :timeComponent ,
                 :satelliteComponent ,
                 :instrumentComponent ,
                 :bandComponent ,
                 :dataImageComponent ,
                 :dataPixelComponent ,
                 :dggsCellComponent ,
                 :dggsLevelSquareComponent ,
                 :dggsLevelPixelComponent ,
                 :resolutionComponent ,
                 :boundsComponent .

:spatialDomainComponent a qb4st:SpatialComponentSpecification ;
    qb:attribute :coverageSpatialDomain .

:temporalDomainComponent a qb4st:TemporalComponentSpecification ;
    qb:attribute :coverageTemporalDomain .

:latitudeComponent a qb4st:SpatialComponentSpecification ;
    qb:dimension :lat .

:longitudeComponent a qb4st:SpatialComponentSpecification ;
    qb:dimension :long .

:timeComponent a qb4st:TemporalComponentSpecification ;
    qb:dimension :time .

:satelliteComponent a qb:ComponentSpecification ;
    qb:attribute :satellite .

:instrumentComponent a qb:ComponentSpecification ;
    qb:attribute :instrument .

:bandComponent a qb:ComponentSpecification ;
    qb:attribute :band .

:dataImageComponent a qb:ComponentSpecification ;
    qb:measure :dataImageValue .

:dataPixelComponent a qb:ComponentSpecification ;
    qb:measure :dataPixelValue .

:dggsCellComponent a qb4st:SpatialComponentSpecification ;
    qb:dimension :dggsCell .

:dggsLevelSquareComponent a qb:ComponentSpecification ;
    qb:dimension :dggsLevelSquare .

:dggsLevelPixelComponent a qb:ComponentSpecification ;
    qb:dimension :dggsLevelPixel .

:resolutionComponent a qb:ComponentSpecification ;
    qb:attribute :resolution .

:boundsComponent a qb4st:SpatialComponentSpecification ;
    qb:attribute :bounds .



:coverageSpatialDomain a qb:AttributeProperty, qb4st:SpatialProperty ;
    rdfs:subPropertyOf :bounds .

:coverageTemporalDomain a qb:AttributeProperty, qb4st:TemporalProperty ;
    rdfs:range time:DateTimeInterval ;
    qb:concept sdmx-concept:timePeriod .

:lat a qb4st:SpatialDimension ;
    rdfs:subPropertyOf geo:lat ;
    qb4st:crs <http://epsg.io/4326> ;
    qb4st:crslabel "WGS84" .

:long a qb4st:SpatialDimension ;
    rdfs:subPropertyOf geo:long ;
    qb4st:crs <http://epsg.io/4326> ;
    qb4st:crslabel "WGS84" .

:time a qb:DimensionProperty, qb4st:TemporalProperty ;
    rdfs:range xsd:dateTime ;
    qb:concept sdmx-concept:timePeriod .

:satellite a qb:AttributeProperty ;
    rdfs:range sosa:Platform ;
    qb:concept sdmx-concept:collMethod .

:instrument a qb:AttributeProperty ;
    rdfs:range sosa:Sensor ;
    qb:concept sdmx-concept:collMethod .

:band a qb:AttributeProperty ;
    rdfs:range xsd:integer .

:dataImageValue a qb:MeasureProperty ;
    rdfs:range xsd:anyURI ;
    qb:concept :reflectance ;
    qb:concept sdmx-concept:obsValue .
    
:dataPixelValue a qb:MeasureProperty ;
    rdfs:range xsd:integer ;
    qb:concept :reflectance ;
    qb:concept sdmx-concept:obsValue .

:rHEALPix a qb4st:CRS .

:dggsCell a qb4st:SpatialDimension ;
    qb4st:crs :rHEALPix ;
    qb4st:crslabel "rHEALPix WGS84 Ellipsoid" ;
    rdfs:range xsd:string ;
    qb:concept sdmx-concept:refArea .

:dggsLevelSquare a qb:DimensionProperty ;
    rdfs:range xsd:integer .

:dggsLevelPixel a qb:DimensionProperty ;
    rdfs:range xsd:integer .

:resolution a qb:AttributeProperty ;
    rdfs:range xsd:double .

:bounds a qb:AttributeProperty, qb4st:SpatialProperty ;
    rdfs:subPropertyOf ogc:asWKT ;
    rdfs:domain :GridSquare ;
    qb4st:crs <http://epsg.io/4326> ;
    qb4st:crslabel "WGS84" ;
    qb:concept sdmx-concept:refArea .

4. Discrete Global Grid Systems

Discrete global grid systems are a family of spatial reference systems that subdivide the Earth's surface into a hierarchy of cells. Larger cells are subdivided into smaller cells deeper in the hierarchy. A location on the Earth's surface is specified by a cell identifier, not a latitude and longitude. Smaller cells are more precise, so choosing a cell forces the publisher to include a measure of uncertainty for any spatial measure. Cells are convenient units of tiling for gridded coverages. Each pixel in a tile corresponding to a larger cell can represent a measurement made on a smaller cell in the hierarchy below. The OGC published a standard specification of DGGS in August 2017 as “Topic 21: Discrete Global Grid Systems Abstract Specification” [OGC-15-104r5].

The ANU-LED example in this document does not require the use of a DGGS. However, the DGGS has some convenient properties that make it particularly suitable for Linked Data. First, each DGGS cell has a unique identifier, so it is easy to generate natural URIs for each chunk of data. Second, the DGGS we use, rHEALPix [rHealPIX], defines cell geometries so that cells at the same level of the hierarchy have equal areas. This makes rHEALPix a suitable format for storing multiple datasets at different resolutions, or several different resolution views of the same dataset. The equal-area constraint means that different resolution pixels are directly comparable, and no resampling is required [Avoid coordinate transformations], as advised by BP Choose the coordinate reference system to suit your user's applications. Third, the hierarchical nature of the DGGS makes it convenient to implement spatial optimizations when responding to queries, by pruning the tree early to eliminate whole regions of unpromising cells that fall outside the desired area.
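The first and third properties can be illustrated with plain string operations on cell identifiers. The sketch below assumes identifiers of the form used in the examples (a face letter followed by one digit per level, as in R000), nine children per cell, and a made-up base URI. None of the function names come from an rHEALPix library.

```python
# Assumptions: BASE is a placeholder URI prefix, and every cell has
# nine children labelled with the digits 0-8.
BASE = "http://www.example.org/led/cell/"
CHILD_DIGITS = "012345678"


def level(cell_id):
    """Depth in the hierarchy: one digit per level after the face letter."""
    return len(cell_id) - 1


def parent(cell_id):
    """Identifier of the enclosing cell, or None for a top-level cell."""
    return cell_id[:-1] if len(cell_id) > 1 else None


def children(cell_id):
    """Identifiers of the cells one level below."""
    return [cell_id + d for d in CHILD_DIGITS]


def contains(ancestor, cell_id):
    """A cell lies inside another exactly when its id extends the other's."""
    return cell_id.startswith(ancestor)


def cell_uri(cell_id):
    """Mint a URI for a cell from its identifier."""
    return BASE + cell_id


def prune(cell_ids, region_cell):
    """Keep only the cells that fall inside region_cell."""
    return [c for c in cell_ids if contains(region_cell, c)]


print(cell_uri("R000"))
print(prune(["R00004", "R0010", "N12"], "R000"))
```

Because containment reduces to a prefix test, a query processor can discard an entire subtree of cells with a single comparison.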

Data structures other than DGGS are also amenable to these approaches, for example n-dimensional gridded data, whether geospatial or not, and hierarchical structures such as tile sets, octrees and quadtrees.

5. Scalable Implementation

A proof of concept demonstrating the ANU-LED example with a SPARQL query system employing rHEALPix to retrieve satellite imagery has been implemented. This section briefly describes some of the strategies employed to make the implementation efficient. All code referenced here is available on GitHub [led-github].

As discussed previously, scalable implementations of a Data Cube for Earth observations must grapple with the verbosity of RDF representations relative to specialized coverage formats like GeoTIFF. This precludes materializing the entire dataset as RDF, storing it on disk, and serving it using an off-the-shelf triple store. Instead, implementations must employ a “virtual graph”, which can be used to service SPARQL queries without materializing all triples in advance. This approach has precedent: virtual graphs have been used to provide linked data interfaces to relational databases, RSS feeds, and ordinary HTML pages with no semantic markup [ perf-vgraph].
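The idea of a virtual graph can be sketched in a few lines: triples are produced on demand from a compact store and are never held as RDF. In the sketch below, the TILES dictionary is a stand-in for the real coverage store, the R001 entry and its image URL are invented for illustration, and the function names are hypothetical rather than taken from [led-github].

```python
from typing import Iterator, Optional, Tuple

Triple = Tuple[str, str, object]

# Stand-in for the coverage store (for example an HDF5 file).
TILES = {
    "R000": {"level": 3,
             "image": "http://www.example.org/led-example-image-R000"},
    "R001": {"level": 3,
             "image": "http://www.example.org/led-example-image-R001"},
}


def tile_triples(tile_id: str) -> Iterator[Triple]:
    """Generate the triples describing one tile from its compact record."""
    rec = TILES[tile_id]
    s = ":" + tile_id
    yield (s, "rdf:type", ":GridSquare")
    yield (s, ":dggsCell", tile_id)
    yield (s, ":dggsLevelSquare", rec["level"])
    yield (s, ":dataImageValue", rec["image"])


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: object = None) -> Iterator[Triple]:
    """Yield triples matching a pattern, where None acts as a wildcard.

    Triples are created lazily, only for the tiles the pattern can touch.
    """
    tile_ids = [s[1:]] if s is not None else list(TILES)
    for tile_id in tile_ids:
        if tile_id not in TILES:
            continue
        for t in tile_triples(tile_id):
            if (p is None or t[1] == p) and (o is None or t[2] == o):
                yield t


print(list(match(s=":R000", p=":dataImageValue")))
```

A SPARQL engine layered on top of such a match function sees an ordinary graph, while the storage cost remains that of the underlying coverage format.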

For the purpose of illustrating how triple stores service SPARQL queries—regardless of whether they are backed by virtual or materialized graphs—consider the query below.

Example 6
SELECT ?s ?v WHERE {
    ?s a :egType ;
        rdfs:label "Example" ;
        :value ?v .
    FILTER (?v < 15)
}

The heart of the query above is a Basic Graph Pattern (BGP) which specifies the triples to be accessed. In this case, the BGP contains three patterns. Written explicitly, they are:

Example 7
?s a :egType .
?s rdfs:label "Example" .
?s :value ?v .

Conceptually, a triple store will service the query above by iterating through each triple pattern in turn. First, a set of bindings for ?s will be generated that are consistent with ?s a :egType. That set of bindings will then be filtered by matching them against the pattern ?s rdfs:label "Example". The final ?s :value ?v will further filter the bindings for ?s by considering only subjects ?s with a :value property; it will also introduce a corresponding set of bindings for ?v. Having generated all bindings relevant to the BGP, a typical triple store will then apply the FILTER condition to each. This general approach works for both traditional storage backends (like on-disk RDF databases) and non-traditional ones (like virtual graphs).
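The pattern-by-pattern procedure described above can be sketched as follows. The in-memory triples are invented sample data, and the evaluator is a deliberately naive illustration rather than the algorithm of any particular triple store.

```python
# Invented sample data for the example query.
TRIPLES = [
    (":a", "rdf:type", ":egType"),
    (":a", "rdfs:label", "Example"),
    (":a", ":value", 10),
    (":b", "rdf:type", ":egType"),
    (":b", "rdfs:label", "Example"),
    (":b", ":value", 20),
    (":c", "rdf:type", ":egType"),
    (":c", "rdfs:label", "Other"),
    (":c", ":value", 5),
]


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, binding):
    """Yield every extension of `binding` under which `pattern`
    matches a stored triple."""
    for triple in TRIPLES:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if is_var(term):
                # Bind the variable, or check an existing binding.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def evaluate(bgp, filter_fn):
    """Process the triple patterns in order, then apply the FILTER."""
    bindings = [{}]
    for pattern in bgp:
        bindings = [b2 for b in bindings for b2 in match_pattern(pattern, b)]
    return [b for b in bindings if filter_fn(b)]


BGP = [
    ("?s", "rdf:type", ":egType"),
    ("?s", "rdfs:label", "Example"),
    ("?s", ":value", "?v"),
]

result = evaluate(BGP, lambda b: b["?v"] < 15)
print(result)
```

Each pattern triggers a full scan of the stored triples for every surviving binding, which is the cost that the optimisations discussed next aim to avoid.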

In practice, processing each element of a SPARQL query sequentially is too inefficient to be of use in a large database. Instead, triple stores employ a range of optimisations to combine steps of the query process, speed up selected operations, or minimise the number of bindings produced by each stage, as outlined in [ sparql-opt]. For example, a triple store could speed up matching triples of the form ?s a :egType by keeping an index of all URIs associated with each present rdf:type, or could accelerate BGP matching by reordering the pattern to ensure that the most restrictive patterns are evaluated first.

Although we do not materialise our RDF triples, similar techniques are applicable to our virtual graph middleware. As a simple illustration of the optimisation opportunities available, we have implemented two simple optimisations:

  1. User-supplied triple patterns often generate highly selective constraints on the data returned by a SPARQL query. For example, the pattern ?s :dggsLevelSquare 5 . allows a virtual graph implementation to ignore all observations not corresponding to cells at the fifth level of the DGGS hierarchy. In a naive implementation, only one such triple pattern can be considered at a time; this makes strategies for ordering the patterns essential. In contrast, a virtual graph query processor can consider all supplied constraints in conjunction. For instance, if the user specifies ?s :dggsLevelSquare 5; :etmBand 3, then the virtual graph implementation can safely narrow its search to observations at level 5 of the DGGS hierarchy which correspond to Landsat's third ETM band.
  2. Consumers of spatial datasets typically request only a small spatial rectangle of the available data. In SPARQL, such a rectangle can be identified by a FILTER statement restricting the appropriate location properties. By inspecting the contents of FILTER statements, virtual graph implementations can preemptively narrow the set of bindings they generate to include only bindings which are spatially relevant. In general, this approach can yield excellent gains when the spatial extent of queries is small relative to the spatial extent of the overall dataset, which is typical of Earth Observation imagery.
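The logic of the two optimisations can be sketched as follows. The observation records are invented, the function name is hypothetical, and the linear scan only demonstrates how the constraints combine. It is not the code from [led-github], and a real implementation would consult an index rather than visiting every record.

```python
# Invented observation index standing in for the HDF5-backed store.
OBSERVATIONS = [
    {"id": ":p1", "etmBand": 1, "dggsLevelSquare": 5,
     "latMin": -35.20, "longMax": 149.00, "value": 15},
    {"id": ":p2", "etmBand": 1, "dggsLevelSquare": 5,
     "latMin": -35.40, "longMax": 149.00, "value": 22},
    {"id": ":p3", "etmBand": 3, "dggsLevelSquare": 5,
     "latMin": -35.20, "longMax": 149.00, "value": 9},
    {"id": ":p4", "etmBand": 1, "dggsLevelSquare": 4,
     "latMin": -35.20, "longMax": 149.10, "value": 30},
    {"id": ":p5", "etmBand": 1, "dggsLevelSquare": 5,
     "latMin": -35.10, "longMax": 149.20, "value": 41},
]


def scan(constants, bounds):
    """Yield (id, value) for observations satisfying every constraint.

    constants: property -> required value, collected from all triple
               patterns that have a constant object.
    bounds:    property -> (low, high) exclusive limits, extracted from
               the FILTER expression.
    """
    for obs in OBSERVATIONS:
        # Optimisation 1: check all constant constraints together.
        if any(obs[p] != v for p, v in constants.items()):
            continue
        # Optimisation 2: apply the FILTER limits before any binding
        # is generated, instead of filtering afterwards.
        if any(not (lo < obs[p] < hi) for p, (lo, hi) in bounds.items()):
            continue
        yield obs["id"], obs["value"]


rows = list(scan(
    {"etmBand": 1, "dggsLevelSquare": 5},
    {"latMin": (-35.3082, float("inf")),
     "longMax": (float("-inf"), 149.1244)},
))
print(rows)
```

Only observations that pass every test are ever turned into bindings, so the SPARQL engine receives a much smaller intermediate result.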

These simple optimizations can improve query time substantially. Consider the following SPARQL query, which fetches the intensity (?val) and URI (?s) associated with each single-pixel observation in a satellite imagery database. Note the use of the custom properties :latMin and :longMax to define the edges of a bounding box. We have included these in our demonstration system for ease of implementation, but a production system would be expected to use GeoSPARQL-style FILTERs together with the WKT-formatted :bounds predicate used elsewhere in this document.

Example 8
SELECT DISTINCT ?s ?val WHERE {
    ?s a :Pixel ;
        :etmBand "1"^^xsd:int ;
        :dggsLevelSquare "5"^^xsd:int ;
        # See comment above on :latMin/:longMax
        :latMin ?latMin ;
        :longMax ?longMax ;
        :dataPixelValue ?val .
    # Everything north-west of Parliament House
    FILTER (?latMin > -35.3082
        && ?longMax < 149.1244)
}

The above query was executed on a 500MB HDF5 dataset containing over 4000 distinct observations. Repeating the query a thousand times with ten concurrent clients on a desktop machine yielded the following mean running times. In the following, the “naive” implementation simply iterates through the BGP specified above on a pattern-by-pattern basis, subsequently passing results to the SPARQL engine for evaluation against the filter constraint. “Multiple pattern-matching” corresponds to the first optimization identified above, and “additional spatial optimizations” refers to a combination of the first and second optimizations.

Implementation                            Mean runtime (± standard deviation)
Naive                                     378ms (±65.5ms)
…with multiple-pattern matching           35ms (±22.2ms)
…with additional spatial optimisations    17ms (±11.8ms)

“Multiple-pattern matching” is a relatively simple optimization, yet is sufficient to improve query performance tenfold. Accounting for the bounding box constraint specified in the query improves performance by another factor of two. It is likely that further performance gains could be found with more sophisticated optimizations. In particular, processing queries with general polygonal spatial constraints could be further improved by employing an R-tree or some other specialized spatial data structure.

To demonstrate the practical utility of our system, we produced a simple web-based client application. The client application is able to fetch Landsat imagery and its associated metadata via SPARQL queries. It can then overlay the retrieved images on a movable map. As mentioned previously, code for both the client and server is available on GitHub [led-github].

Figure 1 A screenshot of the client application running in a browser, showing a colour Landsat image in the background and a dump of the image's metadata (in JSON) on the left.

6. Use of existing ontologies

RDF makes it easy to re-use terms defined in external ontologies and some of the most widely applicable are explained here. See the ANU-LED example for some specific examples of these.

6.1 SSN

The Semantic Sensor Network ontology [vocab-ssn] defines terms which can be used to describe satellite sensors that collect Earth observation data [ Sensor metadata]. The ANU-LED example illustrates a minimal description of Landsat 8 OLI observations using SSN [ SSN-like representation]. Much more detailed descriptions are possible. In particular, SSN descriptions can be attached to individual tiles [ Quality per sample], demonstrating BP Include spatial metadata in dataset metadata.

Example 9
:exampleDataset a qb:DataSet, prov:Entity ;
    qb:structure :exampleStructure ;
    :instrument :OLI ;
    :satellite :landsat-8 ;
    :coverageSpatialDomain "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral .

:landsat-8 a sosa:Platform ;
    owl:sameAs cci-platform:plat_landsat_8 .

:OLI a sosa:Sensor ;
    sosa:isHostedBy :landsat-8 ;
    sosa:observes :reflectance ;
    owl:sameAs cci-sensor:sens_oli .

:reflectance a sosa:ObservableProperty, ssn:Property, skos:Concept ;
    owl:sameAs sweet:Reflectance ;
    owl:sameAs cci-dataType:dtype_sr .

6.2 PROV-O

The PROV ontology [prov-o] allows the provenance of data to be traced [ Provenance]. It provides terms for describing what entities the data is based on, what processes were used to convert those entities into others and into the final data, and what individuals and organisations were responsible for those processes. PROV-O descriptions can be attached at the dataset level, and also at the individual observation or tile level to indicate precisely from which source material each observation is derived.

Example 10
:exampleDataset a qb:DataSet, prov:Entity ;
    qb:structure :exampleStructure ;
    prov:wasGeneratedBy :ANU-led-resampling .

:ANU-led-resampling a prov:Activity ;
    prov:wasAssociatedWith :DmitryBrizhinev ;
    prov:used :AGDC .

:DmitryBrizhinev a prov:Agent, prov:Person ;
    foaf:givenName "Dmitry"^^xsd:string ;
    foaf:mbox      <mailto:dmitry.brizhinev@anu.edu.au> .

:AGDC a prov:Collection ;
    prov:wasAttributedTo :GeoscienceAustralia ;
    prov:hadMember :example-tile .

:example-tile a prov:Entity ;
    prov:alternateOf <http://dapds00.nci.org.au/thredds/catalog/rs0/tiles/EPSG4326_1deg_0.00025pixel/LS8_OLI_TIRS/148_-035/2016/catalog.html?dataset=rs0/tiles/EPSG4326_1deg_0.00025pixel/LS8_OLI_TIRS/148_-035/2016/LS8_OLI_TIRS_FC_148_-035_2016-01-12T23-55-57.tif> .

:GeoscienceAustralia a prov:Agent, prov:Organization .

:R000 a :GridSquare ;
    qb:dataSet :exampleDataset ;
    :lat "91.6667";
    :long "40.0270";
    :dataImageValue <http://www.example.org/led-example-image-R000> ;
    prov:wasDerivedFrom :example-tile .
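
Once such statements are published, a client can trace where an observation came from by walking the provenance links. The sketch below is a minimal breadth-first traversal over a subset of the statements in the example above. The function name and the choice of which properties to follow are assumptions made for illustration.

```python
from collections import deque

# Subset of the PROV statements from the example above.
PROV = [
    (":R000", "prov:wasDerivedFrom", ":example-tile"),
    (":AGDC", "prov:hadMember", ":example-tile"),
    (":AGDC", "prov:wasAttributedTo", ":GeoscienceAustralia"),
    (":exampleDataset", "prov:wasGeneratedBy", ":ANU-led-resampling"),
    (":ANU-led-resampling", "prov:wasAssociatedWith", ":DmitryBrizhinev"),
    (":ANU-led-resampling", "prov:used", ":AGDC"),
]

# Properties followed from subject to object when tracing back.
FORWARD = {"prov:wasDerivedFrom", "prov:wasGeneratedBy", "prov:used",
           "prov:wasAttributedTo", "prov:wasAssociatedWith"}
# Properties followed from object to subject (member -> collection).
INVERSE = {"prov:hadMember"}


def lineage(start):
    """Return the resources reachable from `start` through provenance
    links, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in PROV:
            nxt = None
            if s == node and p in FORWARD:
                nxt = o
            elif o == node and p in INVERSE:
                nxt = s
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(lineage(":R000"))
print(lineage(":exampleDataset"))
```

Starting from a single tile, the traversal reaches the source tile, the collection it belongs to, and the organisation responsible for that collection.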

6.3 Latitude and longitude

Spatial data best practice eschews unqualified uses of “latitude” and “longitude”. Commonly, these terms refer to the WGS-84 Coordinate Reference System (CRS), but data published according to BP State how coordinate values are encoded should always make its CRS explicit [Georectification]. In RDF, the WGS-84 geo vocabulary is often used, with its provided geo:lat and geo:long properties. QB4ST defines the qb4st:crs property to identify a CRS definition [CRS definition, Spatial metadata]. The RDF Data Cube and QB4ST make it easy to define several CRSs and to use them simultaneously, providing clients with several views of the data [Multiple CRSs]. In the example below, a grid square can be identified by the latitude and longitude of its centroid, by its boundary, or by its rHEALPix cell.

Example 11
:lat a qb4st:SpatialDimension ;
    rdfs:subPropertyOf geo:lat ;
    qb4st:crs <http://epsg.io/4326> ;
    qb4st:crslabel "WGS84" .

:long a qb4st:SpatialDimension ;
    rdfs:subPropertyOf geo:long ;
    qb4st:crs <http://epsg.io/4326> ;
    qb4st:crslabel "WGS84" .

:rHEALPix a qb4st:CRS .

:dggsCell a qb4st:SpatialDimension ;
    qb4st:crs :rHEALPix ;
    qb4st:crslabel "rHEALPix WGS84 Ellipsoid" ;
    rdfs:range xsd:string .

:bounds a qb:AttributeProperty, qb4st:SpatialProperty ;
    rdfs:subPropertyOf ogc:asWKT ;
    qb4st:crs <http://epsg.io/4326> ;
    qb4st:crslabel "WGS84" .

:latitudeComponent a qb4st:SpatialComponentSpecification ;
    qb:dimension :lat .

:longitudeComponent a qb4st:SpatialComponentSpecification ;
    qb:dimension :long .

:dggsCellComponent a qb4st:SpatialComponentSpecification ;
    qb:dimension :dggsCell .

:boundsComponent a qb4st:SpatialComponentSpecification ;
    qb:attribute :bounds .

:R000 a :GridSquare ;
    :lat "40.0270";
    :long "91.6667";
    :dggsCell "R000" ;
    :bounds  "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral .
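
Because the same grid square is described in several ways, a client may want to check that the views agree. The following Python sketch parses the :bounds literal of :R000 and tests whether the centroid coordinate pair falls inside its bounding box. It assumes that each WKT pair is read as x y with x taken as longitude, so 91.6667 is placed on the x axis and 40.0270 on the y axis; the function names are hypothetical and only single-ring polygons are handled.

```python
import re


def parse_wkt_polygon(wkt):
    """Return the ring of a single-ring WKT POLYGON as a list of (x, y) floats."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*", wkt)
    if match is None:
        raise ValueError("not a single-ring WKT POLYGON: %r" % wkt)
    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        ring.append((float(x), float(y)))
    return ring


def bounding_box(ring):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


# Lexical form of the :bounds value used for :R000.
BOUNDS = "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"
min_x, min_y, max_x, max_y = bounding_box(parse_wkt_polygon(BOUNDS))

# Centroid pair, with x read as longitude and y as latitude (assumption).
x, y = 91.6667, 40.0270
inside = min_x <= x <= max_x and min_y <= y <= max_y
print(inside)
```

A bounding-box test is sufficient here only because the cell boundary is an axis-aligned rectangle; a general polygon would need a proper point-in-polygon test.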

6.4 GeoSPARQL

The GeoSPARQL ontology [geosparql] defines terms for reasoning about objects and shapes in space [Spatial operators]. It allows several encodings, including WKT, to be used to describe polygons [Encoding for vector geometry]. The ANU-LED example uses these terms to define the area covered by individual tiles in the coverage, and also to define the entire spatial domain of a dataset, as required by BP Include spatial metadata in dataset metadata and BP Provide geometries on the Web in a usable way.

Example 12
:exampleDataset a qb:DataSet, prov:Entity ;
    qb:structure :exampleStructure ;
    :coverageSpatialDomain "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral .

:bounds a qb:AttributeProperty, qb4st:SpatialProperty ;
    rdfs:subPropertyOf ogc:asWKT ;
    rdfs:domain :GridSquare ;
    qb4st:crs <http://epsg.io/4326> ;
    qb4st:crslabel "WGS84" ;
    qb:concept sdmx-concept:refArea .

:R000 a :GridSquare ;
    qb:dataSet :exampleDataset ;
    :lat "40.0270";
    :long "91.6667";
    :dataImageValue <http://www.example.org/led-example-image-R000> ;
    :bounds  "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral .
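
Publishing tile areas as WKT means the literals have to be generated from the grid geometry. The Python sketch below shows one way a rectangular cell extent could be serialised into the lexical form used for :bounds. It is an illustration only: the function names are hypothetical, the vertex order simply follows the examples in this section, and the ten-significant-digit formatting is an assumption that may need revisiting for finer grids.

```python
def bbox_to_wkt(min_x, min_y, max_x, max_y):
    """Serialise a rectangle as a closed, single-ring WKT POLYGON."""
    corners = [
        (min_x, max_y),
        (max_x, max_y),
        (max_x, min_y),
        (min_x, min_y),
        (min_x, max_y),  # repeat the first vertex to close the ring
    ]
    # ".10g" drops trailing zeros, so 90.0 is written as "90".
    ring = ", ".join(f"{x:.10g} {y:.10g}" for x, y in corners)
    return f"POLYGON(({ring}))"


def as_turtle_literal(wkt):
    """Wrap a WKT string as a Turtle literal typed with ogc:wktLiteral."""
    return f'"{wkt}"^^ogc:wktLiteral'


wkt = bbox_to_wkt(90, 38.18, 93.33, 41.87)
print(as_turtle_literal(wkt))
```

The printed literal can be used as the object of :bounds or :coverageSpatialDomain in Turtle that declares the ogc: prefix.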

6.5 SKOS concepts

The RDF Data Cube is commonly used in conjunction with a SKOS [skos-reference] concept scheme (such as SDMX-RDF and its concept scheme) to define the meanings of the components [Observed property in coverage]. This approach is also appropriate for coverages, but suitable SKOS concepts do not always exist. Where they do not, they may need to be published along with the data itself.

Example 13
:reflectance a sosa:ObservableProperty, ssn:Property, skos:Concept ;
    owl:sameAs sweet:Reflectance ;
    owl:sameAs cci-dataType:dtype_sr .

:time a qb:DimensionProperty, qb4st:TemporalProperty ;
    rdfs:range xsd:dateTime ;
    qb:concept sdmx-concept:timePeriod .

:satellite a qb:AttributeProperty ;
    rdfs:range sosa:Platform ;
    qb:concept sdmx-concept:collMethod .

:instrument a qb:AttributeProperty ;
    rdfs:range sosa:Sensor ;
    qb:concept sdmx-concept:collMethod .

:dataPixelValue a qb:MeasureProperty ;
    rdfs:range xsd:integer ;
    qb:concept :reflectance ;
    qb:concept sdmx-concept:obsValue .

:rHEALPix a qb4st:CRS .

:dggsCell a qb4st:SpatialDimension ;
    qb4st:crs :rHEALPix ;
    qb4st:crslabel "rHEALPix WGS84 Ellipsoid" ;
    rdfs:range xsd:string ;
    qb:concept sdmx-concept:refArea .

6.6 OWL-Time

Coverages should be annotated with the times at which observations were taken [Coverage temporal extent], following BP Describe properties that change over time. OWL-Time [owl-time] defines terms for time intervals that are useful for expressing the temporal domain of the dataset. It also allows temporal reference systems other than the Gregorian calendar. However, for the Gregorian time instants typically used for Earth observation data, a datatype property using the built-in xsd:dateTime datatype is sufficient.

QB4ST defines terms that work well together with OWL-Time.

Example 14
:coverageTemporalDomain a qb:AttributeProperty, qb4st:TemporalProperty ;
    rdfs:range time:DateTimeInterval ;
    qb:concept sdmx-concept:timePeriod .

:time a qb:DimensionProperty, qb4st:TemporalProperty ;
    rdfs:range xsd:dateTime ;
    qb:concept sdmx-concept:timePeriod .

:exampleDataset a qb:DataSet, prov:Entity ;
    qb:structure :exampleStructure ;
    :coverageSpatialDomain "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral ;
    :coverageTemporalDomain :timeDomain .

:timeDomain a time:Interval ;
    time:hasBeginning :timeBeginning ;
    time:hasEnd :timeEnd .

:timeBeginning a time:Instant ;
    time:inXSDDateTime "2001-10-26T21:32:52"^^xsd:dateTime .

:timeEnd a time:Instant ;
    time:inXSDDateTime "2001-10-26T21:32:52"^^xsd:dateTime .

:R000 a :GridSquare ;
    qb:dataSet :exampleDataset ;
    :time "2001-10-26T21:32:52"^^xsd:dateTime .
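
A client can use these annotations to confirm that an observation falls within the declared temporal domain of its dataset. The Python sketch below does this for the values in Example 14. It assumes the lexical forms are ones that datetime.fromisoformat accepts, which is not the full xsd:dateTime lexical space, and the function names are hypothetical.

```python
from datetime import datetime


def parse_xsd_datetime(lexical):
    """Parse an xsd:dateTime lexical form such as '2001-10-26T21:32:52'."""
    return datetime.fromisoformat(lexical)


def within_interval(instant, beginning, end):
    """True if the instant lies in the closed interval [beginning, end]."""
    return beginning <= instant <= end


# Values copied from :timeBeginning, :timeEnd and the :time of :R000.
time_beginning = parse_xsd_datetime("2001-10-26T21:32:52")
time_end = parse_xsd_datetime("2001-10-26T21:32:52")
observation_time = parse_xsd_datetime("2001-10-26T21:32:52")

in_domain = within_interval(observation_time, time_beginning, time_end)
print(in_domain)
```

The values in the example carry no time zone. Mixing zoned and unzoned values would make the comparison fail in Python, so a real client would need a policy for that case.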

7. Summary of ontologies used

Prefix         Namespace                                              Reference
cci-dataType:  http://vocab-test.ceda.ac.uk/collection/cci/dataType/  [cci]
cci-platform:  http://vocab-test.ceda.ac.uk/collection/cci/platform/
cci-sensor:    http://vocab-test.ceda.ac.uk/collection/cci/sensor/
foaf:          http://xmlns.com/foaf/0.1/                             [foaf]
geo:           http://www.w3.org/2003/01/geo/wgs84_pos#               [w3c-basic-geo]
ogc:           http://www.opengis.net/ont/geosparql#                  [geosparql]
owl:           http://www.w3.org/2002/07/owl#                         [owl2-primer]
prov:          http://www.w3.org/ns/prov#                             [prov-o]
qb:            http://purl.org/linked-data/cube#                      [vocab-data-cube]
qb4st:         http://www.w3.org/ns/qb4st/                            [qb4st]
rdf:           http://www.w3.org/1999/02/22-rdf-syntax-ns#            [rdf-concepts]
rdfs:          http://www.w3.org/2000/01/rdf-schema#                  [rdf-schema]
sdmx-concept:  http://purl.org/linked-data/sdmx/2009/concept#
skos:          http://www.w3.org/2004/02/skos/core#                   [skos-reference]
sosa:          http://www.w3.org/ns/sosa/                             [vocab-ssn]
ssn:           http://www.w3.org/ns/ssn/
sweet:         http://sweet.jpl.nasa.gov/2.3/prop.owl#                [sweet]
time:          http://www.w3.org/2006/time#                           [owl-time]
xsd:           http://www.w3.org/2001/XMLSchema#                      [swbp-xsch-datatypes]

A. Acknowledgements

This work would not have been possible without the TechLauncher program of the Australian National University and its ardent convenor, Shayne Flint. We also thank Matthew Purss of Geoscience Australia for participating in the program and supporting this project. Finally, Ed Parsons of Google, Robert Woodcock of CSIRO, Robert Atkinson of the OGC and Bill Roberts of SWIRRL provided valuable discussions and feedback. The editors gratefully acknowledge the contributions of all members of the Spatial Data on the Web Working Group, its chairs Kerry Taylor and Ed Parsons, and W3C and OGC staff Phil Archer, Francois Daoust and Scott Simmons.

B. References

B.1 Informative references

[OGC-15-104r5]
Topic 21: Discrete Global Grid Systems Abstract Specification. 1 August 2017. URL: http://docs.opengeospatial.org/as/15-104r5/15-104r5.html
[cci]
Data Standards Requirements for CCI Data Producers. ESA Climate Office. 9 March 2015. URL: http://cci.esa.int/sites/default/files/CCI_Data_Requirements_Iss1.2_Mar2015.pdf
[foaf]
FOAF Vocabulary Specification 0.99 (Paddington Edition). Dan Brickley; Libby Miller. FOAF project. 14 January 2014. URL: http://xmlns.com/foaf/spec/
[geosparql]
GeoSPARQL - A Geographic Query Language for RDF Data. Matthew Perry; John Herring. 10 September 2012. URL: http://www.opengeospatial.org/standards/geosparql/
[led-github]
ANU Linked Earth Data. Dmitry Brizhinev; Mike Ledger; Yadunandan Sannappa; Sam Toyer; Zhiduo Zhang. URL: https://github.com/ANU-Linked-Earth-Data
[owl-time]
Time Ontology in OWL. Simon Cox; Chris Little. W3C. 7 September 2017. W3C Proposed Recommendation. URL: https://www.w3.org/TR/owl-time/
[owl2-primer]
OWL 2 Web Ontology Language Primer (Second Edition). Pascal Hitzler; Markus Krötzsch; Bijan Parsia; Peter Patel-Schneider; Sebastian Rudolph. W3C. 11 December 2012. W3C Recommendation. URL: https://www.w3.org/TR/owl2-primer/
[perf-vgraph]
A Performance and Scalability Metric for Virtual RDF Graphs. Michael Hausenblas; Wolfgang Slany; Danny Ayers. June 2007. URL: http://ceur-ws.org/Vol-248/paper2.pdf
[prov-o]
PROV-O: The PROV Ontology. Timothy Lebo; Satya Sahoo; Deborah McGuinness. W3C. 30 April 2013. W3C Recommendation. URL: https://www.w3.org/TR/prov-o/
[qb4st]
QB4ST: RDF Data Cube extensions for spatio-temporal components. Rob Atkinson. W3C. 18 April 2017. W3C Note. URL: https://www.w3.org/TR/qb4st/
[rHealPIX]
The rHEALPix Discrete Global Grid System. R. G. Gibb. 2016. URL: http://iopscience.iop.org/1755-1315/34/1/012012
[rdf-concepts]
Resource Description Framework (RDF): Concepts and Abstract Syntax. Graham Klyne; Jeremy Carroll. W3C. 10 February 2004. W3C Recommendation. URL: https://www.w3.org/TR/rdf-concepts/
[rdf-schema]
RDF Schema 1.1. Dan Brickley; Ramanathan Guha. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/rdf-schema/
[sdw-bp]
Spatial Data on the Web Best Practices. Jeremy Tandy; Linda van den Brink; Payam Barnaghi. W3C. 11 May 2017. W3C Note. URL: https://www.w3.org/TR/sdw-bp/
[sdw-ucr]
Spatial Data on the Web Use Cases & Requirements. Frans Knibbe; Alejandro Llaves. W3C. 25 October 2016. W3C Note. URL: https://www.w3.org/TR/sdw-ucr/
[skos-reference]
SKOS Simple Knowledge Organization System Reference. Alistair Miles; Sean Bechhofer. W3C. 18 August 2009. W3C Recommendation. URL: https://www.w3.org/TR/skos-reference/
[sparql-opt]
Tutorial: SPARQL Optimization 101. Rob Vesse. URL: https://events.linuxfoundation.org/sites/events/files/slides/SPARQL%20Optimisation%20101%20Tutorial.pdf
[swbp-xsch-datatypes]
XML Schema Datatypes in RDF and OWL. Jeremy Carroll; Jeff Pan. W3C. 14 March 2006. W3C Note. URL: https://www.w3.org/TR/swbp-xsch-datatypes/
[sweet]
SWEET Overview. JPL. URL: https://sweet.jpl.nasa.gov/
[vocab-data-cube]
The RDF Data Cube Vocabulary. Richard Cyganiak; Dave Reynolds. W3C. 16 January 2014. W3C Recommendation. URL: https://www.w3.org/TR/vocab-data-cube/
[vocab-ssn]
Semantic Sensor Network Ontology. Armin Haller; Krzysztof Janowicz; Simon Cox; Danh Le Phuoc; Kerry Taylor; Maxime Lefrançois. W3C. 7 September 2017. W3C Proposed Recommendation. URL: https://www.w3.org/TR/vocab-ssn/
[w3c-basic-geo]
Basic Geo (WGS84 lat/long) Vocabulary. Dan Brickley. W3C Semantic Web Interest Group. 1 February 2006. URL: https://www.w3.org/2003/01/geo/