Abstract

This document advises on best practices related to the publication and usage of spatial data on the Web; the use of Web technologies as they may be applied to location. The best practices are intended for practitioners, including Web developers and geospatial experts, and are compiled based on evidence of real-world application. These best practices suggest a significant change of emphasis from traditional Spatial Data Infrastructures by adopting a Linked Data approach. As location is often the common factor across multiple datasets, spatial data is an especially useful addition to the Linked Data cloud; the 5 Stars of Linked Data paradigm is promoted where relevant.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

As a First Public Working Draft, this document is incomplete. The editors seek to illustrate the full scope of the best practices- albeit with the details missing at this stage. In particular, the examples for each best practice are largely incomplete. The editors intend to compile a much richer set of examples in the period leading up to publication of the next Working Draft. Feedback is requested on the scope of this document and the best practices herein. The editors are particularly keen for reviewers to cite examples that may be used to further illustrate these best practices.

The charter for this deliverable states that the best practices will include "an agreed spatial ontology conformant to the ISO 19107 abstract model and based on existing available ontologies [...]". Rather than creating a new spatial ontology, the editors aim to provide a methodology that will help data publishers choose which existing spatial ontology is relevant for their application. If deemed necessary to meet the stated requirements (see [SDW-UCR]), a new spatial ontology, or elements that extend the existing spatial ontologies, will be established in subsequent Working Draft releases.

The editors also intend to provide supplementary methods to navigate through the best practices in order to increase the utility of this document. This will be addressed for the next Working Draft release.

For OGC: This is a Public Draft of a document prepared by the Spatial Data on the Web Working Group (SDWWG) — a joint W3C-OGC project (see charter). The document is prepared following W3C conventions. The document is released at this time to solicit public comment.

This document was published by the Spatial Data on the Web Working Group as a First Public Working Draft. If you wish to make comments regarding this document, please send them to public-sdw-comments@w3.org (subscribe, archives). All comments are welcome.

Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

The group does not expect this document to become a W3C Recommendation.

This document is governed by the 1 September 2015 W3C Process Document.

Table of Contents

1. Introduction

This section is non-normative.

1.1 General introduction

Data on the Web Best Practices ([DWBP]) outlines a growing interest in publishing and consuming data on the Web. Very often the common factor across multiple datasets is the location data. Spatial data, or data related to a location, is what this Best Practice document is all about.

Definition of "spatial data" is required.

It is not that there is a lack of spatial data on the Web; the maps, satellite and street-level images offered by search engines are familiar, and there are many other examples of spatial data being used in Web applications. However, the data that has been published is often difficult to find and problematic to access for non-specialist users. The key problems we are trying to solve in this document are discoverability and accessibility, and our overarching goal is to bring the publication of spatial data into the Web mainstream as a means of solving these twin problems.

Is "interoperability" also a top-level problem (alongside discoverability and accessibility)?

Different groups of people have a need for spatial data on the Web.

Commercial operators, including search engine operators, invest a great deal of time and effort in generating geographical databases that mirror Web content, with the geographical context often added manually or at best semi-automatically. This process would be far more efficient if data were published on the Web with the appropriate geographic information at the source, so that it can be found and accessed using the standard mechanisms of the Web.

Geospatial experts who try to find and use data published in Spatial Data Infrastructures (SDIs) can get frustrated by the fact that most Web services available for spatial data are in fact Web Map Service (WMS) instances, which serve static pictures, not data. Web Feature Service (WFS) instances, which allow you to retrieve the data itself, also exist, but are far less common. One could ask: do we really have a Spatial Data Infrastructure, or is it mostly just a 'spatial picture infrastructure'?

Note

If you're not a geospatial expert, this may be the first time you've heard terms like SDI, WMS and WFS. But in fact, there is a whole world of geospatial standards, maintained by the Open Geospatial Consortium (OGC), aimed at publishing geospatial data in a related series of standardized Web services and processing it with specialized tools. These technologies have a steep learning curve.

The intended users of the SDI are experts in the geospatial domain. The OGC standards cover the full range of geospatial use cases – some of which are very complex. Because of this, it requires significant expertise in geospatial information technology to be able to use the SDI.

For Web developers, who come from outside the geospatial domain, the data behind the OGC services is part of the "Deep Web" - the data is published behind specialized Web services and is not easy to get at, unless you're an expert. But Web developers are increasingly creating and using data related to locations, e.g. obtained from GPS enabled mobile devices and sensors, so they are important participants in the business of geospatial data.

The public sector creates a lot of geospatial data, much of which is open and can be useful to others who may or may not be geospatial data experts. This includes statistical data, for example regions identified by NUTS codes used as territorial units for statistics by the European Union.

Spatial data often has a temporal component; things move, and boundaries change over time. Geospatial data is becoming more and more important for the Web, among other things because of the rise of the Internet of Things. The importance of spatial data goes far beyond serving static 'pixels': meaningful information about objects, with clearly expressed semantics, is required.

In short, there is a large demand for spatial data on the Web. But there are some questions around publishing spatial data on the Web that keep popping up and need to be answered. One is that many relevant standards already exist. These include informal 'community standards' - geospatial formats and/or vocabularies that enjoy widespread adoption ([GeoJSON] being a prime example) - and others for which the formal standardization process has not been completed. Where standards have been completed there are competing ideas, and it is often unclear which one you should use. With these factors in mind, this Best Practice document aims to clarify and formalize the relevant standards landscape.

Analysis of the requirements derived from scenarios that describe how spatial data is commonly published and used on the Web (as documented in [SDW-UCR]) indicates that, in contrast to the workings of a typical SDI, the Linked Data approach is most appropriate for publishing and using spatial data on the Web. Linked Data provides a foundation to many of the best practices in this document.

Note

If you are not a Web developer, Linked Data may be one of those buzz words that doesn’t mean much to you. Essentially it is about publishing bite size information resources about real world or conceptual things, using URIs to identify those things, and publishing well described relationships between them, all in a machine readable way that enables data from different sources to be connected and queried.
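As a minimal sketch (in Turtle, using hypothetical example.org URIs and a hypothetical 'locatedIn' property), a Linked Data description of a monument and its link to the city it stands in could look like this:

  @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix dbpedia: <http://dbpedia.org/resource/> .

  # An HTTP URI identifies the real-world thing; the description links it
  # to another resource published elsewhere on the Web.
  <http://example.org/id/monument/dom-tower>
      rdfs:label "Dom Tower"@en ;
      <http://example.org/def/locatedIn> dbpedia:Utrecht .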

Questions answered in this document include:

The best practices in this document are based on what is already there. It is not meant to be a 'best theories' document: it takes its examples and solutions from what is being done in practice right now. The examples point to publicly available real-world datasets. In other words, this document is, as much as possible, evidence-based. And where real-world practice is missing, it provides clearly identified recommended practices.

The best practices in this document are designed to be testable by humans and/or machines in order to determine whether you are following them or not. They are based on the idea of using the Web as a data sharing platform, as described in Architecture of the World Wide Web, Volume One [webarch].

This document complies as much as possible with the principles of the Best Practices for Publishing Linked Data [LD-BP] and the (developing) Data on the Web Best Practices [DWBP]. Where it does not, this will be identified and explained.

Devise a way to make best versus emerging practices clearly recognizable in this document.

1.2 Difference between spatial data on the Web and Spatial Data Infrastructure practice

Need to describe how the proposed best practices differ from typical SDI approaches; content to be added.

2. Audience

This section is non-normative.

Details of "Audience" section overlaps with "Introduction"; redraft to avoid duplication.

The audience is the broadest community of Web users possible, three important groups of which are described below. Application and tool builders addressing the needs of the mass consumer market should find value and guidance in the document.

2.1 Spatial data custodians

These are people who already have a spatial dataset (or more than one) as part of existing SDIs and they want to publish it as "spatial data on the web" so that data becomes "mashable". They are vital in liberating existing spatial data, already published but not visible or linkable.

2.2 Web developers

These are people who just want to work with (find, publish, use) spatial data on the Web and are not necessarily experts in spatial technology. Spatial data is just one facet of the information space they work with, and they want to use Web technologies to work with spatial data. These web developers will be writing Web-based applications that either use spatial data directly or help non-technical users publish spatial information on the Web, for example, people who are publishing content about their village fête or local festival including relevant spatial information.

2.3 Content publishers

The first two audience groups mentioned are user types; but spatial data publishers are a third important audience for this document. They want to know how to publish their spatial data so that it can be used to its full potential.

Are "content publishers" sufficiently different from the other defined audience categories? Do we need this category?

3. Scope

This section is non-normative.

This document extends the scope of the [DWBP] to advise on best practices related to the publication and usage of spatial data on the Web, including sensor data. Spatial data concerns resources that have physical extent, from buildings and landmarks to cells on a microscope slide. Where [DWBP] is largely concerned with datasets and their distributions (as defined in [vocab-dcat]), this document focuses on the content of the spatial datasets: how to describe and relate the individual resources and entities (e.g. SpatialThings) themselves. The best practices included in this document are intended for practitioners, encouraging publication and/or re-use of spatial data (datasets and data streams) on the Web.

Best practices described in this document fulfill the requirements derived from scenarios that describe how spatial data is commonly published and used on the Web (use cases and requirements are documented in [SDW-UCR]). In line with the charter, this document provides advice on:

Location is often the common factor across multiple datasets, which makes spatial data an especially useful addition to the Linked Data cloud. Departing from the typical approach used in Spatial Data Infrastructures, these best practices promote a Linked Data approach (6.3 Linking spatial data).

Given our focus on spatial data, best practices for publishing any kind of data on the Web are deemed to be out of scope. Where relevant to discussion, such best practices will be referenced from other publications including [DWBP] and [LD-BP]. Other aspects that are out of scope include best practices for:

Compliance with each best practice in this document can be tested programmatically and/or by human inspection. However, note that determining whether a best practice has been followed should be judged on the intended outcome rather than the possible approach to implementation, which is offered as guidance; implementation approaches are subject to change as technology and practices evolve and improve.

4. Best Practices Template

This section presents the template used to describe Spatial Data on the Web Best Practices.

This is the one used by DWBP - is this fit for purpose for SDW?
Need to provide summary in 'approach to implementation' plus drill-down for more detailed information
Need to review best practices text to ensure that they consistently use an imperative style, like "xx should be yyy"

Best Practice Template

Short description of the BP

Why

This section answers crucial questions:

  • Is the use case specifically about Spatial data on the Web? (Resolved 29/1/2015)
  • Is the use case including data published, reused, and accessible via Web technologies? (Discussed 25/2/2015, agreed)
  • Does the use case have a description that can lead to a testable requirement? (Discussed 25/2/2015, wording?)
  • Public vs private Web (suggestion to be worded, discussed 25/2/2015)
A full text description of the problem addressed by the best practice may also be provided. It can be any length but is likely to be no more than a few sentences.

Intended Outcome

What it should be possible to do when a data publisher follows the best practice.

Possible Approach to Implementation

A description of a possible implementation strategy is provided. This represents the best advice available at the time of writing but specific circumstances and future developments may mean that alternative implementation methods are more appropriate to achieve the intended outcome.

How to Test

Information on how to test whether the BP has been met. This might or might not be machine testable.

Evidence

Information about the relevance of the BP. It is supported by one or more relevant requirements as documented in the Spatial Data on the Web Use Cases & Requirements document.

5. Best Practices Summary

6. The Best Practices

6.1 Assigning identifiers to real world things and information resources

In the geospatial community, the entities within datasets are usually information resources, representing areas on a map, which in turn represent real world things. These information resources are usually called 'features'. A feature is a representation of a real world thing, and there can be, and often are, multiple features representing the same real world thing. A feature often has, as a property, a geometry which describes the location of the feature.

For example, a lighthouse standing somewhere on the coast is a real world thing. In some dataset, an information record about this lighthouse exists: a 'feature'. In current practice, this feature will often have properties that are about the real thing (e.g. the height of the lighthouse) and properties about the information record (e.g. when it was last modified). There could be several features, in different datasets, that refer to the same lighthouse; one dataset has its location and date it was built, another has data about its ownership or about shipwrecks near the same lighthouse.
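As a minimal sketch of this current practice (in Turtle, with hypothetical URIs and a hypothetical height property), such a feature record might look like:

  @prefix dct:  <http://purl.org/dc/terms/> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

  # A feature: one dataset's information record about the lighthouse
  <http://example.org/dataset-a/feature/lighthouse-17>
      rdfs:label "Example lighthouse"@en ;
      <http://example.org/def/heightInMetres> 34.5 ;   # a property of the real-world thing
      dct:modified "2015-10-01"^^xsd:date .            # a property of the information record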

Mostly, people looking for information are interested in real world things, not in information resources. This means the real world things should get global identifiers so they can be found and referenced. The features and their map representations - geometries and topologies - should have global identifiers too so they can be referenced as well; and must have them if they are managed elsewhere.

Discussion on Features, information resources and real-world Things is unclear and needs redrafting

Best Practice 1: Use globally unique HTTP identifiers for entity-level resources

Entities within datasets SHOULD have unique, persistent HTTP or HTTP(S) URIs as identifiers.

The term "entity-level resources" is confusing and needs to be clarified or replaced.

Why

A lot of spatial data is available 'via the Web' - but not really 'on the Web': you can download datasets, or view, query and download data via web services, but it is usually not possible to reference an entity within a dataset as you would a web page. If this were possible, spatial data would be much easier to reuse and to integrate with other data on the Web.

Intended Outcome

Entities (SpatialThings) described in a dataset will each be identified using a globally unique HTTP URI so that a given entity can be unambiguously identified in statements that refer to that entity.

Possible Approach to Implementation

In order for identifiers to be useful, people should be comfortable creating them themselves without needing to refer to some top-level naming authority, much like how Twitter's hashtags are created dynamically. Good identifiers for data on the web should be dereferenceable/resolvable, which makes it a good idea to use HTTP URIs as identifiers. There is no top-down authority that you have to go to in order to create such identifiers for spatial objects; so if the identifiers you need do not already exist, simply create them yourself. Best Practice 2: Reuse existing (authoritative) identifiers when available explains how to find already existing identifiers you can reuse.

Read [DWBP] Best Practice 11: Use persistent URIs as identifiers within datasets for general discussion on why persistent URIs should be used as identifiers for data entities. Using URIs means the data can be referenced using standard Web mechanisms; making them persistent means the links won't get broken. Note that ideally the URIs should resolve, however, they have value as globally scoped variables whether they resolve or not.

For guidance about how to create persistent URIs, read [DWBP] Best Practice 10: Use persistent URIs as identifiers. Keep in mind not to use service endpoint URLs, as these are usually dependent on technology, implementation, and/or API version and will probably change when the implementation changes.
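As an illustrative sketch (all URIs hypothetical), an entity-level identifier minted under a domain you control is preferable to exposing a service endpoint URL:

  @prefix geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

  # A persistent, implementation-neutral HTTP URI for the entity itself
  <http://example.org/id/bridge/42>
      a geo:SpatialThing ;
      rdfs:label "Example bridge"@en .

  # Avoid using a technology-dependent service endpoint URL as the identifier, e.g.
  # http://example.org/geoserver/wfs?service=WFS&version=2.0.0&request=GetFeature&featureID=bridge.42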

update reference to DWBP BP 11 #identifiersWithinDatasets

Complete this section and How to Test section

How to Test

...

Evidence

Relevant requirements: R-Linkability.

Best Practice 2: Reuse existing (authoritative) identifiers when available

Avoid creating multiple identifiers for the same resource

Why

In general, re-using identifiers for well-known resources is a good idea, because it discourages proliferation of disparate copies with uncertain provenance. Linking your own resources to well-known or authoritative resources also makes relationships between your data and other data, which refers to the same well-known resource, discoverable. The result is a network of related resources using the identifiers for the SpatialThings.

In the case of SpatialThings, a simple way of indicating a location is by referencing an already existing named place resource on the Web. For example, DBpedia and GeoNames are existing datasets with well-known spatial resources; besides place names and a lot of other information, a set of coordinates is available for the resources in these datasets. The advantage of referring to these named place resources is that it makes clear that different resources which refer to, for example, http://dbpedia.org/page/Utrecht are all referring to the same city. If these resources did not use a URI reference but a literal value "Utrecht", this could mean the province of Utrecht, the city of Utrecht (both places in the Netherlands), the South African town called Utrecht, or maybe something else entirely.
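A minimal sketch in Turtle (the report URI is hypothetical; http://dbpedia.org/resource/Utrecht is DBpedia's identifier for the city itself, for which http://dbpedia.org/page/Utrecht serves the human-readable page):

  @prefix dct: <http://purl.org/dc/terms/> .

  # Ambiguous: a plain literal could mean the city, the province, or another Utrecht
  <http://example.org/id/report/2015-042> dct:spatial "Utrecht" .

  # Unambiguous: reuse an existing, well-known identifier for the city
  <http://example.org/id/report/2015-042> dct:spatial <http://dbpedia.org/resource/Utrecht> .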

See also Best Practice 22: Link to resources with well-known or authoritative identifiers.

Some content of this BP may be moved to BP link-to-auth-identifiers.

Intended Outcome

Already existing identifiers for spatial resources are reused instead of new ones being created.

Possible Approach to Implementation

If you've got feature data and want to publish it as linked data, the first step is to see if there's an authoritative URI already available that you can reuse. If so, do that; otherwise refer to Best Practice 1: Use globally unique HTTP identifiers for entity-level resources.

DBpedia and GeoNames are examples of popular, community-driven resource collections. Other good sources of resource collections are often found in public government data, such as national registers of addresses, buildings, and so on. Mapping and cadastral authorities maintain datasets that provide geospatial reference data.

See Appendix B. Authoritative sources of geographic identifiers for a list of good sources of geographic identifiers.

How to Test

An automatic check is possible to determine if any of the good sources of geographic identifiers are referenced.

Evidence

Relevant requirements: R-GeoReferencedData, R-IndependenceOnReferenceSystems.

Best Practice 3: Working with data that lacks globally unique identifiers for entity-level resources

Spatial reconciliation across datasets

The term "entity-level resources" is confusing and needs to be clarified or replaced.

Why

There are many mechanisms to reconcile (i.e. find related, map or link) objects from different datasets. When two spatial datasets contain geometries, you can use spatial functions to find out which objects overlap, touch, etc. Based on this spatial correlation you might determine that two datasets are talking about the same places, but this is often not enough. For reasons of efficiency, or simply to be able to use these spatial correlations in a context where spatial functions are not available, it is a good idea to express these spatial relationships explicitly in your data. There is also danger in relying on spatial correlation alone; you might conclude that two resources represent the same thing when in reality they represent, for example, a shop at ground level and the apartment above it.

Intended Outcome

Links between resources in datasets created from spatial correspondences.

Possible Approach to Implementation

If you want to link two spatial datasets, find out if they have corresponding geometries using spatial functions and then express these correspondences as explicit relationships.

In this best practice we only give guidance on spatial reconciliation (e.g. determining that two mentions of Paris refer to the same place). We do not address thematic reconciliation.

If the spatial datasets you want to reconcile are managed in a Geographic Information System (GIS), you can use the GIS spatial functions to find related spatial things. If your spatial things are expressed as Linked Data, you can use [GeoSPARQL], which has a set of spatial query functions available.
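Once a spatial correspondence has been found and judged to mean that two records describe the same place, the link can be recorded explicitly so that consumers do not need spatial functions to discover it. A minimal sketch in Turtle, with hypothetical dataset URIs:

  @prefix skos: <http://www.w3.org/2004/02/skos/core#> .

  # Records from two datasets found, by spatial analysis, to describe the same place
  <http://example.org/dataset-a/place/paris>
      skos:closeMatch <http://example.org/dataset-b/location/3027> .

  # Use owl:sameAs only where you are confident the two resources are truly identical.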

How to express discovered relationships is discussed in Best Practice 13: Assert known relationships.

This Best Practice needs more content.

So far we have discussed the shortcomings of using names as identifiers (and the subsequent need for reconciliation). We also need to discuss assigning URIs based on local identifiers; for example, row numbers from tabular data or feature identifiers from geo-databases.

How to Test

...

Evidence

Relevant requirements: R-Linkability.

Best Practice 4: Provide stable identifiers for Things (resources) that change over time

Even though resources change, it helps when they have a stable, unchanging identifier.

Why

Spatial things can change over time, but as explained in Assigning identifiers to real world things and information resources, their identifiers should be persistent.

Should we reference the paradox of the Ship of Theseus to highlight there is no rigorous notion of persistent identity?

Intended Outcome

Even if a spatial thing has changed, its identifier should stay the same so that links to it don't get broken.

Possible Approach to Implementation

[DWBP] Best Practice 8: Provide versioning information explains how to provide versioning info for datasets. It doesn't provide information about versioning individual resources.

Spatial things can change in different ways, and sometimes the change is such that it's debatable if it's still the same thing. Think carefully about the life cycle of the spatial things in your care, and be reluctant to assign new identifiers. A lake that became smaller or bigger is generally still the same lake.

If your resources are versioned, a good solution is to provide a canonical, versionless URI for your resource, as well as date-stamped versions.
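One possible pattern, sketched in Turtle using Dublin Core terms (all URIs hypothetical):

  @prefix dct: <http://purl.org/dc/terms/> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

  # Canonical, versionless identifier for the spatial thing
  <http://example.org/id/lake/17>
      dct:hasVersion <http://example.org/id/lake/17/2010> ,
                     <http://example.org/id/lake/17/2015> .

  # Each date-stamped version points back to the canonical resource
  <http://example.org/id/lake/17/2015>
      dct:isVersionOf <http://example.org/id/lake/17> ;
      dct:issued "2015-01-01"^^xsd:date .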

How to Test

Check the identifier for any version-dependent components.

Evidence

Relevant requirements: R-Linkability

Best Practice 5: Provide identifiers for parts of larger information resources

Identify subsets of large information resources that are a convenient size for applications to work with

Is the term "subset" correct?

Why

Some datasets, particularly coverages such as satellite imagery, sensor measurement timeseries and climate prediction data, can be very large. It is difficult for Web applications to work with large datasets: it can take considerable time to download the data, and sufficient local storage must be available. To address this challenge, it is often useful to provide identifiers for conveniently sized subsets of large datasets that Web applications can work with.

Intended Outcome

Being able to refer to subsets of a large information resource that are sized for convenient usage in Web applications.

Possible Approach to Implementation

Two possible approaches are described below:

  1. Create named subsets.
    • Determine how users may seek to access the dataset, identifying subsets of the larger dataset that are appropriate for the intended application. A data provider may consider a general approach to improve accessibility of a dataset, while a data user might want to publish details of a workflow for others to reuse referencing only the relevant parts of the large dataset.
    • Given the anticipated access pattern, create new resources and mint a new identifier for each subset.
    • Provide metadata to indicate how a given subset resource is related to the original large dataset, as sketched in the example after this list.
  2. Map a URI pattern to an existing data-access API.
    Note

    Web service URLs are, in general, not good URIs for resources as they are unlikely to be persistent. A Web service URL is often technology- and implementation-dependent, and both are very likely to change with time. For example, consider oft-used parameters such as ?version=. Good practice is to use URIs that will resolve for as long as the resource is relevant and may be referenced by others; identifiers for subsets should therefore be protocol independent.

    • Identify the service end-point that provides access to the larger dataset.
    • Determine which parameters offered by the service end-point are required to construct a meaningful subset.
    • Map these parameters into a URI pattern and configure an HTTP server to apply the necessary URL-rewrite.
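For the first approach, a named subset and its relationship to the original large dataset could be described as follows (a sketch in Turtle using [vocab-dcat] and Dublin Core terms; all URIs hypothetical):

  @prefix dcat: <http://www.w3.org/ns/dcat#> .
  @prefix dct:  <http://purl.org/dc/terms/> .

  # A conveniently sized, separately identified subset of a large coverage dataset
  <http://example.org/id/coverage/sea-surface-temperature/2015-06>
      a dcat:Dataset ;
      dct:title "Sea surface temperature, June 2015 subset"@en ;
      dct:isPartOf <http://example.org/id/coverage/sea-surface-temperature> .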

How to Test

...

Evidence

Relevant requirements: R-Compatibility, R-Linkability, R-Provenance, R-ReferenceDataChunks.

More content needed for this BP.

6.2 Expressing spatial data

It is important to publish your spatial data with clear semantics. The primary use case is that you already have a database of assets and want to publish that data together with its semantics. Another use case is someone wanting to publish information that has a spatial component on the Web, in a form that search engines will understand.

The spatial thing itself as well as its spatial properties have semantics. There are several vocabularies which cover spatial things and spatial properties. If you need extra semantics not available in an existing vocabulary, you should create your own.

How to publish your vocabulary, which describes the meaning of your data, is explained in [LD-BP]. We recommend that you link your own vocabulary to commonly used existing ones because this increases its usefulness. How to do this is out of scope for this document; however, we give some examples of mapping relations you can use from OWL, SKOS and RDFS, and we provide mappings between some commonly used spatial vocabularies.
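As a minimal sketch of such mapping relations (the local class and the second vocabulary are hypothetical), a locally defined class can be related to a commonly used spatial vocabulary like this:

  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .

  # A locally defined class, mapped to a commonly used spatial vocabulary
  <http://example.org/def/Windmill>
      rdfs:subClassOf geo:SpatialThing ;
      # use owl:equivalentClass (or skos:closeMatch) where the correspondence is exact (or approximate)
      owl:equivalentClass <http://other.example.org/def/Windmill> .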

Note

The current list of RDF vocabularies / OWL ontologies for spatial data being considered by the SDW WG are provided below. Some of these will be used in examples. Full details, including mappings between vocabularies, pointers about inconsistencies in vocabularies (if any are evident), and recommendations to avoid their use where these may lead to confusion, will be published in a complementary NOTE: Comparison of geospatial vocabularies.

Vocabularies can be discovered from Linked Open Vocabularies (LOV), using search terms like 'location' or the tags Place, Geography, Geometry and Time.

No attempts have yet been made to rank these vocabularies; e.g. in terms of expressiveness, adoption etc.

Note

The motivation behind the ISA Programme Location Core Vocabulary was to establish a minimal core common to existing spatial vocabularies. However, experience suggests that such a minimal common core is not very useful, as one quickly needs to employ specific semantics to meet one's application needs.

6.2.1 Describing spatial resources

This entire subsection is concerned with helping data publishers choose the right spatial data format or vocabulary. Collectively this section provides a methodology for making that choice. We do this rather than recommending one vocabulary because this recommendation would not be durable as vocabularies are released or amended.

Do we need a subclass of SpatialThing for entities that do not have a clearly defined spatial extent, or a property that expresses the fuzziness of the extent?

Best Practice 6: Provide a minimum set of information for your intended application

When someone looks up a URI for a SpatialThing, provide useful information, using the common representation formats

Why

This makes it possible to distinguish SpatialThings from one another by looking at their properties, e.g. type and label. It also makes it possible to get basic information about a SpatialThing by referring to its URI.

Intended Outcome

A minimum set of information is served for a SpatialThing when its URI is looked up. In general, this makes it possible to look up the properties and features of a SpatialThing, and to get information from machine-interpretable and/or human-readable descriptions.

Possible Approach to Implementation

This requirement specifies that useful information should be returned when a resource is referenced (a minimal sketch follows the list below). This can include:

  • Expressing properties and features of interest for a SpatialThing using common semantic descriptions.
  • Expressing names of places; provide multiple names for your SpatialThings if they are known. These could be toponyms (names that appear on a map) or colloquial names (that local people use for a place). This part will explain in more detail how to provide the names/labels for the spatial things that are referred to (e.g. one way to do this is rdfs:label).
  • A 'place' may have an indistinct (or even undefined) boundary. It is often useful to identify spatial things even though they are fuzzy. For example: 'Renaissance Italy' or 'the American West'.
  • Information (about a SpatialThing; a place) should be provided with information about authority (owner, publisher), timeliness (i.e. is it valid now? is it historical data?) and, (if applicable) quality. It is common, for example, that there exist many maps of a place - none of them the same. In that case users need to know who produced each one, to be able to choose the right one to use.
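A minimal sketch in Turtle of such a description (all URIs hypothetical; the vocabularies shown are one possible choice):

  @prefix geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix dct:  <http://purl.org/dc/terms/> .

  <http://example.org/id/place/american-west>
      a geo:SpatialThing ;
      rdfs:label "the American West"@en , "the Old West"@en ;   # toponym and colloquial name
      dct:description "A region of the western United States with an indistinct boundary."@en ;
      dct:publisher <http://example.org/id/org/example-mapping-agency> .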

How to Test

...

Evidence

Relevant requirements: R-MachineToMachine, R-MultilingualSupport, R-SpatialVagueness

Best Practice 7: How to describe geometry

Geometry data should be expressed in a way that allows its publication and use on the Web.

Why

This best practice helps with choosing the right format for describing geometry based on aspects like performance and tool support. It also helps when deciding whether or not using literals for geometric representations is a good idea.

Intended Outcome

The format chosen to express geometry data should:

  • Support the dimensionality of the geometry;
  • Be supported by the software tools used within data user community;
  • Keep geometry descriptions to a size that is convenient for Web applications.

Possible Approach to Implementation

Steps to follow:

  • Decide on the geometric data representations based on performance; in some cases, geometry can be 95% of the total data size;
  • Determine the dimensionality of geometry data (0d 'point' to 3d 'volume')
  • Choose the right format and decide when to use geometry literals. For geometry literals, several solutions are available, like Well-Known Text (WKT) representations, GeoHash and other geocoding representations. The alternative is to use structured geometry objects as is possible, for example, in [GeoSPARQL] (see the sketch after this list).
  • There are also several suitable binary data formats (e.g. Google's protocol buffers for vector tiling); however, some binary formats do not (effectively) work on the Web as there are no software tools for working with those formats from within a typical Web application; to work with data in such formats, you must first download the data and then work with it locally.
  • There are widespread practices for representing geometric data as linked data, such as using the W3C WGS84 Geo Positioning vocabulary (geo) properties geo:lat and geo:long, which are used extensively for describing geo:Point objects.
  • Concrete geometry types are available, such as those defined in the OpenGIS [Simple-Features] Specification, namely 0-dimensional Point and MultiPoint; 1-dimensional curve LineString and MultiLineString; 2-dimensional surface Polygon and MultiPolygon; and the heterogeneous GeometryCollection.
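For example, a geometry expressed as a Well-Known Text literal using the [GeoSPARQL] vocabulary might look like the following sketch (URIs hypothetical; the 'gsp' prefix abbreviates the GeoSPARQL namespace):

  @prefix gsp: <http://www.opengis.net/ont/geosparql#> .

  <http://example.org/id/building/city-hall>
      a gsp:Feature ;
      gsp:hasGeometry <http://example.org/id/building/city-hall/geometry> .

  <http://example.org/id/building/city-hall/geometry>
      a gsp:Geometry ;
      # a 2-dimensional polygon as a WKT literal (default CRS: WGS84 longitude/latitude)
      gsp:asWKT "POLYGON((5.117 52.090, 5.118 52.090, 5.118 52.091, 5.117 52.091, 5.117 52.090))"^^gsp:wktLiteral .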

How to Test

...

Evidence

Relevant requirements: R-BoundingBoxCentroid, R-Compressible, R-CRSDefinition, R-EncodingForVectorGeometry, R-IndependenceOnReferenceSystems, R-MachineToMachine, R-SpatialMetadata, R-3DSupport, R-TimeDependentCRS, R-TilingSupport.

Best Practice 8: Specify Coordinate Reference System for high-precision applications

A coordinate reference system should be specified for high-precision applications to locate geospatial entities.

Why

The choice of CRS is sensitive to the intended domain of application for the geospatial data. For the majority of applications a common global CRS (WGS84) is fine, but high precision applications (such as precision agriculture and defence) require spatial referencing to be accurate to a few meters or even centimeters.

Add explanation of why there are so many CRSs.

Need to clarify when and why people use different CRS's

Note

The misuse of spatial data, because of confusion about the CRS, can have catastrophic consequences; e.g. both the bombing of the Chinese Embassy in Belgrade during the Balkan conflict and fatal incidents along the East Timor border are generally attributed to spatial referencing problems.

Intended Outcome

A Coordinate Reference System (CRS) sensitive to the intended domain of application (e.g. high precision applications) for the geospatial data should be chosen.

Possible Approach to Implementation

Recommendations about CRS referencing should consider:

  • When a default Coordinate Reference System is sufficient.
  • The choice of CRS should be sensitive to the intended domain of application for the geospatial data.
  • WGS84 is a common choice of CRS - although this in itself is ambiguous: we need to assert whether data relates to the ellipsoid (datum surface) or the geoid (gravitational equipotential surface; EGM96).
  • WGS84 is a reasonable choice for many human-scale activities (e.g. navigation). However, given that the earth's surface is constantly moving (e.g. Australia moves by 7cm per year), WGS84 is not appropriate for precision applications. For example, the defense community uses 12 separate Mercator projections to maintain accuracy around the globe whilst the Australian national mapping authority is considering use of a dynamic datum [citation required].
  • For convenience, the CRS is often designated within the data format or vocabulary specification (e.g. W3C WGS84 Geo Positioning vocabulary) and, therefore, does not appear in the data itself. This is often considered as a default CRS. Data publishers and consumers should make sure they are aware of the specified CRS and any limitations that this may pose regarding the use of the data.
  • Where a specific CRS is required, the data publisher should choose a vocabulary where the CRS can be defined explicitly within the data, as sketched in the example after this list.
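As a sketch of how a non-default CRS can be made explicit in the data, [GeoSPARQL] allows the CRS URI to be stated at the start of a WKT literal (URIs hypothetical; EPSG:28992 is the Dutch national 'RD New' projected CRS):

  @prefix gsp: <http://www.opengis.net/ont/geosparql#> .

  <http://example.org/id/survey-point/0042/geometry>
      a gsp:Geometry ;
      # the CRS is asserted explicitly rather than relying on a default
      gsp:asWKT "<http://www.opengis.net/def/crs/EPSG/0/28992> POINT(136600 456200)"^^gsp:wktLiteral .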

How to Test

...

Evidence

Relevant requirements: R-DefaultCRS

Best Practice 9: How to describe relative positions

Provide a relative positioning capability in which the entities can be linked to a specific position.

Why

In some cases it is necessary to describe the location of an entity in relation to another location or to the location of another entity; for example, south-west of Guildford, or close to London Bridge.

Intended Outcome

It should be possible to describe the location of an entity in relation to another entity or in relation to a specific location, instead of specifying a geometry.

The relative positioning descriptions should be machine-interpretable and/or human-readable.

Possible Approach to Implementation

The relative positioning should be provided as:

  • A positioning capability to describe the position of entities with explicit links to a specific location and/or other entities.
  • Semantic descriptions for relative positions and relative associations to an explicit or absolute positioning capability.

Do we need this as a best practice; if yes, this BP needs more content

How to Test

...

Evidence

Relevant requirements: R-MachineToMachine, R-SamplingTopology.

Best Practice 10: How to describe positional (in)accuracy

Accuracy and precision of spatial data should be specified in machine-interpretable and human-readable form.

Why

The amount of detail that is provided in spatial data and the resolution of the data can vary. No measurement system is infinitely precise, and in some cases the spatial data can be intentionally generalized (e.g. by merging entities, reducing detail, or aggregating the data) [Veregin].

Intended Outcome

When known, the resolution and precision of spatial data should be specified in a way that allows consumers of the data to be aware of the resolution and level of detail of the data.

Possible Approach to Implementation

...

We need some explanations for the approaches to describe positional (in)accuracy.

How to Test

...

Evidence

Relevant requirements: R-MachineToMachine, R-QualityMetadata.

Best Practice 11: How to describe properties that change over time

Entities and their data should have versioning with time/location references

Why

Entities and their properties can change over time, and this change can be related to spatial properties, for example when a spatial thing moves from one location to another, or when it becomes bigger or smaller. For some use cases you need to be able to explicitly refer to a particular version of information that describes a SpatialThing, or to infer which geometry is appropriate at a specific time, based on the versioning. To make this possible, the properties that are described for an entity should have references to the time and location at which the information describing the SpatialThing was captured, and a version history should be retained. This allows you to reference the most recent data as well as previous versions and to follow the changes of the properties.

Intended Outcome

Properties described in a dataset will include a time (and/or location) stamp and also versioning information, allowing changes to be tracked and the most up-to-date property data to be accessed.

Possible Approach to Implementation

Need to include guidance on when a lightweight approach (ignoring the change aspects) is appropriate

When entities and their properties can change over time, or are valid only at a given time, and this needs to be captured, it is important to specify a clear relationship between property data and its versioning information. How properties are versioned should be explained in the specification or schema that defines those properties. Temporal and/or spatial metadata should be provided to indicate when and where the information about the SpatialThing was captured.

For an example of how to version information about entities and their properties and retaining a version history, see version:VersionedThing and version:Version at https://github.com/UKGovLD/registry-core/wiki/Principles-and-concepts#versioned-types.

It is also useful to incorporate information on how often the information might change, i.e. the frequency of update.
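A minimal sketch in Turtle of time-stamping and update-frequency metadata, using Dublin Core terms rather than the registry-core vocabulary mentioned above (URIs hypothetical; the frequency value is taken from the Dublin Core Collection Description frequency vocabulary):

  @prefix dct: <http://purl.org/dc/terms/> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

  <http://example.org/id/floodplain/07>
      # when the information describing this SpatialThing was captured
      dct:modified "2015-06-30"^^xsd:date ;
      # how often the description is expected to change
      dct:accrualPeriodicity <http://purl.org/cld/freq/annual> .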

Data publishers must decide how much change is acceptable before a SpatialThing cannot be considered equivalent. At this point, a new identifier should be used as the subject for all properties about the changed SpatialThing. Also see Best Practice 4: Provide stable identifiers for Things (resources) that change over time.

How to work with data that is such high volume (e.g. sensor data streams) that the data is discarded after a period of time?

How to Test

...

Evidence

Relevant requirements: R-MachineToMachine, R-MovingFeatures, R-Streamable

6.2.2 Publishing data with clear semantics

In most cases, the effective use of information resources requires understanding thematic concepts in addition to the spatial ones; "spatial" is just a facet of the broader information space. For example, when the Dutch Fire Service responded to an incident at a day care center, they needed to evacuate the children. In this case, the second-closest alternative day care center was preferred because it was operated by the same organization as the one that was the subject of the incident, and they knew who all the children were.

This best practice document provides mechanisms for determining how places and locations are related - but determining the compatibility or validity of thematic data elements is beyond our scope; we're not attempting to solve the problem of different views on the same/similar resources.

Note

Thematic semantics are out of scope for this best practice document. For associated best practices, please refer to [DWBP] Metadata, Best Practice 4 Provide structural metadata; and [DWBP] Vocabularies, Best Practice 15 Use standardized terms, Best Practice 16 Re-use vocabularies and Best Practice 17 Choose the right formalization level.

See also [LD-BP] Vocabularies.

Best Practice 12: Use spatial semantics for spatial Things

The best vocabulary should be chosen to describe the available spatial things.

Why

Spatial things can be described using several available vocabularies. A robust methodology or an informed decision-making process should be adopted to choose the best available vocabulary to describe the entities.

Intended Outcome

Entities and their properties are described using common and reusable vocabularies to increase and improve the interoperability of the descriptions.

Possible Approach to Implementation

There are various vocabularies that provide common information (semantics) about spatial things, such as the Basic Geo vocabulary, [GeoSPARQL] or schema.org. This best practice helps you decide which vocabulary to use. The semantic description of entities and their properties should use existing common vocabularies to increase the interoperability with other descriptions that may refer to the same vocabularies. For this it is required to:

  • Go through a selection process to decide on the existing and relevant vocabularies that can be used to describe the spatial things and their properties.
  • Maintain links to the vocabularies in the schema definitions and provide linked-data descriptions.
  • Define location and spatial references using the common vocabulary concepts whenever applicable instead of defining your own location instance.
  • Provide thematic semantics and general descriptions of spatial things and their properties as linked data. They should have URIs which, when looked up, show what they mean. For more information refer to [DWBP] Best Practice 4: Provide structural metadata.

There are different vocabularies that are related to spatial things. This best practice will provide a method for selecting the right vocabulary for your task, in the form of a durable methodology or an actionable selection list.

The Basic Geo vocabulary has a class SpatialThing which has a very broad definition. This can be applicable (as a generic concept) to most of the common use-cases.
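A minimal sketch in Turtle using the Basic Geo vocabulary (URIs hypothetical):

  @prefix geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

  <http://example.org/id/weather-station/260>
      a geo:SpatialThing ;
      rdfs:label "Weather station De Bilt"@en ;
      geo:lat  "52.10" ;
      geo:long "5.18" .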

For some use cases we might need something more specific.

How to Test

...

Evidence

Relevant requirements: R-MachineToMachine, R-MobileSensors, R-MovingFeatures.

We might publish, in the BP document or a complementary note, a set of statements mapping the set of available vocabularies about spatial things. There are mappings available; e.g. GeoNames has a mapping to schema.org: http://www.geonames.org/ontology/mappings_v3.01.rdf

Best Practice 13: Assert known relationships

Spatial relationships between Things should be specified in the form of geographical, topological and hierarchical links.

Why

It is often more efficient to rely on relationships asserted between SpatialThings rather than rely on solutions such as analysis of geometries to find out that two Things are, for example, at the same place, near each other, or one is inside the other. Describing the spatial relationships between SpatialThings can be based on relationships such as topological, geographical and hierarchical (e.g. partOf) links.

Relating SpatialThings to other spatial data enables, for example, digital personal assistants (e.g. Siri, Cortana) to make helpful suggestions or infer useful information, such as the "address" and "description" attributes added to extend the data model of the Geolocation API. See also W3C EMMA 2.0; devices provide location and time-stamp data, and this helps us, for example, in:

  • Determining whether 'this' device interaction was 'within' a car, rather than the device being carried by a pedestrian, may result in different outcomes or suggestions
  • Determining whether an event for a traveler took place after the journey started and before it ended … we can infer the event was en route
  • 'near' is contextual - it depends if you're walking or driving
  • Capturing social information; "I'm on my way to work" rather than "I'm here" … where work is a SpatialThing with indistinct boundaries

Intended Outcome

This requirement allows explicit spatial relationships between Things to be expressed in the form of geographical, topological and hierarchical links, so that no geometric post-processing or inference is needed to find the spatial links.

Possible Approach to Implementation

How to use spatial functions to find out if spatial things have corresponding geometries is described in Best Practice 3: Working with data that lacks globally unique identifiers for entity-level resources. This best practice describes how to express these discovered relationships between resources about physical and conceptual spatial things.

The asserted spatial semantics can include relationships such as nearby, contains, etc. This best practice requires specifying geometric, topological and social spatial relationships. It is also important to determine which relationships are appropriate for a given case (this is beyond the scope of this BP). This best practice requires:

  • The geographical, topological and social hierarchy (part of) should be described with clear semantics and registered with IANA Link relations.
  • Hierarchical relationships (i.e. part of), for example between administrative regions, have a specific need for defining a "Mutually Exclusive, Collectively Exhaustive" (MECE) set.
  • Topological relationships that are described for an entity should have references to concepts such as over, under, etc.
  • Spatial relationships can use concepts such as those of the Region Connection Calculus (RCC8), e.g. contains, overlaps, touches, intersects, adjacent to; other "spatial predicates"; or similar concepts from the Allen calculus.

Social relationships can be defined based on perception; e.g. "samePlaceAs", nearby, south of. These relationships can also be defined based on temporal concepts such as after, around, etc. In current practice, there is no property such as samePlaceAs to express the social notion of place; such a property would enable communities to unambiguously indicate that they are referring to the same place without getting hung up on the semantic complexities inherent in the use of owl:sameAs or similar.
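A minimal sketch in Turtle of asserted topological and hierarchical relationships (the park and region URIs are hypothetical; the topological property is taken from [GeoSPARQL]):

  @prefix gsp: <http://www.opengis.net/ont/geosparql#> .
  @prefix dct: <http://purl.org/dc/terms/> .

  # Topological relationship asserted explicitly, so no geometry processing is needed
  <http://example.org/id/park/vondelpark>
      gsp:sfWithin <http://dbpedia.org/resource/Amsterdam> .

  # Hierarchical (part-of) relationship between administrative regions
  <http://example.org/id/region/city-of-utrecht>
      dct:isPartOf <http://example.org/id/region/province-of-utrecht> .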

How to Test

...

Evidence

Relevant requirements: R-MachineToMachine, R-SamplingTopology, R-SpatialRelationships, R-SpatialOperators.

Which vocabularies out there have social spatial relationships? FOAF, GeoNames, ...

6.2.3 Temporal aspects of spatial data

Temporal relationship types will be described here and be entered eventually as link relationship types into the IANA registry, Link relations, just like the spatial relationships.

In the same sense as with spatial data, temporal data can be fuzzy.

Note

Retain section; point to where temporal data is discussed in detail elsewhere in this document.

6.2.4 Spatial data from sensors and observations

The best practices described in this document will incorporate practice from both Observations and Measurements [OandM] and W3C Semantic Sensor Network Ontology [SSN].

See also W3C Generic Sensor API and OGC Sensor Things API. These are more about interacting with sensor devices.

Best Practice 14: Provide context required to interpret observation data values

Observation data should be linked to spatial, temporal and thematic information that describes the data.

Why

Processing and interpreting observation and measurement data in many use cases will require other contextual information, including spatial, temporal and thematic information. This information should be specified as explicit semantic data and/or be provided as links to other resources.

Intended Outcome

The contextual data will specify spatial, temporal and thematic data and other information that can assist in interpreting the observation data; this can include information related to quality, observed property, location, time, topic, type, etc.

Possible Approach to Implementation

Providing the context required to interpret observation values involves the following:

  • Specify explicit semantics that describe temporal, spatial and thematic information related to an entity.
  • Provide links to other related resources that can describe contextual information related to the observation data.
  • Specify provenance and other information related to the observation.
  • Information related to the resource that provides the data (and properties of that resource/device, e.g. quality of measurement) can also help to interpret the observation data more effectively. For more information refer to the SSN Ontology; a sketch follows this list.
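A minimal sketch in Turtle of an observation with its spatial, temporal and thematic context, using terms from the [SSN] ontology and OWL-Time (all example.org URIs hypothetical):

  @prefix ssn:  <http://purl.oclc.org/NET/ssnx/ssn#> .
  @prefix time: <http://www.w3.org/2006/time#> .
  @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

  <http://example.org/id/observation/aq-0042/2015-06-30T10-00>
      a ssn:Observation ;
      ssn:observedProperty  <http://example.org/def/property/NO2-concentration> ;   # thematic context
      ssn:featureOfInterest <http://example.org/id/air/utrecht-city-centre> ;       # spatial context
      ssn:observedBy        <http://example.org/id/sensor/aq-station-0042> ;        # provenance
      ssn:observationResultTime
          [ a time:Instant ; time:inXSDDateTime "2015-06-30T10:00:00Z"^^xsd:dateTime ] .  # temporal context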

How to Test

...

Evidence

Relevant requirements: R-ObservedPropertyInCoverage, R-QualityMetadata, R-SensorMetadata, R-SensingProcedure, R-UncertaintyInObservations.

Best Practice 15: Describe sensor data processing workflows

Processing steps that are used in the collection and publication of sensor data should be specified as semantic data associated with the sensor observations.

Why

Sensor data often goes through different pre-processing steps before it is made available to end-users. Providing information about the processes and workflows that are undertaken in the collection and preparation of sensor data helps users understand how the data has been modified and decide whether the data is appropriate for a given application/purpose.

Intended Outcome

Explicit semantic descriptions and/or links to external resources describe the processing workflows used in the collection and preparation of the sensor data.

Possible Approach to Implementation

Processing workflows are often employed to transform raw observation data into more usable forms. For example, satellite data often undergoes multiple processing steps (e.g. pixel classification, georeferencing) to create usable products. It is important to understand the provenance of the data and how it has been modified in order to determine whether the resulting data product can be used for a given purpose. This will require:

  • Providing explicit semantics or links related to provenance data and semantic description of processing workflows that are applied to the raw data.
  • Describing processing methods and their parameters (for example aggregation methods and their settings and functions that are applied).
  • Providing links to the original data for each step (if the data is available).
  • The W3C PROV ontology can be used to describe the processing steps; a sketch follows this list.
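A minimal sketch in Turtle using the W3C PROV ontology (all URIs hypothetical):

  @prefix prov: <http://www.w3.org/ns/prov#> .

  # The published data product, the raw scene it was derived from,
  # and the processing step that generated it
  <http://example.org/id/product/ndvi/2015-06>
      a prov:Entity ;
      prov:wasDerivedFrom <http://example.org/id/scene/satellite-2015-06-30> ;
      prov:wasGeneratedBy <http://example.org/id/activity/ndvi-calculation/2015-06> .

  <http://example.org/id/activity/ndvi-calculation/2015-06>
      a prov:Activity ;
      prov:used <http://example.org/id/scene/satellite-2015-06-30> .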

How to Test

...

Evidence

Relevant requirements: R-ObservationAggregations, R-Provenance.

Best Practice 16: Relate observation data to the real world

Provide links between the observation and measurement data and the real world objects and/or subject of interest.

Why

Observation and measurement data usually represents a feature of interest related to Things: some thing or phenomenon in the real world that is being observed and measured. Linking the observation and measurement data to real world concepts and their features of interest helps in interpreting and using the data more effectively and makes their relationships with concepts in the real world explicit.

Intended Outcome

It should be possible for data consumers to interpret the meaning of data by referring to real world concepts and features of interest related to Things that are represented by the data.

Possible Approach to Implementation

Real world concept description metadata should include the following information:

  • Concepts of sampling feature from observation and measurement data.
  • Representation of the subject of interest and more specific concepts of "specimen".
  • Links to the Thing that the observation and measurement data is related to.

How to Test

...

Evidence

Relevant requirements: R-SamplingTopology.

Best Practice 17: How to work with crowd-sourced observations

Crowd-sourced data should be published as structured data with metadata that allows processing and interpreting it.

Why

Some social media channels do not allow use of structured data. Crowd-sourced data should be published as structured data with explicit semantics and also links to other related data (wherever applicable).

Human-readable and machine-readable metadata should be provided with the crowd-sourced data.

Contextual data related to crowd-sourced data should be available. The quality, trust and density levels of crowd-sourced data vary, so it is important that the data is provided with contextual information that helps people judge the probable completeness and accuracy of the observations.

Intended Outcome

It should be possible for humans to have access to contextual information that describes the quality, completeness and trust levels of crowd-sourced observations. It should be possible for machines to automatically process the contextual information that describes the quality, completeness and trust levels of crowd-sourced observations.

Possible Approach to Implementation

The crowd-sourced data should be published as structured data with metadata that allows it to be processed and interpreted. The contextual information related to crowd-sourced data may be provided according to the vocabulary being developed by the DWBP working group (see [DWBP] Best Practice 7: Provide data quality information); a minimal sketch is given below.
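
For example, contextual quality information might be attached to a crowd-sourced report along the following lines. This is a minimal, hypothetical sketch in Turtle using the draft Data Quality Vocabulary (dqv:) being developed by the DWBP working group; that vocabulary was still evolving at the time of writing, so term names may differ, and the ex: resources are illustrative only.

    @prefix dqv: <http://www.w3.org/ns/dqv#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:  <http://example.org/crowd/> .

    ex:flood-report-77
        dqv:hasQualityMeasurement ex:measurement-1 .

    ex:measurement-1  a dqv:QualityMeasurement ;
        dqv:isMeasurementOf  ex:positionalAccuracyMetric ;    # a dqv:Metric defined elsewhere
        dqv:value            "25"^^xsd:decimal .              # e.g. estimated positional accuracy in metres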

How to Test

...

Evidence

Relevant requirements: R-HumansAsSensors.

Best Practice 18: How to publish (and consume) sensor data streams

The overall (and common) features of a sensor data stream must be described by metadata.

Why

Providing explicit metadata and semantic descriptions of the common features of a sensor data stream allows user agents to avoid repeating that information in each individual data item. It also allows sensor data streams to be discovered automatically on the Web, and their common features to be understood by human users and interpreted by machine agents.

Intended Outcome

  • It should be possible for humans to understand the common features of a sensor data stream.
  • It should be possible for machine agents to interpret the common features of a sensor data stream.
  • It should be possible for machine agents to automatically discover a sensor data stream.

Possible Approach to Implementation

The sensor data stream metadata should include the following overall features of a dataset:

  • The title and a description of the stream.
  • The keywords describing the stream.
  • Temporal and spatial information of the data stream.
  • The contact point of the data stream provider.
  • The access point and service API information.
  • Thematic features of the data provided by the sensor data stream.

The information above should be included in both human-readable and machine-interpretable forms of metadata.

The machine-readable version of the discovery metadata may be provided using models such as the Stream Annotation Ontology (SAO).
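
As an illustration, the discovery metadata for a stream could also be expressed with the Data Catalog Vocabulary (DCAT) along the following lines; this is a minimal, hypothetical sketch in Turtle, the ex: resources are illustrative only, and a stream-specific model such as SAO could provide richer descriptions.

    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct:  <http://purl.org/dc/terms/> .
    @prefix ex:   <http://example.org/streams/> .

    ex:air-quality-stream  a dcat:Dataset ;
        dct:title         "City centre air quality stream"@en ;
        dct:description   "Hourly NO2 and PM10 observations from roadside sensors."@en ;
        dcat:keyword      "air quality", "NO2", "PM10" ;
        dct:spatial       <http://sws.geonames.org/2650225/> ;   # spatial coverage (Edinburgh, GeoNames)
        dcat:contactPoint ex:city-environment-team ;
        dcat:distribution [ dcat:accessURL <http://example.org/streams/air-quality/api> ] .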

How to Test

...

Evidence

Relevant requirements: R-MachineToMachine, R-Streamable, R-TemporalReferenceSystem, R-TimeSeries.

6.3 Linking spatial data

For data to be on the Web, the resources it describes need to be connected, or linked, to other resources. The connectedness of data is one of the fundamentals of the Linked Data approach that these best practices build upon. The 5-star rating for Linked Open Data asserts that to achieve the fifth star you must "link your data to other data to provide context"; doing so benefits both consumers and publishers of the data.

Just like any other type of data, spatial data benefits greatly from linking when published on the Web.

Note

The widespread use of links within data is regarded as one of the most significant departures from contemporary practices used within SDIs.

Crucially, the use of links is predicated on the ability to identify the origin and target, or beginning and end, of the link. Best Practice 1: Use globally unique identifiers for entity-level resources is a prerequisite.

This section extends [DWBP] by providing best practices that are concerned with creating links between the resources described inside datasets. Best practices detailing the use of links to support discovery are provided in section 6.4 Enabling discovery.

Note

[DWBP] identifies Linkability as one of the benefits gained from implementing the Data on the Web best practices (see [DWBP] Data Identification Best Practice 11 Use persistent URIs as identifiers of datasets and Best Practice 12 Use persistent URIs as identifiers within datasets). However, no discussion is provided about how to create the links that use those persistent URIs.

Best Practice 19: Make your entity-level links visible on the web

The data should be published with explicit links to other resources.

Why

Exposing entity-level links to web applications, user-agents and web crawlers allows the relationships between resources to be found without the data user needing to download the entire dataset for local analysis. Entity-level links provide explicit description of the relationships between resources and enable users to find related data and determine whether the related data is worth accessing. Entity-level links can be used to combine information from different sources; for example, to determine correlations in statistical data relating to the same location.

Intended Outcome

  • It should be possible for humans to understand and follow the entity-level links between resources.
  • It should be possible for machine agents to interpret and explore the entity-level links between resources.
  • It should be possible for machine agents to automatically determine (and find) relationships between entities by exploring the links between them.
  • It should be possible for a third party, who owns neither the subject nor the object resource, to publish a set of links between resources.

Possible Approach to Implementation

To provide explicit entity-level links:

  • Publish the data with links (to uniquely identified objects) embedded in the data.
  • Publish sets of links as a complementary resource, for example as a VoID Linkset (a minimal sketch is given after this list).
  • Publish summaries of links for the dataset so that the semantics of the links can be evaluated and accessed if deemed appropriate.
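
For example, a set of links published as a complementary resource might be described as a [void] Linkset. This is a minimal, hypothetical sketch in Turtle; the ex: datasets are illustrative only.

    @prefix void: <http://rdfs.org/ns/void#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix ex:   <http://example.org/links/> .

    ex:scot-to-dbpedia  a void:Linkset ;
        void:subjectsTarget  ex:scottish-statistical-geographies ;  # dataset containing the link subjects
        void:objectsTarget   ex:dbpedia ;                           # dataset containing the link objects
        void:linkPredicate   owl:sameAs .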

The use of Linksets needs further discussion as evidence indicates that it is not yet a widely adopted best practice. It may be appropriate to publish such details in a Note co-authored with the DWBP WG.

Note

[gml] adopted the [xlink11] standard to represent links between resources. At the time of adoption, XLink was the only W3C-endorsed standard mechanism for describing links between resources within XML documents. The Open Geospatial Consortium anticipated broad adoption of XLink over time - and, with that adoption, provision of support within software tooling. While XML Schema, XPath, XSLT, XQuery etc. have seen good software support over the years, this never happened with XLink. The authors of GML note that, given the lack of widespread support, the use of XLink within GML provided no significant advantage over and above the use of a bespoke mechanism tailored to the needs of GML.

Note

[void] provides guidance on how to discover VoID descriptions (including Linksets) - both by browsing the VoID dataset descriptions themselves to find related datasets, and by using /.well-known/void (as described in [RFC5785]).

How would a (user) agent discover that these 3rd-party link-sets existed? Is there evidence of usage in the wild?

Does the [beacon] link dump format allow the use of wild cards / regex in URLs (e.g. URI Templates as defined in [RFC6570])?

The examples contain only outline information; further details must be added.

How to Test

...

Evidence

Relevant requirements: {... hyperlinked list of use cases ...}

Best Practice 20: Provide meaningful links

When providing a link, a data publisher should opt for a level of formal and meaningful semantics that helps data consumers to decide if the target resource is relevant to them.

Why

Formal and meaningful semantics may help to provide explicit specifications that describe the intended meaning of the relationships between the resources.

Providing details of the semantic relationship inferred by a link enables a data user to evaluate whether or not the target resource is relevant to them. Describing the affordances of the target resource (e.g. what that resource can do or be used for) helps the data user to determine whether it is worth following the link.

Intended Outcome

The links provided for the data should allow different data consumers and applications to determine the relevance of a target resource to them. The links should be precise and explicit.

How do we know what is at the end of a link - and what can we do with it / what can it do for us (i.e. the 'affordances' of the target resource)?

How to describe the 'affordances' of the target resource?

Possible Approach to Implementation

Ensure that the type of relationship used to link between resources is explicitly identified. Provide resolvable definitions of those relationship types.
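
For example, a typed spatial relationship (here using the [GeoSPARQL] vocabulary) tells a consumer far more about the target than a generic link does. This is a minimal, hypothetical sketch in Turtle; the ex: resource is illustrative only.

    @prefix geo:  <http://www.opengis.net/ont/geosparql#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://example.org/id/> .

    # A generic link says little about why the target might be relevant:
    ex:monitoring-site-5  rdfs:seeAlso  <http://dbpedia.org/resource/Edinburgh> .

    # A typed relationship lets the consumer decide whether the link is worth following:
    ex:monitoring-site-5  geo:sfWithin  <http://dbpedia.org/resource/Edinburgh> .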

Note

Please refer to Best Practice 13: Assert known relationships for details of relationship types that may be used to describe spatial links (e.g. geographical, hierarchical, topological etc.). [DWBP] Section 9.9 Data Vocabularies provides further information on the use of relationship types described in well-defined vocabularies (see [DWBP] Best Practice 16: Use standardized terms and Best Practice 17: Reuse vocabularies).

How to Test

...

Evidence

Relevant requirements: R-Linkability

Best Practice 21: Link to spatial Things

Create durable links by connecting SpatialThings.

Why

Links enable a network of related resources to be connected together. For those connections to remain useful over a long period of time, both origin and target resources need to have durable identifiers. Typically, it is the SpatialThings that are given durable identifiers (see Best Practice 1: Use globally unique identifiers for entity-level resources) whereas the information resources that describe them (e.g. geometry objects) may be replaced by new versions.

When describing the relationships between related spatial resources, the links should connect SpatialThings.

Intended Outcome

Providing machine-interpretable and/or human-readable durable links between SpatialThings.

Note

This best practice is concerned with the connections between SpatialThings. When describing an individual SpatialThing itself, it is often desirable to decompose the information into several uniquely identified objects. For example, the geometry of an administrative area may be managed as a separate resource due to the large number of geometric points required to describe the boundary.

Also note that in many cases, different identifiers are used to describe the SpatialThing and the information resource that describes that SpatialThing. For example, within DBpedia, the city of Sapporo, Japan, is identified as http://dbpedia.org/resource/Sapporo, while the information resource that describes this resource is identified as http://dbpedia.org/page/Sapporo. Care should be taken to select the identifier for the SpatialThing rather than the information resource that describes it; in the example above, this is http://dbpedia.org/resource/Sapporo.

Possible Approach to Implementation

  • Link to the identifier for the SpatialThing itself rather than the information resource(s) that describe it (a minimal sketch is given below).
  • Be aware that objects that describe SpatialThings (e.g. the geometry of a hydrological catchment) are prone to change as the information is updated.
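
Following the DBpedia example above, a link should target the SpatialThing rather than the page that describes it. This is a minimal, hypothetical sketch in Turtle; the ex: resource and the ex:locatedIn property are illustrative only.

    @prefix ex: <http://example.org/id/> .

    # Link to the SpatialThing itself ...
    ex:ski-resort-3  ex:locatedIn  <http://dbpedia.org/resource/Sapporo> .

    # ... not to the information resource that describes it, i.e. avoid:
    # ex:ski-resort-3  ex:locatedIn  <http://dbpedia.org/page/Sapporo> .
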
Note

Refer to Best Practice 20: Provide meaningful links for further information on providing the semantics for links.

How to link to a resource as it was at a particular time?

How to Test

...

Evidence

Relevant requirements: {... hyperlinked list of use cases ...}

Best Practice 22: Link to resources with well-known or authoritative identifiers

Link your spatial resources to others that are commonly used.

Why

In Linked Data, commonly used resources behave like hubs in the network of interlinked resources. By linking your spatial resources to those in common usage it will be easier to discover your resources. For example, a data user interested in air quality data about the place they live might begin by searching for that place in popular data repositories such as GeoNames, Wikidata or DBpedia. Once the user finds the resource that describes the correct place, they can search for data that refers to the identified resource that, in this case, relates to air quality.

Furthermore, by referring to resources in common usage, it becomes even easier to find those resources as search engines will prioritize resources that are linked to more often.

Refer to Best Practice 24: Use links to find related data for more details about how a user might use links to discover data.

Intended Outcome

Data publishers relate their data to commonly used spatial resources using links. Data users can quickly find the data they are interested in by browsing information that is related to commonly used spatial resources.

Possible Approach to Implementation

The link must convey the semantics appropriate to the application (see Best Practice 13: Assert known relationships and Best Practice 20: Provide meaningful links for more information).

A list of sources of commonly used spatial resources is provided in section B. Authoritative sources of geographic identifiers.
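
For example, a locally minted resource could be related to widely used identifiers for the place it falls within (here the GeoNames and DBpedia identifiers for Edinburgh, connected with a GeoSPARQL relationship). This is a minimal, hypothetical sketch in Turtle; the ex: resource is illustrative only.

    @prefix geo: <http://www.opengis.net/ont/geosparql#> .
    @prefix ex:  <http://example.org/id/> .

    ex:air-quality-zone-12
        geo:sfWithin  <http://sws.geonames.org/2650225/> ,          # Edinburgh in GeoNames
                      <http://dbpedia.org/resource/Edinburgh> .     # Edinburgh in DBpedia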

How to Test

...

Evidence

Relevant requirements: R-Crawlability, R-Discoverability.

Best Practice 23: Link to related resources

Link your spatial resources to other related resources.

Why

Relationships between resources with spatial extent (i.e. size, shape, or position; SpatialThings) can often be inferred from their spatial properties. For example, two resources might occur at the same location, suggesting that they may be the same resource, or one resource might exist entirely within the bounds of another, suggesting some kind of hierarchical containment relationship. However, reconciliation of such resources is complex: it requires some degree of understanding about the semantics of the two, potentially related, resources in order to determine how they are related, if at all.

Rather than expecting that data consumers will have sufficient context to relate resources, it is better for data publishers to assert the relationships that they know about. Not only does this provide data users with clear information about how resources are related, it removes the need for complex spatial processing (e.g. region connection calculus) to determine potential relationships between resources with spatial extent because those relations are already made explicit.

Note

Where possible, existing identifiers should be reused when referring to resources (see Best Practice 2: Reuse existing (authoritative) identifiers when available). However, the use of multiple identifiers for the same resource is commonplace, for example where data publishers from different jurisdictions refer to the same SpatialThing. In this special case, properties such as owl:sameAs can be used to declare that multiple identifiers refer to the same resource. It is often the case that data published from different sources about the same physical or conceptual resource provides different viewpoints.
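
For example, the Scottish Government and UK Government identifiers for the City of Edinburgh Council Area described in appendix B refer to the same SpatialThing and can be related as follows (a minimal sketch in Turtle):

    @prefix owl: <http://www.w3.org/2002/07/owl#> .

    <http://statistics.gov.scot/id/statistical-geography/S12000036>
        owl:sameAs <http://statistics.data.gov.uk/id/statistical-geography/S12000036> .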

Intended Outcome

A data user can browse between (information about) related resources using the explicitly defined links to discover more information.

In the special case that the property owl:sameAs is used to relate identifiers, information whose subject is one of the respective identifiers can be combined.

Note

A data user should always exercise some discretion when working with data from different sources; for example, to determine whether the data is timely, accurate or trustworthy. Further discussion on this issue is beyond the scope of these best practices.

Possible Approach to Implementation

Given their in-depth understanding of the content they publish, data publishers are in a good position to determine the relationships between related resources. Data publishers should analyze their data to determine related resources.

Note

The mechanics of how to decide when two resources are the same are beyond the scope of this best practice. Tools (e.g. OpenRefine and Silk Linked Data Integration Framework) are available to assist with such reconciliation and may provide further insight.

The link must convey the semantics appropriate to the application (see Best Practice 13: Assert known relationships and Best Practice 20: Provide meaningful links for more information).

How to Test

...

Evidence

Relevant requirements: {... hyperlinked list of use cases ...}

6.4 Enabling discovery

[DWBP] provides best practices discussing the provision of metadata to support discovery of data at the dataset level (see [DWBP] section 9.2 Metadata for more details). This mode of discovery is well aligned with the practices used in Spatial Data Infrastructure (SDI) where a user begins their search for spatial data by submitting a query to a catalog. Once the appropriate dataset has been located, the information provided by the catalog enables the user to find a service end-point from which to access the data itself - which may be as simple as providing a mechanism to download the entire dataset for local usage or may provide a rich API enabling the users to request only the required parts for their needs. The dataset-level metadata is used by the catalog to match the appropriate dataset(s) with the user's query.

This section includes a best practice for including spatial information in the dataset metadata, for example, the spatial extent of the dataset.

However, one of the criteria for exposing data on the Web is that it can be discovered directly using search engines such as Google, Bing and Yandex. Current SDI approaches treat spatial data much like books in a library where you must first use the librarian's card catalog index to find the book on the shelf. As for other types of data on the Web, we want to be able to find spatial resources directly; we want to move beyond the two-step discovery approach of contemporary SDIs and find the words, sentences and chapters in a book without needing to check the card catalog first. Not only will this make spatial data far more accessible, it mitigates the problems caused when catalogs have only stale dataset metadata and removes the need for data users to be proficient with the query protocol of the dataset's service end-point in order to acquire the data they need.

In the wider Web, it is links that enable this direct discovery: from user-agents following a hyperlink to find related information to search engines using links to prioritise and refine search results. Whereas section 6.3 Linking spatial data discusses the creation of links, this section is largely concerned with the use of those links to support discovery of the SpatialThings described in spatial datasets.

Best Practice 24: Use links to find related data

Data related to a spatial dataset and its individual data items should be discoverable by browsing links.

Why

In much the same way as the document Web allows one to find related content by following hyperlinks, the links between spatial datasets, SpatialThings described in those datasets and other resources on the Web enable humans and software agents to explore rich and diverse content without the need to download a collection of datasets for local processing in order to determine the relationships between resources.

Spatial data is typically well structured; datasets contain SpatialThings that can be uniquely identified. This means that spatial data is well suited to the use of links to find related content.

Note

The emergency response to natural disasters is often delayed by the need to download and correlate spatial datasets before effective planning can begin. Not only is the initial response hampered, but often the correlations between resources in datasets are discarded once the emergency response is complete because participants have not been able to capture and republish those correlations for subsequent usage.

Intended Outcome

It should be possible for humans to explore the links between a spatial dataset (or its individual items) and other related data on the Web.

It should be possible for software agents to automatically discover related data by exploring the links between a spatial dataset (or its individual items) and other resources on the Web.

It should be possible for a human or software agent to determine which links are relevant to them and which links can be considered trustworthy.

What do we expect user-agents to do with a multitude of links from a single resource? A document hyperlink has just one target; but in data, a resource may be related to many things.

Possible Approach to Implementation

  • For a given subject resource find all the related resources that it refers to.
  • Evaluate the property type that is used to relate each resource in order to determine relevance (see Best Practice 20: Provide meaningful links).
  • Use the metadata for the dataset within which the subject resource is described in order to determine which links to "trust" (e.g. whether to use the data or not); owner / publisher, quality information, community annotations (“likes”), publication date etc.
  • Aggregate links from trusted sources into a database. Referring URLs can be indexed to determine which resources refer to the subject resource, i.e. "what points to me?". These referring links are sometimes called back-links. Dataset-level metadata may provide information regarding the frequency of update for the information sources, enabling one to determine a mechanism for keeping the aggregated link-set fresh.
    Note

    These "back-links" can be traversed to find related information and also help a publisher assess the value of their content by making it possible to see who is using (or referencing) their data.

  • Use network / graph analysis algorithms to determine related information that is not directly connected; i.e. resources that are connected via a chain of links and intermediate resources.

How to Test

...

Evidence

Relevant requirements: {... hyperlinked list of use cases ...}

Best Practice 25: Make your entity-level data indexable by search engines

Search engines should receive a metadata response to an HTTP GET when dereferencing the link target URI.

Why

Current SDI approaches require a 2-step approach for discovery, beginning with a catalog query and then accessing the appropriate data service end-point.

Exposing data on the Web means that it can be discovered directly using search engines. This provides a number of benefits:

  • Spatial data will become far more discoverable because a user does not need any special knowledge to find the SDI's catalog service.
  • Users can discover what data is actually available, rather than relying on the metadata that is infrequently published to the catalog and may have become stale.
  • Users do not need to be proficient with the query protocol of the dataset's service end-point in order to acquire the data they need.

Search engines use links to discover content to index and to prioritize that content within search results; they should be able to use links and URIs in the same way to discover indexable spatial data and to prioritize those spatial data collections within a search result.

Intended Outcome

Spatial data should be discoverable directly using a search engine query.

Spatial data is indexable by search engines; a search engine Web crawler should be able to obtain a descriptive and machine-interpretable metadata response to an HTTP GET when dereferencing the URL of a SpatialThing, and should be able to determine links to related data for the Web crawler to follow.

Note

We make the assertion that data is not really 'on the web' until it's crawlable.

Possible Approach to Implementation

To make your entity-level data indexable by search engines:

  • Generate one HTML page per resource. Include structured markup (see schema.org) that the search engines can use to make more detailed assumptions about your resource(s) and drive better search performance (a minimal sketch of such markup is given after this list). Either create pages beforehand or generate them at query time via an API.
  • Provide a path for search engines to find your pages - either by crawling to each entity from a 'collection' object (which provides the entry point for the web crawler) or by being directed by sitemaps.
  • Search engines may be used to search for spatial data based on identifier, location, time, etc. (some examples are provided in Section III.A of Barnaghi et al.). The search for spatial data can be similar to a typical search query, but in many cases the search and the use of results will be carried out by software agents and other services, such as Google Now, that interpret information.
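
The structured markup for a resource page might, for example, describe a schema.org Place with its coordinates. The sketch below is shown in Turtle for brevity and is hypothetical; in practice the same statements would typically be embedded in the HTML page as JSON-LD, RDFa or Microdata.

    @prefix schema: <http://schema.org/> .
    @prefix ex:     <http://example.org/id/> .

    ex:monitoring-site-5  a schema:Place ;
        schema:name  "Roadside monitoring site 5"@en ;
        schema:geo   [ a schema:GeoCoordinates ;
                       schema:latitude  "55.9533" ;
                       schema:longitude "-3.1883" ] .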

More discussion is required on how to structure meaningful (spatial) queries with search engines (e.g. based on identifier, location, time etc.).

Note

The more spatial datasets are published with structured markup that search engine Web crawlers can index, the more likely it is that search engines will provide richer and more sophisticated search mechanisms to exploit that markup, further improving the ability of users to find spatial data.

How to Test

...

Evidence

Relevant requirements: {... hyperlinked list of use cases ...}

Best Practice 26: Include spatial information in dataset metadata

The description of datasets that have spatial features should include explicit metadata about the spatial information.

Why

It is often useful to provide metadata at the dataset-level. The dataset is the unit of governance for information, which means that details like license, ownership and maintenance regime etc. need only be stated once, rather than for every resource description it contains. Data that is not directly accessible due to commercial or privacy arrangements can also be publicized using summary metadata provided for the dataset. [DWBP] section 9.2 Metadata provides more details.

For spatial data, it is often necessary to describe the spatial details of the dataset - such as extent and resolution. This information is used by SDI catalog services that offer spatial query to find data.

Intended Outcome

Dataset metadata should include the information necessary to enable spatial queries within catalog services such as those provided by SDIs.

Dataset metadata should include the information required for a user to evaluate whether the spatial data is suitable for their intended application.

Possible Approach to Implementation

To include spatial information in dataset metadata one can:

  • Provide information about the spatial attributes of the dataset, such as the spatial extent of the features described by the dataset (a minimal sketch is given after this list).
  • Use common vocabularies for geospatial semantics (e.g. GeoNames) and geospatial ontologies (see the W3C Geospatial Incubator Group (GeoXG) report) to describe the spatial information for the datasets.
  • Provide explicit metadata regarding mobility and/or APIs that can provide information about the spatial features (e.g. location) of a dataset or its data items.
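
For example, the spatial extent of a dataset could be recorded in its DCAT description, either by reference to a named place or as an explicit geometry. This is a minimal, hypothetical sketch in Turtle; the use of locn:geometry (from the W3C/ISA Core Location vocabulary) with a WKT literal is one option among several, and the ex: resources and coordinates are illustrative only.

    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct:  <http://purl.org/dc/terms/> .
    @prefix locn: <http://www.w3.org/ns/locn#> .
    @prefix geo:  <http://www.opengis.net/ont/geosparql#> .
    @prefix ex:   <http://example.org/datasets/> .

    ex:flood-risk-areas  a dcat:Dataset ;
        dct:title   "Flood risk areas"@en ;
        dct:spatial <http://sws.geonames.org/2650225/> ;            # named place (Edinburgh, GeoNames)
        dct:spatial [ a dct:Location ;                              # illustrative bounding box
                      locn:geometry "POLYGON((-3.45 55.85, -3.45 56.00, -3.05 56.00, -3.05 55.85, -3.45 55.85))"^^geo:wktLiteral ] .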

How to Test

...

Evidence

Relevant requirements: R-Discoverability, R-Compatibility, R-BoundingBoxCentroid, R-Crawlability, R-SpatialMetadata and R-Provenance.

6.5 Exposing datasets through Web services

Should content from this section be moved to [DWBP] section 9.11 Data Access?

SDIs have long been used to provide access to spatial data via web services; typically using open standard specifications from the Open Geospatial Consortium (OGC). With the exception of the Web Map Service, these OGC Web service specifications have not seen widespread adoption beyond the geospatial expert community. In parallel, we have seen widespread emergence of Web applications that use spatial data - albeit focusing largely on point-based data.

This section seeks to capture the best practices that have emerged from the wider Web community for accessing spatial data via the Web. While [DWBP] provides best practices discussing access to data using Web infrastructure (see [DWBP] section 9.11 Data Access), this section provides additional insight for publishers of spatial data. In particular, we look at how Application Program Interfaces (API) may be used to make it easy to work with spatial data.

Note

The term API as used here refers to the combination of the set of operations provided and the data content exposed by a particular Web service end-point.

Best Practice 27: Publish data at the granularity you can support

The granularity of the mechanisms provided to access a dataset should be decided based on available resources.

Why

Making data available on the Web requires data publishers to provide some form of access to the data. There are numerous mechanisms available, each providing varying levels of utility and incurring differing levels of effort and cost to implement and maintain. Publishers of spatial data should make their data available on the Web using affordable mechanisms in order to ensure long-term, sustainable access to their data.

Intended Outcome

Data is published on the Web in a mechanism that the data publisher can afford to implement and support throughout the anticipated lifetime of the data.

Possible Approach to Implementation

When determining the mechanism to be used to provide Web access to data, publishers need to assess utility against cost. In order of increasing usefulness and cost:

  1. Bulk-download or streaming of the entire dataset (see [DWBP] Best Practice 20: Provide bulk download)
  2. Generalized query API (such as WFS or SPARQL [sparql11-overview])
  3. Bespoke API designed to support a particular application (see Best Practice 28: Expose entity-level data through 'convenience APIs')
Note

[DWBP] indicates that when data is logically organized as one container but distributed across many URLs, accessing the data in bulk is useful. As a minimum, it should be possible for users to download data on the Web as a single resource; e.g. through bulk file formats.

Note

A data publisher need not incur all the costs alone. Given motivation, third parties (such as the open data community or commercial providers) may provide value-added services that build on simple bulk-download or generalized query interfaces. While establishing such arrangements is beyond the scope of this document, it is important to note that the data publisher should consider the end-to-end data publication chain. For example, one may need to consider including conditions in the usage license about the timeliness of frequently changing data.

How to Test

...

Evidence

Relevant requirements: {... hyperlinked list of use cases ...}

Best Practice 28: Expose entity-level data through 'convenience APIs'

If you have a specific application in mind for publishing your data, tailor the spatial data API to meet that goal.

Why

When access to spatial data is provided by bulk download or through a generalized query service, users need to understand how the data is structured in order to work effectively with that data. Given that spatial data may be arbitrarily complex, this burdens the data user with significant effort before they can even perform simple queries. Convenience APIs are tailored to meet a specific goal; enabling a user to engage with arbitrarily complex data structures using (a set of) simple queries. As stated in [DWBP], an API offers the greatest flexibility and processability for consumers of data; for example, enabling real-time data usage, filtering on request, and the ability to work with the data at an atomic level. If your dataset is large, frequently updated, or highly complex, a convenience API is likely to be helpful.

Intended Outcome

  • The API provides a coherent set of queries and operations that help users achieve common tasks.
  • Data users are presented with a set of simple operations within the API that enable them to get working with the data quickly; they do not need to understand the structure and semantics of the data up front.
  • Data users are able to use progressively more complex features of the API as their understanding of the data increases.
  • The API provides both machine readable data and human readable HTML markup; the latter is used by search engines to index the spatial data.
  • The API encapsulates (i.e. hides) the complexity of the data it exposes.
  • The API is versioned, enabling developers to upgrade their client application at a convenient time.

Possible Approach to Implementation

Note

This best practice extends [DWBP] Best Practice 26: Use an API.

The API should be targeted to deliver a coherent set of functions that are designed to meet the needs of common tasks. Work with your developer community to determine the tasks that they want to do with the data, or use your experience to infer those tasks. Design your API to help developers achieve those tasks. API operations may be one of:

  • query and response - for requesting data
  • publish and subscribe - for disseminating real-time or event-based data
  • transaction - for storing, modifying, processing or analysing data

Include light-weight queries and/or operations in the API that help users start working with your data quickly. The complexity of the data should be hidden from inexperienced users.

The API should offer both machine readable data and human readable HTML that includes the structured metadata required by search engines seeking to index content (see Best Practice 25: Make your entity-level data indexable by search engines for more details).

When designing the API, each operation or query should be focused on achieving a single outcome; this should ensure that the API remains light-weight. Groups of API operations may be chained together (like unix pipes) in order to complete complex tasks.

When designing APIs, data publishers must be aware of the constraints of operating in a Web environment. Providing access to large datasets, such as coverages, is a particular challenge. The API should provide mechanisms to request subsets of the dataset that are a convenient size for client applications to manage.

APIs will often evolve as more data is added or usage changes; queries or operations may be changed or new ones added. APIs should be versioned in order to insulate downstream client applications from these changes.

Note

Regarding API design, also see [DWBP] Best Practice 21: Use Web Standardized Interfaces.

In the geospatial domain there are a lot of WFS services providing data. A RESTful API could be created as a wrapper or shim layer around these WFS services, so that GML content from the WFS service can be provided as Linked Data or in another Web-friendly format. This approach is similar to the use of Z39.50 in the library community: that protocol is still used, but 'modern' Web sites and Web services are wrapped around it. Adding URIs to the (GML) data exposed by a WFS is fairly straightforward, but making the data 'webby' is harder. There are examples of this approach of creating a convenience API on top of WFS; however, rather than adapting the WFS (GML) output, it may be more effective to provide an alternative 'Linked Data friendly' access path to the data source by creating a new, complementary service endpoint, e.g. exposing the underpinning PostGIS database via a SPARQL endpoint (using something like ontop-spatial) and a Linked Data API.

Fig. 1 Providing an alternative 'Linked Data friendly' access path to a WFS data source.

How to Test

...

Evidence

Relevant requirements: R-Compatibility, R-LightweightAPI.

Best Practice 29: APIs should be self-describing

APIs should provide a discoverable description of their content and how to interact with it. Ideally this should be a machine-readable description.

Why

Good information about an API lets potential users determine if the API is a good resource to use for a given task and how to use it, as well as letting machines find out how to interact with it.

Intended Outcome

The API description enables a user to construct meaningful queries against the API for data that the API can actually provide.

A user can find the API description; e.g. via referral from the API or via a search engine query.

Possible Approach to Implementation

Note

This best practice extends [DWBP] Best Practice 25: Document your API.

API documentation should describe the data(set) the API exposes; the API's operations and parameters; the kind of format / payload the API offers; and API versioning.

As a minimum, you should provide a human-readable description of your APIs so that developers can read about how the API works. We recommend providing machine-readable API documentation that can be used by software development tools to help developers build API client software. API documentation should be crawlable by search engines.

The API documentation should be generated from the API code so that the documentation can easily be kept up to date.

Where a parameter domain is bound to a set of values (e.g. value range, spatial or temporal extent, controlled vocabulary etc.), the API documentation or the API itself should indicate the set of values that may be used in order to help users request data that is actually available.

The API documentation should be discoverable from the API itself.
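
One possible (and here purely illustrative) approach to machine-readable API documentation is the Hydra Core Vocabulary, a community draft that is not otherwise referenced by this document. The sketch below is hypothetical and the ex: resources are illustrative only.

    @prefix hydra: <http://www.w3.org/ns/hydra/core#> .
    @prefix ex:    <http://example.org/api/> .

    ex:doc  a hydra:ApiDocumentation ;
        hydra:title          "Air quality observations API" ;
        hydra:description    "Read access to hourly roadside observations." ;
        hydra:entrypoint     <http://example.org/api/> ;
        hydra:supportedClass ex:Observation .

    ex:Observation  a hydra:Class ;
        hydra:supportedOperation [ a hydra:Operation ;
                                   hydra:method "GET" ] .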

How to Test

...

Evidence

Relevant requirements: R-Discoverability ... others to be added.

Best Practice 30: Include search capability in your data access API

If you publish an API to access your data, make sure it allows users to search for specific data.

Should BP "Include search capability in your data access API" move to section 6.4 Enabling discovery?

Why

It can be hard to find a particular resource within a dataset, requiring either prior knowledge of the respective identifier for that resource and/or some intelligent manual guesswork. It is likely that users will not know the URI of the resource that they are looking for, but they may know (at least part of) the name of the resource or some other details. A search capability will help a user to determine the identifier for the resource(s) they need using the limited information they have.

Intended Outcome

A user can do a text search on the name, label or other property of an entity that they are interested in to help them find the URI of the related resource.

Possible Approach to Implementation

to be added

How to Test

...

Evidence

Relevant requirements: {... hyperlinked list of use cases ...}

6.6 Dealing with large datasets

There are several Best Practices in this document dealing with large datasets and coverages:

Should we discuss scaleability issues here?

7. Conclusions

A. Applicability of common formats to implementation of Best Practices

The Spatial Data on the Web working group is working on recommendations about the use of formats for publishing spatial data on the web, specifically about selecting the most appropriate format. There may not be one most appropriate format: which format is best may depend on many things. This section gives two tables that both aim to be helpful in selecting the right format in a given situation. These tables may in future be merged or reworked in other ways.

The first table is a matrix of the common formats, showing in general terms how well these formats help achieve goals such as discoverability, granularity etc.

An attempt at a matrix of the common formats (GeoJSON, GML, RDF, JSON-LD) and what you can or can't achieve with them. (source: @eparsons)
Format | Openness | Binary/text | Usage | Discoverability | Granular links | CRS Support | Verbosity | Semantics vocab? | Streamable | 3D Support
ESRI Shape | Open'ish | Binary | Geometry only; attributes and metadata in linked DB files | Poor | In Theory? | Yes | Lightweight | No | No | Yes
GeoJSON | Open | Text | Geometry and attributes inline (array) | Good? | In Theory? | No | Lightweight | No | No | No
DXF | Proprietary | Binary | Geometry only; attributes and metadata in linked DB files | Poor | Poor | No | Lightweight | No | No | Yes
GML | Open | Text | Geometry and attributes inline or xlinked | Good? | In Theory? | Yes | Verbose | No | No | Yes
KML | Open | Text | Geometry and attributes inline or xlinked | Good? | In Theory? | No | Lightweight | No | Yes? | Yes

The second table is much more detailed, listing the formats for spatial data in current use and scoring each format on a number of detailed aspects.

An attempt at a matrix of the formats for spatial data in current use and detailed aspects. (source: @portele)
Aspect | GML | GML-SF0 | JSON-LD | GeoSPARQL (vocabulary) | schema.org | GeoJSON | KML | GeoPackage | Shapefile | GeoServices / Esri JSON | Mapbox Vector Tiles
Governing Body | OGC, ISO | OGC | W3C | OGC | Google, Microsoft, Yahoo, Yandex | Authors (now in IETF process) | OGC | OGC | Esri | Esri | Mapbox
Based on | XML | GML | JSON | RDF | HTML with RDFa, Microdata, JSON-LD | JSON | XML | SQLite, SF SQL | dBASE | JSON | Google protocol buffers
Requires authoring of a vocabulary/schema for my data (or use of existing ones) | Yes (using XML Schema) | Yes (using XML Schema) | Yes (using @context) | Yes (using RDF schema) | No, schema.org specifies a vocabulary that should be used | No | No | Implicitly (SQLite tables) | Implicitly (dBASE table) | No | No
Supports reuse of third party vocabularies for features and properties | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No
Supports extensions (geometry types, metadata, etc.) | Yes | No | Yes | Yes | Yes | No (under discussion in IETF) | Yes (rarely used except by Google) | Yes | No | No | No
Supports non-simple property values | Yes | No | Yes | Yes | Yes | Yes (in practice: not used) | No | No | No | No | No
Supports multiple values per property | Yes | No | Yes | Yes | Yes | Yes (in practice: not used) | No | No | No | No | No
Supports multiple geometries per feature | Yes | Yes | n/a | Yes | Yes (but probably not in practice?) | No | Yes | No | No | No | No
Support for Coordinate Reference Systems | any | any | n/a | many | WGS84 latitude, longitude | WGS84 longitude, latitude with optional elevation | WGS84 longitude, latitude with optional elevation | many | many | many | WGS84 spherical mercator projection
Support for non-linear interpolations in curves | Yes | Only arcs | n/a | Yes (using GML) | No | No | No | Yes, in an extension | No | No | No
Support for non-planar interpolations in surfaces | Yes | No | n/a | Yes (using GML) | No | No | No | No | No | No | No
Support for solids (3D) | Yes | Yes | n/a | Yes (using GML) | No | No | No | No | No | No | No
Feature in a feature collection document has URI (required for ★★★★) | Yes, via XML ID | Yes, via XML ID | Yes, via @id keyword | Yes | Yes, via HTML ID | No | Yes, via XML ID | No | No | No | No
Support for hyperlinks (required for ★★★★★) | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No
Media type | application/gml+xml | application/gml+xml with profile parameter | application/ld+json | application/rdf+xml, application/ld+json, etc. | text/html | application/vnd.geo+json | application/vnd.google-earth.kml+xml, application/vnd.google-earth.kmz | - | - | - | -
Remarks | comprehensive and supporting many use cases, but requires strong XML skills | simplified profile of GML | no support for spatial data, a GeoJSON-LD is under discussion | GeoSPARQL also specifies related extension functions for SPARQL; other geospatial vocabularies exist, see ??? | schema.org markup is indexed by major search engines | supported by many mapping APIs | focussed on visualisation of and interaction with spatial data, typically in Earth browsers like Google Earth | used to support "native" access to geospatial data across all enterprise and personal computing environments, including mobile devices | supported by almost all GIS | mainly used via the GeoServices REST API | used for sharing geospatial data in tiles, mainly for display in maps

B. Authoritative sources of geographic identifiers

As per http://www.w3.org/DesignIssues/LinkedData.html item 4, it's useful for people to link their data to other related data. In this context we're most frequently talking about either Spatial Things and/or their geometry.

There are many useful sets of identifiers for spatial things and which ones are most useful will depend on context. This involves discovering relevant URIs that you might want to connect to.

Relevant URIs for spatial things can be found in many places. This list gives the first places you should check:

Finding out which national open spatial datasets are available, and how they can be accessed, currently requires prior knowledge in most cases, because these datasets are often not easily discoverable. Look for national data portals / geoportals such as the Nationaal Georegister (the Dutch national register of geospatial datasets) or Dataportaal van de Nederlandse overheid (the Dutch national governmental data portal).

As an example, let's take Edinburgh. In some recent work with the Scottish Government, we have an identifier for the City of Edinburgh Council Area - i.e. the geographical area that Edinburgh City Council is responsible for:

http://statistics.gov.scot/id/statistical-geography/S12000036

(note that this URI doesn't resolve yet but it will in the next couple of months once the system goes properly live)

The UK government provides an identifier for Edinburgh and/or information about it that we might want to link to:

http://statistics.data.gov.uk/id/statistical-geography/S12000036

The Scottish identifier is directly based on this one, but the Scottish Government wanted the ability to create something dereferenceable, potentially with additional or different info to the data.gov.uk one. These two are owl:sameAs.

DBpedia also includes a resource about Edinburgh. Relationship: "more or less the same as" but probably not the strict semantics of owl:sameAs.

http://data.ordnancesurvey.co.uk/id/50kGazetteer/81482

This Edinburgh resource is found by querying the OS gazetteer search service for 'Edinburgh' and then checking the labels of the results. OS gives it a type of 'NamedPlace' and some coordinates.

http://data.ordnancesurvey.co.uk/id/50kGazetteer/81483

This Edinburgh Airport resource was also found by the same OS gazetteer search for 'Edinburgh'. It is clearly not the same as the original spatial thing, but you might want to say something like 'within' or 'hasAirport'.

http://data.ordnancesurvey.co.uk/id/7000000000030505

This resource is in the OS 'Boundary Line' service that contains administrative and statistical geography areas in the UK. It's probably safe to say the original identifier is owl:sameAs this one.

http://sws.geonames.org/2650225/

This is the GeoNames resource for Edinburgh, found using the search service: http://api.geonames.org/search?name=Edinburgh&type=rdf&username=demo

Once you have found a place in GeoNames, there are other useful services for finding things that are nearby.
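
Pulling the identifiers above together, the relationships discussed in this appendix could be asserted roughly as follows. This is a hedged, hypothetical sketch in Turtle: skos:closeMatch is suggested here for the "more or less the same as" relationship with the DBpedia resource for Edinburgh (http://dbpedia.org/resource/Edinburgh), and the containment of the airport within the council area is expressed with a GeoSPARQL relationship on the assumption that the airport does lie within that area; choose properties whose semantics you can actually stand behind.

    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix geo:  <http://www.opengis.net/ont/geosparql#> .

    <http://statistics.gov.scot/id/statistical-geography/S12000036>
        owl:sameAs      <http://statistics.data.gov.uk/id/statistical-geography/S12000036> ,
                        <http://data.ordnancesurvey.co.uk/id/7000000000030505> ;
        skos:closeMatch <http://dbpedia.org/resource/Edinburgh> ;
        geo:sfContains  <http://data.ordnancesurvey.co.uk/id/50kGazetteer/81483> .   # Edinburgh Airport, assumed to lie within the council area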

C. Cross reference of use case requirements against Best Practices

TO DO: cross reference the Best Practices to determine if any do not have an associated Requirement

Cross reference of requirements against best practices
UC Requirements Best Practice
R-BoundingBoxCentroid Best Practice 7: How to describe geometry
R-Compatibility Best Practice 28: Expose entity-level data through 'convenience APIs'
R-Compressible Best Practice 7: How to describe geometry
R-CoverageTemporalExtent Not a Best Practice deliverable
R-Crawlability Best Practice 25: Make your entity-level data indexable by search engines
R-CRSDefinition Best Practice 7: How to describe geometry
R-DateTimeDuration Not a Best Practice deliverable
R-DefaultCRS Best Practice 8: Specify Coordinate Reference System for high-precision applications
R-DifferentTimeModels Not a Best Practice deliverable
R-Discoverability

Best Practice 24: Use links to find related data

Best Practice 25: Make your entity-level data indexable by search engines

Best Practice 26: Include spatial information in dataset metadata

R-DynamicSensorData Not a Best Practice deliverable
R-EncodingForVectorGeometry Best Practice 7: How to describe geometry
R-ExSituSampling Not a Best Practice deliverable
R-Georectification Not a Best Practice deliverable
R-GeoreferencedData Not a Best Practice deliverable
R-HumansAsSensors Best Practice 17: How to work with crowd-sourced observations
R-IndependenceOnReferenceSystems Best Practice 7: How to describe geometry
R-LightweightAPI

Not a Best Practice deliverable, but referenced in -

Best Practice 28: Expose entity-level data through 'convenience APIs'

R-Linkability

Best Practice 1: Use globally unique identifiers for entity-level resources

Best Practice 19: Make your entity-level links visible on the web

Best Practice 20: Provide meaningful links

R-MachineToMachine

Best Practice 6: Provide a minimum set of information for your intended application

Best Practice 7: How to describe geometry

Best Practice 9: How to describe relative positions

Best Practice 10: How to describe positional (in)accuracy

Best Practice 11: How to describe properties that change over time

Best Practice 12: Use spatial semantics for spatial Things

Best Practice 13: Assert known relationships

Best Practice 18: How to publish (and consume) sensor data streams

Best Practice 25: Make your entity-level data indexable by search engines

R-MobileSensors

Not a Best Practice deliverable, but referenced in -

Best Practice 12: Use spatial semantics for spatial Things

R-4DModelSpaceTime Not a Best Practice deliverable
R-ModelReuse Not a Best Practice deliverable
R-MovingFeatures

Not a Best Practice deliverable, but referenced in -

Best Practice 11: How to describe properties that change over time

Best Practice 12: Use spatial semantics for spatial Things

R-MultilingualSupport Best Practice 6: Provide a minimum set of information for your intended application (provision of multi-lingual labels)
R-MultipleTypesOfCoverage Not a Best Practice deliverable
R-NominalObservations Not a Best Practice deliverable
R-NominalTemporalReferences Not a Best Practice deliverable
R-NonGeographicReferenceSystem Not a Best Practice deliverable
R-ObservationAggregations

Not a Best Practice deliverable, but referenced in -

Best Practice 15: Describe sensor data processing workflows

R-ObservedPropertyInCoverage

Not a Best Practice deliverable, but referenced in -

Best Practice 14: How to provide context required to interpret observation data values

R-Provenance

Best Practice 15: Describe sensor data processing workflows

Best Practice 26: Include spatial information in dataset metadata

R-QualityMetadata

Not a Best Practice deliverable, but referenced in -

Best Practice 10: How to describe positional (in)accuracy

Best Practice 14: How to provide context required to interpret observation data values

R-ReferenceDataChunks

Not a Best Practice deliverable, but referenced in -

Best Practice 27: Publish data at the granularity you can support

R-ReferenceExternalVocabularies

Not a Best Practice deliverable

Also see [DWBP] Best Practice 16: Re-use vocabularies

R-SamplingTopology

Not a Best Practice deliverable, but referenced in -

Best Practice 9: How to describe relative positions

Best Practice 13: Assert known relationships

Best Practice 16: Relate observation data to the real world

R-SensorMetadata

Not a Best Practice deliverable, but referenced in -

Best Practice 14: How to provide context required to interpret observation data values

R-SensingProcedure

Not a Best Practice deliverable, but referenced in -

Best Practice 14: How to provide context required to interpret observation data values

R-SpaceTimeMultiScale Not a Best Practice deliverable
R-SpatialMetadata

Best Practice 7: How to describe geometry

Best Practice 26: Include spatial information in dataset metadata

R-SpatialRelationships Best Practice 13: Assert known relationships
R-SpatialOperators Best Practice 13: Assert known relationships
R-SpatialVagueness

Not a Best Practice deliverable, but referenced in -

Best Practice 6: Provide a minimum set of information for your intended application

R-SSNLikeRepresentation Not a Best Practice deliverable
R-Streamable

Best Practice 11: How to describe properties that change over time

Best Practice 18: How to publish (and consume) sensor data streams

Best Practice 27: Publish data at the granularity you can support

R-3DSupport Best Practice 7: How to describe geometry
R-TimeDependentCRS Best Practice 7: How to describe geometry
R-TemporalReferenceSystem

Not a Best Practice deliverable, but referenced in -

Best Practice 18: How to publish (and consume) sensor data streams

R-TemporalVagueness Not a Best Practice deliverable
R-TilingSupport Best Practice 7: How to describe geometry (performance considerations)
R-TimeSeries

Not a Best Practice deliverable, but referenced in -

Best Practice 18: How to publish (and consume) sensor data streams

R-UncertaintyInObservations

Not a Best Practice deliverable, but referenced in -

Best Practice 14: How to provide context required to interpret observation data values

R-UpdateDatatypes Not a Best Practice deliverable
R-UseInComputationalModels Not a Best Practice deliverable
R-Validation

CLARIFICATION required; is this in scope?

R-ValidTime Not a Best Practice deliverable
R-VirtualObservations Not a Best Practice deliverable

D. Glossary

Coverage: A property of a SpatialThing whose value varies according to location and/or time.

Commercial operator: Search engine or similar company that operates on the Web and generates indexes from the information found in web pages and data published on the Web.

CRS: Coordinate Reference System, a coordinate-based local, regional or global system used to locate geographical entities.

GIS: Geographic information system or geographical information system, a system designed to capture, store, manipulate, analyze, manage, and present all types of spatial or geographical data. (ref. Geographic information system)

Geospatial expert: A person with a high degree of knowledge about SDIs.

Public sector: The part of the economy concerned with providing various government services. (ref. Public sector)

IoT: Internet of Things, the network of physical objects or "things" embedded with electronics, software, sensors, and network connectivity, which enables these objects to be controlled remotely and to collect and exchange data.

SDI: Spatial Data Infrastructure, a data infrastructure implementing a framework of geographic data, metadata, users and tools that are interactively connected in order to use spatial data in an efficient and flexible way. (ref. Spatial Data Infrastructure (SDI))

SpatialThing: Anything with spatial extent, i.e. size, shape, or position. e.g. people, places, bowling balls, as well as abstract areas like cubes (ref. W3C WGS84 Geo Positioning vocabulary (geo))

TemporalThing: Anything with temporal extent, i.e. duration. e.g. the taking of a photograph, a scheduled meeting, a GPS time-stamped trackpoint (ref. W3C WGS84 Geo Positioning vocabulary (geo))

Web developer: A programmer who specializes in, or is specifically engaged in, the development of World Wide Web applications, or distributed network applications that are run over HTTP from a Web server to a Web browser. (ref. Web developer)

WCS: Web Coverage Service, a service offering multi-dimensional coverage data for access over the Internet. (ref. OGC WCS)

WFS: Web Feature Service, a standardized HTTP interface allowing requests for geographical features across the web using platform-independent calls. (ref. OGC WFS)

WMS: Web Map Service, a standardized HTTP interface for requesting geo-registered map images from one or more distributed geospatial databases. (ref. OGC WMS)

WKT: Well Known Text, a text markup language for representing vector geometry objects on a map, spatial reference systems of spatial objects and transformations between spatial reference systems. (ref. Well-known text)

WPS: Web Processing Service, an interface standard which provides rules for standardizing inputs and outputs (requests and responses) for invoking geospatial processing services, such as polygon overlay, as a Web service. (ref. OGC WPS)

E. Acknowledgments

The editors gratefully acknowledge the contributions made to this document by all members of the working group and the chairs: Kerry Taylor and Ed Parsons.

F. References

F.1 Informative references

[DWBP]
Bernadette Farias Loscio; Caroline Burle; Newton Calegari. Data on the Web Best Practices. 12 January 2016. W3C Working Draft. URL: http://www.w3.org/TR/dwbp/
[GeoJSON]
Howard Butler; Martin Daly; Allan Doyle; Sean Gillies; Tim Schaub; Christopher Schmidt. The GeoJSON Format Specification. 16 June 2008. URL: http://geojson.org/geojson-spec.html
[GeoSPARQL]
Matthew Perry; John Herring. GeoSPARQL - A Geographic Query Language for RDF Data. 10 September 2012. URL: http://www.opengeospatial.org/standards/geosparql
[LD-BP]
Bernadette Hyland; Ghislain Auguste Atemezing; Boris Villazón-Terrazas. Best Practices for Publishing Linked Data. 9 January 2014. W3C Note. URL: http://www.w3.org/TR/ld-bp/
[OandM]
Simon Cox. Observations and Measurements - XML Implementation. 22 March 2011. URL: http://www.opengeospatial.org/standards/om
[RFC5785]
M. Nottingham; E. Hammer-Lahav. Defining Well-Known Uniform Resource Identifiers (URIs). April 2010. Proposed Standard. URL: https://tools.ietf.org/html/rfc5785
[RFC6570]
J. Gregorio; R. Fielding; M. Hadley; M. Nottingham; D. Orchard. URI Template. March 2012. Proposed Standard. URL: https://tools.ietf.org/html/rfc6570
[RFC6906]
E. Wilde. The 'profile' Link Relation Type. March 2013. Informational. URL: https://tools.ietf.org/html/rfc6906
[SCHEMA-ORG]
Schema.org. URL: http://schema.org/
[SDW-UCR]
Frans Knibbe; Alejandro Llaves. Spatial Data on the Web Use Cases & Requirements. 17 December 2015. W3C Note. URL: http://www.w3.org/TR/sdw-ucr/
[SSN]
W3C Semantic Sensor Network Incubator Group. Semantic Sensor Network Ontology. URL: http://purl.oclc.org/NET/ssnx/ssn
[Simple-Features]
John Herring. Simple Feature Access - Part 1: Common Architecture. 28 May 2011. URL: http://www.opengeospatial.org/standards/sfa
[Veregin]
H. Veregin. Data quality parameters. In: Geographical Information Systems: Principles, Techniques, Management and Applications. URL: http://www.geos.ed.ac.uk/~gisteac/gis_book_abridged/files/ch12.pdf
[beacon]
J. Voß; M. Schindler. BEACON link dump format. 6 July 2014. URL: https://gbv.github.io/beaconspec/beacon.html
[gml]
Geography Markup Language (GML) Encoding Standard. URL: http://www.opengeospatial.org/standards/gml
[json-ld]
Manu Sporny; Gregg Kellogg; Markus Lanthaler. JSON-LD 1.0. 16 January 2014. W3C Recommendation. URL: http://www.w3.org/TR/json-ld/
[microdata]
Ian Hickson. HTML Microdata. 29 October 2013. W3C Note. URL: http://www.w3.org/TR/microdata/
[sparql11-overview]
The W3C SPARQL Working Group. SPARQL 1.1 Overview. 21 March 2013. W3C Recommendation. URL: http://www.w3.org/TR/sparql11-overview/
[vocab-data-cube]
Richard Cyganiak; Dave Reynolds. The RDF Data Cube Vocabulary. 16 January 2014. W3C Recommendation. URL: http://www.w3.org/TR/vocab-data-cube/
[vocab-dcat]
Fadi Maali; John Erickson. Data Catalog Vocabulary (DCAT). 16 January 2014. W3C Recommendation. URL: http://www.w3.org/TR/vocab-dcat/
[void]
Keith Alexander; Richard Cyganiak; Michael Hausenblas; Jun Zhao. Describing Linked Datasets with the VoID Vocabulary. 3 March 2011. W3C Note. URL: http://www.w3.org/TR/void/
[webarch]
Ian Jacobs; Norman Walsh. Architecture of the World Wide Web, Volume One. 15 December 2004. W3C Recommendation. URL: http://www.w3.org/TR/webarch/
[xlink11]
Steven DeRose; Eve Maler; David Orchard; Norman Walsh et al. XML Linking Language (XLink) Version 1.1. 6 May 2010. W3C Recommendation. URL: http://www.w3.org/TR/xlink11/