
Best Practice: Develop a federation tool for open data portals

Draft: 11 February 2016

This version
http://www.w3.org/2013/share-psi/bp/ft-20160211/
Latest version
http://www.w3.org/2013/share-psi/bp/ft/
Previous version
http://www.w3.org/2013/share-psi/bp/ft-20151005/

This is one of a set of Best Practices developed by the Share-PSI 2.0 Thematic Network.

Share-PSI Best Practice: Develop a federation tool for open data portals by Share-PSI 2.0 is licensed under a Creative Commons Attribution 4.0 International License.


Outline

The Spanish National Catalogue, datos.gob.es, has developed a federation tool for open data portals that automatically publishes the metadata for the data sets published on the websites of each public entity. This creates a global index of reusable public information that companies or any other user can search at datos.gob.es to locate reusable data, without needing to know or find the website of the public entity that holds the data they are interested in.

Challenge

More and more public entities are creating their own open data spaces on the web, and these spaces are not connected to one another. So that reuse agents do not have to search for data all over the web, we needed a single point where all Spanish public sector information is permanently and automatically referenced: the National Catalogue of Reusable Public Information, datos.gob.es.

Solution

The catalogue federation tool aggregates and automatically publishes, in a consistent way, the metadata for the data sets that each public entity publishes in its own catalogue on its website, making that metadata available at the National Catalogue datos.gob.es as well. A global index of reusable public information is thus created and can be accessed. The tool, developed in PHP as an extension of the National Catalogue data portal datos.gob.es, ensures maximum coherence between the information made available by the public entities in their own catalogues and the National Catalogue itself.

This solution makes possible a global reuse scenario that gives greater visibility to the public data made available by the three levels of government (central, regional and local) and by universities, and provides a general overview of how public sector information is being reused in Spain.
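The aggregation described above can be illustrated with a minimal Python sketch. It is not the datos.gob.es implementation (which is a PHP module); the record fields, publisher names and example.org URIs are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata for one data set; the fields are illustrative only."""
    identifier: str  # URI of the data set in the publisher's own catalogue
    title: str
    publisher: str   # public entity holding the data


def build_global_index(catalogues):
    """Merge per-entity catalogues into one index keyed by data set URI.

    `catalogues` maps a publisher name to the list of records harvested
    from that publisher's own catalogue.
    """
    index = {}
    for records in catalogues.values():
        for record in records:
            # Keying by URI means a re-harvested record replaces its
            # earlier copy instead of appearing twice.
            index[record.identifier] = record
    return index


def search(index, term):
    """Return records whose title contains `term`, ignoring case."""
    needle = term.lower()
    return [r for r in index.values() if needle in r.title.lower()]


# Hypothetical catalogues from two public entities.
catalogues = {
    "City A": [
        DatasetRecord("http://example.org/a/bus-stops", "Bus stops", "City A"),
    ],
    "Region B": [
        DatasetRecord("http://example.org/b/air-quality", "Air quality", "Region B"),
        DatasetRecord("http://example.org/b/bus-lines", "Bus lines", "Region B"),
    ],
}

index = build_global_index(catalogues)
hits = search(index, "bus")
```

Here a single search for "bus" finds data sets from both publishers, without the user needing to know which entity holds them.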

Why is this a Best Practice? What’s the impact of the Best Practice?

  • Open data initiatives are interconnected at a single access point, datos.gob.es, where reusers or any member of the public can use a search tool to locate reusable information without needing to know or find the website of the public entity holding the data of interest to them. This contributes strongly to the efficiency of research processes.
  • It facilitates enrichment of datos.gob.es through the large-scale upload of meta-information associated with the data sets made available by public entities for reuse.
  • Provides maximum consistency between the information being made available by the public entities in their own catalogues and the information referenced at datos.gob.es.

  • Reduces the workload of public employees who publish data sets for reuse, since information no longer has to be uploaded twice, once to the internal catalogue and once to datos.gob.es.
  • It enables the existence of a global scenario that fosters the extraction of general conclusions and a general overview of the PSI situation in Spain, facilitating the use of this information to extract meaningful and actionable knowledge regarding the open data landscape.
  • The federator - developed according to guidelines set down by experts in the field - ensures standardisation and data integrity, and enables automated publication and constant updating of published information, while also enhancing the visibility of the data sets made available by the various public entities. This helps to get the most out of the scarce public resources available in the country.
  • Facilitates reuse by the infomediary sector in Spain. Datasets are displayed in a clear and structured fashion on a user-friendly interface for reuse.
  • Ensures the consistent growth of the Spanish Open Data Catalogue through a set of guidelines for using a standard metadata structure in data sets, following W3C recommendations.

  • Eases the task of public employees, who no longer need to publish reusable information in two different places.

How do I implement this Best Practice?

To implement this best practice you will need several elements, including:

  • A legal and technical framework ensuring that each public entity federates its datasets with the national data portal in a standard manner.
  • A coordination structure between the different administrative levels (state, regional, local) to agree on the metadata, taking into account the common needs of the groups directly involved in using the federation tool. This secures the continued collaboration on which the success of the initiative depends.
  • A data portal acting as a single point at which to federate all published datasets of the different public entities. The federation tool is a module written in PHP, an open-source programming language, that acts as an extension of the data portal, which was developed using Drupal 7.
  • An agreed metadata scheme following W3C recommendations. The metadata should be available as a DCAT/RDF or ATOM feed, which must be accessible at a URI on the website of the entity from which the data originates.
  • Some complementary web services and widgets that allow the meta-information published in the catalogue to be obtained and processed according to various invocation parameters and in various response formats (ATOM/XML, DCAT/RDF and JSON), so that it can be referenced on the website of the entity that owns the data.
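To make the feed requirement more concrete, the following Python sketch reads a small DCAT feed serialised as RDF/XML and extracts basic metadata for each data set. The sample feed, its example.org URIs and the choice of fields are assumptions for illustration; the namespaces are the standard RDF, DCAT and Dublin Core Terms ones. RDF/XML allows many equivalent serialisations, so a production harvester would more likely rely on a full RDF library than on plain XML parsing.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF, DCAT and Dublin Core Terms.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Hypothetical feed, as an entity might expose it at a URI on its website.
SAMPLE_FEED = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://example.org/catalogue/bus-stops">
    <dct:title>Bus stops</dct:title>
    <dct:modified>2016-02-01</dct:modified>
  </dcat:Dataset>
  <dcat:Dataset rdf:about="http://example.org/catalogue/air-quality">
    <dct:title>Air quality</dct:title>
    <dct:modified>2016-01-20</dct:modified>
  </dcat:Dataset>
</rdf:RDF>
"""


def parse_dcat_feed(xml_text):
    """Return one dict per dcat:Dataset element found in the feed."""
    root = ET.fromstring(xml_text)
    about = "{%s}about" % NS["rdf"]
    datasets = []
    # iter() also finds Dataset elements nested inside a dcat:Catalog.
    for node in root.iter("{%s}Dataset" % NS["dcat"]):
        datasets.append({
            "uri": node.get(about),
            "title": node.findtext("dct:title", default="", namespaces=NS),
            "modified": node.findtext("dct:modified", default="", namespaces=NS),
        })
    return datasets


datasets = parse_dcat_feed(SAMPLE_FEED)
```

In practice the XML text would be fetched from the feed URI published by each entity rather than taken from an inline string.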

Where has this best practice been implemented?

This best practice can be implemented wherever several open data portals exist, in order to provide a single point from which to locate every dataset contained in those portals. Implementation in Spain involved drawing up technical standards to establish common conditions for the selection, identification, description, conditions of use and making available of data sets: the Interoperability Technical Standard on the Reuse of PSI (hereinafter, ITS-PSI), which defines a DCAT profile for the public information catalogues at the various government and agency levels in Spain and is closely linked to the DCAT Application Profile for data portals in Europe.

The federation tool - integrated as an extra module on the datos.gob.es portal - accesses the metadata of each entity and updates the meta-information available at datos.gob.es according to a pre-established schedule. This ensures effective federation of the open data catalogues of the public entities with datos.gob.es and, in turn, with other portals such as the European Data Portal, which seeks to facilitate the location and reuse of data from national, regional and local administration services throughout Europe.
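The scheduled update can be pictured as a comparison between what the national catalogue already holds for an entity and what a fresh harvest of that entity's feed returns. The source does not describe how the tool detects changes, so the Python sketch below is only one plausible approach: it assumes each data set is identified by its URI and carries a last-modified date, and that data sets missing from the feed should be withdrawn. The function name and sample data are hypothetical.

```python
def plan_update(stored, harvested):
    """Compare stored records with a fresh harvest of one entity's feed.

    Both arguments map a data set URI to its last-modified date
    (an ISO 8601 string). Returns the URIs to add, update and remove.
    """
    to_add = sorted(uri for uri in harvested if uri not in stored)
    to_remove = sorted(uri for uri in stored if uri not in harvested)
    to_update = sorted(
        uri for uri in harvested
        if uri in stored and harvested[uri] != stored[uri]
    )
    return {"add": to_add, "update": to_update, "remove": to_remove}


# What the national catalogue currently holds for one entity (assumed data).
stored = {
    "http://example.org/a/bus-stops": "2016-01-10",
    "http://example.org/a/old-census": "2015-06-01",
}

# What the entity's feed returned on the latest scheduled run (assumed data).
harvested = {
    "http://example.org/a/bus-stops": "2016-02-01",
    "http://example.org/a/air-quality": "2016-01-20",
}

plan = plan_update(stored, harvested)
```

Running such a comparison on each scheduled harvest keeps the national catalogue aligned with the entity's own catalogue without anyone uploading the information a second time.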

Country | Implementation | Contact Point
Spain | The Spanish National Catalogue | soporte@datos.gob.es

Contact Info

soporte@datos.gob.es or via Dirección de Tecnologías de la Información y las Comunicaciones.
