A Self Sustaining Business Model for Open Data: Report

Executive Summary

The fourth Share-PSI Workshop was held at the Danube University in Krems, an hour's drive from Vienna, and was collocated with the annual CeDEM conference. The event comprised a mixture of presentations and facilitated discussions, the latter focussed particularly on eliciting best practices. A notable feature of the Krems workshop was the high number of businesses present who make use of PSI in some way or another.

Although commercial users of PSI see the information as essential, it's not the focus of their business. Public authorities must engage with a wide range of businesses to maximise the chances of success for their PSI strategy.

Business users of PSI typically carry out extensive processing on data to create a new service that has commercial value. Those services are likely to be of great interest to the public sector itself and procurement procedures may need to be updated to take advantage of this opportunity to effectively form new partnerships.

A public authority is typically a monopoly supplier of data. Charging large sums for this data will limit the size of the market to a few customers able to afford the fee. Reducing or eliminating the cost of the data will quickly create a bigger market than the original.

Reliance on traditional KPIs and project management practices, particularly when confined to a single department, is almost certain to prolong the status quo. Transformational change – seeing real benefits through disruptive innovation – requires new methods of management and new metrics.

Search engine optimisation is an important feature of data publication.

The top priorities for publishers and users of PSI are licensing and training/skills.

Most controversially for a workshop with many open data enthusiasts, charging for PSI is not always seen as bad. Strong arguments were made that allowing public authorities to make small charges for PSI provides sufficient reward to encourage the publication of higher value data that would otherwise remain unavailable. Commercial users do not mind paying a reasonable sum for PSI.


Introduction

The fourth Share-PSI workshop was collocated with the annual CeDEM conference at the Danube University, Krems, Austria and built upon previous workshops in the series, particularly the one held in Lisbon in December 2014: Encouraging commercial use of open data. How can businesses be built upon, supported or otherwise enriched through the use of public sector information and what should the public sector do to facilitate this? The workshop included a mixture of presentations and interactive discussions amongst a wide range of participants from the public and private sectors as well as academia and citizens' advocacy groups.

The themes discussed in Krems were familiar to many in the audience; however, the event benefitted greatly from the participation of more businesses than is often the case in workshops like this. As a result, the open data community's cries of 'raw data now' and 'it's not your data, it's ours, we already paid for it, so hand it over' were perhaps muted. Instead, the emphasis was on the quality and reliability of data and it was notable that, as is often the case in discussions with businesses, charges for data are not seen as fundamentally wrong.

The recurring themes in Krems were:

  • Engage end users of the data – including potential business users.
  • Traditional processes lead to traditional metrics and KPIs that are designed to measure incremental improvements in the status quo, not encourage transformational change.
  • Where high value data is made available for free, the overall tax return is higher than the lost revenue.
  • Businesses are not built on PSI but on servicing a need. Meeting that need may benefit from the availability of PSI.
  • There is a business in discovering, refining and packaging information, but this depends more on PSI being accurate than it does on it being available for free.
  • PSI can only be made more readily available if there are the tools and workflows to support it.
  • IPR, copyright and licensing are critical.

In numbers: the event attracted 87 registrants and comprised 12 plenary talks, 12 workshop sessions, 4 bar camps and a discussion panel.

Engaging Users

(R to L) Nancy Routzouni, Amanda Smith and Frederique Oudkerk appreciate the discussion in the Podium session of the SHARE-PSI meeting on the afternoon of Wednesday, 20 May, 2015

Discussions around Public Sector Information and open data often conclude that engagement with users is important (it was highlighted at the Timişoara event, for example). Feedback mechanisms are important and it's often suggested – although rarely implemented – that data be published via GitHub so that feedback can be gathered, issues raised and, most interestingly, corrections made to the data. This user engagement is usually thought of in terms of developers and active citizens but the Krems event highlighted the need to engage with the local business community who would then create services for end users.

Companies like Eversport and OpenMove are both successful and make use of data but they are not positioned as data companies. OpenMove is an obvious user of PSI and is a star performer from the FINODEX project. It provides electronic ticketing services for municipal areas, so the availability of transport information is an obvious requirement. Eversport, however, provides a suite of digital services to the operators of sports facilities, particularly publicly owned tennis courts. The fact that, in order to operate a sports facilities booking service, they need data about the location and ownership of those sports facilities would not necessarily be obvious in a discussion about PSI Directive implementation. As Andreas Woditschka explained, Eversport does not market itself to sports facility owners by saying that it can help open up their data but by offering online payments systems, bookings and search engine optimisation.

The point about search engine optimisation is an important one. By producing Web pages about specific tennis courts, complete with structured data, Eversport is able to offer its clients very high positions in relevant search results, usually top position. That's an incentive for the facility operator to share their data and effectively create a partnership with Eversport.
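As an illustration of what "Web pages complete with structured data" can mean in practice, the following is a minimal sketch that builds a schema.org description of a tennis facility as JSON-LD and wraps it in the script element that search engines read. It is not Eversport's implementation: the use of schema.org and JSON-LD is an assumption, and the facility name, address, URL and function names are all hypothetical.

```python
import json


def tennis_court_jsonld(name, street, locality, country, url):
    """Build a schema.org description of one facility as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "TennisComplex",  # schema.org type for tennis facilities
        "name": name,
        "url": url,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": locality,
            "addressCountry": country,
        },
    }


def as_script_tag(data):
    """Wrap the JSON-LD in a script element for embedding in a Web page."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# All values below are placeholders, not real facility data.
court = tennis_court_jsonld(
    name="Example Tennis Courts",
    street="1 Example Street",
    locality="Krems",
    country="AT",
    url="https://example.org/courts/1",
)
print(as_script_tag(court))
```

The same dictionary could be generated for every facility in a booking database, which is why the facility data itself (location, ownership) is the prerequisite for this kind of service.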

Making PSI available is clearly something that only the public sector can do. Creating value in the information through curation, cleaning and service provision can be done by either the public sector or the private sector but where is the division? Who will do what and how can the private sector be sure that the public sector won't undermine their future business by changing the rules and becoming a competitor with an unfair advantage? Who needs to pay for data to be anonymised? Can those costs be shared? These questions and more are important in the establishment of any business, so discussion between all the stakeholders is essential.

Slim Turki of the Luxembourg Institute of Science and Technology presented work (PDF) he and Muriel Foulonneau did on different types of service that use PSI. These range from businesses that exist as a direct result of the availability of data, such as data visualisation services, through businesses that use data as a raw material, to those that use PSI to validate their own data. The business models for all three are very different, as are the needs of each. Therefore, again, engagement with those users is essential if public authorities are to support innovation.

Business Models

In his opening keynote and during the Share-PSI panel, Alon Peled, Associate Professor and Political Scientist at the Hebrew University of Jerusalem, made the point that public sector bodies must have a positive incentive to make their PSI available. In the absence of this, they will very often make some information available so that they appear to fulfil their obligations under the PSI Directive. However, this is likely to be misleading as they will retain the most valuable data and only make it available for a fee or via a paid-for service. Alon Peled provided evidence to back this up and it matches the situation concerning some European weather services that, to a certain extent, operate anti-competitively (since they are a monopoly supplier). However, the picture is not so simple and doesn't apply universally.

Richard Pettifer of PRIMET, the Association of Private Meteorological Services, explained that in 1995 the World Meteorological Organization, WMO, identified a small number of datasets that should be made available for free by its members. These are known as the WMO-40, not because there are 40 datasets but because it was the 40th resolution of the WMO. However, the most valuable, high resolution data is generally not made available for free and the rules for charging are written in such a way as to allow for a very wide variation in costs, depending on the data provider. By contrast, meteorological data from the US and Japan has long been available for free. As a consequence, many weather forecasting applications have used the American and Japanese data.

Richard Pettifer's slide describing the market for meteorological data could be applied to many other sectors. The provision of value added services by a monopoly supplier of data is anti-competitive, especially where key datasets are not made readily available.

The development of small scale apps is seen by meteorological organisations as a possible source of revenue from the long tail, i.e. small amounts of money from many sources rather than large amounts from a few large scale customers. Richard Pettifer's most compelling information comes from a comparison of the European and US markets. In the US, where meteorological data has been available for free for many years, the market for weather forecasting services grew by 17% per annum between 1999 and 2006. In Europe, the comparable figure is just 5% p.a.
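To see what that difference in annual rates means cumulatively, here is a small worked calculation. It assumes, purely for illustration, that each rate held constant and compounded over seven full years (1999 to 2006); the report gives only the annual figures, so the cumulative multipliers are derived, not quoted.

```python
def growth_factor(annual_rate, years):
    """Cumulative multiplier after compounding annual_rate for years."""
    return (1.0 + annual_rate) ** years


# Assumption: seven full years of compounding between 1999 and 2006.
YEARS = 2006 - 1999

us = growth_factor(0.17, YEARS)      # 17% per annum (US figure from the talk)
europe = growth_factor(0.05, YEARS)  # 5% per annum (European figure)

print(f"US market multiplied by about {us:.2f}")
print(f"European market multiplied by about {europe:.2f}")
```

Under those assumptions the US market roughly triples over the period while the European market grows by a little over 40%.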

Richard Pettifer of PRIMET

Weather data is available for free in the Netherlands, Finland, Norway, Sweden and Iceland with much more data being made available for free recently in Germany and the UK. Where the data has been made available for free, the value of private sector businesses already exceeds the size of the monopolies' commercial businesses before the data was opened.

This was one of the sessions at Krems to highlight a need for a different set of metrics to be applied when assessing the value of PSI.

Peter Guggenberger of Austrian publisher MANZ described a number of business models that had been tried for their RDB Rechtsdatenbank, a legal database for free research. They have long offered free access to primary information about Austrian law but have charged for secondary services, notably their magazine. Secondary services have always been made available behind a paywall, searchable only within the site. However, after disappointing results, in December 2014, all landing pages, preview text and metadata for the secondary content were made public so that they became searchable on the Web (a reflection of the Eversport point about SEO). Only access to full texts of the magazines, books and other secondary content published by MANZ is limited to subscribers.

The result?

MANZ content is much more visible and revenues have increased.

Nicolas Hazard from PwC presented work done under the ISA Programme concerning Business Models for Linked Open Government Data. This study took a look at the 5 stars of open data and asked whether 5 star data, i.e. data encoded as RDF and linked to other datasets, really is more valuable than, say, an Excel file (2 stars). The enablers and roadblocks are summarised in the table, but the report highlighted that the primary benefits of the 5 star approach are realised where it is necessary to link diverse data points together. This is often true in the public sector itself, where authoritative data can be linked without duplication, and Austrian business intelligence supplier Kompany is an example of a private sector business that makes use of the technology for this reason too (see below). These, plus increased flexibility and the network effect, are powerful factors in its favour. Conversely, the lack of necessary skills and the perceived lack of tools are seen as major roadblocks.

Enablers | Roadblocks
Efficiency gains in data integration – the network effect | Necessary investments
Forward-looking strategies | Lack of necessary competencies
Increased linking and integrated services | Perceived lack of tools
Ease of model updates | Lack of service level guarantees
Ease of navigation | Missing, restrictive, or incompatible licences
Open licensing and free access | Surfeit of standard vocabularies
Enthusiasm from ‘champions’ | The inertia of the status quo
Table summarising the enablers and roadblocks for using Linked Open Government Data, presented by Nicolas Hazard, based on a report by Phil Archer, W3C; Makx Dekkers, AMI Consult; Stijn Goedertier, Nikolaos Loutas, Nicolas Hazard PwC EU Services.
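The benefit of linking diverse data points without duplication can be illustrated with a minimal sketch. It models two datasets as sets of (subject, predicate, object) triples, in the spirit of RDF but without using any RDF library, and shows that once both use the same identifier for a place, merging them is a simple set union. The URIs, predicate names and data are all hypothetical.

```python
# Dataset A: a hypothetical company register.
register = {
    ("http://example.org/company/123", "name", "Example GmbH"),
    ("http://example.org/company/123", "registeredIn",
     "http://example.org/place/krems"),
}

# Dataset B: a hypothetical gazetteer, published by a different body.
gazetteer = {
    ("http://example.org/place/krems", "name", "Krems an der Donau"),
    ("http://example.org/place/krems", "country", "AT"),
}


def describe(graph, subject):
    """Collect predicate/object pairs for one subject.

    Simplification: a repeated predicate would keep only one value.
    """
    return {p: o for (s, p, o) in graph if s == subject}


def follow(graph, subject, predicate):
    """Follow a link from one resource to the description of another."""
    target = describe(graph, subject).get(predicate)
    return describe(graph, target) if target else {}


# Because both datasets share the place identifier, merging is a union:
# the register never has to copy the gazetteer's facts about Krems.
merged = register | gazetteer
place = follow(merged, "http://example.org/company/123", "registeredIn")
print(place["name"])
```

Neither dataset alone can answer "in which country is this company registered?"; the merged graph can, and the authoritative place data stays with its original publisher.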

The issue of a lack of tools that support the workflow was a recurring theme throughout the workshop. Most people use Word and Excel every day with other office programs used as needed. There is no "convert to an open format, add metadata and upload to the data portal" function in that software. Until such tools exist, making PSI available is going to remain the preserve of the specialist.

Licensing

The work presented by PwC was one of many that highlighted the critical issue of licensing. Even more than for active citizens or not-for-profit campaigning organisations, a clear statement of the IPR in a dataset is essential for businesses. You can't build a business on data you don't know for sure that you're allowed to use. This is consistent with the business community's willingness to pay for data. If you've paid for it, you know you're allowed to use it.

Others highlighting the IPR issue in particular were Wolters Kluwer and Agenzia per l'Italia Digitale (AgID).

The Value of Data

Raw data is of almost no value except to a small number of people with the skills and motivation to work with it. Inaccurate or out of date data is entirely worthless. There is, however, value in accurate, curated data and services that offer human-digestible information.

The Krems workshop heard from several companies that offer such information services (infomediaries). MANZ has already been mentioned. Another, Kompany, is an Austrian company that makes official information available about businesses: announcements, patents, trademarks, credit reports and scores, all based on official sources. Peter Bainbridge-Clayton described the workflow as:

Retrieve ⇒ Analyse ⇒ Transform ⇒ Store ⇒ Enhance ⇒ Playout

Peter Bainbridge-Clayton makes a point during his presentation Swimming Against the Tide - turning data back into information in the SHARE-PSI Plenum session in the morning of Thursday, 21 May, 2015.

In his experience, data tends to be published following American technical guidelines but the cultural norms of whatever country the businesses are in. Knowledge of local terminology and legal structures is essential to make sense of the data, and that is always the work of a human. The Linked Data approach is particularly apposite in this circumstance where disparate datasets from different countries need to be transformed into a common model from which useful information can then be extracted for presentation to clients. The format of the original data source is of only minor consequence and multiple tools are used for the initial retrieval stage.
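The transformation of country-specific records into a common model can be sketched as follows. This is not Kompany's workflow; it is a hedged illustration in which the human knowledge of local terminology is captured once in a mapping table and then applied mechanically. The field names, the common model and the records are all hypothetical.

```python
# Hand-written mappings from local field names to a common model.
# These tables stand in for the human knowledge of local terminology.
FIELD_MAPS = {
    "AT": {"Firmenbuchnummer": "registration_id", "Firma": "legal_name"},
    "BE": {"ondernemingsnummer": "registration_id", "naam": "legal_name"},
}


def transform(record, country):
    """Map one source record onto the common model.

    Fields without a known mapping are dropped rather than guessed.
    """
    mapping = FIELD_MAPS[country]
    common = {"country": country}
    for local_field, value in record.items():
        if local_field in mapping:
            common[mapping[local_field]] = value
    return common


# Placeholder records, not real register entries.
austrian = transform(
    {"Firmenbuchnummer": "FN 000000x", "Firma": "Example GmbH"}, "AT"
)
belgian = transform(
    {"ondernemingsnummer": "0000.000.000", "naam": "Example NV",
     "opmerking": "unmapped field"},
    "BE",
)
print(austrian)
print(belgian)
```

Once both records share the same keys, the later stages of the workflow can treat Austrian and Belgian companies identically, whatever format the source data arrived in.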

Another infomediary present in Krems was data.be. Like Kompany, it offers business intelligence services and its founder, Toon Vanagt, like Peter Bainbridge-Clayton, has to be largely unconcerned about the format of the original information: they deal with what they can get. data.be uses a variety of techniques to extract it and turn it into processable data and a lot of the processed data is then made available for free. As a serial entrepreneur, Toon Vanagt suggested that patience is a requirement for anyone wishing to make a business out of PSI and that alternative sources of income should be arranged while the business is developed. It is notable that the public sector, i.e. the original source of the data, is the biggest user of the services. Frustratingly, this only applies to the free services since public bodies have no budget to pay for the premium services.

The panel session at the end of day 1. From left to right: Alon Peled (Hebrew University of Jerusalem), Gregor Eibil (Austrian Federal Chancellery), Wendy Carerra (CapGemini/European Data Portal), DI Dieter Zoubek CMC (Austrian Economic Chamber), Toon Vanagt (data.be), Phil Archer (W3C)

Gregor Eibil from the Federal Chancellery (BKA) reported that they had looked at buying enhanced information from services like Kompany but that their procurement procedures and legal framework don't allow it. This is a case where the existing legal and regulatory framework is working directly against a market it is trying to encourage.

Metrics and Management

The opening keynote of the event, shared with CeDEM, was given by Shauneen Furlong who is Professor and ICT and eGovernment Consultant at the Universities of Toronto and Ottawa. Her message was that the key to the successful creation of a self-sustaining ecosystem around public sector information is a revised project management model. Projects are established by senior managers within a department who measure the success or failure of the project according to a set of pre-arranged Key Performance Indicators. These will show whether the departmental budget has been well spent and what the return has been to the department.

It's a model that leads to improved transactions within the public sector – more efficient ways of maintaining the status quo – but that actively prevents the development of transformational services, that is, services that achieve the desired goal but that may disrupt or replace long established methods entirely. What's needed is an approach to project management, complete with KPIs, that can cut across departments and measure new benefits. As noted, Richard Pettifer described how the meteorological services in the Nordic countries have completely removed themselves from the market so their commercial income is now zero. However, the overall size of services based on that data is already greater than the meteorologists managed previously. A traditional approach to project management is not equipped to recognise this success. Perhaps this explains why IBM's Torsten Skalla said it's hard to identify tangible business benefits of sharing PSI more openly.

In Greece, the transparency portal publishes the decisions of 4221 public bodies (as PDFs that have to be manually searched) but there's no obvious method of measuring whether this is helping to reduce corruption. The portal represents a very substantial change in government processes but it is hard to quantify the benefits arising from that change and thereby encourage a sceptical and naturally conservative public sector to adjust the culture accordingly. In Austria, it's impossible to say how much it cost to establish data.gv.at since no one tracked the cost of preparing the data for publication.

The conclusion is clear: just as new tools need to be developed, new workflows established and the culture shifted, so the way efforts and the returns on those efforts are recorded and managed also needs to change.

One possible new approach was described by Joseph Azzopardi from Malta. He described the concept of a Data Bank, an idea that has already been adopted in Austria, Denmark, Switzerland and Norway. Certified copies of digital assets are deposited in a digital bank and remain under the control of the owner who can decide which third parties have access to which documents. This is a cross-departmental activity that can centralise data without necessarily centralising some of the processes that use that data.

Another answer to the call for improved project management was presented by Gabriele Ciasullo from Agenzia per l'Italia Digitale (AgID). AgID runs an annual review of the PSI publication process that focuses on their guidelines and whether they have been followed by different public bodies.

The organisational model of the Italian government's guidelines on publishing PSI

Specifically, the relevant legislation assigns AgID the responsibility to:

  • define a strategic agenda that identifies principles and objectives to be achieved by public administrations in valorising the information they own and manage;
  • develop a set of technical guidelines;
  • make recommendations that administrations should follow in order to meet the objectives indicated in the agenda;
  • report on the principal results of an assessment of how well the guidelines have been followed and whether the objectives have been met by administrations.

The legislative backing empowers AgID to take a high level view of the overall picture across the public sector, thus stepping outside the usual intra-departmental project management view.

Slim Turki of LIST presenting Service innovation: the hidden value of open data

Success Factors

The Open Data Institute offers a system of certificates that recognise organisations that follow good practices when making their data available. During their session, Amanda Smith and Sumika Sakanishi emphasised that ODI Certificates are about the process of publication, not the quality of the data itself. The questionnaire filled in by dataset publishers asks whether you have documentation, a contact point where users can ask questions and send corrections, etc. By going through the questionnaire, publishers are prompted to think about the publishing process and thereby learn what good practice looks like. There was some scepticism in the session about the usefulness of the system and whether it was transferable to countries other than the UK. On the latter point, the system has already been localised into a number of different languages and jurisdictions (the absence of database rights in the US is a particular issue, for example) and the uptake of certificates is increasing.

Yannis Charalbidis of the University of the Aegean presented research (PDF) he undertook with Anneke Zuiderwijk, Iryna Susha, Peter Parycek and Marijn Janssen, into the critical success factors for the publication and use of open data in practice. The principal research was conducted by way of a questionnaire (PDF) filled in during a 3.5 hour workshop by approximately 20 experts from the field of e-government and e-participation, all involved in open data research.

The factors this audience deemed critical for successful PSI publication were:

  • Legislation, regulation, licenses
  • Strategy and political support
  • Management support
  • Training of and support for civil servants
  • Sustainability of the open data initiative
  • Collaboration
  • Open data platforms, tools and services
  • Accessibility, interoperability, and standards

The factors this audience deemed critical for successful PSI use were:

  • Legislation, regulation, and licenses
  • Success stories
  • Training of and support for open data users
  • Feedback and sustainability
  • Research and education

Note the inclusion of licensing and training in both lists.

Just Rewards

Uldis Bojārs explains how to extract structured data from unstructured open data in his talk with Renars Liepins

One potential, but usually overlooked, indicator for a public authority is the prevalence of screen scrapers. Screen scraping is the technique used by developers with a strong desire to obtain information that is available on the Web only in natural language provided for humans to read. As Uldis Bojārs and Renars Liepins of IMCS, University of Latvia showed, even when screen scraping techniques are boosted by natural language processing, the accuracy of the data extracted is rarely better than 50%. But if that's what a developer has, that's what s/he will use. Therefore it is in the self interest of public authorities to publish structured data, either separately or embedded within their Web pages, to increase the accuracy of data that some developers are going to use anyway.
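To show why embedded structured data is so much more reliable for a developer than scraping prose, the sketch below reads a JSON-LD block out of a Web page using only the Python standard library. It is not the approach presented by the Latvian team; the page content, the choice of JSON-LD and the class name are assumptions made for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect every JSON-LD block embedded in an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# A made-up page: the same fact appears as prose and as structured data.
PAGE = """<html><body>
<p>The council decided on 20 May 2015 to award the contract.</p>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Event",
 "name": "Council decision", "startDate": "2015-05-20"}
</script>
</body></html>"""

parser = JsonLdExtractor()
parser.feed(PAGE)
print(parser.items[0]["startDate"])
```

Extracting the date from the paragraph would require guessing at sentence structure and date formats; reading it from the embedded block requires neither, which is the accuracy gain the publisher gets for free.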

That's a negative incentive, a stick; much better is the positive incentive, the carrot. As noted, Alon Peled repeatedly called for PSI to be made available for a fee so that publishers received a reward for publishing. A bar camp session on scientific research data looked at many of the issues raised in the workshop as they relate to research data. Again the conclusion was that the key factor currently missing from the academic world is a reward (i.e. recognition) for researchers who publish their data and whose data is used by others. This should match the established reward mechanisms for research paper citations.

Conclusions

Workshop host Johann Höchtl

As with the previous workshops in the series, the fourth event in Krems was successful in stimulating conversations around issues related to the sharing of public sector information. The collocation with CeDEM provided a means for the project to learn from international academic experts. Furthermore, the PSI Alliance's Georg Hittmair, based in Vienna, was able to help workshop organiser Johann Höchtl attract a number of local businesses and DI Dieter Zoubek from the Austrian Economic Chamber to Krems. This gave the event a real flavour of the business community's view on the topic of PSI. The specific conclusions from the event were:

  • Engagement with end users, including the local business community, is essential since businesses know what data they want and what they'll do with it.
  • Business users of PSI offer potential, tangible benefits to the data supplier and may effectively act as a partner.
  • Making expensive-to-collect data available for free can quickly lead to a bigger overall market than the one restricted to a few large scale customers.
  • New methods of measurement and project management are essential if the success or failure of a PSI policy is to be assessed.
  • The two top priorities for publishers and users of PSI are licensing and training with the tools and workflows a close second.
  • Search engine optimisation is an important feature of data publication.
  • Publishers should be rewarded for their efforts.