Use Cases and Requirements

From Dataset Exchange Working Group

This is where Use Cases were initially consolidated as they were prepared and agreed for voting as accepted scope of the DXWG.

Please do not add new material here. New Use Cases should be submitted through GitHub.

(this is a first draft - open questions include back references to "raw use case suggestions")

UC1 Consistent use of summary properties and extension points with more detailed domain specific information models

Deliverable(s):

(DCAT1.1, AP Guidelines, Content Negotiation)

Tags

#Profile #Content_negotiation

Problem statement

Most properties used in DCAT represent a summary of a set of concerns that may need to be more deeply modeled in a given application domain. For example, dct:publisher is arguably a view of the chain of provenance of the data. Both simple properties with well-defined semantics and the ability to define appropriate models for richer metadata are useful.

There is a need for richer semantic descriptions of datasets and the possible wide range of alternate forms of distribution, including access services. Different communities of practice will have different needs for both the level of detail and the form such descriptive metadata is available in. Such metadata will also need both human and machine readable forms.

It is not possible to pre-define all such forms of metadata. However, it is also undesirable to have no means to predict what form such metadata takes and to interact with it. Thus communities of practice are expected to develop “profiles” of DCAT to support their needs for the level of detail and its interoperability with both discovery and data access systems.

Such profiles should support multiple forms of any given metadata, using 3rd party encodings, schema, vocabularies and additional constraints - for example a dataset schema definition may be provided in XML schema, OWL, UML, DDL or many other forms.

Because such metadata is inherently complex, each metadata component will need to conform to specific interoperability profiles. A means to share definitions of these, along with machine readable conformance and validation resources, will be required to maximise interoperability whilst allowing future flexibility. It is expected that agents will perform negotiation based on profile identification to access the forms of metadata they are best able to exploit.

DCAT profiles will reference (by URI) the specific metadata component profiles they require, and such URIs may be dereferenced by agents using profile based content negotiation to find forms of these profiles suitable for use at run time.
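The selection step in profile-based negotiation can be sketched as follows. This is only an illustration under stated assumptions: the `Representation` class, the `negotiate` function and all example.org URIs are hypothetical and are not defined by DCAT or by this use case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Representation:
    """One available form of a metadata resource (all fields hypothetical)."""
    profile: str      # URI identifying the profile the content conforms to
    media_type: str   # encoding of this form, e.g. "text/turtle"
    location: str     # where this particular form can be retrieved


def negotiate(available: Sequence[Representation],
              preferred_profiles: Sequence[str],
              accepted_media_types: Sequence[str]) -> Optional[Representation]:
    """Return the first representation that matches the agent's preferences.

    Profiles are tried in the agent's order of preference; within a profile,
    media types are tried in order. None means no suitable form exists.
    """
    for profile in preferred_profiles:
        for media_type in accepted_media_types:
            for rep in available:
                if rep.profile == profile and rep.media_type == media_type:
                    return rep
    return None


# Hypothetical forms offered for one dataset description.
available = [
    Representation("http://example.org/profile/dcat-core", "text/turtle",
                   "http://example.org/dataset/1.core.ttl"),
    Representation("http://example.org/profile/geo-detail", "application/xml",
                   "http://example.org/dataset/1.geo.xml"),
]

chosen = negotiate(
    available,
    preferred_profiles=["http://example.org/profile/geo-detail",
                        "http://example.org/profile/dcat-core"],
    accepted_media_types=["text/turtle", "application/xml"],
)
print(chosen.location if chosen else "no suitable form")
```

Here the agent prefers the more detailed profile and falls back to the core profile only if no accepted encoding of the detailed one is available.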

Existing approaches

Extension Points

Links in HTML provide for annotations describing the role of extensions. Such roles are formalised in the IANA link registry [1]:

<link rel="alternate" type="application/rss+xml" href="/rss.xml" title="RSS 2.0">
<link rel="alternate" type="application/atom+xml" href="/atom.xml" title="Atom 1.0">

RDF provides for reification, but links can just as easily be explicitly modelled by providing a link object type, with the link to the resource being referenced and any set of metadata properties for that link.

Profiles

Application profiles of DCAT and other similar standards are currently written mainly in document form, with some partial support using constraint languages such as Schematron and SHACL.

Links

Requirements:

  • Provide extension points to DCAT for further description of multiple aspects of datasets, with the ability to declare the role, data type and encoding of the resource, the content profile the resource conforms to, and a human readable label for the link.
  • An extensible register of metadata component roles, a register for data types referenced by content profiles, and a register of content profiles. (NB encoding types already have MIME type registers)
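The first requirement describes a link that carries its own metadata. A minimal sketch of such a link object is given below; the class name `QualifiedLink`, its field names and the example.org URIs are assumptions made for illustration and do not correspond to any DCAT term.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class QualifiedLink:
    """A link to a further description of a dataset (hypothetical model)."""
    target: str      # URI of the resource being referenced
    role: str        # URI of the role the resource plays, from a role register
    media_type: str  # encoding of the resource, from the MIME type registers
    profile: str     # URI of the content profile the resource conforms to
    label: str       # human readable label for the link


def links_with_role(links: Sequence[QualifiedLink], role: str) -> List[QualifiedLink]:
    """Select the links that play a given role."""
    return [link for link in links if link.role == role]


# Hypothetical links attached to one dataset description.
links = [
    QualifiedLink("http://example.org/dataset/1/schema.xsd",
                  "http://example.org/roles/schema",
                  "application/xml",
                  "http://example.org/profile/xml-schema",
                  "Dataset schema (XML Schema)"),
    QualifiedLink("http://example.org/dataset/1/provenance.ttl",
                  "http://example.org/roles/provenance",
                  "text/turtle",
                  "http://example.org/profile/provenance-detail",
                  "Detailed provenance"),
]

for link in links_with_role(links, "http://example.org/roles/schema"):
    print(link.label, "->", link.target)
```

An agent that understands only some roles or profiles can filter on those fields and ignore the remaining links.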

Related use cases:

(back refs to sources?)

Comments

UC2 Dataset Versioning Information

Deliverable(s):

DCAT1.1

Tags

DCAT #Lifecycle #Provenance #Dataset_concept #Aggregate

Stakeholders

* data producers that produce versioned datasets
* data consumers that consume versioned datasets

Problem statement

Many datasets are released as discrete versions. These may be published as updates, as new datasets, via distributions that support specific versions, or via distributions that support version-based content negotiation. The DCAT 1.0 model does not cover versioning with sufficient detail or flexibility. Being able to publish dataset version information in a standard way will help both producers, who publish their data on data catalogues or archive data, and dataset consumers, who want to discover new versions of a given dataset. It will also help users who wish to cite a specific version of a dataset. There are several existing dataset description models that extend DCAT to provide versioning information, for example the HCLS Community Profile.

Many systems will have their own strategies for handling versions, for example replacing a dataset and updating a DCAT description, deprecating previous versions and redirecting users to current versions, or providing explicit records for previous and newer versions with references. DCAT cannot dictate this choice, but can provide canonical mechanisms for common cases.

Version information may provide domain-specific semantics - for example software release version numbers.

Users may search for a particular version of a dataset, which implies identification of the dataset independently of identification of dataset version.

Users may seek to find and record the version designator of a dataset.

Users may need to find the available versions of a dataset and the versioning history, and possibly the versioning policy to assess data quality.

Existing version designators in use by the community will need support; however, a normalised form may be required to support version comparison operations on DCAT resources.
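One possible normalisation is sketched below. It assumes that a designator's ordering is fully determined by its numeric components, which will not hold for every community scheme; the function name `normalise_version` is hypothetical.

```python
import re
from typing import Tuple


def normalise_version(designator: str) -> Tuple[int, ...]:
    """Reduce a version designator to a tuple of integers for comparison.

    Assumption: only the numeric components matter, so "v1.10" becomes
    (1, 10). Designators without digits cannot be normalised this way.
    """
    parts = re.findall(r"\d+", designator)
    if not parts:
        raise ValueError("no numeric component in " + repr(designator))
    return tuple(int(part) for part in parts)


# Tuples compare component by component, so 1.10 sorts after 1.9.2,
# which a plain string comparison would get wrong.
print(normalise_version("v1.10") > normalise_version("1.9.2"))
```

Designators from different schemes (for example a release number and a date) still normalise to tuples, but comparing them with each other is not meaningful, which is one reason to also record the type of the version model.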

Links:

https://www.w3.org/TR/hcls-dataset/#datasetdescriptionlevels
https://www.w3.org/TR/dwbp/#dataVersioning
https://www.w3.org/TR/dwbp-ucr/#R-DataVersion
http://db.csail.mit.edu/pubs/datahubcidr.pdf
https://lists.w3.org/Archives/Public/public-dxwg-wg/2017Jun/thread.html#msg6

Requirements:

  • A definition of what is meant by version in this context and how it relates to dataset, distribution should be provided.
  • A simple canonical form of version designator that supports simple comparison (version1 > version2)
  • Ability to attach a domain-specific version model and designate its type
  • Ability to designate for each dataset’s distribution whether it provides the most recent version, a specific version or allows a version to be specified.
  • Different versioning scenarios should be supported (e.g., dataset evolution, conversions/translations, granularities/subsets).
  • Each version should provide a version identifier and other relevant metadata.
  • It should be possible to provide metadata about when a version was created (released).
  • It should be possible to provide identifiers for the previous/next versions when applicable (if they are in chronological order)
  • It should be possible to provide what has been changed when applicable (if they are in chronological order)
  • It should be possible to discover versions of a given dataset in a catalog.
  • W3C DWBP guidelines on versioning: BP7. Provide a version indicator, BP8. Provide version history, BP11. Assign URIs to dataset versions and series

Related use cases

Relationships between Datasets

Comments

UC3 Modeling agent roles

Deliverable(s): DCAT1.1

Problem statement

Each metadata standard has its own set of agent roles, and they all use their own vocabularies / code lists. E.g., the latest version (2014) of [ISO-19115] has 20 roles, and [DataCite] even more.

Two of the main issues concern (a) how to ensure interoperability across roles defined in different standards, and (b) whether it makes sense to support all of them across platforms. The latter point follows from a common issue in metadata standards supporting multiple roles with overlapping semantics (e.g., the difference between a data distributor and a data publisher is not always clear). In these scenarios, whenever metadata are not created by specialists, roles are frequently used inconsistently.

Funding source identification is another example of this and may be dealt with using a suitable mechanism for agent roles.

As far as research data are concerned, agent roles are important to denote the type of contribution provided by each individual / organization in producing data.

Moreover, in some cases, an additional requirement is to specify the temporal dimension of a role – i.e., the time frame during which an individual / organisation played a given role - and, maybe, also other information – e.g., the organisation where the individual held a given position while playing that role.

Existing approaches

[DCTerms] defines a limited number of agent roles as properties. [VOCAB-DCAT] re-uses some of them (in particular, dcterms:publisher), and defines a new one, namely dcat:contactPoint. [DCAT-AP] and [GeoDCAT-AP] provide guidance on the use of other [DCTerms] roles, in particular dcterms:creator and dcterms:rightsHolder. However, the role properties defined in [DCTerms] and [VOCAB-DCAT] model just a subset of the agent roles defined in other standards. Moreover, they cannot be used to associate a role with other information concerning its temporal / organizational context.

[PROV-O] could be used for this purpose by using a “qualified attribution”. This is, for instance, the approach used in [GeoDCAT-AP] to model agent roles defined in [ISO-19115] but not supported in [DCTerms] and [VOCAB-DCAT]:

NOTE: PROV-O "qualified attribution" is an application of the Qualified Relation pattern. It is also used in the Sample Relations module in SSN/SOSA (SJDC)
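The shape of a qualified attribution can be sketched as a small set of triples. The sketch below uses the PROV-O terms prov:qualifiedAttribution, prov:Attribution and prov:agent; the choice of dcterms:type to carry the role is an assumption, and the example.org URIs and the function name are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumption: dcterms:type carries the role; a profile may choose another property.
ROLE_PROPERTY = "http://purl.org/dc/terms/type"


def qualified_attribution(dataset: str, agent: str, role: str,
                          node: str = "_:attribution1") -> List[Triple]:
    """Build the triples linking a dataset to an agent via an intermediate node.

    The intermediate node is what makes the relation "qualified": further
    statements (such as the role) attach to it rather than to the dataset.
    """
    return [
        (dataset, PROV + "qualifiedAttribution", node),
        (node, RDF_TYPE, PROV + "Attribution"),
        (node, PROV + "agent", agent),
        (node, ROLE_PROPERTY, role),
    ]


def term(value: str) -> str:
    """Serialise a blank node label as-is and anything else as an IRI."""
    return value if value.startswith("_:") else "<" + value + ">"


triples = qualified_attribution(
    dataset="http://example.org/dataset/1",
    agent="http://example.org/agent/acme",
    role="http://example.org/roles/distributor",
)
for s, p, o in triples:
    print(term(s), term(p), term(o), ".")
```

Because the role hangs off the intermediate node, the same node is also a natural place for the temporal or organisational context mentioned in the problem statement.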

Links

Requirements

  • Being able to model different types of agent roles
  • Agent roles defined in an authority list
  • Consistent practice across the DCAT specification: simple common properties, with more detailed models attached as extension points that follow similar patterns
  • (potentially a "hierarchy of roles" - needs more input here)

Related use cases

  • special case of UC1