Abstract
DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use.
By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications to easily consume metadata from multiple catalogs. DCAT further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation.
Status of This Document
This section describes the status of this document at the time of its publication. Other
documents may supersede this document. A list of current W3C publications and the latest revision
of this technical report can be found in the W3C technical reports
index at http://www.w3.org/TR/.
This document was published by the Government Linked Data Working Group as a Last Call Working Draft. It is the second Last Call and addresses the comments and feedback received during the first Last Call period. See change history for more details.
The Working Group is very interested in hearing comments about this work. Please send comments by 30 August 2013 to public-gld-comments@w3.org (subscribe, archives). This document is intended to become a W3C Recommendation.
Publication as a Last Call Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This is a Last Call Working Draft and thus the Working Group has determined that this document has satisfied the
relevant technical requirements and is sufficiently stable to advance through the Technical Recommendation process.
This document was produced by a group operating under the
5 February 2004 W3C Patent Policy.
W3C maintains a public list of any patent disclosures
made in connection with the deliverables of the group; that page also includes instructions for
disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains
Essential Claim(s) must disclose the
information in accordance with section
6 of the W3C Patent Policy.
2. Namespaces
The namespace for DCAT is http://www.w3.org/ns/dcat#. However,
it should be noted that DCAT makes extensive use of terms from other vocabularies,
in particular Dublin Core. DCAT itself defines a minimal set of classes and
properties of its own. The full set of namespaces and prefixes used in this
document is shown in the table below.
Prefix | Namespace |
dcat | http://www.w3.org/ns/dcat# |
dct | http://purl.org/dc/terms/ |
dctype | http://purl.org/dc/dcmitype/ |
foaf | http://xmlns.com/foaf/0.1/ |
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
rdfs | http://www.w3.org/2000/01/rdf-schema# |
skos | http://www.w3.org/2004/02/skos/core# |
vcard | http://www.w3.org/2006/vcard/ns# |
xsd | http://www.w3.org/2001/XMLSchema# |
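In Turtle, the prefixes in the table above can be declared as shown below. This is a sketch for convenience only: the empty prefix bound to http://example.org/ is an assumption made here so that names such as :catalog in the later examples resolve; the specification does not fix that namespace.

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix dctype: <http://purl.org/dc/dcmitype/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Assumption: an illustrative namespace for the empty prefix used in the examples.
@prefix : <http://example.org/> .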
4. Vocabulary Overview
This section is non-normative.
DCAT is an RDF vocabulary well-suited to representing government data catalogs such as data.gov and data.gov.uk. DCAT defines three main classes:
- dcat:Catalog represents the catalog.
- dcat:Dataset represents a dataset in a catalog.
- dcat:Distribution represents an accessible form of a dataset, such as a downloadable file, an RSS feed or a web service that provides the data.
Notice that a dataset in DCAT
is defined as a "collection of data, published or curated by a single agent, and available for access or
download in one or more formats". A dataset does not have to be available as a downloadable file.
For example, a dataset that is available via an API can be defined as an instance of dcat:Dataset and the
API can be defined as an instance of dcat:Distribution. DCAT itself does not define properties specific to API descriptions.
These are considered out of scope for this version of the vocabulary. Nevertheless, such properties can be defined in a profile of the DCAT vocabulary.
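A minimal sketch of a dataset that is only available through an API is shown below. The names :dataset-004 and :dataset-004-api, the titles, the API URL and the binding of the empty prefix are hypothetical and chosen only for illustration; dcat:accessURL is used because the URL is not a direct file download.

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
# Assumption: illustrative namespace for the empty prefix.
@prefix : <http://example.org/> .

# The dataset itself, independent of how it is accessed.
:dataset-004
a dcat:Dataset ;
dct:title "Imaginary dataset served by an API" ;
dcat:distribution :dataset-004-api ;
.
# The API is modelled as a distribution; no API-specific properties are used.
:dataset-004-api
a dcat:Distribution ;
dct:title "API access to imaginary dataset 004" ;
dcat:accessURL <http://example.org/api/dataset-004> ;
.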
Another important class in DCAT is dcat:CatalogRecord
which describes a dataset entry in the catalog. Notice that while dcat:Dataset represents the dataset itself, dcat:CatalogRecord represents the record that describes a dataset in the catalog. The use of the CatalogRecord is considered optional. It is used to capture provenance information about dataset entries in a catalog. If this distinction is not necessary then CatalogRecord can be safely ignored.
All RDF examples in this document are written in Turtle syntax [TURTLE-TR].
4.1 Basic Example
This section is non-normative.
This example provides a quick overview of how DCAT might be used to represent a government catalog and its datasets.
First, the catalog description:
:catalog
a dcat:Catalog ;
dct:title "Imaginary Catalog" ;
rdfs:label "Imaginary Catalog" ;
foaf:homepage <http://example.org/catalog> ;
dct:publisher :transparency-office ;
dct:language <http://id.loc.gov/vocabulary/iso639-1/en> ;
dcat:dataset :dataset-001 , :dataset-002 , :dataset-003 ;
.
The publisher of the catalog has the relative URI :transparency-office. Further description of the publisher can be provided as in the following example:
:transparency-office
a foaf:Organization ;
rdfs:label "Transparency Office" ;
.
The catalog lists each of its datasets via the dcat:dataset property. In the example above, an example dataset was mentioned with the relative URI :dataset-001. A possible description of it using DCAT is shown below:
:dataset-001
a dcat:Dataset ;
dct:title "Imaginary dataset" ;
dcat:keyword "accountability" , "transparency" , "payments" ;
dct:issued "2011-12-05"^^xsd:date ;
dct:modified "2011-12-05"^^xsd:date ;
dct:publisher :finance-ministry ;
dct:language <http://id.loc.gov/vocabulary/iso639-1/en> ;
dct:accrualPeriodicity <http://purl.org/linked-data/sdmx/2009/code#freq-W> ;
dcat:distribution :dataset-001-csv ;
.
In order to express frequency of update in the example above, we chose to use an instance from the Content-Oriented Guidelines developed as part
of the W3C Data Cube Vocabulary efforts.
The dataset distribution :dataset-001-csv can be downloaded as a 5 KB CSV file. This information is
represented via an RDF resource of type dcat:Distribution.
:dataset-001-csv
a dcat:Distribution ;
dcat:downloadURL <http://www.example.org/files/001.csv> ;
dct:title "CSV distribution of imaginary dataset 001" ;
dcat:mediaType "text/csv" ;
dcat:byteSize "5120"^^xsd:decimal ;
.
4.2 Classifying datasets
The catalog classifies its datasets according to a set of domains represented by the relative URI :themes. SKOS can be used to describe the domains used:
:catalog dcat:themeTaxonomy :themes .
:themes
a skos:ConceptScheme ;
skos:prefLabel "A set of domains to classify documents" ;
.
:dataset-001 dcat:theme :accountability .
Notice that this dataset is classified under the domain represented by the relative URI :accountability.
It is recommended to define the concept as part of the concept scheme identified by the URI :themes that was used to describe the catalog domains. An example SKOS description:
:accountability
a skos:Concept ;
skos:inScheme :themes ;
skos:prefLabel "Accountability" ;
.
4.4 A dataset available only behind some Web page
:dataset-002 is available as a CSV file. However, :dataset-002 can only be obtained through some Web page
where the user needs to click some links, provide some information and check some boxes
before accessing the data:
:dataset-002
a dcat:Dataset ;
dcat:landingPage <http://example.org/dataset-002.html> ;
dcat:distribution :dataset-002-csv ;
.
:dataset-002-csv
a dcat:Distribution ;
dcat:accessURL <http://example.org/dataset-002.html> ;
dcat:mediaType "text/csv" ;
.
Notice the use of dcat:landingPage and the definition of the dcat:Distribution instance.
4.5 A dataset available as download and behind some Web page
On the other hand, :dataset-003 can be obtained through some landing page but also can be downloaded from a known URL.
:dataset-003
a dcat:Dataset ;
dcat:landingPage <http://example.org/dataset-003.html> ;
dcat:distribution :dataset-003-csv ;
.
:dataset-003-csv
a dcat:Distribution ;
dcat:downloadURL <http://example.org/dataset-003.csv> ;
dcat:mediaType "text/csv" ;
.
Notice that we used dcat:downloadURL with the downloadable distribution and that the other distribution through the landing page
does not have to be defined as a separate dcat:Distribution instance.
5. Vocabulary specification
5.1 Class: Catalog
The following properties are recommended for use on this class:
catalog record,
dataset,
description,
homepage,
language,
license,
publisher,
release date,
rights,
spatial,
themes,
title,
update date
RDF Class: | dcat:Catalog |
Definition: | A data catalog is a curated collection of metadata about datasets. |
Usage note: | Typically, a web-based data catalog is represented as a single instance of this class. |
See also: | Catalog record, Dataset |
Property: update/modification date
Property: language
RDF Property: | dct:language |
Definition: | The language of the catalog. This refers to the language used in the textual metadata describing titles, descriptions, etc. of the datasets in the catalog. |
Range: | dct:LinguisticSystem. Resources defined by the Library of Congress (1, 2) should be used.
If an ISO 639-1 (two-letter) code is defined for the language, then its corresponding IRI should be used; if no ISO 639-1 code is defined, then the IRI corresponding to the ISO 639-2 (three-letter) code should be used. |
Usage note: | Multiple values can be used. The publisher might also choose to describe the language on the dataset level (see dataset language). |
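As a sketch of the rule above, assume a catalog whose textual metadata is written in both English and Hawaiian. English has the ISO 639-1 code "en", so the two-letter IRI is used; Hawaiian has no ISO 639-1 code, so the IRI for its ISO 639-2 code "haw" is used instead. The second IRI follows the same Library of Congress pattern as the IRI used in the basic example; the catalog name and the empty prefix binding are illustrative assumptions.

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
# Assumption: illustrative namespace for the empty prefix.
@prefix : <http://example.org/> .

:catalog
a dcat:Catalog ;
# Multiple values are allowed, one per language.
dct:language <http://id.loc.gov/vocabulary/iso639-1/en> ,
<http://id.loc.gov/vocabulary/iso639-2/haw> ;
.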
Property: homepage
RDF Property: | foaf:homepage |
Definition: | The homepage of the catalog. |
Range: | foaf:Document |
Usage note: | foaf:homepage is an inverse functional property (IFP) which means that it should be unique and precisely identify the catalog. This allows smushing various descriptions of the catalog when different URIs are used. |
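The effect of the inverse functional property can be sketched with two descriptions of the same catalog published under different URIs. Both URIs below are hypothetical. Because the two resources share one foaf:homepage value, a consumer that applies inverse functional property semantics may conclude that they denote the same catalog and merge ("smush") their descriptions.

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Description published by the catalog owner (hypothetical URI).
<http://example.org/catalog#id>
a dcat:Catalog ;
dct:title "Imaginary Catalog" ;
foaf:homepage <http://example.org/catalog> ;
.
# Description published by a third-party aggregator (hypothetical URI).
<http://aggregator.example.com/catalogs/imaginary>
a dcat:Catalog ;
foaf:homepage <http://example.org/catalog> ;
.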
Property: spatial/geographic
Property: license
RDF Property: | dct:license |
Definition: | This links to the license document under which the catalog
is made available and not the datasets. Even if the license of the catalog applies to all of its
datasets and distributions, it should be replicated on each distribution. |
Range: | dct:LicenseDocument |
See also: | catalog rights,
distribution license |
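The replication recommended above might look as follows. This is a sketch only: the license URI is a placeholder and the empty prefix binding is an assumption. The same license document is stated once on the catalog, where it covers the catalog metadata, and again on the distribution, where it covers the data.

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
# Assumption: illustrative namespace for the empty prefix.
@prefix : <http://example.org/> .

# License of the catalog metadata itself.
:catalog
a dcat:Catalog ;
dct:license <http://example.org/licenses/open-license> ;
.
# The same license is repeated explicitly on the distribution.
:dataset-001-csv
a dcat:Distribution ;
dct:license <http://example.org/licenses/open-license> ;
.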
Property: rights
RDF Property: | dct:rights |
Definition: | This describes the rights under which the catalog
can be used/reused and not the datasets. Even if these rights apply to all the catalog
datasets and distributions, they should be replicated on each distribution. |
Range: | dct:RightsStatement |
See also: | catalog license,
distribution rights |
5.2 Class: Catalog record
The following properties are recommended for use on this class:
description,
listing date,
primary topic,
title,
update date
RDF Class: | dcat:CatalogRecord |
Definition: | A record in a data catalog, describing a single dataset. |
Usage note: | This class is optional and not all catalogs will use it. It exists for catalogs where a distinction is made between metadata about
a dataset and metadata about the dataset's entry in the catalog. For example, the publication date property of the dataset reflects
the date when the information was originally made available by the publishing agency, while the publication date of the catalog record is the date when the dataset was added to the catalog.
In cases where both dates differ, or where only the latter is known, the publication date should only be specified for the catalog record.
Notice that the W3C PROV Ontology [PROV-O] allows describing further provenance information such as the details of the process and the agent involved in a particular change to a dataset.
|
See also: | Dataset |
If a catalog is represented as an RDF Dataset with named graphs (as defined in [SPARQL-QUERY-11]),
then it is appropriate to place the description of each dataset
(consisting of all RDF triples that mention the dcat:Dataset, dcat:CatalogRecord, and any of its dcat:Distributions)
into a separate named graph. The name of that graph should be the IRI of the catalog record.
Property: update/modification date
Property: primary topic
RDF Property: | foaf:primaryTopic |
Definition: | Links the catalog record to the dcat:Dataset resource described in the record. |
Usage note: | foaf:primaryTopic property is functional:
each catalog record can have at most one primary topic i.e. describes one dataset. |
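A possible catalog record for :dataset-001 is sketched below. The record name :record-001, its dates and the empty prefix binding are assumptions made for illustration. The record is linked from the catalog with dcat:record, the DCAT property behind "catalog record" in the list of recommended Catalog properties. The dct:issued value on the record states when the entry was added to the catalog, which may be later than the release date of the dataset itself.

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Assumption: illustrative namespace for the empty prefix.
@prefix : <http://example.org/> .

:catalog dcat:record :record-001 .

:record-001
a dcat:CatalogRecord ;
# Exactly one primary topic: the dataset this record describes.
foaf:primaryTopic :dataset-001 ;
# Listing date and last change of the catalog entry, not of the dataset.
dct:issued "2011-12-11"^^xsd:date ;
dct:modified "2011-12-12"^^xsd:date ;
.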
5.3 Class: Dataset
The following properties are recommended for use on this class:
contact point,
description,
distribution,
frequency,
identifier,
keyword,
landing page,
language,
publisher,
release date,
spatial coverage,
temporal coverage,
theme,
title,
update date
RDF Class: | dcat:Dataset |
Definition: | A collection of data, published or curated by a single agent, and available for access or download in one or more formats. |
Sub class of: | dctype:Dataset |
Usage note: | This class represents the actual dataset as published by the dataset publisher. In cases where a distinction between the actual dataset and its entry in the catalog is necessary (because metadata such as modification date and maintainer might differ), the catalog record class can be used for the latter. |
See also: | Catalog record |
Property: update/modification date
RDF Property: | dct:modified |
Definition: | Most recent date on which the dataset was changed, updated or modified. |
Range: | rdfs:Literal
encoded using the relevant ISO 8601 Date and Time compliant string and typed using the appropriate XML Schema datatype [XMLSCHEMA-2]
|
Usage note: | The value of this property indicates a change to the actual dataset, not a change to the catalog record. An absent value may indicate that the dataset has never changed after its initial publication, or that the date of last modification is not known, or that the dataset is continuously updated. |
See also: | frequency |
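The choice of XML Schema datatype depends on how precisely the modification date is known. The sketch below shows three plausible typings; the values for :dataset-002 and :dataset-003 and the empty prefix binding are invented for illustration.

@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Assumption: illustrative namespace for the empty prefix.
@prefix : <http://example.org/> .

# Known to the day.
:dataset-001 dct:modified "2011-12-05"^^xsd:date .
# Known to the second, with a time zone.
:dataset-002 dct:modified "2012-04-23T18:25:43Z"^^xsd:dateTime .
# Only the year is known.
:dataset-003 dct:modified "2013"^^xsd:gYear .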
Property: language
RDF Property: | dct:language |
Definition: | The language of the dataset. |
Range: | dct:LinguisticSystem. Resources defined by the Library of Congress (1, 2) should be used.
If an ISO 639-1 (two-letter) code is defined for the language, then its corresponding IRI should be used; if no ISO 639-1 code is defined, then the IRI corresponding to the ISO 639-2 (three-letter) code should be used. |
Usage note: | This overrides the value of the catalog language in case of conflict. |
Property: identifier
RDF Property: | dct:identifier |
Definition: | A unique identifier of the dataset. |
Range: | rdfs:Literal |
Usage note: | The identifier might be used as part of the URI of the dataset, but still having it represented explicitly is useful. |
Property: spatial/geographical coverage
RDF Property: | dct:spatial |
Definition: | Spatial coverage of the dataset. |
Range: | dct:Location (A spatial region or named place) |
Property: temporal coverage
RDF Property: | dct:temporal |
Definition: | The temporal period that the dataset covers. |
Range: | dct:PeriodOfTime (An interval of time that is named or defined by its start and end dates) |
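Spatial and temporal coverage can be stated in several ways. The sketch below takes the simplest one and assumes labelled blank nodes typed with the range classes; the labels and the empty prefix binding are invented. Linking to IRIs from an established gazetteer or time-interval vocabulary is an alternative that this example does not show.

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Assumption: illustrative namespace for the empty prefix.
@prefix : <http://example.org/> .

:dataset-001
a dcat:Dataset ;
# A named place covered by the data.
dct:spatial [ a dct:Location ; rdfs:label "Imaginary Region" ] ;
# The period the data refers to, which is not the publication date.
dct:temporal [ a dct:PeriodOfTime ; rdfs:label "First quarter of 2011" ] ;
.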
Property: dataset distribution
Property: landing page
RDF Property: | dcat:landingPage |
Definition: | A Web page that can be navigated to in a Web browser to gain access to the dataset, its distributions and/or additional information. |
Sub property of: | foaf:page |
Domain: | dcat:Dataset |
Range: | foaf:Document |
Usage note: | If the distribution(s) are accessible only through a landing page (i.e. direct download URLs are not known), then the landing page link should be duplicated as accessURL on a distribution (see example 4.4). |
5.4 Class: Distribution
The following properties are recommended for use on this class:
access URL,
byte size,
description,
download URL,
format,
license,
media type,
release date,
rights,
title,
update date
RDF class: | dcat:Distribution |
Definition: | Represents a specific available form of a dataset. Each dataset might be available in different forms,
these forms might represent different formats of the dataset or different endpoints.
Examples of distributions include a downloadable CSV file, an API or an RSS feed. |
Usage note: | This represents a general availability of a dataset; it implies no information
about the actual access method of the data, i.e. whether it is a direct download, an API, or access through some Web page.
The use of the dcat:downloadURL property indicates directly downloadable distributions. |
Property: update/modification date
Property: rights
RDF Property: | dct:rights |
Definition: | Information about rights held in and over the distribution. |
Range: | dct:RightsStatement |
Usage note: | dct:license, which is a sub-property of dct:rights, can be used to link
a distribution to a license document. However, dct:rights allows linking to a rights statement that
can include licensing information as well as other information that supplements the licence such as attribution. |
See also: | distribution license,
catalog rights |
Property: access URL
RDF Property: | dcat:accessURL |
Definition: | Any kind of URL that gives access to a distribution of the dataset, e.g. a landing page, download, feed URL or
SPARQL endpoint. Use this property when the catalog does not record which kind of URL it is, or when the URL is definitely not a download. |
Range: | rdfs:Resource |
Usage note: | - The value is a URL.
- If the distribution(s) are accessible only through a landing page (i.e. direct download URLs are not known), then the landing page link should be duplicated as accessURL on a distribution (see example 4.4). |
See also: | distribution download URL |
Property: download URL
RDF Property: | dcat:downloadURL |
Definition: | This is a direct link to a downloadable file in a given format, e.g. a CSV file or an RDF file.
The format is described by the distribution's dct:format and/or dcat:mediaType. |
Range: | rdfs:Resource |
Usage note: | The value is a URL. |
See also: | distribution access URL |
Property: byte size
RDF Property: | dcat:byteSize |
Definition: | The size of a distribution in bytes. |
Range: | rdfs:Literal typed as xsd:decimal. |
Usage note: | The size in bytes can be approximated when the precise size is not known. |
Property: media type
RDF Property: | dcat:mediaType |
Definition: | The media type of the distribution as defined by IANA. |
Sub property of: | dct:format |
Range: | dct:MediaTypeOrExtent |
Usage note: | This property should be used when the media type of the distribution is defined in IANA, otherwise dct:format may be used with different values. |
See also: | format |
5.5 Class: Concept scheme
5.6 Class: Concept
RDF Class: | skos:Concept |
Definition: | A category or a theme used to describe datasets in the catalog. |
Usage note: | It is recommended to use either skos:inScheme or skos:topConceptOf on every skos:Concept
used to classify datasets to link it to the concept scheme it belongs to. This concept scheme is typically associated with the catalog using dcat:themeTaxonomy |
See also: | catalog themes, dataset theme |
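Section 4.2 shows a concept linked to its scheme with skos:inScheme. The sketch below shows the other recommended option, skos:topConceptOf, for a concept at the top of the hierarchy. The concept :payments and the empty prefix binding are hypothetical and used only for illustration.

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
# Assumption: illustrative namespace for the empty prefix.
@prefix : <http://example.org/> .

# The scheme is attached to the catalog.
:catalog dcat:themeTaxonomy :themes .

# A top-level concept in that scheme.
:payments
a skos:Concept ;
skos:topConceptOf :themes ;
skos:prefLabel "Payments" ;
.
# A dataset classified under the concept.
:dataset-001 dcat:theme :payments .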
5.7 Class: Organization/Person
RDF Classes: | foaf:Person for people and foaf:Organization for government agencies or other entities. |
Usage note: | FOAF provides sufficient properties to describe these entities. |
A. Acknowledgements
This document contains a significant contribution from Richard Cyganiak. Richard Cyganiak
is one of the initiators of the DCAT work and
significantly contributed to the work on this specification as it made its way through the W3C process.
The editors would like to thank Vassilios Peristeras for his comments and support for the original DCAT work. Vassilios Peristeras is also one of the initiators of the DCAT work. We would also like to thank Rufus Pollock for his significant input and comments.
This document has benefited from inputs from many members of the Government Linked Data Working Group.
Specific thanks are due to Ghislain Atemezing, Martin Alvarez and Makx Dekkers.
B. Change history
Changes since W3C Last Call Working Draft 12 March 2013:
- Section 4: diagram updated with new properties
- Section 4: add text to clarify describing datasets available via API
- Section 5.1: description of properties dct:issued and dct:modified updated
- Section 5.1: dct:rights added
- Section 5.2: description of properties dct:issued and dct:modified updated
- Section 5.3: description of properties dct:issued and dct:modified updated
- Section 5.3: dcat:contactPoint added
- Section 5.4: description of properties dct:issued and dct:modified updated
- Section 5.4: dct:rights added
- Section 5.5: split into two sections 5.5 and 5.6