Warning:
This wiki has been archived and is now read-only.
Best practices guidelines
General Guidelines
What is a best practice?
- A Best Practice implements one or more UC Requirements.
- A UC Requirement is motivated by one or more Use Cases.
- A Best Practice has a title, a description and one or more How to Sections.
- A How to Section specifies one possible way of implementing a Best Practice.
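The relationships listed above (Best Practice, UC Requirement, Use Case, How to Section) can be captured as a small data structure. The Python sketch below is illustrative only. The class names, the `validate` helper and the pairing of BP1 with R-FormatOpen are hypothetical assumptions, not the group's definitions or mapping.

```python
from dataclasses import dataclass, field


@dataclass
class UseCase:
    # Hypothetical model: a use case is identified only by its name here.
    name: str


@dataclass
class Requirement:
    # A UC Requirement is motivated by one or more Use Cases.
    identifier: str
    use_cases: list[UseCase] = field(default_factory=list)


@dataclass
class HowTo:
    # One possible way of implementing a Best Practice (e.g. with a given technology).
    description: str


@dataclass
class BestPractice:
    title: str
    description: str
    requirements: list[Requirement] = field(default_factory=list)
    how_tos: list[HowTo] = field(default_factory=list)


def validate(bp: BestPractice) -> list[str]:
    """Return the rules above that bp violates (an empty list if it is well-formed)."""
    problems = []
    if not bp.title or not bp.description:
        problems.append("missing title or description")
    if not bp.requirements:
        problems.append("implements no UC Requirement")
    if not bp.how_tos:
        problems.append("has no How to Section")
    for req in bp.requirements:
        if not req.use_cases:
            problems.append(f"{req.identifier} is not motivated by any Use Case")
    return problems


# Hypothetical usage: the use case name and the How to text are placeholders.
req = Requirement("R-FormatOpen", [UseCase("Example use case")])
bp1 = BestPractice(
    "BP1",
    "Datasets should be available in an open format",
    [req],
    [HowTo("Publish the data as CSV")],
)
print(validate(bp1))  # prints []
```

Keeping each rule as a separate check makes it easy to report every problem with a draft Best Practice at once, rather than stopping at the first.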
The Use Case Requirements [1] show that, when defining best practices, we should consider licenses and vocabularies as "main targets" in addition to the data itself. However, defining best practices for licenses is out of the scope of this working group.
Question 1: Should we define best practices for vocabularies or just for datasets?
Terminology
- Dataset: assuming that best practice definitions should not depend on specific standards or technologies, we adopt the dataset definition provided by DCAT, which defines a dataset as a "collection of data, published or curated by a single agent, and available for access or download in one or more formats".
- Distribution: according to DCAT, a distribution represents an accessible form of a dataset, for example a downloadable file, an RSS feed or a web service that provides the data. A dataset may have multiple distributions.
- Vocabulary: according to the Linked Data Glossary, a vocabulary is a collection of "terms" for a particular purpose. Vocabularies play a fundamental role when "publishing and consuming Data on the Web", in particular by helping with data integration. The term overlaps with Ontology. Vocabularies may be used to define metadata.
- Metadata: according to the Linked Data Glossary, metadata is information used to administer, describe, preserve, present, use or link other information held in resources, especially knowledge resources, be they physical or virtual. Metadata may be further subcategorized into several types. Datasets, distributions and vocabularies are described by metadata.
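To make the Dataset, Distribution and Metadata terms concrete, the Python sketch below builds a machine-readable DCAT description of one dataset with two distributions and serializes it as JSON-LD. It is a minimal sketch under stated assumptions: the example.org URLs, the dataset title and the `describe_dataset` helper are hypothetical, and media types are given as plain strings for simplicity.

```python
import json

# Hypothetical identifier: example.org is a placeholder domain.
DATASET_URI = "http://example.org/dataset/bus-stops"


def describe_dataset(uri: str, title: str, distributions: list[dict]) -> dict:
    """Build DCAT metadata for a dataset and its distributions as a JSON-LD dict."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        # A dataset may have multiple distributions (accessible forms of the data).
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": d["url"]},
                "dcat:mediaType": d["media_type"],
            }
            for d in distributions
        ],
    }


# Hypothetical usage: one dataset offered as a CSV file and as a JSON file.
record = describe_dataset(
    DATASET_URI,
    "Bus stops (example)",
    [
        {"url": "http://example.org/files/bus-stops.csv", "media_type": "text/csv"},
        {"url": "http://example.org/files/bus-stops.json", "media_type": "application/json"},
    ],
)
print(json.dumps(record, indent=2))
```

Using a standard vocabulary (DCAT plus Dublin Core terms) for the metadata, rather than ad hoc keys, is what lets generic catalogue tools interpret the record.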
Best Practices
The example Best Practices below were extracted from the use case requirements [2]. Each BP may be implemented in several different ways, for example depending on the technology used.
- BP1. Datasets should be available in an open format
- BP2. Datasets should be available in a machine-readable format
- BP3. Datasets should be available in multiple formats
- BP4. Datasets should be accessible in different ways
- BP5. Datasets should be available in standard data formats
- BP6. Datasets should be described by metadata
- BP7. Metadata should be available in a machine-readable format
- BP8. Standard vocabularies should be used to define metadata
- BP9. Datasets should be available at different levels of granularity
- BP10. Datasets selected for publication should be of high-value
- BP11. Datasets should be available in an up-to-date manner
- BP12. Each data resource should be associated with a unique identifier
- BP13. Datasets should be suitable for industry reuse
- BP14. Provenance information should be available
- BP15. Quality information should be available
- BP16. Usage information should be available
- BP17. Versioning information should be available
- BP18. Licensing information should be available
- BP19. Vocabularies should be well documented
- BP20. Existing reference vocabularies should be reused where possible
- BP21. Vocabularies should be shared in an open way
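BP3 to BP5 ask for datasets in several open, standard formats that are accessible in different ways, and R-MultipleRepresentations notes that a data resource may have several representations. HTTP content negotiation is one common way to serve these from a single URI. The Python sketch below is a simplified, hypothetical server-side helper that assumes the dataset is published as CSV, JSON and Turtle. It applies the rule that the most specific matching media range determines a format's quality, but it ignores media-type parameters other than `q`. The names `AVAILABLE`, `parse_accept`, `quality` and `choose_format` are illustrative.

```python
from typing import Optional

# Hypothetical set of formats in which one dataset is published (cf. BP3).
AVAILABLE = ["text/csv", "application/json", "text/turtle"]


def parse_accept(header: str) -> dict[str, float]:
    """Map each media range in an HTTP Accept header to its q-value (default 1.0)."""
    ranges: dict[str, float] = {}
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges[media] = q
    return ranges


def quality(fmt: str, ranges: dict[str, float]) -> float:
    """q-value of fmt under its most specific matching range, 0.0 if none matches."""
    main = fmt.split("/")[0]
    for candidate in (fmt, f"{main}/*", "*/*"):
        if candidate in ranges:
            return ranges[candidate]
    return 0.0


def choose_format(accept: str, available: list[str] = AVAILABLE) -> Optional[str]:
    """Return the acceptable format with the highest q; ties go to server order."""
    if not accept.strip():
        accept = "*/*"  # no preference stated: any format is acceptable
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for fmt in available:
        q = quality(fmt, ranges)
        if q > best_q:
            best, best_q = fmt, q
    return best


# Hypothetical usage: the client refuses CSV but accepts anything else.
print(choose_format("text/csv;q=0, */*"))  # prints application/json
```

Resolving each format against its most specific matching range, instead of simply walking the ranges in q order, ensures that an explicit exclusion such as `text/csv;q=0` is not overridden by a broader `*/*`.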
Mapping between Best Practices and Proposed Chapters
The table below is an attempt to map the general Best Practices to the sections of the Best Practices document that different groups are currently working on (to be defined).
Section | Best Practice |
---|---|
URI Best Practice for Web Data URI (DURI), URI Design and Management for Persistence, URIs versus APIs | |
Guidance on the Provision of Metadata | |
Use of core vocabularies to improve interoperability | |
Data quality vocabulary | |
Data usage vocabulary | |
Publishing and accessing versions of datasets | |
To discuss with the group (how to map?)
- Making controlled vocabularies accessible as URI sets
  - Mark, Antoine
- Technical factors for consideration when choosing data sets for publication
  - Nathalia, Flavio
- Technical factors affecting potential use of open data for innovation, efficiency and commercial exploitation
  - Vagner, Nathalia, Hadley, Yaso
- Data preservation
  - Phil, Christophe
Mapping between UC Requirements and Best Practices
The table below maps Use Case Requirements to Best Practices (general and specific); each Best Practice is based on one or more requirements (to be defined).
Requirement | Requirement Description | Best Practice |
---|---|---|
R-MetadataAvailable | Metadata should be available | |
R-MetadataMachineRead | Metadata should be machine-readable | |
R-MetadataStandardized | Metadata should be standardized | |
R-MetadataDocum | Metadata vocabulary, or values if vocabulary is not standardized, should be well-documented | |
R-MetadataInteroperable | Metadata should be interoperable | |
R-GranularityLevels | Data available at different levels of granularity should be accessible and modelled in a common way | |
R-FormatMachineRead | Data should be available in a machine-readable format | |
R-FormatStandardized | Data should be available in a standardized format | |
R-FormatOpen | Data should be available in an Open format | |
R-FormatMultiple | Data should be available in multiple formats | |
R-FormatLocalize | It should be possible to localize data on the Web | |
R-VocabReference | Existing reference vocabularies should be reused where possible | |
R-VocabDocum | Vocabularies should be clearly documented | |
R-VocabOpen | Vocabularies should be shared in an Open way | |
R-VocabVersion | Vocabularies should include versioning information | |
R-LicenseAvailable | Data should be associated with a license | |
R-LicenseMachineRead | Data licenses should be provided in a machine-readable format | |
R-LicenseStandardized | Standard vocabularies should be used to describe licenses | |
R-LicenseInteroperable | Data licenses should be interoperable | |
R-LicenseLiability | Liability terms associated with usage of Data on the Web should be clearly outlined | |
R-ProvAvailable | Data provenance information should be available | |
R-SelectHighValue | Datasets selected for publication should be of high-value | |
R-SelectDemand | Datasets selected for publication should be in demand by potential users | |
R-AccessBulk | Data should be available for bulk download | |
R-AccessRealTime | Where data is produced in real-time, it should be available on the Web in real-time | |
R-AccessUptodate | Data should be available in an up-to-date manner | |
R-SensitivePrivacy | Data should not infringe on a person's right to privacy | |
R-SensitiveSecurity | Data should not infringe on national security | |
R-UniqueIdentifier | Each data resource should be associated with a unique identifier | |
R-MultipleRepresentations | A data resource may have multiple representations, e.g. xml/html/json/rdf | |
R-DynamicGeneration | Dynamic generation of Data on the Web from non-Web data resources | |
R-AutomaticUpdate | Automatic update of Data on the Web when original data source is updated | |
R-CoreRegister | Core registers should be accessible | |
R-IndustryReuse | Data should be suitable for industry reuse | |
R-SLAAvailable | Service Level Agreements (SLAs) for industry reuse of the data should be available if requested | |
R-SLAMachineRead | SLAs should be provided in a machine-readable format | |
R-SLAStandardized | Standard vocabularies should be used to describe SLAs | |
R-PotentialRevenue | Potential revenue streams from data should be described | |
R-PersistentIdentification | Data should be persistently identifiable | |
R-Archiving | It should be possible to archive data | |
R-QualityAvailable | Quality information should be available | |
R-UsageAvailable | Usage information should be available | |
References
- Best Practices for Publishing Linked Data - http://www.w3.org/TR/2014/NOTE-ld-bp-20140109/
- Web Application Privacy Best Practices - http://www.w3.org/TR/app-privacy-bp/
- Best Practice Recipes for Publishing RDF Vocabularies - http://www.w3.org/TR/2008/NOTE-swbp-vocab-pub-20080828/
- Mobile Web Best Practices 1.0 - http://www.w3.org/TR/2008/REC-mobile-bp-20080729/