Technical factors for consideration when choosing data sets for publication
Contents
- 1 Intro
- 1.1 Good Practices
- 1.1.1 1. Open to integration with other services/platforms
- 1.1.2 2. Open file format
- 1.1.3 3. "RDFizations of Datasets"
- 1.1.4 4. Dataset versioning
- 1.1.5 5. Machine-readable files
- 1.1.6 6. REST access for individual datasets
- 1.1.7 7. It is good to use other protocols
- 1.1.8 8. Ontology definition must be standard and machine readable
- 1.1.9 9. URI must be persistent
- 1.1.10 10. Data must be available in more than one format
- 1.1.11 11. Definition of data update frequency
- 1.1.12 12. Good filters/searches to avoid many unnecessary requests
- 1.1.13 13. Data must be structured
- 1.1.14 14. Reuse of existing ontologies
- 1.1.15 15. Dataset size must be limited to small portions to be consumed bit by bit
- 1.1.16 16. It is good to have a dataset management tool
- 1.1.17 17. Use of a dedicated service
- 1.2 Bibliography
- 1.3 Relation between good practices and use cases
- 2 Editors and Contributors
- 3 Links and References
Intro
This section of the Data on the Web Best Practices document will include best practices on the technical factors to consider when choosing data sets for publication.
Good Practices
1. Open to integration with other services/platforms
2. Open file format
According to [1], a proprietary file format can create a technology dependency for anyone using the information, which in turn can restrict access to the data. The data therefore need to be structured and organised so that different software can manipulate them. For example, some data are published only as PDF, a format that software cannot readily analyse.
Open file formats avoid the need for scraping techniques to translate a proprietary file format into open formats such as XML or JSON.
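As a minimal sketch of publishing directly in open formats, the following Python example serialises the same records to JSON and to XML using only the standard library. The sample records, the function names, and the `dataset`/`record` element names are assumptions made for illustration; they do not come from [1].

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records, invented for this sketch.
records = [
    {"id": "1", "name": "Station A", "value": "12.5"},
    {"id": "2", "name": "Station B", "value": "7.0"},
]


def to_json(rows):
    """Serialise the rows as a JSON array of objects."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def to_xml(rows, root_tag="dataset", row_tag="record"):
    """Serialise the rows as XML, with one child element per field."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


json_text = to_json(records)
xml_text = to_xml(records)
```

Because both outputs are plain, documented formats, a consumer can parse them with any standard JSON or XML library instead of scraping a rendered document.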
3. "RDFizations of Datasets"
4. Dataset versioning
5. Machine-readable files
6. REST access for individual datasets
7. It is good to use other protocols
For example, SFTP, rsync, or SCP.
8. Ontology definition must be standard and machine readable
9. URI must be persistent
10. Data must be available in more than one format
According to [1], files must not be made available in only one open format, as this would hinder use by people who lack the knowledge to work with that format, and in other cases users could lack the structure needed to manipulate the files.
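One way to offer more than one format is to generate every export from a single in-memory representation. The sketch below, assuming hypothetical rows and hypothetical file names (`data.csv`, `data.json`), produces CSV and JSON from the same list of records.

```python
import csv
import io
import json

# Hypothetical rows, invented for this sketch.
rows = [
    {"item": "a", "count": 3},
    {"item": "b", "count": 5},
]


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Render the rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


# Map each (hypothetical) published file name to its serialised content.
exports = {"data.csv": to_csv(rows), "data.json": to_json(rows)}
```

Generating both files from one source keeps the formats consistent with each other, so users can pick whichever they have the knowledge and tools to handle.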
11. Definition of data update frequency
12. Good filters/searches to avoid many unnecessary requests
13. Data must be structured
14. Reuse of existing ontologies
15. Dataset size must be limited to small portions to be consumed bit by bit
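As an illustration of consuming a dataset bit by bit, the following sketch splits any iterable of records into fixed-size portions. The function name `iter_chunks` and the chunk size are assumptions chosen for the example, not part of any specification.

```python
from itertools import islice


def iter_chunks(records, chunk_size):
    """Yield successive lists holding at most chunk_size records each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical dataset of ten records, served four at a time.
dataset = range(10)
chunks = list(iter_chunks(dataset, 4))
```

A publisher could expose each chunk as a separate page or file, so that consumers download only the portions they need.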
16. It is good to have a dataset management tool
17. Use of a dedicated service
That is, a service independent of the data's origin.
Bibliography
[1] http://www.w3c.br/pub/Materiais/PublicacoesW3C/manual_dados_abertos_desenvolvedores_web.pdf
Relation between good practices and use cases
Good Practices | Use Cases |
---|---|
12 | 22 |
Editors and Contributors
Nathalia
Flávio Yanai