DataIndependence
Original Text: Data Independence And Survival Best Practices
Contents
Data Independence And Survival Best Practices
We regularly hear about social network catastrophes. One of the latest examples is the bookmarking service Magnolia, which lost all of its users' data. Some people who had subscribed to the RSS feed of their own bookmarks were able to recover their data. Social network catastrophes come in different types and with different consequences, but they often revolve around personal data. These data can range from "fully" private to completely public, with a lot of granularity in between (an opacity defined by shades).
- Abstract
- Best Practices For Users
- Best Practices For Services Providers
- Services Data Independence Grid
- Formats List
- References
- Acknowledgements
- Issues
Abstract
Data Independence And Survival Best Practices collects ideas around data independence: how to better share your data by promoting reusability, standards and clear policies. It gives a series of best practices and tools for both users (individuals or organizations) and service providers. This document is a working draft.
Document license and Feedback
Data Independence and Survival Best Practices by Karl Dubost is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.0 France License. Based on a work at www.la-grange.net.
The content of Data Independence and Survival Best Practices has been written with the direct participation of some people and through a lot of discussions. See Acknowledgements.
If you have any comments about this document, please send an email to karl+databp@la-grange.net and copy www-archive@w3.org, or contact @karlpro on Twitter. You can use the [[1]] to reference the document on Twitter.
What is data?
Throughout this document, we consider data to be any kind of content produced by a person and placed on the Web. It could be photographs, drawings, text, code, etc. We also include enriched data. When data are put online, they are enriched through human or automatic interactions (example: tagging of photographs). These enriched data are part of the value of personal data, which is worth keeping in the long term.
What are services?
Services could be a simple blog, a social network, a simple online backup system, a messaging tool, etc. Some of these services are accessed through a browser, some through specific clients or Web applications. Many users are also unaware of what is done with their data and of how several different online services belong to a single data aggregator company.
There is a need for a map of the different online services, showing which company each one belongs to.
Type Of Contexts
Using services and exposing data happens in many different contexts. Some people use their own computers; others share a computer at home or in a public space such as a classroom or an internet cafe. This creates more challenges for both one's privacy and one's data independence. It is not always possible to save data locally when travelling and accessing or uploading data online.
- one user - one owned computer - one browser
- one user - one owned computer - multiple browsers
- multiple users - one shared computer - one shared browser
- multiple users - multiple shared computers - multiple browsers
Typographic Convention
A blue box marks issues or to-do items.
A yellow box marks best practices or recommended checks.
Best Practices For Users
Data Local Copy
When possible, always keep a local copy of your data.
It is very tempting to rely on remote services to keep the only copy of your data when you lack space on your own personal computer. This is dangerous if that copy has no redundancy. If you can't keep a local copy, you should at least duplicate your data on another service.
Sometimes there is no obvious way of keeping data.
Example: when uploading photographs to a service, do not get rid of your own local copies.
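The Magnolia case suggests one simple way of keeping a local copy: archiving the feed that a service publishes for your own data. The following is a minimal sketch in Python, assuming the service exposes an RSS 2.0 feed. The sample feed, the function names and the choice of JSON as the local storage format are illustrative assumptions, not something a particular service provides.

```python
import json
import xml.etree.ElementTree as ET


def parse_rss_items(rss_text):
    """Return the items of an RSS 2.0 feed as a list of plain dictionaries."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
        })
    return items


def save_local_copy(rss_text, path):
    """Write the parsed items to a local JSON file and return how many were saved."""
    items = parse_rss_items(rss_text)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(items, handle, ensure_ascii=False, indent=2)
    return len(items)


# Hypothetical sample feed, standing in for the feed a service would publish.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>My bookmarks</title>
    <item>
      <title>Example page</title>
      <link>http://example.org/</link>
      <pubDate>Mon, 16 Feb 2009 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""
```

In practice the feed text would be fetched from the service at regular intervals. Note that a feed usually contains only the most recent items, so an archive built this way is complete only if it is started early and updated often.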
Remote Backup
Whenever possible, always keep a secure remote backup of your data.
As a counterpart to the good practice of always keeping a local duplicate of data held remotely in a data silo, it is generally wise to organize a remote backup of the kind of data you would usually keep only on a local computer. A local computer can be stolen, crash, break, burn or be flooded. In the latter two cases, any backup media kept in the same house or office are likely to be destroyed too.
In the words of Linus Torvalds: "Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it ;)". His strategy of course only works for non-sensitive data, although some hackers have been known to upload heavily encrypted data to BitTorrent.
For the rest of us, remote backup can be achieved, for example, by renting cheap server space, using a webmail service with plenty of disk space, or registering with a remote backup service.
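A remote backup is only useful if it actually matches the local data. One possible way to check this is to compare checksums of the local files with checksums of the remote copy. The sketch below assumes that both copies can be read as directories (for example, a mounted remote volume or a downloaded copy); the function names are hypothetical and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to directory) to its SHA-256 digest."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(local, remote):
    """Return the files that are missing from, or differ in, the remote copy."""
    missing = sorted(name for name in local if name not in remote)
    changed = sorted(
        name for name in local if name in remote and local[name] != remote[name]
    )
    return missing, changed
```

A manifest built on the local machine can also be stored next to the backup, so that the remote copy can be checked later without access to the original files.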
Best Practices For Services Providers
Data Export Feature
The service MUST provide a data export feature.
Users of services upload a large quantity of personal data. These data are essential. Services must provide a data export feature that gives users the freedom to archive their own data. This feature exists in some social bookmarking services such as blogmarks.net and Delicious.
The service SHOULD provide a reminder encouraging users to export their data.
Users are not necessarily aware that a data export feature is available. A reminder is a good way for users to discover the feature and to remember to export regularly. This becomes critical when the volume of data is very large, as on photo hosting sites.
Data Export Format
The data export feature MUST use a publicly described format.
For users to be able to reuse exported data, the data export format must be described. This makes it possible for third-party services, software developers or users themselves to develop their own converters for reusing the data.
The data export feature SHOULD use an open format.
Users benefit from having their data exported in an open format. More application developers will be able to write converters, or simply reuse the user's data, for example in a local archiving and search system on the user's computer.
Enriched Data Exportation
The data export feature SHOULD export both the data and the enriched data.
Once uploaded to a service, data continue to be enriched through user interactions (e.g. keyword tagging) or automatically (e.g. geolocation with geographical names). Sometimes users choose a particular service because of this richer interaction. It is important for users to keep their enriched data.
For example, there are third-party export programs for photo hosting services which keep the enriched data.
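To make the idea of exporting enriched data more concrete, here is a minimal sketch of an export that keeps tags and place names alongside the original photograph metadata. The `Photo` structure, its field names and the "example-photo-export" format name are all hypothetical; a real service would document its own publicly described format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Photo:
    # Original data uploaded by the user.
    filename: str
    title: str
    # Enriched data added after upload (hypothetical fields).
    tags: list = field(default_factory=list)
    place_name: str = ""


def export_photos(photos):
    """Serialize photos, including their enriched data, as a JSON document."""
    document = {
        "format": "example-photo-export",  # hypothetical format name
        "version": 1,
        "photos": [asdict(photo) for photo in photos],
    }
    return json.dumps(document, ensure_ascii=False, indent=2, sort_keys=True)
```

Stating a format name and version inside the export is one way to help third parties recognize the format and write converters for it.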
Services In-House Backup Solutions
The service MUST describe its own data backup policy.
It is important for users to be aware of the data backup policy of the services they use. What happens when there is a failure in the service, such as a hard drive crash? Are the data saved once a week, or once a day? How long will it take for the backed-up data to be put back online after the crash?
The bookmarking service Magnolia lost all of its users' data in January 2009.
Data Back-Up Removal
The service MUST obtain the user's agreement before releasing historical data.
Personal data have a history. Users may have chosen to remove some data which were publicly available in the past. If a service gives access to historical data, it must get the user's agreement before making these data available to the public.
Google recently released its 2001 index, but in doing so it was careful to respect how accessible the data are today. People can change their own policy for access to their data over time.
Data Access Control
The service MUST provide fine-grained access control on personal data?
It is important for users to have the ability to control who or what has access to their data. Closed social networks offer a very basic level of granularity (me, friends, public), which is unfortunately not sufficient. It is also very hard for users to control what can access their data. Putting data online doesn't mean that all applications and software should be able to access it. One might want to block the indexing of pages by search engine bots. One might want to block a specific user agent.
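One existing convention for blocking bots or a specific user agent is a robots.txt file. The sketch below uses Python's standard `urllib.robotparser` module to check what a given set of rules allows. The rules, the bot name "ExampleBot" and the URLs are illustrative assumptions. Keep in mind that robots.txt is only advisory: it is respected by well-behaved bots and is not an access control mechanism in itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one specific bot entirely,
# and keep every other bot out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_text):
    """Create a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Tell whether the rules allow this user agent to fetch this URL."""
    return parser.can_fetch(user_agent, url)
```

A service could let each user edit rules of this kind for their own pages, which would be a first step towards the finer granularity discussed above.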
Data License at Upload Time
Does the service understand the license you have specified on your own data?
For example, if you specify a license in a Web page, does the service display this license?
Data License Choice
Can you specify the Data License once you have uploaded your data?
Services Data License Policy
When you upload your data, does the service make its policy with regard to licenses explicit?
Services Data Independence Grid
We could create a table showing how different services fulfil these requirements.
Formats List
There is a need for a list of data formats that help users keep their data. For example, a full export of a blog could be done with the Atom format. The description of one's profile on a service such as MySpace or Facebook could be done with FOAF.
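As an illustration of the Atom case, the following sketch builds a small Atom feed from a list of blog posts using Python's standard library. The function name, the shape of the `posts` dictionaries and the sample identifiers are assumptions made for the example; only the element names and the namespace come from the Atom format itself.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def atom(tag):
    """Return a tag name qualified with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def build_atom_feed(feed_id, title, author, updated, posts):
    """Build an Atom feed document from a list of post dictionaries."""
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = updated
    ET.SubElement(ET.SubElement(feed, atom("author")), atom("name")).text = author
    for post in posts:  # each post: dict with id, title, updated, text
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = post["id"]
        ET.SubElement(entry, atom("title")).text = post["title"]
        ET.SubElement(entry, atom("updated")).text = post["updated"]
        content = ET.SubElement(entry, atom("content"), {"type": "text"})
        content.text = post["text"]
    return ET.tostring(feed, encoding="unicode")
```

A full blog export would also need to carry comments, categories and attached media, which this sketch leaves out.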
References
The list of references will be built step by step.
Acknowledgements
Thanks to Olivier Théreaux.
Issues
- Should I create an issues list tracker?
- Should I commit the document in another space than La Grange?
Karl, 16 February 2009, updated: 2009-02-26