Web Tracking and User Privacy in the Age of Ads Business Models

Karl Dubost, Opera Software A.S., March 2011 (Acknowledgements)

This paper describes some of the issues that Opera Software A.S. might encounter in its businesses and products. It is intended as input for the W3C Workshop on Web Tracking and User Privacy (28/29 April 2011, Princeton, NJ, USA). Many of the challenges were already raised at the W3C Workshop on Privacy for Advanced Web APIs in 2010 in London, UK.

Opera Position Summary

Browsers and Control of Online Identities

Privacy is not just a technological issue. Technologies have a deep impact by creating, enforcing or destroying the social contexts on which privacy depends. That said, we believe that technological solutions alone do not guarantee users’ privacy. Privacy expectations differ with each individual’s cultural background and with national legislation.

We should be careful about using the word « Privacy » when we sometimes mean being in control over our own identities. Understanding and controlling the data aggregation shaping our identities is a core question. The chosen technologies can have a significant impact on it.

Network Infrastructure

The first source of Web Tracking is the network infrastructure. Internet protocols such as HTTP, SMTP, POP, etc. use IP addresses for communication between two pieces of software. Some systems, such as Web proxies, give people a way to add a layer of opacity between their real IP address and the service they want to reach. This relies on initial trust in the chosen HTTP proxy. More recently, systems such as Tor, still used by a limited number of people, create a pool of IP addresses for accessing services « anonymously ».

Some devices in the context of geolocation are broadcasting their Ethernet addresses, which creates a deeper challenge for controlling one’s own identity.

In the coming months and years, IPv6 deployment will expand because IPv4 addresses are running out. This will pose a far greater challenge than those addressed by the initiatives taken lately. In theory, IPv6 makes it possible to identify each individual user, since every device can have its own unique IP address.
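As an illustration of how a device address can become a stable identifier, here is a minimal Python sketch. It assumes one particular addressing scheme that this paper does not discuss, the "modified EUI-64" interface identifier, in which the low 64 bits of an IPv6 address are derived from the Ethernet (MAC) address of the device. The function name `eui64_address`, the documentation prefix `2001:db8::/64` and the sample MAC address are all hypothetical, chosen only for the example.

```python
import ipaddress


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Build an IPv6 address from a /64 prefix and a MAC address.

    Assumption: the interface identifier follows the modified EUI-64
    scheme (insert ff:fe in the middle of the MAC, flip the
    universal/local bit of the first octet).
    """
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("expected a /64 prefix")

    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 6-octet MAC address")

    octets[0] ^= 0x02  # flip the universal/local bit
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])

    # The prefix supplies the high 64 bits, the interface id the low 64.
    value = int(network.network_address) + int.from_bytes(interface_id, "big")
    return ipaddress.IPv6Address(value)


# Hypothetical device seen on two different networks: the prefix changes,
# but the low 64 bits stay the same and can be matched across networks.
home = eui64_address("2001:db8:0:1::/64", "00:11:22:33:44:55")
cafe = eui64_address("2001:db8:0:2::/64", "00:11:22:33:44:55")
same_device = (int(home) & (2**64 - 1)) == (int(cafe) & (2**64 - 1))
```

Under this assumption, `same_device` is true even though the two addresses differ, which is the kind of per-device identification the paragraph above warns about.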

The first limitation on one’s ability to control one’s identity is the network infrastructure itself.

Browsers

Browsers are the main tool for accessing the Web and communicating with online services. They are a critical tool through which we create our identities. A single action is, most of the time, benign. A long-term aggregation of data, however, creates a profile that we ourselves are not aware of. Users should be aware of what they are doing and have an environment simple enough to manage their own data inside the browser. What are the sources of this data?

To create a better interaction with the Web, browsers offer features that are useful and critical at the same time. Here are a few sources of data collection related to identification:

It is interesting to note that some people have proposed strategies that remove users’ ability to control their data. For example, evercookie recreates its tracers by using different types of containers such as cookies, LSOs (Flash cookies), PNG storage, local storage, etc. When the user destroys one container, it is automatically recreated from the others.
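The recreation behaviour described above can be sketched in a few lines of Python. This is a simulation under stated assumptions, not evercookie's actual code: the containers are modelled as plain dictionaries, and the names `CONTAINERS`, `read_surviving_id` and `respawn` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical stand-ins for the browser-side containers named in the text.
CONTAINERS = ("cookie", "lso", "png_storage", "local_storage")

Stores = Dict[str, Dict[str, str]]


def read_surviving_id(stores: Stores, key: str = "uid") -> Optional[str]:
    """Return the identifier from the first container that still holds it."""
    for name in CONTAINERS:
        value = stores.get(name, {}).get(key)
        if value is not None:
            return value
    return None


def respawn(stores: Stores, key: str = "uid") -> Optional[str]:
    """Copy a surviving identifier back into every container.

    Returns the identifier, or None if every container was cleared.
    """
    value = read_surviving_id(stores, key)
    if value is None:
        return None
    for name in CONTAINERS:
        stores.setdefault(name, {})[key] = value
    return value


# The user clears cookies only; the identifier comes back on the next visit.
stores: Stores = {name: {"uid": "abc123"} for name in CONTAINERS}
stores["cookie"].clear()
restored = respawn(stores)
```

In this model the identifier disappears only if the user clears every container at once, which is why a single "clear cookies" control is not enough against this kind of tracer.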

The issue becomes even more complex for applications that use the Web and the HTTP protocol but with a different chrome, one that does not provide the usual information management options a browser offers. For example, applications or Web widgets such as feed readers, mailers, etc. use the HTTP protocol and all its possibilities for accessing and interacting with content, while providing few if any options for controlling the data.

Services

We have seen that browsers are personal data repositories. They are also the mediator when using online services. Many of these services operate because they can use the personal data of users.

The user experience is not tied only to the product itself: services also give access to a specific identity in different contexts. Email and data storage (among many others) are common online services with strong implications for Web Tracking and User Privacy.

Tracking can be a matter of analyzing the content itself. For example, a user of an online mail service could require everything to be encrypted and decrypted on the client side, but would then not benefit from search features. On the other hand, many online services operate by profiling user data and sending advertisements. It is therefore important to figure out how to properly make users aware of this, whether on the service itself or in the browser UI.

Strategy Against Web Tracking

In the age of business models based on ads, services strongly resist abandoning tracking features. The more blocking features browsers offer, the more services will create strategies for circumventing them. The Web was built in a specific social context with trust as a premise. Finding the right mechanism will pose challenges not only in terms of revenue but also in technical terms.

The « Do Not Track » mechanisms are an interesting experiment, but they also rely on a trust system and assume that people and services will act in good faith.
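The trust dependency can be made concrete with a small Python sketch. It assumes the header-based form of Do Not Track, in which the browser adds `DNT: 1` to its HTTP requests. The functions `prefers_no_tracking` and `will_track`, and the `honors_dnt` flag, are hypothetical names for this illustration and do not describe any real service.

```python
from typing import Mapping


def prefers_no_tracking(headers: Mapping[str, str]) -> bool:
    """True when the request carries a DNT header set to "1".

    Header names are compared case-insensitively, as in HTTP.
    """
    for name, value in headers.items():
        if name.lower() == "dnt":
            return value.strip() == "1"
    return False


def will_track(headers: Mapping[str, str], honors_dnt: bool) -> bool:
    """Decide whether a (hypothetical) service tracks this request.

    The header only expresses a preference: the outcome depends entirely
    on whether the service chooses to honor it.
    """
    if honors_dnt and prefers_no_tracking(headers):
        return False
    return True


request_headers = {"User-Agent": "ExampleBrowser/1.0", "DNT": "1"}
cooperative = will_track(request_headers, honors_dnt=True)
uncooperative = will_track(request_headers, honors_dnt=False)
```

The same request yields no tracking from the cooperative service and tracking from the other one, and nothing in the request lets the user tell the two apart.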

We think that strategies related to Web Tracking and Data Control should rely on a few principles:

Finding the appropriate technologies and protocols that enable users to control their data is of high interest to Opera Software. It is challenging and requires expertise in many areas: legal, technological, UX, security, etc. Some technological choices that are easy to develop might have negative consequences for users, such as giving a false sense of trust and/or security. Sometimes a technology without a legal framework to enforce it will have no practical effect for the user.

The Web industry is facing an interesting question: Do we have to be identified to not be tracked?

Acknowledgements