1. Introduction
This section is not normative.
Web applications store data locally on a user’s computer in order to provide functionality while the user is offline, and to increase performance when the user is online. These local caches have significant advantages for both users and developers, but present risks as well.
A user’s data is both sensitive and valuable; web developers ought to take reasonable steps to protect it. One such step would be to encrypt data before storing it. Another would be to remove data from the user’s machine when it is no longer necessary (for example, when the user signs out of the application, or deletes their account).
Site authors can remove data from a number of storage mechanisms via JavaScript, but others are difficult to deal with reliably. Consider cookies, for instance, which can be partially cleared via JavaScript access to document.cookie. HttpOnly cookies, however, can only be removed via a number of Set-Cookie headers in an HTTP response. This, of course, requires exhaustive knowledge of all the cookies set for a host, which can be complicated to ascertain. Cache is still harder; no imperative interface to a browser’s network cache exists, period.
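To illustrate why the Set-Cookie approach is brittle, the following is a minimal sketch of a server-side helper that expires a list of known cookies. The function name and the shape of its input are hypothetical; the point is that the server has to supply the name, Path, and Domain of every cookie it wants removed.

```javascript
// Hypothetical helper: builds one expiring Set-Cookie header value per cookie.
// The caller must already know each cookie's name, Path, and Domain, which is
// exactly the "exhaustive knowledge" the text above describes.
function expireCookieHeaders(cookies) {
  return cookies.map(({ name, path = "/", domain }) => {
    const parts = [
      `${name}=`,
      "Expires=Thu, 01 Jan 1970 00:00:00 GMT",
      "Max-Age=0",
      `Path=${path}`,
    ];
    // A cookie set with a Domain attribute is only replaced (and so expired)
    // when the same Domain is sent again.
    if (domain) parts.push(`Domain=${domain}`);
    return parts.join("; ");
  });
}

const headers = expireCookieHeaders([
  { name: "session" },
  { name: "prefs", path: "/app", domain: "example.com" },
]);
```

Any cookie missing from the list, or listed with the wrong Path or Domain, survives the sign-out.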
This document defines a new mechanism to deal with removing data from these and other types of local storage, giving web developers the ability to clear out a user’s local cache of data via the Clear-Site-Data HTTP response header.
1.1. Examples
1.1.1. Signing Out
A user signs out of a site via https://supersecretsocialnetwork.example.com/logout, and the site author wishes to ensure that locally stored data is removed as a result.
They can do so by sending the following HTTP header in the response:
Clear-Site-Data: { "types": [ "cache", "cookies", "storage", "executionContexts" ] }
1.1.2. Targeted Clearing
A user signs out of Megacorp via https://megacorp.example.com/logout. Megacorp has a large number of services available as subdomains, so many that it’s not entirely clear which of them would be safe to clear as a response to a logout action. One option would be to simply clear everything, and deal with the fallout. Megacorp’s CEO, however, once lost hours and hours of progress in "Irate Ibexes" due to inadvertent site-data clearing, and so refuses to allow such a sweeping impact to the site’s users.
The developers know, however, that the "Minus" application is certainly safe to clear out. They can target this specific subdomain by including a request to that subdomain as part of the logout landing page (ideally as a CORS-enabled, CSRF-protected POST):
fetch("https://minus.megacorp.example.com/clear-site-data", {
  method: "POST",
  mode: "cors",
  headers: new Headers({ "CSRF": "[insert sekrit token here]" })
});
That endpoint would return proper CORS headers in response to that request’s preflight, and would return the following header for the actual request:
Clear-Site-Data: { "types": [ "cache", "cookies", "storage", "executionContexts" ] }
1.1.3. Keep Critical Cookies
A user opts out of tracking via https://ads-are-awesome.example.com/optout. The site author wishes to remove DOM-accessible data which might contain tracking information, but needs to ensure that the opt-out cookie which the user has just received isn’t wiped along with it.
They can do so by sending the following HTTP header in the response, which includes all the types except for "cookies":
Clear-Site-Data: { "types": [ "cache", "storage", "executionContexts" ] }
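A server could assemble this header value programmatically rather than hard-coding it. The sketch below assumes the four type keywords defined in this document; the function name and its "except" parameter are hypothetical.

```javascript
// The four type keywords this document defines.
const KNOWN_TYPES = ["cache", "cookies", "storage", "executionContexts"];

// Hypothetical helper: returns a Clear-Site-Data header value that lists
// every known type except the ones the caller wants to preserve.
function clearSiteDataValue(except = []) {
  const types = KNOWN_TYPES.filter((type) => !except.includes(type));
  return JSON.stringify({ types });
}

// Keep the opt-out cookie, clear everything else.
const value = clearSiteDataValue(["cookies"]);
```

The resulting string is the JSON object shown above, minus the optional whitespace.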
1.1.4. Kill Switch
A site author can reduce the risk of a persistent client-side XSS by sending the following HTTP header in a response to wipe out local sources of data:
Clear-Site-Data: { "types": [ "cache", "cookies", "storage", "executionContexts" ] }
Note: Installing a Service Worker guarantees that a request will go out to a server every ~24 hours. That update ping would be a wonderful time to send a header like this one in case of catastrophe. [SERVICE-WORKERS]
1.2. Goals
Generally, the goal is to allow web developers more control over the data stored locally by a user agent for their origins. In particular, developers should be able to reliably ensure the following:
- Data stored in an origin’s client-side storage mechanisms like [INDEXEDDB], WebSQL, Filesystem, localStorage, and sessionStorage is cleared.
- Cookies for an origin’s host are removed [RFC6265].
- Web Workers (dedicated and shared) running for an origin are terminated.
- Service Workers registered for an origin are terminated and deregistered.
- Resources from an origin are removed from the user agent’s local cache.
- All of the above can be propagated to the HTTP version of an HTTPS origin.
- None of the above can be bypassed by a maliciously active document that retains interesting data in memory, and rewrites it if it’s cleared.
2. Clearing Site Data
Developers may instruct a user agent to clear various types of relevant data in two ways: an HTTP response header, and a JavaScript API.
2.1. The Clear-Site-Data HTTP Response Header Field
The Clear-Site-Data HTTP response header field sends a signal to the user agent that it ought to remove all data of a certain set of types. The header is represented by the following ABNF [RFC5234]:
Clear-Site-Data = json-field-value ; See Section 2 of [HTTP-JFV], and Section 2 of [RFC7159]
The header’s value is interpreted as an array of JSON objects, as described in Section 4 of [HTTP-JFV].
Each object in the array represents a clearing action that the user agent MUST undertake, and will be parsed as defined in §3.1 Parsing.
The following subsections define the initial set of known members in each JSON object the header’s value defines. Future versions of this document may define additional such members, and user agents MUST ignore unknown members when parsing the header.
2.1.1. The types member
The types member is an array of keywords designating the kinds of data that the server wishes the user agent to remove. The member’s value MUST be an array, and that array MUST contain only strings; any other types will result in a parse error.
The following are the initial set of known types which may be specified in the member’s array value. Future versions of this document may define additional types, and user agents MUST ignore unknown types when parsing the header:
- "cache"
  The "cache" type indicates that the server wishes to remove locally cached data associated with the origin of a particular response’s url. This includes the network cache, of course, but will also remove data from various other caches which a user agent implements (prerendered pages, script caches, shader caches, etc.). Implementation details are in §3.4.3 Clear cache for origin.
  When delivered with a response from https://example.com/clear, the following header will cause caches associated with the origin https://example.com to be cleared:
  Clear-Site-Data: { "types": [ "cache" ] }
- "cookies"
  The "cookies" type indicates that the server wishes to remove cookies associated with the origin of a particular response’s url. Along with cookies, HTTP authentication credentials [RFC7235] and origin-bound tokens such as those defined by Channel ID [CHANNELID] and Token Binding [TOKBIND] are also cleared. Implementation details are in §3.4.4 Clear cookies for origin.
  When delivered with a response from https://example.com/clear, the following header will cause cookies associated with the origin https://example.com to be cleared:
  Clear-Site-Data: { "types": [ "cookies" ] }
- "storage"
  The "storage" type indicates that the server wishes to remove locally stored data associated with the origin of a particular response’s url. This includes storage mechanisms such as localStorage, sessionStorage, [INDEXEDDB], and [WEBDATABASE], as well as tangentially related mechanisms such as service worker registrations. Implementation details are in §3.4.5 Clear DOM-accessible storage for origin.
  When delivered with a response from https://example.com/clear, the following header will cause DOM-accessible storage for the origin https://example.com to be cleared:
  Clear-Site-Data: { "types": [ "storage" ] }
- "executionContexts"
  The "executionContexts" type indicates that the server wishes to neuter and reload execution contexts currently rendering the origin of a particular response’s url. Implementation details are in §3.4.1 Neuter browsing contexts matching origin.
  When delivered with a response from https://example.com/clear, the following header will cause execution contexts displaying the origin https://example.com to be neutered and reloaded:
  Clear-Site-Data: { "types": [ "executionContexts" ] }
2.2. JavaScript API
This might live more cleanly in [STORAGE].
navigator.storage.clear();
If they only wished to clear the otherwise inaccessible cache for the current origin:
navigator.storage.clear({ types: [ "cache" ], });
enum StorageClearType { "cache", "cookies", "storage", "executionContexts" };

dictionary StorageClearOptions {
  sequence<StorageClearType> types;
};

partial interface StorageManager {
  Promise<void> clear(StorageClearOptions options);
};
- clear(options)
  Clears data based on the values in the options argument. Returns a Promise that resolves when clearing is complete. If no types are specified, all data types will be cleared.

  Arguments for the StorageManager.clear(options) method:

  Parameter  Type                 Nullable  Optional  Description
  options    StorageClearOptions  ✘         ✘         The data to clear.
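Because clear() is only proposed by this document, a page calling it would be wise to feature-detect first. The following is a minimal sketch; the wrapper function is hypothetical, and its storageManager parameter stands in for navigator.storage.

```javascript
// Hypothetical wrapper around the clear() method proposed in this section.
// storageManager stands in for navigator.storage; a user agent that does not
// implement this draft will simply lack the method.
async function clearCacheIfSupported(storageManager) {
  if (!storageManager || typeof storageManager.clear !== "function") {
    return false; // Proposed API unavailable; the caller needs a fallback.
  }
  // Ask for the otherwise inaccessible cache only.
  await storageManager.clear({ types: ["cache"] });
  return true;
}
```

In a document, this would be invoked as clearCacheIfSupported(navigator.storage); a false result tells the caller to fall back to the response header.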
2.3. Fetch Integration
Monkey patching! Talk with Anne.
If the Clear-Site-Data header is present in an HTTP response, then data MUST be cleared before rendering the response to the user. That is, before step #9 in the current main fetch algorithm, execute the following step:
- If response’s header list contains a header named Clear-Site-Data, then execute §3.2 Clear data for response on response.
Note: This happens after Set-Cookie headers are processed. If we clear cookies, we clear all of them. This is intentional, as removing only certain cookies might leave an application in an indeterminate and vulnerable state. Removing specific cookies is best done via expiration using the Set-Cookie header.
3. Algorithms
3.1. Parsing
3.1.1. Which data types ought to be removed for response?
- If response does not contain a Clear-Site-Data header, return an empty list.
- Let types be an empty list.
- Let list be the result of executing the algorithm defined in Section 4 of [HTTP-JFV] on the value of response’s Clear-Site-Data header. If that algorithm results in an error, return an empty list.
- For each item in list:
- Return types.
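The steps above can be sketched as follows. Two parts are assumptions rather than text from this document: the JSON field value is parsed by wrapping it in brackets and handing it to a JSON parser, and the per-item handling (which the steps above do not spell out) treats a malformed types member as a parse error and silently skips unknown type keywords, in line with §2.1.1.

```javascript
const KNOWN_TYPES = new Set(["cache", "cookies", "storage", "executionContexts"]);

// Sketch of "Which data types ought to be removed for response?".
// headerValue is the raw Clear-Site-Data value, or undefined if absent.
function typesToClear(headerValue) {
  if (headerValue === null || headerValue === undefined) return [];

  let list;
  try {
    // Assumption: a json-field-value parses as the members of a JSON array.
    list = JSON.parse(`[${headerValue}]`);
  } catch (error) {
    return []; // Parse error: nothing is cleared.
  }

  const types = [];
  for (const item of list) {
    // Assumption: each item must be a JSON object with an array of strings.
    if (item === null || typeof item !== "object" || Array.isArray(item)) return [];
    const member = item.types;
    if (!Array.isArray(member) || !member.every((t) => typeof t === "string")) return [];
    for (const type of member) {
      // Unknown types are ignored rather than rejected.
      if (KNOWN_TYPES.has(type) && !types.includes(type)) types.push(type);
    }
  }
  return types;
}
```

Under these assumptions, a header whose types member is a bare string rather than an array clears nothing.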
3.2. Clear data for response
Given a response (response), this algorithm parses the Clear-Site-Data header to determine what needs to be cleared, which origins are affected, and then executes those requests.
- If response’s URL is a priori insecure, skip the remaining steps of this algorithm.
  Some have suggested that this might not be a restriction we want (see Martin Thomson’s public-webappsec post on the topic, for example).
- Let types be the result of §3.1.1 Which data types ought to be removed for response? executed on response.
- Execute §3.4 Clear types for origin on types, response’s url's origin.
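A rough sketch of this algorithm follows. It approximates "a priori insecure" as "any scheme other than https:", which is an assumption and narrower than the real definition; typesForResponse and clearTypesForOrigin are hypothetical hooks standing in for §3.1.1 and §3.4.

```javascript
// Sketch of "Clear data for response". Returns true if clearing was attempted.
function clearDataForResponse(response, { typesForResponse, clearTypesForOrigin }) {
  const url = new URL(response.url);

  // Assumption: treat anything that is not https: as a priori insecure.
  if (url.protocol !== "https:") return false;

  const types = typesForResponse(response);
  // url.origin is the serialized scheme, host, and port of the response's URL.
  clearTypesForOrigin(types, url.origin);
  return true;
}
```

A response delivered over plain HTTP is ignored outright, so its header never reaches the parsing step.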
Note: Especially given the cross-context implications, user agents are encouraged to give web developers some mechanism by which the clearing operation can be debugged. This might take the form of a console message or timeline entry indicating success.
3.3. Clear data for options
Given a StorageClearOptions (options), this algorithm determines what needs to be cleared, returns a Promise, and executes the request asynchronously.
- If the incumbent settings object is not a secure context, return a Promise rejected with NotSupportedError.
- Let promise be a newly created Promise object.
- Return promise, and execute the remaining steps asynchronously.
- Let types be an empty list.
- If options’ types is an empty sequence:
  - Append cache, cookies, storage, and executionContexts to types.
- Otherwise, for each StorageClearType type in options’ types property:
  - Append type to types.
- Execute §3.4 Clear types for origin on types and the incumbent settings object’s origin.
- Resolve promise with undefined.
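The steps above might look like the following in script form. This is a sketch, not an implementation: the second parameter bundles hypothetical stand-ins for the incumbent settings object (isSecureContext, origin) and for §3.4 (clearTypesForOrigin), and a plain Error with its name set is used in place of a real NotSupportedError.

```javascript
const ALL_TYPES = ["cache", "cookies", "storage", "executionContexts"];

// Sketch of "Clear data for options".
function clearDataForOptions(options, { isSecureContext, origin, clearTypesForOrigin }) {
  if (!isSecureContext) {
    const error = new Error("Clearing site data requires a secure context.");
    error.name = "NotSupportedError";
    return Promise.reject(error);
  }

  // A missing or empty types sequence means "clear everything".
  const requested = (options && options.types) || [];
  const types = requested.length === 0 ? [...ALL_TYPES] : [...requested];

  // The clearing itself runs asynchronously; the promise resolves with undefined.
  return Promise.resolve()
    .then(() => clearTypesForOrigin(types, origin))
    .then(() => undefined);
}
```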
3.4. Clear types for origin
- If types contains "executionContexts", execute §3.4.1 Neuter browsing contexts matching origin on origin.
- If types contains "cookies", execute §3.4.4 Clear cookies for origin on origin.
- If types contains "storage", execute §3.4.5 Clear DOM-accessible storage for origin on origin.
- If types contains "cache", execute §3.4.3 Clear cache for origin on origin.
- If types contains "executionContexts", execute §3.4.2 Reload browsing contexts matching origin on origin.
3.4.1. Neuter browsing contexts matching origin
Given an origin (origin) this algorithm walks through the set of browsing contexts which the user agent knows about, and sandboxes each in order to prevent them from recreating cleared data (from in-memory JavaScript variables, for instance). Once data is cleared, the affected browsing contexts will be hard-reloaded, as defined in §3.4.2 Reload browsing contexts matching origin:
- For each context in the user agent’s set of browsing contexts:
  - Let document be context’s active document.
  - While document is an iframe srcdoc document, let document be the active document of document’s browsing context container.
  - If context’s origin is origin:
    - Parse a sandboxing directive using the empty string as the input, and document’s active sandboxing flag set as the output.
3.4.2. Reload browsing contexts matching origin
Given an origin (origin), this algorithm walks through the set of browsing contexts which the user agent knows about and reloads each of them:
- For each context in the user agent’s set of browsing contexts:
  - Let document be context’s active document.
  - While document is an iframe srcdoc document, let document be the active document of document’s browsing context container.
  - If context’s origin is origin:
    - Navigate context to document’s URL with replacement enabled and exceptions enabled. The source browsing context is context. This is a reload-triggered navigation.
3.4.3. Clear cache for origin
Given an origin (origin), this algorithm removes data from the user agent’s local caches that matches the origin.
- Let host be origin’s host, canonicalized as per Section 5.1.2 of [RFC6265].
- Let cache list be the set of entries from the network cache whose target URI host is identical to host when canonicalized as per Section 5.1.2 of [RFC6265].
- For each entry in cache list:
  - Remove entry from the network cache.
- If a user agent implements caches beyond a pure network cache, it MUST remove all entries from those caches which match origin.
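The network-cache portion of these steps can be sketched as below. The cache is modelled as a Map keyed by target URI, which is purely an assumption for illustration, and the URL parser’s hostname normalization is used as an approximation of the canonicalization in Section 5.1.2 of [RFC6265].

```javascript
// Sketch of the network-cache steps of "Clear cache for origin".
// networkCache is a hypothetical Map from target URI to cached entry.
function clearCacheForOrigin(networkCache, origin) {
  // The URL parser lowercases the host, approximating RFC 6265 canonicalization.
  const host = new URL(origin).hostname;
  let removed = 0;
  // Copy the keys first so entries can be deleted while iterating.
  for (const targetUri of [...networkCache.keys()]) {
    if (new URL(targetUri).hostname === host) {
      networkCache.delete(targetUri);
      removed += 1;
    }
  }
  return removed;
}
```

Because the comparison is on host alone, entries fetched over plain HTTP from the same host are removed as well, while entries for sibling subdomains are left in place.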
We’re dealing with the network cache here, as defined in [RFC7234], but that’s not nearly everything a user agent caches. How hand-wavey with the vendor-specific section can we be? For instance, Chrome clears out prerendered pages, script caches, WebGL shader caches, WebRTC bits and pieces, address bar suggestion caches, and various networking bits that aren’t representations (HSTS/HPKP, SDCH, etc.). Perhaps [STORAGE] will make this clearer?
3.4.4. Clear cookies for origin
Given an origin (origin), this algorithm removes cookies from the user agent’s cookie store whose domain attribute domain-matches origin’s registered domain.
Note: We remove all the cookies for an entire registered domain, as cookies ignore the same-origin policy, and there’s a distinct risk that we’d leave applications in an ill-defined state if we only cleared cookies for a particular subdomain. Consider accounts.google.com vs mail.google.com, for instance, both of which have cookies that signal a user’s signed-in status.
Note: This algorithm assumes that the user agent has implemented a cookie store (as discussed in Section 5.3 of [RFC6265]), which offers the ability to retrieve a list of cookies by host, and to remove individual cookies.
- Let host be origin’s host, canonicalized as per Section 5.1.2 of [RFC6265].
- Let registered be the registered domain of host.
- Let cookie list be the set of cookies from the cookie store whose domain attribute is a domain-match with registered.
- For each cookie in cookie list:
  - Remove cookie from the cookie store.
- Clear related origin-bound Channel IDs [CHANNELID] and tokens [TOKBIND].
- Clear related HTTP authentication credentials [RFC7235].
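The cookie-selection steps can be sketched as follows. Computing a registered domain requires the Public Suffix List, so the sketch takes it as an input rather than deriving it; the cookie objects and function names are hypothetical, and IP-address hosts are not handled.

```javascript
// Simplified domain-match in the spirit of Section 5.1.3 of RFC 6265:
// identical, or a suffix match that falls on a label boundary.
function domainMatches(string, domainString) {
  const s = string.toLowerCase();
  const d = domainString.toLowerCase();
  return s === d || s.endsWith(`.${d}`);
}

// Returns the cookies that would survive clearing for registeredDomain.
// cookies is a hypothetical array of { name, domain } records.
function cookiesAfterClearing(cookies, registeredDomain) {
  return cookies.filter((cookie) => !domainMatches(cookie.domain, registeredDomain));
}

const remaining = cookiesAfterClearing(
  [
    { name: "SID", domain: "accounts.google.com" },
    { name: "GX", domain: "mail.google.com" },
    { name: "id", domain: "notgoogle.com" },
    { name: "theme", domain: "example.org" },
  ],
  "google.com"
);
```

Both google.com subdomain cookies are removed together, which is the behaviour the first note above argues for.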
The process of clearing both bound tokens/IDs and HTTP authentication is super hand-wavey. <https://github.com/w3c/webappsec-clear-site-data/issues/2>
3.4.5. Clear DOM-accessible storage for origin
- For each area in the user agent’s set of local storage areas [HTML]:
- For each area in the user agent’s set of session storage areas [HTML]:
- For each database in the user agent’s set of Indexed Databases [INDEXEDDB]:
  - If database’s origin is origin:
    - Set database’s delete pending flag to true.
    - For each connection in the set of all IDBDatabase objects connected to database:
      - Execute the database closing steps on connection, setting the forced flag.
    - Execute the database deletion steps on database, passing in database’s origin and name.
- For each registration in the user agent’s set of registered service worker registrations:
  - If registration’s scope URL’s origin is origin:
    - Execute unregister() on registration.
- For any other script-accessible storage mechanism, the user agent MUST delete any data associated with this origin. This includes (but is not limited to) the following:
  - An origin’s WebSQL databases [WEBDATABASE].
  - An origin’s filesystems [file-system-api]
  - ChannelID
  - Plugin data (e.g. NPP_ClearSiteData)
  - Appcache.
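For comparison, the following sketch shows roughly how much of this a page can approximate for its own origin with script-accessible APIs. It is not the algorithm above: script cannot force-close other IndexedDB connections, so a deletion may wait until they close, and WebSQL, filesystems, plugin data, and Appcache are not covered. The env parameter is a hypothetical bundle of the relevant objects, used so the dependencies are explicit.

```javascript
// Page-side approximation of clearing DOM-accessible storage for one origin.
// env is expected to hold { localStorage, sessionStorage, indexedDB, serviceWorker }.
async function clearScriptAccessibleStorage(env) {
  env.localStorage.clear();
  env.sessionStorage.clear();

  // indexedDB.databases() lists the origin's databases where it is supported.
  const databases = await env.indexedDB.databases();
  await Promise.all(databases.map(({ name }) => new Promise((resolve, reject) => {
    const request = env.indexedDB.deleteDatabase(name);
    request.onsuccess = () => resolve();
    request.onerror = () => reject(request.error);
  })));

  // Unregister every service worker registration visible to this origin.
  const registrations = await env.serviceWorker.getRegistrations();
  await Promise.all(registrations.map((registration) => registration.unregister()));
}
```

In a document this might be called with { localStorage, sessionStorage, indexedDB, serviceWorker: navigator.serviceWorker }. The gaps listed above are part of the motivation for a user-agent-level mechanism.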
4. Security Considerations
4.1. Incomplete Clearing
It is possible that an application could be put into an indeterminate state by clearing only one type of storage. We mitigate that to some extent by clearing all storage options as a block, and by requiring that the header be delivered over a secure connection.
5. Privacy Considerations
5.1. Web developers control the timing.
If triggered at appropriate times, Clear-Site-Data can increase a user’s privacy and security by clearing sensitive data from their user agent. However, note that the web developer (and not the user) is in control of when the clearing event is triggered. Even assuming a non-malicious site author, users can’t rely on data being cleared at any particular point, nor are users in control of what data types are cleared.
If a user wishes to ensure that site data is indeed cleared at some specific point, they ought to rely on the data-clearing functionality offered by their user agent.
At a bare minimum, user agents OUGHT TO (in the [RFC6919] sense of the words) offer the same functionality to users that they offer to web developers. Ideally, they will offer significantly more than we can offer at a platform level (clearing browsing history, for example).
5.2. Remnants of data on disk.
While Clear-Site-Data triggers a clearing event in a user agent, it is difficult to make promises about the state of a user’s disk after a clearing event takes place. In particular, note that it is up to the user agent to ensure that all traces of a site’s data are actually removed from disk, which can be a herculean task (consider virtual memory, as a good example of a larger issue).
In short, most user agents implement data clearing as "best effort", but can’t promise an exhaustive wipe.
If a user wishes to ensure that site data does not remain on disk, the best way to do so is to use a browsing mode that promises not to intentionally write data to disk (Chrome’s "Incognito", Internet Explorer’s "InPrivate", etc). These modes will do a better job of keeping data off disk, but are still subject to a number of limitations at the edges.
6. IANA Considerations
The permanent message header field registry should be updated with the following registration: [RFC3864]
6.1. Clear-Site-Data
- Header field name: Clear-Site-Data
- Applicable protocol: http
- Status: standard
- Author/Change controller: W3C
- Specification document: This specification (See §2.1 The Clear-Site-Data HTTP Response Header Field)
7. Acknowledgements
Michal Zalewski proposed a variant of this concept, and Mark Knichel helped refine the details.