W3C Access Control Lists

CVS Version:
$Id: Overview.html,v 1.6 2005/03/20 07:34:44 eric Exp $
Author:
Eric Prud'hommeaux, W3C

Abstract

The W3C website serves over 100 million hits per month. While most resources are publicly readable, some documents are available only to the W3C members, the W3C team, or some custom set of groups and users. Further, the access for a document changes over the life of the document. To accommodate these needs, we've developed a flexible, dynamic access control system that controls HTTP access (GET, PUT, POST...) to individual resources. These ACLs (Access Control Lists) are propagated to geographically distributed mirrors (France, Japan and the US) via an efficient replication system that is tolerant of network failures and delays. This document describes the architecture and deployment of the system. The Apache modules, the ACL replication code and the interface for changing ACLs are all publicly available. It is hoped that others might benefit from adopting the same system and sharing code.

Status of this Document

This document describes a system deployed at W3C. It is not endorsed by the W3C members, team, or any working group.


1. Problem Statement

The migration of the W3C web site to the ACLs system was motivated by the following publication challenges.

1.1 Directories With Heterogeneous Access Privileges

The common view of a website is that certain directory trees within the website have different levels of access restriction. These needs are met by the standard Apache auth modules, which look for directives in the config files:

	<Location "/">
	    AuthType Basic
	    AuthName W3C
	    AuthUserFile users
	    AuthGroupFile groups
	    <Limit PUT DELETE>
		Require group putters
	    </Limit>
	</Location>
	<Location "/Team">
	    ...
	    Require group team
	</Location>
	<Location "/Member">
	    ...
	    Require group member
	</Location>

It is quite easy to establish custom access for arbitrary resources:

	<Location "/mystuff">
	    ...
	    Require group team member
	    Require user tom dick harry
	</Location>

but it requires write access to either the system-wide config files or a per-directory config file. The W3C ACLs system moves this information into a relational database. This incurs the overhead of some inter-process communication, but also simplifies manipulation and congestion control.

1.2 Changing Access Privileges

Many documents on the W3C site are team-only or working-group-only for the initial drafts. Prior to the ACLs system, the migration from one level of access to another was usually accomplished by moving the document from one directory to another, e.g., from /Team to /Member. Migrating complicated multi-resource documents was problematic and error-prone. The ACLs system has an interface that authors can invoke when they wish to publish a document to a different audience, or invite different people to edit the document.

2. Syncing Mirror Sites

The W3C web site is served on geographically distributed mirrors. Each of these sites can operate without network connectivity to a "master" site.

2.1 Mirror Site Technical Details

The DNS servers for w3.org analyze the requester's IP address and resolve www.w3.org to the closest (in the network topology) mirror. Creating or updating resources, or the ACLs for a resource, requires pushing data to the mirror sites. Update notifications are mailed to the mirrors. Procmail scripts on the mirrors update resources from CVS and update ACLs by performing SQL queries to populate local Berkeley DB databases.
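The last step, turning SQL query results into a mirror's local lookup database, can be sketched in Python. This is a minimal sketch under stated assumptions, not the W3C procmail scripts: SQLite stands in for MySQL, a plain dict stands in for the Berkeley DB file, the column names follow the uris and acls tables described in section 3, and the store is rebuilt wholesale for simplicity.

```python
import sqlite3
from collections.abc import MutableMapping

# Hypothetical export query; table and column names follow the uris and
# acls tables of section 3, and the real W3C scripts may differ.
EXPORT_QUERY = (
    "SELECT uris.uri, acls.accessor, acls.access "
    "FROM uris JOIN acls ON uris.acl = acls.acl "
    "ORDER BY uris.uri, acls.accessor"
)


def populate_local_store(conn: sqlite3.Connection,
                         store: MutableMapping) -> int:
    """Rebuild a mirror's local key-value store from the SQL database.

    Keys are URIs; values are lists of (accessor, access) pairs. A plain
    dict can stand in for the Berkeley DB file (an assumption). Returns
    the number of ACL rows copied.
    """
    entries: dict = {}
    for uri, accessor, access in conn.execute(EXPORT_QUERY):
        entries.setdefault(uri, []).append((accessor, access))
    store.clear()          # full rebuild for simplicity, not an incremental diff
    store.update(entries)
    return sum(len(pairs) for pairs in entries.values())
```

Python's standard dbm module offers a similar mapping interface, though it stores bytes, so the lists of pairs would need to be serialized before writing.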

3. Database Schema

The authoritative state of the permissions for any resource, and of the groups, users and IP addresses granted permission, is stored in a MySQL database. The following tables show the entries for three resources: http://www.w3.org/Team/, http://www.w3.org/Member/ and http://www.w3.org/.

The large number of resources (over 350 thousand) and the desire that the system determine the ACLs for a resource before that resource became obsolete demanded that resource names be a unique key. The uris table associates a given URI with an ACL number:

	primaryKey  uri                        acl
	1468        http://www.w3.org/Team/    5
	198         http://www.w3.org/Member/  6
	19          http://www.w3.org/         7

The acls table associates an acl number with a set of accessors and a bit field of access privileges:

	acl  accessor  access
	5    102       0xf73
	6    100       0xc32
	6    102       0xf73
	7    1         0xc32
	7    102       0xf73
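To make the two tables concrete, here is a minimal SQLite sketch (standing in for MySQL) that loads exactly the rows shown above and looks up the entries for one URI. The column types and constraints are assumptions; the UNIQUE constraint on uri reflects the unique-key requirement described above. The access bits are left undecoded, since this document does not define their meaning.

```python
import sqlite3

# Sketch of the uris and acls tables; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE uris (
    primaryKey INTEGER PRIMARY KEY,
    uri        TEXT NOT NULL UNIQUE,   -- unique key: one index probe per lookup
    acl        INTEGER NOT NULL
);
CREATE TABLE acls (
    acl      INTEGER NOT NULL,
    accessor INTEGER NOT NULL,
    access   INTEGER NOT NULL,         -- bit field of access privileges
    PRIMARY KEY (acl, accessor)
);
"""

# The rows from the two tables above.
URIS = [(1468, "http://www.w3.org/Team/", 5),
        (198, "http://www.w3.org/Member/", 6),
        (19, "http://www.w3.org/", 7)]
ACLS = [(5, 102, 0xF73), (6, 100, 0xC32), (6, 102, 0xF73),
        (7, 1, 0xC32), (7, 102, 0xF73)]


def load_example() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO uris VALUES (?, ?, ?)", URIS)
    conn.executemany("INSERT INTO acls VALUES (?, ?, ?)", ACLS)
    return conn


def acl_entries(conn: sqlite3.Connection, uri: str) -> dict:
    """Map each accessor number to its access bits for one URI."""
    rows = conn.execute(
        "SELECT acls.accessor, acls.access FROM uris "
        "JOIN acls ON acls.acl = uris.acl "
        "WHERE uris.uri = ? ORDER BY acls.accessor", (uri,))
    return dict(rows)


if __name__ == "__main__":
    conn = load_example()
    for accessor, access in acl_entries(conn, "http://www.w3.org/Member/").items():
        print(accessor, hex(access))   # prints 100 0xc32, then 102 0xf73
```

Because access is reached through an acl number, changing a document's audience amounts to updating uris.acl for its resources rather than moving files.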

The accessors table associates an accessor number with a user name and password, an IP address, or a group name. Group membership is stored in a hierarchy table. The transitive closure of each group's membership within other groups is stored in groupInclusions. The accessorInclusions table maintains a transitive closure of principals' (users and IP addresses) membership in groups. (For query uniformity, accessorInclusions lists each principal as a member of itself.) Resolving the privileges for any user or IP address involves a join between uris, acls, accessorInclusions and accessors.
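The resolution join can be sketched as a single query. Everything here beyond the four table names is hypothetical: the column names (kind, name, principal, grp), the toy users and group, and the assumption that privileges reaching a user through several accessors combine by bitwise OR. SQLite has no bitwise-OR aggregate, so the OR is done in Python; in MySQL, BIT_OR() could do it inside the query.

```python
import sqlite3
from functools import reduce
from operator import or_

# Only the table names come from the text; columns and data are hypothetical.
SCHEMA = """
CREATE TABLE uris (uri TEXT PRIMARY KEY, acl INTEGER);
CREATE TABLE acls (acl INTEGER, accessor INTEGER, access INTEGER);
CREATE TABLE accessors (id INTEGER PRIMARY KEY, kind TEXT, name TEXT);
-- (principal, grp): principal belongs to grp directly or transitively;
-- each principal is also listed as a member of itself.
CREATE TABLE accessorInclusions (principal INTEGER, grp INTEGER);
"""

RESOLVE = """
SELECT acls.access
FROM uris
JOIN acls ON acls.acl = uris.acl
JOIN accessorInclusions AS inc ON inc.grp = acls.accessor
JOIN accessors AS who ON who.id = inc.principal
WHERE uris.uri = ? AND who.kind = 'user' AND who.name = ?
"""


def user_access(conn: sqlite3.Connection, uri: str, user: str) -> int:
    """OR together every access field that reaches `user` on `uri`."""
    return reduce(or_, (row[0] for row in conn.execute(RESOLVE, (uri, user))), 0)


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO uris VALUES ('http://example.org/doc', 1)")
    conn.executemany("INSERT INTO accessors VALUES (?, ?, ?)",
                     [(10, "group", "editors"),
                      (20, "user", "alice"), (21, "user", "bob")])
    # ACL 1 grants the editors group 0x3 and bob personally 0x4.
    conn.executemany("INSERT INTO acls VALUES (?, ?, ?)",
                     [(1, 10, 0x3), (1, 21, 0x4)])
    conn.executemany("INSERT INTO accessorInclusions VALUES (?, ?)",
                     [(20, 20), (21, 21),    # each principal includes itself
                      (20, 10), (21, 10)])   # alice and bob are editors
    return conn


if __name__ == "__main__":
    print(hex(user_access(build_demo(), "http://example.org/doc", "bob")))  # 0x7
```

The reflexive rows are what let bob's personal grant and his group grant come out of the same join.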

Calculating the transitive closure of principals is computationally intensive: it requires as many self-joins on hierarchy as the maximum generation count in any group membership chain (for example, user A is a member of group B, which is a member of group C). In everyday use, this computation is never needed. Adding user A to group Z requires only 2 joins between accessors and groupInclusions.
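The contrast between the full closure and the incremental update can be sketched over in-memory sets of (member, group) pairs standing in for the hierarchy, groupInclusions and accessorInclusions tables. This is an illustration under assumptions, not the W3C SQL: full_closure repeats a join until nothing new appears, so the number of rounds grows with the longest membership chain, while add_principal assumes groupInclusions is already closed and only needs to look up the groups that contain the target group.

```python
from collections import defaultdict

# (member, group) pairs stand in for rows of the hierarchy, groupInclusions
# and accessorInclusions tables; the names and layout are hypothetical.
Pair = tuple[str, str]


def full_closure(hierarchy: set[Pair]) -> tuple[set[Pair], int]:
    """Transitive closure by repeated joins against hierarchy.

    Returns the closure and the number of rounds that added pairs; the
    round count grows with the longest membership chain.
    """
    parents = defaultdict(set)
    for member, group in hierarchy:
        parents[member].add(group)
    closure = set(hierarchy)
    rounds = 0
    while True:
        # One join: (a, b) in closure and (b, c) in hierarchy give (a, c).
        new = {(a, c) for (a, b) in closure for c in parents.get(b, ())} - closure
        if not new:
            return closure, rounds
        closure |= new
        rounds += 1


def add_principal(principal: str, group: str,
                  group_inclusions: set[Pair],
                  accessor_inclusions: set[Pair]) -> set[Pair]:
    """Add `principal` to `group` without recomputing any closure.

    Assumes group_inclusions is already transitively closed, so one scan
    finds every group that directly or indirectly contains `group`.
    """
    added = {(principal, group)}
    added |= {(principal, outer)
              for (inner, outer) in group_inclusions if inner == group}
    added -= accessor_inclusions
    accessor_inclusions |= added
    return added
```

For the chain alice, B, C, D (alice in B, B in C, C in D), full_closure needs two productive rounds to derive (alice, D), whereas adding bob to B with a closed groupInclusions of {(B, C), (C, D), (B, D)} yields (bob, B), (bob, C) and (bob, D) in one step.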

3.1 Mirror Update Technical Details

This is the closest thing to rocket science in the ACLs project. The goal that the mirrors be able to remain unreachable for an arbitrary amount of time led to a system that maintains the minimal diffs from any earlier state to the current state. Each time a principal is removed from a group, a SUB entry replaces any earlier ADD entries for that principal in that group.
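One way to get that minimal-diff property is to keep a single log entry per (principal, group) pair, stamped with a global sequence number, and have a returning mirror ask for everything newer than the last sequence number it saw. This is a hypothetical sketch: the sequence numbers and names are assumptions, and letting a later ADD replace an earlier SUB is an assumption that mirrors the SUB rule stated above.

```python
from dataclasses import dataclass, field


@dataclass
class MembershipLog:
    """Master-side log keeping one entry per (principal, group) pair."""
    seq: int = 0
    latest: dict = field(default_factory=dict)  # (principal, group) -> (seq, op)

    def record(self, principal: str, group: str, op: str) -> None:
        if op not in ("ADD", "SUB"):
            raise ValueError(f"unknown op {op!r}")
        self.seq += 1
        # Overwrite rather than append: a SUB replaces an earlier ADD for the
        # same pair (and, by assumption here, an ADD replaces an earlier SUB).
        self.latest[(principal, group)] = (self.seq, op)

    def diff_since(self, last_seen: int) -> list:
        """Entries a mirror that last synced at `last_seen` still needs."""
        return sorted((s, op, p, g) for (p, g), (s, op) in self.latest.items()
                      if s > last_seen)


def apply_diff(members: set, diff: list, last_seen: int) -> int:
    """Apply a diff to a mirror's membership set; return its new sync point."""
    for s, op, p, g in diff:
        if op == "ADD":
            members.add((p, g))
        else:
            members.discard((p, g))
        last_seen = max(last_seen, s)
    return last_seen


if __name__ == "__main__":
    log, mirror = MembershipLog(), set()
    log.record("alice", "team", "ADD")
    synced = apply_diff(mirror, log.diff_since(0), 0)
    # The mirror goes offline while four more changes happen.
    log.record("bob", "team", "ADD")
    log.record("alice", "team", "SUB")
    log.record("bob", "team", "SUB")
    log.record("bob", "team", "ADD")
    diff = log.diff_since(synced)
    synced = apply_diff(mirror, diff, synced)
    print(len(diff), mirror, synced)  # 2 {('bob', 'team')} 5
```

The returning mirror receives two entries for four changes: superseded entries are overwritten rather than appended, so the diff size is bounded by the number of distinct pairs touched, not by how long the mirror was away.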

4. ACLs Manipulation Interface

Most database manipulation is done via a web interface called chacl. This interface design has been problematic. At first, I tried to make it too feature-full, which led to confusion. An effort to find the optimum trade-off between simplicity and flexibility has resulted in two modes: prosaic and hard-to-use.

4.1 Chacl RDF Interface

The chacl script manipulates provisional ACLs, which are only committed to the database when the user says so. The provisional state is maintained in RDF; each interaction requires chacl to test the validity of the ACLs state for the client's username or IP address. Some client scripts compose RDF and submit it directly to chacl.
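The provisional-then-commit pattern can be sketched as follows. This is a hypothetical illustration, not chacl itself: a dict replaces the RDF provisional state, and may_edit is an invented callback standing in for chacl's privilege test. Rechecking on every call, including at commit time, reflects the statement that each interaction retests validity for the client.

```python
from collections.abc import Callable, MutableMapping


class ProvisionalSession:
    """Hold ACL changes until the client explicitly commits them.

    Hypothetical sketch: may_edit(client, uri) stands in for chacl's
    privilege test, and a dict stands in for the RDF provisional state.
    """

    def __init__(self, client: str, may_edit: Callable[[str, str], bool],
                 store: MutableMapping[str, int]):
        self.client = client
        self.may_edit = may_edit
        self.store = store                 # committed uri -> acl mapping
        self.pending: dict[str, int] = {}  # provisional uri -> acl mapping

    def _check(self, uri: str) -> None:
        # Re-test on every interaction: privileges may have changed meanwhile.
        if not self.may_edit(self.client, uri):
            raise PermissionError(f"{self.client} may not change ACLs on {uri}")

    def propose(self, uri: str, acl: int) -> None:
        self._check(uri)
        self.pending[uri] = acl

    def commit(self) -> int:
        """Validate every pending change, then write them all; return count."""
        for uri in self.pending:
            self._check(uri)               # nothing is written if any check fails
        self.store.update(self.pending)
        count = len(self.pending)
        self.pending.clear()
        return count
```

Validating the whole pending set before writing means a privilege revoked mid-session leaves the committed state untouched.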

5. Next Steps

Adding an undo feature to the system would help the administrators sleep a little easier at night. The current (untested) undo procedure involves backup tapes and probably much suffering. The plan is to record the previous value of uris.acl whenever a script is about to update it. Since each chacl session has a unique session ID, it will be easy to maintain a transaction history (as a stack) in the HTML interface and have the undo button pop and restore the top of the transaction stack.
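The planned undo mechanism, recording the previous uris.acl value before each update and popping it on undo, could look like this. It is a minimal sketch under assumptions: SQLite stands in for MySQL, the AclEditor class is hypothetical, and the per-session stacks are kept in server memory rather than in the HTML interface.

```python
import sqlite3


class AclEditor:
    """Record the previous uris.acl value before each update so it can be undone.

    Hypothetical sketch: one undo stack per chacl session ID, kept in memory.
    """

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.history: dict[str, list[tuple[str, int]]] = {}

    def set_acl(self, session_id: str, uri: str, new_acl: int) -> None:
        row = self.conn.execute(
            "SELECT acl FROM uris WHERE uri = ?", (uri,)).fetchone()
        if row is None:
            raise KeyError(uri)
        # Push the old value before overwriting it.
        self.history.setdefault(session_id, []).append((uri, row[0]))
        self.conn.execute("UPDATE uris SET acl = ? WHERE uri = ?", (new_acl, uri))
        self.conn.commit()

    def undo(self, session_id: str) -> bool:
        """Pop and restore the most recent change; False if nothing to undo."""
        stack = self.history.get(session_id)
        if not stack:
            return False
        uri, old_acl = stack.pop()
        self.conn.execute("UPDATE uris SET acl = ? WHERE uri = ?", (old_acl, uri))
        self.conn.commit()
        return True
```

Keying the stacks by session ID keeps one author's undo from reverting another author's changes made in a different session.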

A popular way to edit the web site is to use CVS (not yet Subversion) to check out the relevant documents, edit them locally, and check them back in. Users could leverage this checked-out directory tree if they had a desktop ACLs tool that would inspect the tree and provide them with an interface for manipulating the documents' ACLs. Such a tool could use the chacl RDF interface, which already performs all the appropriate privilege tests.
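The first job of such a desktop tool, mapping files in a checkout to the URIs whose ACLs it would query, could be sketched like this. The assumption that the checkout root corresponds to http://www.w3.org/ is hypothetical, as is the function name; the tool would then pass these URIs to chacl, which is not shown.

```python
import os
from pathlib import Path
from urllib.parse import quote

# Hypothetical assumption: the root of the CVS checkout corresponds to
# BASE_URI on the web site. CVS bookkeeping directories are skipped.
BASE_URI = "http://www.w3.org/"


def checkout_uris(root: str, base_uri: str = BASE_URI) -> list[str]:
    """List the web URIs for every file in a local checkout."""
    uris = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune CVS/ directories in place so os.walk does not descend into them.
        dirnames[:] = sorted(d for d in dirnames if d != "CVS")
        rel_dir = Path(dirpath).relative_to(root)
        for name in sorted(filenames):
            uris.append(base_uri + quote((rel_dir / name).as_posix()))
    return uris
```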

6. Conclusion

Despite the interface, the W3C ACLs system is deployed and authors are either content or resigned to their fate. From the web user's perspective, the ACLs system has been a transparent success. Administratively, the deployment has reached a steady state that requires very little expert attention. Documented recipes for creating mirrors allow relatively blind administration. The system does not require any high-performance hardware: all daily operations are incremental and optimized, and require trivial amounts of memory and processor time.


$Date: 2005/03/20 07:34:44 $
