PICS: Internet Access Controls Without Censorship

Paul Resnick
AT&T Research
600 Mountain Avenue
Murray Hill, NJ 07974
presnick@research.att.com

James Miller
World Wide Web Consortium
MIT Laboratory for Computer Science
Room NE43-355
545 Technology Square
Cambridge, MA 02139
jmiller@mit.edu

With its recent explosive growth, the Internet now faces a problem inherent in all media that serve diverse audiences: not all materials are appropriate for every audience. Societies have tailored their responses to the characteristics of the media [1, 3]: in most countries, there are more restrictions on broadcasting than on the distribution of printed materials. Any rules about distribution, however, will be too restrictive from some perspectives, yet not restrictive enough from others. We can do better: we can meet diverse needs by controlling reception rather than distribution. In the TV industry, this realization has led to the V-chip, a system for blocking reception based on labels embedded in the broadcast stream.

On the Internet, we can do still better, with richer labels that reflect diverse viewpoints, and more flexible selection criteria. PICS (note 2), the Platform for Internet Content Selection, establishes Internet conventions for label formats and distribution methods, while dictating neither a labeling vocabulary nor who should pay attention to which labels. It is analogous to specifying where on a package a label should appear, and in what font it should be printed, without specifying what it should say.

The PICS conventions have caught on quickly. In early 1996, Microsoft, Netscape, SurfWatch, CyberPatrol, and other software vendors announced PICS-compatible products. AOL, AT&T WorldNet, CompuServe, and Prodigy provide free blocking software that will be PICS-compliant by the end of 1996. RSACi and SafeSurf are offering their particular labeling vocabularies through on-line servers that produce PICS-formatted labels. In May of 1996, CompuServe announced that it will label all web content it produces using PICS-formatted RSACi labels.

Flexible Blocking

Not everyone needs to block reception of the same materials. Parents may not wish to expose their children to sexual or violent images. Businesses may want to prevent their employees from visiting recreational sites during hours of peak network usage. Governments may want to restrict reception of materials that are legal in other countries but not in their own. The "off" button (or disconnecting from the entire Net) is too crude: there should be some way to block only the inappropriate material. Appropriateness, however, is neither an objective nor a universal measure. It depends on at least three factors:

The supervisor: parenting styles differ, as do philosophies of management and government.
The recipient: what's appropriate for one fifteen-year-old may not be for an eight-year-old, or even for all fifteen-year-olds.
The context: a game or chat room that is appropriate to access at home may be inappropriate at work or school.

Computer software can implement access controls that take into account all these factors. The basic idea, illustrated in Figure 1, is to interpose selection software between the recipient and the on-line documents. The software checks labels to determine whether to permit access to particular materials. It may permit access for some users but not others, or at some times but not others.

Figure 1: selection software automatically blocks access to some documents, but not others. Acknowledgment (note 3)

Prior to PICS there was no standard format for labels, so companies that wished to provide access control had to both develop the software and provide the labels. PICS provides a common format for labels, so that any PICS-compliant selection software can process any PICS-compliant label. A single site or document may have many labels, provided by different organizations. Consumers choose their selection software and their label sources (called rating services) independently, as illustrated in Figure 2. This separation allows both markets to flourish: companies that prefer to remain value-neutral can offer selection software without providing any labels; values-oriented organizations, without writing software, can create rating services that provide labels.

Figure 2: selection software blocks based on labels provided by publishers and third-party labeling services, and on selection criteria set by the parent.

PICS labels describe content on one or more dimensions. It is the selection software, not the labels themselves, that determines whether access will be permitted or prohibited. For example, if a rating service used the MPAA's movie-rating vocabulary, selection software might be configured to block an eight-year-old's access to PG-labeled documents, but to allow a fifteen-year-old's access to them. Parents can prohibit access to unlabeled documents, confining children to a zone known to be acceptable, or can allow access to any document that is not explicitly prohibited.
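The division of labor described above can be sketched in a few lines of Python. This is only an illustration, not part of PICS: the function name is_permitted, the rule dictionaries, and the integer scale (G=0 through NC-17=4, matching the sample service description in the sidebar) are assumptions chosen for the example.

```python
from typing import Dict, Optional


def is_permitted(ratings: Optional[Dict[str, int]],
                 max_allowed: Dict[str, int],
                 allow_unlabeled: bool) -> bool:
    """Decide whether a document may be shown (hypothetical helper).

    ratings         -- category -> value taken from a label, or None if no label
    max_allowed     -- category -> highest value the supervisor permits
    allow_unlabeled -- policy for documents (or categories) without a rating
    """
    if ratings is None:
        # No label at all: fall back on the supervisor's policy.
        return allow_unlabeled
    for category, limit in max_allowed.items():
        value = ratings.get(category)
        if value is None:
            # The label says nothing about this dimension.
            if not allow_unlabeled:
                return False
            continue
        if value > limit:
            return False
    return True


# One PG label (value 1 on the assumed scale), two different recipients.
pg_label = {"r": 1}
rules_for_eight_year_old = {"r": 0}    # G only
rules_for_fifteen_year_old = {"r": 2}  # up to PG-13
```

The label is the same in both cases; only the rules applied by the selection software differ.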

Each rating service can choose its own labeling vocabulary. For example, Yahoo labels might include a "coolness" dimension and a subject classification dimension.

Information publishers can self-label, just as manufacturers of children's toys currently label products with text such as, "Fun for ages 5 and up." Provided that publishers agree on a common labeling vocabulary, self-labeling is a simple mechanism well-matched to the distributed nature and high volume of information creation on the Internet.

When publishers are unwilling to participate, or can't be trusted to participate honestly, independent organizations can provide third-party labels. For example, the Simon Wiesenthal Center, which is concerned about Nazi propaganda and other hate speech, could label materials that are historically inaccurate or promote hate. Third-party labeling systems can also express features that are of concern to a limited audience. For example, a teacher might label a set of astronomical photographs and block access to everything else for the duration of a science lesson.

There are two PICS specification documents [6, 8]. The most important components are:

A syntax for describing a rating service, so that computer programs can present the service and its labels to users.
A syntax for labels, so that computer programs can process them. A label describes either a single document or a group of documents (e.g., a site). A label may be digitally signed and may include a cryptographic hash of the associated document.
An embedding of labels (actually, lists of labels) into the RFC-822 transmission format [2] and the HTML document format.
An extension of the HTTP protocol, so clients can request that labels be transmitted with a document.
A query syntax for an on-line database of labels (a label bureau).

The accompanying sidebar illustrates these formats and protocols. Four technical features are worth highlighting.

First, the machine-readable service description is a resource that other computer programs can use for automatically generating interfaces that present the service to users. Consider the prototype shown in Figure 3, for configuring selection software. Here the parent is setting rules for what Johnny can visit, based on a rating service which has separate dimensions for language, nudity/sex, and violence. The parent drags the slider to indicate the maximum permitted value on the violence scale, noting the height of the thermometer and the text description (e.g., "Strong, vulgar language…") associated with each level on the scale. The software has taken the thermometer icons and text directly from the service description.

Figure 3: Prototype software (note 4) draws on text and icons in the service description to automatically generate a user interface for configuring selection rules.

Second, a rating service can provide variants of its service description tailored to different languages and cultures. The core elements remain the same, but the text and icons can be different. As a result, the service need not provide multiple versions of labels. The labels rely only on the common, core elements, but, using the variants of the service description, a single label can be displayed to different users in different languages.

Third, we have used URLs wherever universally distinct identifiers are required. For example, the identifier for a rating service is a URL. This has two advantages. First, a URL is a self-describing identifier, because it can be used to retrieve a descriptive document. Second, it leverages the Internet domain name registration system to permit decentralized choice of the identifier, while still guaranteeing distinctness from identifiers chosen by others.

Fourth, we specify that a response to a request for multiple labels must preserve the order of the request. If the server knows several alternative URLs that identify a single document, and the client asks for a label for one of those URLs, the server can send back a label for one of the alternates. The client can still match the label with its original request, from its position in the response, even though the document URLs do not match.
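A client can rely on this ordering rule roughly as follows. The sketch assumes the response has already been parsed into one entry per requested URL, with None standing in for a missing label; the function name and the dictionary shape of a label are illustrative, not taken from the specifications.

```python
from typing import Dict, List, Optional


def match_labels(requested_urls: List[str],
                 response_labels: List[Optional[dict]]) -> Dict[str, Optional[dict]]:
    """Pair each requested URL with the label at the same position.

    Matching is by position only, so a label whose own "for" URL is an
    alternate name for the document is still attached to the URL that
    the client originally asked about.
    """
    if len(requested_urls) != len(response_labels):
        raise ValueError("expected one response entry per requested URL")
    return dict(zip(requested_urls, response_labels))


# Hypothetical exchange: the server answers the first request with a
# label issued under an alternate URL for the same document.
requested = ["http://www.gcf.org/stuff.html", "http://www.gcf.org/other.html"]
returned = [
    {"for": "http://gcf.org/stuff.html", "ratings": {"l": 3, "s": 2, "v": 0}},
    None,
]
matched = match_labels(requested, returned)
```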

What PICS Doesn't Specify

In general, PICS specifies only those technical issues that affect interoperability. It does not specify how selection software or rating services work, just how they work together.

PICS-compatible software can implement selective blocking in various ways. One possibility is to build it into the browser on each computer, as announced by Microsoft and Netscape. A second method, used in products such as CyberPatrol and SurfWatch, is to perform this operation as part of each computer's network protocol stack. A third possibility is to perform the operation somewhere in the network, for example at a proxy server used in combination with a firewall. Each alternative affects efficiency, ease of use, and security. For example, a browser could include nice interface features such as graying out blocked links, but it would be fairly easy for a child to install a different browser and bypass the selective blocking. The network implementation may be the most secure, but could create a performance bottleneck if not implemented carefully.

PICS does not specify how parents or other supervisors set configuration rules. One possibility is to provide a configuration tool like that shown in Figure 3. Even that amount of configuration may be too complex, however. Another possibility is for organizations and on-line services to provide preconfigured sets of selection rules. For example, an on-line service might team up with UNICEF to offer "Internet for kids" and "Internet for teens" packages, containing not only preconfigured selection rules, but also a default home page provided by UNICEF.

Labels can be retrieved in various ways. Some clients might choose to request labels each time a user tries to access a document. Others might cache frequently requested labels or download a large set from a label bureau and keep a local database, to minimize delays while labels are retrieved.
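The retrieval strategies above differ mainly in when the label source is consulted. A minimal sketch, assuming labels are handled as opaque strings and that the caller supplies a fetch function (for example, one that queries a label bureau); the class and method names are hypothetical, and label expiration is ignored.

```python
from typing import Callable, Dict, Optional


class LabelCache:
    """Fetch a label on first use, then answer from a local table."""

    def __init__(self, fetch: Callable[[str], Optional[str]]) -> None:
        self._fetch = fetch                        # called only on a cache miss
        self._labels: Dict[str, Optional[str]] = {}

    def preload(self, labels: Dict[str, str]) -> None:
        """Install a set of labels downloaded in bulk."""
        self._labels.update(labels)

    def get(self, url: str) -> Optional[str]:
        """Return the label for url, fetching it at most once."""
        if url not in self._labels:
            self._labels[url] = self._fetch(url)
        return self._labels[url]
```

A client that requests labels on every access corresponds to calling the fetch function directly; the bulk-download strategy corresponds to calling preload before any document is requested.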

PICS specifies very little about how to run a labeling service, beyond the format of the service description and the labels. Rating services must make the following choices:

The labeling vocabulary. A common set of dimensions would make publishers' self-labels more useful to consumers but cultural divergence may make it difficult to arrive at a single set of dimensions. Governments may also mandate country-specific vocabularies. Third party labelers are likely to use a wide range of other dimensions.
Granularity. Services can label entire sites, or individual documents and images.
Who creates the labels. Services can employ professionals, volunteers, or computers to do the labeling. They can also delegate all or part of the label-creation task to content creators or to other rating services.
Coverage. Some services may strive for comprehensive coverage of the entire Internet, others for narrower areas such as pornography or educational sites. An interesting intermediate offering may be to label the documents that subscribers ask about: while there are thousands of sites and millions of documents available on the Internet, any particular set of users is likely to ask for access to a much smaller set.
Revenue generation. Some organizations that provide labels may choose not to charge anyone, relying on donations or levies on members. Other services can charge subscribers, charge intermediaries such as on-line services for the right to redistribute labels, or charge sites for the privilege of being labeled. We might even see the rise of labeling intermediaries who pay a royalty to values-oriented organizations such as UNICEF for the right to label documents with the UNICEF logo, according to criteria set by UNICEF.

Other Uses for Labels

New infrastructures are often used in unplanned ways, to meet latent needs. There will be many labeling vocabularies that are unrelated to access controls. The PICS specifications also plan for unplanned uses, by including extension mechanisms for adding new functionality. PICS is a new resource available to anyone who wishes to associate data with documents on the Internet, even documents that others control. Some of the promising applications include:

Collaborative labeling services could permit everyone to contribute labels, and use those labels to guide others toward interesting materials [4, 7]. Guidance can be personalized by matching end-users with others who have similar tastes, as reflected in their ratings of documents that both have examined [5, 10, 12].
On-line journals could publish all submissions, but attach review labels that each reader could interpret as guides to the best articles [9]. While PICS-compatible labeling services can associate text phrases or icons with values on numeric scales, so that a frequently used annotation such as "seminal article" can be encoded, PICS labels cannot include arbitrary text. A PICS label can, however, include the URL of another document that contains textual annotations, which provides a means of integrating PICS with more general annotation platforms such as ComMentor [11].
Labeling vocabularies may be designed for classification rather than blocking, coupled with indexing engines that search based on labels and with browsers that display them.
Intellectual property vocabularies may develop for notifying people about who owns a document and how it may be copied and used. Of course, this is only one piece of the intellectual property protection puzzle, since it offers notification but not enforcement.
Privacy vocabularies may develop. End-users could express their privacy preferences and labels would notify them of what information is gathered about their interactions with a web site, and how that information will be used.
Reputation vocabularies may develop. The Better Business Bureau could associate labels with commercial sites that had especially good or especially bad business practices. Privacy groups could label sites according to their information practices. There could even be labels for Usenet authors according to the quality of the messages they post; posts from those with poor reputations could be screened out.

Conclusion

PICS provides a labeling infrastructure for the Internet. It is values-neutral: it can accommodate any set of labeling dimensions, and any criteria for assigning labels. Any PICS-compatible software can interpret labels from any source, because each source provides a machine-readable description of its labeling dimensions.

Around the world, governments are considering restrictions on on-line content. Since children differ, contexts of use differ, and values differ, blanket restrictions on distribution can never meet everyone's needs. Selection software can meet diverse needs, by blocking reception, and labels are the raw materials for implementing context-specific selection criteria. The availability of large quantities of labels will also lead to new sorting, searching, filtering, and organizing tools that help users surf the Internet more efficiently.

References

[1] J. Berman and D. Weitzner, "User Control: Renewing the Democratic Heart of the First Amendment in the Age of Interactive Media," Yale Law Journal, vol. 104, p. 1619, 1995.

[2] D. Crocker, "RFC-822: Standard for the Format of ARPA Internet Text Messages," http://ds.internic.net/rfc/rfc822.txt, August 1982.

[3] I. de Sola Pool, Technologies of Freedom. Cambridge: MIT Press, 1983.

[4] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry, "Using Collaborative Filtering to Weave an Information Tapestry," Communications of the ACM, vol. 35, pp. 61-70, 1992.

[5] W. Hill, L. Stead, and M. Rosenstein, "Recommending and Evaluating Choices in a Virtual Community of Use," Proceedings of CHI 95 Conference on Human Factors in Computing Systems, Denver: ACM. 194-201.

[6] T. Krauskopf, J. Miller, P. Resnick, and G. W. Treese, "Label Syntax and Communication Protocols," World Wide Web Consortium http://w3.org/PICS/labels.html, May 5 1996.

[7] D. Maltz and K. Ehrlich, "Pointing the Way: Active Collaborative Filtering," Proceedings of CHI 95 Conference on Human Factors in Computing Systems, Denver: ACM. 202-209.

[8] J. Miller, P. Resnick, and D. Singer, "Rating Services and Rating Systems (and Their Machine Readable Descriptions)," World Wide Web Consortium http://w3.org/PICS/services.html, May 5 1996.

[9] A. M. Odlyzko, "Tragic Loss or Good Riddance? The Impending Demise of Traditional Scholarly Journals," International Journal of Human-Computer Studies, vol. 42, pp. 71-122, 1995.

[10] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, "GroupLens: An Open Architecture for Collaborative Filtering of Netnews," Proceedings of CSCW 94 Conference on Computer Supported Cooperative Work, New York: ACM. 175-186.

[11] M. Roscheisen, C. Mogensen, and T. Winograd, "A Platform for Third-Party Value-Added Information Providers: Architecture, Protocols, and Usage Examples," Stanford University CSDTR/DLTR (http://www-diglib.stanford.edu/diglib/pub/reports/commentor.html), November 1994, updated April 1995.

[12] U. Shardanand and P. Maes, "Social Information Filtering: Algorithms for Automating 'Word of Mouth'," Proceedings of CHI 95 Conference on Human Factors in Computing Systems, Denver: ACM. 210-217.

A Tour of the PICS Specifications

Figure 4 shows the description of a sample rating service, based on the MPAA's movie-rating scheme. The initial section includes general information about the service. The second section describes each of the dimensions, or categories, and the scales used for each. In this case, there is just a single category, with five possible values: G through NC-17. In actual labels, these values would be represented by the integers 0-4; the service description allows a software program to determine that a value of 1 corresponds to the PG rating and even to display the PG.gif icon to a user.


((PICS-version 1.1)
 (rating-system "http://MPAAscale.org/Ratings/Description/")
 (rating-service "http://MPAAscale.org/v1.0")  
 (icon "icons/MPAAscale.gif")  
 (name "The MPAA's Movie-rating Service")  
 (description "A rating service based on the MPAA's movie-rating scale")

 (category    
  (transmit-as "r") 
  (name "Rating")
  (label (name "G") (value 0) (icon "icons/G.gif"))
  (label (name "PG") (value 1) (icon "icons/PG.gif"))
  (label (name "PG-13") (value 2) (icon "icons/PG-13.gif"))
  (label (name "R") (value 3) (icon "icons/R.gif"))
  (label (name "NC-17") (value 4) (icon "icons/NC-17.gif"))))

Figure 4: A PICS-compatible description of a service that is based on the MPAA movie rating scheme.
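To illustrate how a program might use such a description, the following Python sketch reads a trimmed copy of Figure 4 and recovers the mapping from integer values to rating names. It is not a complete parser for the service description syntax: it assumes strings contain no escaped quotes, and the function names parse and value_names are ours.

```python
import re
from typing import Dict

# Tokens: parentheses, double-quoted strings, or bare words and numbers.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

SAMPLE = """((PICS-version 1.1)
 (rating-service "http://MPAAscale.org/v1.0")
 (name "The MPAA's Movie-rating Service")
 (category
  (transmit-as "r")
  (name "Rating")
  (label (name "G") (value 0) (icon "icons/G.gif"))
  (label (name "PG") (value 1) (icon "icons/PG.gif"))
  (label (name "PG-13") (value 2) (icon "icons/PG-13.gif"))
  (label (name "R") (value 3) (icon "icons/R.gif"))
  (label (name "NC-17") (value 4) (icon "icons/NC-17.gif"))))"""


def parse(text: str) -> list:
    """Turn the parenthesized text into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing parenthesis
            return items
        if tok.startswith('"'):
            return tok[1:-1]  # strip the surrounding quotes
        return int(tok) if tok.lstrip("-").isdigit() else tok

    return read()


def value_names(description: list) -> Dict[int, str]:
    """Map each integer value of the first category to its display name."""
    names: Dict[int, str] = {}
    for clause in description:
        if isinstance(clause, list) and clause and clause[0] == "category":
            for item in clause[1:]:
                if isinstance(item, list) and item and item[0] == "label":
                    fields = {field[0]: field[1] for field in item[1:]}
                    names[fields["value"]] = fields["name"]
            break
    return names
```

With this table in hand, selection software can show "PG" (or the PG.gif icon) wherever a label carries the value 1.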

Figure 5 shows a sample PICS label (actually a label list containing just one label). The URL on the first line, which identifies the labeling service, makes it possible to redistribute labels yet still identify their original sources. The label can also include information about itself, such as the date on which it was created, the date it will expire, that the label is associated with a certain document (in this case, "http://www.gcf.org/stuff.html"), and the label's author. The last line shows the attributes that describe the document: a "language" value of 3; "sex" 2; and "violence" 0.

(PICS-1.1 "http://old.rsac.org/v1.0/" labels 
 on "1994.11.05T08:15-0500"              
 until "1995.12.31T23:59-0000"              
 for "http://www.gcf.org/stuff.html"
 by "John Doe"              
 ratings (l 3 s 2 v 0))

Figure 5: A sample label list from a rating service with language, sex, and violence dimensions.
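As a rough illustration of how software might read the ratings clause of such a label, the following Python sketch pulls out the category/value pairs with a regular expression. It is a shortcut rather than a full parser for the label syntax, and the expansion of l, s, and v into "language", "sex", and "violence" simply follows the description in the text.

```python
import re
from typing import Dict

SAMPLE_LABEL = '''(PICS-1.1 "http://old.rsac.org/v1.0/" labels
 on "1994.11.05T08:15-0500"
 until "1995.12.31T23:59-0000"
 for "http://www.gcf.org/stuff.html"
 by "John Doe"
 ratings (l 3 s 2 v 0))'''

# Transmitted names and the dimensions they stand for in this example.
DIMENSIONS = {"l": "language", "s": "sex", "v": "violence"}


def parse_ratings(label_text: str) -> Dict[str, int]:
    """Return the category -> value pairs from the ratings clause."""
    match = re.search(r'ratings\s*\(([^)]*)\)', label_text)
    if match is None:
        raise ValueError("no ratings clause found")
    tokens = match.group(1).split()
    if len(tokens) % 2 != 0:
        raise ValueError("ratings must come in category/value pairs")
    return {tokens[i]: int(tokens[i + 1]) for i in range(0, len(tokens), 2)}


ratings = parse_ratings(SAMPLE_LABEL)
readable = {DIMENSIONS[name]: value for name, value in ratings.items()}
```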

Anything that can be named by a URL can be labeled, including documents that are accessed via FTP, gopher, or Netnews, as well as HTTP. PICS proposes a URL naming system for IRC, so that chat rooms with stable topics can be labeled.

Labels can include two optional security features (not shown in the example). The first is a cryptographic hash of the labeled document, in the form of an MD5 message digest. This enables software to detect whether changes have been made to the document after the label was created. The second is a digital signature on the contents of the label itself, which allows software to verify that a label really was created by the service mentioned in it and that the label has not been altered.
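The hash check can be sketched with Python's standard hashlib module. The base64 text encoding of the digest and the function names are assumptions made for the example; how a label actually carries the digest is defined in the label specification [6] and is not reproduced here.

```python
import base64
import hashlib


def md5_digest_b64(document: bytes) -> str:
    """MD5 message digest of the document, base64-encoded (assumed encoding)."""
    return base64.b64encode(hashlib.md5(document).digest()).decode("ascii")


def document_unchanged(document: bytes, digest_from_label: str) -> bool:
    """True if the document still matches the digest recorded in its label."""
    return md5_digest_b64(document) == digest_from_label


# The labeler records a digest; the client later recomputes and compares.
original = b"<html><body>Astronomy photographs</body></html>"
recorded = md5_digest_b64(original)
tampered = original.replace(b"Astronomy", b"Other")
```

Here document_unchanged(original, recorded) holds while document_unchanged(tampered, recorded) does not, so a label cannot silently be carried over to altered content.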

PICS specifies three ways to distribute labels. The first is to embed labels in HTML documents, using the META element in the document header. The general format is <META http-equiv="PICS-Label" content='labellist'>. Other document formats could be similarly extended.
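A client could recover an embedded label list along these lines, using Python's standard html.parser module. The class name is ours, and the sample page simply reuses the label from Figure 5 on a single line.

```python
from html.parser import HTMLParser
from typing import List


class PicsLabelExtractor(HTMLParser):
    """Collect the content of every META element that carries a PICS label."""

    def __init__(self) -> None:
        super().__init__()
        self.label_lists: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "pics-label":
            content = attributes.get("content")
            if content:
                self.label_lists.append(content)


SAMPLE_HTML = """<html><head>
<META http-equiv="PICS-Label" content='(PICS-1.1 "http://old.rsac.org/v1.0/" labels on "1994.11.05T08:15-0500" until "1995.12.31T23:59-0000" for "http://www.gcf.org/stuff.html" by "John Doe" ratings (l 3 s 2 v 0))'>
<title>stuff</title>
</head><body>...</body></html>"""

extractor = PicsLabelExtractor()
extractor.feed(SAMPLE_HTML)
```

Single quotes around the content attribute let the label list keep its own double-quoted strings unescaped.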

The second distribution method is for a client to ask an HTTP server to send labels along with the documents it requests. Figure 6 shows a sample interaction: the HTTP GET request includes an extra header line asking for labels and saying which service's labels should be sent back. The server includes two extra header lines in the response, one of which contains the labels.

Client sends to HTTP server www.greatdocs.com: 
GET foo.html HTTP/1.1
Accept-Protocol: {PICS-1.0 {params full {services "http://www.gcf.org/1.0/"}}}

Server responds to client: 
HTTP/1.1 200 OK
Date: Thursday, 30-Jun-95 17:51:47 GMT
MIME-version: 1.0
Last-modified: Thursday, 29-Jun-95 17:51:47 GMT
Protocol: {PICS-1.0 {headers PICS-Label}}
PICS-Label: …label here…
Content-type: text/html

…contents of foo.html…

Figure 6: Method 2: requesting a document and associated label from an HTTP server.
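The exchange in Figure 6 can be mimicked without any network access. In this Python sketch, label_request_header produces the extra request line for a single service, and pics_label_from_response reads the PICS-Label header out of a response; both function names are hypothetical, and the sample response substitutes the label from Figure 5 as placeholder text.

```python
from email.parser import Parser
from typing import Optional


def label_request_header(service_url: str) -> str:
    """Build the extra request header shown in Figure 6 for one service."""
    return f'Accept-Protocol: {{PICS-1.0 {{params full {{services "{service_url}"}}}}}}'


def pics_label_from_response(raw_response: str) -> Optional[str]:
    """Return the PICS-Label header value, or None if the server sent none."""
    _status_line, _, rest = raw_response.partition("\n")
    headers = Parser().parsestr(rest, headersonly=True)
    return headers.get("PICS-Label")


SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\n"
    "MIME-version: 1.0\n"
    "Protocol: {PICS-1.0 {headers PICS-Label}}\n"
    'PICS-Label: (PICS-1.1 "http://old.rsac.org/v1.0/" labels '
    'for "http://www.gcf.org/stuff.html" ratings (l 3 s 2 v 0))\n'
    "Content-type: text/html\n"
    "\n"
    "<html>...</html>\n"
)

request_header = label_request_header("http://www.gcf.org/1.0/")
label = pics_label_from_response(SAMPLE_RESPONSE)
```

A server that does not understand the request simply omits the PICS-Label header, in which case the function returns None and the client must obtain labels some other way.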

The third way to distribute labels is through a label bureau that dispenses only labels. A bureau can distribute labels created by one or more services. This separation of labels from content allows third-party labeling even when the publishers do not wish to distribute the labels: for example, the Simon Wiesenthal Center can label hate speech without the cooperation of neo-Nazi groups.

A label bureau is implemented as an HTTP server that accepts URL query strings in a special format. Suppose a label bureau is available at http://www.labels.org/Ratings. A client interested in a label for the document http://www.questionable.org/images would send the request shown in Figure 7 to the server at www.labels.org.

GET /Ratings?opt=generic&
    u="http%3A%2F%2Fwww.questionable.org%2Fimages"&
    s="http%3A%2F%2Fwww.gcf.org%2Fv2.5"
    HTTP/1.0

Figure 7: Method 3: requesting a label from a label bureau, separately from the document the label refers to. Note that inside a URL query string it is necessary to encode : as %3A and / as %2F.
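The encoding step is easy to get wrong by hand, so a client would normally delegate it to a library. The sketch below uses quote from Python's standard urllib.parse module to rebuild the request path of Figure 7 (without the line breaks); the function name and its parameters are ours, and only the opt, u, and s fields shown in the figure are covered.

```python
from urllib.parse import quote


def bureau_request_path(bureau_path: str,
                        document_url: str,
                        service_url: str,
                        option: str = "generic") -> str:
    """Build the path and query string for a label bureau request."""
    # safe="" forces ":" and "/" to be percent-encoded as well.
    encoded_document = quote(document_url, safe="")
    encoded_service = quote(service_url, safe="")
    return (f'{bureau_path}?opt={option}'
            f'&u="{encoded_document}"'
            f'&s="{encoded_service}"')


path = bureau_request_path("/Ratings",
                           "http://www.questionable.org/images",
                           "http://www.gcf.org/v2.5")
```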

Notes

1. This article has been accepted for publication in Communications of the ACM. Copyright © 1996 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org.

2. PICS is an effort of the World Wide Web Consortium at MIT's Laboratory for Computer Science, drawing on the resources of a broad cross-section of the industry. Project history, a long list of supporting organizations, and details of the specifications may be found at http://w3.org/PICS.

3. Thanks to Netscape for providing Figures 1 and 2.

4. Figure 3 was generated in November of 1995, using prototype software written by MIT student Jason Thomas. Source code for this prototype and other reference software are available from the PICS home page. Since then, RSAC has created a new rating system for the Internet (RSACi), which treats nudity and sex as separate dimensions. Several companies have announced products that, like this prototype, can read any PICS service description and generate a user interface with features similar to those shown in Figure 3.