© Copyright 1996 by ACM, Inc. (note 1) Appeared in Communications of the ACM, 1996, vol. 39(10), pp. 87-93.
Paul Resnick
AT&T Research
600 Mountain Avenue
Murray Hill, NJ 07974
presnick@research.att.com
James Miller
World Wide Web Consortium
MIT Laboratory for Computer Science
Room NE43-355
545 Technology Square
Cambridge, MA 02139
jmiller@mit.edu
With its recent explosive growth, the Internet now faces a problem inherent in all media that serve diverse audiences: not all materials are appropriate for every audience. Societies have tailored their responses to the characteristics of the media [1, 3]: in most countries, there are more restrictions on broadcasting than on the distribution of printed materials. Any rules about distribution, however, will be too restrictive from some perspectives, yet not restrictive enough from others. We can do better: we can meet diverse needs by controlling reception rather than distribution. In the TV industry, this realization has led to the V-chip, a system for blocking reception based on labels embedded in the broadcast stream.
On the Internet, we can do still better, with richer labels that reflect diverse viewpoints, and more flexible selection criteria. PICS (note 2), the Platform for Internet Content Selection, establishes Internet conventions for label formats and distribution methods, while dictating neither a labeling vocabulary nor who should pay attention to which labels. It is analogous to specifying where on a package a label should appear, and in what font it should be printed, without specifying what it should say.
The PICS conventions have caught on quickly. In early 1996, Microsoft, Netscape, SurfWatch, CyberPatrol, and other software vendors announced PICS-compatible products. AOL, AT&T WorldNet, CompuServe, and Prodigy provide free blocking software that will be PICS-compliant by the end of 1996. RSACi and SafeSurf are offering their particular labeling vocabularies through on-line servers that produce PICS-formatted labels. In May of 1996, CompuServe announced that it will label all web content it produces using PICS-formatted RSACi labels.
Not everyone needs to block reception of the same materials. Parents may not wish to expose their children to sexual or violent images. Businesses may want to prevent their employees from visiting recreational sites during hours of peak network usage. Governments may want to restrict reception of materials that are legal in other countries but not in their own. The "off" button (or disconnecting from the entire Net) is too crude: there should be some way to block only the inappropriate material. Appropriateness, however, is neither an objective nor a universal measure. It depends on at least three factors:
Computer software can implement access controls that take into account all these factors. The basic idea, illustrated in Figure 1, is to interpose selection software between the recipient and the on-line documents. The software checks labels to determine whether to permit access to particular materials. It may permit access for some users but not others, or at some times but not others.
Figure 1: Selection software automatically blocks access to some documents, but not others. Acknowledgment (note 3)
Prior to PICS there was no standard format for labels, so companies that wished to provide access control had to both develop the software and provide the labels. PICS provides a common format for labels, so that any PICS-compliant selection software can process any PICS-compliant label. A single site or document may have many labels, provided by different organizations. Consumers choose their selection software and their label sources (called rating services) independently, as illustrated in Figure 2. This separation allows both markets to flourish: companies that prefer to remain value-neutral can offer selection software without providing any labels; values-oriented organizations, without writing software, can create rating services that provide labels.
Figure 2: Selection software blocks based on labels provided by publishers and third-party labeling services, and on selection criteria set by the parent.
PICS labels describe content on one or more dimensions. It is the selection software, not the labels themselves, that determines whether access will be permitted or prohibited. For example, if a rating service used the MPAA's movie-rating vocabulary, selection software might be configured to block an eight-year-old's access to PG-labeled documents, but to allow a fifteen-year-old's access to them. Parents can prohibit access to unlabeled documents, confining children to a zone known to be acceptable, or can allow access to any document that is not explicitly prohibited.
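As a minimal sketch of this division of labor (in Python, with hypothetical names throughout), the function below treats a label as a plain mapping from category to value, and the supervisor's configuration as a mapping from category to the highest permitted value. The integer encoding of the MPAA scale (G = 0 through NC-17 = 4) follows the sample service description in the sidebar; leaving categories without a configured ceiling unrestricted is an assumption made for the example, not something PICS prescribes.

```python
from typing import Dict, Optional


def permits(ratings: Optional[Dict[str, int]],
            ceilings: Dict[str, int],
            allow_unlabeled: bool) -> bool:
    """Decide whether selection software should allow access (illustrative only)."""
    if ratings is None:
        # No label was found: fall back to the supervisor's default policy.
        return allow_unlabeled
    # Block when any rated category exceeds its configured ceiling.
    # Categories without a ceiling are left unrestricted (an assumption).
    return all(value <= ceilings[category]
               for category, value in ratings.items()
               if category in ceilings)


pg_label = {"r": 1}            # a document labeled PG
eight_year_old = {"r": 0}      # rules allowing G only
fifteen_year_old = {"r": 2}    # rules allowing up to PG-13

print(permits(pg_label, eight_year_old, allow_unlabeled=False))
print(permits(pg_label, fifteen_year_old, allow_unlabeled=False))
print(permits(None, eight_year_old, allow_unlabeled=False))
```

The same label produces different outcomes under the two rule sets, and the last call illustrates the policy of confining a child to labeled documents only.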
Each rating service can choose its own labeling vocabulary. For example, Yahoo labels might include a "coolness" dimension and a subject classification dimension.
Information publishers can self-label, just as manufacturers of children's toys currently label products with text such as, "Fun for ages 5 and up." Provided that publishers agree on a common labeling vocabulary, self-labeling is a simple mechanism well-matched to the distributed nature and high volume of information creation on the Internet.
When publishers are unwilling to participate, or can't be trusted to participate honestly, independent organizations can provide third-party labels. For example, the Simon Wiesenthal Center, which is concerned about Nazi propaganda and other hate speech, could label materials that are historically inaccurate or promote hate. Third-party labeling systems can also express features that are of concern to a limited audience. For example, a teacher might label a set of astronomical photographs and block access to everything else for the duration of a science lesson.
There are two PICS specification documents [6, 8]. The most important components are:
The accompanying sidebar illustrates these formats and protocols. Four technical features are worth highlighting.
First, the machine-readable service description is a resource that other computer programs can use for automatically generating interfaces that present the service to users. Consider the prototype shown in Figure 3, for configuring selection software. Here the parent is setting rules for what Johnny can visit, based on a rating service which has separate dimensions for language, nudity/sex, and violence. The parent drags the slider to indicate the maximum permitted value on the violence scale, noting the height of the thermometer and the text description (e.g., "Strong, vulgar language") associated with each level on the scale. The software has taken the thermometer icons and text directly from the service description.
Figure 3: Prototype software (note 4) draws on text and icons in the service description to automatically generate a user interface for configuring selection rules.
Second, a rating service can provide variants of its service description tailored to different languages and cultures. The core elements remain the same, but the text and icons can be different. As a result, the service need not provide multiple versions of labels. The labels rely only on the common, core elements, but, using the variants of the service description, a single label can be displayed to different users in different languages.
Third, we have used URLs wherever universally distinct identifiers are required. For example, the identifier for a rating service is a URL. This has two advantages. First, a URL is a self-describing identifier, because it can be used to retrieve a descriptive document. Second, it leverages the Internet domain name registration system to permit decentralized choice of the identifier, while still guaranteeing distinctness from identifiers chosen by others.
Fourth, we specify that a response to a request for multiple labels must preserve the order of the request. If the server knows several alternative URLs that identify a single document, and the client asks for a label for one of those URLs, the server can send back a label for one of the alternates. The client can still match the label with its original request, from its position in the response, even though the document URLs do not match.
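The positional matching described here can be sketched in a few lines of Python. The representation of a label as a dictionary with a "for" field, and the example URLs, are hypothetical; the only property taken from the text is that the response lists labels in the same order as the request.

```python
def match_labels(requested_urls, response_labels):
    """Pair each requested URL with the label at the same position in the response."""
    if len(requested_urls) != len(response_labels):
        raise ValueError("expected one response entry per requested URL")
    return dict(zip(requested_urls, response_labels))


requested = ["http://a.example/doc", "http://b.example/page"]
response = [
    # The server answered with a label for an alternate URL of the first document.
    {"for": "http://mirror.example/doc", "ratings": {"v": 0}},
    {"for": "http://b.example/page", "ratings": {"v": 2}},
]

matched = match_labels(requested, response)
print(matched["http://a.example/doc"]["ratings"])
```

Because the client matches on position rather than on the URL inside the label, the first request is still paired with its label even though the URLs differ.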
In general, PICS specifies only those technical issues that affect interoperability. It does not specify how selection software or rating services work, just how they work together.
PICS-compatible software can implement selective blocking in various ways. One possibility is to build it into the browser on each computer, as announced by Microsoft and Netscape. A second method, used in products such as CyberPatrol and SurfWatch, is to perform this operation as part of each computer's network protocol stack. A third possibility is to perform the operation somewhere in the network, for example at a proxy server used in combination with a firewall. Each alternative affects efficiency, ease of use, and security. For example, a browser could include nice interface features such as graying out blocked links, but it would be fairly easy for a child to install a different browser and bypass the selective blocking. The network implementation may be the most secure, but could create a performance bottleneck if not implemented carefully.
PICS does not specify how parents or other supervisors set configuration rules. One possibility is to provide a configuration tool like that shown in Figure 3. Even that amount of configuration may be too complex, however. Another possibility is for organizations and on-line services to provide preconfigured sets of selection rules. For example, an on-line service might team up with UNICEF to offer "Internet for kids" and "Internet for teens" packages, containing not only preconfigured selection rules, but also a default home page provided by UNICEF.
Labels can be retrieved in various ways. Some clients might choose to request labels each time a user tries to access a document. Others might cache frequently requested labels or download a large set from a label bureau and keep a local database, to minimize delays while labels are retrieved.
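One way a client might combine these strategies is sketched below in Python. The class name, the fetch callback, and the dictionary form of a label are all hypothetical, and the sketch deliberately ignores label expiry dates, which a real client would need to honor.

```python
class LabelCache:
    """Keep labels locally and fetch from a label source only on a miss (a sketch)."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable taking a URL and returning a label
        self._labels = {}

    def preload(self, labels):
        """Load a bulk download, e.g. a set of labels obtained from a label bureau."""
        self._labels.update(labels)

    def get(self, url):
        if url not in self._labels:
            self._labels[url] = self._fetch(url)
        return self._labels[url]


calls = []

def fake_fetch(url):
    # Stand-in for a network request; records which URLs were actually fetched.
    calls.append(url)
    return {"for": url, "ratings": {"v": 0}}

cache = LabelCache(fake_fetch)
cache.preload({"http://example.org/a": {"for": "http://example.org/a", "ratings": {"v": 1}}})
cache.get("http://example.org/a")   # served from the preloaded set
cache.get("http://example.org/b")   # fetched once
cache.get("http://example.org/b")   # served from the cache
print(calls)
```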
PICS specifies very little about how to run a labeling service, beyond the format of the service description and the labels. Rating services must make the following choices:
New infrastructures are often used in unplanned ways, to meet latent needs. There will be many labeling vocabularies that are unrelated to access controls. The PICS specifications also plan for unplanned uses, by including extension mechanisms for adding new functionality. PICS is a new resource available to anyone who wishes to associate data with documents on the Internet, even documents that others control. Some of the promising applications include:
PICS provides a labeling infrastructure for the Internet. It is values-neutral: it can accommodate any set of labeling dimensions, and any criteria for assigning labels. Any PICS-compatible software can interpret labels from any source, because each source provides a machine-readable description of its labeling dimensions.
Around the world, governments are considering restrictions on on-line content. Since children differ, contexts of use differ, and values differ, blanket restrictions on distribution can never meet everyone's needs. Selection software can meet diverse needs, by blocking reception, and labels are the raw materials for implementing context-specific selection criteria. The availability of large quantities of labels will also lead to new sorting, searching, filtering, and organizing tools that help users surf the Internet more efficiently.
[1] J. Berman and D. Weitzner, "User Control: Renewing the Democratic Heart of the First Amendment in the Age of Interactive Media," Yale Law Journal, vol. 104, pp. 1619, 1995.
[2] D. Crocker, "RFC-822: Standard for the Format of ARPA Internet Text Messages," http://ds.internic.net/rfc/rfc822.txt, August 1982.
[3] I. de Sola Pool, Technologies of Freedom. Cambridge: MIT Press, 1983.
[4] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry, "Using Collaborative Filtering to Weave an Information Tapestry," Communications of the ACM, vol. 35, pp. 61-70, 1992.
[5] W. Hill, L. Stead, and M. Rosenstein, "Recommending and Evaluating Choices in a Virtual Community of Use," Proceedings of CHI 95 Conference on Human Factors in Computing Systems, Denver: ACM. 194-201.
[6] T. Krauskopf, J. Miller, P. Resnick, and G. W. Treese, "Label Syntax and Communication Protocols," World Wide Web Consortium http://w3.org/PICS/labels.html, May 5 1996.
[7] D. Maltz and K. Ehrlich, "Pointing the Way: Active Collaborative Filtering," Proceedings of CHI 95 Conference on Human Factors in Computing Systems, Denver: ACM. 202-209.
[8] J. Miller, P. Resnick, and D. Singer, "Rating Services and Rating Systems (and Their Machine Readable Descriptions)," World Wide Web Consortium http://w3.org/PICS/services.html, May 5 1996.
[9] A. M. Odlyzko, "Tragic Loss or Good Riddance? The Impending Demise of Traditional Scholarly Journals," International Journal of Human-Computer Studies, vol. 42, pp. 71-122, 1995.
[10] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, "GroupLens: An Open Architecture for Collaborative Filtering of Netnews," Proceedings of CSCW 94 Conference on Computer Supported Cooperative Work, New York: ACM. 175-186.
[11] M. Roscheisen, C. Mogensen, and T. Winograd, "A Platform for Third-Party Value-Added Information Providers: Architecture, Protocols, and Usage Examples," Stanford University CSDTR/DLTR (http://www-diglib.stanford.edu/diglib/pub/reports/commentor.html), November 1994, updated April 1995.
[12] U. Shardanand and P. Maes, "Social Information Filtering: Algorithms for Automating 'Word of Mouth'," Proceedings of CHI 95 Conference on Human Factors in Computing Systems, Denver: ACM. 210-217.
Figure 4 shows the description of a sample rating service, based on the MPAA's movie-rating scheme. The initial section includes general information about the service. The second section describes each of the dimensions, or categories, and the scales used for each. In this case, there is just a single category, with five possible values: G through NC-17. In actual labels, these values would be represented by the integers 0-4; the service description allows a software program to determine that a value of 1 corresponds to the PG rating and even to display the PG.gif icon to a user.
((PICS-version 1.1)
 (rating-system "http://MPAAscale.org/Ratings/Description/")
 (rating-service "http://MPAAscale.org/v1.0")
 (icon "icons/MPAAscale.gif")
 (name "The MPAA's Movie-rating Service")
 (description "A rating service based on the MPAA's movie-rating scale")
 (category
  (transmit-as "r")
  (name "Rating")
  (label (name "G") (value 0) (icon "icons/G.gif"))
  (label (name "PG") (value 1) (icon "icons/PG.gif"))
  (label (name "PG-13") (value 2) (icon "icons/PG-13.gif"))
  (label (name "R") (value 3) (icon "icons/R.gif"))
  (label (name "NC-17") (value 4) (icon "icons/NC-17.gif"))))
Figure 4: A PICS-compatible description of a service that is based on the MPAA movie rating scheme.
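To illustrate how a program might use such a description, the Python sketch below reads a shortened copy of the Figure 4 text into nested lists and looks up the name for a transmitted value. It is not a complete PICS parser: the tokenizer is an assumption that handles only parentheses, double-quoted strings without escapes, and bare words, and the function names are hypothetical.

```python
import re

# Tokens: "(", ")", a double-quoted string, or a bare word such as PICS-version.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse(text):
    """Parse one parenthesized expression into nested Python lists of strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1                      # consume the closing ")"
            return items
        return tok[1:-1] if tok.startswith('"') else tok

    return read()


def value_names(description, transmit_as):
    """Map integer values to names for the category with this transmit-as code."""
    for item in description:
        if isinstance(item, list) and item[0] == "category":
            fields = [f for f in item[1:] if isinstance(f, list)]
            if ["transmit-as", transmit_as] in fields:
                labels = [dict(f[1:]) for f in fields if f[0] == "label"]
                return {int(label["value"]): label["name"] for label in labels}
    return {}


DESCRIPTION = """
((PICS-version 1.1)
 (rating-service "http://MPAAscale.org/v1.0")
 (name "The MPAA's Movie-rating Service")
 (category
  (transmit-as "r")
  (name "Rating")
  (label (name "G") (value 0) (icon "icons/G.gif"))
  (label (name "PG") (value 1) (icon "icons/PG.gif"))
  (label (name "PG-13") (value 2) (icon "icons/PG-13.gif"))))
"""

names = value_names(parse(DESCRIPTION), "r")
print(names[1])
```

Given a label that carries the value 1 in category "r", the lookup recovers the human-readable name PG; the same approach could return the icon path for display.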
Figure 5 shows a sample PICS label (actually a label list containing just one label). The URL on the first line, which identifies the labeling service, makes it possible to redistribute labels yet still identify their original sources. The label can also include information about itself, such as the date on which it was created, the date it will expire, the document with which it is associated (in this case, "http://www.gcf.org/stuff.html"), and the label's author. The last line shows the attributes that describe the document: a "language" value of 3; "sex" 2; and "violence" 0.
(PICS-1.1 "http://old.rsac.org/v1.0/" labels
on "1994.11.05T08:15-0500"
until "1995.12.31T23:59-0000"
for "http://www.gcf.org/stuff.html"
by "John Doe"
ratings (l 3 s 2 v 0))
Figure 5: A sample label list from the service described in Figure 4.
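As a rough sketch of how selection software might pull the essentials out of such a label, the Python below extracts the service URL and the ratings from the Figure 5 text with regular expressions. The function names are hypothetical, and the sketch assumes a single label with integer values and a flat ratings clause; it does not attempt to cover the full label syntax.

```python
import re

LABEL = '''(PICS-1.1 "http://old.rsac.org/v1.0/" labels
 on "1994.11.05T08:15-0500"
 until "1995.12.31T23:59-0000"
 for "http://www.gcf.org/stuff.html"
 by "John Doe"
 ratings (l 3 s 2 v 0))'''


def service_of(label_text):
    """Return the rating service URL that follows the PICS version token."""
    match = re.match(r'\(\s*PICS-[\d.]+\s+"([^"]*)"', label_text)
    if match is None:
        raise ValueError("no service URL found")
    return match.group(1)


def ratings_of(label_text):
    """Return the ratings clause as a dict, e.g. 'l 3 s 2' becomes {'l': 3, 's': 2}."""
    match = re.search(r'ratings\s*\(([^()]*)\)', label_text)
    if match is None:
        raise ValueError("no ratings clause found")
    tokens = match.group(1).split()
    # Tokens alternate between a category code and its value.
    return {name: int(value) for name, value in zip(tokens[0::2], tokens[1::2])}


print(service_of(LABEL))
print(ratings_of(LABEL))
```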
Anything that can be named by a URL can be labeled, including documents that are accessed via FTP, gopher, or Netnews, as well as HTTP. PICS proposes a URL naming system for IRC, so that chat rooms with stable topics can be labeled.
Labels can include two optional security features (not shown in the example). The first is a cryptographic hash of the labeled document, in the form of an MD5 message digest. This enables software to detect whether changes have been made to the document after the label was created. The second is a digital signature on the contents of the label itself, which allows software to verify that a label really was created by the service mentioned in it and that the label has not been altered.
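The first of these checks can be sketched with Python's standard hashlib module. The hexadecimal encoding of the digest and the function names are assumptions made for illustration; the sketch does not reproduce how the PICS specification encodes the digest inside a label, and it does not cover the signature check.

```python
import hashlib


def md5_digest(document: bytes) -> str:
    """Return the MD5 message digest of the document as a hexadecimal string."""
    return hashlib.md5(document).hexdigest()


def unchanged_since_labeling(document: bytes, digest_in_label: str) -> bool:
    """True when the document still hashes to the digest recorded in its label."""
    return md5_digest(document) == digest_in_label


original = b"<html><body>Astronomy photographs</body></html>"
recorded = md5_digest(original)          # what the labeler would store in the label

print(unchanged_since_labeling(original, recorded))
print(unchanged_since_labeling(original + b"<!-- edited -->", recorded))
```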
PICS specifies three ways to distribute labels. The first is to embed labels in HTML documents, using the META element in the document header. The general format is <META http-equiv="PICS-Label" content='labellist'>. Other document formats could be similarly extended.
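A client could recover an embedded label list along the following lines. This Python sketch uses the standard html.parser module; the class name and the sample page are hypothetical, and the embedded label is a shortened version of the sample label in Figure 5.

```python
from html.parser import HTMLParser


class PicsLabelFinder(HTMLParser):
    """Collect the content of every META element whose http-equiv is PICS-Label."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this method.
        attributes = dict(attrs)
        equiv = (attributes.get("http-equiv") or "").lower()
        if tag == "meta" and equiv == "pics-label":
            self.labels.append(attributes.get("content"))


PAGE = """<html><head>
<META http-equiv="PICS-Label" content='(PICS-1.1 "http://old.rsac.org/v1.0/" labels for "http://www.gcf.org/stuff.html" ratings (l 3 s 2 v 0))'>
<title>Example</title></head><body>Hello</body></html>"""

finder = PicsLabelFinder()
finder.feed(PAGE)
print(finder.labels)
```

The single quotes around the content attribute let the label list keep its own double-quoted strings without escaping.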
The second distribution method is for a client to ask an HTTP server to send labels along with the documents it requests. Figure 6 shows a sample interaction: the HTTP GET request includes an extra header line asking for labels and saying which service's labels should be sent back. The server includes two extra header lines in the response, one of which contains the labels.
Client sends to HTTP server www.greatdocs.com:
GET foo.html HTTP/1.1
Accept-Protocol: {PICS-1.0 {params full {services "http://www.gcf.org/1.0/"}}}
Server responds to client:
HTTP/1.1 200 OK
Date: Thursday, 30-Jun-95 17:51:47 GMT
MIME-version: 1.0
Last-modified: Thursday, 29-Jun-95 17:51:47 GMT
Protocol: {PICS-1.0 {headers PICS-Label}}
PICS-Label: label here
Content-type: text/html

contents of foo.html
Figure 6: Method 2: requesting a document and associated label from an HTTP server.
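The exchange in Figure 6 can be approximated in Python as plain string handling, without opening a network connection. The function names are hypothetical. Two details go beyond the figure and are assumptions: the request path is written with a leading slash, and a Host header is added because HTTP/1.1 requests normally carry one.

```python
def label_request(path, host, service_url):
    """Build a GET request that asks the server to send labels from one service."""
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,   # not shown in Figure 6; added as an assumption
        'Accept-Protocol: {PICS-1.0 {params full {services "%s"}}}' % service_url,
        "",
        "",
    ]
    return "\r\n".join(lines)


def pics_label_header(response_text):
    """Return the value of the PICS-Label response header, or None if absent."""
    head = response_text.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:          # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "pics-label":
            return value.strip()
    return None


RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Protocol: {PICS-1.0 {headers PICS-Label}}\r\n"
    "PICS-Label: label here\r\n"
    "Content-type: text/html\r\n"
    "\r\n"
    "contents of foo.html"
)

request = label_request("/foo.html", "www.greatdocs.com", "http://www.gcf.org/1.0/")
print(request)
print(pics_label_header(RESPONSE))
```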
The third way to distribute labels is through a label bureau that dispenses only labels. A bureau can distribute labels created by one or more services. This separation of labels from content allows third-party labeling even when the publishers do not wish to distribute the labels: for example, the Simon Wiesenthal Center can label hate speech without the cooperation of neo-Nazi groups.
A label bureau is implemented as an HTTP server that accepts URL query strings in a special format. Suppose a label bureau is available at http://www.labels.org/Ratings. A client interested in a label for the document http://www.questionable.org/images would send the request shown in Figure 7 to the server at www.labels.org.
GET /Ratings?opt=generic&u="http%3A%2F%2Fwww.questionable.org%2Fimages"&s="http%3A%2F%2Fwww.gcf.org%2Fv2.5" HTTP/1.0
Figure 7: Method 3: requesting a label from a label bureau, separately from the document the label refers to. Note that inside a URL query string it is necessary to encode : as %3A and / as %2F.
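The encoding step noted in the caption can be sketched with Python's standard urllib.parse.quote. The function name and its parameters are hypothetical; the query fields (opt, u, s) and the quoted, percent-encoded values are taken from Figure 7, read without the spaces that wrap the request across lines.

```python
from urllib.parse import quote


def bureau_query(bureau_path, document_url, service_url, option="generic"):
    """Build the request target for asking a label bureau about one document."""
    def encoded(url):
        # safe="" forces ":" and "/" to be percent-encoded as %3A and %2F.
        return '"' + quote(url, safe="") + '"'

    return "%s?opt=%s&u=%s&s=%s" % (
        bureau_path, option, encoded(document_url), encoded(service_url))


target = bureau_query(
    "/Ratings",
    "http://www.questionable.org/images",
    "http://www.gcf.org/v2.5",
)
print("GET %s HTTP/1.0" % target)
```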
2. PICS is an effort of the World Wide Web Consortium at MIT's Laboratory for Computer Science, drawing on the resources of a broad cross-section of the industry. Project history, a long list of supporting organizations, and details of the specifications may be found at http://w3.org/PICS.
3. Thanks to Netscape for providing Figures 1 and 2.
4. Figure 3 was generated in November of 1995, using prototype software written by MIT student Jason Thomas. Source code for this prototype and other reference software is available from the PICS home page. Since then, RSAC has created a new rating system for the Internet (RSACi), which treats nudity and sex as separate dimensions. Several companies have announced products that, like this prototype, can read any PICS service description and generate a user interface with features similar to those shown in Figure 3.