Henrik Frystyk Nielsen
World-Wide Web Consortium, MIT/LCS,
@(#) $Id: Position.html,v 1.2 1998/05/14 02:10:08 frystyk Exp $

W3C Reference Library

Abstract

This position paper discusses the API of the W3C Reference Library, a.k.a. "libwww". It introduces some of the basic concepts like streams, call-out functions, and plug-in modules. Libwww is freely available from the World-Wide Web Consortium's Web site together with documentation and example applications.

Introduction

Most Web applications, regardless of functionality, share some commonalities such as protocol modules, transport interfaces, and various other low-level Internet-related features. While many application programmers get around this by "reinventing the wheel" every time a new application is written, there is an obvious need for a basic Web API. Libwww was designed to provide such an API. In this paper we discuss some of the experience gained in doing so and what can be improved.

Libwww has been part of the World-Wide Web almost from the beginning. However, as the design criteria for Web applications in general have changed dramatically, the basic design of libwww has undergone several major revisions. The design ideas presented in this document are based on version 4.1, the most recent version, which is to be released in June 1996. The current libwww API was designed with the following goals for a generic Web API in mind:

Light-weight: The API should be a platform-independent, medium-level API with support for an open-ended feature set rather than directly providing a full-fledged feature set by itself.
Application independent: The API should impose no restrictions on the type of application using it. It should be usable by all types of applications such as servers, clients, robots, and proxies. To borrow a term from the X world: it should provide a set of mechanisms for accessing the Web without imposing a special policy on how to do it.
Layered: The API should allow for easy integration with other APIs both below and above libwww itself in complexity and level of abstraction.

The libwww Core

The libwww API is a small, light-weight API based on a central registry called the core. The core provides a framework in which applications register an open-ended set of modules that provide the functionality and profile desired by that application. The core is itself divided into three layers, each described by its own object:

Request Object: The Request object represents a request issued by the application. All requests are associated with a URL representing the resource on which an operation is to be performed. In most cases, a libwww request results in some kind of network activity handled by a protocol module. All protocol modules are application plug-ins, so there is no limit to the type of request libwww can perform. Each Request object has an input stream and an output stream associated with it, which accept data from and return data to the application respectively.
Net Object: The Net object represents a connection, for example to the Internet or the local file system. Depending on the URL and the method specified by the request, there can be multiple Net objects per request. As connections are directly associated with system resources, the number of active Net objects is limited by the core to a maximum number specified by the application. Also, in handling Net objects, a considerable effort is made to maximize use of persistent connections in that multiple requests to the same remote host are serialized where possible.
Channel Object: Each open socket and file descriptor is associated with a Channel object. The Channel object is associated with an input stream and an output stream that are capable of reading and writing data to a transport respectively. The channel streams are connected to the Request object streams either directly or via stream chains as described later.

By itself, the core is not capable of performing any Web-related tasks; these are all provided through plug-ins and call-out functions registered by the application. This model enables libwww to be application neutral in that the application feature set, or profile, is provided by the application and not by libwww. The concepts of streams, plug-ins, and call-out functions are described in the following sections.
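The limit on active Net objects described above can be illustrated with a small counter-based pool. This is only a sketch under stated assumptions: `NetPool`, `netpool_acquire`, and `netpool_release` are hypothetical names chosen for this example and are not part of the libwww API, and the real core tracks far more than two counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the libwww API. */
typedef struct {
    int max_active;   /* ceiling chosen by the application */
    int active;       /* Net objects currently holding a connection */
    int pending;      /* requests waiting for a free slot */
} NetPool;

void netpool_init(NetPool *pool, int max_active) {
    assert(pool != NULL && max_active > 0);
    pool->max_active = max_active;
    pool->active = 0;
    pool->pending = 0;
}

/* Ask for a connection. Returns 1 if it may start now, 0 if it was queued. */
int netpool_acquire(NetPool *pool) {
    if (pool->active < pool->max_active) {
        pool->active++;
        return 1;
    }
    pool->pending++;
    return 0;
}

/* Give a connection back. Returns 1 if a queued request took over the slot. */
int netpool_release(NetPool *pool) {
    assert(pool->active > 0);
    if (pool->pending > 0) {
        pool->pending--;   /* the slot passes straight to a waiting request */
        return 1;
    }
    pool->active--;
    return 0;
}
```

With a ceiling of two, a third `netpool_acquire` call is queued rather than started, and the next `netpool_release` hands the freed slot to it.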

Stream Objects

All data flow between the application layer and the transport layer is handled using streams. Streams are objects that accept sequences of characters. Streams do not require an output, but in most cases they send data along after having performed a certain operation on it. Examples include inserting MIME headers or stripping out an HTTP response line. When the output is itself a stream, stream objects can be cascaded into stream chains. As mentioned, Channel objects and Request objects both have two stream chains associated with them. The connection between the request streams and the channel streams is made using stream chains, which can be set up at run-time using Converters.
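The cascading of streams into a chain can be sketched in a few lines of C. The `Stream` structure and the functions below are hypothetical and deliberately minimal; the stream class in libwww itself has more methods than the single `put_block` assumed here. The example chains a filter that upper-cases its input to a sink that collects the result in a buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types; libwww's real stream class is richer than this. */
typedef struct Stream Stream;
struct Stream {
    /* Accept a block of characters; return 0 on success. */
    int (*put_block)(Stream *self, const char *data, size_t len);
    Stream *target;   /* next stream in the chain, or NULL for a sink */
    void *context;    /* per-stream state */
};

/* Sink: appends everything it receives to a fixed-size buffer. */
typedef struct {
    char data[256];
    size_t used;
} Buffer;

static int sink_put_block(Stream *self, const char *data, size_t len) {
    Buffer *buf = self->context;
    if (len > sizeof buf->data - 1 - buf->used)
        return -1;                        /* would overflow the buffer */
    memcpy(buf->data + buf->used, data, len);
    buf->used += len;
    buf->data[buf->used] = '\0';
    return 0;
}

/* Filter: upper-cases the data, then sends it along to its target. */
static int upper_put_block(Stream *self, const char *data, size_t len) {
    char chunk[64];
    while (len > 0) {
        size_t n = len < sizeof chunk ? len : sizeof chunk;
        for (size_t i = 0; i < n; i++)
            chunk[i] = (char)toupper((unsigned char)data[i]);
        if (self->target->put_block(self->target, chunk, n) != 0)
            return -1;
        data += n;
        len -= n;
    }
    return 0;
}

Stream make_sink(Buffer *buf) {
    buf->used = 0;
    buf->data[0] = '\0';
    Stream s = { sink_put_block, NULL, buf };
    return s;
}

Stream make_upper(Stream *target) {
    assert(target != NULL);
    Stream s = { upper_put_block, target, NULL };
    return s;
}
```

Writing "http " and then "response" into a stream made by `make_upper` leaves "HTTP RESPONSE" in the sink's buffer. Because every stream exposes the same interface, further filters can be spliced into the chain without the writer noticing.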

The Converter stream class is sub-classed from the generic stream class. Converters are filters which can change the current representation (or media type) of a data object. Examples of conversions are translating one image format into another, or "converting" an HTML document into a presentation of the document to the user in a widget. Because Converters are in fact streams, multiple Converters can be inserted into a single stream chain, for example the chain from the Request object to the Channel object.
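One way to picture how Converters are selected at run-time is a table keyed on input and output media type. The sketch below is an assumption made for illustration only: `converter_add`, `converter_find`, and `converter_chain` are hypothetical names, the media type strings are arbitrary, and a plain label stands in for the stream constructor a real converter would supply. The chain builder looks for a direct conversion first and otherwise tries a two-step path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; the real library registers converters differently. */
typedef struct {
    const char *from;   /* media type the converter accepts */
    const char *to;     /* media type it produces */
    const char *name;   /* label standing in for a stream constructor */
} Converter;

#define MAX_CONVERTERS 16
static Converter converters[MAX_CONVERTERS];
static size_t converter_count = 0;

/* Register a converter at run-time. Returns 0 on success, -1 if full. */
int converter_add(const char *from, const char *to, const char *name) {
    assert(from != NULL && to != NULL && name != NULL);
    if (converter_count == MAX_CONVERTERS)
        return -1;
    converters[converter_count++] = (Converter){ from, to, name };
    return 0;
}

/* Find a converter for a single step, or NULL if none is registered. */
const Converter *converter_find(const char *from, const char *to) {
    for (size_t i = 0; i < converter_count; i++)
        if (strcmp(converters[i].from, from) == 0 &&
            strcmp(converters[i].to, to) == 0)
            return &converters[i];
    return NULL;
}

/* Build a chain of at most two converters leading from `from` to `to`.
   Returns the number of entries written to `chain`, or 0 if none fits. */
size_t converter_chain(const char *from, const char *to,
                       const Converter *chain[2]) {
    const Converter *direct = converter_find(from, to);
    if (direct != NULL) {
        chain[0] = direct;
        return 1;
    }
    for (size_t i = 0; i < converter_count; i++) {
        if (strcmp(converters[i].from, from) != 0)
            continue;
        const Converter *second = converter_find(converters[i].to, to);
        if (second != NULL) {
            chain[0] = &converters[i];
            chain[1] = second;
            return 2;
        }
    }
    return 0;
}
```

In a fuller design each table entry would hold a function that creates the converter stream, so that the chain returned here could be turned directly into a cascade of streams.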

Input and output streams are responsible for reading and writing data to and from a transport, for example a BSD socket interface. By using a stream-based interface to the transport layer, it is very easy to add special transport mechanisms, for example a multiplexed transport protocol. It also gives a consistent interface for sending objects as well as reading objects, which is a prerequisite for building interactive Web applications.

Plug-in Modules

Plug-ins are modules that can be registered by the application at run-time. Plug-ins are an open-ended method for adding new functionality to the application. It is characteristic of the evolution of libwww that the set of features handled through plug-ins is constantly increasing. In version 4.1 of the Library, the categories of plug-ins include:

Client and server side protocol modules
Low-level protocol transport modules
User dependent modules
Data format handlers

One of the main advantages of using plug-ins is that the feature set can change dynamically as required by the application. This allows the traditional boundaries between application types such as "clients" and "servers" to be broken down. In fact, there is little difference between registering a server feature set, or profile, and registering a client profile, and an application can change profile from a client to a server at run-time.
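The run-time registration of protocol modules can be sketched as a table mapping URL schemes to handler functions. Everything named here (`protocol_add`, `protocol_delete_all`, `request_load`, `profile_client`, and the two stand-in modules) is hypothetical and chosen for this example; it is not the registration API of libwww. The handlers return fixed numbers in place of doing real network work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; a sketch of the idea, not libwww's registration API. */
typedef int (*ProtocolHandler)(const char *url);   /* returns a status code */

typedef struct {
    const char *scheme;
    ProtocolHandler handler;
} ProtocolModule;

#define MAX_MODULES 8
static ProtocolModule modules[MAX_MODULES];
static size_t module_count = 0;

/* Register a module for a scheme, replacing any module already present. */
int protocol_add(const char *scheme, ProtocolHandler handler) {
    assert(scheme != NULL && handler != NULL);
    for (size_t i = 0; i < module_count; i++) {
        if (strcmp(modules[i].scheme, scheme) == 0) {
            modules[i].handler = handler;
            return 0;
        }
    }
    if (module_count == MAX_MODULES)
        return -1;
    modules[module_count].scheme = scheme;
    modules[module_count].handler = handler;
    module_count++;
    return 0;
}

/* Drop the current profile so that another one can be registered. */
void protocol_delete_all(void) {
    module_count = 0;
}

/* Dispatch on the URL scheme; -1 means no module is registered for it. */
int request_load(const char *url) {
    const char *colon = strchr(url, ':');
    if (colon == NULL)
        return -1;
    size_t len = (size_t)(colon - url);
    for (size_t i = 0; i < module_count; i++)
        if (strlen(modules[i].scheme) == len &&
            strncmp(modules[i].scheme, url, len) == 0)
            return modules[i].handler(url);
    return -1;   /* the core alone cannot serve the request */
}

/* Stand-in modules that return a fixed status instead of doing real I/O. */
static int http_module(const char *url) { (void)url; return 200; }
static int ftp_module(const char *url)  { (void)url; return 226; }

/* A "profile" is simply a set of registrations applied together. */
void profile_client(void) {
    protocol_delete_all();
    protocol_add("http", http_module);
    protocol_add("ftp", ftp_module);
}
```

Before any profile is registered, `request_load` fails for every URL, which mirrors the point that the core performs no Web task on its own. Switching profile amounts to clearing the table and registering a different set of modules.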

Request Call-out Functions

Request call-out functions are another open-ended method for applications to add functionality to the core. An application registers a new feature simply by using a generic callback registration process. There are two main points where call-out functions are activated:

Before a request is handed to the protocol module
After the protocol module has terminated

At each of these points the list of registered call-out functions is traversed and each call-out function is called. The Library comes with a set of standard call-out functions that cover some often-used features like:

Cache validation
Rule file matching
Proxying requests
Logging
History List
...

Request call-out functions can be registered as being local to a specific request, or as being global to all requests. This mechanism allows existing applications to be extended with little or no modification. New features can be inserted by sub-modules by registering independent call-out routines to be handled by libwww. The latest example of how this mechanism can be used is the implementation of a PICS module, which is incorporated into any libwww client application by registering itself as a set of call-out functions. Other functions like signature handling can be handled the same way.
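The two activation points and the local and global registration scopes can be sketched as follows. All names are hypothetical, and two conventions are assumptions made for this example rather than documented libwww behavior: a call-out returning non-zero stops the request before the protocol module runs, and global "before" call-outs run ahead of local ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; the conventions below are assumptions of this sketch. */
typedef struct Request Request;
typedef int (*Callout)(Request *req);   /* 0 = continue, non-zero = stop */

#define MAX_CALLOUTS 8
typedef struct {
    Callout fn[MAX_CALLOUTS];
    size_t count;
} CalloutList;

struct Request {
    const char *url;
    int status;           /* set by the protocol module */
    int log_entries;      /* demo state written by log_callout */
    CalloutList before;   /* call-outs local to this request */
    CalloutList after;
};

/* Call-outs that apply to all requests. */
static CalloutList global_before, global_after;

void request_init(Request *req, const char *url) {
    memset(req, 0, sizeof *req);
    req->url = url;
}

int callout_add(CalloutList *list, Callout fn) {
    assert(list != NULL && fn != NULL);
    if (list->count == MAX_CALLOUTS)
        return -1;
    list->fn[list->count++] = fn;
    return 0;
}

/* Traverse a list in registration order, stopping at the first non-zero. */
static int callout_run(const CalloutList *list, Request *req) {
    for (size_t i = 0; i < list->count; i++) {
        int rc = list->fn[i](req);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Run the "before" call-outs, the protocol module, then the "after" ones. */
int request_execute(Request *req, int (*protocol)(Request *)) {
    if (callout_run(&global_before, req) != 0 ||
        callout_run(&req->before, req) != 0)
        return -1;                      /* stopped before reaching the protocol */
    req->status = protocol(req);
    callout_run(&req->after, req);
    callout_run(&global_after, req);
    return req->status;
}

/* Demo call-outs and a stand-in protocol module. */
int log_callout(Request *req)    { req->log_entries++; return 0; }
int reject_callout(Request *req) { (void)req; return 1; }
int demo_protocol(Request *req)  { (void)req; return 200; }
```

A logging feature registered once in `global_after` then applies to every request, while a rule registered in a single request's `before` list affects only that request. Neither requires changes to the code that issues requests.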

Threads and Pseudo Threads

Threads are in many situations a great advantage, but in general they can only be regarded as reliable if they are native to a specific platform or an integral part of the programming language used. As this is not the case in ANSI C, libwww has a model for handling pseudo-threads based on interleaved I/O. This requires that the I/O descriptor can be handled non-preemptively, which is the case for BSD sockets and WinSock socket descriptors. Like real threads, pseudo-threads impose certain programming techniques on the application programmer. For example, pseudo-threads are single-stack, single-process entities, and all state-dependent variables must be stored in a "thread" object. The result is that all streams and protocol modules must keep local state recording where they are. Real threads do not require non-preemptive I/O, and hence much of the state information can be kept as part of the thread environment.
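The requirement that each pseudo-thread keeps its own state in an object can be shown with a small state machine. This is a simulation under stated assumptions: the names are hypothetical, a string stands in for a socket, and a fixed chunk size stands in for whatever data a non-blocking read would return. A real event loop would wait on descriptors instead of polling every thread in turn.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; a simulation of interleaved I/O, not libwww code. */
typedef enum { ST_CONNECT, ST_READ, ST_DONE } State;

typedef struct {
    State state;          /* where this pseudo-thread left off */
    const char *source;   /* stands in for a non-blocking socket */
    size_t offset;        /* bytes consumed from the source so far */
    size_t received;      /* bytes delivered to the application */
} PseudoThread;

#define CHUNK 4   /* bytes available per "read" in this simulation */

void pt_init(PseudoThread *pt, const char *source) {
    assert(pt != NULL && source != NULL);
    pt->state = ST_CONNECT;
    pt->source = source;
    pt->offset = 0;
    pt->received = 0;
}

/* Do one slice of work and return immediately; never block. */
void pt_step(PseudoThread *pt) {
    switch (pt->state) {
    case ST_CONNECT:
        pt->state = ST_READ;             /* connection established */
        break;
    case ST_READ: {
        size_t total = strlen(pt->source);
        size_t left = total - pt->offset;
        size_t n = left < CHUNK ? left : CHUNK;
        pt->offset += n;
        pt->received += n;
        if (pt->offset == total)
            pt->state = ST_DONE;
        break;
    }
    case ST_DONE:
        break;
    }
}

/* Interleave all pseudo-threads on one stack until each is done.
   Returns the number of scheduling rounds in which work was performed. */
int pt_run_all(PseudoThread *threads, size_t count) {
    int rounds = 0;
    for (;;) {
        size_t busy = 0;
        for (size_t i = 0; i < count; i++) {
            if (threads[i].state != ST_DONE) {
                pt_step(&threads[i]);
                busy++;
            }
        }
        if (busy == 0)
            break;
        rounds++;
    }
    return rounds;
}
```

Nothing about a transfer lives on the C stack between calls to `pt_step`; the `state` and `offset` fields are what allow a transfer to resume where it stopped, which is the cost the text describes.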

Implementation Experience

Most of the libwww API has been designed through a large number of iterations based on trial and error. The library has been around for a considerable amount of time and represents a significant knowledge base for designing Web APIs. One drawback of this history is that, because libwww is based on ANSI C, it cannot take advantage of many of the features that are now available in modern programming languages. In order to prove that libwww provides a consistent API that can be used by multiple types of applications, a small set of example applications was developed representing the most typical Web applications.

Command Line Tool: An application which shows how to use libwww for building simple batch mode tools for accessing the Web. The tool supports HTTP, FTP, Gopher, NNTP, Telnet, and WAIS. The HTTP support is consistent with the HTTP/1.0 specification including the methods PUT, POST, and DELETE.
Mini Robot: A simple application which shows how to use libwww for building robots. The robot has no constraint model but uses pseudo-threads and interleaved I/O which allows for a large number of outstanding requests. The robot supports HTTP, NNTP, FTP and Gopher using either the GET or the HEAD method.
Mini Server: A small application showing how to implement a server or a proxy using libwww. The Mini Server also uses pseudo-threads and interleaved I/O, which makes it highly portable and very fast. The server supports only GET.

In addition to this set we have much experience from real applications like the Arena browser and other Web GUI based clients. The Library is freely available from the World-Wide Web Consortium software distribution together with all the example applications as well as Arena.

Lessons Learned

The lessons learned from developing the libwww API as it looks today can be summarized as follows:

APIs must be layered: No single API can provide the flexibility required to support different types of applications. Medium level APIs can provide cross application functionality, and high level APIs can provide support for specialized applications.
APIs must support a dynamic, open-ended set of features: No API, regardless of its complexity, should impose a limit on adding functionality. The experience from developing libwww shows that no feature can in fact be considered so essential that it should not be dynamically replaceable. The core registry mechanism in libwww is a step in that direction but still imposes a set of assumptions on what is considered "essential".
APIs must be thread safe: As native kernel threads become increasingly common and programming languages start supporting threads, more and more applications will take advantage of threads and the flexibility they provide. This means that APIs need not only to be thread aware but must actively support threads. Currently, libwww is thread aware via its pseudo-thread model, but it does not have full thread support.
Formalized APIs are required: In practice, most APIs depend on their immediate environment, such as the features provided by a specific programming language. Examples of features that have a major impact on API design are garbage collection, class inheritance, and character sets. Better tools for describing APIs in a language-independent, formalized fashion are required in order to supply truly language-independent, interoperable APIs.

Henrik Frystyk Nielsen, frystyk@w3.org