This position paper discusses the API of the W3C Reference Library, a.k.a. "libwww". It introduces some of the basic concepts like streams, call-out functions, and plug-in modules. Libwww is freely available from the World-Wide Web Consortium's Web site together with documentation and example applications.
Most Web applications, regardless of functionality, share some commonalities such as protocol modules, transport interfaces, and various other low-level Internet-related features. While many application programmers handle this by "reinventing the wheel" every time they write a new application, there is an obvious need for a basic Web API. Libwww was designed to provide such an API. In this paper we discuss some of the experiences we gained in doing so and what can be improved.
Libwww has been part of the World-Wide Web almost from the beginning. However, as the design criteria for Web applications in general have changed dramatically, the basic design of libwww has undergone several major revisions. The design ideas presented in this document are based on the most recent version, 4.1, which is to be released in June 1996. The current libwww API was designed with the following goals for a generic Web API in mind:
The libwww API is a small, lightweight API based on a central registry called the core. The core provides a framework for applications to register an open-ended set of modules that provide the functionality and profile desired by that application. The core is itself divided into three layers, each described by its own object:
By itself, the core cannot perform any Web-related tasks; all of them are provided through plug-ins and call-out functions registered by the application. This model keeps libwww application neutral: the application's feature set, or profile, is provided by the application and not by libwww. The next two sections describe the concepts of plug-ins and call-out functions.
All data flow between the application layer and the transport layer is handled using streams. Streams are objects that accept sequences of characters. A stream does not require an output, but in most cases it passes data along after performing some operation on it, for example inserting MIME headers or stripping out an HTTP response line. When the output is itself a stream, stream objects can be cascaded into stream chains. As mentioned, Channel objects and Request objects each have two stream chains associated with them. The request streams and the channel streams are connected by stream chains that can be set up at run-time using Converters.
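The stream model can be sketched in C. The sketch below assumes a minimal, hypothetical `Stream` structure with a single `put_block` method, a `target` pointer to the next stream, and a `state` pointer for private data; these names are illustrative and are not libwww's actual stream interface. It shows a filter that strips the first line of its input (such as an HTTP status line) and forwards the rest to a terminal stream, keeping its progress in the stream object so that the line may span several blocks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stream interface; not libwww's actual API. */
typedef struct Stream Stream;
struct Stream {
    /* Accept a block of characters; returns 0 on success. */
    int (*put_block)(Stream *self, const char *buf, size_t len);
    Stream *target;  /* next stream in the chain, or NULL */
    void *state;     /* per-stream private data */
};

/* Terminal stream: appends everything it receives to a fixed buffer. */
typedef struct {
    char data[256];
    size_t used;
} Sink;

static int sink_put(Stream *self, const char *buf, size_t len) {
    Sink *s = self->state;
    if (s->used + len >= sizeof s->data) return -1;  /* keep room for NUL */
    memcpy(s->data + s->used, buf, len);
    s->used += len;
    s->data[s->used] = '\0';
    return 0;
}

/* Filter stream: drops everything up to and including the first '\n'
 * (e.g. an HTTP status line) and forwards the rest to its target.
 * The flag lives in the stream's state, so the line may span blocks. */
typedef struct {
    int skipping;  /* non-zero until the first newline has been seen */
} StripLine;

static int strip_put(Stream *self, const char *buf, size_t len) {
    StripLine *st = self->state;
    if (st->skipping) {
        const char *nl = memchr(buf, '\n', len);
        if (nl == NULL) return 0;  /* still inside the first line */
        st->skipping = 0;
        len -= (size_t)(nl + 1 - buf);
        buf = nl + 1;
    }
    return len > 0 ? self->target->put_block(self->target, buf, len) : 0;
}
```

For example, with `strip.target` pointing at a `Sink`-backed stream, feeding `"HTTP/1.0 20"` and then `"0 OK\nHello"` leaves only `Hello` in the sink, even though the status line was split across two calls. Cascading further streams only requires pointing each stream's `target` at the next one.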
The Converter stream class is a subclass of the generic stream class. Converters are filters that can change the current representation (or media type) of a data object. Examples are conversions from one image format to another, or "converting" an HTML document into a presentation of that document to the user in a widget. Because Converters are themselves streams, multiple Converters can be inserted into a single stream chain, for example between a Request object and a Channel object.
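A Converter registry might look roughly like the following sketch, assuming that converters are looked up by an (input type, output type) pair and initialize a stream that forwards converted data to a given target. `register_converter`, `find_converter`, and the HTML-escaping converter are hypothetical examples, not the Library's own converters or registration functions.

```c
#include <stddef.h>
#include <string.h>

/* Minimal hypothetical stream type, repeated so the sketch stands alone. */
typedef struct Stream Stream;
struct Stream {
    int (*put_block)(Stream *self, const char *buf, size_t len);
    Stream *target;  /* next stream in the chain, or NULL */
};

/* A converter initialises 'self' as a stream that changes the media type
 * of the data and forwards the result to 'target'. */
typedef void (*ConverterInit)(Stream *self, Stream *target);

typedef struct {
    const char *from;    /* input media type, e.g. "text/plain" */
    const char *to;      /* output media type, e.g. "text/html" */
    ConverterInit init;
} Converter;

static Converter registry[16];
static size_t n_registered;

static int register_converter(const char *from, const char *to,
                              ConverterInit init) {
    if (n_registered == sizeof registry / sizeof registry[0]) return -1;
    registry[n_registered++] = (Converter){from, to, init};
    return 0;
}

/* Return the first converter registered for from -> to, or NULL. */
static const Converter *find_converter(const char *from, const char *to) {
    for (size_t i = 0; i < n_registered; i++)
        if (strcmp(registry[i].from, from) == 0 &&
            strcmp(registry[i].to, to) == 0)
            return &registry[i];
    return NULL;
}

/* Example converter: text/plain -> text/html, escaping '<', '>' and '&'. */
static int html_escape_put(Stream *self, const char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        const char *esc = NULL;
        switch (buf[i]) {
        case '<': esc = "&lt;"; break;
        case '>': esc = "&gt;"; break;
        case '&': esc = "&amp;"; break;
        default: break;
        }
        int rc = esc ? self->target->put_block(self->target, esc, strlen(esc))
                     : self->target->put_block(self->target, &buf[i], 1);
        if (rc != 0) return rc;
    }
    return 0;
}

static void html_escape_init(Stream *self, Stream *target) {
    self->put_block = html_escape_put;
    self->target = target;
}

/* Terminal stream for demonstration: collects output in a buffer. */
static char collected[256];
static size_t n_collected;

static int collect_put(Stream *self, const char *buf, size_t len) {
    (void)self;
    if (n_collected + len >= sizeof collected) return -1;
    memcpy(collected + n_collected, buf, len);
    n_collected += len;
    collected[n_collected] = '\0';
    return 0;
}
```

Because the stream initialized by a converter is an ordinary `Stream`, it can itself serve as the `target` of another converter, which is how several Converters end up in one chain.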
Input and output streams are responsible for reading data from and writing data to a transport, for example a BSD socket interface. Using a stream-based interface to the transport layer makes it easy to add special transport mechanisms, such as a multiplexed transport protocol. It also gives a consistent interface for both sending and reading objects, which is a prerequisite for building interactive Web applications.
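As a rough sketch of a transport-facing stream, assuming POSIX `read()` and `write()` on a blocking descriptor (WinSock code would use `recv()` and `send()` instead), an output stream can write each block to a descriptor while an input loop pumps whatever it reads into a target stream. `fd_put_block` and `fd_pump` are hypothetical names, not libwww functions.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Minimal hypothetical stream bound to a transport descriptor. */
typedef struct Stream Stream;
struct Stream {
    int (*put_block)(Stream *self, const char *buf, size_t len);
    int fd;  /* descriptor, e.g. a BSD socket or a pipe */
};

/* Output stream: write the whole block, retrying after partial writes
 * and interrupted system calls. Assumes a blocking descriptor. */
static int fd_put_block(Stream *self, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(self->fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Input side: read from 'fd' until end of file and push each chunk
 * into 'target', e.g. the first stream of a parsing chain. */
static int fd_pump(int fd, Stream *target) {
    char buf[512];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0) return 0;  /* end of file */
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (target->put_block(target, buf, (size_t)n) != 0) return -1;
    }
}
```

Since both directions go through the same `put_block` call, the code that produces a request and the code that consumes a response do not need to know whether the descriptor is a plain socket or some other transport.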
Plug-ins are modules that the application can register at run-time. They are an open-ended method for adding new functionality to the application. A characteristic of libwww's evolution is that the set of features handled through plug-ins is constantly growing. In version 4.1 of the Library, the categories of plug-ins include:
One of the main advantages of plug-ins is that the feature set can change dynamically as required by the application. This allows the traditional boundaries between application types such as "clients" and "servers" to be broken down. In fact, there is little difference between registering a server profile and registering a client profile, and an application can even change its profile from client to server at run-time.
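The idea of swapping profiles at run-time can be illustrated with a small protocol registry keyed by URL scheme. This is a hypothetical sketch: `protocol_add`, `protocol_dispatch`, and the two profile loaders are illustrative names, and the handlers merely return distinct codes instead of actually speaking HTTP.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical protocol plug-in: a handler for one URL scheme. */
typedef int (*ProtocolHandler)(const char *url);

typedef struct {
    char scheme[16];
    ProtocolHandler handler;
} Protocol;

static Protocol protocols[8];
static size_t n_protocols;

/* Register the handler for a scheme, replacing any existing one. */
static int protocol_add(const char *scheme, ProtocolHandler h) {
    if (strlen(scheme) >= sizeof protocols[0].scheme) return -1;
    for (size_t i = 0; i < n_protocols; i++)
        if (strcmp(protocols[i].scheme, scheme) == 0) {
            protocols[i].handler = h;
            return 0;
        }
    if (n_protocols == sizeof protocols / sizeof protocols[0]) return -1;
    strcpy(protocols[n_protocols].scheme, scheme);
    protocols[n_protocols++].handler = h;
    return 0;
}

/* Remove every plug-in, e.g. before loading a different profile. */
static void protocol_clear(void) { n_protocols = 0; }

/* Dispatch a URL to the plug-in registered for its scheme. */
static int protocol_dispatch(const char *url) {
    const char *colon = strchr(url, ':');
    if (colon == NULL) return -1;
    size_t len = (size_t)(colon - url);
    for (size_t i = 0; i < n_protocols; i++)
        if (strlen(protocols[i].scheme) == len &&
            strncmp(protocols[i].scheme, url, len) == 0)
            return protocols[i].handler(url);
    return -1;  /* no plug-in registered for this scheme */
}

/* Two illustrative handlers; each only returns a distinct code. */
static int http_client(const char *url) { (void)url; return 1; }
static int http_server(const char *url) { (void)url; return 2; }

/* A "profile" is simply a set of registrations. */
static void load_client_profile(void) {
    protocol_clear();
    protocol_add("http", http_client);
}

static void load_server_profile(void) {
    protocol_clear();
    protocol_add("http", http_server);
}
```

Calling `load_client_profile()` and later `load_server_profile()` changes how the same `http:` URL is handled without touching the dispatch code, which is the sense in which client and server profiles differ only in what is registered.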
Request call-out functions are another open-ended method for applications to add functionality to the core. An application registers a new feature simply by using a generic callback registration process. There are two main points where call-out functions are activated:
At each of these points the list of registered call-out functions is traversed and each call-out function is called in turn. The Library comes with a set of standard call-out functions that cover some often-used features like:
Request call-out functions can be registered either as local to a specific request or as global to all requests. This mechanism allows existing applications to be extended with little or no modification: new features can be inserted by sub-modules that register independent call-out routines to be handled by libwww. The latest example of this mechanism is the implementation of a PICS module, which is incorporated into any libwww client application by registering itself as a set of call-out functions. Other functions, such as signature handling, can be handled the same way.
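The call-out mechanism amounts to ordered lists of callbacks traversed before and after a request. The sketch below assumes hypothetical `CallOut` and `Request` types, global and per-request lists, and a convention that a non-zero return value stops the traversal; the `block_substring` filter only stands in for a module such as PICS and does not implement it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical request and call-out types; not libwww's actual API. */
typedef struct Request Request;
typedef int (*CallOut)(Request *req, void *param);

enum { CALLOUT_OK = 0, CALLOUT_STOP = 1 };  /* STOP ends the traversal */

typedef struct CallOutNode {
    CallOut fn;
    void *param;
    struct CallOutNode *next;
} CallOutNode;

struct Request {
    const char *url;
    CallOutNode *before;  /* local call-outs, run after the global ones */
    CallOutNode *after;
    int status;
};

static CallOutNode *global_before, *global_after;

/* Append a call-out to a list; the caller owns the node's storage. */
static void callout_add(CallOutNode **list, CallOutNode *node,
                        CallOut fn, void *param) {
    node->fn = fn;
    node->param = param;
    node->next = NULL;
    while (*list != NULL) list = &(*list)->next;
    *list = node;
}

/* Call each function in order; stop early if one does not return OK. */
static int callout_run(CallOutNode *list, Request *req) {
    for (; list != NULL; list = list->next) {
        int rc = list->fn(req, list->param);
        if (rc != CALLOUT_OK) return rc;
    }
    return CALLOUT_OK;
}

/* Skeleton of issuing a request: global then local BEFORE call-outs,
 * the (omitted) protocol work, then global and local AFTER call-outs. */
static int request_issue(Request *req) {
    int rc = callout_run(global_before, req);
    if (rc == CALLOUT_OK) rc = callout_run(req->before, req);
    if (rc != CALLOUT_OK) return rc;  /* a BEFORE call-out vetoed it */
    req->status = 200;                /* placeholder for a protocol module */
    rc = callout_run(global_after, req);
    if (rc == CALLOUT_OK) rc = callout_run(req->after, req);
    return rc;
}

/* Example BEFORE call-out: refuse URLs containing the given substring,
 * standing in for something like a content-rating check. */
static int block_substring(Request *req, void *param) {
    return strstr(req->url, (const char *)param) ? CALLOUT_STOP : CALLOUT_OK;
}

/* Example AFTER call-out: count completed requests. */
static int count_done(Request *req, void *param) {
    (void)req;
    ++*(int *)param;
    return CALLOUT_OK;
}
```

A new feature is added by registering nodes rather than by editing `request_issue`, which mirrors how the text describes modules plugging themselves in; attaching a node to `req->after` instead of `global_after` limits it to a single request.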
Threads are in many situations a great advantage, but in general they can only be regarded as reliable if they are native to a specific platform or are an integral part of the programming language used. Unfortunately, as this is not the case in ANSI C, libwww has a model for handling pseudo-threads based on interleaved I/O. This requires that the I/O descriptors can be handled in a non-blocking manner, which is the case for BSD socket and WinSock socket descriptors. Like real threads, pseudo-threads impose certain programming techniques on the application programmer. For example, pseudo-threads are single-stack, single-process entities, so all state-dependent variables must be stored in a "thread" object. As a result, all streams and protocol modules must keep track of their own state. Real threads do not require non-blocking I/O, and hence much of the state information can be kept as part of the thread environment.
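A minimal pseudo-thread loop can be sketched with POSIX `select()`: each "thread" is an object that holds all of its state, and the loop calls a step function only for descriptors that are ready. `PseudoThread`, `pt_step`, and `pt_run` are hypothetical names; for brevity the sketch relies on `select()` readiness instead of also putting the descriptors into non-blocking mode, as a production implementation typically would.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical pseudo-thread: every "thread" shares one stack and returns
 * to the event loop between reads, so all state that a real thread could
 * keep in local variables must live in this object instead. */
typedef enum { PT_READING, PT_DONE, PT_ERROR } PtState;

typedef struct {
    int fd;           /* descriptor to read from, e.g. a socket */
    PtState state;    /* where this pseudo-thread is in its work */
    char data[128];   /* bytes received so far, NUL-terminated */
    size_t used;
} PseudoThread;

/* One step: read at most one small chunk, record progress, and return. */
static void pt_step(PseudoThread *t) {
    char buf[16];
    ssize_t n = read(t->fd, buf, sizeof buf);
    if (n < 0) {
        if (errno != EINTR) t->state = PT_ERROR;
        return;
    }
    if (n == 0) {  /* peer closed: this pseudo-thread is done */
        t->state = PT_DONE;
        return;
    }
    size_t room = sizeof t->data - 1 - t->used;         /* keep NUL space */
    size_t take = (size_t)n < room ? (size_t)n : room;  /* drop excess */
    memcpy(t->data + t->used, buf, take);
    t->used += take;
    t->data[t->used] = '\0';
}

/* Event loop: interleave the pseudo-threads until none is still reading. */
static int pt_run(PseudoThread *threads, size_t count) {
    for (;;) {
        fd_set readable;
        int maxfd = -1;
        FD_ZERO(&readable);
        for (size_t i = 0; i < count; i++)
            if (threads[i].state == PT_READING) {
                FD_SET(threads[i].fd, &readable);
                if (threads[i].fd > maxfd) maxfd = threads[i].fd;
            }
        if (maxfd < 0) return 0;  /* every pseudo-thread has finished */
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        for (size_t i = 0; i < count; i++)
            if (threads[i].state == PT_READING &&
                FD_ISSET(threads[i].fd, &readable))
                pt_step(&threads[i]);
    }
}
```

With two pipes standing in for sockets, `pt_run` reads both responses in interleaved 16-byte steps, and each `PseudoThread` object ends up holding its own complete message. The need to store `used` and `state` in the object, rather than in locals of a long-running function, is exactly the programming constraint described above.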
Most of the libwww API was designed through many iterations of trial and error. The library has been around for a considerable amount of time and represents a significant knowledge base for designing Web APIs. A drawback of this history is that, because libwww is based on ANSI C, it cannot take advantage of many of the features that are now available in modern programming languages. To demonstrate that libwww provides a consistent API that can be used by multiple types of applications, we developed a small set of example applications representing the most typical Web applications.
In addition to this set, we have considerable experience from real applications such as the Arena browser and other GUI-based Web clients. The Library is freely available from the World-Wide Web Consortium's software distribution, together with all the example applications and Arena.
The lessons learned from developing the libwww API as it looks today can be summarized as follows: