Strawman W3C software architecture notes

There is no reason why a particular set of network protocol standards should imply any particular software architecture within the peer agents, until the mobility of code makes the distinction between remote operation and local interfaces arbitrary. However, for the purposes of making reference code for those protocols, a sound architecture is necessary; and besides, there is a call for the standardization of the APIs for their own sake, to allow the mixing of software from different manufacturers.

Current (1993-4-5) design

The W3C reference code is required not only to be modular but also to be easily extensible by the addition of new code at build time or run time. The 1993 design of the library involved the notion of registering subclasses of certain given classes:

Objects for presenting documents of a given Content-Type to the user;
Format converters between different Content-Type values;
Protocol implementations for different URI scheme values;

In each case, a specific function (such as HTRegisterProtocol) is defined to allow new subclasses to be added; a separate table of registered objects is kept; and a small number of core subclasses are provided in the library. In each case, registration passes in an entry point into the new code, typically that of a creation routine for a C++-like object whose first element is an "isa" pointer to a jump table of method entry points for the new module.

This allows extra functionality to be added at runtime by code linked to the core library. It does not address the issues of dynamic loading, or of inter-process communication, so it was in practice only used at initialization time for code linked in by the application developer. In two cases, there was a separate provision made in totally separate ways for adding functionality outside the process. Proxy servers can implement new URI schemes, with registration using environment variables (etc) and communication through HTTP, and helper applications can present new content types, with registration through (on unix) a "mailcap" file and communication through shared files and command line arguments.

Next step

There is now a call for the registration of further types of extension:

Handlers for previously unknown rfc822 headers in HTTP or other messages;
In servers, objects corresponding to certain URIs as in the CGI interface;
Compression, encryption and payment algorithms;

and so on. These extensions can easily be expressed by the registration of subclasses of generic objects and handled in a similar way. However, rather than write specific code for each occasion, it seems reasonable to use a generic technique.

The need for CCI (Client-client interface) standards demonstrates that intra-process communication is not sufficient and inter-application links are needed. RPC techniques such as ILU, OLE2/DCE, etc clearly are designed to do this, and would presumably mesh well with a generic registration system.

The typical things you need to be able to do with a new subclass are:

Register a statically linked module at initialization time;
Dynamically load (OS permitting), link and register a module;
Launch a new application, link it in as a module using IPC;
Find libraries or applications on demand using some search algorithm;

These facilities are all typically used in some form already. We just have to generalize how W3C reference code uses them.
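The four operations listed above can be sketched against a single registration table. This is only an illustration in Python, not the reference code (which is C); every function name here is hypothetical, the "search algorithm" is reduced to trying a list of candidates in order, and standard Python modules stand in for extension modules.

```python
import importlib
import subprocess
import sys

registry = {}  # identifying name -> callable entry point (hypothetical)


def register(name, entry_point):
    """Register a module that is already linked into the process."""
    registry[name] = entry_point


def load_and_register(name, module_name, attr):
    """Dynamically load a module, then register one of its functions."""
    module = importlib.import_module(module_name)
    register(name, getattr(module, attr))


def launch_and_register(name, argv):
    """Register a proxy that runs a separate program and talks to it over pipes."""
    def proxy(data):
        done = subprocess.run(argv, input=data, capture_output=True,
                              text=True, check=True)
        return done.stdout
    register(name, proxy)


def find(name, candidates=()):
    """Return an entry point, trying each (module, attr) candidate on demand."""
    if name not in registry:
        for module_name, attr in candidates:
            try:
                load_and_register(name, module_name, attr)
                break
            except (ImportError, AttributeError):
                continue
    return registry.get(name)


# 1. Statically linked module, registered at initialization time.
register("identity", lambda data: data)
# 2. Dynamically loaded module (a stdlib module stands in for an extension).
load_and_register("base64", "base64", "b64encode")
# 3. Separate application linked in through IPC (here, a child interpreter).
launch_and_register("upper", [sys.executable, "-c",
                              "import sys; sys.stdout.write(sys.stdin.read().upper())"])
# 4. Found on demand: the first candidate does not exist, the second does.
quote = find("quote", [("no_such_module_xyz", "f"), ("urllib.parse", "quote")])
```

Whichever way a module arrives, the caller sees the same thing: a name in the table and an entry point behind it.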

The important point, of course, is the general architecture; of secondary importance is the question of whether tools are used to generate the stubs or registration code.

Platform specifics

An advantage of a generic subclass registration system is that it can be mapped onto platform-specific facilities once per platform. It is reasonable to use local OS-specific conventions for IPC, dynamic linking, and program invocation.

Extension modules need callback interfaces, and these too we will have to map onto local IPC conventions.

It is not proposed that we reinvent any IPC work which we can pick up and which is sufficiently well-defined and open.

Specific Extension classes

The reference code consists then of two parts. The framework code contains the basic API, and the registration functions, and the functions that search for and invoke registered subclasses. The other part consists of a set of "kernel" modules which provide basic standard functionality and also provide examples for the creation of extension modules.

URI scheme

Function: Provides access to objects in a given name space.
Current: HTProtocol, HTRegisterProtocol, etc locally; Proxy server.
Parameters: URI scheme string

Local object access

Function: In server, provides access to specific parts of URI space
Current: Communication with the Common Gateway Interface (CGI), registration in server configuration file.
Parameters: URI template

Note similarity with proxy

Format converter

Function: Converts from Content-Type a to Content-Type b
Current: HTConverter
Parameters: Content-Type names a and b; quality factor

Presentation

Function: Renders an object for the user (converts from Content-Type a to dummy type "www/present")
Current: HTConverter
Parameters: Content-Type name a; quality factor

This is currently handled as a special case of a format converter, as it simplifies the code. In fact, in the noninteractive case, www/present simply represents any output format which is acceptable to the user.

Header handler

Function: Performs whatever handling is necessary for an rfc822-style header h
Current: none
Parameters: The header keyword h

A header handler needs a lot of call-backs to allow it to manipulate the body processing pipeline, change other headers, abort transactions, operate a sub-protocol over the same channel, etc. The design of these callback interfaces is non-trivial.
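To make the shape of such a callback interface concrete, here is a small sketch in Python (the library itself is C). Nothing like it exists in the current code, so everything in it is an assumption: the Transaction class, its push_stage, set_header and abort callbacks, and the choice of Content-Transfer-Encoding as the example header. It covers only three of the callbacks mentioned above and ignores sub-protocols entirely.

```python
import base64


class Transaction:
    """Hypothetical callback interface handed to each header handler."""

    def __init__(self, headers):
        self.headers = {k.lower(): v for k, v in headers.items()}
        self.pipeline = []    # body-processing stages, applied in order
        self.aborted = None   # reason string once a handler aborts

    def push_stage(self, stage):
        """Callback: add a stage to the body processing pipeline."""
        self.pipeline.append(stage)

    def set_header(self, name, value):
        """Callback: change another header."""
        self.headers[name.lower()] = value

    def abort(self, reason):
        """Callback: abort the transaction."""
        self.aborted = reason

    def process_body(self, body):
        for stage in self.pipeline:
            body = stage(body)
        return body


header_handlers = {}  # header keyword -> handler


def register_header_handler(keyword, handler):
    header_handlers[keyword.lower()] = handler


def dispatch_headers(txn):
    """Call the registered handler, if any, for each header in the message."""
    # Iterate over a copy, since handlers may change headers through callbacks.
    for keyword, value in list(txn.headers.items()):
        handler = header_handlers.get(keyword)
        if handler is not None:
            handler(txn, value)
            if txn.aborted:
                break


def transfer_encoding_handler(txn, value):
    # Example handler: it knows base64 only and aborts on anything else.
    if value.strip().lower() == "base64":
        txn.push_stage(base64.b64decode)
    else:
        txn.abort("unsupported encoding: " + value)


register_header_handler("Content-Transfer-Encoding", transfer_encoding_handler)
txn = Transaction({"Content-Transfer-Encoding": "base64"})
dispatch_headers(txn)
body = txn.process_body(b"aGk=")
```

Even this toy shows why the design is non-trivial: the handler must be given enough of the transaction to alter it, without being given so much that handlers interfere with one another.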

It is an open question as to whether the bulk of security and payment protocols can be grafted on in this way.

Hash algorithm

Function: Message digest algorithm
Current: none
Parameters: Algorithm name

Conclusions

Clearly the list above is extensible itself. In the security area

PK algorithm
Bulk symmetric algorithm
Certificate verification algorithm

are the sorts of things which one might register.

A common parameter may be a "quality" factor giving some way of disambiguating a choice of apparently equivalent extensions, and which can be used in a negotiation process.
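One way such a quality factor might be used is sketched below, in Python for brevity (the library is C). The register and select functions, the 0-to-1 quality scale and the two GIF viewers are all assumptions made for the example.

```python
# Hypothetical sketch: several modules may be registered under one name.
candidates = {}  # identifying name -> list of (quality, module)


def register(name, module, quality=1.0):
    """Register a module under a name, with a quality factor in (0, 1]."""
    if not 0.0 < quality <= 1.0:
        raise ValueError("quality must be in (0, 1]")
    candidates.setdefault(name, []).append((quality, module))


def select(name, minimum=0.0):
    """Return (quality, module) for the best match at or above `minimum`."""
    usable = [c for c in candidates.get(name, []) if c[0] >= minimum]
    if not usable:
        return None
    return max(usable, key=lambda c: c[0])


# Two apparently equivalent presentation modules for the same Content-Type.
register("image/gif", "inline_viewer", quality=0.9)
register("image/gif", "external_viewer", quality=0.6)
best = select("image/gif")
```

The `minimum` argument hints at how the same figure could feed a negotiation: a client could decline any offer whose best available quality falls below what it is prepared to accept.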

What is not apparent is whether a common selection algorithm can be used. It looks as though many subclasses will be registered using one identifying name, and a search for an exact match to that name is all that is required. The format conversion modules are currently unique in that they have two parameters, and so a more complex search is required to construct a stack of such modules given input and output formats. (See HTStreamStack, which does not do the complete job.)
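The more complex search for converters could look something like the sketch below. It is not how HTStreamStack works; it is a hypothetical best-first search, written in Python for brevity, which assumes that qualities lie in (0, 1] and that the quality of a stack is the product of the qualities of its members. The converter names are invented for the example.

```python
import heapq

converters = []  # (from_type, to_type, quality, name), all hypothetical


def register_converter(src, dst, quality, name):
    if not 0.0 < quality <= 1.0:
        raise ValueError("quality must be in (0, 1]")
    converters.append((src, dst, quality, name))


def build_stack(src, dst):
    """Return (quality, [names]) for the best chain from src to dst, or None."""
    # Best-first search on overall quality. This is valid because every
    # quality is at most 1, so extending a chain never raises its quality.
    heap = [(-1.0, src, [])]   # negated quality, so the best chain pops first
    done = set()
    while heap:
        neg_q, ctype, chain = heapq.heappop(heap)
        if ctype == dst:
            return -neg_q, chain
        if ctype in done:
            continue
        done.add(ctype)
        for c_src, c_dst, q, name in converters:
            if c_src == ctype and c_dst not in done:
                heapq.heappush(heap, (neg_q * q, c_dst, chain + [name]))
    return None


register_converter("text/plain", "text/html", 0.8, "plain2html")
register_converter("text/html", "www/present", 1.0, "html_view")
register_converter("text/plain", "www/present", 0.5, "plain_view")
stack = build_stack("text/plain", "www/present")
```

Here the two-stage stack through text/html (overall quality 0.8) is preferred to the direct presenter (0.5), which a search for a single exact match would never discover.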

I feel (June 1995) that getting the set of extension classes defined is an important early step in the design of the next phase. It will separate the tasks of writing framework and core modules, and will give users a good idea of what they are getting. -t

1995, TimBL