A protocol is a language that is used between computers. Most
protocols are fairly simple, consisting of not much more than a
handful of commands and a description of the format for the returned
answers. For example, the NNTP protocol lists a number of commands,
such as article, list and newgroups; it specifies that every command
must be on a separate line and that every response starts with a line
containing a 3-digit status code. The Gopher protocol is even simpler.
A protocol is not meant to be used directly by humans: it is designed
to be simple for computers, and what is simple for a computer is not
necessarily simple for a human being.
This chapter describes a number of protocols that are in use on the
Internet, insofar as they are useful for the purposes of this report,
namely information retrieval and distributed services. Low-level
protocols that are not directly relevant to the subjects of this
report are left out.
One of the hypotheses underlying this report is that more of the
protocols should be hidden, in favour of a single client program (or
just a few) that speaks all of them. Such clients should be organized
by function, not by
protocol. Too many of the current client programs know just a single
protocol. ftp is built around the ftp protocol,
gopher around the gopher protocol. Even though
part of their functionality (fetching files) overlaps, the user still
has to choose a protocol, not a function.
- Usenet/NNTP: The main news service and public discussion forum; uses
  the NNTP protocol.
- Mailing lists: Also called Listservers, after an often-used program.
  Discussions among a limited number of people, by means of E-mail.
- Gopher: An easy-to-use file retrieval program, based on hierarchical,
  distributed menus. See also Veronica.
- FTP: File Transfer Protocol, a protocol for copying files to and from
  remote machines.
- Archie: A database of the locations of all files that are publicly
  available through FTP. Uses the Prospero protocol.
- World Wide Web/HTTP: A distributed hypermedia system; uses the HTTP
  protocol.
- WAIS/Z39.50: A full-text indexing system that works both stand-alone
  and over a network; in the latter case it uses the Z39.50 protocol.
- E-mail: The electronic equivalent of the postal service. Several
  protocols are in use (SMTP, UUCP, POP, etc.).
- Telnet, rlogin: Protocols that allow people to `log in' to remote
  machines.
- rcp, NFS, AFS: Rcp is `remote copy', a sort of one-shot FTP. NFS and
  AFS are systems for `mounting' the file system of a remote machine as
  if it were a local hard disk.
- Hyper-G: A distributed hypermedia system that supports multiple
  navigation models.
- DEC VTX: An early hypermedia system by Digital Equipment Corporation,
  described as a `Videotex' system.
- Prospero: A `virtual file system'; offers multiple views of a
  distributed file system.
Usenet is the collective name for a public discussion forum based on
the NNTP protocol. Newsreader
programs show a list of newsgroups and each newsgroup
contains articles. Old articles are automatically removed after a
certain period. There are over 2500 different newsgroups and the
number is growing every day.
Client programs include rn, xrn, nn and tin (all on Unix and/or X)
and trumpet (MS-DOS).
An article that is posted to Usenet quickly makes its way to all
connected computers around the world. Small computers only store a
subset of the articles or none at all. Large computers store all of
them, many Megabytes each day. People at small computers contact the
nearest larger computer via the NNTP protocol in order to read and
post articles. The same protocol is also used between the larger
computers when distributing articles.
NNTP is anonymous, which means that it doesn't care about the
identity of the client; no passwords are required.
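The command/response pattern mentioned at the start of this chapter can be sketched with a small reader for NNTP replies. This is a minimal sketch, assuming the RFC 977 conventions: a status line that starts with a 3-digit code and, for commands such as ARTICLE or LIST, a body that ends with a line holding a single period (a body line that really starts with a period is sent with an extra one). It reads from a recorded transcript rather than a live server; the transcript and the function name `read_response` are hypothetical.

```python
import io


def read_response(stream, multiline):
    """Read one NNTP reply: the status line, plus a body if one follows."""
    status_line = stream.readline().rstrip("\r\n")
    code, _, text = status_line.partition(" ")
    body = []
    # Only successful (2xx) replies to multi-line commands carry a body.
    if multiline and code.startswith("2"):
        for raw in stream:
            line = raw.rstrip("\r\n")
            if line == ".":            # a lone period ends the body
                break
            if line.startswith(".."):  # undo dot-stuffing
                line = line[1:]
            body.append(line)
    return int(code), text, body


# Hypothetical recorded reply to "ARTICLE <msg-id>", not from a real server.
transcript = io.StringIO(
    "220 3000 <example@news.example.org> article retrieved"
    " - head and body follow\r\n"
    "Subject: test\r\n"
    "\r\n"
    "..this line began with a period\r\n"
    ".\r\n"
)
code, text, body = read_response(transcript, multiline=True)
print(code, body)
```

Note that nothing in the exchange identifies the reader: the client just sends a command and reads the numbered reply.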
Gopher is a networked information retrieval and publishing tool, based
on the concept of hierarchical menus. Basically, it supports three
types of items: documents (including images, sounds, etc.), menus (or
directories, containing links to other items), and services (called
`links', such as telnet or CSO
servers). Information as to where documents or menus are stored
remains hidden from the user, giving the impression that `Gopher
space' is a single, extremely large system.
Gopher, unlike WWW, doesn't distinguish between the system, the
protocol and the format of the data. It also stresses the simplicity
of the client programs (the Gopher browsers), requiring any added
intelligence to be put into the servers instead. This allowed Gopher
to spread rapidly over the world, but it seems rather inflexible with
regard to future enhancements.
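The Gopher protocol behind those menus is small enough to sketch. Assuming the conventions of RFC 1436, the client connects to the server (port 70 by default), sends a selector string followed by CR LF, and for a menu receives one line per item: a one-character item type, then the title, selector, host and port separated by tabs, with a lone period at the end. The sketch below only builds the request and parses a reply; the sample menu and host names are hypothetical.

```python
from typing import List, NamedTuple

# A subset of the item-type characters defined in RFC 1436.
ITEM_TYPES = {"0": "document", "1": "menu", "2": "CSO", "7": "search",
              "8": "telnet", "9": "binary", "g": "GIF", "I": "image",
              "T": "tn3270"}


class MenuItem(NamedTuple):
    kind: str
    title: str
    selector: str
    host: str
    port: int


def request(selector: str) -> bytes:
    """What the client sends after connecting: the selector and CR LF."""
    return selector.encode("ascii") + b"\r\n"


def parse_menu(reply: str) -> List[MenuItem]:
    """Turn a menu reply into items; a line holding only '.' ends it."""
    items = []
    for line in reply.split("\r\n"):
        if line == ".":
            break
        if not line:
            continue
        # Type character, then tab-separated title, selector, host, port.
        title, selector, host, port = line[1:].split("\t")[:4]
        kind = ITEM_TYPES.get(line[0], "unknown")
        items.append(MenuItem(kind, title, selector, host, int(port)))
    return items


# Hypothetical reply to an empty selector (a server's top-level menu).
reply = ("1Europe\t/europe\tgopher.example.org\t70\r\n"
         "0About this server\t/about.txt\tgopher.example.org\t70\r\n"
         "7Search titles\t/search\tindex.example.org\t70\r\n"
         ".\r\n")
for item in parse_menu(reply):
    print(item.kind, item.title, item.host, item.port)
```

The host and port fields are what let one menu point to items on other servers without the user noticing, which produces the impression of a single `Gopher space'.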
Types of data and services and their representation in Gopher
menus and Gopher viewers.
Note that `local' here means `local to the server', in other
words, the table is created from the viewpoint of the information
publisher.
The item type is what is indicated on the screen, next to
the item's name. The information source is of interest only
to the server that must provide the information. The output
format is used by the client program to select the right kind
of viewer.
Item type  Information source          Output format
---------  --------------------------  -------------
Document   Local file,                 Text (ASCII),
           Compressed local file,      GIF, Image,
           Remote (other Gopher),      Sound, MPEG,
           FTP'ed file,                Binary,
           Output of a program         MIME-encoded

Menu       Local menu file,            (implied)
           Compressed local file,
           Remote (other Gopher),
           FTP directory,
           Segmented (mail) file,
           Search for files (grep),
           Search for files (WAIS),
           Search for menu items,
           Search segmented file,
           Output of a program

Form       Local form file +           Text (ASCII),
           local program               GIF, Image,
                                       Sound, MPEG,
                                       Binary,
                                       MIME-encoded,
                                       Menu

Service    Telnet, Tn3270, CSO         (implied)
The inventors of Gopher were inconsistent when they implemented the
system. Based on what different Gopher implementations provide (in
particular the Minnesota `original' gopherd and John Franks'
gn), the following could be a breakdown of menu-items vs
information sources vs output formats (see table).
Not all combinations are currently implemented and a few additional
output formats are defined, though it seems better to reduce the
number of formats defined by Gopher itself and instead rely on the
MIME standard to
encode the contents. In the table above that advice has been followed,
even though it remains to be seen if the Gopher community will
actually go in that direction.
The latest version of the Gopher protocol is called Gopher+ and it
includes facilities for automatic negotiations between server and
client to determine the best format for some piece of information and
an extension for interactive forms.
Veronica is a database of Gopher items (an item is a title plus a
pointer to a document or menu). It is updated daily. There are at the
moment four such databases, in different parts of the world. The
database is accessible through Gopher. It accepts queries for keywords
and responds with a Gopher menu consisting of all matching titles.
Veronica stores all titles that appear in Gopher menus anywhere in the
world, but they are stored without their context. A query for a
particular keyword returns a list of matching titles, but removed from
their context, the titles may be rather uninformative. E.g., a title
that consists of just the word `Europe' might have been meaningful in
its original menu, but taken out of context and stored in a Veronica
database, there is very little indication of what the document
entitled `Europe' actually contains.
Still, despite its limitations, Veronica is a very useful tool when
looking for information in `Gopher-space.'
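What a Veronica-style service does can be sketched with a simplified in-memory inverted index. Every menu item harvested from Gopher space is indexed by the words of its title, and a keyword query returns the matching items as Gopher menu lines. The harvested items are hypothetical, and this is not Veronica's actual implementation. Note that the result carries only the title and the pointer, not the menu the item came from, which is exactly the loss of context described above.

```python
from collections import defaultdict

# Hypothetical harvested items: (type, title, selector, host, port).
harvest = [
    ("1", "Europe", "/geo/europe", "gopher.example.org", 70),
    ("0", "Europe", "/travel/europe.txt", "other.example.net", 70),
    ("0", "Weather in Europe", "/wx/eu", "gopher.example.org", 70),
]

# Inverted index: lower-case title word -> positions in `harvest`.
index = defaultdict(set)
for pos, (_, title, _, _, _) in enumerate(harvest):
    for word in title.lower().split():
        index[word].add(pos)


def query(*keywords):
    """Return menu lines for items whose titles contain all the keywords."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    hits = sorted(set.intersection(*sets)) if sets else []
    return ["%s%s\t%s\t%s\t%d" % harvest[p] for p in hits]


for line in query("europe"):
    print(line)
```

The two items titled just `Europe' come back looking identical to the user, even though they are quite different documents.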
FTP is a protocol for file management on a remote machine. It has
commands for copying files to and from remote machines and for
renaming and deleting files. It protects access by username and
password combinations, but it is mostly used in the form of `anonymous
ftp', which means that the username `anonymous' is accepted with any
password. Internet etiquette demands that people who log in as
`anonymous' provide their E-mail address as password, so that the
maintainers of the ftp site can more easily see who has used the
facilities.
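The anonymous-FTP convention shows up directly in the commands a client sends on the control connection. The sketch below assumes the command names and reply-code classes of RFC 959; it only builds the command sequence and classifies replies, without opening a connection. The file name and e-mail address are hypothetical. (Python's standard `ftplib` performs these steps against a real server; `FTP(host).login()` logs in as `anonymous` by default.)

```python
# Reply-code classes from RFC 959, keyed by the first digit.
REPLY_CLASSES = {
    "1": "positive preliminary",
    "2": "positive completion",
    "3": "positive intermediate",
    "4": "transient negative",
    "5": "permanent negative",
}


def anonymous_session(email, path):
    """Control-connection commands for fetching one file anonymously.

    A real client also sets up a data connection (PORT or PASV) before
    RETR; that step is omitted in this sketch.
    """
    return [
        "USER anonymous",
        "PASS " + email,   # etiquette: your e-mail address as password
        "TYPE I",          # binary ("image") transfer
        "RETR " + path,
        "QUIT",
    ]


def classify(reply):
    """Classify a reply line such as '230 User logged in.' by its first digit."""
    return REPLY_CLASSES.get(reply[:1], "unknown")


for cmd in anonymous_session("jdoe@example.org", "/pub/README"):
    print(cmd)
print(classify("331 Guest login ok, send your e-mail address as password."))
```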
Just like Veronica is an index into Gopher, Archie is an index into
anonymous FTP. The Archie database stores filenames from a large
number of anonymous FTP archives. The database can be queried with
partial filenames or regular expressions and it will return a list of
matching filenames together with the addresses where they can be found.
Although the Archie databases (there are about twenty of them around
the world) are not updated as frequently as Veronica, they are great
for finding the latest versions or nearest copies of software or
documents. Archie suffers from the same problem as Veronica: file
names convey little about the contents of a file. In Archie, however,
some context is shown in the form of a directory path.
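An Archie-style lookup can be sketched as a pattern match over (host, path) entries collected from anonymous FTP archives. The catalogue and host names are hypothetical, and this is not Archie's own code. The sketch matches either a plain substring or a regular expression against the file name only, and reports the full directory path, which is the bit of context Archie does provide.

```python
import posixpath
import re

# Hypothetical catalogue gathered from anonymous FTP sites: (host, path).
catalogue = [
    ("ftp.example.edu", "/pub/gopher/gopher2.012.tar.Z"),
    ("ftp.example.nl", "/mirrors/gopher/gopher2.012.tar.Z"),
    ("ftp.example.com", "/pub/www/libwww2.14.tar.Z"),
]


def archie(pattern, regex=False):
    """Return (host, path) pairs whose file name matches the pattern.

    By default the pattern is a plain substring; with regex=True it is
    treated as a regular expression.
    """
    matcher = re.compile(pattern) if regex else None
    hits = []
    for host, path in catalogue:
        name = posixpath.basename(path)
        found = matcher.search(name) if matcher else pattern in name
        if found:
            hits.append((host, path))
    return hits


for host, path in archie(r"^gopher.*\.tar\.Z$", regex=True):
    print(host, path)   # the directory path gives some context
```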
Archie is an application of the Prospero protocol, but for people
without Prospero clients, there is also the possibility to log into a
machine running an Archie database and give commands inside a
restricted shell.
WWW is a distributed hypermedia system. It defines both a protocol,
HTTP (HyperText Transfer Protocol), and a hypertext file format,
HTML (HyperText Markup Language). Many machines around the world act
as WWW servers, meaning that they have a collection of hyper-documents
that they will transfer on request. Each document has a unique name, a
so-called Universal Resource Locator (URL). Inside
a document there may be references to other documents, also in the
form of URLs. The client program that a user runs on his own machine
knows how to contact these servers and how to obtain a document, given
its URL.
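How a client gets from a URL to a document can be sketched in a few lines: split the URL into scheme, host, port and path, and, for an `http` URL, send a GET request for that path to that host. The sketch assumes the HTTP/1.0 request format (a request line followed by an empty line, here with no headers) and only builds the request text. The URL is hypothetical.

```python
from urllib.parse import urlsplit


def plan_request(url):
    """Return (host, port, request bytes) for fetching an http URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        # A multi-protocol client would dispatch to FTP, Gopher, NNTP here.
        raise ValueError("this sketch only handles http URLs")
    port = parts.port or 80          # 80 is the default HTTP port
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # HTTP/1.0-style request line followed by an empty line (no headers).
    request = "GET %s HTTP/1.0\r\n\r\n" % path
    return parts.hostname, port, request.encode("ascii")


host, port, req = plan_request("http://www.example.org/hypertext/Overview.html")
print(host, port, req)
```

A real client would normally add header lines (for example, the formats it accepts) between the request line and the empty line.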
Documents need not be text. They can be single-media or multi-media.
Hyperlinks are possible in text and in pictures; not yet in time-based
media, such as sound and movies.
The client determines the range of formats that it recognizes. Some
clients are smarter than others. A typical list of supported formats
is: formatted and unformatted text, PostScript, images in various
formats, sound in various formats, and animations in MPEG format.
Typically, WWW clients know a number of protocols, such as FTP,
Gopher, NNTP and, of course, HTTP. (The most recent definition
of the HTTP protocol is available from CERN in Geneva.)
A list of WWW clients is also available from CERN.
If someone wants to use WWW to publish his own work, he (or rather his
system operator) will need to set up a WWW server. At least on a Unix
system that is not difficult to do. The details are
also available from CERN.
WWW supports format negotiation between server and client for cases
when information is available in several formats. Interactive forms
are possible with a range of buttons, radio-buttons, and input
fields.
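Format negotiation can be sketched as the server choosing, from the formats it has, the one the client ranks highest. The sketch assumes the client's preferences arrive as an HTTP-style `Accept` list with optional `q` weights (e.g. `text/html, image/gif;q=0.5`); the parsing is deliberately simplified, handling exact media types only and ignoring wildcards such as `*/*`.

```python
def parse_accept(header):
    """Map each media type in an Accept-style list to its q weight (default 1)."""
    prefs = {}
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        weight = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                weight = float(value)
        if fields[0]:
            prefs[fields[0]] = weight
    return prefs


def negotiate(available, accept_header):
    """Pick the available format the client weights highest, or None."""
    prefs = parse_accept(accept_header)
    candidates = [f for f in available if prefs.get(f, 0) > 0]
    if not candidates:
        return None
    # max() keeps the first of equally weighted formats: server order wins.
    return max(candidates, key=lambda f: prefs[f])


print(negotiate(["application/postscript", "text/html"],
                "text/html;q=0.8, application/postscript;q=0.9"))
```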
Authentication is possible in three different ways (as of October
1993): through username & password, Kerberos, and Internet address
masks.
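Of the three authentication methods, address masks are the simplest to illustrate: the server compares the client's Internet address against a list of allowed networks. This sketch uses Python's standard `ipaddress` module with CIDR-style masks; the networks are hypothetical documentation-range addresses, and the mask syntax used by WWW servers of the time may well differ (for instance wildcard patterns).

```python
import ipaddress

# Hypothetical access list: only clients from these networks may read.
ALLOWED = [ipaddress.ip_network("192.0.2.0/24"),
           ipaddress.ip_network("198.51.100.16/28")]


def allowed(client_address):
    """True if the client's address falls inside one of the allowed masks."""
    addr = ipaddress.ip_address(client_address)
    return any(addr in net for net in ALLOWED)


print(allowed("192.0.2.77"), allowed("203.0.113.5"))
```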
VTX is a product of computer manufacturer DEC (Digital Equipment
Corporation). It is described as a `videotex' system. Information is
structured as a tree, with each subject collected in `stories' of
several `pages'. Each page is either a menu, a query form, or an
information page. Pages can indicate that they need an external
application to be displayed properly. A story can be distributed over
several machines, without the user being aware of that.
The VTX databases must be kept on DEC VAX machines, but client
programs (readers) are available for other computers as well.
In contrast to the other systems described in this chapter, VTX is a
commercial system, which means that license fees have to be paid for
every VTX server and for every reader program on another computer.
The KUB-gids project of the University of Brabant (Tilburg) uses DEC
VTX as the basis for the KUB-gids CWIS. KUB-gids cannot be accessed
from the outside, except by logging in to one of the university's
machines. (Log in as `kubgids' on machine kubgids.kub.nl.)
Some information on DEC VTX can be found in an article in Byte or in
a flyer from DEC that is available by FTP.
WAIS is a full-text indexing system that can work both locally and
over a network. A collection of files together forms a database, and
WAIS is used to create an extensive index into this database;
usually every word of every file is indexed. A WAIS server processes
incoming queries, consisting of a number of keywords, and returns a
list of matches. The server can also return the full text of a file in
response to a query. Instead of keywords, a query can also refer to a
document, which is interpreted as a query for other documents that are
`similar to' the indicated one.
WAIS uses a scoring mechanism to determine how `similar' a document
is to a set of keywords or to another document. The server computes a
number between 0 and 1000, with 1000 being assigned to the best match.
The computation is based on the number of times a keyword occurs and
how many of the keywords occur.
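The scoring idea can be sketched as follows. This is not the actual WAIS formula, which is not given here; it is a hypothetical scheme that combines the two factors named above (how often keywords occur, and how many distinct keywords occur) and then rescales the results so that the best document gets 1000. The documents are made up.

```python
import re


def raw_score(text, keywords):
    """Hypothetical raw score: total occurrences times distinct keywords found."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = [words.count(k.lower()) for k in keywords]
    distinct = sum(1 for c in counts if c > 0)
    return sum(counts) * distinct


def wais_scores(documents, keywords):
    """Scale raw scores to 0..1000, with 1000 for the best match."""
    raw = {name: raw_score(text, keywords) for name, text in documents.items()}
    best = max(raw.values(), default=0)
    if best == 0:
        return {name: 0 for name in raw}
    return {name: round(1000 * r / best) for name, r in raw.items()}


docs = {"a": "gopher gopher menu", "b": "gopher and wais", "c": "nothing relevant"}
print(wais_scores(docs, ["gopher", "wais"]))
```

Multiplying by the number of distinct keywords lets a document that mentions both `gopher' and `wais' once outrank one that mentions only `gopher' twice.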
The WAIS protocol is a subset of the ANSI standard Z39.50
protocol, which was developed especially for queries to bibliographic
databases, such as library catalogues.
E-mail is used for communication between two people. It is similar to
ordinary postal mail, but much faster. One person writes a letter and
sends it to an address. An address is usually of the form
user@machine, where machine can contain several parts separated by
dots, such as let.rug.nl for the Faculty of Arts (let) of the
University of Groningen (rug) in the Netherlands (nl).
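Taking such an address apart is simple; this is a minimal sketch, assuming a bare user@machine address (no display name, quoting or routing information). The user name `jdoe` is hypothetical.

```python
def split_address(address):
    """Split 'user@machine' into the user and the dot-separated machine parts."""
    user, sep, machine = address.rpartition("@")
    if not sep or not user or not machine:
        raise ValueError("not of the form user@machine: %r" % address)
    return user, machine.split(".")


user, parts = split_address("jdoe@let.rug.nl")
print(user, parts)   # the most general part (the country) comes last
```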
The message is copied to the recipient's machine and stored in a file
known as the `mailbox.' E-mail programs, so-called Mail User Agents
(MUAs), provide several commands to help with composing a letter,
remembering addresses, and replying to mail.
There are several protocols in use between machines to exchange mail.
A few of the more common ones are UUCP, SMTP, and POP. Since E-mail
already has a long history, there are several gateways in operation, so that it
is possible to send mail to a user on a machine that uses a different
protocol.
Apart from the protocols, there are also standards that describe how
the contents of a letter must be coded when it contains something
other than plain text. Such a standard is MIME. The encoding and decoding is
normally handled by the MUA, but not all E-mail readers can handle
MIME yet.
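What an MUA does when a letter contains more than plain text can be sketched with Python's standard `email` package. The text and a binary part are packed into a `multipart/mixed` message, and the binary part is base64-encoded so that it survives mail transport. The addresses, subject and attachment bytes are hypothetical.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "jdoe@let.rug.nl"
msg["To"] = "colleague@example.org"
msg["Subject"] = "Scanned page"
msg.set_content("The scan is attached.\n")
# A few bytes standing in for a GIF image; add_attachment base64-encodes
# binary data and turns the message into multipart/mixed.
msg.add_attachment(b"GIF89a\x01\x00\x01\x00", maintype="image",
                   subtype="gif", filename="page.gif")

print(msg.get_content_type())
wire = msg.as_bytes()   # the encoded form that travels between machines
```

A receiving MUA that understands MIME reverses the process; one that does not shows the base64 text as-is.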
A listserver is a program that continually watches for incoming mail
on a certain mailbox and forwards any message to a list of other
addresses. Such a mailing list can bring together people with a common
interest. There are hundreds of mailing lists, each for a different
subject.
Most listservers can automatically handle requests to subscribe or
unsubscribe. Many also keep an archive of past discussions.
Subscribers can send special messages to a separate address to
retrieve files from the archive or to get other information about the
mailing list (such as who is on it).
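The core of a listserver can be sketched as a handler for each incoming message: requests sent to an administrative address update the subscriber list, and anything sent to the list address is archived and copied to every subscriber. The addresses, the `-request` naming convention and the one-word commands are assumptions for illustration; real listserver programs each have their own command syntax.

```python
class MailingList:
    """Hypothetical minimal listserver: subscriptions, archive and fan-out."""

    def __init__(self, name):
        self.list_address = name + "@lists.example.org"
        self.request_address = name + "-request@lists.example.org"
        self.subscribers = set()
        self.archive = []

    def handle(self, sender, recipient, body):
        """Process one incoming message; return the (to, body) pairs to send."""
        if recipient == self.request_address:
            command = body.strip().lower()
            if command == "subscribe":
                self.subscribers.add(sender)
            elif command == "unsubscribe":
                self.subscribers.discard(sender)
            else:
                return [(sender, "Unknown command: " + command)]
            return [(sender, "Done: " + command)]
        if recipient == self.list_address:
            self.archive.append((sender, body))
            return [(member, body) for member in sorted(self.subscribers)]
        return []
```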
Unfortunately, mail readers (MUAs) have no special support for
mailing lists. They handle messages from such lists the same as
messages coming from a normal address. The proposal includes some comments
about possible enhancements to mail readers.
Telnet and rlogin are programs that let one computer act as a
terminal for another computer. They tell the remote computer that it
now has an extra terminal, and from then on they simply copy whatever
the user types on the local machine to the remote machine; the output
from the remote machine is similarly copied to the display of the
local machine.
They work much like the popular communications software for PCs
that works over modem lines, except that an Internet connection is
used instead of a modem (dial-up) line.
The underlying protocol is called TCP/IP. It is a low-level protocol,
in the sense that it doesn't deal with the meaning of the transferred
data (characters), but only with ensuring that it is transferred
without errors. Protocols such as Gopher and HTTP work on a higher
level: they rely on TCP/IP for transferring data error-free, but they
also interpret the data in some way, to determine what to do with it.
Rcp is a program to copy files from one machine to another. It is a
bit like ftp, except that it is much simpler.
NFS and AFS are systems that make a connection to a remote machine and
then present the filesystem of the remote machine as if it were a disk
hanging directly off the local computer. Once NFS or AFS is started,
remote files and directories are indistinguishable from local ones.
Hyper-G is a large, networked hypermedia system, being developed at
Graz University of Technology (Austria). Like WWW it uses an
SGML-like notation for hypertext, but unlike WWW it stores all link
information in a central database, instead of in the documents
themselves.
Hyper-G supports different types of organization: not only hypertext,
but also hierarchical. Keyword searches are also available. There are
several levels of user authentication. Hyper-G also automatically
selects a document in the user's language, if translated versions are
available. Users can attach annotations to all documents; the
annotations are visible to other users.
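The automatic choice of language can be sketched as picking, among the available translations of a document, the first one that appears in the user's ordered list of preferred languages, with a fallback. The language codes, document names and fallback rule are assumptions; the text does not say how Hyper-G does this internally.

```python
def select_translation(translations, preferred, fallback="en"):
    """Pick a translation following the user's preferred languages, in order."""
    for lang in preferred:
        if lang in translations:
            return translations[lang]
    # No preferred language available: use the fallback, else any version.
    return translations.get(fallback, next(iter(translations.values()), None))


docs = {"de": "intro.de.html", "en": "intro.en.html"}
print(select_translation(docs, ["nl", "de"]))
```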
Compared to WWW, Hyper-G has more similarities than differences, and
the latest developments in both suggest that the developers have
similar ideas. There is a gateway from the Hyper-G system in Graz to
WWW. Hyper-G is not as portable as WWW, and there are only two sites
running the system. Still, it is an interesting system that
includes some ideas that have not yet shown up in WWW. On the other
hand, the proposed new standard for WWW includes things that are not
available in Hyper-G, such as interactive forms.
Prospero is a system for presenting `virtual directories.' The
directories contain files and other directories, which may actually
be located on different machines. Prospero automatically uses FTP
(and possibly other protocols, including its own) to retrieve files
transparently.
The directories can be composed by every user to his own taste. It is
also possible to make personal directories available to others, which
gives people a way of publishing information or offering information
services. Publishing directories requires the installation of a
Prospero server, in the same manner as for Gopher, WWW, etc.
Directories can also be created by so-called filters, programs that
apply certain criteria to a set of files and directories in order to
distribute them over directories. These automatic directories are
dynamically updated, whenever the original files or directories change.
With suitable presentation software, it is possible to make the
virtual directories look like normal Unix-style directories, but also
like Gopher menus, or even WWW documents.
The best-known application of Prospero is in the Archie software database. Archie queries are in
fact filters, and the result is a virtual directory containing all
files and directories that match the query.
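A filter can be sketched as a function from a set of real files to one or more virtual directories. It applies a criterion and groups the matching files under new directory names, and because it is just a function it can be re-run whenever the underlying files change. The file list, the group-by-extension criterion and the function names are hypothetical, and the sketch does not use Prospero's own protocol; `matching` shows an Archie-like query expressed as such a filter.

```python
import posixpath
from collections import defaultdict


def by_extension(files):
    """Hypothetical filter: group (host, path) entries into virtual dirs by suffix."""
    virtual = defaultdict(list)
    for host, path in files:
        ext = posixpath.splitext(path)[1].lstrip(".") or "no-extension"
        virtual[ext].append("%s:%s" % (host, path))
    return dict(virtual)


def matching(files, word):
    """An Archie-like query as a filter: one virtual directory of matches."""
    return {word: ["%s:%s" % (h, p) for h, p in files
                   if word in posixpath.basename(p)]}


files = [("ftp.a.org", "/pub/readme.txt"), ("ftp.b.org", "/doc/guide.ps"),
         ("ftp.a.org", "/pub/notes.txt"), ("ftp.c.org", "/bin/tool")]
print(by_extension(files))
```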