Presentation Modules
The HTML parser provides three different levels of API to make the
implementation as flexible as possible. Depending on which API the
application uses, the output can be a stream, a structured stream, or a set
of callback functions, as indicated in the figure below:
The default HTML parser in libwww is very simple. For an example of a
complete structured parser, see the Amaya browser/editor.
SGML Stream Interface
This interface provides the most basic API: the output is a stream, with no
form of structure imposed on the data. The internal SGML parser parses the
data sequence, identifies SGML markup tags, and passes the information on to
the HTML parser. However, if the application has its own SGML parser and HTML
parser, the internal parsers can be disabled by removing the internal HTML
converter, HTMLPresent(), which is used to present a graphic object on the
screen, from both the global and the local lists of converters and presenters.
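Why the converter has to come out of *both* lists can be shown with a small model. The sketch below assumes a lookup that consults the local list first and then the global one. All type and function names are hypothetical stand-ins, and `html_present_stub` only plays the role of HTMLPresent(). None of this is libwww's registration API, and the format strings are illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opaque stream type (not libwww's HTStream). */
typedef struct Stream Stream;

/* Hypothetical converter signature. */
typedef Stream *Converter(const char *input_format, const char *output_format);

/* One registered conversion in a singly linked list. */
typedef struct Conversion {
    const char *input_format;
    const char *output_format;
    Converter *converter;
    struct Conversion *next;
} Conversion;

/* Prepend a conversion; returns the new list head. */
Conversion *conversion_add(Conversion *list, const char *in, const char *out,
                           Converter *cvt) {
    Conversion *c = malloc(sizeof *c);
    assert(c != NULL);
    c->input_format = in;
    c->output_format = out;
    c->converter = cvt;
    c->next = list;
    return c;
}

/* Remove every entry that uses the given converter; returns the new head. */
Conversion *conversion_remove(Conversion *list, Converter *cvt) {
    Conversion **link = &list;
    while (*link != NULL) {
        if ((*link)->converter == cvt) {
            Conversion *dead = *link;
            *link = dead->next;
            free(dead);
        } else {
            link = &(*link)->next;
        }
    }
    return list;
}

/* First converter registered for an input/output pair, or NULL. */
Converter *conversion_find(const Conversion *list, const char *in,
                           const char *out) {
    for (; list != NULL; list = list->next)
        if (strcmp(list->input_format, in) == 0 &&
            strcmp(list->output_format, out) == 0)
            return list->converter;
    return NULL;
}

/* Assumed lookup order: local list first, then the global list. */
Converter *lookup(const Conversion *local, const Conversion *global,
                  const char *in, const char *out) {
    Converter *c = conversion_find(local, in, out);
    return c != NULL ? c : conversion_find(global, in, out);
}

/* Stand-in for the internal HTML presenter (HTMLPresent() in the text). */
Stream *html_present_stub(const char *in, const char *out) {
    (void)in; (void)out;
    return NULL;
}

/* Stand-in for an application's own SGML/HTML handling. */
Stream *app_html_stub(const char *in, const char *out) {
    (void)in; (void)out;
    return NULL;
}
```

If the internal presenter is removed from only one list, `lookup` still finds it in the other. Once it is gone from both, the application's own handler is the only match for the HTML input.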
HTML Structured Stream Interface
If the application has its own HTML parser that understands the structured
output from the internal SGML parser, then the second API can be used. The
current HTML parser in the Library is very basic and does not understand
many of the new features in HTML 2 and 3.
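With a structured stream, the SGML layer reports element boundaries and character data through a table of methods instead of handing over raw bytes. The sketch below illustrates that idea. The method table, element codes, and `feed_example` driver are hypothetical and simplified, not libwww's actual structured-stream declarations. `AppParser` plays the part of an application's own HTML parser consuming the events.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element identifiers reported by the SGML layer. */
enum { EL_P, EL_A, EL_COUNT };

/* Hypothetical structured-stream method table. */
typedef struct StructuredClass {
    void (*start_element)(void *me, int element);
    void (*end_element)(void *me, int element);
    void (*put_character)(void *me, char c);
} StructuredClass;

/* Application-side parser state: nesting depth and simple counters. */
typedef struct AppParser {
    int depth;
    int max_depth;
    int opened[EL_COUNT];
    size_t text_chars;
} AppParser;

static void app_start(void *me, int element) {
    AppParser *p = me;
    assert(element >= 0 && element < EL_COUNT);
    p->opened[element]++;
    if (++p->depth > p->max_depth)
        p->max_depth = p->depth;
}

static void app_end(void *me, int element) {
    AppParser *p = me;
    (void)element;
    p->depth--;
}

static void app_char(void *me, char c) {
    AppParser *p = me;
    (void)c;
    p->text_chars++;
}

const StructuredClass app_class = { app_start, app_end, app_char };

/* Stand-in for the SGML parser emitting events for <p>Hi <a>x</a></p>. */
void feed_example(const StructuredClass *cls, void *me) {
    cls->start_element(me, EL_P);
    for (const char *s = "Hi "; *s; s++)
        cls->put_character(me, *s);
    cls->start_element(me, EL_A);
    cls->put_character(me, 'x');
    cls->end_element(me, EL_A);
    cls->end_element(me, EL_P);
}
```

Because the application only supplies the method table and its state, it can replace the HTML layer without touching the SGML tokenizer.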
HText Call Back Interface
The last API can be used if the application prefers to use the internal HTML
parser and only wants to provide a platform-dependent definition of the callback
functions defined in the HText module.
In this case, the parsing is done entirely inside the Library, and the application
is only called with segments of fully parsed HTML. The callback functions
are all declared as prototypes in the HText module, but the client must
provide the actual code that defines the presentation method used for a
specific HTML tag.
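The split between prototypes supplied by the library and definitions supplied by the client can be sketched in plain C. The callback names below (`text_begin_paragraph`, `text_append`, and so on) and the `library_render_example` driver are hypothetical and are not the real HText functions. The "platform" here is simply a character buffer that marks anchors with their target in angle brackets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ---- Library side: prototypes only (hypothetical callback names) ---- */
typedef struct Text Text;
void text_begin_paragraph(Text *t);
void text_append(Text *t, const char *s);
void text_begin_anchor(Text *t, const char *href);
void text_end_anchor(Text *t);

/* Library-internal code that hands already-parsed segments to the callbacks. */
void library_render_example(Text *t) {
    text_begin_paragraph(t);
    text_append(t, "See ");
    text_begin_anchor(t, "http://example.org/");
    text_append(t, "this page");
    text_end_anchor(t);
}

/* ---- Application side: platform-dependent definitions ---- */
struct Text {
    char buf[256];     /* rendered output, NUL-terminated */
    size_t len;        /* characters used in buf */
    const char *href;  /* target of the currently open anchor, if any */
};

/* Append a string to the buffer, keeping it NUL-terminated. */
static void put(Text *t, const char *s) {
    size_t n = strlen(s);
    assert(t->len + n < sizeof t->buf);
    memcpy(t->buf + t->len, s, n + 1);
    t->len += n;
}

void text_begin_paragraph(Text *t) {
    if (t->len > 0)
        put(t, "\n\n");
}

void text_append(Text *t, const char *s) { put(t, s); }

void text_begin_anchor(Text *t, const char *href) { t->href = href; }

/* Present an anchor by following its text with " <target>". */
void text_end_anchor(Text *t) {
    put(t, " <");
    put(t, t->href);
    put(t, ">");
    t->href = NULL;
}
```

A GUI client would keep the same prototypes but implement them with its own widget or drawing calls. The library-side code would not change.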
Because of the limited functionality of the internal HTML parsing module, many
applications have chosen to implement their own HTML parser. Therefore, many
regard the HTML parser module as an application-specific module rather than
a dynamic module. This is expected to be addressed in the next version of
the Library, which will hopefully make the internal HTML parser easier to
use.
Registering the HTML Parser
Henrik Frystyk Nielsen,
libwww@w3.org,
@(#) $Id: HTML.html,v 1.12 1996/12/09 03:23:54 jigsaw Exp $