
Presentation Modules

The HTML parser has three different levels of APIs in order to make the implementation as flexible as possible. Depending on which API is used by the application, the output can be a stream, a structured stream or a set of callback functions as indicated in the figure below:

[Figure: HTMLParser]

The default HTML parser in libwww is very simple. You can look at the Amaya browser/editor for a complete structured parser.

SGML Stream Interface
This interface provides the most basic API: the output is a plain stream with no form of structure imposed on the data. The internal SGML parser parses the data sequence, identifies SGML markup tags, and passes the information on to the HTML parser. However, if the application has its own SGML parser and HTML parser, the internal parsers can be disabled by removing the internal HTML converter, HTMLPresent(), which is used to present a graphic object on the screen, from both the global and the local lists of converters and presenters.
HTML Structured Stream Interface
If the application has its own HTML parser that understands the structured output from the internal SGML parser then the second API can be used. The current HTML parser in the Library is very basic and does not understand many of the new features in HTML 2 and 3.
HText Call Back Interface
The last API can be used when the application prefers to rely on the internal HTML parser and only wants to provide a platform-dependent definition of the callback functions defined in the HText module. In this case, all parsing is done internally in the Library, and the application is only called with segments of fully parsed HTML. The callback functions are all declared as prototypes in the HText module, but the client must provide the actual code that defines the presentation method used for a specific HTML tag.
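The division of labour in the callback interface can be illustrated with a small, self-contained sketch. It does not use libwww and is not the HText API: the names DemoCallbacks, demo_parse, DemoText and demo_plain_text are hypothetical, and the toy parser only recognizes tags of the form <name> and </name> (no attributes, comments, or entities). It only shows the pattern described above, in which the parsing side owns the parsing and the client supplies the presentation code behind a fixed set of prototypes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table. These names are illustrative only and are
   NOT the prototypes declared in the libwww HText module. */
typedef struct {
    void (*begin_element)(void *ctx, const char *name, size_t len);
    void (*end_element)(void *ctx, const char *name, size_t len);
    void (*text)(void *ctx, const char *data, size_t len);
} DemoCallbacks;

/* Toy stand-in for the internal parser: splits the input into tags and
   text and hands each parsed segment to the client's callbacks. */
void demo_parse(const char *input, const DemoCallbacks *cb, void *ctx)
{
    assert(input != NULL && cb != NULL);
    const char *p = input;
    while (*p != '\0') {
        if (*p == '<') {
            const char *close = strchr(p, '>');
            if (close == NULL) {            /* unterminated tag: pass on as text */
                if (cb->text) cb->text(ctx, p, strlen(p));
                return;
            }
            if (p[1] == '/') {
                if (cb->end_element)
                    cb->end_element(ctx, p + 2, (size_t)(close - (p + 2)));
            } else if (cb->begin_element) {
                cb->begin_element(ctx, p + 1, (size_t)(close - (p + 1)));
            }
            p = close + 1;
        } else {
            const char *next = strchr(p, '<');
            size_t len = next ? (size_t)(next - p) : strlen(p);
            if (cb->text) cb->text(ctx, p, len);
            p += len;
        }
    }
}

/* Client side: a platform-specific "presentation" that here simply
   renders into a fixed-size plain-text buffer. */
typedef struct {
    char out[256];
    size_t used;
    int elements;
} DemoText;

static void append(DemoText *t, const char *s, size_t len)
{
    size_t room = sizeof t->out - 1 - t->used;
    if (len > room) len = room;             /* silently truncate on overflow */
    memcpy(t->out + t->used, s, len);
    t->used += len;
    t->out[t->used] = '\0';
}

static void on_begin(void *ctx, const char *name, size_t len)
{
    DemoText *t = ctx;
    t->elements++;                          /* count every start tag */
    if (len == 2 && memcmp(name, "li", 2) == 0) append(t, "* ", 2);
}

static void on_end(void *ctx, const char *name, size_t len)
{
    if (len == 2 && memcmp(name, "li", 2) == 0) append(ctx, "\n", 1);
}

static void on_text(void *ctx, const char *data, size_t len)
{
    append(ctx, data, len);
}

/* The only thing the client hands to the parsing side. */
const DemoCallbacks demo_plain_text = { on_begin, on_end, on_text };
```

With a zero-initialized DemoText t, calling demo_parse("<ul><li>one</li><li>two</li></ul>", &demo_plain_text, &t) leaves a two-item bulleted list in t.out. A different platform would supply a different callback table and leave demo_parse untouched, which is the property the HText interface is designed around.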

Due to the limited functionality of the internal HTML parsing module, many applications have chosen to implement their own HTML parser. Many therefore regard the HTML parser module as an application-specific module rather than a dynamic module. This will be alleviated in the next version of the Library, which will hopefully make the internal HTML parser easier to use.

Documentation
Registering the HTML Parser


Henrik Frystyk Nielsen, libwww@w3.org,
@(#) $Id: HTML.html,v 1.12 1996/12/09 03:23:54 jigsaw Exp $