HTML Parser and Generator Implementations
This is a sort of "Family Tree" of HTML parser implementations,
annotated with notes on features and bugs.
I'm working on updating the HTML parser in our reference code.
See: A Lexical Analyzer for HTML and Basic SGML.
See also: HTML Testing and Certification
- SGML.c in LibWWW
- The first HTML parser ever released was in the library/linemode
distribution back in '92 or so. It tolerated broken markup such as:
<xmp>... </foo> ... </xmp>
<a href=http://foo.bar/>...</a>
- NCSA Mosaic 2.4 -- didn't use CERN code, but was inspired by it.
- Spyglass Mosaic -- rewrite of the NCSA code
- Netscape -- re-implementation of the NCSA code
- MS IE -- inspired by Netscape
- htmllib.py, used in Grail
- Based on regular expressions. Guido wrote the first web spider, I
believe. This parser treats P, LI, DT, and DD as empty elements. Nifty
formatter code.
- SGML Lexical Analyzer
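As a rough illustration of the tolerant, regexp-based style noted in the list above, here is a minimal Python sketch. It is not the LibWWW or htmllib.py code: the names TAG, ATTR, EMPTY, RAW_TEXT, parse_attrs and tokenize are hypothetical, and silently dropping the end tags of P, LI, DT and DD is an assumption about what treating them as empty implies. It accepts unquoted attribute values and passes everything inside XMP through as text.

```python
import re

# Hypothetical sketch; none of these names come from LibWWW or htmllib.py.
# Limitation: a ">" inside a quoted attribute value ends the tag early.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")
ATTR = re.compile(
    r"""([A-Za-z][-A-Za-z0-9_:.]*)\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)"""
)
EMPTY = {"p", "li", "dt", "dd"}   # start tags reported as standalone markers
RAW_TEXT = {"xmp"}                # content is text up to the matching end tag


def parse_attrs(text):
    """Return a dict of attributes, accepting quoted and unquoted values."""
    attrs = {}
    for name, value in ATTR.findall(text):
        if value[:1] in ('"', "'"):
            value = value[1:-1]   # strip the surrounding quotes
        attrs[name.lower()] = value
    return attrs


def tokenize(html):
    """Split html into a list of ("start"|"empty"|"end"|"text", ...) tuples."""
    events = []
    pos = 0
    while pos < len(html):
        m = TAG.search(html, pos)
        if m is None:
            events.append(("text", html[pos:]))
            break
        if m.start() > pos:
            events.append(("text", html[pos:m.start()]))
        closing, name, rest = m.group(1), m.group(2).lower(), m.group(3)
        pos = m.end()
        if closing:
            # Assumption: end tags of "empty" elements are simply ignored.
            if name not in EMPTY:
                events.append(("end", name))
        elif name in EMPTY:
            events.append(("empty", name, parse_attrs(rest)))
        else:
            events.append(("start", name, parse_attrs(rest)))
            if name in RAW_TEXT:
                # Skip ahead to </xmp>; stray tags in between stay as text.
                end = re.compile(r"</%s\s*>" % name, re.I).search(html, pos)
                stop = end.start() if end else len(html)
                if stop > pos:
                    events.append(("text", html[pos:stop]))
                pos = stop
    return events
```

Calling tokenize on the two examples above yields a single text event for the XMP body, including the stray </foo>, and an href attribute whose value is http://foo.bar/.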
Tools that Write HTML
- LaTeX2HTML
- Creates documents with unquoted attribute values.
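One way to clean up that kind of output after the fact is to add the missing quotes with a small filter. The following is only a sketch under stated assumptions: it is not part of LaTeX2HTML, the names TAG, ATTR and quote_attributes are hypothetical, and it relies on regular expressions rather than a real SGML parser, so it leaves alone any tag it cannot match cleanly (for example one with an unterminated quote).

```python
import re

# Hypothetical helper; not part of LaTeX2HTML or any parser listed here.
# A start tag: "<", a letter, then quoted strings or any non-quote, non-">" chars.
TAG = re.compile(r"""<[A-Za-z](?:"[^"]*"|'[^']*'|[^>"'])*>""")
# name= followed by a quoted value (kept as is) or an unquoted one (to fix).
ATTR = re.compile(
    r"""(\s+[A-Za-z][-A-Za-z0-9_:.]*\s*=\s*)("[^"]*"|'[^']*'|[^\s"'>]+)"""
)


def quote_attributes(html):
    """Wrap unquoted attribute values in double quotes, inside start tags only."""

    def fix_attr(m):
        prefix, value = m.group(1), m.group(2)
        if value[0] in "\"'":
            return m.group(0)          # already quoted: leave untouched
        return '%s"%s"' % (prefix, value)

    def fix_tag(tag_match):
        return ATTR.sub(fix_attr, tag_match.group(0))

    return TAG.sub(fix_tag, html)
```

Matching quoted values in the same pattern as unquoted ones is a deliberate choice: it keeps text such as y=z inside an existing quoted value from being mistaken for an attribute.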
Connolly
$Id: implementations.html,v 1.1 2000/06/19 17:13:03 janet Exp $