From friendly@hotspur.psych.yorku.ca Mon Mar 6 09:56:48 1995
Article: 21183 of comp.infosystems.www.providers
From: friendly@hotspur.psych.yorku.ca (Michael Friendly)
Subject: Waterloo Script GML -> HTML translator
Date: Wed, 1 Mar 95 21:59:53 MET
Organization: York University, Ontario, Canada

Below is a description of the revised version of a translator for
Waterloo Script/GML to HTML, now about 80% complete and available via
LISTSERV. It does not work for IBM Script and only works with
GML-encoded tags, but does a reasonable job. If anyone improves it,
please send the result to me at the VM1 address at the end of this note.

Michael Friendly
York University

GMLHTML: A GML to HTML Translator for Waterloo Script/GML
March 1, 1995
-----------------------------------------------------------------------

GMLHTML

A number of schemes have been developed for translating various
document formats to HTML. Most of these rely on parsing the source
document using languages such as Perl, awk, REXX, etc. For Waterloo
Script/GML documents, I have seen and tried several translation schemes
based on REXX and/or Xedit, and have not been satisfied with the
results.

It then occurred to me that one could use Script itself to perform the
translation, by replacing the GML macros with equivalents which output
HTML codes. A side benefit of this approach is that the resulting HTML
file is automatically formatted as well. A limitation is that this
approach can only deal with GML-encoded text; Script control words
(e.g., .bd) cause the text to be formatted, but are not translated into
HTML equivalents.

How does it work?

The GMLHTML package consists of the main file, GMLHTML SCRIPT, and a set
of subsidiary files, HML$xxxx SCRIPT, which contain the new definitions
of GML macros. A design goal of this implementation was to allow the
same Script source file to be translated to HTML or to be formatted
normally.
To do this in a general way, the GMLHTML SCRIPT file must be imbedded in
the Script source file just before the :GDOC tag. HTML EXEC inserts the
line

   .im GMLHTML ;.* inserted by HTMLPREP

in the source file, runs Script to produce a LISTING file, and then
removes the GMLHTML imbed line from the source file. (If you have TOUCH
EXEC, the original time stamp of the source file is restored.) Finally,
the LISTING file is post-processed through a REXX pipeline filter to
remove ASA control characters and to correct a few awkward features of
the translation process (such as page breaks), producing the HTML file.

What does it do?

GMLHTML produces HTML encodings for the following GML tags:

*  Headings: H1 - H6 are mapped to the corresponding HTML heading tags,
   <H1> through <H6>. Heading levels H1 and H2 also generate an
   appropriate anchor, of the form <A NAME="...">, for the heading.
   Appendices are handled similarly. Cross-references to headings (HREF
   tag) are not currently handled.

*  Lists: Ordered, unordered, and definition lists use the
   corresponding HTML tags <OL>, <UL>, and <DL>. GML glossary lists
   (GL) are treated like DLs; simple lists (SL) use the HTML <MENU>
   tag. Bibliographic lists (BL) are treated as unordered lists. You
   can change the assignments for BL and SL by modifying lines in
   GMLHTML SCRIPT.

*  Highlighted phrases: HP0 - HP3 are mapped to the appropriate
   combinations of <I> and <B>. However, note that not all browsers
   treat <I><B> ... </B></I> cumulatively. The FONT= attribute, often
   used as :HP0 FONT=MONO, is not treated specially, and disappears in
   the output.

*  Figures & tables are treated as pre-formatted text, using the <pre>
   tag. This works reasonably well for inline, textual display
   material, but cannot, of course, handle material designed for
   paste-in using the DEPTH= attribute. Figures and tables generate an
   anchor, of the form <A NAME="Fig_xxx">, and FIGREF/TABREF tags
   generate the appropriate links (<A HREF="#Fig_xxx">Figure nn</A>).

*  Examples, XMP, are treated as pre-formatted text, using the <pre>
   tag.

*  Quotes: Q ... eQ is mapped using the entity '&quot;' for the '"'
   character. Long quotes (LQ ... eLQ) use the <blockquote> tag.

*  Paragraphs and notes are mapped to <P>.

*  Equations: Just a start. Display formulas (DF) are surrounded by
   HTML comments and treated as pre-formatted text:

      <!-- DF --><pre>
         y sub ijkl = mu sub ijk + epsilon sub ijkl ,
      </pre><!-- eDF -->

   What appears inside is whatever the formula processor produces. If
   you use ">" and "<" instead of "gt" and "lt" inside formulas, these
   will be translated to the HTML entities, "&gt;" and "&lt;",
   respectively.

   Note that the characters "<" and ">" are reserved metacharacters in
   HTML, but are used for grouping in GML formulas (e.g.,
   g sub <1 1>). The post-processing carried out by HTML EXEC
   translates the characters "<", "%" (thin space), and ">" inside
   display formulas to blanks.

*  Title page: The FRONTM and TITLEP tags generate an HTML <HEAD>
   section, with a <TITLE> tag and HTML comments constructed from the
   AUTHOR and DATE tags. Beware: if your document does not contain a
   FRONTM section, the HTML document produced may confuse some
   browsers.

What doesn't it do?

GMLHTML does NOT currently handle the following GML tags:

*  H0 tag (should be mapped to H1)
*  Table of Contents (TOC)
*  Index (INDEX)
*  Footnotes (FN)
*  Endnotes (EN)
*  Inline formulas (F)
*  Graphic segments inlined with the Script .si control word.

Availability

The GMLHTML package may be obtained from LISTSERV@YORKVM1 (bitnet) or
LISTSERV@VM1.YORKU.CA (internet) by sending a mail message containing
the line:

   GET GMLHTML PACKAGE

--
Michael Friendly               Internet: friendly@vm1.yorku.ca
Psychology Department          NeXTmail: friendly@hotspur.psych.yorku.ca
York University                Voice: 416 736 5118
4700 Keele Street              http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA
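
Appendix: a sketch of the post-processing step

The final step described under "How does it work?" (removing ASA
control characters from the LISTING file and smoothing over page breaks)
can be illustrated outside of REXX. The Python below is only a rough
sketch of the idea, not the filter shipped in the package: the function
name strip_asa, the choice to discard overprint ("+") records, and the
choice to let page breaks ("1") vanish silently are all assumptions
made for illustration.

```python
def strip_asa(lines):
    """Remove ASA carriage-control characters from LISTING-style lines.

    Hypothetical helper: column 1 of each record is taken to be an ASA
    control character (' ', '0', '-', '1' or '+').
    """
    out = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            out.append("")            # empty record: keep as a blank line
            continue
        control, text = line[0], line[1:].rstrip()
        if control == "+":
            continue                  # overprint: assumed redundant, dropped
        if control == "0":
            out.append("")            # double space: one blank line first
        elif control == "-":
            out.extend(["", ""])      # triple space: two blank lines first
        # "1" (new page), " " (single space) and any unrecognised control
        # character add nothing extra, so page breaks simply disappear.
        out.append(text)
    return out


if __name__ == "__main__":
    listing = ["1<H1>Title</H1>", " <P>", "0Some text.", "+Some text.", "-<P>"]
    print("\n".join(strip_asa(listing)))
```

A real filter would also need the formula clean-up mentioned above
(blanking "<", "%" and ">" inside display formulas), which this sketch
leaves out.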