XML extensions to HTML

This note presents some ideas on the requirements and potential solutions for embedding XML within HTML documents. It was prepared as briefing material for the XML in HTML meeting in February 1998.

HTML needs an extension mechanism

HTML needs an extension mechanism, and XML provides a solution together with a few conventions for declaring XML elements and binding them to software for processing them.

Ease of authoring should be paramount!

For simple extensions, the effort required of authors should be minimal. The trade-off should not be biased towards ease of processing: computers are cheap, people are not.

HTML pages with XML extensions should be viewable in older browsers

Authors can in principle provide two versions of their pages and have the server figure out which version to send for each request: one for older browsers and one for new.

In practice, this is rather tedious and it will be preferable to use the same version for all browsers. There are several ways of approaching this, as will be explained below.

Recognizing an extension when you see one

The simplest extension is an empty element, for instance:

    <simple/>

Of course this could have some parameters:

    <simple param1="value1" param2="value2"/>

HTML parsers could use a special attribute to recognize extension elements. For instance, we could use an attribute named xml-extension:

    <simple xml-extension="http://www.sonix.com/widget"/>

where the value of this attribute is a URL that serves to bind the element to the software needed to process it. This approach rapidly becomes tedious and wasteful when a given element needs to be repeated many times on the same page.

Declaring extensions in advance

The XML name space draft suggests a related possibility: the URL is declared in advance, together with a local name that can be used as a prefix for any of the elements covered by the name space. The following example imagines a set of widgets for building interfaces to an audio mixing desk:

<?namespace href="http://www.sonix.com/widgets" as="widgets"?>

  ... some time later on in the document ...

    <widgets:slider id="volume"/>
    <widgets:gauge id="volume-gauge"/>

But <?instructions?> are visible in many older browsers

A processing instruction placed in the middle of a line of text shows up in Netscape 3.0, but not in Netscape 4.0 and other newer browsers such as Opera.

This problem can be avoided by using a dedicated element for declaring extensions. This element will have to be added to HTML. Here is a possibility:

<declare href="http://www.sonix.com/widgets" as="widgets">

  ... some time later on in the document ...

    <widgets:slider id="volume"/>
    <widgets:gauge id="volume-gauge"/>

The reason why I switch names will become clearer later in this note.

But why bother with the prefix at all? Why not just list the extension tags in the declaration? For instance:

<declare href="http://www.sonix.com/widgets" 
    elements="slider gauge light">

  ... some time later on in the document ...

    <slider id="volume"/>
    <gauge id="volume-gauge"/>

This works nicely except when you want to include two different extensions that use the same tag name. A simple solution would be to support both the as and elements attributes. If there is a clash, the author uses the prefix notation; otherwise he or she just lists the element names with the elements attribute.

To those of you who argue that this complicates writing processing software: Get Real! Go talk to non-technical authors and get their reactions! It only takes a few lines of code to support both mechanisms, and it avoids the need for clunky prefixes in both start and end tags.
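
As a rough illustration of the "few lines of code" claim, the following Python sketch resolves a tag name against declarations that may carry an as prefix, an elements list, or both. The Declaration class, the resolve function and the second "dials" extension are hypothetical names invented for this sketch; they are not part of the proposal or of any library.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Declaration:
    """One <declare> element (hypothetical in-memory form)."""
    href: str                        # URL identifying the extension
    prefix: Optional[str] = None     # value of the "as" attribute, if any
    elements: Tuple[str, ...] = ()   # names listed in "elements", if any


def resolve(tag: str, declarations: Sequence[Declaration]) -> Optional[Tuple[str, str]]:
    """Return (href, local name) for a tag, or None if it is not declared."""
    if ":" in tag:
        # Prefixed form, e.g. "widgets:slider".
        prefix, local = tag.split(":", 1)
        for decl in declarations:
            if decl.prefix == prefix:
                return decl.href, local
        return None
    # Bare form: accept it only if exactly one declaration lists the name,
    # so a clash forces the author back to the prefix notation.
    matches = [d for d in declarations if tag in d.elements]
    if len(matches) == 1:
        return matches[0].href, tag
    return None


declarations = [
    Declaration("http://www.sonix.com/widgets", prefix="widgets",
                elements=("slider", "gauge", "light")),
    # Hypothetical second extension that also defines "gauge".
    Declaration("http://example.org/dials", prefix="dials", elements=("gauge",)),
]
```

With these declarations, a bare slider resolves to the widgets extension, a bare gauge is ambiguous and resolves to nothing, and dials:gauge picks out the second extension explicitly.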

Ask folks which of the following they find easier:

    <apply>
       <plus/>
       <apply>
          <times/>
          <ci>a</ci>
          <ci>x</ci>
       </apply>
       <ci>b</ci>
    </apply>

and

    <mathml:apply>
       <mathml:plus/>
       <mathml:apply>
          <mathml:times/>
          <mathml:ci>a</mathml:ci>
          <mathml:ci>x</mathml:ci>
       </mathml:apply>
       <mathml:ci>b</mathml:ci>
    </mathml:apply>

Which would you rather see?

Providing a fallback

In some cases, it will be fine for the extensions to be invisible when pages are viewed with older browsers. In such cases, the extensions add value, but are not indispensable.

If a fallback is needed, it can be achieved by incorporating text or HTML markup within the XML elements. A simple example is an extension for mathematical equations:

    <f>ax^2+bx+c=0</f>

where the content of the element provides an acceptable fallback when using an older browser, or when support for processing the element is unavailable (for example, the plugin isn't installed).

Textual content that should be invisible in older browsers can be hidden in several ways depending on how far back you want to go. The vast majority of deployed browsers will hide contents of the SCRIPT element, for example:

    <script language=xml>
        <foo> ... hidden contents ... </foo>
    </script>

I have inserted this markup after this paragraph so you can test my assertion! It is hidden on both Netscape 3.0 and 4.0 and also on Opera.

Did you see anything? For the tiny fraction of browsers that do show the contents, you can use the traditional comment hack:

    <script language=xml>
      <!--
        <foo> ... hidden contents ... </foo>
      -->
    </script>
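
On the processing side, software that understands the extension has to undo the comment hack before it can parse the hidden XML. A minimal Python sketch, assuming the content of the script element has already been extracted as a string; unwrap_hidden_xml is a hypothetical helper name, and only the standard library XML parser is used:

```python
import xml.etree.ElementTree as ET


def unwrap_hidden_xml(script_content: str) -> ET.Element:
    """Strip an optional <!-- ... --> wrapper and parse what is left as XML."""
    text = script_content.strip()
    if text.startswith("<!--") and text.endswith("-->"):
        # Drop the comment delimiters added to hide the content.
        text = text[len("<!--"):-len("-->")].strip()
    return ET.fromstring(text)


# Content of the script element from the example above.
hidden = """
  <!--
    <foo> ... hidden contents ... </foo>
  -->
"""
root = unwrap_hidden_xml(hidden)
```

The same function accepts content without the comment wrapper, so both forms of the script example can be handled by one code path.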

Perhaps we can add some attributes to the HTML script element? For instance, an equivalent of Netscape's pluginurl attribute, which points at the plugin to download as a jar file.

Binding extensions to Code

Some scenarios:

You have installed a cool plugin and now want to make use of it in your HTML pages. How do you get your browser to invoke the right plugin?
You have written a cool extension in Java. How do you get the browser to download and run this code when it sees your extension?
You want to use CSS to style the XML tags you have embedded in an HTML page. How do you get the browser to do this?

These scenarios can be supported by adding suitable attributes to the element used to declare extensions. An obvious attribute is type, which gives the content type associated with the resource needed to process the element. For example:

    <declare elements="math" type="application/mathml">

    <math>
        <apply>
           <plus/>
           <apply>
              <times/>
              <ci>a</ci>
              <ci>x</ci>
           </apply>
           <ci>b</ci>
        </apply>
    </math>

In this example, the browser consults its registry to find the plugin for the content type "application/mathml".
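
The registry lookup can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the registry dictionary, the register and find_handler functions, and the render_mathml stand-in do not correspond to any real browser API.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]

# Hypothetical registry mapping content types to installed handlers.
registry: Dict[str, Handler] = {}


def register(content_type: str, handler: Handler) -> None:
    """Record the handler for a content type (names are case-insensitive)."""
    registry[content_type.lower()] = handler


def find_handler(content_type: str) -> Optional[Handler]:
    """Return the handler for a content type, or None if none is installed."""
    return registry.get(content_type.lower())


def render_mathml(markup: str) -> str:
    """Stand-in for a real MathML plugin."""
    return f"[MathML plugin received {len(markup)} characters]"


register("application/mathml", render_mathml)
```

When find_handler returns None, the browser has no plugin for the declared type and would fall back to whatever content the element carries.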

Say you want to point at the Java code that interprets an element. How would you do that?

You will want to give the Java class name, and the URL for a Jar file that contains it and other resources needed for this application. We could borrow attributes from the HTML object or applet elements. Another possibility is to bind to an object element which declares the details of how to process the XML extension. For instance:

    <object declare
            id="view3d"
            codetype="application/java"
            classid="Graph3D" 
            archive="wow3d.jar"
            width="300" height="200">
      Java applet that presents xml data in 3D.
    </object>


    <declare elements="data" code="#view3d">

    <data>
      ... financial data
    </data>

Here I introduced the code attribute to reference the code needed to interpret the extension element "data". It would be natural to also allow codetype to specify the content type for the code referenced by the code attribute. This will be useful when code references the implementation directly.

My final example shows how you can use CSS to present an extension written in XML:

    <style type="text/css">
      warning { display: block; color: red }
    </style>

    <declare elements="warning">
    ...

    <warning>
       This is a grade A warning!
    </warning>

The style element in the document head defines the style properties to be applied to the warning element. The declare element lets the parser know that warning is an XML element and should be parsed accordingly. Without the declaration, the parser will simply discard the start and end tags.

Summary

The HTML parser needs to be told that an element is an XML element. This can be done with a declaration appearing in advance of the first occurrence of the element. The parsing of elements declared as XML elements then follows XML rules. You don't need to declare elements that appear within the contents of declared elements.
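
These rules can be sketched in Python over a simplified token stream. The token format and the xml_elements function are assumptions made for this sketch; a real parser would work on markup rather than on pre-built tuples.

```python
from typing import Iterable, List, Set, Tuple

# Each token is ("declare", name), ("start", name) or ("end", name).
Token = Tuple[str, str]


def xml_elements(tokens: Iterable[Token]) -> List[str]:
    """List the start tags that are parsed as XML, in document order.

    A start tag counts if its name was declared earlier in the document,
    or if it appears inside an element already being parsed as XML.
    """
    declared: Set[str] = set()
    open_xml: List[str] = []   # stack of open elements parsed as XML
    accepted: List[str] = []
    for kind, name in tokens:
        if kind == "declare":
            declared.add(name)
        elif kind == "start":
            if open_xml or name in declared:
                open_xml.append(name)
                accepted.append(name)
        elif kind == "end":
            if open_xml and open_xml[-1] == name:
                open_xml.pop()
    return accepted


tokens = [
    ("start", "math"), ("end", "math"),      # before the declaration: ignored
    ("declare", "math"),
    ("start", "math"),
    ("start", "apply"),                      # undeclared, but inside math
    ("start", "plus"), ("end", "plus"),
    ("end", "apply"),
    ("end", "math"),
    ("start", "apply"), ("end", "apply"),    # outside math: ignored
]
result = xml_elements(tokens)
```

Only the second math element and its descendants are treated as XML; the occurrence before the declaration and the stray apply outside it are not.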

In conclusion I propose we add one new element to HTML to declare extension elements and specify how they are to be processed:

  <!element declare - O EMPTY>
  <!attlist declare
    href        %URI;        #IMPLIED  -- identifies namespace --
    as          NAME         #IMPLIED  -- local value of namespace --
    elements    NAMES        #IMPLIED  -- list of tag names --
    type     %Content-type   #IMPLIED  -- for the extension --
    code        %URI;        #IMPLIED  -- binds to code --
    codetype %Content-type   #IMPLIED  -- for the code --
    >

When textual content needs to be hidden from older browsers, I propose we use the script element as a wrapper. For this purpose I propose we add the code and codetype attributes to the HTML script element to specify the code to be used to interpret the contents of the script element.

In many cases the existing type attribute will be sufficient to bind to the code, and should override the language attribute which is needed to hide the contents of the script element. This is because Netscape 3.0 assumes the contents are JavaScript unless an explicit value for the language attribute is provided.

This example shows how to hide "a", "x" and "b" in the following MathML markup:

  <script language=xml type="application/mathml">
    <!--
    <math>
        <apply>
           <plus/>
           <apply>
              <times/>
              <ci>a</ci>
              <ci>x</ci>
           </apply>
           <ci>b</ci>
        </apply>
    </math>
    -->
  </script>

This is included just after this line, so you can try it out for yourself on different browsers:


Dave Raggett <dsr@w3.org>, 28th January 1998.