7 MathML interactions with the Wide World

Overview: Mathematical Markup Language (MathML) Version 3.0

7 MathML interactions with the Wide World
    7.1 Invoking MathML Processors: namespace, extensions, and mime-types
        7.1.1 Recognizing MathML in an XML Model
        7.1.2 Resource Types for MathML Documents
    7.2 Transferring MathML in Desktop Environments
        7.2.1 Basic Flavors' Names and Contents
        7.2.2 Recommended Behaviours when Transferring
        7.2.3 Discussion
        7.2.4 Examples
            7.2.4.1 Example 1
            7.2.4.2 Example 2
            7.2.4.3 Example 3
            7.2.4.4 Example 4
    7.3 Combining MathML and Other Formats
        7.3.1 Mixing MathML and HTML
        7.3.2 Linking
        7.3.3 Images
        7.3.4 MathML and Graphical Markup
    7.4 Using CSS with MathML

Because MathML is typically embedded in a wider context, it is important to describe the conditions that processors should acknowledge in order to recognize XML fragments as MathML. This chapter describes the fundamental mechanisms to recognize and transfer MathML markup fragments within a larger environment such as an XML document or a desktop file system. It then raises the issues of combining external markup within MathML, and indicates how cascading style sheets can be used with MathML.

This chapter applies to both content and presentation MathML and indicates a particular processing model for the semantics, annotation and annotation-xml elements defined in Section 5.3 Attributions in Strict Content MathML.

7.1 Invoking MathML Processors: namespace, extensions, and mime-types

7.1.1 Recognizing MathML in an XML Model

Within an XML document supporting namespaces (TODO: cite xmlns and xml specs), the preferred method to recognize MathML markup is by the identification of the math element in the appropriate namespace, i.e. that of URI http://www.w3.org/1998/Math/MathML.
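As a rough illustration of this recognition step, the following Python sketch uses the standard xml.etree.ElementTree module to collect the math elements that are in the MathML namespace. The function name find_math_elements and the sample host document are assumptions made for illustration only; they are not defined by this specification.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

def find_math_elements(xml_text):
    """Return every element named 'math' that is in the MathML namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{namespace-uri}local-name".
    return list(root.iter("{%s}math" % MATHML_NS))

# Hypothetical host document: one MathML island and one look-alike element
# named "math" that inherits the XHTML namespace and must not be recognized.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'
    '<math>not MathML</math>'
    '</body></html>'
)
found = find_math_elements(sample)
print(len(found))
```

Matching on the namespace URI rather than on the prefix or the bare local name is what keeps the look-alike element from being handed to a MathML processor.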

This is the recommended method to embed MathML within [XHTML] documents. Some user agents' setups may require supplementary information to be available, such as the Microsoft behaviour specification (TODO: quote) used in the MathType browser extension (TODO:quote).

Markup-language specifications that wish to embed MathML may provide special conditions independent of this recommendation. The conditions should be equivalent and the elements' local-names should remain the same.

7.1.2 Resource Types for MathML Documents

Although rendering MathML expressions often occurs in place in a Web browser, other MathML processing functions take place more naturally in other applications. Particularly common tasks include opening a MathML expression in an equation editor or computer algebra system. It is therefore important to specify the encoding names by which MathML fragments should be identified:

MIME types [RFC2045], [RFC2046] offer a strategy that can be used in current user agents to invoke a MathML processor. This is primarily useful when referencing separate files containing MathML markup from an embed or object element, or within a desktop environment. (TODO: check that this still applies)

[RFC3023] assigns MathML the MIME type application/mathml+xml, which is the official MIME type. The W3C Math Working Group recommends the standard file extension .mml for use in registries that associate file formats with file extensions. In MathML 1.0, text/mathml was given as the suggested MIME type. This has been superseded by RFC3023. In the next section, alternate encoding names are provided for the purposes of desktop transfers.
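A registry of the kind mentioned above can be sketched with Python's standard mimetypes module. The file name formula.mml is a hypothetical example, and the explicit registration is an assumption: it is included because a given platform's MIME database may or may not already know the .mml extension.

```python
import mimetypes

# Associate the recommended extension with the official MIME type.
mimetypes.add_type("application/mathml+xml", ".mml")

# "formula.mml" is a hypothetical file name used only for illustration.
mime_type, _content_encoding = mimetypes.guess_type("formula.mml")
print(mime_type)
```

An application could use such a lookup to decide whether a referenced file should be handed to a MathML processor.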

Issue specify-encoding-names-in-details wiki (member only)
Encoding Names jungle

Encoding names are specified in the section below, and are described in Chapter 5 as attribute values of the annotation* elements. Moreover, content types and encoding names have fairly similar semantics and are even used interchangeably in some environments (e.g. Java's DataFlavor).

It might be worth trying to homogenize our list and perhaps specify a MIME type equivalent for each encoding name.

Resolution None recorded

7.2 Transferring MathML in Desktop Environments

MathML expressions are often exchanged between applications using the familiar copy-and-paste or drag-and-drop paradigms. This section provides recommended ways to process MathML while applying these paradigms.

Applying them will transfer MathML fragments between the contexts of two applications by making them available in several flavors, often called clipboard formats or data flavors. The copy-and-paste paradigm lets an application place content in a central clipboard, one data stream per clipboard format; consuming applications negotiate by choosing to read the data of the format they elect. The drag-and-drop paradigm lets an application offer content by declaring the available formats; potential recipients accept or reject a drop based on this list, and the drop action then lets the receiving application request delivery of the data in the indicated format. The list of flavors is generally ordered, going from the most preferred to the least preferred flavor.

Current desktop platforms offer both of these transfer paradigms using similar transfer architectures. In this section we specify what applications should provide as transfer-flavors, how they should be named, and how they should handle the special semantics, annotation, and annotation-xml elements.

To summarize the two negotiation mechanisms, this section speaks of flavors: each exported flavor has a name (a character string) and a content (a stream of binary data).
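The negotiation common to both paradigms can be modelled in a few lines of Python. This is only a sketch under stated assumptions: flavors are represented as (name, content) pairs in the producer's preferred order, the function name choose_flavor is hypothetical, and the byte strings are placeholders rather than real clipboard data.

```python
def choose_flavor(offered, accepted):
    """Return the first offered (name, content) pair the consumer accepts.

    'offered' is the producer's ordered list, most preferred flavor first.
    'accepted' is the set of flavor names the consumer understands.
    Returns None when there is no common flavor.
    """
    for name, content in offered:
        if name in accepted:
            return name, content
    return None

# Placeholder payloads; a real producer would supply serialized MathML.
offered = [
    ("MathML", b"<placeholder: XML instance>"),
    ("Unicode Text", b"<placeholder: text version>"),
]

print(choose_flavor(offered, {"Unicode Text", "MathML"}))
print(choose_flavor(offered, {"TeX"}))
```

Because the list is walked in the producer's order, a consumer that understands several flavors receives the one the producer ranked highest.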

7.2.1 Basic Flavors' Names and Contents

MathML contains two distinct vocabularies: one for encoding mathematical semantics, called Content Markup (Chapter 4), and one for encoding visual presentation, called Presentation Markup (Chapter 3). Some MathML-aware applications import and export only one of these vocabularies, while others may be capable of producing and consuming both. Consequently, we propose three distinct MathML flavors:

Flavor Name            Description
MathML Content         Instance contains content MathML markup only
MathML Presentation    Instance contains presentation MathML markup only
MathML                 Any well-formed MathML instance; presentation markup, content markup, or a mixture of the two is allowed

Note that Content MathML, Presentation MathML and MathML are the exact strings that should be used to describe the flavors described above. On operating systems that allow it, applications should register these names (e.g. with Windows' RegisterClipboardFormat).

When transferring MathML, for example when placing it within a clipboard, an application MUST ensure the content is a well-formed XML instance of a MathML schema. Specifically:

  1. The instance MUST begin with an XML processing instruction, e.g. <?xml version="1.0"?>

  2. The instance MUST contain exactly one root math element.

  3. Since MathML is frequently embedded within other XML document types, the instance MUST declare the MathML namespace on the root math element. In addition, the instance SHOULD use a schemaLocation attribute on the math element to indicate the location of MathML schema documents against which the instance is valid. Note that the presence of the schemaLocation attribute does not require a consumer of the MathML instance to obtain or use the cited schema documents.

  4. The instance MUST use numeric character references (e.g. &#x03b1;) rather than character entity names (e.g. &alpha;) for greater interoperability.

  5. The character encoding for the instance MUST either be specified in the XML header or be UTF-16 or UTF-8. UTF-16-encoded data MUST begin with a byte-order mark (BOM). If no BOM or encoding is given, the character encoding will be assumed to be UTF-8.
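The requirements above can be approximated with Python's standard xml.etree.ElementTree module, as in the sketch below. The function name to_transfer_bytes and the mml prefix are assumptions chosen for illustration. The sketch does not add a schemaLocation attribute, and it relies on the fact that pure ASCII output is also valid UTF-8, the encoding assumed when none is declared.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

def to_transfer_bytes(xml_text):
    """Serialize a MathML instance in a form suitable for transfer."""
    root = ET.fromstring(xml_text)
    # Exactly one root element, which must be math in the MathML namespace.
    if root.tag != "{%s}math" % MATHML_NS:
        raise ValueError("root element must be math in the MathML namespace")
    # Note: this updates a module-wide prefix table in ElementTree.
    ET.register_namespace("mml", MATHML_NS)
    # Writing US-ASCII makes ElementTree emit every non-ASCII character as a
    # numeric character reference; the namespace is declared on the root.
    body = ET.tostring(root, encoding="us-ascii")
    return b'<?xml version="1.0"?>\n' + body

# Sample input containing the Greek letter alpha as a literal character.
payload = to_transfer_bytes(
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>\u03b1</mi></math>'
)
print(payload.decode("ascii"))
```

Named entities such as &alpha; never appear in the output, because the parser has already resolved the input to characters before serialization.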

7.2.2 Recommended Behaviours when Transferring

Applications that transfer MathML SHOULD adhere to the following conventions:

  1. Applications that have pure presentation markup and/or pure content markup versions of an expression SHOULD offer as many of these two flavors as are available.

  2. When both presentation and content are exported, recipients should consider it equivalent to a single MathML instance in which presentation and content are combined at the top level using MathML's semantics element (see Section 5.5.1 Top-level Parallel Markup). (TODO: issue: in DnD you can't read several, at least in java) The order between flavors determines whether presentation wraps content, or vice-versa. Usually, Presentation MathML should be offered first so that it wraps the Content MathML.

  3. When an application has a mixed presentation and content version in addition to pure presentation and/or content versions, it should export the mixed version after the pure presentation and/or content markup versions, and mark it as the generic MathML flavor.

  4. When an application cannot produce pure presentation and/or content markup versions, or cannot determine whether MathML data is pure presentation or content markup (e.g. data being passed through from a third application), it should export only one version marked as the generic MathML flavor.

  5. An application that only has pure presentation and/or content markup versions of an expression available SHOULD NOT export a second copy of the data marked as the generic MathML flavor.

  6. When an application exports a MathML fragment whose root element is a semantics element, it SHOULD offer, after the flavors above, a flavor for each annotation or annotation-xml element: the flavor should be given by the encoding attribute value, and the content should be the child text in UTF-8 (if the annotation element contains only textual data), a valid XML fragment (if the annotation-xml element contains children), or the data resulting from requesting the URL given by the href attribute.

  7. As a final fallback applications SHOULD export a version of the data in plain-text flavor (such as CF_UNICODETEXT, UnicodeText, NSStringPboardType, text/plain, ...). When an application has multiple versions of an expression available, it may choose the version to export as text at its discretion. Since some older MathML-aware programs expect MathML instances transferred as text to begin with a math element, the text version should generally omit the XML processing instruction, DOCTYPE declaration and other XML prolog material before the math element. Similarly, the BOM should be omitted for Unicode text encoded as UTF-16. Note, the Unicode text version of the data should always be the last flavor exported, following the principle that exported flavors should be ordered with the most specific flavor first and the least specific flavor last.
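The ordering conventions above (excluding the annotation flavors of item 6) can be sketched as follows. The function name export_flavors is hypothetical, the flavor names are the strings used in the examples of this section, and the choice of which version to export as text is an assumption: the sketch simply takes the first version available.

```python
XML_DECL = '<?xml version="1.0"?>\n'

def export_flavors(presentation=None, content=None, generic=None):
    """Return (flavor name, content) pairs, most specific flavor first.

    Each argument is a serialized math element without XML prolog, or None.
    'generic' stands for a mixed or unclassified MathML instance.
    """
    flavors = []
    if presentation is not None:
        flavors.append(("Presentation MathML", XML_DECL + presentation))
    if content is not None:
        flavors.append(("Content MathML", XML_DECL + content))
    # The generic flavor follows the pure ones. It is offered only when a
    # separate mixed or unclassified instance exists, never as a second
    # copy of a pure version.
    if generic is not None:
        flavors.append(("MathML", XML_DECL + generic))
    # The plain-text fallback comes last and omits the XML prolog.
    for candidate in (presentation, content, generic):
        if candidate is not None:
            flavors.append(("Unicode Text", candidate))
            break
    return flavors

m = '<math xmlns="http://www.w3.org/1998/Math/MathML">...</math>'
print([name for name, _ in export_flavors(presentation=m, content=m, generic=m)])
print([name for name, _ in export_flavors(generic=m)])
```

An application with only an unclassified instance thus exports the generic flavor plus text, while one with only pure presentation markup exports the presentation flavor plus text.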

7.2.3 Discussion

For purposes of determining whether a MathML instance is pure content markup or pure presentation markup, the math element and the semantics, annotation and annotation-xml elements should be regarded as belonging to both the presentation and content markup vocabularies. This is obvious for the root math element which is required for all MathML expressions. However, the semantics element and its child annotation elements comprise an arbitrary annotation mechanism within MathML, and are not tied to either presentation or content markup. Consequently, applications consuming MathML should always process these four elements even if the application only implements one of the two vocabularies.
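A classification along these lines might be sketched as below. The element sets are small illustrative subsets, not the full vocabularies of Chapter 3 and Chapter 4, and the function name classify is hypothetical. Two further assumptions are made: elements outside the MathML namespace are ignored, and markup found inside annotation-xml counts toward the result.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Elements regarded as belonging to both vocabularies.
NEUTRAL = {"math", "semantics", "annotation", "annotation-xml"}
# Illustrative subsets only; real applications need the complete lists.
PRESENTATION = {"mi", "mn", "mo", "mtext", "mrow", "mfrac", "msqrt", "msup", "msub"}
CONTENT = {"apply", "ci", "cn", "csymbol", "bind", "bvar", "plus", "times"}

def split_tag(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag

def classify(xml_text):
    """Return 'presentation', 'content', 'mixed' or 'unknown'."""
    kinds = set()
    for element in ET.fromstring(xml_text).iter():
        uri, local = split_tag(element.tag)
        if uri != MATHML_NS or local in NEUTRAL:
            continue
        if local in PRESENTATION:
            kinds.add("presentation")
        elif local in CONTENT:
            kinds.add("content")
        else:
            # Name not in the illustrative subsets: do not guess.
            return "unknown"
    if len(kinds) == 1:
        return kinds.pop()
    return "mixed" if kinds else "unknown"

NS_DECL = 'xmlns="http://www.w3.org/1998/Math/MathML"'
print(classify('<math %s><semantics><mrow><mi>x</mi></mrow>'
               '<annotation encoding="TeX">x</annotation></semantics></math>' % NS_DECL))
```

An instance classified as 'unknown' would be exported under the generic MathML flavor only.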

It is worth noting that the above recommendations allow agents producing MathML to provide binary data for the clipboard, for example as an image or an application-specific format. The sole method to do so is to reference the binary data by the href attribute since XML child-text does not allow arbitrary byte-streams.

While the above recommendations are intended to improve interoperability between MathML-aware applications utilizing the transfer flavors, it should be noted that they do not guarantee interoperability. For example, references to external resources (e.g. stylesheets, etc.) in MathML data can also cause interoperability problems if the consumer of the data is unable to locate them, just as can happen when cutting and pasting HTML or many other data types. Applications that make use of references to external resources are encouraged to make users aware of potential problems and provide alternate ways for obtaining the referenced resources. In general, consumers of MathML data containing references they cannot resolve or do not understand should ignore them.

7.2.4 Examples

7.2.4.1 Example 1

An e-Learning application has a database of quiz questions, some of which contain MathML. The MathML comes from multiple sources, and the e-Learning application merely passes the data on for display, but does not have sophisticated MathML analysis capabilities. Consequently, the application is not aware whether a given MathML instance is pure presentation or pure content markup, nor does it know whether the instance is valid with respect to a particular version of the MathML schema. It therefore places the following data formats on the clipboard:

Flavour Name Flavor Content
MathML
<?xml version="1.0"?>
<math xmlns="http://www.w3.org/1998/Math/MathML">...</math>
Unicode Text
<math xmlns="http://www.w3.org/1998/Math/MathML">...</math>

7.2.4.2 Example 2

An equation editor is able to generate pure presentation markup, valid with respect to MathML 2.0, 2nd Edition. Consequently, it exports the following flavors:

Flavour Name Flavor Content
Presentation MathML
<?xml version="1.0"?>
<math xmlns="http://www.w3.org/1998/Math/MathML">...</math>
Tiff (a rendering sample)
Unicode Text
<math xmlns="http://www.w3.org/1998/Math/MathML">...</math>

7.2.4.3 Example 3

A schema-based content management system contains multiple MathML representations of a collection of mathematical expressions, including mixed markup from authors, pure content markup for interfacing to symbolic computation engines, and pure presentation markup for print publication. Due to the system's use of schemas, markup is stored with a namespace prefix. The system therefore can transfer the following data:

Flavour Name Flavor Content
Presentation MathML
<?xml version="1.0"?>
<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3.org/Math/XMLSchema/mathml2/mathml2.xsd">
  <mml:mrow>
  ...
  </mml:mrow>
</mml:math>
Content MathML
<?xml version="1.0"?>
<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3.org/Math/XMLSchema/mathml2/mathml2.xsd">
  <mml:apply>
  ...
  </mml:apply>
</mml:math>
MathML
<?xml version="1.0"?>
<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3.org/Math/XMLSchema/mathml2/mathml2.xsd">
  <mml:mrow>
    <mml:apply> ... content markup within presentation markup ... </mml:apply>
    ...
  </mml:mrow>
</mml:math> 
TeX
{x \over x-1}
Unicode Text
<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3.org/Math/XMLSchema/mathml2/mathml2.xsd">
  <mml:mrow>
  ...
  </mml:mrow>
</mml:math>

7.2.4.4 Example 4

A similar content management system is web-based and delivers MathML representations of mathematical expressions. The system is able to produce presentation MathML, content MathML, TeX and pictures in PNG format. In web pages being browsed, it could produce a MathML fragment such as the following:

<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <mml:semantics>
    <mml:mrow>...</mml:mrow>
    <mml:annotation-xml encoding="MathML content">...</mml:annotation-xml>
    <mml:annotation encoding="TeX">{1 \over x}</mml:annotation>
    <mml:annotation encoding="image/png" href="formula3848.png"/>
  </mml:semantics>
</mml:math>

A web browser that receives such a fragment and tries to export it as part of a drag-and-drop action can offer the following flavors:

Flavour Name Flavor Content
Presentation MathML
<?xml version="1.0"?>
<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3.org/Math/XMLSchema/mathml2/mathml2.xsd">
  <mml:mrow>
  ...
  </mml:mrow>
</mml:math>
Content MathML
<?xml version="1.0"?>
<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3.org/Math/XMLSchema/mathml2/mathml2.xsd">
  <mml:apply>
  ...
  </mml:apply>
</mml:math>
MathML
<?xml version="1.0"?>
<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3.org/Math/XMLSchema/mathml2/mathml2.xsd">
  <mml:mrow>
    <mml:apply> ... content markup within presentation markup ... </mml:apply>
    ...
  </mml:mrow>
</mml:math> 
TeX
{1 \over x}
image/png (the content of the picture file, requested from formula3848.png)
Unicode Text
<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3.org/Math/XMLSchema/mathml2/mathml2.xsd">
  <mml:mrow>
  ...
  </mml:mrow>
</mml:math>
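The annotation-derived flavors of this example (TeX and image/png, plus the content markup carried in annotation-xml) could be enumerated as in the following sketch. The function name annotation_flavors and the (name, kind, payload) tuples are assumptions made for illustration; fetching the data behind an href is left to the caller, and the sketch also accepts a semantics element that is the only child of math, as in the fragment above.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SEMANTICS = "{%s}semantics" % MATHML_NS
ANNOTATION = "{%s}annotation" % MATHML_NS
ANNOTATION_XML = "{%s}annotation-xml" % MATHML_NS

def annotation_flavors(xml_text):
    """List (flavor name, kind, payload) for each annotation child."""
    root = ET.fromstring(xml_text)
    semantics = root if root.tag == SEMANTICS else root.find(SEMANTICS)
    if semantics is None:
        return []
    flavors = []
    for child in semantics:
        if child.tag not in (ANNOTATION, ANNOTATION_XML):
            continue  # the first child is the annotated expression itself
        name = child.get("encoding")
        href = child.get("href")
        if href is not None:
            # Content is whatever a request for this URL returns.
            flavors.append((name, "href", href))
        elif child.tag == ANNOTATION_XML:
            xml = "".join(ET.tostring(c, encoding="unicode") for c in child)
            flavors.append((name, "xml", xml))
        else:
            flavors.append((name, "text", (child.text or "").encode("utf-8")))
    return flavors

# The fragment delivered by the web-based system, with its elisions kept.
fragment = r"""<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <mml:semantics>
    <mml:mrow>...</mml:mrow>
    <mml:annotation-xml encoding="MathML content">...</mml:annotation-xml>
    <mml:annotation encoding="TeX">{1 \over x}</mml:annotation>
    <mml:annotation encoding="image/png" href="formula3848.png"/>
  </mml:semantics>
</mml:math>"""

for name, kind, payload in annotation_flavors(fragment):
    print(name, kind, payload)
```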

7.3 Combining MathML and Other Formats

Since MathML is most often generated by authoring tools, it is particularly important that opening a MathML expression in an editor should be easy to do and to implement. In many cases, it will be desirable for an authoring tool to record some information about its internal state along with a MathML expression, so that an author can pick up editing where he or she left off. The following markup is proposed:

  1. For any extra information that is expected to be semantically equivalent, MathML-3 proposes the use of the semantics element presented in Section 5.3 Attributions in Strict Content MathML.

  2. For any extra information that cannot be declared as such and is expected to be private to the application, MathML-3 suggests using the maction element; see Section 3.6.1 Bind Action to Sub-Expression (maction).

7.3.1 Mixing MathML and HTML

In order to fully integrate MathML into XHTML, it should be possible not only to embed MathML in XHTML, as described in Section 7.1.1 Recognizing MathML in an XML Model, but also to embed XHTML in MathML. However, the problem of supporting XHTML in MathML presents many difficulties. Therefore, at present, the MathML specification does not permit any XHTML elements within a MathML expression, although this may be subject to change in a future revision of MathML.

In most cases, XHTML elements (headings, paragraphs, lists, etc.) either do not apply in mathematical contexts, or MathML already provides equivalent or better functionality specifically tailored to mathematical content (tables, mathematics style changes, etc.). However, there are two notable exceptions, the XHTML anchor and image elements. For this functionality, MathML relies on the general XML linking and graphics mechanisms being developed by other W3C Activities.

7.3.2 Linking

Issue Linking-and-marking-ids wiki (member only)
Linking and Marking IDs

We wish to stop using XLink for links, since it seems unimplemented, and add the necessary attributes to presentation elements.

Resolution None recorded

MathML has no element that corresponds to the XHTML anchor element a. In XHTML, anchors are used both to make links, and to provide locations to which a link can be made. MathML, as an XML application, defines links by the use of the mechanism described in the W3C Recommendation "XML Linking Language" [XLink].

A MathML element is designated as a link by the presence of the attribute xlink:href. To use the attribute xlink:href, it is also necessary to declare the appropriate namespace. Thus, a typical MathML link might look like:

<mrow xmlns:xlink="http://www.w3.org/1999/xlink"
      xlink:href="sample.xml">
  ...
</mrow>

MathML designates that almost all elements can be used as XML linking elements. The only elements that cannot serve as linking elements are those which exist primarily to disambiguate other MathML constructs and in general do not correspond to any part of a typical visual rendering. The full list of exceptional elements that cannot be used as linking elements is given in the table below.

MathML elements that cannot be linking elements
mprescripts none
malignmark maligngroup
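A processor might locate linking elements as in the sketch below, which looks for the xlink:href attribute on MathML elements and skips the four elements listed above. The function name find_links and the sample markup are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

MATHML_PREFIX = "{http://www.w3.org/1998/Math/MathML}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
# Elements that cannot serve as linking elements.
NON_LINKING = {"mprescripts", "none", "malignmark", "maligngroup"}

def find_links(xml_text):
    """Return (local element name, link target) for each MathML link."""
    links = []
    for element in ET.fromstring(xml_text).iter():
        href = element.get(XLINK_HREF)
        if href is None or not element.tag.startswith(MATHML_PREFIX):
            continue
        local = element.tag[len(MATHML_PREFIX):]
        if local not in NON_LINKING:
            links.append((local, href))
    return links

# Hypothetical sample: a link on mrow, and an attribute on 'none' that
# is ignored because 'none' cannot be a linking element.
sample = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML" '
    'xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<mrow xlink:href="sample.xml"><mi>x</mi></mrow>'
    '<mmultiscripts><mi>F</mi><none xlink:href="ignored.xml"/></mmultiscripts>'
    '</math>'
)
print(find_links(sample))
```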

Note that the XML Linking [XLink] and XML Pointer Language [XPointer] specifications also define how to link into a MathML expression. Be aware, however, that such links may or may not be properly interpreted in current software.

7.3.3 Images

The img element has no MathML equivalent. The decision to omit a general mechanism for image inclusion from MathML was based on several factors. However, the main reason for not providing an image facility is that MathML takes great pains to make the notational structure and mathematical content it encodes easily available to processors, whereas information contained in images is only available to a human reader looking at a visual representation. Thus, for example, in the MathML paradigm, it would be preferable to introduce new glyphs via the mglyph element which at a minimum identifies them as glyphs, rather than simply including them as images.

7.3.4 MathML and Graphical Markup

Apart from the introduction of new glyphs, many of the situations where one might be inclined to use an image amount to displaying labeled diagrams. For example, knot diagrams, Venn diagrams, Dynkin diagrams, Feynman diagrams and commutative diagrams all fall into this category. As such, their content would be better encoded via some combination of structured graphics and MathML markup. However, at the time of this writing, it is beyond the scope of the W3C Math Activity to define a markup language to encode such a general concept as "labeled diagrams." (See http://www.w3.org/Math for current W3C activity in mathematics and http://www.w3.org/Graphics for the W3C graphics activity.)

One mechanism for embedding additional graphical content is via the semantics element, as in the following example:

<semantics>
  <apply>
    <intersect/>
    <ci>A</ci>
    <ci>B</ci>
  </apply>
  <annotation-xml encoding="SVG1.1">
    <svg xmlns="http://www.w3.org/2000/svg"  viewBox="0 0 290 180">
      <clipPath id="a">
      <circle cy="90" cx="100" r="60"/>
      </clipPath>
      <circle fill="#AAAAAA" cy="90" cx="190"
              r="60" style="clip-path:url(#a)"/>
      <circle stroke="black" fill="none" cy="90" cx="100" r="60"/>
      <circle stroke="black" fill="none" cy="90" cx="190" r="60"/>
    </svg>
  </annotation-xml>
  <annotation-xml encoding="application/xhtml+xml">
    <img xmlns="http://www.w3.org/1999/xhtml" src="intersect.gif" alt="A intersect B"/>
  </annotation-xml>
</semantics>

Here, the annotation-xml elements are used to indicate alternative representations of the Content MathML depiction of the intersection of two sets. The first one is in the "Scalable Vector Graphics" format [SVG1.1] (see [XHTML-MathML-SVG] for the definition of an XHTML profile integrating MathML and SVG), the second one uses the XHTML img element embedded as an XHTML fragment. In this situation, a MathML processor can use any of these representations for display, perhaps producing a graphical format such as the image below.

[Image: intersect]

Note that the semantics representation of this example is given in the Content MathML markup, as the first child of the semantics element. In this regard, it is the representation most analogous to the alt attribute of the img element in XHTML, and would likely be the best choice for non-visual rendering.
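On the consuming side, the choice between these representations might be sketched as follows. The function name choose_representation is hypothetical, and the policy shown (the first annotation, in document order, whose encoding the processor supports, else the first child) is an assumption; this specification does not mandate a selection order.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
ANNOTATION_TAGS = {
    "{%s}annotation" % MATHML_NS,
    "{%s}annotation-xml" % MATHML_NS,
}

def choose_representation(semantics, supported):
    """Pick the element a processor should use from a semantics element."""
    for child in list(semantics)[1:]:
        if child.tag in ANNOTATION_TAGS and child.get("encoding") in supported:
            return child
    # Fall back to the first child, e.g. for non-visual rendering.
    return semantics[0]

# Reduced version of the example above; the SVG drawing is abbreviated.
sample = ET.fromstring("""
<semantics xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><intersect/><ci>A</ci><ci>B</ci></apply>
  <annotation-xml encoding="SVG1.1">
    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 290 180"/>
  </annotation-xml>
  <annotation-xml encoding="application/xhtml+xml">
    <img xmlns="http://www.w3.org/1999/xhtml" src="intersect.gif" alt="A intersect B"/>
  </annotation-xml>
</semantics>
""".strip())

print(choose_representation(sample, {"SVG1.1"}).get("encoding"))
print(choose_representation(sample, set()).tag)
```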

7.4 Using CSS with MathML

When MathML is rendered in an environment that supports [CSS21], controlling mathematics style properties with a CSS stylesheet is obviously desirable. MathML 2.0 has significantly redesigned the way presentation element style properties are organized to facilitate better interaction between MathML renderers and CSS style mechanisms. It introduces four new mathematics style attributes with logical values. Roughly speaking, these attributes can be viewed as the proper selectors for CSS rules that affect MathML.

Controlling mathematics styling is not as simple as it might first appear because mathematics styling and text styling are quite different in character. In text, meaning is primarily carried by the relative positioning of characters next to one another to form words. Thus, although the font used to render text may impart nuances to the meaning, transforming the typographic properties of the individual characters leaves the meaning of text basically intact. By contrast, in mathematical expressions, individual characters in specific typefaces tend to function as atomic symbols. Thus, in the same equation, a bold italic 'x' and a normal italic 'x' are almost always intended to be two distinct symbols that mean different things. In traditional usage, there are eight basic typographical categories of symbols. These categories are described by mathematics style attributes, primarily the mathvariant attribute.

Text and mathematics layout also obviously differ in that mathematics uses 2-dimensional layout. As a result, many of the style parameters that affect mathematics layout have no textual analogs. Even in cases where there are analogous properties, the sensible values for these properties may not correspond. For example, traditional mathematical typography usually uses italic fonts for single character identifiers, and upright fonts for multicharacter identifiers. In text, italicization does not usually depend on the number of letters in a word. Thus although a font-slant property makes sense for both mathematics and text, the natural default values are quite different.

Because of the difference between text and mathematics styling, only the styling aspects that do not affect layout are good candidates for CSS control. MathML 3.0 captures the most important properties with the new mathematics style attributes, and users should try to use them whenever possible over more direct, but less robust, approaches. A sample CSS stylesheet illustrating the use of the mathematical style attributes is available in Appendix E Sample CSS Style Sheet for MathML. Users should not count on MathML implementations to implement any other properties than those in the Font, Colors, and Outlines families of properties described in [CSS2] and implementations should only implement these properties within MathML-elements. Note that these prohibitions do not apply to CSS stylesheets that implement the MathML-CSS profile. (TODO: quote).

TODO: add equivalence statements and conflict resolution and stress that CSS changes should not be considered meaningful.

Generally speaking, the model for CSS interaction with the math style attributes runs as follows. A CSS style sheet might provide a style rule such as:

math *[mathsize="small"] {
  font-size: 80%;
}

This rule sets the CSS font-size property for all descendants of the math element that have the mathsize attribute set to small. A MathML renderer would then query the style engine for the CSS environment, and use the values returned as input to its own layout algorithms. MathML does not specify the mechanism by which style information is inherited from the environment. However, some suggested rendering rules for the interaction between properties of the ambient style environment and MathML-specific rendering rules are discussed in Section 3.2.2 Mathematics style attributes common to token elements, and more generally throughout Chapter 3 Presentation Markup.
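To see which elements an attribute selector on mathsize picks out, the following Python sketch emulates the match with the limited XPath support of the standard xml.etree.ElementTree module. It is not a CSS engine; the function name elements_with_mathsize and the sample markup are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def elements_with_mathsize(xml_text, value):
    """Return the descendants of the root whose mathsize equals 'value'."""
    root = ET.fromstring(xml_text)
    # ".//*" visits every descendant of the root, at any depth.
    return root.findall(".//*[@mathsize='%s']" % value)

# Hypothetical sample: the matching mi is a grandchild of math.
sample = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mrow><mi mathsize="small">x</mi><mn>2</mn></mrow>'
    '</math>'
)
for element in elements_with_mathsize(sample, "small"):
    print(element.tag)
```

The match reaches elements at any depth below math, and because it keys on a MathML attribute rather than on document-wide typographic properties, it leaves other content untouched.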

It should be stressed, however, that some caution is required in writing CSS stylesheets for MathML. Because changing typographic properties of mathematics symbols can change the meaning of an equation, stylesheets should be written in such a way that changes to document-wide typographic styles do not affect embedded MathML expressions. By using the MathML 2.0 mathematics style attributes as selectors for CSS rules, this danger is minimized.

Another pitfall to be avoided is using CSS to provide typographic style information necessary to the proper understanding of an expression. Expressions dependent on CSS for meaning will not be portable to non-CSS environments such as computer algebra systems. By using the logical values of the new MathML 3.0 mathematics style attributes as selectors for CSS rules, it can be assured that style information necessary to the sense of an expression is encoded directly in the MathML.

MathML 3.0 does not specify how a user agent should process style information, because there are many non-CSS MathML environments, and because different user agents and renderers have widely varying degrees of access to CSS information. In general, however, developers are urged to provide as much CSS support for MathML as possible.
