Content Markup

4.1 Introduction

In MathML3, content markup is divided into two subsets: "Strict" and "Pragmatic" Content MathML. The first subset uses a minimal set of elements representing the meaning of a mathematical expression in a uniform structure, while the second one tries to strike a pragmatic balance between verbosity and formality. Both forms of content expressions are legitimate and have their role in representing mathematics. Strict Content MathML is canonical in a sense and simplifies the implementation of content MathML processors and the comparison of content expressions, whereas Pragmatic Content MathML is much simpler and more intuitive for humans to understand, read, and write.

Strict Content MathML3 expressions can directly be given a formal semantics in terms of "OpenMath Objects" [OpenMath2004], and we interpret Pragmatic Content MathML3 expressions by specifying equivalent Strict variants, so that they inherit their semantics.

Editorial note: MiKo
We are using the notions of "Strict" and "Pragmatic" Content MathML in this working draft, even though they do not fully convey the intention of the representation choices. However, they carry the intuition much better than the terms "canonical" and "legacy" we used before, since they are less judgmental.

4.2 Strict Content MathML

4.2.1 The structure of MathML3 Content Expressions

MathML content encoding is based on the concept of an expression tree built up from

basic expressions, i.e. Numbers, Symbols, and Identifiers
derived expressions, i.e. function applications and binding expressions, and
attributions
error markup

As a general rule, the terminal nodes in the tree represent basic mathematical objects such as numbers, variables, arithmetic operations and so on. The internal nodes in the tree generally represent some kind of function application or other mathematical construction that builds up a compound object. Function application provides the most important example; an internal node might represent the application of a function to several arguments, which are themselves represented by the terminal nodes underneath the internal node.

This section provides the basic XML Encoding of content MathML expression trees. General usage and the mechanism used to associate mathematical meaning with symbols are provided here. [mathml3cds] provides a complete listing of the specific Content MathML symbols defined by this specification along with full reference information including attributes, syntax, and examples. It also describes the intended semantics of those symbols and suggests default renderings. The rules for using presentation markup within content markup are explained in Section 5.4.2 Presentation Markup in Content Markup.

4.2.2 Encoding OpenMath Objects

Strict Content MathML is designed to be an XML encoding of OpenMath Objects (see [OpenMath2004]), which constitute the semantics of strict content MathML expressions. The table below gives an element-by-element correspondence between the OpenMath XML encoding of OpenMath objects and strict content MathML.

strict Content MathML	OpenMath
`cn`	`OMI`, `OMF`
`csymbol`	`OMS`
`ci`	`OMV`
`apply`	`OMA`
`bind`	`OMBIND`
`bvar`	`OMBVAR`
`condition`	`OMC`
`share`	`OMR`
`semantics`	`OMATTR`, `OMATP`
`annotation`, `annotation-xml`	`OMFOREIGN`
`cerror`	`OME`

Note that with this correspondence, strict content MathML also gains the OpenMath binary encoding as a space-efficient way of encoding content MathML expressions.

4.2.3 Numbers

The cn element is the MathML token element used to represent numbers. The supported types of numbers include integers, real numbers, double precision floating point numbers, rational numbers and complex numbers. Where it makes sense, the base in which the number is written can be specified. For most numeric values, the content of a cn element should be either PCDATA or other cn elements.

The permissible attributes on the cn are:

Name	Values	Default
`type`	"integer" \| "real" \| "double" \| "e-notation" \| "rational" \| "complex-cartesian" \| "complex-polar"	real
`base`	number	10

The attribute type is used to specify the kind of number being represented. The pre-defined values are given in the table above. Unless otherwise specified, the default "real" is used.

The attribute base is used to specify how the content is to be parsed. The attribute value is a base 10 positive integer giving the value of base in which the PCDATA is to be interpreted. The base attribute should only be used on elements with type "integer" or "real". Its use on cn elements of other type is deprecated. The default value for base is "10".

Each data type implies that the content be of a certain form, as detailed below.

integer

An integer is represented by an optional sign followed by a string of one or more "digits". How a "digit" is interpreted depends on the base attribute. If base is present, it specifies the base for the digit encoding, and its value is itself written in base 10. Thus base='16' specifies a hexadecimal encoding.

When base > 10, letters are used in alphabetical order as digits. For example,

<cn base="16">7FE0</cn>

encodes the number written as 32736 in base ten.

When base > 36, some integers cannot be represented using numbers and letters alone and it is up to the application what additional characters (if any) may be used for digits. For example,

<cn base="1000">10F</cn>

represents the number written in base 10 as 1,000,015. However, the number written in base 10 as 1,000,037 cannot be represented using letters and numbers alone when base is 1000.
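As a non-normative illustration, the digit interpretation described above can be sketched in Python. The function name parse_cn_integer and the digit alphabet (0-9 followed by A-Z, case-insensitive) are assumptions of this sketch and not part of MathML; an application supporting digit values above 35 would have to supply further digit characters.

```python
import string

# Assumed digit alphabet: 0-9 then A-Z, giving digit values 0..35.
DIGITS = string.digits + string.ascii_uppercase


def parse_cn_integer(content: str, base: int = 10) -> int:
    """Interpret the PCDATA of an integer-typed cn element in the given base."""
    text = content.strip().upper()
    sign = 1
    if text[:1] in ("+", "-"):
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    if not text:
        raise ValueError("an integer needs at least one digit")
    value = 0
    for ch in text:
        digit = DIGITS.find(ch)
        # A digit value must be smaller than the base. For base > 36 some
        # digit values have no spelling in this alphabet at all.
        if digit < 0 or digit >= base:
            raise ValueError(f"{ch!r} is not a digit in base {base}")
        value = value * base + digit
    return sign * value


print(parse_cn_integer("7FE0", 16))
print(parse_cn_integer("10F", 1000))
```

With this alphabet the largest available digit value is 35, which is why a number such as 1,000,037 (whose last base-1000 digit is 37) has no representation here.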

real

A real number is presented in radix notation. Radix notation consists of an optional sign ("+" or "-") followed by a string of digits possibly separated into an integer and a fractional part by a "decimal point". Some examples are 0.3, 1, and -31.56. If a different base is specified, then the digits are interpreted as digits in that base (in the same way as described for type "integer").
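The following Python sketch shows one possible reading of radix notation in an arbitrary base. It is illustrative only: the name parse_cn_real, the 0-9/A-Z digit alphabet, and the choice of an exact Fraction as the result type are assumptions of the sketch.

```python
from fractions import Fraction
import string

# Assumed digit alphabet: 0-9 then A-Z, giving digit values 0..35.
DIGITS = string.digits + string.ascii_uppercase


def parse_cn_real(content: str, base: int = 10) -> Fraction:
    """Interpret radix notation (optional sign, digits, optional point)."""
    text = content.strip().upper()
    sign = 1
    if text[:1] in ("+", "-"):
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    integer_part, _, fraction_part = text.partition(".")
    digits = integer_part + fraction_part
    if not digits:
        raise ValueError("a real number needs at least one digit")
    value = 0
    for ch in digits:
        digit = DIGITS.find(ch)
        if digit < 0 or digit >= base:
            raise ValueError(f"{ch!r} is not a digit in base {base}")
        value = value * base + digit
    # Each fractional digit shifts the value by one power of the base.
    return sign * Fraction(value, base ** len(fraction_part))


print(parse_cn_real("-31.56"))
print(parse_cn_real("A.8", 16))
```

Returning an exact rational avoids committing to a particular floating point format, which the "real" type does not prescribe.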

double

This type is used to mark up those double-precision floating point numbers that can be represented in the IEEE 754 standard. This includes a subset of the (mathematical) real numbers, negative zero, positive and negative real infinity and a set of "not a number" values.

The content of a cn element may be PCDATA (representing numeric values as described below), an infinity element (representing positive real infinity), a minfinity element (representing negative real infinity) or a notanumber element.

If the content is PCDATA, it is interpreted as a real number in scientific notation. The number then has one or two parts, a significand and possibly an exponent. The significand has the format of a base 10 real number, as described above. The exponent (if present) has the format of a base 10 integer as described above. If the exponent is not present, it is taken to have the value 0. The value of the number is then that of the significand times ten to the power of the exponent.

A special case of PCDATA content is recognized. If a number of the above form has a negative sign and all digits of the significand are zero, then it is taken to be a negative zero in the sense of the IEEE 754 standard.

e-notation

This type is deprecated. It is recommended to use double or real instead.

A real number may be presented in scientific notation using this type. Such numbers have two parts (a significand and an exponent) separated by a <sep/> element. The first part is a real number, while the second part is an integer exponent indicating a power of the base. For example, 12.3<sep/>5 represents 12.3 times 10⁵. The default presentation of this example is 12.3e5.

rational

A rational number is given as two integers giving the numerator and denominator of a quotient. These should themselves be given as nested cn elements.

For backward compatibility, deprecated usage allows the two integers to be given as PCDATA separated by <sep/>. If a base is present in this deprecated use, it specifies the base used for the digit encoding of both integers.
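A processor might read both forms along the following lines. This Python sketch is not normative: the function name parse_cn_rational and the exact shape of the nested markup in the usage lines are assumptions, and Python's built-in int() limits it to bases 2 to 36.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction


def parse_cn_rational(cn: ET.Element) -> Fraction:
    """Read a rational cn given as nested cn elements or as PCDATA with <sep/>."""
    children = list(cn)
    if len(children) == 2 and all(child.tag == "cn" for child in children):
        # Recommended form: numerator and denominator as nested cn elements,
        # each read in its own base.
        num, den = (
            int((child.text or "").strip(), int(child.get("base", "10")))
            for child in children
        )
    elif len(children) == 1 and children[0].tag == "sep":
        # Deprecated form: the base on the outer cn applies to both integers.
        base = int(cn.get("base", "10"))
        num = int((cn.text or "").strip(), base)
        den = int((children[0].tail or "").strip(), base)
    else:
        raise ValueError("unsupported content for a rational cn")
    return Fraction(num, den)


nested = ET.fromstring(
    '<cn type="rational"><cn type="integer">22</cn><cn type="integer">7</cn></cn>'
)
legacy = ET.fromstring('<cn type="rational" base="16">1<sep/>A</cn>')
print(parse_cn_rational(nested), parse_cn_rational(legacy))
```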

complex-cartesian

A complex cartesian number is given as two numbers giving the real and imaginary parts. These should themselves be given as nested cn elements. As for rational numbers, the deprecated use of <sep/> is also allowed.

complex-polar

A complex polar number is given as two numbers giving the magnitude and angle. These should themselves be given as nested cn elements. As for rational numbers, the deprecated use of <sep/> is also allowed.

constant

This type was deprecated in MathML 2.0 and is now no longer supported.

4.2.4 Symbols and Identifiers

The notion of constructing a general expression tree is essentially that of applying an operator to sub-objects. For example, the sum "x+y" can be thought of as an application of the addition operator to two arguments x and y, and the expression "cos(π)" as the application of the cosine function to the number π.

In Content MathML, elements are used for operators and functions to capture the crucial semantic distinction between the function itself and the expression resulting from applying that function to zero or more arguments. This is addressed by making the functions self-contained objects with their own properties and providing an explicit apply construct corresponding to function application. We will consider the apply construct in the next section.

In the sum expression "x+y" above, x and y are typically taken to be "variables", since they have properties, but no fixed value, whereas the addition function is a "constant" or "symbol" as it denotes a specific function, which is defined somewhere externally. (Note that "symbol" is used here in the abstract sense and has no connection with any presentation of the construct on screen or paper.)

4.2.4.1 Content Identifiers

Strict Content MathML3 uses the ci element (for "content identifier") to construct a variable, or an identifier that is not a symbol. Its PCDATA content is interpreted as a name that identifies it. Two variables are considered equal iff their names are identical and they occur in the same scope (see Section 4.2.6 Bindings and Bound Variables for a discussion). A type attribute indicates the type of object the identifier represents. Typically, ci represents a real scalar, but no default is specified.

Name	values	default
type	string	unspecified

4.2.4.2 Content Symbols

Due to the nature of mathematics, the meaning of mathematical expressions must be extensible. The key to extensibility is the ability of the user to define new functions and other symbols to expand the terrain of mathematical discourse. The csymbol element is used to represent a "symbol" in much the same way that ci is used to construct a variable. The difference is that csymbol should refer to some mathematically defined concept with an external definition referenced via the content dictionary attributes, whereas ci is used for identifiers that are essentially "local" to the MathML expression.

In MathML3, external definitions are grouped in Content Dictionaries (structured documents for the definition of mathematical concepts; see [OpenMath2004] and [mathml3cds]).

We need three bits of information to fully identify a symbol: a symbol name, a Content Dictionary name, and (optionally) a Content Dictionary base URI. The symbol name is encoded in the textual content of the csymbol element, and the other two in its attributes cd and cdbase. The Content Dictionary is the location of the declaration of the symbol, consisting of a name and, optionally, a unique prefix called a cdbase which is used to disambiguate multiple Content Dictionaries of the same name. There are multiple encodings for content dictionaries; this referencing scheme does not distinguish between them. If a symbol does not have an explicit cdbase attribute, then it inherits its cdbase from the first ancestor in the XML tree with one, should such an element exist. In this document we have tended to omit the cdbase for brevity.

Name	values	default
cdbase	URI	inherited
cd	URI	required
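The inheritance rule for cdbase can be sketched as a walk up the ancestor chain. The Python fragment below is only an illustration: the helper name effective_cdbase is hypothetical, element names are assumed to be unqualified as in the examples of this section, and a missing cdbase yields None because this draft does not yet fix a default value.

```python
import xml.etree.ElementTree as ET


def effective_cdbase(root: ET.Element, target: ET.Element):
    """Return the cdbase of target, or of its nearest ancestor that has one."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        cdbase = node.get("cdbase")
        if cdbase is not None:
            return cdbase
        node = parent_of.get(node)
    return None  # no explicit or inherited value found


doc = ET.fromstring(
    '<apply cdbase="http://www.example.com">'
    '<csymbol cd="VectorCalculus">Christoffel</csymbol><ci>x</ci>'
    "</apply>"
)
print(effective_cdbase(doc, doc.find("csymbol")))
```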

Editorial note: MiKo
need to fix the default URI here

Current CD default for `csymbol`
Issue default_cd	`wiki (member only)`
We might make the `cd` attribute optional? Then that would refer to the current CD if we are in one, or we could make `cd` inherit like `cdbase`. That would save bandwidth
Resolution	None recorded

There are other properties of the symbol that are not explicit in these fields but whose values may be obtained by inspecting the Content Dictionary specified. These include the symbol definition, formal properties and examples and, optionally, a Role which is a restriction on where the symbol may appear in a MathML expression tree. The possible roles are described in Section 8.5 Symbol Roles.

<csymbol cdbase="http://www.example.com" cd="VectorCalculus">Christoffel</csymbol>

For backwards compatibility with MathML2 and to facilitate the use of MathML within a URI-based framework (such as RDF [rdf] or OWL [owl]), the symbol name and the values of cd and cdbase can be combined into a canonical URI for a MathML symbol, which can be given in the definitionURL attribute. The URI is constructed as follows:

URI = cdbase-value + '/' + cd-value + '#' + name-value

In the case of the Christoffel symbol above this would be the URL

http://www.example.com/VectorCalculus#Christoffel
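The construction can be written down directly. The Python sketch below composes and decomposes such a URI; the function names are hypothetical, and the scheme itself is still subject to the open issues recorded in this section.

```python
def canonical_symbol_uri(cdbase: str, cd: str, name: str) -> str:
    """Compose cdbase-value + '/' + cd-value + '#' + name-value."""
    return f"{cdbase}/{cd}#{name}"


def split_symbol_uri(uri: str):
    """Recover (cdbase, cd, name) from a URI of the canonical form."""
    prefix, hash_sign, name = uri.rpartition("#")
    cdbase, slash, cd = prefix.rpartition("/")
    if not hash_sign or not slash:
        raise ValueError("not of the form cdbase/cd#name")
    return cdbase, cd, name


uri = canonical_symbol_uri("http://www.example.com", "VectorCalculus", "Christoffel")
print(uri)
print(split_symbol_uri(uri))
```

Splitting at the last '#' and the last '/' assumes that neither the symbol name nor the Content Dictionary name contains those characters.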

For backwards compatibility with MathML2, we do not require that the definitionURL point to a content dictionary. But if the URL in this attribute is of the form above, it will be interpreted as the canonical URL of a MathML3 symbol. So the representation above would be equivalent to the one below:

<csymbol definitionURL="http://www.example.com/VectorCalculus#Christoffel">Christoffel</csymbol>

What is the official URI for MathMLCDs
Issue MathML_CDs_URI	`wiki (member only)`
We still have to fix this. Maybe it should correspond to the final resting place for CDs.
Resolution	None recorded

URI encoding of `cdbase`/`cd`/`name` triplet
Issue definitionURL_encoding	`wiki (member only)` ISSUE-17 (member only)
The URI encoding of the triplet we propose here does not work (not yet for MathMLCDs and not at all for OpenMath2 CDs). The URI reference proposed uses a bare name pointer `#Christoffel` at the end, which points to the element that has an `ID`-type attribute with value `Christoffel`, which is not present in either of these formats. Moreover, it does not scale well with extended CD formats like the OMDoc 1.8 format currently under development.
Resolution	None recorded

cdbase default value
Issue cdbase-default	`wiki (member only)` ISSUE-13 (member only)
For the inheritance mechanism to be complete, it would make sense to define a default cdbase attribute value, e.g. at the math element. We'd support expressions ignorant of cdbase as they all are thus far. Something such as `http://www.w3.org/Math/CDs/official` ? Moreover the MathML content dictionaries should contain such.
Resolution	None recorded

4.2.5 Function Application

The most fundamental way of building a compound object in mathematics is by applying a function or an operator to some arguments. MathML supplies an infrastructure to represent this in expression trees, which we will present in this section.

An apply element is used to build an expression tree that represents the result of applying a function or operator to its arguments. The tree corresponds to a complete mathematical expression. Roughly speaking, this means a piece of mathematics that could be surrounded by parentheses or "logical brackets" without changing its meaning.

Name	values	default
cdbase	URI	inherited

For example, (x + y) might be encoded as

<apply><csymbol cd="algebra-logic">plus</csymbol><ci>x</ci><ci>y</ci></apply>

The opening and closing tags of apply specify exactly the scope of any operator or function. The most typical way of using apply is simple and recursive. Symbolically, the content model can be described as:

<apply> op a b </apply>

where the operands a and b are MathML expression trees themselves, and op is a MathML expression tree that represents an operator or function. Note that apply constructs can be nested to arbitrary depth.

An apply may in principle have any number of operands:

<apply> op a b [c...] </apply>

For example, (x + y + z) can be encoded as

<apply>
  <csymbol cd="algebra-logic">plus</csymbol>
  <ci>x</ci>
  <ci>y</ci>
  <ci>z</ci>
</apply>

Mathematical expressions involving a mixture of operations result in nested occurrences of apply. For example, a x + b would be encoded as

<apply><csymbol cd="algebra-logic">plus</csymbol>
  <apply><csymbol cd="algebra-logic">times</csymbol>
    <ci>a</ci>
    <ci>x</ci>
  </apply>
  <ci>b</ci>
</apply>

There is no need to introduce parentheses or to resort to operator precedence in order to parse the expression correctly. The apply tags provide the proper grouping for the re-use of the expressions within other constructs. Any expression enclosed by an apply element is viewed as a single coherent object.
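To illustrate that the nesting of apply elements alone determines the grouping, the following Python sketch prints an expression tree in fully parenthesised prefix form. The function name to_prefix is hypothetical, and the sketch assumes unqualified element names and token elements with plain text content.

```python
import xml.etree.ElementTree as ET


def to_prefix(node: ET.Element) -> str:
    """Render an expression tree as a fully parenthesised prefix string."""
    if node.tag == "apply":
        # The first child is the operator, the remaining ones its operands.
        return "(" + " ".join(to_prefix(child) for child in node) + ")"
    return (node.text or "").strip()


expr = ET.fromstring(
    '<apply><csymbol cd="algebra-logic">plus</csymbol>'
    '<apply><csymbol cd="algebra-logic">times</csymbol><ci>a</ci><ci>x</ci></apply>'
    "<ci>b</ci></apply>"
)
print(to_prefix(expr))
```

No precedence table is consulted anywhere: each pair of parentheses in the output corresponds to exactly one apply element.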

An expression such as (F+G)(x) might be a product, as in

<apply><csymbol cd="algebra-logic">times</csymbol>
  <apply><csymbol cd="algebra-logic">plus</csymbol>
    <ci>F</ci>
    <ci>G</ci>
  </apply>
  <ci>x</ci>
</apply>

or it might indicate the application of the function F + G to the argument x. This is indicated by constructing the sum

<apply><csymbol cd="algebra-logic">plus</csymbol><ci>F</ci><ci>G</ci></apply>

and applying it to the argument x as in

<apply>
  <apply><csymbol cd="algebra-logic">plus</csymbol>
    <ci>F</ci>
    <ci>G</ci>
  </apply>
  <ci>x</ci>
</apply>

Both the function and the arguments may be simple identifiers or more complicated expressions.

The apply element is conceptually necessary in order to distinguish between a function or operator, and an instance of its use. The expression constructed by applying a function to 0 or more arguments is always an element from the codomain of the function. Proper usage depends on the operator that is being applied. For example, the plus operator may have zero or more arguments, while the minus operator requires one or two arguments to be properly formed.

If the object being applied as a function is not already one of the elements known to be a function (such as sin or plus) then it is treated as if it were a function.

4.2.6 Bindings and Bound Variables

Some complex mathematical objects are constructed by the use of bound variables. For instance, the integration variable in an integral expression is a bound variable.

4.2.6.1 Bindings

Such expressions are represented as MathML expression trees using the bind element. Its first child is a MathML expression that represents a binding operator (the integral operator in our example). This can be followed by a non-empty list of bvar elements for the bound variables, possibly augmented by the qualifier element condition (see Section 4.2.7 Qualifiers). The last child is the body of the binding; it is another content MathML expression.

Name	values	default
cdbase	URI	inherited

4.2.6.2 Bound Variables

The bvar element is a special qualifier element that is used to denote the bound variable of a binding expression, e.g. in sums, products, and quantifiers or user defined functions.

Name	values	default
cdbase	URI	inherited

4.2.6.3 Examples

<bind>
  <csymbol cd="algebra-logic">forall</csymbol>
  <bvar><ci>x</ci></bvar>
  <apply><csymbol cd="relations">eq</csymbol>
    <apply><csymbol cd="algebra-logic">minus</csymbol><ci>x</ci><ci>x</ci></apply>
    <cn>0</cn>
  </apply>
</bind>

<bind>
  <csymbol cd="calculus_veccalc">int</csymbol>
  <bvar><ci xml:id="var-x">x</ci></bvar>
  <apply><csymbol cd="algebra-logic">power</csymbol>
    <ci definitionURL="#var-x"><mi>x</mi></ci>
    <cn>7</cn>
  </apply>
</bind>
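The scoping effect of bind can be made concrete by computing the free variables of an expression. The Python sketch below is illustrative only: the function name free_variables is hypothetical, and it handles only ci elements whose content is plain PCDATA (so it does not cover the definitionURL-based identification used in the second example above).

```python
import xml.etree.ElementTree as ET


def free_variables(node: ET.Element, bound=frozenset()) -> set:
    """Collect the names of ci elements that no enclosing bind has bound."""
    if node.tag == "ci":
        name = (node.text or "").strip()
        return set() if name in bound else {name}
    free = set()
    if node.tag == "bind":
        names = {(bvar.findtext("ci") or "").strip() for bvar in node.findall("bvar")}
        inner = bound | names
        children = list(node)
        # The binding operator (first child) lies outside the new scope.
        free |= free_variables(children[0], bound)
        for child in children[1:]:
            if child.tag != "bvar":
                free |= free_variables(child, inner)
        return free
    for child in node:
        free |= free_variables(child, bound)
    return free


closed = ET.fromstring(
    '<bind><csymbol cd="algebra-logic">forall</csymbol><bvar><ci>x</ci></bvar>'
    '<apply><csymbol cd="relations">eq</csymbol>'
    '<apply><csymbol cd="algebra-logic">minus</csymbol><ci>x</ci><ci>x</ci></apply>'
    "<cn>0</cn></apply></bind>"
)
print(free_variables(closed))
```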

Editorial note: MiKo
We need to say something about alpha-conversion here for OpenMath compatibility.

4.2.7 Qualifiers

The integrals we have seen so far have all been indefinite, i.e. the range of the bound variables is unspecified. In many situations, we also want to specify the range of bound variables, e.g. in definite integrals. MathML3 provides the optional condition element as a general restriction mechanism for binding expressions.

4.2.7.1 Conditions

A condition element contains a single child that represents a truth condition. Compound conditions are indicated by applying operators such as and in the condition. Consider for instance the following representation of a definite integral.

Name	values	default
cdbase	URI	inherited

4.2.7.2 Examples

<bind>
  <int/>
  <bvar><ci>x</ci></bvar>
  <condition>
    <apply><csymbol cd="sets">in</csymbol>
      <ci>x</ci>
      <apply><interval/><cn>0</cn><infinity/></apply>
    </apply>
  </condition>
  <apply><sin/><ci>x</ci></apply>
</bind>

Here the condition element restricts the bound variable to range over the non-negative real numbers. A number of common mathematical constructions involve such restrictions, either implicit in conventional notation, such as a bound variable, or thought of as part of the operator rather than an argument, as is the case with the limits of a definite integral.

A typical use of the condition qualifier is to define sets by rule, rather than enumeration. The following markup, for instance, encodes the set {x | x < 1}:

<bind><set/>
  <bvar><ci>x</ci></bvar>
  <condition><apply><lt/><ci>x</ci><cn>1</cn></apply></condition>
  <ci>x</ci>
</bind>

In the context of quantifier operators, this corresponds to the "such that" construct used in mathematical expressions. The next example encodes "for all x in N there exist prime numbers p, q such that p+q = 2x".

<bind><csymbol cd="algebra-logic">forall</csymbol>
  <bvar><ci>x</ci></bvar>
  <condition>
    <apply><csymbol cd="sets">in</csymbol>
      <ci>x</ci>
      <csymbol cd="constants">naturalnumbers</csymbol>
    </apply>
  </condition>
  <bind><csymbol cd="algebra-logic">exists</csymbol>
     <bvar><ci>p</ci></bvar>
     <bvar><ci>q</ci></bvar>
     <condition>
       <apply><csymbol cd="algebra-logic">and</csymbol>
         <apply><csymbol cd="sets">in</csymbol><ci>p</ci><primes/></apply>
         <apply><csymbol cd="sets">in</csymbol><ci>q</ci><primes/></apply>
       </apply>
     </condition>
     <apply><csymbol cd="relations">eq</csymbol>
        <apply><csymbol cd="algebra-logic">plus</csymbol><ci>p</ci><ci>q</ci></apply>
        <apply><csymbol cd="algebra-logic">times</csymbol><cn>2</cn><ci>x</ci></apply>
     </apply>
   </bind>
</bind>

This use extends to multivariate domains by using extra bound variables and a domain corresponding to a cartesian product as in

<bind><intexp/>
  <bvar><ci>x</ci></bvar>
  <bvar><ci>y</ci></bvar>
  <condition>
    <apply><csymbol cd="algebra-logic">and</csymbol>
      <apply><csymbol cd="relations">leq</csymbol><cn>0</cn><ci>x</ci></apply>
      <apply><csymbol cd="relations">leq</csymbol><ci>x</ci><cn>1</cn></apply>
      <apply><csymbol cd="relations">leq</csymbol><cn>0</cn><ci>y</ci></apply>
      <apply><csymbol cd="relations">leq</csymbol><ci>y</ci><cn>1</cn></apply>
    </apply>
  </condition>
  <apply>
    <csymbol cd="algebra-logic">times</csymbol>
    <apply><csymbol cd="algebra-logic">power</csymbol><ci>x</ci><cn>2</cn></apply>
    <apply><csymbol cd="algebra-logic">power</csymbol><ci>y</ci><cn>3</cn></apply>
  </apply>
</bind>

4.2.8 Structure Sharing

To conserve space, MathML3 expression trees can make use of structure sharing.

4.2.8.1 The `share` element

This element has an href attribute whose value is a URI referencing an xml:id attribute of a MathML expression tree. When building the MathML expression tree, the share element is replaced by a copy of the MathML expression tree referenced by the href attribute. Note that this copy is structurally equal, but not identical to the element referenced. The values of the href attribute will often be relative URI references, in which case they are resolved using the base URI of the document containing the share element.

Name	values	default
href	URI

For instance, the mathematical object f(f(f(a,a),f(a,a)),f(f(a,a),f(a,a))) can be encoded as either one of the following representations (and some intermediate versions as well).

Without structure sharing:

<math>
  <apply>
    <ci>f</ci>
    <apply>
      <ci>f</ci>
      <apply>
        <ci>f</ci>
        <ci>a</ci>
        <ci>a</ci>
      </apply>
      <apply>
        <ci>f</ci>
        <ci>a</ci>
        <ci>a</ci>
      </apply>
    </apply>
    <apply>
      <ci>f</ci>
      <apply>
        <ci>f</ci>
        <ci>a</ci>
        <ci>a</ci>
      </apply>
      <apply>
        <ci>f</ci>
        <ci>a</ci>
        <ci>a</ci>
      </apply>
    </apply>
  </apply>
</math>

With structure sharing:

<math>
  <apply>
    <ci>f</ci>
    <apply xml:id="t1">
      <ci>f</ci>
      <apply xml:id="t11">
        <ci>f</ci>
        <ci>a</ci>
        <ci>a</ci>
      </apply>
      <share href="#t11"/>
    </apply>
    <share href="#t1"/>
  </apply>
</math>
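The replacement of share elements by copies of their targets can be sketched as follows. This Python fragment is a non-normative illustration: the name expand_shares is hypothetical, only same-document references of the form "#id" are handled, and the input is assumed to satisfy the acyclicity constraint of Section 4.2.8.2 (otherwise the recursion would not terminate).

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml:id attribute under the XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def expand_shares(root: ET.Element) -> ET.Element:
    """Return a copy of root in which every share is replaced by its target."""
    targets = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}

    def expand(node, keep_ids):
        if node.tag == "share":
            target = targets[node.get("href").lstrip("#")]
            # Copies drop xml:id so that identifiers stay unique.
            clone = expand(target, keep_ids=False)
            clone.tail = node.tail
            return clone
        attrib = {k: v for k, v in node.attrib.items() if keep_ids or k != XML_ID}
        result = ET.Element(node.tag, attrib)
        result.text, result.tail = node.text, node.tail
        for child in node:
            result.append(expand(child, keep_ids))
        return result

    return expand(root, keep_ids=True)


shared = ET.fromstring(
    '<math><apply><ci>f</ci><apply xml:id="t1"><ci>f</ci>'
    '<apply xml:id="t11"><ci>f</ci><ci>a</ci><ci>a</ci></apply>'
    '<share href="#t11"/></apply><share href="#t1"/></apply></math>'
)
expanded = expand_shares(shared)
print(sum(1 for ci in expanded.iter("ci") if ci.text == "a"))
```

The copies are new elements, which reflects the remark that a copy is structurally equal but not identical to the element referenced.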

4.2.8.2 An Acyclicity Constraint

We say that an element dominates all its children and all elements they dominate. A share element dominates its target, i.e. the element that carries the xml:id attribute pointed to by the href attribute. For instance, in the representation above the apply element with xml:id="t1" and also the second share dominate the apply element with xml:id="t11".

The occurrences of the share element must obey the following global acyclicity constraint: An element may not dominate itself. For instance the following representation violates this constraint:

<apply xml:id="foo">
  <csymbol cd="algebra-logic">plus</csymbol>
  <cn>1</cn>
  <apply>
    <csymbol cd="algebra-logic">plus</csymbol>
    <cn>1</cn>
    <share href="#foo"/>
  </apply>
</apply>

Here, the apply element with xml:id="foo" dominates its third child, which dominates the share element, which dominates its target: the element with xml:id="foo". So by transitivity, this element dominates itself, and by the acyclicity constraint, it is not a MathML expression tree. Even though it could be given the interpretation of the infinite sum $1 + (1 + (1 + (1 + \ldots)))$, this would correspond to an infinite tree of applications, which is not admitted by Content MathML.

Note that the acyclicity constraint is not restricted to such simple cases, as the following example shows:

<apply xml:id="bar">
  <csymbol cd="algebra-logic">plus</csymbol>
  <cn>1</cn>
  <share href="#baz"/>
</apply>

<apply xml:id="baz">
  <csymbol cd="algebra-logic">plus</csymbol>
  <cn>1</cn>
  <share href="#bar"/>
</apply>

Here, the apply with xml:id="bar" dominates its third child, the share with href="#baz", which dominates its target apply with xml:id="baz", which in turn dominates its third child, the share with href="#bar"; this finally dominates its target, the original apply element with xml:id="bar". So this pair of representations violates the acyclicity constraint.
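Because the constraint is global, checking it amounts to cycle detection in the "dominates" relation. The Python sketch below uses a depth-first search; the name violates_acyclicity is hypothetical, and references are assumed to be same-document fragment identifiers resolved across all the trees passed in.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml:id attribute under the XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def violates_acyclicity(*roots: ET.Element) -> bool:
    """Return True if some element dominates itself via children and shares."""
    targets = {}
    for root in roots:
        for el in root.iter():
            if el.get(XML_ID):
                targets[el.get(XML_ID)] = el

    def dominated(el):
        # A share directly dominates its target; other elements their children.
        if el.tag == "share":
            target = targets.get(el.get("href", "").lstrip("#"))
            return [target] if target is not None else []
        return list(el)

    state = {}  # element -> "open" (on the current path) or "done"

    def visit(el):
        state[el] = "open"
        for nxt in dominated(el):
            if state.get(nxt) == "open":
                return True  # reached an element already on the current path
            if nxt not in state and visit(nxt):
                return True
        state[el] = "done"
        return False

    return any(el not in state and visit(el) for root in roots for el in root.iter())


bar = ET.fromstring(
    '<apply xml:id="bar"><csymbol cd="algebra-logic">plus</csymbol>'
    '<cn>1</cn><share href="#baz"/></apply>'
)
baz = ET.fromstring(
    '<apply xml:id="baz"><csymbol cd="algebra-logic">plus</csymbol>'
    '<cn>1</cn><share href="#bar"/></apply>'
)
print(violates_acyclicity(bar, baz))
```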

4.2.8.3 Structure Sharing and Binding

Note that the share element is a syntactic referencing mechanism: a share element stands for the exact element it points to. In particular, referencing does not interact with binding in a semantically intuitive way, since it allows for variable capture. Consider for instance

<bind xml:id="outer">
  <lambda/>
  <bvar><ci>x</ci></bvar>
  <apply>
    <ci>f</ci>
    <bind xml:id="inner">
      <lambda/>
      <bvar><ci>x</ci></bvar>
      <share xml:id="copy" href="#orig"/>
    </bind>
    <apply xml:id="orig"><ci>g</ci><ci>x</ci></apply>
  </apply>
</bind>

This represents the term $\lambda{x}.f(\lambda{x}.g(x),g(x))$, which has two sub-terms of the form g(x): one with xml:id="orig" (the one explicitly represented) and one with xml:id="copy", represented by the share element. In the original, the variable x is bound by the outer bind element, and in the copy, the variable x is bound by the inner bind element. We say that the inner bind has captured the variable x.

It is well-known that variable capture does not conserve semantics. For instance, we could use α-conversion to rename the inner occurrence of x into, say, y, arriving at the (same) object $\lambda{x}.f(\lambda{y}.g(y),g(x))$. Using references that capture variables in this way can easily lead to representation errors, and is not recommended.

4.2.8.4 Structure Sharing and `cdbase`

Editorial note: MiKo
say something about `cdbase` here.

4.2.9 Attribution via `semantics`

Content elements can be adorned with additional information via the semantics element, see Section 5.3 Semantic Annotations beyond Alternate Representations for details. As such, the semantics element should be considered part of both presentation MathML and content MathML. MathML3 considers a semantics element (strict) content MathML, if and only if its first child is (strict) content MathML. All MathML processors should process the semantics element, even if they only process one of those subsets.

Editorial note: MiKo
Give an elaborated example from the types note here (or in the primer?), reference Section 8.4 Type Declarations

4.2.10 In Situ Error Markup

A content error expression is made up of a symbol and a sequence of zero or more MathML expression trees. This object has no direct mathematical meaning. Errors occur as the result of some treatment on an expression tree and are thus of real interest only when some sort of communication is taking place. Errors may occur inside other objects and also inside other errors.

Name	values	default
cdbase	URI	inherited

To encode an error caused by a division by zero, we would employ an aritherror Content Dictionary with a DivisionByZero symbol with role error, and use the following expression tree:

<cerror>
  <csymbol cd="aritherror">DivisionByZero</csymbol>  
  <apply><divide/><ci>x</ci><cn>0</cn></apply>
</cerror>

Note that the error should cover the smallest erroneous subexpression so cerror can be a subexpression of a bigger one, e.g.

<apply><csymbol cd="relations">eq</csymbol>
  <cerror>
    <csymbol cd="aritherror">DivisionByZero</csymbol>  
    <apply><divide/><ci>x</ci><cn>0</cn></apply>
  </cerror>
  <cn>0</cn>
</apply>

If an application wishes to signal that the content MathML expression it has received is invalid or is not well-formed, then the offending data must be encoded as a string. For example:

<cerror> 
  <csymbol cd="parser">invalid_XML</csymbol>
  <mtext> &lt;apply&gt;&lt;cos&gt; &lt;ci&gt;v&lt;/ci&gt; &lt;/apply&gt; </mtext>
</cerror>

Note that the < and > characters have been escaped as is usual in an XML document.

4.3 Pragmatic Content MathML

MathML3 content markup differs from earlier versions of MathML in that it has been regularized and based on the content dictionary model introduced by OpenMath [OpenMath2004].

MathML3 also supports MathML2 markup as a pragmatic representation that is easier to read and more intuitive for humans. We will discuss this representation in the following and indicate the equivalent strict representations. Thus the "pragmatic content MathML" representations inherit the meaning from their strict counterparts.

4.3.1 Numbers with "constant" type

The cn element can be used with the value "constant" for the type attribute and the Unicode symbols for the content. This use of cn is deprecated in favor of the number constants exponentiale, imaginaryi, true, false, notanumber, pi, eulergamma, and infinity in the constants content dictionary, or the use of csymbol with an appropriate value for the definitionURL attribute. For example, instead of using the pi element, an instance of <cn type="constant">π</cn> could be used.

4.3.2 `csymbol` Elements with Presentation MathML

Strict equivalent for `csymbol` with pMathML content
Issue csymbol_pmathml_strict	`wiki (member only)`
What is the strict equivalent for the case of a `csymbol` with pMathML content? We do not have a good way of determining that either from the pMathML (we could take the element content stripped of elements; I am assuming this in the example below for now) or from the `definitionURL`. But as David convinced me, this does not work, so we still need to discuss this. We also need to keep the use of symbol names as fragment identifiers in mind.
Resolution	None recorded

In pragmatic MathML3 the csymbol element can contain presentation MathML instead of the symbol name. For example,

<csymbol definitionURL="http://www.example.com/ContDiffFuncs.htm">
  <msup><mi>C</mi><mn>2</mn></msup>
</csymbol>

encodes an atomic symbol that displays visually as C² and that, for purposes of content, is treated as a single symbol representing the space of twice continuously differentiable functions. This pragmatic representation is equivalent to

<semantics>
  <csymbol definitionURL="http://www.example.com/ContDiffFuncs.htm">C2</csymbol>
  <annotation-xml encoding="MathML-Presentation">
    <msup><mi>C</mi><mn>2</mn></msup>
  </annotation-xml>
</semantics>

Both can be used interchangeably.

4.3.3 Symbols and Identifiers With Presentation MathML

In Pragmatic Content MathML, the ci and csymbol elements can contain a general presentation construct (see Section 3.1.6 Summary of Presentation Elements), which is used for rendering (see Section 8.6 Rendering of Content Elements). In this case, the definitionURL attribute can be used to associate a name with a ci element, which identifies it. See Section 4.2.6 Bindings and Bound Variables for a discussion of an important instance of this. For example,

<ci definitionURL="c1"><msub><mi>c</mi><mn>1</mn></msub></ci>

encodes an atomic symbol that displays visually as c₁ and that, for purposes of content, is treated as an atomic concept representing a real number.

Instances of the bound variables are normally recognized by comparing the XML information sets of the relevant ci elements after first carrying out XML space normalization. Such identification can be made explicit by placing an xml:id on the ci element in the bvar element and referring to it using the definitionURL attribute on all other instances. This xml:id based approach is especially helpful when constructions involving bound variables are nested.
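A minimal sketch of the xml:id based identification might look as follows. The identifier var-x and the use of a fragment reference (#var-x) as the definitionURL value are assumptions made for illustration.

```xml
<lambda>
  <!-- The binding occurrence carries the xml:id (hypothetical name "var-x") -->
  <bvar><ci xml:id="var-x">x</ci></bvar>
  <apply><plus/>
    <!-- Every other occurrence points back to the binding occurrence -->
    <ci definitionURL="#var-x">x</ci>
    <cn>1</cn>
  </apply>
</lambda>
```

With such explicit references, an inner binder that reuses the name x would carry its own xml:id, so occurrences can be told apart without relying on name comparison alone.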

It can be necessary to associate additional information with a bound variable or with one or more instances of it. The information might be something like a detailed mathematical type, an alternative presentation or encoding, or a domain of application. Such associations are accomplished in the standard way by replacing a ci element (even inside the bvar element) by a semantics element containing both it and the additional information. Recognition of an instance of the bound variable is still based on the actual ci elements and not the semantics elements or anything else they may contain. The xml:id based approach outlined above may still be used.

A ci element with Presentation MathML content is equivalent to a semantics construction whose first child is a ci whose content is the value of the definitionURL attribute and whose second child is an annotation-xml element with the MathML Presentation. For example, the Strict Content MathML equivalent to the example above would be

<semantics>
  <ci>c1</ci>
  <annotation-xml encoding="MathML-Presentation">
    <msub><mi>c</mi><mn>1</mn></msub>
  </annotation-xml>
</semantics>

4.3.4 Elementary MathML Types on Tokens

The ci element uses the type attribute to specify the basic type of object that it represents. While any CDATA string is a valid type, the predefined types include "integer", "rational", "real", "complex", "complex-polar", "complex-cartesian", "constant", "function" and more generally, any of the names of the MathML container elements (e.g. vector) or their type values. For a more advanced treatment of types, the type attribute is inappropriate. Advanced types require significant structure of their own (for example, vector(complex)) and are probably best constructed as mathematical objects and then associated with a MathML expression through use of the semantics element.

Editorial note: MiKo
Give the Strict equivalent here by techniques from the Types Note

4.3.5 Token Elements

For convenience and backwards compatibility MathML3 provides empty token elements for the operators and functions of the K-14 fragment of mathematics. The general rule is that for any symbol defined in the MathML3 content dictionaries (see Chapter 8 MathML3 Content Dictionaries), there is an empty content element with the same name. For instance, the empty MathML element

<plus/>

is equivalent to the element

<csymbol cdbase="http://w3.org/Math/CD" cd="algebra-logic" name="plus"/>

Both can be used interchangeably.

In MathML2, the definitionURL attribute could be used to modify the meaning of an element to allow essentially the same notation to be re-used for a discussion taking place in a different mathematical domain. This use of the attribute is deprecated in MathML3, in favor of using a csymbol with the same definitionURL attribute.
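As a sketch of this migration, a MathML2-style token whose meaning is overridden could be rewritten as follows. The URL is a hypothetical placeholder, and the symbol name used as the csymbol content is an assumption made for illustration.

```xml
<!-- MathML2 usage, deprecated in MathML3: definitionURL modifies the token's meaning -->
<apply><plus definitionURL="http://www.example.com/VectorAddition.htm"/><ci>u</ci><ci>v</ci></apply>

<!-- Suggested MathML3 form: a csymbol carrying the same definitionURL -->
<apply><csymbol definitionURL="http://www.example.com/VectorAddition.htm">plus</csymbol><ci>u</ci><ci>v</ci></apply>
```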

4.3.6 Tokens with Attributes

In MathML2, the meaning of various token elements could be specialized via various attributes, usually the type attribute. Strict Content MathML does not have this possibility; therefore these attributes are either passed to the symbols as extra arguments in the apply or bind elements, or MathML3 adds new symbols for the non-default case to the respective content dictionaries.

We will summarize the cases in the following table:

pragmatic Content MathML	strict Content MathML
<diff type="function"/>	<csymbol cd="calculus_veccalc">diff</csymbol>
<diff type="algebraic"/>	<csymbol cd="calculus_veccalc">aDiff</csymbol>

Editorial note: MiKo
systematically consider all the cases here

4.3.7 Container Markup

To retain compatibility with MathML2, MathML3 provides an alternative representation for applications of constructor elements. For instance for the set element, the following two representations are considered equivalent

<set><ci>a</ci><ci>b</ci><ci>c</ci></set>

<apply><set/><ci>a</ci><ci>b</ci><ci>c</ci></apply>

and, following the discussion in Section 4.2.4 Symbols and Identifiers, they are equivalent to

<apply><csymbol cd="sets">set</csymbol><ci>a</ci><ci>b</ci><ci>c</ci></apply>

Other constructors are interval, list, matrix, matrixrow, vector, apply, lambda, piecewise, piece, and otherwise.
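The same pattern applies to the other constructors. As an illustrative sketch for list (the content dictionary that would supply a csymbol for the fully strict form is not named here, so only the first two forms are shown):

```xml
<!-- Container form, as in MathML2 -->
<list><ci>a</ci><ci>b</ci><ci>c</ci></list>

<!-- Equivalent application of the empty constructor token -->
<apply><list/><ci>a</ci><ci>b</ci><ci>c</ci></apply>
```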

4.3.8 Domain of Application in Applications

The domainofapplication element was used in MathML2 inside an apply element to denote the domain over which a given function is being applied. In contrast to its use as a qualifier in the bind element, the usage in the apply element only marks the argument position for the range argument of the definite integral.

MathML3 supports this representation as a pragmatic form. For instance, the integral of a function f over an arbitrary domain C can be represented as

<apply><int/>
  <domainofapplication><ci>C</ci></domainofapplication>
  <ci>f</ci>
</apply>

in the Pragmatic Content MathML representation; it is considered equivalent to

<apply><int/><ci>C</ci><ci>f</ci></apply>

Editorial note: MiKo
be careful with `Int` and `int` here

4.3.9 Domain of Application in Bindings

The domainofapplication element was intended as an alternative to condition for specifying the range of bound variables. Generally, a domain of application D can be specified by a condition element requesting that the bound variable is a member of D. For instance, we consider the Pragmatic Content MathML representation

<apply><int/>
  <bvar><ci>x</ci></bvar>
  <domainofapplication><ci type="set">D</ci></domainofapplication>
  <apply><ci type="function">f</ci><ci>x</ci></apply>
</apply>

as equivalent to the Strict Content MathML representation

<bind><intexp/>
  <bvar><ci>x</ci></bvar>
  <condition><apply><in/><ci>x</ci><ci type="set">D</ci></apply></condition>
  <apply><ci type="function">f</ci><ci>x</ci></apply>
</bind>

4.3.10 Integrals with Calling patterns

MathML2 used the int element for the definite or indefinite integral of a function or algebraic expression on some sort of domain of application. There are several forms of calling sequences depending on the nature of the arguments, and whether or not it is a definite integral. Those forms using interval, condition, lowlimit, or uplimit, provide convenient shorthand notations for an appropriate domainofapplication.

Editorial note: MiKo
the following must be reworked

MathML separates the functionality of the int element into three different symbols: int, defint, and defintset. The first two are integral operators that can be applied to functions, and the last is a binding operator for integrating an algebraic expression with respect to a bound variable.

The following two indefinite function integrals are equivalent.

<apply><int/><sin/></apply>

<apply><intfun/><sin/></apply>

The following two definite function integrals are equivalent (see also Section 4.3.8 Domain of Application in Applications).

<apply><int/>
 <domainofapplication><ci type="set">D</ci></domainofapplication>
 <sin/>
</apply>

<apply><defintfun/><ci type="set">D</ci><sin/></apply>

The following two indefinite integrals over algebraic expressions are equivalent.

<apply><int/><bvar><ci>x</ci></bvar><apply><sin/><ci>x</ci></apply></apply>

<bind><intexp/><bvar><ci>x</ci></bvar><apply><sin/><ci>x</ci></apply></bind>

The following two definite integrals over algebraic expressions are equivalent.

<apply><int/>
 <bvar><ci>x</ci></bvar>
 <domainofapplication><ci type="set">D</ci></domainofapplication>
 <apply><sin/><ci>x</ci></apply>
</apply>

<bind><intexp/>
 <bvar><ci>x</ci></bvar>
 <domainofapplication><ci type="set">D</ci></domainofapplication>
 <apply><sin/><ci>x</ci></apply>
</bind>

4.3.11 degree

The degree element is a qualifier used by some MathML containers to specify that, for example, a bound variable is repeated several times.

Editorial note: MiKo
specify a complete list of containers that allow `degree` elements, so far I see `diff`, `partialdiff`, `root`

The degree element is the container element for the "degree" or "order" of an operation. There are a number of basic mathematical constructs that come in families, such as derivatives and moments. Rather than introduce special elements for each of these families, MathML uses a single general construct, the degree element, for this concept of "order".

<bind><diff/>
  <bvar><ci>x</ci><degree><cn>2</cn></degree></bvar>
  <apply><power/><ci>x</ci><cn>5</cn></apply>
</bind>

<bind>
  <partialdiff/>
  <bvar>
    <ci>x</ci>
    <degree><ci> n </ci></degree>
  </bvar>
  <bvar>
    <ci>y</ci>
    <degree><ci>m</ci></degree>
  </bvar>
  <apply><sin/>
    <apply><times/><ci>x</ci><ci>y</ci></apply>
  </apply>
</bind>

A variable that is to be bound is placed in the bvar container. In a derivative, it indicates the variable with respect to which a function is being differentiated. When the bvar element is used to qualify a derivative, the bvar element may contain a child degree element that specifies the order of the derivative with respect to that variable.

<apply>
  <diff/>
  <bvar>
    <ci>x</ci>
    <degree><cn>2</cn></degree>
  </bvar>
  <apply><power/><ci>x</ci><cn>4</cn></apply>
</apply>

This is equivalent to

<bind>
  <apply><diff/><cn>2</cn></apply>
  <bvar><ci>x</ci></bvar>
  <apply><power/><ci>x</ci><cn>4</cn></apply>
</bind>

Editorial note: MiKo
what do we want to use for degree?

Note that the degree element is only allowed in the container representation. The strict representation takes the degree as a regular argument as the second child of the apply or bind element.

Editorial note: MiKo
Make sure that all `MMLdefinition`s of degree-carrying symbols get a paragraph like the one for `root`.

The default rendering of the degree element and its contents depends on the context. In the partialdiff example above, the degree elements would be rendered as the exponents in the differentiation symbols:

$\frac{\partial^{n+m}}{\partial x^n \partial y^m} \sin(xy)$

4.3.12 Upper and Lower Limits

The uplimit and lowlimit elements are Pragmatic Content MathML qualifiers that can be used to restrict the range of a bound variable to an interval, e.g. in some integrals and sums. uplimit/lowlimit pairs can be expressed via the interval element from the CD Basic Content Elements. For instance, we consider the Pragmatic Content MathML representation

<apply><int/>
  <bvar><ci> x </ci></bvar>
  <lowlimit><ci>a</ci></lowlimit>
  <uplimit><ci>b</ci></uplimit>
  <apply><ci type="function">f</ci><ci>x</ci></apply>
</apply>

as equivalent to the following strict representation

<bind><int/>
  <bvar><ci>x</ci></bvar>
  <condition>
    <apply><in/><ci>x</ci><apply><interval/><ci>a</ci><ci>b</ci></apply></apply>
  </condition>
  <apply><ci type="function">f</ci><ci>x</ci></apply>
</bind>

If the lowlimit qualifier is missing, it is interpreted as negative infinity; similarly, if uplimit is missing, it is interpreted as positive infinity.
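The same rewriting can be sketched for a sum. The fragment below is an illustration by analogy with the integral above, not taken from the content dictionaries; it assumes that sum may be used as a binding operator in bind and that interval with integer bounds denotes the index range from 1 to n.

```xml
<!-- Pragmatic form: sum of i^2 for i from 1 to n -->
<apply><sum/>
  <bvar><ci>i</ci></bvar>
  <lowlimit><cn>1</cn></lowlimit>
  <uplimit><ci>n</ci></uplimit>
  <apply><power/><ci>i</ci><cn>2</cn></apply>
</apply>

<!-- Assumed strict counterpart: the limits become an interval membership condition -->
<bind><sum/>
  <bvar><ci>i</ci></bvar>
  <condition>
    <apply><in/><ci>i</ci><apply><interval/><cn>1</cn><ci>n</ci></apply></apply>
  </condition>
  <apply><power/><ci>i</ci><cn>2</cn></apply>
</bind>
```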

4.3.13 Lifted Associative Commutative Operators

New Symbols for Lifted Operators
Issue lifted_operators	`wiki (member only)` ISSUE-8 (member only)
MathML2 allowed the use of n-ary operators as binding operators with bound variables induced by them. For instance `union` could be used as the equivalent for the TeX `\cup` as well as `\bigcup`. While the relation between the n-ary and the set-based operators is deterministic, i.e. the induced big operators are fully determined by them, the concepts are quite different in nature (different notational conventions, different types, different occurrence schemata). I therefore propose to extend the MathML K-14 CDs with symbols for big operators, much like we already have `sum` as the big operator for the n-ary `plus` symbol, and `prod` for `times`. For the new symbols, I propose the naming convention of capitalizing the big operators (as an alternative, we could follow TeX and prepend a `big`). For example we could have `Union` as a big operator for `union`.
Resolution	None recorded

MathML2 allowed associative operators to be "lifted" to "big operators", for instance the n-ary union operator to the union operator over sets, as in the following construction for the union of the U-complements over a family F of sets:

<apply>
  <union/>
  <bvar><ci>S</ci></bvar>
  <condition>
    <apply><in/><ci>S</ci><ci>F</ci></apply>
  </condition>
  <apply><setdiff/><ci>U</ci><ci>S</ci></apply>
</apply>

While the relation between the n-ary and the set-based operators is deterministic, i.e. the induced big operators are fully determined by them, the concepts are quite different in nature (different notational conventions, different types, different occurrence schemata). Therefore the MathML3 content dictionaries provide explicit symbols for the "big operators", much like MathML2 did with sum as the big operator for the n-ary plus symbol, and prod for times. Concretely, these are big_union, big_intersect, big_max, big_min, big_gcd, big_lcm, big_or, big_and, and big_xor. With these, we can express all Pragmatic Content MathML expressions. For instance, the union above can be represented strictly as

<bind><big_union/>
  <bvar><ci>S</ci></bvar>
  <condition>
    <apply><in/><ci>S</ci><ci>F</ci></apply>
  </condition>
  <apply><setdiff/><ci>U</ci><ci>S</ci></apply>
</bind>

For the exact meaning of the new symbols, consult the content dictionaries.

Large Operators
Issue large_ops	`wiki (member only)` ISSUE-18 (member only)
The large operators can be handled in two ways: in the way described here, by inventing large operators (David does not like symbol names distinguished only by case, and I tend to agree with him), or by extending the role of roles to allow duplicate roles per symbol. In the latter case we could re-use the symbols as we did in MathML2, but we would have to extend OpenMath for that.
Resolution	None recorded

4.3.14 Declare (`declare`)

Editorial note: MiKo
This should maybe be moved into a general section about changes or deprecated elements. Also Stan thinks the text should be improved.

MathML2 provided the declare element, which allowed properties like types to be bound to symbols and variables and abbreviations to be defined for structure sharing. This element is deprecated in MathML3. Structure sharing can be obtained via the share element (see Section 4.2.8 Structure Sharing for details).

4.4 The MathML3 Content Dictionaries and Operators

We will now give an overview of the MathML3 symbols: they are grouped into content dictionaries that broadly reflect the area of mathematics from which they come.

Editorial note: MiKo
The list will eventually be generated from the MathML3 Content Dictionaries; it is currently only very vaguely in sync with them.