Content Markup Validation Grammar

This appendix presents an informal EBNF grammar that can be used to validate the structure of Content Markup.

The grammar defines the valid expression trees in content markup. It does not define the rules for attribute validation, which must be done separately.
The non-terminal Presentation_tags is a placeholder for a valid presentation element start tag or end tag.
The string #PCDATA denotes XML parsed character data.
Symbols beginning with '_' (for example _mmlarg) are internal symbols. A recursive grammar is usually required for their recognition.
Symbols that are entirely lowercase (for example 'ci') are terminal symbols representing MathML content elements.
Symbols beginning with uppercase letters are terminals representing other tokens.

whitespace definitions including Presentation_tags

[1]	`Presentation_tags`	::=	`"presentation"`	/* placeholder */
[2]	`Space`	::=	`#x09 | #x0A | #x0D | #x20`	/* tab, lf, cr, space characters */
[3]	`S`	::=	`(Space | Presentation_tags)*`	/* treat presentation as space */

Characters (used only for content validation)

[4]	`Char`	::=	`Space | [#x21 - #xD7FF] | [#xE000 - #xFFFD] | [#x10000 - #x10FFFF]`	/* valid XML chars */
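As a rough illustration, a Char production of this kind can be checked one character at a time. The Python sketch below assumes the XML 1.0 Char ranges (tab, line feed, carriage return, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF); the function name `is_xml_char` is hypothetical.

```python
def is_xml_char(ch: str) -> bool:
    """Return True if the single character ch lies in an XML 1.0 Char range.

    Sketch only: the ranges follow the XML 1.0 Char production, which leaves
    out the surrogate block #xD800-#xDFFF and stops at #x10FFFF.
    """
    cp = ord(ch)
    return (
        cp in (0x09, 0x0A, 0x0D)        # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF         # space and the BMP below the surrogates
        or 0xE000 <= cp <= 0xFFFD       # BMP above the surrogates, excluding #xFFFE/#xFFFF
        or 0x10000 <= cp <= 0x10FFFF    # supplementary planes
    )


print(is_xml_char("x"), is_xml_char("\x00"))  # True False
```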

_start(%x) returns a valid start tag for the element %x.
_end(%x) returns a valid end tag for the element %x.
_empty(%x) returns a valid empty tag for the element %x.

For example:

 _start(ci)    ::= "<ci>"
 _end(cn)      ::= "</cn>"
 _empty(plus)  ::= "<plus/>"

These functions avoid the need to write a grammar for every attribute. As a consequence, the model below accepts any attribute text inside a tag and does not validate attribute values.

start and end tag functions

[5]	`_start(%x)`	::=	`"<%x" (Char - '>')* ">"`	/* returns a valid start tag for the element %x */
[6]	`_end(%x)`	::=	`"</%x" Space* ">"`	/* returns a valid end tag for the element %x */
[7]	`_empty(%x)`	::=	`"<%x" (Char - '>')* "/>"`	/* returns a valid empty tag for the element %x */
[8]	`_sg(%x)`	::=	`S _start(%x)`	/* start tag preceded by optional whitespace */
[9]	`_eg(%x)`	::=	`_end(%x) S`	/* end tag followed by optional whitespace */
[10]	`_ey(%x)`	::=	`S _empty(%x) S`	/* empty tag preceded and followed by optional whitespace */
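One way to picture the tag functions is as regular-expression builders. The Python sketch below is an assumption-laden illustration, not part of the grammar: `start_tag`, `end_tag` and `empty_tag` are hypothetical names, `[^>]*` stands in for `(Char - '>')*`, and, unlike the productions as literally written, the sketch also requires the element name to end at whitespace, '>' or '/' so that `<cix>` is not read as a `ci` start tag.

```python
import re


# Hypothetical helpers mirroring _start, _end and _empty. Each returns a compiled
# regular expression for one tag of element `name`; [^>]* plays the role of
# (Char - '>')*, so attributes are accepted but not validated.
def start_tag(name: str) -> re.Pattern:
    # The lookahead stops "<cix>" from counting as a <ci> start tag, and the
    # lookbehind rejects "<ci/>", which is an empty tag rather than a start tag.
    return re.compile(rf"<{re.escape(name)}(?=[\s>])[^>]*(?<!/)>")


def end_tag(name: str) -> re.Pattern:
    # "</name", optional whitespace, then ">" (end tags carry no attributes).
    return re.compile(rf"</{re.escape(name)}[ \t\r\n]*>")


def empty_tag(name: str) -> re.Pattern:
    # "<name", optional attributes, then "/>".
    return re.compile(rf"<{re.escape(name)}(?=[\s/])[^>]*/>")


print(bool(start_tag("ci").fullmatch('<ci type="real">')))  # True
print(bool(end_tag("cn").fullmatch("</cn >")))              # True
print(bool(empty_tag("plus").fullmatch("<plus/>")))         # True
```

Regular expressions suffice here because each function recognizes a single tag; the nesting of elements is what needs the recursive grammar.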

MathML content constructs

[11]	`_mmlall`	::=	`_container | _relation | _operator | _qualifier | _other`
[12]	`_mmlarg`	::=	`_container`
[13]	`_container`	::=	`_token | _special | _constructor`
[14]	`_token`	::=	`ci | cn | csymbol | _constantsym`
[15]	`_special`	::=	`apply | lambda | reln | fn`
[16]	`_constructor`	::=	`interval | list | matrix | matrixrow | set | vector | piecewise | piece | otherwise`
[17]	`_other`	::=	`condition | declare | sep`
[18]	`_qualifier`	::=	`lowlimit | uplimit | bvar | degree | logbase | domainofapplication | momentabout`
[19]	`_constantsym`	::=	`integers | rationals | reals | naturalnumbers | complexes | primes | exponentiale | imaginaryi | notanumber | true | false | pi | eulergamma | infinity`

relations

[20]	`_relation`	::=	`_genrel | _setrel | _seqrel2ary`
[21]	`_genrel`	::=	`_genrel2ary | _genrelnary`
[22]	`_genrel2ary`	::=	`ne`
[23]	`_genrelnary`	::=	`eq | leq | lt | geq | gt`
[24]	`_setrel`	::=	`_setrel2ary | _setrelnary`
[25]	`_setrel2ary`	::=	`in | notin | notsubset | notprsubset`
[26]	`_setrelnary`	::=	`subset | prsubset`
[27]	`_seqrel2ary`	::=	`tendsto`

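operators

[28]	`_operator`	::=	`_funcop | _arithop | _seqop | _trigop | _classop | _calcop | _vcalcop | _statop | _lalgop | _logicop | _setop`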

functional operators

[29]	`_funcop`	::=	`_funcop1ary | _funcopnary`
[30]	`_funcop1ary`	::=	`inverse | ident | domain | codomain | image`
[31]	`_funcopnary`	::=	`fn | compose`	/* general user-defined function is n-ary */

Note that minus is both 1-ary (unary) and 2-ary (binary).

arithmetic operators

[32]	`_arithop`	::=	`_arithop1ary | _arithop2ary | _arithopnary | root`
[33]	`_arithop1ary`	::=	`abs | conjugate | factorial | minus | arg | real | imaginary | floor | ceiling`
[34]	`_arithop2ary`	::=	`quotient | divide | minus | power | rem`
[35]	`_arithopnary`	::=	`plus | times | max | min | gcd | lcm`
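Because minus belongs to both `_arithop1ary` and `_arithop2ary`, a validator has to accept either argument count for it. The lookup below is a minimal Python sketch of that idea; the set names and the function `arith_accepts` are hypothetical, and root is left out because it has its own alternative in `_applybody`.

```python
# Arity classes copied from productions [33]-[35]; root is omitted because it
# has its own body (root degree? _mmlarg) in _applybody.
ARITH_1ARY = {"abs", "conjugate", "factorial", "minus", "arg", "real",
              "imaginary", "floor", "ceiling"}
ARITH_2ARY = {"quotient", "divide", "minus", "power", "rem"}
ARITH_NARY = {"plus", "times", "max", "min", "gcd", "lcm"}


def arith_accepts(op: str, n_args: int) -> bool:
    """Return True if arithmetic operator `op` may be applied to n_args arguments.

    minus appears in both the 1-ary and 2-ary sets, so it accepts 1 or 2.
    n-ary operators take _mmlarg*, i.e. any number of arguments.
    """
    return ((op in ARITH_1ARY and n_args == 1)
            or (op in ARITH_2ARY and n_args == 2)
            or op in ARITH_NARY)


print(arith_accepts("minus", 1), arith_accepts("minus", 2), arith_accepts("minus", 3))
# True True False
```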

calculus and vector calculus

[36]	`_calcop`	::=	`int | diff | partialdiff`
[37]	`_vcalcop`	::=	`divergence | grad | curl | laplacian`

sequences and series

[38]	`_seqop`	::=	`sum | product | limit`

elementary classical functions and trigonometry

[39]	`_classop`	::=	`exp | ln | log`
[40]	`_trigop`	::=	`sin | cos | tan | sec | csc | cot | sinh | cosh | tanh | sech | csch | coth | arcsin | arccos | arctan`

statistics operators

[41]	`_statop`	::=	`_statopnary | moment`
[42]	`_statopnary`	::=	`mean | sdev | variance | median | mode`

linear algebra operators

[43]	`_lalgop`	::=	`_lalgop1ary | _lalgop2ary | _lalgopnary`
[44]	`_lalgop1ary`	::=	`determinant | transpose`
[45]	`_lalgop2ary`	::=	`vectorproduct | scalarproduct | outerproduct`
[46]	`_lalgopnary`	::=	`selector`

logical operators

[47]	`_logicop`	::=	`_logicop1ary | _logicopnary | _logicop2ary | _logicopquant`
[48]	`_logicop1ary`	::=	`not`
[49]	`_logicop2ary`	::=	`implies | equivalent | approx | factorof`
[50]	`_logicopnary`	::=	`and | or | xor`
[51]	`_logicopquant`	::=	`forall | exists`

set theoretic operators

[52]	`_setop`	::=	`_setop1ary | _setop2ary | _setopnary`
[53]	`_setop1ary`	::=	`card`
[54]	`_setop2ary`	::=	`setdiff`
[55]	`_setopnary`	::=	`union | intersect | cartesianproduct`

operator groups

[56]	`_unaryop`	::=	`_funcop1ary | _arithop1ary | _trigop | _classop | _calcop | _vcalcop | _logicop1ary | _lalgop1ary | _setop1ary`
[57]	`_binaryop`	::=	`_arithop2ary | _setop2ary | _logicop2ary | _lalgop2ary`
[58]	`_naryop`	::=	`_arithopnary | _statopnary | _logicopnary | _lalgopnary | _setopnary | _funcopnary`
[59]	`_ispop`	::=	`int | sum | product`
[60]	`_diffop`	::=	`diff | partialdiff`
[61]	`_binaryrel`	::=	`_genrel2ary | _setrel2ary | _seqrel2ary`
[62]	`_naryrel`	::=	`_genrelnary | _setrelnary`

separator

[63]	`sep`	::=	`_ey(sep)`

leaf tokens and data content of leaf elements

[64]	`_mdatai`	::=	`(#PCDATA | Presentation_tags)*`	/* note: _mdatai includes presentation constructs */
[65]	`_mdatan`	::=	`(#PCDATA | sep | Presentation_tags)*`	/* note: _mdatan includes presentation constructs and sep */
[66]	`ci`	::=	`_sg(ci) _mdatai _eg(ci)`
[67]	`cn`	::=	`_sg(cn) _mdatan _eg(cn)`
[68]	`csymbol`	::=	`_sg(csymbol) _mdatai _eg(csymbol)`
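Since `_mdatan` allows `sep` while `_mdatai` does not, a cn such as `<cn type="rational">22<sep/>7</cn>` carries its data in several parts. The Python sketch below splits that data using the standard library's `xml.etree.ElementTree`; it assumes un-namespaced MathML and no presentation markup inside the leaf, and the function name `leaf_parts` is hypothetical.

```python
import xml.etree.ElementTree as ET


def leaf_parts(leaf: ET.Element) -> list:
    """Split the character data of a cn, ci or csymbol element at <sep/> markers.

    Sketch only: presentation markup inside the leaf (allowed by the grammar)
    is treated as an error here.
    """
    parts = [(leaf.text or "").strip()]
    for child in leaf:
        # Only cn (_mdatan) may contain <sep/>; ci and csymbol (_mdatai) may not.
        if child.tag != "sep" or leaf.tag != "cn":
            raise ValueError(f"unexpected <{child.tag}> inside <{leaf.tag}>")
        parts.append((child.tail or "").strip())
    return parts


print(leaf_parts(ET.fromstring('<cn type="rational">22<sep/>7</cn>')))  # ['22', '7']
```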

condition: constraints. A condition contains either a single reln (relation), an apply holding a logical combination of relations, or a set (over which the operator should be applied).

condition

[69]	`condition`	::=	`_sg(condition) (reln | apply | set) _eg(condition)`
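Read with its alternatives grouped between the condition tags, production [69] says a condition wraps exactly one reln, apply or set. A minimal Python sketch of that check, using the standard library's `xml.etree.ElementTree`, might look like this; it assumes un-namespaced MathML, and the function name `check_condition` is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_condition(cond: ET.Element) -> bool:
    """Check a <condition> element against production [69].

    Reads the production as: exactly one reln, apply or set inside the
    condition tags. Whitespace text between elements is ignored by ElementTree.
    """
    children = list(cond)
    return (cond.tag == "condition"
            and len(children) == 1
            and children[0].tag in {"reln", "apply", "set"})


c = ET.fromstring("<condition><apply><lt/><ci>x</ci><cn>5</cn></apply></condition>")
print(check_condition(c))  # True
```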

domains for integral, sum, product

[70]	`_ispdomain`	::=	`(lowlimit uplimit?) | uplimit | interval | condition`

Note that in MathML 2.0, apply is used in place of the deprecated reln for relational operators as well as for arithmetic, algebraic and other operators.

apply construct

[71]	`apply`	::=	`_sg(apply) (_applybody | _relnbody) _eg(apply)`
[72]	`_applybody`	::=	`( _unaryop _mmlarg )`	/* 1-ary ops */
			`| (_binaryop _mmlarg _mmlarg)`	/* 2-ary ops */
			`| (_naryop _mmlarg*)`	/* n-ary ops, enumerated arguments */
			`| (_naryop bvar* condition _mmlarg)`	/* n-ary ops, condition defines argument list */
			`| (_ispop bvar? _ispdomain? _mmlarg)`	/* integral, sum, product */
			`| (_ispop domainofapplication? _mmlarg)`	/* integral, sum, product over a domain of application */
			`| (_diffop bvar* _mmlarg)`	/* differential ops */
			`| (log logbase? _mmlarg)`	/* logs */
			`| (moment degree? momentabout? _mmlarg*)`	/* statistical moment */
			`| (root degree? _mmlarg)`	/* radicals; default is square root */
			`| (limit bvar* lowlimit? condition? _mmlarg)`	/* limits */
			`| (_logicopquant bvar+ condition? (reln | apply))`	/* quantifier with explicit bound variables */
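To make the first three alternatives of `_applybody` concrete, the Python sketch below checks an apply whose operator takes enumerated arguments. The operator sets are small, deliberately incomplete excerpts of productions [56]-[58]; qualified forms (bvar, condition and so on) belong to other alternatives and are refused rather than validated, and nested arguments are not checked recursively. The names are hypothetical and un-namespaced MathML is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately incomplete excerpts of the operator groups [56]-[58].
UNARY_OPS = {"minus", "abs", "not", "sin", "exp", "ln", "inverse", "card", "transpose"}
BINARY_OPS = {"minus", "divide", "power", "implies", "setdiff", "vectorproduct"}
NARY_OPS = {"plus", "times", "and", "or", "union", "max", "min"}
QUALIFIERS = {"bvar", "condition", "lowlimit", "uplimit", "degree", "logbase",
              "domainofapplication", "momentabout"}


def check_enumerated_apply(node: ET.Element) -> bool:
    """Check an <apply> against the 1-ary, 2-ary and enumerated n-ary alternatives."""
    children = list(node)
    if node.tag != "apply" or not children:
        return False
    op, args = children[0].tag, children[1:]
    if any(arg.tag in QUALIFIERS for arg in args):
        raise NotImplementedError("qualified apply forms are outside this sketch")
    return ((op in UNARY_OPS and len(args) == 1)
            or (op in BINARY_OPS and len(args) == 2)
            or op in NARY_OPS)


expr = ET.fromstring("<apply><minus/><ci>x</ci><cn>1</cn></apply>")
print(check_enumerated_apply(expr))  # True
```

A fuller validator would dispatch on the operator first and then try each alternative that operator can appear in, since several operators (for example int, sum and product) have more than one permitted body.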

equations and relations: reln uses a Lisp-like syntax (like apply). The bvar and condition elements are used to construct a "such that" or "where" constraint on the relation. Note that reln is deprecated but still valid in MathML 2.0.

equations and relations

[73]	`reln`	::=	`_sg(reln) _relnbody _eg(reln)`
[74]	`_relnbody`	::=	`( _binaryrel bvar* condition? _mmlarg _mmlarg ) | ( _naryrel bvar* condition? _mmlarg* )`

fn construct. Note that fn is deprecated but still valid in MathML 2.0.

[75]	`fn`	::=	`_sg(fn) _fnbody _eg(fn)`
[76]	`_fnbody`	::=	`Presentation_tags | _container`

lambda construct: note that at least one bvar must be present

[77]	`lambda`	::=	`_sg(lambda) _lambdabody _eg(lambda)`
[78]	`_lambdabody`	::=	`bvar+ _container`	/* multivariate lambda calculus */
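The requirement that a lambda has at least one bvar, followed by exactly one container, can be sketched in Python as below. The `CONTAINERS` set is assembled from productions [13]-[16] and [19]; the function name `check_lambda` is hypothetical, and un-namespaced MathML is assumed.

```python
import xml.etree.ElementTree as ET

# _container, assembled from productions [13]-[16] and [19].
CONTAINERS = {
    "ci", "cn", "csymbol",                                   # _token
    "integers", "rationals", "reals", "naturalnumbers", "complexes", "primes",
    "exponentiale", "imaginaryi", "notanumber", "true", "false", "pi",
    "eulergamma", "infinity",                                # _constantsym
    "apply", "lambda", "reln", "fn",                         # _special
    "interval", "list", "matrix", "matrixrow", "set", "vector",
    "piecewise", "piece", "otherwise",                       # _constructor
}


def check_lambda(node: ET.Element) -> bool:
    """Check production [78]: one or more bvar elements followed by exactly one container."""
    tags = [child.tag for child in node]
    n_bvar = 0
    while n_bvar < len(tags) and tags[n_bvar] == "bvar":
        n_bvar += 1
    return (node.tag == "lambda"
            and n_bvar >= 1
            and len(tags) == n_bvar + 1
            and tags[-1] in CONTAINERS)


lam = ET.fromstring("<lambda><bvar><ci>x</ci></bvar><apply><sin/><ci>x</ci></apply></lambda>")
print(check_lambda(lam))  # True
```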

declare construct

[79]	`declare`	::=	`_sg(declare) _declarebody _eg(declare)`
[80]	`_declarebody`	::=	`ci (fn | _constructor)?`

constructors

[81]	`interval`	::=	`_sg(interval) _mmlarg _mmlarg _eg(interval)`	/* start, end define interval */
[82]	`set`	::=	`_sg(set) _lsbody _eg(set)`
[83]	`list`	::=	`_sg(list) _lsbody _eg(list)`
[84]	`_lsbody`	::=	`_mmlarg*`	/* enumerated arguments */
			`| (bvar* condition _mmlarg)`	/* condition constructs arguments */
[85]	`matrix`	::=	`_sg(matrix) matrixrow* _eg(matrix)`
[86]	`matrixrow`	::=	`_sg(matrixrow) _mmlall* _eg(matrixrow)`	/* allows matrix of operators */
[87]	`vector`	::=	`_sg(vector) _mmlarg* _eg(vector)`
[88]	`piecewise`	::=	`_sg(piecewise) piece+ otherwise? _eg(piecewise)`
[89]	`piece`	::=	`_sg(piece) _mmlall _mmlall _eg(piece)`	/* allows piecewise construct of operators */
[90]	`otherwise`	::=	`_sg(otherwise) _mmlall _eg(otherwise)`	/* allows piecewise construct of operators */
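Two of these constructors have bodies with more than one shape. The Python sketch below checks the two `_lsbody` alternatives used by set and list (enumerated arguments, or bvar* condition followed by one argument) and the piece+ otherwise? structure of piecewise. It checks only element names and child counts, not whether the contents of piece and otherwise are valid `_mmlall` terms; the function names are hypothetical and un-namespaced MathML is assumed.

```python
import xml.etree.ElementTree as ET

# _container, assembled from productions [13]-[16] and [19].
CONTAINERS = {
    "ci", "cn", "csymbol", "integers", "rationals", "reals", "naturalnumbers",
    "complexes", "primes", "exponentiale", "imaginaryi", "notanumber", "true",
    "false", "pi", "eulergamma", "infinity", "apply", "lambda", "reln", "fn",
    "interval", "list", "matrix", "matrixrow", "set", "vector", "piecewise",
    "piece", "otherwise",
}


def check_lsbody(elem: ET.Element) -> bool:
    """Check the body of a <set> or <list> against _lsbody (production [84])."""
    tags = [child.tag for child in elem]
    # First alternative: enumerated arguments, _mmlarg*
    if all(tag in CONTAINERS for tag in tags):
        return True
    # Second alternative: bvar* condition _mmlarg
    i = 0
    while i < len(tags) and tags[i] == "bvar":
        i += 1
    return (len(tags) == i + 2
            and tags[i] == "condition"
            and tags[i + 1] in CONTAINERS)


def check_piecewise(elem: ET.Element) -> bool:
    """Check piece+ otherwise? (production [88]) and the child counts of [89] and [90]."""
    children = list(elem)
    if children and children[-1].tag == "otherwise":
        otherwise, pieces = children[-1], children[:-1]
        if len(otherwise) != 1:  # otherwise holds exactly one _mmlall
            return False
    else:
        pieces = children
    # At least one piece, each holding exactly two children (value, condition).
    return (len(pieces) >= 1
            and all(p.tag == "piece" and len(p) == 2 for p in pieces))
```

Usage: `check_lsbody(ET.fromstring("<set><cn>1</cn><cn>2</cn></set>"))` returns True, while a set containing only a condition returns False.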

qualifiers - note the contained _mmlarg could be a reln

[91]	`lowlimit`	::=	`_sg(lowlimit) _mmlarg _eg(lowlimit)`
[92]	`uplimit`	::=	`_sg(uplimit) _mmlarg _eg(uplimit)`
[93]	`bvar`	::=	`_sg(bvar) ci degree? _eg(bvar)`
[94]	`degree`	::=	`_sg(degree) _mmlarg _eg(degree)`
[95]	`logbase`	::=	`_sg(logbase) _mmlarg _eg(logbase)`
[96]	`domainofapplication`	::=	`_sg(domainofapplication) _mmlarg _eg(domainofapplication)`
[97]	`momentabout`	::=	`_sg(momentabout) _mmlarg _eg(momentabout)`
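Production [93] fixes the order inside bvar: the ci comes first, then an optional degree, as in the bound variable of a second derivative with respect to x. A small Python sketch of that ordering check follows; the function name `check_bvar` is hypothetical, un-namespaced MathML is assumed, and the contents of degree are not inspected.

```python
import xml.etree.ElementTree as ET


def check_bvar(bvar: ET.Element) -> bool:
    """Check production [93]: a <bvar> holds one ci, optionally followed by one degree."""
    tags = [child.tag for child in bvar]
    return bvar.tag == "bvar" and tags in (["ci"], ["ci", "degree"])


b = ET.fromstring("<bvar><ci>x</ci><degree><cn>2</cn></degree></bvar>")
print(check_bvar(b))  # True
```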

relations and operators and constant symbols (one declaration for each operator and relation element)

[98]	`_relation`	::=	`_ey(%relation)`	/* for example <eq/> <lt/> */
[99]	`_operator`	::=	`_ey(%operator)`	/* for example <exp/> <times/> */
[100]	`_const-symbol`	::=	`_ey(%const-symbol)`	/* for example <integers/> <false/> */

The top-level math element. A declare element may appear only at the head of a math element.

math

[101]	`math`	::=	`_sg(math) declare* _mmlall* _eg(math)`
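The "declare only at the head" rule amounts to: after the leading run of declare children, no further declare may occur. A minimal Python sketch of just that rule (it does not check that the remaining children are `_mmlall` terms) might be written as follows; the function name is hypothetical and un-namespaced MathML is assumed.

```python
import xml.etree.ElementTree as ET


def check_math_head(math: ET.Element) -> bool:
    """Check that declare elements occur only at the head of <math> (production [101])."""
    tags = [child.tag for child in math]
    i = 0
    while i < len(tags) and tags[i] == "declare":
        i += 1
    # After the leading run of declare elements, no further declare may appear.
    return math.tag == "math" and "declare" not in tags[i:]


m = ET.fromstring("<math><declare><ci>f</ci></declare><ci>x</ci></math>")
print(check_math_head(m))  # True
```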
