As part of our work on TAG issue XMLVersioning-41, I took an action to review the mechanisms used by XHTML Modularization. In particular, we were interested in exploring the potential for using substitution groups as a modularization/extensibility mechanism.
In what follows I concentrate on elements and content models; attributes present challenges which are at least in part distinct.
The published set of schema documents is intended to be combined with a driver that imports/includes a selected subset. For example, this driver document will assemble a schema corresponding closely to XHTML10 Strict (the difference is the addition of the Ruby Basic module, providing the rb, rp, rt and ruby elements). There is also a driver which approximates XHTML Basic 1.0. In total, the 48 modules define 81 element types (86 are used, but five of these are missing definitions!).
The general paradigm is that the published modules define content models chiefly by reference to named groups. To make a change, users are expected to redefine groups to add or remove elements, ad lib.
For example, here's how the published definition specifies the content model for the body element:

<xs:element name="body" type="xhtml.body.type"/>

<xs:complexType name="xhtml.body.type">
  <xs:group ref="xhtml.body.content"/>
  <xs:attributeGroup ref="xhtml.body.attlist"/>
</xs:complexType>

<xs:group name="xhtml.body.content">
  <xs:sequence>
    <xs:group ref="xhtml.Block.mix" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:group>

<xs:group name="xhtml.Block.mix">
  <xs:choice>
    <xs:group ref="xhtml.Heading.class"/>
    <xs:group ref="xhtml.List.class"/>
    <xs:group ref="xhtml.Block.class"/>
    <xs:group ref="xhtml.Misc.class"/>
  </xs:choice>
</xs:group>

<xs:group name="xhtml.List.class">
  <xs:choice>
    <xs:element name="ul" type="xhtml.ul.type"/>
    <xs:element name="ol" type="xhtml.ol.type"/>
    <xs:element name="dl" type="xhtml.dl.type"/>
  </xs:choice>
</xs:group>
In order to add your own element to this, you would have to build your own driver document, which included:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3.org/1999/xhtml"        [1]
           xmlns:mine="http://www.example.com/mine"
           xmlns="http://www.w3.org/1999/xhtml">

  <xs:import namespace="http://www.example.com/mine"/>          [2]

  <xs:redefine schemaLocation="http://www.w3.org/MarkUp/SCHEMA/xhtml11-model-1.xsd">
    <xs:group name="xhtml.Misc.class">
      <xs:choice>
        <xs:group ref="xhtml.Misc.class"/>                      [3]
        <xs:element ref="mine:newfangled"/>
      </xs:choice>
    </xs:group>
  </xs:redefine>

</xs:schema>
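Notes:
[1] A schema document which redefines components must have the same target namespace as the document being redefined.
[2] My new element, newfangled, is declared in a namespace of its own. This import allows me to reference it.
[3] Inside a redefine, a group's reference to itself picks up the original definition, so this adds to the existing choice rather than replacing it.

The net result of all this is that a document such as the following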
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:my="http://www.example.com/mine">
  . . .
  <body>
    <div>. . .</div>
    <my:newfangled>. . .</my:newfangled>
  </body>
</html>
would be schema-valid per the schema corresponding to my driver schema document as shown above.
In the published modules, almost all the groups are choice groups (disjunctions): the only five actually substantive sequences are for frameset, head, html, ruby and table. Since substitution groups provide a low-overhead, non-intrusive way of adding to a disjunction, this looks encouraging. Let's see what our example would look like converted to substitution groups. First we simplify things in the published modules, using abstract elements wherever the original uses choice groups:
<xs:complexType name="xhtml.body.type">
  <xs:sequence>
    <xs:element ref="Body.mix" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:element name="Body.mix" abstract="true"/>

<xs:element name="Heading.class" abstract="true" substitutionGroup="Body.mix"/>
<xs:element name="List.class" abstract="true" substitutionGroup="Body.mix"/>
<xs:element name="Block.class" abstract="true" substitutionGroup="Body.mix"/>
<xs:element name="Misc.class" abstract="true" substitutionGroup="Body.mix"/>

<xs:element name="ul" type="xhtml.ul.type" substitutionGroup="List.class"/>
<xs:element name="ol" type="xhtml.ol.type" substitutionGroup="List.class"/>
<xs:element name="dl" type="xhtml.dl.type" substitutionGroup="List.class"/>
Now we can do everything we need to do to add our own element in our own schema document:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/mine"
           xmlns:xhtml="http://www.w3.org/1999/xhtml">

  <xs:import namespace="http://www.w3.org/1999/xhtml"/>

  <xs:element name="newfangled" substitutionGroup="xhtml:Misc.class">
    . . .
  </xs:element>

</xs:schema>
That's it. Looks like a win to me.
It doesn't work. Yet. XML Schema 1.0 allows an element to be in only one substitution group. But some elements in the published XHTML11 modules are directly in several groups. For example, b is in both the InlinePre.mix group and the InlPres.class group. So the blanket replacement of groups with abstract elements as substitution group heads would require a schema which is inexpressible in XML Schema 1.0, as the b element declaration would have to name two elements as its substitution group head. The good news is that XML Schema 1.1 allows multiple substitution group heads, and XHTML11 Modularization is still in Last Call, so they could shift to XML Schema 1.1.
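To make the XML Schema 1.1 option concrete, here is a minimal sketch of what a declaration for b with two heads could look like. It assumes the two groups have been turned into abstract elements named as in the example above (without the xhtml. prefix), and it omits the type of b; neither detail comes from the published modules. It needs an XML Schema 1.1 processor.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3.org/1999/xhtml"
           xmlns="http://www.w3.org/1999/xhtml">

  <!-- Assumed abstract heads standing in for the two published groups -->
  <xs:element name="InlinePre.mix" abstract="true"/>
  <xs:element name="InlPres.class" abstract="true"/>

  <!-- XML Schema 1.1 only: substitutionGroup is a space-separated
       list of QNames, so b can name both heads at once -->
  <xs:element name="b" substitutionGroup="InlinePre.mix InlPres.class"/>

</xs:schema>
```

Under XML Schema 1.0 the same declaration is rejected, because substitutionGroup must there be a single QName.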
Substitution groups are great for devolved, bottom-up extensibility. The design pattern suggested by the above example is elegant and easy to use, for a language intended to be open to user extension across a broad front. But XHTML modularization has at least two goals: making it easy to extend the language, and making it easy to define restricted subsets of it, as the profile drivers do.
Substitution groups do nothing for the second goal. For example, in the XHTML Basic driver, only title, base, meta, link and object are allowed inside head, whereas in full XHTML11, script and style are allowed as well. This is accomplished by having different definitions for the HeadOpts.mix group in the respective driver documents. There is no straightforward bottom-up equivalent to this top-down approach to customization.
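As a rough illustration of that top-down approach, a Basic-style driver might define the group along the following lines. This is a sketch under assumptions, not the text of the published drivers: it assumes that title and base are placed by the head content model itself, so that only the optional elements appear in the group; it assumes the group carries the xhtml. prefix used by the other published groups; and it uses element references where the published modules may declare the elements in place.

```xml
<!-- Assumed shape of the group in a Basic-style driver -->
<xs:group name="xhtml.HeadOpts.mix"
          xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:choice>
    <xs:element ref="meta"/>
    <xs:element ref="link"/>
    <xs:element ref="object"/>
    <!-- A driver for full XHTML11 would also list script and style here -->
  </xs:choice>
</xs:group>
```

The point of the sketch is that the restriction lives in one place, chosen by the driver, rather than being assembled from declarations scattered across element files.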
The previous section notwithstanding, maybe it's worth trying a complete redesign along the following lines:
Mixed or simple starred content models always just star an abstract element, mixed or not as appropriate. These abstract elements are in turn cited as substitution group heads by the appropriate .mix or .whatever group-equivalents.
So for example from
<xs:group name="xhtml.li.content">
  <xs:sequence>
    <xs:group ref="xhtml.Flow.mix" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:group>

<xs:complexType name="xhtml.li.type" mixed="true">
  <xs:group ref="xhtml.li.content"/>
  <xs:attributeGroup ref="xhtml.li.attlist"/>
</xs:complexType>
we would want, in two separate documents:
<xs:complexType name="xhtml.li.type" mixed="true">
  <xs:sequence>
    <xs:element ref="xhtml.li.content" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:element name="xhtml.li.content" abstract="true"/>

---------

<xs:element name="xhtml.Flow.mix" abstract="true"
            substitutionGroup="... xhtml.li.content ..."/>
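We could go one of two ways with respect to element types:
(1) have content models reference abstract elements such as li.abs, which in turn have the right type definition, and declare the 'real' elements separately with an appropriate substitution group head;
(2) reference the real element declarations directly in the content models, as the published modules do.

The advantage of (1) is that you could do e.g. Japanese XHTML with or without allowing the original English, and it's consistent with how other elements are handled. The advantage of (2) is that it's simpler.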
For example, from:
<xs:group name="xhtml.ol.content">
  <xs:sequence>
    <xs:element name="li" type="xhtml.li.type" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:group>

<xs:complexType name="xhtml.ol.type">
  <xs:group ref="xhtml.ol.content"/>
  <xs:attributeGroup ref="xhtml.ol.attlist"/>
</xs:complexType>
we would get, again in two files:
<xs:complexType name="xhtml.ol.type">
  <xs:sequence>
    <xs:element ref="li.abs" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attributeGroup ref="xhtml.ol.attlist"/>
</xs:complexType>

<xs:element name="li.abs" abstract="true" type="xhtml.li.type"/>

---------

<xs:element name="li" substitutionGroup="li.abs"/>
I built a set of files and directories, using the strategy outlined above, and it works. (I wrote two stylesheets, one per module and one per profile, which did almost all the work.)
The good news is that it not only works, it's actually very clean and powerful in some ways. It was trivial and straightforward, for instance, to produce an all-Japanese version of the Core profile, something which would have been neither trivial nor straightforward using the published approach. Also, having done that, it was even more trivial to produce a bilingual version of the Core profile, which would not be at all true for the published approach.
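A sketch of why the bilingual case comes almost for free: each concrete element just names the abstract head, so a second vocabulary is a second set of one-line declarations. The Japanese element name below is invented for illustration and is not taken from the files I built; li.abs is assumed to be declared in the list module as in the ol example above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3.org/1999/xhtml"
           xmlns="http://www.w3.org/1999/xhtml">

  <!-- English element; in the new formulation this lives in its own file -->
  <xs:element name="li" substitutionGroup="li.abs"/>

  <!-- Hypothetical Japanese counterpart (name invented for this sketch);
       it inherits its type from li.abs exactly as li does -->
  <xs:element name="項目" substitutionGroup="li.abs"/>

</xs:schema>
```

An all-Japanese profile would include only the second kind of declaration, an English one only the first, and a bilingual one both; the content models in the modules are untouched in all three cases.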
The price for this is more files, but actually fewer bytes: The Core profile needs 13 schema documents totalling 52K bytes in the original formulation, 45 schema documents but only 44K bytes in the new formulation. The Basic profile needs 21 schema documents and 82K bytes in the original formulation, 70 schema documents but only 81K bytes in the new formulation.
One unexpected, but particularly nice, aspect of the new approach is that in at least some cases it removes the necessity for defining special restricted content models for profiles. In the original formulation of the basic profile, special restricted module definitions are required for tables and forms. In the new substitution-group approach, the full module definitions can be used unchanged, because their content models are expressed in terms of abstract elements (e.g. colgroup.abs and button.abs). Because the 'basic' profile doesn't include the element schema files for the elements not included in the profile, for some of those abstract elements, there are no concrete elements identifying them as their substitution-group head. So for example the full table content model

( caption.abs?, (col.abs*|colgroup.abs*), ((thead.abs?,tfoot.abs?,tbody.abs*)|tr.abs+) )

becomes in practice

( caption?, tr+ )

because the basic profile driver includes only caption.xsd and tr.xsd. It seems likely that this feature of the new approach will make defining profiles which are strict subsets of the whole language much simpler.