The constraint here is therefore that HTML can be mapped onto a sequence of paragraphs of styled text, and that, if that text is edited, the editor can map the sequence of styles back onto a sequence of elements in a well-defined way. This allows some limited, trivial nesting (e.g. LI within UL) but no general nesting, because only a small, finite set of styles is used. In particular, the styles are not parameterized by the nesting level.
(Note: this restriction is removed if styles are expressed as functions of the nesting level, rather than as a selection from a fixed set.)
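
As a rough illustration of this round trip, the following Python sketch flattens a small element sequence into styled paragraphs and maps the styles back onto elements. The style names ("Normal", "Heading1", "ListItem"), the tuple representation of elements, and both function names are assumptions made for this example only; they are not taken from any particular editor or specification.

```python
from itertools import groupby

# Hypothetical fixed style set: each style maps back to exactly one element path.
STYLE_TO_PATH = {
    "Normal": ("P",),
    "Heading1": ("H1",),
    "ListItem": ("UL", "LI"),
}
PATH_TO_STYLE = {path: style for style, path in STYLE_TO_PATH.items()}


def to_paragraphs(elements):
    """Flatten (tag, content) elements into (style, text) paragraphs."""
    paragraphs = []
    for tag, content in elements:
        if tag == "UL":
            # Trivial nesting: each LI child becomes one "ListItem" paragraph.
            for child_tag, text in content:
                paragraphs.append((PATH_TO_STYLE[(tag, child_tag)], text))
        else:
            paragraphs.append((PATH_TO_STYLE[(tag,)], content))
    return paragraphs


def to_elements(paragraphs):
    """Map a (possibly edited) sequence of styles back onto elements."""
    elements = []
    for style, run in groupby(paragraphs, key=lambda p: p[0]):
        path = STYLE_TO_PATH[style]
        texts = [text for _, text in run]
        if len(path) == 2:
            # Consecutive list-item paragraphs share one enclosing list.
            elements.append((path[0], [(path[1], t) for t in texts]))
        else:
            elements.extend((path[0], t) for t in texts)
    return elements


doc = [
    ("H1", "Title"),
    ("P", "Intro"),
    ("UL", [("LI", "first"), ("LI", "second")]),
    ("P", "End"),
]
paragraphs = to_paragraphs(doc)
restored = to_elements(paragraphs)
```

In this sketch a UL nested inside another UL has no entry in the style table, so `to_paragraphs` fails with a `KeyError`; that is the "no general nesting" limit in miniature. Two adjacent lists would also come back as a single list, so the reverse mapping is well-defined but not always lossless.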
Lack of specific semantics
Just as the markup must not be device-specific