For me, XML puts the fun back into web hacking. I wrote three XML parsers last weekend. Great stress relief!
See also: some more notes on XML implementation experience, mostly by Bert Bos.
XML document types should evolve gracefully. Technically, format negotiation is a solution to deployment of revised data formats, but it did not meet the market constraints (i.e. it wasn't cost-effective for the involved parties) in the case of HTML forms, tables and foreign payload (scripts and stylesheets).
I'm investigating ways to express the MIME multipart alternative concept at the element level in XML. This allows new features in XML documents to be deployed like color over the b/w TV signal. It allows the new and the old semantics to be expressed in the same file, which cuts down the cost of managing the data (copy, rename, verify, datestamp, inodes, ...) and caching it.
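A rough sketch of how a consumer might pick among element-level alternatives, borrowing the multipart/alternative convention that richer forms come later and a recipient takes the last one it understands. The `alternatives`, `b` and `color` element names are made up for illustration; the notes do not settle on any syntax.

```python
import xml.etree.ElementTree as ET

def choose_alternative(alt, understood):
    """Return the last child of `alt` whose tag is in `understood`, else None.

    Mirrors multipart/alternative ordering: plainest form first, richest last.
    """
    for child in reversed(list(alt)):
        if child.tag in understood:
            return child
    return None

# Hypothetical markup: old and new semantics carried in the same file.
doc = ET.fromstring(
    "<alternatives>"
    "<b>Sale!</b>"
    "<color value='red'>Sale!</color>"
    "</alternatives>"
)

old_client = choose_alternative(doc, {"b"})
new_client = choose_alternative(doc, {"b", "color"})
```

An old client that only knows `b` still gets something usable, and a newer client gets the richer element from the same document.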
My intuition says that we can borrow the inheritance and subtyping ideas from OOP to model a form of type negotiation for XML.
This paper addresses the problem of type transformation in structured editing systems and proposes a type description model convenient for type comparison and document conversion. Two kinds of transformations are considered: dynamic transformations allow a structured editor to change the structure of a part of a document when the part is copied or moved, and static transformations allow specific tools to restructure documents when their generic structure is modified. We present in this paper the current state of our research on formal analysis for these transformations.
Cut/paste issues. Shows that DTD's are not just regexps: & ? are novel.
Also shows that separating element names from element types is essential for some kinds of modelling. I suspect DTD's should be extended to allow this (well... replaced with something that expresses this). For example, allow XPTR style selectors rather than just namegroups in element declarations:
<!element (parent1 child) ANY>
<!element (parent2 child) (x|y|z)>
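One way to picture context-dependent declarations is a lookup keyed on (parent, element) rather than on the element name alone. This is only a sketch of the idea; the table layout and the `content_model` helper are assumptions, not part of any DTD processor.

```python
# Content models keyed by (parent name, element name), so the same
# element name can have a different type depending on where it appears.
CONTENT_MODELS = {
    ("parent1", "child"): "ANY",
    ("parent2", "child"): "(x|y|z)",
}

def content_model(parent, name):
    """Return the content model for `name` inside `parent`, or None if undeclared."""
    return CONTENT_MODELS.get((parent, name))
```

With a plain name-keyed table, `child` could have only one content model; keying on the context is what separates the element name from the element type.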
@@don't use class, just make up new elements and use containment!
About namespaces in DTDs... how about:
<![ module-name [
<!entity module-name "IGNORE">
... module contents ...
]]>
which is just like:
#ifndef _module_h
#define _module_h
... module contents ...
#endif /* _module_h */
I made a patch to psgml mode to allow me to use this syntax.
You still have to have a partial order on your modules. And it's still just one big namespace. So it's just like C -- which is good enough for lots of things, but not for truly independent development.
Is an unescaped > allowed in XML content? (9711 spec says yes.)
HTML 2.0 spec discouraged it in order to avoid ]]> showing up in documents, which is an error in SGML'86.
XML of 9711 has the same misfeature, but it's marked "for compatibility".
Marked sections can't contain ]]>
What's the purpose of a marked section, anyway? If it's just to be able to put XML inside XML without lots of tedious escaping, then the above limitation isn't a showstopper.
But it seems to me that the purpose is to be able to include foreign data like SCRIPT and STYLE, in which case this limitation is really painful.
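To make the pain concrete: with CDATA marked sections as XML ended up defining them, the usual workaround is to end the section in the middle of each ]]> and open a new one. A minimal sketch follows; the `to_cdata` name is made up, and the round trip relies on Python's standard ElementTree parser.

```python
import xml.etree.ElementTree as ET

def to_cdata(text):
    """Wrap text in CDATA sections, splitting wherever ']]>' occurs.

    Each ']]>' becomes ']]' + end of section + new section + '>'.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# A script fragment that happens to contain the forbidden sequence.
script = "if (a[b[0]]>c) run();"
doc = "<script>" + to_cdata(script) + "</script>"
recovered = ET.fromstring(doc).text
```

The payload survives, but only because the producer rewrote it; the foreign data cannot be pasted in untouched.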
Based on shell/perl HERE documents and MIME multipart syntax, I suggest the following:
<![myStringHere[ ... ]myStringHere]>
which allows ... to contain ANY sequence of characters. Any sequence of bytes, actually! This solves the script/style problem, plus gives XML the potential to replace tar, zip, etc. in the same way that HERE documents facilitate shar archives. (But Just Say No to Turing-complete archive formats.)
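A minimal sketch of how the proposed delimiter could be recognized and produced, assuming the label is a run of word characters and the payload is text. The regular expression and both helper names are illustrative only; nothing here is part of any XML draft.

```python
import re

# <![label[ payload ]label]>  with the same label at both ends.
HERE_SECTION = re.compile(r"<!\[(\w+)\[(.*?)\]\1\]>", re.DOTALL)

def here_sections(text):
    """Return (label, payload) pairs for each here-style section in text."""
    return [(m.group(1), m.group(2)) for m in HERE_SECTION.finditer(text)]

def wrap_here(payload, label):
    """Wrap payload, refusing a label whose terminator occurs in the payload."""
    terminator = "]" + label + "]>"
    if terminator in payload:
        raise ValueError("pick another label: payload contains " + terminator)
    return "<![" + label + "[" + payload + terminator

payload = "if (a[i]]>b) { s = '</script>'; }"
wrapped = wrap_here(payload, "EOF42")
```

As with MIME boundaries, the producer only has to pick a label that does not collide with the payload; the payload itself is never rewritten.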
I've implemented support for:
<foo> ... </>
The implementation cost is trivial. The deployment cost is the risk that folks will expect legacy HTML elements to work this way:
<blockquote> ... </>
???
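The trivial implementation cost can be illustrated with a stack of open element names. This is a sketch, not the implementation mentioned above: it assumes properly nested input with every end tag present and no > inside attribute values, and the function name is made up.

```python
import re

# Matches start tags, end tags, and the short end tag </>.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)?([^<>]*)>")

def expand_short_end_tags(text):
    """Rewrite each </> as the end tag of the innermost open element."""
    stack = []

    def fix(match):
        slash, name, rest = match.groups()
        if slash:
            if not stack:
                raise ValueError("end tag with no open element")
            open_name = stack.pop()
            if name is None:
                return "</" + open_name + ">"   # short form: fill in the name
            if name != open_name:
                raise ValueError("mismatched end tag: " + name)
            return match.group(0)
        # Start tag: remember it unless it is an empty-element tag like <br/>.
        if name and not rest.endswith("/"):
            stack.append(name)
        return match.group(0)   # declarations, PIs, etc. pass through unchanged

    return TAG.sub(fix, text)
```

The stack is only right when no end tags are omitted, which is exactly why legacy HTML, where `<p>` or `<li>` may be left open, is the risky case.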
Bad idea. General entities are very powerful, and all we need is a way to escape three characters (maybe two).
Other characters should be done with "replaced elements" with fallback inside, e.g.:
<emdash>---</>
Going to Unicode is probably cost-effective in the long term, but the documents don't degrade gracefully.
These are obviated by linking. The idiom:
<!doctype html public "-//IETF//DTD HTML//EN" [
<!entity product-name "Gee Whiz&tm;">
<!entity legal system "legal.html">
]>
... &product-name; ... &legal;
can be done ala:
<!doctype html system "http://www.w3.org/9705/html.dtd">
<div style="display: none">
<span id=product-name>Gee Whiz&tm;</span>
</div>
...
<a href="#product-name" xml-link=replace>Gee Whiz&tm;</>
<a href="legal.html" xml-link=replace>Copyright (c) 1997 by US</a>
The a's could be left empty. But for the benefit of downlevel clients, you can (by machine) propagate the destination of the link (or a part of it) to the source.
I want DT/DD to be able to format ala:
term definition definition def d efiintion
so I changed the content models of dt and dd so that dd is contained within dt.
@@link to MIX.
ok3: uses internal declaration subset. Boo. Note that this is a perfect example of how entities are redundant with respect to linking.
ok3a: @@ WF client should check for data outside root element
torture: whacked internal declaration out; removed references to other entities
#@@ is an unescaped > allowed in xml? what about ]]>? is ]]> a reportable error? well-formedness error? validity error?
This doesn't match: <p>PI with markup: <?Myparser <p> or <p> -- which?></p>