XML Processing Model WG telcon -- 14 May 2009

Agenda same as last week. . .

Plus vote to publish interim CR draft: http://www.w3.org/XML/XProc/docs/langspec.html, dated 10 May

Admin

RESOLUTION: Accept minutes of 7 May as published

HT: Next meeting is 21 May

HT: Regrets from HST for 21 May

Vote to publish interim CR draft

HT: Norm distributed a pointer to his latest draft: http://www.w3.org/XML/XProc/docs/langspec.html, dated 10 May

HT: We are not in immediate reach of a complete test suite

HT: and it's been more than three months, so we should publish something

RESOLUTION: Ask the editor to publish the draft of 10 May as an interim CR draft as soon as convenient

default XML processing model

HT: PG raised some questions by email

PG: What's TimBL's current opinion wrt the pipeline model -- broken?

HT: It doesn't do what he wants, but he's not opposed to it
... because he's interested in the semantics of XML documents

PG: His version does look like the kind of top-down recursive story you told

HT: Right, and that's what I was trying to get at in the elaborated infoset story

Minutes of last week: http://www.w3.org/XML/XProc/2009/05/07-minutes.html

PG's emails: http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2009May/0005.html

http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2009May/0006.html

PG: So there isn't anything in our current model which implements that kind of multi-threaded recursive story

HT: Correct

AM: Is it that there's a step or two missing, or is it more fundamental?

HT: More fundamental

HT: The basic model of the proc model is infoset-to-infoset transforms.

HT: It would be problematic to use our existing framework to do a standard recursive down-and-up.

HT: http://www.ltg.ed.ac.uk/~ht/compositional.pdf goes a lot further in trying to formalise this stuff.

PG: In a full recursive-descent process, you can do things on the way down as well as on the way back up

HT: That's true, you can, but you typically don't

PG: What about namespace decls?

HT: Good example, not context-free
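The point about namespace declarations can be made concrete with a small recursive-descent sketch. It uses a hypothetical minimal `Node` class rather than any real XML API. In-scope prefix bindings are passed down the tree, and the set of expanded names found is passed back up. `s:rect` only has a meaning because of a declaration on an ancestor, so the work done at each node is not context-free.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal element: a prefixed name, attributes, children."""
    qname: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def expanded_names(node, in_scope=None):
    """Recursive descent that does work on the way down and on the way up."""
    # Way down: inherit the parent's bindings, then add this element's
    # own namespace declarations (xmlns="..." and xmlns:p="...").
    bindings = dict(in_scope or {})
    for name, value in node.attrs.items():
        if name == "xmlns":
            bindings[""] = value
        elif name.startswith("xmlns:"):
            bindings[name[len("xmlns:"):]] = value

    # Resolve this element's name against the inherited context.
    prefix, _, local = node.qname.rpartition(":")
    if prefix and prefix not in bindings:
        raise ValueError(f"unbound prefix: {prefix}")
    uri = bindings.get(prefix, "")
    names = {f"{{{uri}}}{local}" if uri else local}

    # Way up: combine the results synthesized by the children.
    for child in node.children:
        names |= expanded_names(child, bindings)
    return names


XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

doc = Node("html", {"xmlns": XHTML}, [
    Node("s:svg", {"xmlns:s": SVG}, [Node("s:rect")]),
    Node("p"),
])
print(sorted(expanded_names(doc)))
```

Evaluated on a detached `Node("s:rect")`, the same function raises `ValueError`, because the binding for `s` lives on its parent.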

AM: What about XSLT2 -- it would let you do a lot of that, wouldn't it?
... both down and up

HT: TBL's idea is to produce some kind of semantic object, not an infoset.

HT: So there is a more fundamental reason that what TBL wants is not what our proc model does. If for instance your document combines XHTML, SVG, etc., what TimBL wants at the end is not an infoset, but a page (description).

<PGrosso> http://www.w3.org/2001/tag/doc/elabInfoset/elabInfoset

PG: Surprised to see you mention XInclude, XML Sig, XML Encryption, but not xml:id and xml:base

HT: Good point, and I think you're right on both counts
... Leaving out xml:base and xml:id was accidental
... xml:base comes for free with XProc
... but xml:id does not, in two ways:
... 1) We didn't require the xinclude step to recognise xml:ids as anchors for uris with fragids

PG: And wrt anchors there are further questions wrt DTDs and XSDs
... it's all intertwingled, and we appear to need to de-confuse the order
... to say nothing of adding in recursive descent

HT: coming back to XProc vs xml:id
... 2) we don't currently say that e.g. when parsing a character stream to produce an infoset, XProc processors should set xml:id attr IIs to have type ID
... or when we introduce xml:id attrs via e.g. p:add-attribute, that they should get that type
... does this matter? Is it detectable whether we do or not?
... What if we write type-aware XPaths, which look for type ID -- should/do/how do we know if they match xml:id?
... Need Norm for that
... Coming back to Decryption and signature verification
... For a long time I have wanted to include these, because I think the world would be a better place if use of the XML security technologies was much more widespread. But I've finally given up: of necessity, decryption and signature verification involve out-of-band appeal to key files and passphrases. Without those, the data just isn't secure. And you may need more than one set of them for a given document. This just doesn't fit well with a notion of default processing model which is pervasive, simple, and often unattended.
... Good news: without these, since XInclude is itself recursively specified, we don't have to implement fixed-point detection for the DXPM in XProc
... So maybe we could write an XProc pipeline which implemented the DXPM:
... [Straw man] An XProc pipeline consisting of an XInclude step
... (modulo some uncertainties wrt xml:id)
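HT's straw man can be approximated outside XProc. The Python sketch below is an illustration under stated assumptions, not an XProc implementation. It uses the standard library's `xml.etree.ElementInclude` module as the single processing step. A hypothetical in-memory `DOCS` table stands in for the web, and `load` and `dxpm` are illustrative names. On Python 3.9 or later, `ElementInclude.include` also expands inclusions found inside included documents, which reflects the point that XInclude's own recursion avoids a separate fixed-point loop. The sketch does not attempt fragment identifiers, so the xml:id questions remain open.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "web": href -> serialized XML.
DOCS = {
    "main.xml": '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
                '<xi:include href="chapter.xml"/></doc>',
    "chapter.xml": '<chapter xmlns:xi="http://www.w3.org/2001/XInclude">'
                   '<title>One</title><xi:include href="note.xml"/></chapter>',
    "note.xml": "<note>nested</note>",
}


def load(href, parse, encoding=None):
    """Loader for ElementInclude: parse a fresh tree on every call."""
    if parse == "xml":
        return ET.fromstring(DOCS[href])
    return DOCS[href]  # parse="text": return the raw string


def dxpm(href):
    """Straw-man default processing: parse, then run XInclude once.

    On Python 3.9+, ElementInclude recurses into included documents,
    so no outer fixed-point loop is needed here.
    """
    root = load(href, "xml")
    ElementInclude.include(root, loader=load)
    return root


result = dxpm("main.xml")
print(ET.tostring(result, encoding="unicode"))
```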

PG: So all we need from a small-s schema is IDness?

HT: That is the problem alright
... There's a chicken and egg problem
... Imagine two stages: we publish a DXPM spec; we publish a new edition of XInclude which references the new DXPM spec
... We won't get everything we want until the second step
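The IDness issue can be illustrated with a small Python sketch that resolves a bare-name fragment identifier against `xml:id` attributes only. It rests on the rule in the xml:id Recommendation: an `xml:id` attribute is an ID without any DTD or schema. A plain `id` attribute is only an ID if some small-s schema says so. `find_by_xml_id` is a hypothetical helper, not part of any XInclude or XProc API.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:id under the predeclared XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_by_xml_id(root, fragid):
    """Return the first element whose xml:id equals fragid, else None.

    Only xml:id is treated as an ID. Attributes named plain "id" are
    ignored because, without a DTD or schema, nothing declares them type ID.
    """
    for elem in root.iter():
        if elem.get(XML_ID) == fragid:
            return elem
    return None


doc = ET.fromstring('<doc><sec xml:id="a">A</sec><sec id="b">B</sec></doc>')
print(find_by_xml_id(doc, "a").text)
print(find_by_xml_id(doc, "b"))
```

A processor that also honoured DTD- or XSD-declared IDs would need that schema information before it could resolve `b`. That is the dependency discussed above.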

PG: Do we have to worry about schemas?

HT: Yes, because of the way we wrote XPointer wrt IDness

PG: Any other way?

HT: External entities

PG: They get expanded, don't they?

HT: Not by all the browsers

PG: Assuming they have been expanded, there's nothing except IDness you need from schemas, in order to resolve XPointers and do xinclude
... assuming only element and framework

HT: And 3023bis

HT: Well, should we think about allowing some kind of parameterisation/optionality, for use by specs which reference the DXPM and/or implementations which appeal to it? For example, if five years from now we add XML Excision (remove this bit of the document before further processing) to the core XML specs, should it be easy to add it to the DXPM? Should we have a core plus optional bits? In either case, we could use that flexibility to allow e.g. XSD or RNG into the DXPM in some situations.
... Open questions: 1) What about the flexibility in the XML spec itself? Do we want to require the 'full' well-formedness parse?
... 2) Parameterisable/extensible/fixed+optional --- or not?
... If it were up to me, I'd say "yes" to 'full' WFP

PG: I thought you didn't want to bring in the DTD?

HT: No, just not all the other schema languages

- DRAFT -

