XProc Minutes
9 Feb 2006

The XML Processing Model (XProc) WG met on Thursday, 9 Feb 2006 at 11:00a EST (08:00a PST, 16:00 GMT, 17:00 CET, 01:00 JST+, 09:30p India) for one hour on the W3C Zakim Bridge. See the XProc WG[1] page for pointers to current documents and other information.

Norm gave regrets; Michael (MSM) chaired (and scribed), and apologizes for the late arrival of these minutes.

Attendance:

Present
  Erik Bruchez, Orbeon
  Vikas Deolaliker, Sonoa Systems
  Andrew Fang, PTC-Arbortext
  Murray Maloney, invited expert
  Alex Milowski, invited expert
  C. M. Sperberg-McQueen, W3C
  Henry Thompson, W3C
  Richard Tobin, Univ. of Edinburgh
  Alessandro Vernet, Orbeon
  Paul Grosso, PTC-Arbortext

Regrets
  Norm Walsh, Sun Microsystems
  Jeni Tennison, invited expert
  Rui Lopes

1. Administrivia

   1. Accept this agenda.

      Accepted without change.

   2. Accept minutes[3] from the previous teleconference.

      Accepted without change.

   3. Next meeting: 16 Feb 2006.

      Noted.

   4. Tech Plenary[4] registration is now open[5].

      Noted.

2. Technical

   1. XProc Requirements and Use Cases[6]

      We continued our discussion of the document.

      http://www.w3.org/XML/XProc/docs/langreq.html

Alex Milowski asked whether anyone had any general comments on the design goals section before resuming our walk through the specific requirements.

Alessandro said he was a bit concerned about the terminology used, particularly the specific mention of the infoset. We had recently discussed what the input and output of pipeline stages should be (in particular, infosets vs. XDM instances), but AV did not think we had reached consensus one way or the other. He would prefer to avoid talking specifically about infosets. Alex said he felt strongly that we have to set a minimum bar, and that the infoset is that minimum. Henry thought we had actually reached agreement that pipeline stages and implementations of the pipeline language are not constrained to a particular data model, but are constrained to support the infoset.
There followed a long discussion of whether we support arbitrary data streams including non-XML data, or streams limited to particular vocabularies or subsets of the infoset (e.g. XML, but not with attributes). There was some strong sentiment in favor of saying no, we do not support arbitrary data streams, but there were also some concerns about that restriction. There seemed to the scribe to be something like consensus that it needs to be possible to build special-purpose pipeline stages that only support specific vocabularies, and that if such a vocabulary has (for example) no attributes, it might be a challenge to formulate a requirement that attributes (to continue the example) must be supported.

Henry suggested that we should probably elevate to the status of a general rule the basic principle that pipeline stages should pass the input infoset through to their output without change, except for the changes which are part of the processing and which are documented. If (for example) a component is advertised as accepting an XPath which denotes a set of nodes which contain URIs, and a base URI, and producing output in which the URIs have all been absolutized, then if all the namespace bindings are missing from the output, Henry wanted to have a legitimate grievance against the component maker.

MSM sympathized with this view, but wondered whether such a rule was inherently toothless, in the sense that the maker of the component described by Henry would be able to make the component work 'correctly' by changing the documentation to say "this component absolutizes all URIs and suppresses all namespace bindings". Such a description might make clearer to the user that that component is not really useful in practice, but it could be a correct, conforming component nonetheless. Henry agreed that it would be hard to make the definition of conformance entail usefulness, but thought the notion of component signatures might be worth exploring even so.
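[Scribe's illustration, not discussed in the meeting: Henry's URI-absolutizing component can be sketched as below. The sketch simplifies his description by selecting href attributes directly rather than taking an XPath parameter; the function name and document are hypothetical. The point it illustrates is the proposed general rule: the one documented change (relative URIs resolved against the base URI) is made, and everything else in the input passes through to the output unchanged.]

```python
# A hypothetical pipeline component: absolutize the URIs in href
# attributes against a base URI, and pass every other piece of the
# input (elements, other attributes, text) through unchanged.
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

def absolutize_hrefs(xml_text, base_uri):
    """Resolve each href attribute against base_uri; touch nothing else."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if "href" in el.attrib:
            el.set("href", urljoin(base_uri, el.attrib["href"]))
    return ET.tostring(root, encoding="unicode")

doc = '<doc><link href="a/b.xml"/><link href="http://example.org/c"/></doc>'
print(absolutize_hrefs(doc, "http://example.com/base/"))
# The relative href becomes http://example.com/base/a/b.xml; the
# already-absolute one is left alone, as is the rest of the document.
```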
Alessandro suggested that the discussion showed clearly that we don't have a clean consensus on the details; there seemed to be agreement on this conclusion. It is probably enough, said Alessandro, if the requirements document says it's a requirement to be clearer about this problem.

Alex asked whether we should perhaps require that the language define a minimum set of infoset items and properties. That would leave open for us to define either a subset of the infoset, or require XDM, or do something else. But it does say clearly that doing that is a work item.

Richard expressed concern about possible over-specificity. It's (still) a possibility, he said, to define a system that does not allow addition of components, so everything would be a black box. In such a system, with black boxes for components, the user isn't actually able to *tell* what info items are flowing across the component boundaries. [Possible exception: in the right circumstances, a change in the input which fails to elicit a corresponding change in the output would indicate that a particular piece of information is not crossing some boundary somewhere.]

Alex replied that we do have a requirement for adding components. Should that requirement be labeled optional? (If it's not optional, then it's not actually still a possibility to define a system that does not allow addition of components, so everything would be a black box.)

Murray said it seemed to him that we can't say what happens inside of each component: it might use the infoset, or one data model, or another -- that's not up to us, it's up to the component. He supposed we could limit the inputs and outputs. But at least for the terminal component, it's useful to be able to produce non-XML output (text files, Postscript, ...), so it seems we are likely to be on thin ice if we seek to eliminate all non-XML data streams from our purview.
Even agreeing that some minimum bar needs to be set, Murray said, some of the possible places we've talked about putting the bar seem (unnecessarily) restrictive.

Richard said he was more interested in restricting the WG in our deliberations than in restricting implementations. He didn't want this to expand to become a wholly general language for processing arbitrary data. But he saw only one way to avoid that, namely to say that what's passing through the pipeline is XML -- though not necessarily in textual form (hence the reference to the infoset).

Alex said he had registered an issue on requirement 4.3, so this should be trackable now. Perhaps, he continued, this topic (infosets and the nature of what passes through the pipeline) should be on the ftf agenda, and then we can table it (i.e. suppress it) on the calls between now and then. He encouraged the WG to look at the use cases, at least briefly (with the side warning that the HTML is currently sub-optimal).

In terms of the requirements document, Alex reminded the WG, we have made it through to 4.6. He said he had stuck in the strawman we talked about last time, refactoring the old requirement into two pieces: 4.6 is the idea of having standard names for standard steps; 4.7 is a specific proposal for a minimal set of standard steps.

At this point, the allotted time expired and we adjourned.

3. Any other business

None.

[1] http://www.w3.org/XML/Processing/
[3] http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Feb/0004
[4] http://www.w3.org/2005/12/allgroupoverview.html
[5] http://www.w3.org/2002/09/wbs/35195/TP2006/
[6] http://www.w3.org/XML/XProc/docs/langreq.html