Meeting minutes
Introductions
[A quick round of introductions]
Aleksei: I've worked on systems similar to ixml. I am based in St Petersburg, Russia.
… I will be updating my system to ixml, to bring it to .NET and C++.
cmsmcq: Would I know any of the formats you have processed before?
Aleksei: One of them was at Xerox.
cmsmcq: Involved with SGML about 30 years ago, then XML; worked at W3C for 10 years; now an independent consultant. Interested in ixml because I like parsers.
John: Recently retired from Saxonica, where I developed parse trees for XPath expressions, aiming for small parse trees. I ended up doing parse-tree reduction, eliminating parents with a single child.
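As an illustration of the parse-tree reduction John describes, here is a minimal sketch, assuming a tree of hypothetical `Node` objects whose children are either other nodes or text strings. It is not Saxon's implementation; it simply replaces any element whose only child is another element with that child, working bottom-up.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Node:
    """Hypothetical parse-tree element; children are Nodes or text strings."""
    name: str
    children: list = field(default_factory=list)


Tree = Union[Node, str]


def reduce_tree(tree: Tree) -> Tree:
    """Collapse every element whose only child is another element."""
    if isinstance(tree, str):
        return tree  # text leaves are kept unchanged
    children = [reduce_tree(child) for child in tree.children]
    if len(children) == 1 and isinstance(children[0], Node):
        # A parent with a single element child adds no information; drop it.
        return children[0]
    return Node(tree.name, children)


# Hypothetical tree for the one-token expression "1"
tree = Node("expr", [Node("term", [Node("factor", ["1"])])])
print(reduce_tree(tree))  # Node(name='factor', children=['1'])
```

Nodes with several children, or with a text child, are kept, so the reduced tree still records every branching point of the parse.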
Previous Actions
Action: Steven to specify what happens when a name isn't an XML name
<trackbot> Sorry, but no Tracker is associated with this channel.
[Continues]
[Done]
Action: Steven to research where to put S for attributes.
[Continues]
Action: Steven to create W3C Community Group [Done]
New draft
https://
Steven: Includes a conformance section.
Action: Michael to comment on conformance section of new draft
Requirement that input be completely consumed
https://
Michael: The problem is with streaming.
Michael: "The longest string that matches"
Steven: That's OK, that's the same thing.
Aleksei: You may have an XML fragment with no root element. And then a stream of them.
John: What happens if your input stream is a sequence of documents?
Aleksei: There are points in the input when you are certain.
Steven: Not in the general case.
Michael: You can do reductions at all points there's a match.
… I realise that might give an infinite number of reductions.
… Suppose I use ixml to make an ixml parser.
… then I can repeat over the rules.
… So I believe there are two use cases for not requiring the parser to consume the longest string:
… Prolog style (see email), and transient streams.
Tomos: Because we allow ambiguity, can we do ixml in streaming mode?
Michael: I believe so.
Tomos: That would have big memory requirements.
Michael: The spec doesn't constrain how the result is produced.
Steven: I would be extremely cross if I offered a document, and got an empty XML document just because it matched the initial empty string.
Michael: If the ixml spec requires the entire input, it wouldn't work for streaming.
Steven: It depends on the meaning of "entire input".
… that's the conformance requirement I'm looking for.
Steven: A wording that supports the obvious case without excluding streaming is what I am looking for.
Tomos: For ambiguous input an implementation has to pick a parse by default; we don't define that choice in the spec, but it would be useful to define how to deal with an ambiguous parse.
John: You would have to be able to insist that the implementation gives you the biggest parse, if it were a choice.
Tomos: That should be the default.
Aleksei: Can't you just do greedy, like with regexp?
<cmsmcq> Aleksei: you could define this in the same way as requiring regular expression parsers to be greedy.
Steven: This would mean that parsing stops at an error, and says that the parse is correct.
Tomos: How about, when you find an error, providing a partial parse and the rest of the input?
Steven: The spec already says that. We need to specify what happens for a correct parse.
Tomos: Can't we just say "greedy", as Aleksei proposed?
Steven: I think that that may work.
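To make the difference between "whole input", "shortest match" and "greedy" concrete, here is a minimal sketch, assuming a hypothetical toy grammar `doc: item*. item: "a"; "ab".` rather than a real ixml processor. It computes every prefix length the grammar matches; the empty prefix always matches (Steven's empty-document concern), and the greedy choice is the longest one, with the remainder returned as the unparsed rest of the input.

```python
ITEMS = ("a", "ab")  # hypothetical toy grammar: doc: item*. item: "a"; "ab".


def match_ends(text: str) -> set[int]:
    """Return every prefix length of `text` that the toy grammar matches."""
    ends = {0}  # doc can be empty, so the empty prefix always matches
    frontier = [0]
    while frontier:
        i = frontier.pop()
        for item in ITEMS:
            j = i + len(item)
            if text.startswith(item, i) and j not in ends:
                ends.add(j)
                frontier.append(j)
    return ends


def parse_whole(text: str) -> bool:
    """Default behaviour: the entire input must match."""
    return len(text) in match_ends(text)


def greedy_prefix(text: str) -> tuple[str, str]:
    """Greedy behaviour: take the longest matching prefix; return it and the rest."""
    n = max(match_ends(text))
    return text[:n], text[n:]


text = "abax"
print(parse_whole(text))      # False
print(min(match_ends(text)))  # 0: the empty prefix, i.e. an empty document
print(greedy_prefix(text))    # ('aba', 'x')
```

When the whole input matches, the greedy prefix is the whole input, so greediness agrees with the obvious case while still saying something useful about a prefix.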
<cmsmcq> How close does this wording come to matching what people want?
<cmsmcq> In the normal case, the input will have a deterministic length, either
<cmsmcq> known in advance or signaled by some end-of-stream signal.
<cmsmcq> In that case, the default behavior of an ixml parser shall be to parse
<cmsmcq> the input as a whole against the grammar, and return a parse or a
<cmsmcq> failure document as described elsewhere.
<cmsmcq> Parsers may also support the case of input of non-deterministic
<cmsmcq> length, by parsing successive prefixes of the input.
<cmsmcq> Parsers may also offer, at user option, to parse prefixes of the input
<cmsmcq> even if the input has deterministic length.
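The two modes in the proposed wording can be sketched as follows, again assuming the hypothetical toy grammar `doc: item*. item: "a"; "ab".` and an input that arrives in chunks. `is_document` is the default whole-input check; `parse_stream` is one possible reading of "parsing successive prefixes", reporting each buffered prefix that is a complete document. It re-parses the buffer on every chunk for simplicity, which a real streaming parser would avoid.

```python
from typing import Iterable, Iterator

ITEMS = ("a", "ab")  # hypothetical toy grammar: doc: item*. item: "a"; "ab".


def is_document(text: str) -> bool:
    """Default mode: True if `text` as a whole matches the toy grammar."""
    ok = [False] * (len(text) + 1)  # ok[i] is True when text[:i] is a complete doc
    ok[0] = True
    for i in range(len(text)):
        if ok[i]:
            for item in ITEMS:
                if text.startswith(item, i):
                    ok[i + len(item)] = True
    return ok[len(text)]


def parse_stream(chunks: Iterable[str]) -> Iterator[int]:
    """Prefix mode: after each chunk, yield the buffered length if that prefix parses."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if is_document(buffer):
            yield len(buffer)


print(is_document("abab"))                        # True
print(list(parse_stream(["a", "b", "x", "a"])))   # [1, 2]
```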
Aleksei: Greediness by default, because of the empty string case. But with streaming, there is an extra case.
<johnLumley> seems appropriate Michael...
Michael: I don't want an interface that offers an infinite number of parses.
AOB
Next meeting Tuesday 11th May.
Further discussion on email.
Tom: Namespaces on the agenda next month, please.