W3C

– DRAFT –
Ixml Users Community Group Teleconference

13 April 2021

Attendees

Present
AlekseiGusev, JohnLumley, MichaelSperbergMcQueen, StevenPemberton, TomosHillman
Regrets
-
Chair
Steven
Scribe
Steven

Meeting minutes

Introductions

[A quick round of introductions]

Aleksei: I've worked with similar systems to ixml. I am based in St Petersburg, Russia.
… I will be updating my system to ixml, to bring it to .NET and C++.

cmsmcq: Will I know the formats you have processed before?

Aleksei: It was one in Xerox.

cmsmcq: Involved with SGML about 30 years ago, then XML, worked at W3C for 10 years, now independent consultant. Interested in ixml because I like parsers.

John: Recently retired from Saxonica, developing parsetrees for XPath expressions, making small parsetrees. I ended up doing parsetree reduction, eliminating parents with a single child

Previous Actions

Action: Steven to specify what happens when a name isn't an XML name

[Continues]

[Done]

Action: Steven to research where to put S for attributes.

[Continues]

Action: Steven create W3C Community Group [Done]

New draft

https://lists.w3.org/Archives/Public/public-ixml/2021Apr/0005

Steven: Includes a conformance section.

Action: Michael to comment on conformance section of new draft

Requirement that input be completely consumed

https://lists.w3.org/Archives/Public/public-ixml/2021Apr/0007

Michael: problem is with streaming

Michael: "The longest string that matches"

Steven: That's OK, that's the same thing.

Aleksei: You may have an XML fragment with no root element. And then a stream of them.

John: What happens if your input stream is a sequence of documents?

Aleksei: There are points in the input when you are certain.

Steven: Not in the general case.

Michael: You can do reductions at all points there's a match.
… I realise that might give an infinite number of reductions.
… Suppose I use ixml to make an ixml parser.
… then I can repeat over the rules.
… So I believe there are two use cases for not requiring consuming the longest string.
… Prolog style (see email), and transient streams.

Tomos: Because we allow ambiguity, can we do ixml in streaming mode?

Michael: I believe so.

Tomos: Big memory requirements

Michael: The spec doesn't constrain how the result is produced.

Steven: I would be extremely cross if I offered a document, and got an empty XML document just because it matched the initial empty string.

Michael: If the ixml spec requires the entire input, it wouldn't work for streaming.

Steven: It depends on the meaning of "entire input".
… that's the conformance requirement I'm looking for.

Steven: A wording that supports the obvious case without excluding streaming is what I am looking for.

Tomos: We have a default for parses that an implementation has to pick; we don't define those in the spec, but it would be useful to define how to deal with an ambiguous parse.

John: You would have to be able to insist that the implementation gives you the biggest parse, if it were a choice.

Tomos: That should be the default.

Aleksei: Can't you just do greedy, like with regexp?

<cmsmcq> Aleksei: you could define this in the same way as requiring regular expression parsers to be greedy.

Steven: This would mean that parsing stops at an error, and says that the parse is correct.

Tomos: How about, when you find an error, providing a partial parse and the rest of the input?

Steven: The spec already says that. We need to specify what happens for a correct parse.

Tomos: Can't we just say "greedy", as Aleksei proposed?

Steven: I think that that may work.
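
The trade-off discussed above can be sketched in Python. This is only an illustration: a regular expression stands in for an ixml grammar (an assumption, since ixml grammars are context-free), and the names `GREEDY`, `LAZY`, `prefix_parse` and `whole_parse` are hypothetical. It shows the empty-prefix match Steven objects to, the greedy prefix match Aleksei proposes, and Steven's concern that a greedy prefix parse stops at an error and still reports success.

```python
import re

# Hypothetical stand-in grammar: zero or more "number;" statements.
GREEDY = re.compile(r"(?:[0-9]+;)*")   # takes as many statements as it can
LAZY = re.compile(r"(?:[0-9]+;)*?")    # takes as few statements as it can


def prefix_parse(pattern, text):
    """Match a prefix of text; return (matched prefix, unconsumed rest)."""
    m = pattern.match(text)  # never None here: both patterns match ""
    return text[:m.end()], text[m.end():]


def whole_parse(text):
    """Require the entire input to match; return None on failure."""
    m = GREEDY.fullmatch(text)
    return m.group(0) if m else None


good = "1;2;3;"
bad = "1;2;x;3;"  # error in the middle of the input

print(prefix_parse(LAZY, good))    # empty prefix: "succeeds" with nothing parsed
print(prefix_parse(GREEDY, good))  # greedy: consumes the whole input
print(prefix_parse(GREEDY, bad))   # greedy: stops at the error, still a match
print(whole_parse(bad))            # whole-input rule: reports failure
```

Under these assumptions, greediness rules out the empty-document result, but only a whole-input rule turns the error in `bad` into a failure rather than a shorter successful parse.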

<cmsmcq> How close does this wording come to matching what people want?

<cmsmcq> In the normal case, the input will have a deterministic length, either known in advance or signaled by some end-of-stream signal. In that case, the default behavior of an ixml parser shall be to parse the input as a whole against the grammar, and return a parse or a failure document as described elsewhere.

<cmsmcq> Parsers may also support the case of input of non-deterministic length, by parsing successive prefixes of the input.

<cmsmcq> Parsers may also offer, at user option, to parse prefixes of the input even if the input has deterministic length.
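
A minimal sketch of the two cases in this wording, again with a regular expression as a stand-in for an ixml grammar (an assumption) and with hypothetical names `GRAMMAR`, `parse_whole` and `parse_prefixes`. The whole-input function is the default case; the generator parses successive prefixes of a stream whose length is not known in advance.

```python
import itertools
import re

# Hypothetical stand-in grammar: zero or more "number;" statements.
GRAMMAR = re.compile(r"(?:[0-9]+;)*")


def parse_whole(text):
    """Default case: input of known length is parsed as a whole."""
    m = GRAMMAR.fullmatch(text)
    return m.group(0) if m else None


def parse_prefixes(chars):
    """Yield each successive prefix of a character stream that matches."""
    buffer = ""
    if GRAMMAR.fullmatch(buffer):
        yield buffer
    for ch in chars:
        buffer += ch
        if GRAMMAR.fullmatch(buffer):
            yield buffer


# An endless stream: the caller, not the parser, decides when to stop.
endless = itertools.cycle("12;")
first_three = list(itertools.islice(parse_prefixes(endless), 3))
print(first_three)
```

On an endless stream the generator never finishes by itself, which is one way to read Michael's concern about an interface that offers an infinite number of parses: here the results are produced lazily and the consumer bounds them with `islice`.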

Aleksei: Greediness by default, because of the empty string case. But with streaming, there is an extra case.

<johnLumley> seems appropriate Michael...

Michael: I don't want an interface that offers an infinite number of parses.

AOB

Next meeting Tuesday 11th May.

Further discussion on email.

Tomos: Namespaces on the agenda next month, please.

Summary of action items

  1. Steven to specify what happens when a name isn't an XML name
  2. Steven to research where to put S for attributes.
  3. Steven create W3C Community Group [Done]
  4. Michael to comment on conformance section of new draft
Minutes manually created (not a transcript), formatted by scribe.perl version 127 (Wed Dec 30 17:39:58 2020 UTC).
