We accept the draft agenda, https://www.w3.org/XML/XProc/2016/01/27-minutes.html
Henry: that agenda should include links to the draft
<scribe> ACTION: add link to the draft [recorded in http://www.w3.org/2016/02/10-xproc-irc]
<alexmilowski> https://github.com/xproc/notes/tree/master/dataflows
Alex: good to spend time on the major conceptual issues
Do we accept the meeting minutes from Jan 27th?
https://www.w3.org/XML/XProc/2016/01/27-minutes.html
none heard, accepted
Do we accept the meeting minutes from Feb 3rd?
https://www.w3.org/XML/XProc/2016/02/03-minutes
Henry: before tomorrow we need to have the link to the new doc
Alex: we have included the link
Henry: no one will see it until tomorrow
Alex: will send email out to the list with a link to the draft proposal
https://github.com/xproc/notes/tree/master/dataflows
Jim: has sent agenda for today to the list
pause to read Henry's comments
https://github.com/ndw/xproc-notes/blob/master/design/ht-notes.html
Henry: I would like to put further discussions about the syntax to one side
... With all respect to the WG, we are not up for this job ...
... We need more wo/manpower
Alex: agree we need more people
<alexmilowski> http://noflojs.org/documentation/fbp/
<alexmilowski> http://www.nextflow.io/example1.html
Alex: goes on about some examples of dataflow languages
... NoFlo is a JavaScript thing; it's a dataflow language that processes inputs/outputs
Jim: what about Google Dataflow?
Alex: I looked at Google Dataflow; it's an API
Jim: it's an important distinction, API vs. actual dataflow language, but I mention it for awareness of dataflow in general
Alex: There are lots of people working in data pipelining but there is no way to say 'here is my data pipeline'
Liam: that is (and gives me) an interesting insight - there are two kinds of data flow language
... In the first approach everything is done with dataflow; the second is where you have 'islands' that are steps
Henry: The way I understood, 'we now use right arrow operator'
Alex: the steps continue to be a blackbox
... Gives examples of blackbox steps
<liam> [ pervasive dataflow language example - https://en.wikipedia.org/wiki/Lucid_%28programming_language%29 ]
Alex: if you look at the science workflows they are all dataflows
... mentions Spark, Pig in data science workflows
... when I was teaching, my students started to draw data pipelines
Henry: XProc v1 is pretty good, but in the longer term, doing anything other than linear flows is unreasonably hard
Alex: We have two fundamental problems, we focused on XML
Liam: I raised that in my reply to Henry's mail
Alex: The other part about xproc is we focused on steps
... and everything became 'steps'
... the design centre revolved around steps
Henry: I agree with the first part- not the rest
Alex: The syntax of the language focused on the steps
Henry: maybe we are saying the same thing
... the hard part is managing more than linear flow
Liam: We spent longer in XQuery because of XQuery scripting
... describes experience with XQuery scripting, spent almost three years
... we should avoid that kind of route
https://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2016Feb/0008.html
Alex: it should be easy, I want a tool to parse an instance of xproc v2 and draw dataflow graph
Henry: The data flow graph is a candidate - abstract semantics
... something that is less detail
... We need something that is more specific than the nice pictures of the xproc v1 spec
... We only designed for the simple case
... we got blindsided because we were focused on the simple case
... (pls <deity/> do we have to do more UML diagrams)
... Layering is going to be absolutely crucial - one of the things we need right was that the language that is the interface to the engine was a lot lower than any sane human would use
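Alex's wish for a tool that reads a pipeline and draws its dataflow graph can be illustrated with a toy. The sketch below assumes the pipeline has already been parsed into a list of (source, target) connections. The step names and the `to_dot` and `is_linear` helpers are hypothetical and not part of any XProc draft; the output is Graphviz DOT text.

```python
from collections import defaultdict

def to_dot(edges):
    """Render (source, target) pairs as Graphviz DOT text."""
    lines = ["digraph pipeline {"]
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

def is_linear(edges):
    """True when no node has more than one input or more than one output."""
    outs, ins = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        outs[src] += 1
        ins[dst] += 1
    return all(n <= 1 for n in outs.values()) and all(n <= 1 for n in ins.values())

# Hypothetical non-linear pipeline: one document feeds two steps that later meet.
edges = [
    ("in", "xinclude"),
    ("xinclude", "validate"),
    ("xinclude", "xslt"),
    ("validate", "report"),
    ("xslt", "report"),
]

dot = to_dot(edges)
print(dot)
print("linear:", is_linear(edges))
```

The `is_linear` check is one rough way to flag the non-linear case that Henry identifies as the hard part.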
Jim: thinks that xproc v1 was working with an abstract syntax tree
Henry: It is implicitly there, the way we talked about xproc syntax ... there are a lot of defaulting rules
... XProc's own language is its own compilation target - it's worth not assuming the authoring language is the same thing
Alex: we have operators now and we connect things by operators
Henry goes to the whiteboard
scribe: (then comes back)
Henry: We get to the heart, if it were my choice we should stop this discussion
... The complexity that kills us is the non linear definition
Alex: I do not disagree with you
Henry: I want to see the syntax we will serialise
Alex: We have lots of use cases, we need to get more use cases (non xml, non epub) eg. from machine learning pipelines
<liam> [ graphical pipeline editor with real example - http://3.bp.blogspot.com/-codtjRuVW5I/T4qFXWkFMVI/AAAAAAAAAeo/EdEfvkVeun8/s1600/node4-exp.png ]
Alex: We need to be able to draw a graph as dataflow
... and so here are the ways you can write this down
Henry: That is the next step, take one of the examples and do that
... I want to see what the problem is
Alex: We have a lot of experiences with pipelines, especially on the working group
... What we do not have is enough use cases
... I know where to get more
... I would like to keep our next steps grounded, we take the use case, we push it through our new syntax and see if it works - we cannot go back to first principles
... can a tool spit out a diagram, if it cannot then why ?
... if it's because it's a difficult pipeline (strange, weird) fine
Liam: the reason this works (diagram) is that it is all single layer
... Description is at the right level
Henry now at the whiteboard
Henry: I would like to lay out my thinking
Alex: one point of consensus, we need better use cases ...
<scribe> ACTION: re collate list of use cases [recorded in http://www.w3.org/2016/02/10-xproc-irc]
scribe distracted by getting Norm to join hangout
break for a few mins
<scribe> Scribe: Alex
<scribe> ScribeNick: alexmilowski
The real chair has joined us via a video call.
Henry is at the board …
Henry: would like to rethink what the basic elements of the flow are ...
<Norm> I'm listening. Heading to other room to check coffee
Henry: Build up the inventory of functionality that we need to map onto the language
... starts with pipes ...
... take the notion of pipeline and make it richer and simpler at the same time
... what naturally flows together but also more at the architecture that actually runs
... reserve the possibility of more explicit layering in the design
... that there is a language not designed for human consumption and there are layers over that...
... pipes that are very detailed and not really hand crafted
... kinds of things: xdm objects, xml documents, other documents, metadata …
... the notion of scope does not fit well with the notion of data flow (e.g. variables versus the data that flows)
... where does the XPath static context come from … what does a variable resolve against in the data flow world?
... maybe contexts flow along with data
... gauges? instrumentation on the pipeline that gives you insight on what is inside the pipe … sampling
... viewport .. you can look inside
... hindsight, we need to treat sets of documents (collections) as first class objects in the architecture … the need to move collections around
Alex: side note, the data science perspective is that you want to process a collection as if you have the whole thing but in actuality, you only have a subset that you can process on a particular computation node
Henry: there are two different related points … thinking about collections as something in the flow.
... difference between compile time sets and runtime sets
... example: splitting index terms by an alphabet (e.g. one output set for each letter)
... at runtime a step can create an output stream for each letter
Alex: the runtime one is like map/reduce
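Henry's index-term example can be sketched in a few lines. This is an illustration only, assuming non-empty string terms; the `split_by_letter` helper and the sample terms are made up here. The point is that the set of outputs is not known until the data has been read, which is what makes it a runtime set.

```python
from collections import defaultdict

def split_by_letter(terms):
    """Route each term to an output list keyed by its first letter.

    The keys (and so the number of outputs) depend on the data itself.
    """
    outputs = defaultdict(list)
    for term in terms:
        outputs[term[0].upper()].append(term)
    return dict(outputs)

terms = ["pipeline", "port", "step", "sink", "source", "viewport"]
groups = split_by_letter(terms)
for letter, group in sorted(groups.items()):
    print(letter, group)
```

The grouping stage plays the role of the "map" side in Alex's map/reduce comparison; a later step would consume each per-letter group.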
Henry: steps have signatures, … still important.
... they vary on three dimensions: item type, arity, and style
... item type: what comes down the pipe
... arity - how many on the input/output
... style - events, sets, callbacks, etc.
... every interface that data flows through has signatures
... sinks have to deal with multiplexing, steps do not have to deal with that
... sinks and sources do the 'adaption'
... the source has to do the assembly
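The three dimensions Henry lists for signatures (item type, arity, style) could be modelled roughly as below. This is a sketch under stated assumptions: the `Signature` class, the `Style` values, the integer arity, and the strict-equality compatibility rule are all invented for illustration and do not come from the proposal.

```python
from dataclasses import dataclass
from enum import Enum

class Style(Enum):
    EVENTS = "events"
    SETS = "sets"
    CALLBACKS = "callbacks"

@dataclass(frozen=True)
class Signature:
    item_type: str   # what comes down the pipe, e.g. "xml-document"
    arity: int       # how many items are expected (assumed to be a plain count)
    style: Style     # how the items are delivered

def compatible(output: Signature, input_: Signature) -> bool:
    """Assumed rule: a direct connection needs all three dimensions to match."""
    return output == input_

xslt_result = Signature("xml-document", 1, Style.SETS)
xinclude_source = Signature("xml-document", 1, Style.SETS)
event_consumer = Signature("xml-document", 1, Style.EVENTS)

print(compatible(xslt_result, xinclude_source))  # True
print(compatible(xslt_result, event_consumer))   # False
```

Where signatures differ, something has to adapt; in Henry's account that job belongs to sinks and sources rather than to steps.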
Alex: In xproc v1, you have to put something in the middle
... In xproc vnext you get that built into the language (do the meet)
... easier to specify in xproc vnext
... when you conceptualise dataflow process and go do it in the language and extract the graph, they are not going to be the same
... the question is 'how are they different?'
<Norm> Norm: I don't think meets were especially difficult in XProc1, you didn't need a step, but you did have to name all the steps that were meeting and build the p:pipe list for them.
Henry: Steps get configured at authoring time
... We spent a while trying to get params and options right
Norm: Maps gave us a different and less obscure story to tell about params
scribe buffer full
<Norm> ScribeNick: Norm
Henry: I want to reconsider the whole question of step configuration; it's easy in the simple case in XProc1, it's when you want to do something sophisticated where you fall off the complexity cliff.
... Getting this into the language is going to be tricky.
Alex: What part of configuration isn't covered by maps?
Henry: That's just saying AVTs can represent everything. That's true, but...
... Dependencies and ordering are a big part of the problem. How do you get at what you put on the right hand side of the arrows.
Alex: Config params for XSLT is easy
Henry: I disagree, it's about what you can *use* to construct those parameters that's important.
... What is the XPath context? How do you specify the XPath context for variable references in the expressions which compute the map?
Alex: We have that mechanism now.
Henry: It's full of defaulting.
Some further discussion of maps and parameters. Room for additional understanding/discussion.
Jim: Is it time to move on?
Alex: There are lots of good points here, we need to consider them
... We need to take configuration as you've described it and break it down into different silos.
Henry: Think of a data flow language and now tell me what are the dependencies by which (and where do expressions sit) in the data flow.
... Steps/sinks/sources/adapters/pipes. Where does configuration fit in there? It doesn't have an obvious place to stand.
Alex: Right now that's an implicit input.
Henry: There's a bunch of defaulting going on that needs to be made explicit at this level.
Alex: The story we tell today makes the story more complicated because steps have variables and other implicit inputs
Henry: We're in violent agreement.
Jim: There are lots of languages that escape to Java properties and lots of specs stop at that point.
... I think this is worth having an action.
Liam: It might be worth a few minutes on my responses to Henry.
<jfuller> scribeNick: jfuller
https://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2016Feb/0008.html
Go through Liam's response to Henry
Liam: Henry has said we need more resources, which I think is true, but when I look at successful languages they were done by 1 or 2 people
... If we produce something 1% as successful as Perl, that would be not a bad thing
... XQuery has lots of orthogonality ...
(Alex: we never want to go back to parameter entities ... ever again)
Liam: We could get some review with programming language design people
... Adoption of a language is about writing in a syntax
... The biggest question - is how we know it succeeded ... what is the measure of success
Jim: 1000x more users is my metric
Alex: I would like to see more data science users
<alexmilowski> Replace X with J globally!
Norm: I think we want to make a language that is less xml centric but more data agnostic but we do not need to drop the X
Alex: heritage works against us
<Norm> Naming things is bloody difficult. You come up with a better name, I'll sign on.
Liam: I care that vnext is not v1.1
<Norm> W3C DFL
Jim: What other WGs would have a dataflow?
some discussion about where dataflow may emerge in other W3C WG.
Alex: Relying on XQuery makes a lot of sense, and we could easily shoehorn in JSON
... it's more XPath
Henry: You cannot use xpath unless you specify how you embed it
Alex: xpath is possibly too much for what we need ...
<scribe> ACTION: discuss about xpath vs xquery and minimal size of exp language [recorded in http://www.w3.org/2016/02/10-xproc-irc]
<scribe> ACTION: identify metric of success [recorded in http://www.w3.org/2016/02/10-xproc-irc]
Norm: tomorrow we say 100x more users, all in agreement ?
jfuller: none heard
https://github.com/xproc/notes/tree/master/dataflows
take a 10 min break
back from break
Alex will take us through doc
Alex: Will email ebnf to the list
Norm: Might be premature to go through EBNF
Alex: The spec does not describe the story as it needs to describe
Norm: Story will change
Alex has problem with definition of step chain
Henry: I find usage of bind slightly awkward
... means exact opposite of what I was thinking
Alex: as we find these things we should be more permissive
Henry: I dislike having it on the left hand side
... What I would like is $in => source@xslt
... [$in,$style]=>[source,stylesheet]@xslt
Alex: discussing output binding
Norm: if we chain xslt to something else ...
Alex: we are talking about >>
<Norm> Output bindings may be defined at the end of a step chain with the ">>" operator.
Henry: You can dump an output into a variable and then stuff that variable into any number of inputs
... but... are things more symmetrical if you have a document or a seq of docs, dumped into a var ... you may stuff them into an input
Alex: you use the append operator and bind the output to a var
we use >> to mean 'append'
<Norm> https://github.com/xproc/notes/issues/5
jfuller: brings up the issue of default context
... we are going to have a problem with $1 in xpath
Liam: version could be distracting
jfuller: As long as we can explain it then I think we are ok
<alexmilowski> [$source,$manifest] → { … }
Henry: add an example source frag
... if we were embedding in python it would be easier
(in response to block expr)
Henry: Overloading of square bracket is an issue
<ht_home> Henry likes Norm's musing, so moves it to the record: [$source,$manifest] -> [source,manifest]{ if ($source/*/@version) ... }
Alex: we left off append operators in the early discussion of step chains on purpose.
jfuller: Do we need to illustrate options now ?
Henry: It's about the flow, stupid ...
Liam: I do not like the unnamed input binding form
["document.xml","style.xsl"] -> xslt()
Norm: They will be all the same in xproc
<ht_home> So why aren't we just using python syntax:
Norm: In the middle of a chain you have to use the anonymous forms
<ht_home> -> xslt(source=$1, stylesheet="style.xsl")
Henry: why not allow pythonic approach
Alex: Norm has a solution to this ....
<Norm> ... -> [$source,$secondary]@xslt(...)
<Norm> Oops. ... -> [source,secondary]@xslt(...)
Henry: Why are we making a distinction ?
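For readers unfamiliar with the "pythonic approach" Henry refers to, Python lets the same call bind arguments by position or by name. The sketch below is plain Python, not the proposed pipeline syntax, and the `xslt` function is a hypothetical stand-in that only formats a string.

```python
def xslt(source, stylesheet):
    """Hypothetical stand-in for an XSLT step with two named inputs."""
    return f"transform {source} with {stylesheet}"

# Anonymous (positional) binding: order decides which input is which.
positional = xslt("document.xml", "style.xsl")

# Named binding: order no longer matters.
named = xslt(stylesheet="style.xsl", source="document.xml")

print(positional == named)  # True
```

Both forms reach the same inputs, which is the symmetry behind Henry's question about why the draft distinguishes them.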
Liam: Can you write a function that returns a step ?
Henry: The last point I made on the whiteboard was about 'reflection'
discussion about dynamically creating steps
Liam: I think I understand the last example before Inputs section
... It's really hard to handle messages from XSLT
Norm: in xproc vnext we might define a new output port for xslt messages
Liam: It's unclear
<Norm> Norm: We don't have many steps that have two outputs which in the general case feed into steps with two inputs
picking up where we left off at Inputs
Alex: We build up what a port is bound to
Jim: new issues are at
https://github.com/xproc/notes/issues
jfuller: older issues at
... Norm helpfully elects to migrate them
Norm: for presentation - we know xinclude step does not take a sequence
... Maybe think about automatically accepting sequences
... but that would be confusing for existing users
Alex: some feedback I got from a non xml person, suggested that we make our examples more neutral and less xml centric
... We need to think about all other data formats
Jfuller: we already identified that right ?
... we should define minimal flow language, this just means we can push this out or in when we spec it out
Alex: same feeling about literals
... obvious things that are missing are triples (json-ld, turtle?)
... some things like SPARQL are easier
Norm: we are going to have to say something about extensibility of data model ... need to describe how AVTs work
... The only rational description of the data constructor is what is between the delimiters
... and then you interpret that sequence as html, etc
Alex: we may want to go further than that for other media types
Alex and Norm discuss the finer points of data constructor
Liam: what happens here if we do AVT inside data like json
<Norm> https://github.com/xproc/notes/issues/11
Alex: the syntax here is incomplete
... we need to carefully craft a story ...
Henry: we remove it for now,
Alex: we will skip it
Henry: It's not a USP of this proposal
Alex: we know we need literals, we need to do AVT, etc
<liam> [ my quasiquoting proposal from 2013 - https://lists.w3.org/Archives/Member/w3c-xsl-query/2013Nov/0017.html (there was also a version with user-specified delimiters, more like here-documents) ]
Norm: Take it out of the linear flow and put into appendix
jfuller: now onto output bindings
Alex: essentially 'where you are sending the output'
... Using this generally to send things to output ports by reference
... depends on how it is predeclared and I wonder how it will be received
Norm: I am ok with right hand usage (for now)
Alex: $included is effectively being declared, but it is implicit
<Norm> Mmmm. ... >> [$out=result, $chunks=secondary]
jfuller: we are just giving a primary output a name
<Norm> 3 + 4 >> $sum7
Alex: it has nice semantics, in that you can build independent flows
(like my previous 'meet' example)
Alex: we know that ordering is an issue
jfuller: do we have sort ?
Norm: we have an RFP for p:sort
Henry: Sorting is useful and we should cater for the 80% case
Alex: step declarations are rough at the moment
<Norm> $in → [source]{ if ($source/*/@version eq "v1.0")
<Norm> then [$source,"crummy.xsl"] → xslt() ≫ @1
<Norm> else [$source,"better.xsl"] → xslt() ≫ @1 }
<Norm> ≫ $out
<alexmilowski> https://github.com/xproc/notes/blob/master/dataflows/examples/database-import.xpl
now discussing block expressions
Henry: At the flow level we have variables
jfuller: was 'if then' intentional vs 'if then else'?
Alex: it was intentional
Liam: then there is ambiguity
Liam reminds us of the dangling else problem
Norm: we can have a step that produces nothing ... we can finesse it if we need to
Alex: We need conditionals
... We already have variables; do we need let binding?
... Projections can be a step
Norm: I think it's an antipattern
... We should have a step for each kind
Alex: projections is a nice to have
... iteration is a necessary piece
... replacement is a nice to have
Norm: I think we need viewport
Alex: Do folks use this ?
Norm: I absolutely use it
Alex: tee is neat probably a nice to have
jfuller: raises implicit iteration
Liam: is there order implied ?
Norm: it is a mapping operator
Henry: It must output them in that order
Liam: the term iteration implies ordering
Henry: do we answer this question in xproc v1
Norm: We probably do, but we wanted to allow parallelism
Henry: xproc v1, it says 'each document in turn'
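The tension between "output them in that order" and "allow parallelism" is not necessarily a conflict. As a rough illustration in Python (not XProc), `ThreadPoolExecutor.map` runs the work concurrently yet yields results in input order. The `process` function and its artificial delays are invented for the demonstration.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def process(doc):
    """Stand-in for a step; later documents are made to finish sooner."""
    delay = {"x1.xml": 0.03, "x2.xml": 0.02, "x3.xml": 0.01}[doc]
    time.sleep(delay)
    return doc.upper()

docs = ["x1.xml", "x2.xml", "x3.xml"]
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() returns results in the order of the inputs,
    # regardless of which task completes first.
    results = list(pool.map(process, docs))

print(results)
```

An implementation of a mapping operator could take the same route: evaluate in any order, deliver in document order.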
<Norm> |-
<Norm> -|
Alex points out some meta thoughts for lang design
Murray: I do not quite understand the Tee example
<Norm> $in -> xinclude() >> $temp
<Norm> $temp -> "included.xml"
<Norm> [$temp,"stylesheet.xsl"] -> xslt()
<Norm> >> $out
Alex attempts to explain
<Norm> $temp >> "included.xml"
<Norm> $in -> xinclude() >> $temp
<Norm> $temp >> "included.xml"
<Norm> [$temp,"stylesheet.xsl"] -> xslt()
<Norm> >> $out
Murray: you can use Tee as a manifold
Henry: Murray is right
jfuller: Murray's arguments convince me
Norm: I think we should remove it
Alex: If you have a problem with that then we have a problem with iteration
... They are all operators
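The Tee discussion turns on whether a dedicated operator is needed when an output can be bound to a variable and read more than once. The sketch below mirrors Norm's revised example in ordinary Python; the `xinclude` and `xslt` functions and the `stored` dictionary (standing in for writing to a URI) are hypothetical placeholders.

```python
def xinclude(doc):
    """Hypothetical stand-in for an xinclude step; just tags the document."""
    return f"included({doc})"

def xslt(source, stylesheet):
    """Hypothetical stand-in for an xslt step."""
    return f"xslt({source}, {stylesheet})"

stored = {}  # stand-in for documents written out by URI

def run(doc):
    temp = xinclude(doc)                 # $in -> xinclude() >> $temp
    stored["included.xml"] = temp        # $temp >> "included.xml"
    return xslt(temp, "stylesheet.xsl")  # [$temp,"stylesheet.xsl"] -> xslt() >> $out

out = run("doc.xml")
print(stored["included.xml"])
print(out)
```

Reading `temp` twice does the work of a tee, which is the substance of Murray's manifold remark.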
<ht_home> ("x1.xml","x2.xml") { | xinclude() }
Henry: If I could do that, then I would be happy
<Norm> ("x1.xml", "x2.xml") -> { foreach xinclude() }
<alexmilowski> [$in,"chop.xq"] → xquery() ! { [$1,"chunk.xsl"] → xslt() → @1 } → $out
<Norm> [$in,"chop.xq"] → xquery() ! { [$1,"chunk.xsl"] → xslt() >> @1 } >> $out
<Norm> [$in,"chop.xq"] → xquery() → { foreach [$1,"chunk.xsl"] → xslt() >> @1 } >> $out
<alexmilowski> [$in,"chop.xq"] → xquery() ! xinclude() ≫ $out
<Norm> [$in,"chop.xq"] → xquery() → { if ($1) then $1 → xslt() >> @1 else () } >> $out
Alex: block expressions nest
Norm: curly brackets ought to be lexically scoped
moving on
Alex: we need a way to name flows, reuse them
Henry: Literal strings as inputs or outputs translate to GET/PUT; what about string variable references?
Alex: You can call your xproc with an option
Norm: externally bound yes, but what about when I want to choose
Henry: Norm is solving a simpler problem
Alex: use let
<Norm> xs:int($1/*/@version) >> $version
Alex and Henry discussing let
<Norm> let $imode := "string" "doc.xml" -> xslt(initial-mode=$imode)
<Norm> () -> { let $imode := "string" "doc.xml" -> xslt(initial-mode=$imode) }
<Norm> () -> { let $imode := "string" { "doc.xml" -> xslt(initial-mode=$imode) } |
<Norm> () -> { let $imode := "string" { "doc.xml" -> xslt(initial-mode=$imode) }
<Norm> () -> { let $imode := "string" { "doc.xml" -> xslt(initial-mode=$imode) } }
<alexmilowski> let $imode := "string" { "doc.xml" -> xslt(initial-mode=$imode) }
<Norm> $imode := "string" { load("doc.xml") -> xslt(initial-mode=$imode) }
Norm: I cannot live with quoted string doing one thing and quoted string doing another thing
file://doc.xml
<Norm> $imode := "string" { `doc.xml` -> xslt(initial-mode=$imode) }
<alexmilowski> <doc.xml>
<Norm> $imode := "string" { <doc.xml> -> xslt(initial-mode=$imode) }
<ht_home> HST likes ^ and v, one for read-from and the other for write-to
<ht_home> [Read that as up-arrow and down-arrow]
<ht_home> And, crucially, we should _definitely_ support <$schemaFileName
<ht_home> Changing my mind again. Use postfix > for read and prefix > as write, so we have
Norm: We will take the proposal plus comments
... And that's how we will move forward
Alex: The other thing we need to do tomorrow, is what Henry suggested
<ht_home> e.g. [$1,$schemaFileName>] -> xmlschema() >> [@1,>$errorReportFilename]
Alex: We need people's use cases
... What you were trying to do
Henry: and it will be fascinating about how they go about diagramming them
(just catching up)
Any objection to the current existing proposal in the context of today's understanding?
https://github.com/xproc/notes/tree/master/dataflows
none heard
tomorrow we meet
http://www.xmlprague.cz/day1-2016/
at 10:00
Murray: Why the over-reliance on $?
Henry: This touches upon something I mentioned this morning, the idea of typing
... and something we have to come back to; less concerned with naming at the moment
Murray: I want things to flow into a pipe
... is essentially a manifold
Jim is back as well
<Norm> Crucially, $in is a source from which 0 or more documents can be read. It might be a constructed document or sequence of documents or it might be the output of some other step
@outputfromfoo -> xinclude() ->
$in <- @outputfromfoo
(from Murray)
["foo.xml"] -> xinclude -> @outputfromfoo
$in <- @outputfromfoo
<alexmilowski> We should contact these people: http://www.xml-project.com/morganaxproc/
<Norm> Yes