IRC log of csvw on 2015-02-13

Timestamps are in UTC.

09:01:11 [RRSAgent]
RRSAgent has joined #csvw
09:01:11 [RRSAgent]
logging to http://www.w3.org/2015/02/13-csvw-irc
09:01:24 [ivan]
Meeting: CSVW F2F Meeting, London, 2nd day
09:01:28 [ivan]
Agenda: https://www.w3.org/2013/csvw/wiki/F2F_Agenda_2015-02
09:01:32 [gkellogg]
gkellogg has joined #csvw
09:01:34 [ivan]
Chair: Jeni
09:01:42 [ivan]
rrsagent, set log public
09:01:51 [ivan]
rrsagent, draft minutes
09:01:51 [RRSAgent]
I have made the request to generate http://www.w3.org/2015/02/13-csvw-minutes.html ivan
09:18:44 [danbri]
danbri has joined #csvw
09:19:38 [jumbrich]
jumbrich has joined #csvw
09:20:56 [jtandy]
jtandy has joined #csvw
09:21:25 [JeniT]
Agenda: https://www.w3.org/2013/csvw/wiki/F2F_Agenda_2015-02
09:21:32 [gkellogg]
scribenick: gkellogg
09:21:32 [DavideCeolin]
DavideCeolin has joined #csvw
09:21:37 [gkellogg]
topic: Foreign Keys and References
09:24:17 [JeniT]
http://piratepad.net/URwa3CM9Vv
09:25:23 [gkellogg]
JeniT: how we handle having multiple interrelated tabular data files.
09:25:47 [gkellogg]
… An example is the public salaries use case (#4?)
09:26:14 [ivan]
scribenick: danbri
09:26:20 [danbri]
gkellogg: roles json is basically a table group referencing two tables
09:26:26 [danbri]
… the driving metadata file
09:26:31 [danbri]
references the senior roles
09:26:41 [danbri]
and the junior roles
09:27:44 [danbri]
junior people refer to senior people
09:28:01 [danbri]
… a foreign key rel from the col 'reportsTo' to the other with col 'ref'
09:28:29 [danbri]
what we've said here … we've created property urls, and a value url
09:28:38 [danbri]
so property expands to reportsTo and value uses a URI pattern
09:28:46 [danbri]
senior roles have more col definitions
09:28:50 [danbri]
ref name grade and job
09:29:51 [danbri]
this allows you to examine the data, … a validator, ...
09:30:12 [danbri]
… looking at junior, … seeing reporting senior in 1st col, which would need to exist in the senior roles in the post-unique reference
09:30:26 [danbri]
e.g. 90238 is 3rd or 4th row
09:30:41 [danbri]
JeniT: a couple of observations
09:30:54 [danbri]
first is, two kinds of mechanisms for getting pointers between resources
09:31:02 [danbri]
one is thru primary and foreign key type mechanism
09:31:08 [danbri]
very database-oriented terminology
09:31:19 [danbri]
all primary key really says is that values in this column are unique
09:31:24 [danbri]
… each is different
09:31:45 [danbri]
foreign keys, or unique comb of values if multi-column, … must reference something that does exist in this other file
09:31:55 [danbri]
so quite a tight, validation oriented relationship
09:32:00 [danbri]
AND also we have
09:32:03 [danbri]
aboutUrl, valueUrl
09:32:08 [danbri]
these create the links in the output
09:32:15 [danbri]
for rdf and json generation
09:32:25 [danbri]
creates the urls for the things being identified in these files
09:32:35 [danbri]
could easily have an example that only had the one or had the other
09:32:48 [danbri]
gkellogg: in fact primary and 2ndary keys are not used in the transformation
09:32:54 [danbri]
ivan: there is a need for consistency
09:33:45 [danbri]
ivan: whatever is described in the f.key structure
09:33:52 [danbri]
vs what we use in the valueUri
09:34:00 [danbri]
you would expect those things would be essentially identical
09:34:05 [danbri]
would they ever differ?
09:34:20 [danbri]
jenit: usually they would match but it's a rope-and-hang-yourself
09:34:35 [danbri]
gkellogg: consider two tables, ...
09:34:41 [danbri]
see http://piratepad.net/URwa3CM9Vv
09:38:13 [danbri]
we… used diff
09:38:18 [JeniT]
q?
09:38:25 [danbri]
ivan: to be clear, avoid misunderstanding, ...
09:38:39 [danbri]
… source of misunderstanding, … column names used for interlinking
09:38:41 [Zakim]
Zakim has joined #csvw
09:38:46 [danbri]
you invite him-her-it
09:39:13 [danbri]
gkellogg: … what we have is a primary key in the senior but not in the junior
09:39:32 [danbri]
(jeni takes to whiteboard)
09:39:51 [danbri]
jtandy: in junior schema how do you know that it is reportsTo in the senior?
09:40:20 [danbri]
ivan: template refers always to the local table value
09:40:28 [danbri]
…has nothing to do with the reportsTo of the other table
09:40:45 [danbri]
gkellogg: better to update the example accordingly?
09:40:53 [danbri]
jenit: let's start with something super simple
09:41:04 [danbri]
ivan: e.g. i would remove the reportsTo of the senior table
09:41:39 [danbri]
jenit: it is useful to have that example
09:41:59 [danbri]
danbri: at some point it crosses over into domain datamodel validation e.g. "no reporting cycles" problem isn't our problem
09:42:45 [gkellogg]
scribenick: gkellogg
09:43:03 [gkellogg]
jumbrich: this requires that columns have names?
09:43:13 [gkellogg]
iherman: yes, but there are defaults.
09:43:48 [gkellogg]
jumbrich: so, there may be a name or a title, but name is best for creating a reference.
09:44:41 [gkellogg]
… perhaps we should use “id” someplace in the metadata to show that this is an identifier?
09:45:06 [gkellogg]
jenit: I’d like to stay close to Data Package.
09:45:58 [gkellogg]
DavideCeolin: If I have two tables and want to say one refers to the other, I can do it using small markup in the FK specification.
09:46:21 [gkellogg]
jenit: we’ll go into issues.
09:46:27 [JeniT]
https://github.com/w3c/csvw/issues/16
09:47:23 [gkellogg]
JeniT: Andy brought up the difference between “strong linkage” in databases, with strong validation requirements for the FK to find the reference, and the “weak linkage” in the web where something may not exist.
09:48:00 [gkellogg]
… He was concerned about not having to resolve URLs to validate links. When you’re at the process of generating them, they likely don’t exist anyway.
09:48:19 [gkellogg]
danbri: granularity was an issue as well, depending on where the data comes from.
09:49:00 [gkellogg]
JeniT: We have two mechanisms. In the first, you know what is coming together and control the metadata, so you are better able to make a strong statement about validation when using such cross-references.
09:49:29 [gkellogg]
… We also have the “weak linking” generation on demand where there is no check. It’s up to the metadata author to know what to use.
09:49:57 [gkellogg]
iherman: we have to define what a validation is expected to do. In this case, we probably require only weaker validation?
09:50:26 [gkellogg]
JeniT: When there is a primary key then a validator must verify that all referenced data exists and that all primary keys are unique.
09:51:20 [gkellogg]
jumbrich: so this allows just mapping one data without necessarily mapping the other.
09:51:57 [gkellogg]
iherman: you don’t have to check if the values in a column using an FK are actually present in the other table. The two tables are consistent in the roles example, as they do exist.
09:52:16 [gkellogg]
jtandy: if you declare it as an FK you must check that it exists. if you use a valueUrl, you don’t need to check.
09:52:51 [gkellogg]
… Because strict validation is a “beast”, you can only use the references within a single TableGroup.
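A minimal sketch of the two validation checks described above: primary key values must be unique, and every foreign key value must exist in the referenced table. Only the column names `ref` and `reportsTo` come from the discussion; the row values and function names are hypothetical, and this is not the group's specified algorithm.

```python
# Hypothetical rows; only the column names `ref` and `reportsTo`
# are taken from the roles example discussed in the meeting.
senior_rows = [
    {"ref": "90115", "name": "A. Example"},
    {"ref": "90238", "name": "B. Example"},
]
junior_rows = [
    {"reportsTo": "90238", "grade": "X"},
    {"reportsTo": "99999", "grade": "Y"},  # dangling reference
]


def check_primary_key(rows, column):
    """Return the values that occur more than once in `column`."""
    seen, duplicates = set(), []
    for row in rows:
        value = row[column]
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    return duplicates


def check_foreign_key(rows, column, target_rows, target_column):
    """Return the values in `column` that have no match in the target column."""
    known = {row[target_column] for row in target_rows}
    return [row[column] for row in rows if row[column] not in known]


duplicate_refs = check_primary_key(senior_rows, "ref")
dangling = check_foreign_key(junior_rows, "reportsTo", senior_rows, "ref")
print(duplicate_refs, dangling)
```

Both tables are held in memory here, which matches the point that such references only make sense within a single TableGroup.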
09:53:18 [danbri]
( if you want some examples with multi-table keys, https://github.com/w3c/csvw/tree/gh-pages/examples/tests/scenarios/chinook )
09:54:41 [gkellogg]
jeniT: there is a subtlety in the examples ...
09:55:14 [gkellogg]
… In real life, there is a government office that says all departments need to publish senior and junior roles, and all adhering to the same schema.
09:56:24 [gkellogg]
… They also define a list of departments, with say name of department, and website.
09:58:14 [gkellogg]
… When departments publish the senior/junior roles pairs, the “dept” column will typically all be the same pointing to the identifier of a particular department, so the FK needs to reference the departments.csv file.
09:59:10 [gkellogg]
… The TableGroup then needs to reference the departments CSV and schema.
10:02:24 [jumbrich]
jumbrich has joined #csvw
10:02:38 [gkellogg]
iherman: the person creating the description probably shouldn’t say what data not to export.
10:02:55 [gkellogg]
gkellogg: But, this could be specified in user-defined metadata, and is under control of the user.
10:03:28 [gkellogg]
jumbrich: I might want to refer to other resources without pulling them in.
10:04:29 [gkellogg]
JeniT: the closest thing we have is to use the same table group to describe related resources and generate the URL for a “team” in any output. You would then use that URL to reference the team, for references and identification.
10:05:09 [gkellogg]
jumbrich: I might have a relation table, and a couple of tables where things are used, and I might want to point to something for additional information.
10:05:48 [DavideCeolin]
DavideCeolin has joined #csvw
10:06:10 [gkellogg]
… I might be able to build search on top of the metadata where I could use FK information to infer information about the various tables.
10:06:18 [gkellogg]
jtandy: I think that’s a normal FK relationship.
10:07:25 [gkellogg]
iherman: there’s also a difference between what a validator and a transformer will do.
10:08:23 [gkellogg]
… The FK spec is conceptually disjoint from the valueUrl and transformation. The FK is only there for validation.
10:08:46 [jumbrich]
jumbrich has joined #csvw
10:09:06 [gkellogg]
jtandy: if you use PKs, that might change how you serialize.
10:09:19 [JeniT]
https://github.com/w3c/csvw/issues/16
10:09:55 [gkellogg]
JeniT: FK references are for validation purposes...
10:14:10 [jumbrich]
jumbrich has joined #csvw
10:14:45 [gkellogg]
danbri: what do we say about the results of being invalid? Are we creating a culture so that things can’t proceed if they’re invalid.
10:16:05 [gkellogg]
JeniT: a validator may work in strict and lax modes, where it fails at the first problem when strict, but just reports all issues encountered when lax.
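The strict and lax modes could be sketched roughly as follows. The `validate` function, the `ValidationError` class, and the `ref_present` check are all hypothetical names chosen for illustration; the log does not define an API.

```python
class ValidationError(Exception):
    """Raised in strict mode as soon as the first problem is found."""


def validate(rows, checks, strict=True):
    """Run each check on each row.

    In strict mode, stop at the first problem by raising ValidationError.
    In lax mode, keep going and return every problem encountered.
    """
    problems = []
    for number, row in enumerate(rows, start=1):
        for check in checks:
            message = check(row)
            if message is None:
                continue
            problem = f"row {number}: {message}"
            if strict:
                raise ValidationError(problem)
            problems.append(problem)
    return problems


def ref_present(row):
    """Hypothetical check: the row must carry a non-empty `ref` value."""
    return None if row.get("ref") else "missing ref"


rows = [{"ref": "1"}, {"ref": ""}, {}]
lax_report = validate(rows, [ref_present], strict=False)

try:
    validate(rows, [ref_present], strict=True)
    strict_failed = False
except ValidationError as error:
    strict_failed = True
    first_problem = str(error)
```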
10:18:02 [JeniT]
https://github.com/w3c/csvw/issues/31
10:18:09 [danbri]
rrsagent, pointer?
10:18:09 [RRSAgent]
See http://www.w3.org/2015/02/13-csvw-irc#T10-18-09
10:18:29 [danbri]
rrsagent, make logs public
10:18:36 [gkellogg]
jtandy: this looks out of date now, I suggest close as “expired”.
10:18:47 [gkellogg]
iherman: it will come back if we have a “skip” flag.
10:19:07 [danbri]
"Should primary keys be skipped from cell level triple (or k/v pairs) generation? #31"
10:19:15 [gkellogg]
JeniT: if you use valueUrl, you only get ???
10:20:16 [JeniT]
https://github.com/w3c/csvw/issues/130
10:21:25 [gkellogg]
jtandy: Alain has provided some alternate JSON structure that uses identifiers as properties rather than an array.
10:22:38 [gkellogg]
… If you didn’t define a PK, there’s not necessarily one thing that is unique, and such an index structure is available.
10:23:03 [gkellogg]
… We agreed that PK is for validation, but not necessarily only for validation.
10:23:27 [gkellogg]
JeniT: this is the purpose of aboutUrl, which _may_ be associated with the PK, but not necessarily.
10:23:53 [gkellogg]
jtandy: the index and object works for some, but Tim Robertson seemed to object.
10:25:14 [gkellogg]
JeniT: I think we should only define one JSON output for ease of scope.
10:25:59 [gkellogg]
jtandy: so the “standard” publishing mechanism is an object per-line, and converting to an ‘indexed’ mechanism is “trivial”, and outside the scope of the spec.
10:26:33 [gkellogg]
… We may say that implementations could have alternate output forms.
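To illustrate why the conversion is considered trivial, here is a sketch that turns an object-per-row list into an object keyed by identifier. The function name `index_rows` and the sample rows are assumptions; the exact shape of either JSON output is not fixed by this discussion.

```python
def index_rows(row_objects, key):
    """Turn a list of row objects into a dict keyed by the value of `key`.

    Raises ValueError when the chosen key is not unique, since an
    indexed structure only works if one identifier maps to one row.
    """
    indexed = {}
    for obj in row_objects:
        identifier = obj[key]
        if identifier in indexed:
            raise ValueError(f"duplicate identifier: {identifier}")
        # Keep every other property of the row under its identifier.
        indexed[identifier] = {k: v for k, v in obj.items() if k != key}
    return indexed


# Hypothetical object-per-row output.
rows = [
    {"ref": "90115", "grade": "A"},
    {"ref": "90238", "grade": "B"},
]
indexed = index_rows(rows, "ref")
```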
10:27:00 [danbri]
('templating and transformation'?)
10:27:33 [gkellogg]
iherman: I like to have a conceptual similarity between the JSON and the RDF transformations, and for the time being they are quite similar.
10:27:52 [danbri]
rrsagent, pointer?
10:27:52 [RRSAgent]
See http://www.w3.org/2015/02/13-csvw-irc#T10-27-52
10:31:50 [JeniT]
https://github.com/w3c/csvw/issues/66
10:32:32 [danbri]
"Composite primary keys and foreign key references #66"
10:33:14 [gkellogg]
jtandy: for example my PK may be based on givenname & familyname, and you’re making stuff up as you go along.
10:33:47 [gkellogg]
JeniT: you can use aboutUrl to combine such columns together to get what you want.
10:34:56 [gkellogg]
… You can’t say that one column points to two values, but you can create an aboutUrl which uses both name and a valueUrl in the other to create the same reference. It works for RDF, but not for validation.
10:35:34 [danbri]
rragent, pointer?
10:35:55 [danbri]
rrsagent, pointer?
10:35:55 [RRSAgent]
See http://www.w3.org/2015/02/13-csvw-irc#T10-35-55
10:36:50 [gkellogg]
danbri: if you had postal codes in each country, then the combination of country code and postal code will be unique.
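A small sketch of a composite-key uniqueness check along the lines of the postal code example. The column names `country` and `postcode` and the sample values are hypothetical.

```python
def composite_key_duplicates(rows, columns):
    """Return each combination of `columns` values that appears more than once."""
    seen, duplicates = set(), []
    for row in rows:
        key = tuple(row[column] for column in columns)
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates


# Hypothetical data: the same postal code may exist in two countries,
# so only the (country, postcode) pair is expected to be unique.
rows = [
    {"country": "GB", "postcode": "1000"},
    {"country": "FR", "postcode": "1000"},
    {"country": "GB", "postcode": "1000"},
]
duplicates = composite_key_duplicates(rows, ["country", "postcode"])
postcode_only = composite_key_duplicates(rows, ["postcode"])
```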
10:40:31 [gkellogg]
jtandy: TableGroups contain resources and may contain schemas? (yes)
10:41:32 [gkellogg]
JeniT: because there are two different types of FK references you might make (departments example), one always points to the same resource, and the other to different values based on cell values.
10:55:51 [ivan]
Topic: URLs and metadata
10:56:18 [jumbrich]
jumbrich has joined #csvw
10:57:09 [JeniT]
https://github.com/w3c/csvw/issues/74
10:57:36 [JeniT]
https://github.com/w3c/csvw/issues/74#issuecomment-72854167
11:00:42 [JeniT]
https://github.com/w3c/csvw/issues/191
11:02:13 [JeniT]
diverted onto https://github.com/w3c/csvw/issues/91
11:02:55 [ivan]
https://github.com/w3c/csvw/issues/191#issuecomment-73497474
11:09:43 [danbri]
gkellogg: in json-ld … there are rules for term expansion
11:09:54 [danbri]
… the prefix expansion is more naturally dealt with as part of #91 than this.
11:10:03 [danbri]
What we're doing here is saying it is a URL template property
11:10:08 [danbri]
when you apply template, result is a string
11:10:14 [danbri]
which in #91 will be made into an url
11:10:21 [danbri]
jenit: fear we'll get stuck on exact wording
11:10:27 [danbri]
… can we capture direction of the resolution
11:10:32 [danbri]
… will ref #91
11:10:37 [danbri]
… and editor action will be needed
11:10:48 [danbri]
capturing basic thing, … these properties are string properties
11:11:42 [danbri]
from piratepad, copying:
11:11:52 [danbri]
resolved: The order of processing is as described in https://github.com/w3c/csvw/issues/191#issuecomment-73497474. These properties are string properties, the URL template is expanded first. Any resolution (ie expanding prefixes & resolving against a base URL) is done after that expansion. Editor action to make this so.
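The ordering in this resolution (expand the template to a string first, then expand prefixes and resolve against a base URL) can be sketched as below. The template expansion handles only simple `{name}` variables, a small subset of URI Templates; the `PREFIXES` table, the function names, and the example URLs are assumptions made for illustration.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical prefix table; a real processor would take this from its context.
PREFIXES = {"ex": "http://example.org/vocab#"}


def expand_template(template, row):
    """Step 1: expand simple {name} variables; the result is just a string."""
    return re.sub(
        r"\{(\w+)\}",
        lambda match: quote(str(row[match.group(1)]), safe=""),
        template,
    )


def resolve(value, base_url):
    """Step 2: expand a known prefix, otherwise resolve against the base URL."""
    prefix, separator, local = value.partition(":")
    if separator and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return urljoin(base_url, value)


def url_for(template, row, base_url):
    """Apply the two steps in the resolved order."""
    return resolve(expand_template(template, row), base_url)


base = "http://example.org/data/roles.json"
relative = url_for("people/{ref}", {"ref": "90238"}, base)
prefixed = url_for("ex:{name}", {"name": "reportsTo"}, base)
```

Keeping the two steps separate means the template machinery never needs to know about prefixes or base URLs.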
11:12:02 [JeniT]
https://github.com/w3c/csvw/issues/91
11:12:12 [danbri]
"What is default value if @base is not defined in the metadata description #91"
11:12:14 [gkellogg]
s/#91/#191/
11:13:46 [danbri]
jenit: bunch of issues …
11:13:56 [danbri]
how link urls which are bases are resolved
11:14:09 [danbri]
how url templates following their templates, what base url they get, how they are then treated, what base url gets used on that
11:14:25 [danbri]
and then whether we want to provide some level of control within the urltemplates to enable people to expand based on a different base url
11:14:29 [danbri]
1st - link properties
11:14:33 [danbri]
like reference to the csv files
11:14:47 [danbri]
those link properties should be resolved in the same way that they are resolved in json-ld
11:14:53 [danbri]
i.e. if there is an @base in the context, use that
11:14:59 [danbri]
otherwise metadata doc in which that link is found
11:15:04 [danbri]
requires you to expand them prior to merging
11:15:11 [danbri]
or keep track of where original comes from
11:15:23 [danbri]
gkellogg: that's what the language in merge now says
11:15:37 [danbri]
before merging both A and B make any link URIs absolute relative to the base of that metadata
11:15:48 [danbri]
ivan: isn't there also a language about merging the @base?
11:15:56 [danbri]
gkellogg: for @base there is
11:16:00 [danbri]
works pretty much like object merging
11:16:05 [danbri]
ivan: but then why do we merge @base?
11:16:16 [danbri]
gkellogg: point is that after normalizing, context isn't necessary any more
11:16:20 [danbri]
ivan: let's make that explicit
11:16:34 [danbri]
… conceptually every metadata file needs to be normalized before merged
11:16:46 [danbri]
gkellogg: @base and language can disappear
11:16:57 [danbri]
you still need the default metadata since that is how you define prefixes etc
11:17:11 [danbri]
ivan: i don't think we do that
11:17:19 [danbri]
jenit: they're never explicitly put in the @context
11:17:29 [danbri]
… gregg is saying that conceptually there is such a context
11:17:46 [danbri]
and if you are using basic json-ld processing, then implicitly we'd pull in everything from that context doc
11:17:54 [danbri]
gkellogg: need not be just implicit
11:17:59 [danbri]
we need to figure out what we want to do
11:18:17 [danbri]
jenit: you were both in agreement that the @base and the @lang were redundant by the time you had gone through the normalization
11:18:22 [danbri]
ivan: that's correct
11:18:31 [danbri]
gkellogg: but there is a conceptual or virtual base url of the metadata
11:18:38 [danbri]
besides an explicit @base declaration
11:18:43 [danbri]
jenit: yes, the location of...
11:18:48 [danbri]
gkellogg: or the 1st in a set, ...
11:18:54 [danbri]
jenit: that, I don't, ...
11:19:02 [danbri]
ivan: comes back to #199
11:19:43 [danbri]
jenit: i think we agree that the link properties are resolved against the base url, maybe the @base from the context, or it may be the location of the metadata file, during normalization of the metadata file, and prior to merge.
11:19:55 [gkellogg]
[[[If the property is a link property the value is turned into an absolute URL using the base URL.]]]
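A rough sketch of that normalization step, run on each metadata document before merging: pick the base URL (an `@base` from the context if present, otherwise the location of the metadata document) and make link properties absolute. The shape of `@context`, the property name `url`, and the function names are assumptions for illustration.

```python
from urllib.parse import urljoin


def base_url_of(metadata, metadata_location):
    """Prefer an @base from the context; otherwise use the document location."""
    context = metadata.get("@context")
    entries = context if isinstance(context, list) else [context]
    for entry in entries:
        if isinstance(entry, dict) and "@base" in entry:
            # An @base may itself be relative to the document location.
            return urljoin(metadata_location, entry["@base"])
    return metadata_location


def normalize_links(metadata, metadata_location, link_properties=("url",)):
    """Return a copy of `metadata` with its link properties made absolute."""
    base = base_url_of(metadata, metadata_location)
    normalized = dict(metadata)
    for name in link_properties:
        if name in normalized:
            normalized[name] = urljoin(base, normalized[name])
    return normalized


# Hypothetical metadata documents and locations.
without_base = normalize_links(
    {"url": "senior-roles.csv"},
    "http://example.org/meta/a.json",
)
with_base = normalize_links(
    {
        "@context": ["http://www.w3.org/ns/csvw", {"@base": "http://example.com/data/"}],
        "url": "senior-roles.csv",
    },
    "http://example.org/meta/b.json",
)
```

After this step the links no longer depend on where each document came from, which is why the `@base` becomes redundant once normalization is done.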
11:20:00 [danbri]
jenit: 2nd piece of this, is what happens to these url templates
11:20:09 [danbri]
these can't get expanded until you are actually processing data
11:20:19 [danbri]
at which point you have your merged metadata as basis of what you are doing
11:20:33 [danbri]
if you have lost your base url, or not got, what to resolve against becomes tricky
11:20:44 [danbri]
also - jtandy's 1st assumption, that those would be resolved against url of the csv file
11:20:51 [danbri]
so when you had template like #rownum=5
11:20:59 [danbri]
then that would be ref to something within the csv file
11:21:05 [danbri]
not relative to any of the metadata files it might be in
11:21:13 [danbri]
which raises the usability perspective, ...
11:21:27 [danbri]
… it might be better for the url templates to be ref'd against the csv file
11:21:35 [danbri]
to have that as the default
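The difference between the two candidate defaults can be seen with a fragment-only template result such as `#rownum=5`. The two locations below are hypothetical; the sketch only shows what standard URL resolution yields in each case.

```python
from urllib.parse import urljoin

# Hypothetical locations, for illustration only.
table_url = "http://example.org/data/roles.csv"
metadata_url = "http://example.org/meta/roles.json"

expanded = "#rownum=5"  # the string left after template expansion

# Resolved against the CSV file, the fragment points into the data itself.
against_table = urljoin(table_url, expanded)

# Resolved against the metadata document, it points into the metadata file.
against_metadata = urljoin(metadata_url, expanded)

print(against_table)
print(against_metadata)
```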
11:21:54 [danbri]
gkellogg: i won't stand in way, but am not enthusiastic
11:22:03 [danbri]
… you can always avoid trouble by having absolute urls
11:22:22 [danbri]
jtandy: we just need to be clear on what happens when not an absolute url
11:22:41 [danbri]
ivan: raising q: is it not confusing for authors, that we have 2 diff ways of absolutising urls
11:22:48 [danbri]
depending on whether they are link properties or templates
11:23:01 [danbri]
… a completely diff approach would be that we don't do this under normalization
11:23:07 [danbri]
instead use the table url just like for templates
11:23:21 [danbri]
jenit: how do you resolve the table url? that's the link property
11:24:18 [danbri]
gkellogg: json-ld has an url expansion algo
11:24:43 [danbri]
… nominally each json-ld doc has a location which can override @base
11:24:52 [danbri]
...
11:25:16 [danbri]
if we say it is undefined, this would be the only doc (format) i've dealt with in which you start off with a base and then lose it along the way
11:25:50 [danbri]
ivan: talking about confusing, … that means I get a merged metadata, and the various templates in that metadata will expand differently
11:25:59 [danbri]
… the templates will expand depending on where they come from
11:26:06 [danbri]
gkellogg: no, there's a single base url notionally
11:26:11 [danbri]
ivan: then i don't understand the issue
11:26:24 [danbri]
gkellogg: I think we said it's the csv file it is expanded against
11:26:32 [danbri]
that's what i reacted to , saying that this is weird, …
11:27:06 [danbri]
jenit: [missed]
11:28:14 [danbri]
discussion of detail of mess starting with the csv file vs metadata
11:28:33 [danbri]
jtandy: key issue to my mind, uri templates only get expanded once you've done all the merging, ...
11:28:38 [danbri]
… only at that point,
11:28:44 [danbri]
gkellogg: only at row processing stage
11:28:55 [danbri]
jtandy: … templates get expanded, … urls get resolved, …
11:29:12 [danbri]
gkellogg: which we're saying is the expanded url property of the table
11:29:17 [danbri]
jtandy: at least we always know what that is
11:32:41 [danbri]
jtandy: to clarify, this is for the metadata doc, and by time we get to conversions, this will all have been expanded?
11:32:42 [danbri]
[yes]
11:33:39 [phila_reception]
phila_reception has joined #csvw
11:33:40 [danbri]
jenit: do we in abstract table data model need url in each cell not just value
11:33:46 [danbri]
i.e. what you'd get from value url
11:33:52 [danbri]
gkellogg: that is the value of the cell
11:33:55 [danbri]
jenit: no
11:34:04 [danbri]
-> example in piratepad
11:34:28 [DavideCeolin]
DavideCeolin has joined #csvw
11:34:31 [gkellogg]
scribenick: gkellogg
11:35:37 [gkellogg]
iherman: just to clarify, linkproperty values can be CURIEs/PNames
11:36:27 [JeniT]
https://github.com/w3c/csvw/issues/121
11:37:29 [danbri]
gkellogg: discussion of expanding urls, we talked about json-ld, then asked about URL spec
11:37:37 [danbri]
reason for that is that url spec doesn't deal with prefixes
11:37:42 [gkellogg]
scribenick: danbri
11:37:50 [danbri]
ivan: spec-wise it is fine, but if i read that doc it is like some of the HTML5 specs
11:38:05 [danbri]
jenit: does it specify the behaviour that we want it to specify
11:38:13 [danbri]
… there is no other good url spec to reference
11:38:28 [danbri]
jenit: i think it is at least consistent to point to the json-ld one
11:38:42 [phila]
phila has joined #csvw
11:38:45 [danbri]
ivan: that's why i asked what i asked. back then it went into a whole set of things that were v json-ld specific, with prefixes etc.
11:38:48 [danbri]
…that was my fear
11:39:08 [danbri]
… it goes into all kinds of detail on context processing
11:39:27 [danbri]
gkellogg: we are using a context, we have one defined that defines all of our terms, that is the one used when expanding these values
11:39:36 [danbri]
jenit: let's defer this, maybe discuss over lunch, ...
11:39:54 [danbri]
gkellogg: if we choose something else let's say it is intended to be consistent with json-ld iri expansion
11:40:09 [danbri]
ivan: one thing it does introduce, … and we do not, is issue of syntax for bnode identifiers
11:40:16 [danbri]
gkellogg: but we can constrain the value space...
11:40:37 [Zakim]
Zakim has left #csvw
11:41:00 [danbri]
jenit: suggest resolve as "we'll summarize the algo from json-ld spec, extract bits that are relevant, and say it is intended to be consistent with the spec
11:41:06 [danbri]
gkellogg: yes, can do that
11:41:20 [danbri]
… re bnodes i think it is intent of group to avoid using a bnode syntax where URIs can be used
11:41:40 [danbri]
ivan: maybe we need some sort of appendix
11:41:50 [danbri]
saying this is json-ld compatible, but with these-and-these restrictions
11:41:57 [danbri]
e.g. that we restricted what can go into a context
11:42:08 [danbri]
… that we have restricted yesterday the evaluation of common properties, etc.
11:42:17 [danbri]
… i.e. there are a number of places where we restrict json-ld
11:42:23 [danbri]
[general agreement]
11:42:57 [danbri]
resolved: We will summarise the expansion processing that is necessary for our purposes, and say that it is intended to be consistent with JSON-LD IRI expansion. We do have some restrictions on what IRIs can be used, eg we don't allow blank node syntax.
11:43:20 [danbri]
topic: Conversion issues
11:43:34 [danbri]
from https://www.w3.org/2013/csvw/wiki/F2F_Agenda_2015-02#Friday_13th_February
11:43:45 [danbri]
will revisit after lunch.
11:44:03 [danbri]
topic: Conversion Details
11:44:33 [JeniT]
https://github.com/w3c/csvw/issues/83
11:44:43 [danbri]
Extension Conversions: #83 "Possible error in "optional properties" for Template Specifications: source #83"
11:45:03 [danbri]
jenit: this is about when we have these extension conversions, we have said we want to enable extensions to work on results of a conversion we have already defined
11:45:10 [danbri]
e.g. we have already defined json and rdf
11:45:19 [danbri]
… can we make e.g. a post-processor that sits on top of the RDF
11:45:26 [danbri]
maybe it might use SPARQL CONSTRUCT
11:45:42 [danbri]
(the use of XML in the orig issue was a typo)
11:45:58 [danbri]
this led to q of what the source looks like for post processing
11:46:07 [danbri]
gkellogg: how does this relate to accept headers?
11:46:20 [danbri]
e.g. my impl creates an abstract graph
11:46:37 [danbri]
… Accept: can turn into a prioritized list of formats
11:46:47 [danbri]
seems like the type of thing that a tabular data processor might do
11:47:16 [danbri]
danbri: assumes an HTTP REST deployment model?
11:47:23 [danbri]
ivan: seems like an impl detail not relevant here
11:47:34 [danbri]
… more … if you want Turtle, this is the processor you can use, etc etc
11:47:44 [danbri]
options of tools or http or online tools … i dont think we should go there
11:47:57 [danbri]
ivan: only thing, what in metadata descr params need to be specifiable
11:48:17 [danbri]
gkellogg: seems reason why Accept has a prioritised list, so you get something you can handle even if not best
11:48:32 [danbri]
jenit: in my head, the source thing here was only taking 2 values
11:48:45 [danbri]
… and when you said post-processing would be delivered an rdf graph
11:48:52 [danbri]
you wouldn't be specifying
11:48:59 [danbri]
you might never serialize
11:50:38 [danbri]
danbri: not comfortable assuming all in memory / API access, unix pipe model is quite likely
11:50:45 [danbri]
...
11:50:56 [danbri]
jenit: you (jtandy) are assuming serialized output?
11:51:03 [danbri]
jtandy: i'm v happy saying we don't serialize
11:51:33 [danbri]
that json stays just in memory
11:51:42 [danbri]
gkellogg: i believe json in memory defined in ecma
11:54:36 [danbri]
gkellogg: diff between target format and template format?
11:54:47 [danbri]
… mustache vs RDF
11:55:34 [danbri]
jenit: either you'd be operating over the rdf using a mustache template, or to create rdf/xml, would be a basic thing ...
11:56:07 [danbri]
danbri: would fancy alternate mappings always use json or rdf mappings? or sometimes raw?
11:56:12 [danbri]
jenit: can go back to the base also
11:56:27 [danbri]
fwiw this was the closest we got to a demo using R2RML : https://github.com/w3c/csvw/blob/gh-pages/examples/tests/scenarios/events/attempts/attempt-1/metadata.json
11:56:34 [danbri]
https://github.com/w3c/csvw/tree/gh-pages/examples/tests/scenarios/events/attempts/attempt-1
12:56:37 [jumbrich]
jumbrich has joined #csvw
12:58:10 [gkellogg]
gkellogg has joined #csvw
12:58:53 [jtandy]
jtandy has joined #csvw
12:59:03 [danbri]
scribenick: danbri
12:59:05 [danbri]
topic: Conversions
12:59:23 [danbri]
jenit: given that we have abouturls, property urls etc etc, i.e. pretty flexible way of making triples from a row in the table...
12:59:23 [DavideCeolin]
DavideCeolin has joined #csvw
12:59:33 [danbri]
…what does this imply in terms of what else is needed to be flexible about that structure
12:59:37 [danbri]
or should we be constraining it
13:00:00 [danbri]
https://www.w3.org/2013/csvw/wiki/F2F_Agenda_2015-02#13:00_-_14:30_Conversion_Details
13:00:05 [danbri]
issue #66 already closed
13:00:19 [danbri]
so https://github.com/w3c/csvw/issues/66 does not need discussion
13:00:53 [danbri]
https://github.com/w3c/csvw/issues/64
13:00:57 [danbri]
"Suppression of columns in mapping #64"
13:01:05 [ivan]
ivan has joined #csvw
13:01:16 [danbri]
jtandy: sometimes in the stuff you want to push out through RDF or JSON conversion, you might not want all of the cols in the tabular data to appear in the output
13:01:24 [danbri]
I would just like to be able to say "don't include this column"
13:01:34 [danbri]
… seemed trivial but ppl objected
13:01:50 [danbri]
gkellogg: [missed]
13:02:01 [danbri]
… re naming, we mix hyphens and CamelCase
13:02:05 [danbri]
jtandy: should be CamelCase
13:02:15 [danbri]
s/jtandy/jenit/
13:02:16 [danbri]
so "table-direction" is wrong
13:02:27 [danbri]
jtandy: so that was my requirement, it would be cool if you could do that
13:02:44 [danbri]
jenit: and properties
13:02:57 [danbri]
jtandy: Gregg's suggested optimization for skipping an entire table, it could be an inherited property
13:03:08 [danbri]
so you could say it up at the table level, schema...
13:03:19 [danbri]
gkellogg: suppressing table would handle all its cols
13:03:26 [danbri]
ivan: strictly speaking this is not the same
13:03:33 [danbri]
because if I have common properties
13:03:37 [danbri]
if i say I skip the table
13:03:48 [danbri]
if i refer back to this AM's discussion, i want to suppress the generation of everything
13:03:53 [danbri]
if i have a flag on a table, is fine
13:04:07 [danbri]
… if just a space keeper for all the cols, you would generate common properties
13:04:11 [danbri]
jtandy: you are correct
13:04:16 [danbri]
therefore we should have a suppress col
13:04:20 [danbri]
gkellogg: i don't see that
13:04:27 [danbri]
if it is on the table that is how it is interpreted
13:04:38 [danbri]
ivan: let's not conflate the interpretation of this
13:04:51 [danbri]
jenit: surely having the same property does not ...
13:05:16 [danbri]
"this suppresses the conversion output from the thing that it is on" would be a fine def, to avoid having repeated similar terms
13:05:31 [danbri]
ivan: but I might want to do what I said earlier, just common properties
13:07:17 [danbri]
resolved: We will introduce a `suppressOutput` property, on individual resources or on columns, which would mean that no output was generated from that table or from that column during a conversion. This is not an inherited property.
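One way a converter might honour `suppressOutput` as resolved: a flag on the table suppresses all of its output, a flag on a column suppresses only that column, and nothing is inherited. Only the property name comes from the resolution; the metadata layout, the `convert` function, and the sample data are assumptions.

```python
def convert(table, rows):
    """Return one output object per row, honouring `suppressOutput`.

    The flag is read only from the thing it is placed on: the table
    itself or an individual column. It is not inherited.
    """
    if table.get("suppressOutput"):
        return []
    kept = [
        column["name"]
        for column in table["columns"]
        if not column.get("suppressOutput")
    ]
    return [{name: row[name] for name in kept if name in row} for row in rows]


# Hypothetical table description and data.
table = {
    "columns": [
        {"name": "ref"},
        {"name": "name"},
        {"name": "notes", "suppressOutput": True},
    ]
}
rows = [{"ref": "90238", "name": "B. Example", "notes": "internal"}]

output = convert(table, rows)
suppressed = convert({**table, "suppressOutput": True}, rows)
```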
13:07:20 [danbri]
rrsagent, pointer?
13:07:20 [RRSAgent]
See http://www.w3.org/2015/02/13-csvw-irc#T13-07-20
13:08:07 [danbri]
jtandy: before we get to phantom cols, … aboutUrl on cols?
13:08:30 [danbri]
gkellogg: we resolved that aboutUrl etc are common properties
13:08:38 [danbri]
can appear in col, schema, …
13:09:16 [danbri]
ivan: there may be cells where the generated triples have a different subject
13:09:23 [danbri]
jenit: let's discuss that 1st
13:10:02 [danbri]
"whether it is useful or helpful to have different about URLs on different cols …"
13:10:14 [danbri]
jtandy: that would really help my use cases
13:10:21 [danbri]
q+ to agree a lot
13:10:29 [danbri]
… we need multiple entities per row
13:10:44 [danbri]
ivan: if we go there, fundamentally not against it, … the structure of the generated rdf needs rethinking
13:11:00 [danbri]
currently we make a predicate 'row' etc etc… this structure becomes meaningless
13:11:08 [danbri]
gkellogg: in average case it works out fine
13:11:21 [danbri]
way reads now, the row resource, iri is from 1st cell
13:11:32 [danbri]
jtandy: no, subject of row comes from aboutUrl in schema
13:11:46 [danbri]
jenit: purely what you generate as triples
13:11:48 [Zakim]
Zakim has joined #csvw
13:11:50 [danbri]
q+
13:11:56 [danbri]
jtandy: i believe this is an inherited property
13:12:05 [danbri]
so if you define it at schema level, …
13:12:19 [danbri]
[can't capture realtime and listen, backing off from detail]
13:12:57 [JeniT]
q?
13:13:06 [JeniT]
ack danbri
13:13:09 [danbri]
gkellogg: some times it does have value to use row
13:14:32 [danbri]
ivan: where do i put these extra triples?
13:14:40 [danbri]
jenit: "the output" :)
13:14:57 [danbri]
jtandy: if we are processing on a row-by-row basis, we look at those across a row that share a subject, and emit them together
13:15:09 [danbri]
the issue we have got is that the entities which are talked about lose an implicit relationship to the table they are in
13:15:24 [danbri]
jenit: what kind of relationship …
13:15:56 [JeniT]
https://github.com/w3c/csvw/issues/179#issuecomment-72072147
13:15:57 [danbri]
issue may be discussed in tracker under 'phantom col'
13:16:19 [danbri]
gkellogg: imagine a doap description of a software project, referencing a foaf description of a developer
13:16:48 [danbri]
… if there happens to be a spare column, e.g. foaf ID column out, i could put [missing detail]
13:17:12 [danbri]
ivan: i think you're conflating 2 different things
13:17:41 [danbri]
jenit: what is the proper relationship between the table in the rdf output
13:17:52 [danbri]
… vs the entities from the data
13:18:05 [danbri]
jtandy: at moment we say 'csv row'
13:18:26 [danbri]
jenit: i don't think it worked in 1st place
13:18:32 [danbri]
… table rows are rows which describe things
13:18:38 [danbri]
e.g. a row might describe many things
13:18:51 [danbri]
so either you'd say, instead of csv:row property, you want 'describes'
13:18:53 [danbri]
isDescribedBy etc
13:19:04 [danbri]
… table describes all of the distinct subjects / entities
13:19:28 [danbri]
or you can do it by saying table contains rows, row describes entities
13:19:43 [danbri]
...
13:19:51 [danbri]
jenit: could be 2 rows talking about same entity
13:20:18 [danbri]
jtandy: in this case table is a kind of dataset
13:20:45 [danbri]
… mentioned yesterday that a table … if we defined CSVW 'Table' as a subclass of one of the dataset types e.g. dcat:Dataset
13:21:12 [danbri]
jenit: let's get to agreement on the q ivan posed, … do we want separate about urls on each column
13:21:49 [danbri]
resolved - jeni summarising
13:21:56 [danbri]
ivan: does it affect the json output?
13:21:58 [JeniT]
PROPOSED: aboutUrl is a property that goes on individual columns; different columns can generate data about different subjects
13:21:59 [danbri]
jtandy: yes
13:22:01 [danbri]
+1
13:22:07 [JeniT]
+1
13:22:07 [gkellogg]
+1
13:22:09 [DavideCeolin]
+1
13:22:10 [jtandy]
+1
13:22:11 [jumbrich]
+1
13:22:13 [ivan]
+1
13:22:18 [JeniT]
RESOLVED: aboutUrl is a property that goes on individual columns; different columns can generate data about different subjects
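(editorial illustration, not part of the discussion) The resolution above can be sketched roughly as follows: each column may carry its own aboutUrl template, and cells in a row that expand to the same subject are emitted together, as jtandy described. This is only a sketch under stated assumptions: the `expand` helper is a simplified placeholder substitution rather than real URI template processing, and the column names, prefixes and example.org URLs are invented for illustration.

```python
import re
from collections import defaultdict


def expand(template, row):
    """Replace {column} placeholders with cell values (simplified stand-in for URI templates)."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def row_to_triples(row, columns, default_about):
    """Group (property, value) pairs for one row by the subject each column is about."""
    by_subject = defaultdict(list)
    for col in columns:
        # A column without its own aboutUrl falls back to the schema-level template.
        about = expand(col.get("aboutUrl", default_about), row)
        by_subject[about].append((col["propertyUrl"], row[col["name"]]))
    return dict(by_subject)


# Hypothetical column descriptions: one project column, two developer columns.
columns = [
    {"name": "project", "propertyUrl": "doap:name"},
    {"name": "developer", "propertyUrl": "foaf:name",
     "aboutUrl": "http://example.org/person/{dev_id}"},
    {"name": "dev_id", "propertyUrl": "dc:identifier",
     "aboutUrl": "http://example.org/person/{dev_id}"},
]
row = {"project": "rdf-tabular", "developer": "Gregg", "dev_id": "gk"}
result = row_to_triples(row, columns, "http://example.org/project/{project}")
```

Here one row yields two subjects, a project and a person, which is the "multiple entities per row" case raised earlier.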
13:22:21 [danbri]
rrsagent, please draft minutes
13:22:21 [RRSAgent]
I have made the request to generate http://www.w3.org/2015/02/13-csvw-minutes.html danbri
13:22:31 [JeniT]
https://github.com/w3c/csvw/issues/26
13:22:42 [danbri]
https://github.com/w3c/csvw/issues/26 Rich Column Classes / Types (@type / @datatype on column) #26
13:22:51 [danbri]
jenit: about types of the entities being described by this row
13:22:57 [danbri]
e.g. each row about a Person
13:23:16 [danbri]
gkellogg: a phantom column, of course!
13:23:22 [danbri]
jenit: ok …
13:23:36 [danbri]
… short term answer would be to make a custom property, but let's discuss phantom columns now
13:23:36 [JeniT]
https://github.com/w3c/csvw/issues/179
13:23:50 [danbri]
https://github.com/w3c/csvw/issues/179 Do we need "phantom" columns, i.e., columns with their own and separate 'aboutUrl' value? #179
13:24:11 [danbri]
jenit: what problem does this solve?
13:24:37 [danbri]
gkellogg: problem that we have is that sometimes the information we want to have in our output, json or specifically rdf, … we might need info not exactly in the source CSV
13:24:42 [danbri]
e.g. that the rows describe People
13:24:56 [danbri]
we would therefore need a way to introduce data into the output on a row by row basis
13:25:03 [danbri]
a virtual column might allow us to do that
13:25:16 [danbri]
a table is defined by having some number of columns
13:25:32 [danbri]
if the table desc had more cols after the last real one from the csv, then notionally it would not retrieve a cell value
13:25:45 [danbri]
but we can through other means define ....
13:25:47 [danbri]
aboutUrl etc
13:26:01 [danbri]
that was what i was trying to accomplish
13:26:29 [danbri]
you go through each col, if there are more col records after last one, you go through … and if not in the csv, … you override with default properties
13:26:32 [danbri]
to get literal values
13:27:06 [danbri]
jenit: i understand the goal, there's a q as this adds extra complexity, …
13:27:16 [danbri]
… the demonstration of type, to me, is proof that it is useful
13:27:21 [danbri]
the use of columns in that way concerns me
13:27:29 [danbri]
in that … the data changes if we add more cols to the data
13:27:45 [danbri]
if we start adding more cols to the data, multiple metadata files, some have extras, then we start to get conflicts
13:27:57 [danbri]
gkellogg: we could have isVirtual property set on the col
13:28:13 [danbri]
q+
13:28:32 [danbri]
jenit: maybe have this as a separate property on schema, beyond cols, e.g. "extras"
13:28:54 [danbri]
gkellogg: how does this look in annotated model?
13:29:18 [danbri]
jenit: q is whether to pretend that they are cells or not
13:29:33 [danbri]
gkellogg: virtual col could appear any place?
13:29:55 [danbri]
jenit: concerned about the merge
13:30:11 [danbri]
ivan: agree w/ jenit, that this somehow mis-uses something
13:30:15 [danbri]
cols are to describe cols
13:30:26 [JeniT]
q?
13:30:31 [JeniT]
ack danbri
13:33:39 [danbri]
new issue: (disagreement over triples per cell in case of array value)
13:33:57 [danbri]
jtandy: lots of more structured data, observations etc., you want often a more deeply nested structure
13:34:05 [danbri]
e.g. adding a virtual column could support this
13:34:07 [danbri]
get more nesting
13:34:55 [danbri]
jtandy: in a CSV file of weather observations… that is a product based view
13:35:14 [danbri]
… we might have 5 different 'observation' entities
13:35:21 [danbri]
…all share the same time
13:35:30 [danbri]
which is why humans flatten them in csv
13:36:09 [danbri]
(example in http://piratepad.net/URwa3CM9Vv )
13:40:10 [danbri]
(discussion of data cube use case)
13:40:25 [danbri]
(slices can have common properties, but then we have to tie those back to observations)
13:40:45 [danbri]
jumbrich: in data at uni, we have Org with a director with an Address
13:41:10 [danbri]
ivan: seems to work with what you have
13:41:25 [danbri]
jenit: usually to make things link together _and_ to be able to say it has a firstname, givenname etc
13:41:32 [danbri]
you can basically only get one triple per column
13:41:40 [danbri]
if you had 5 cols you get 5 triples
13:41:46 [danbri]
you get to define what the abouts and values are
13:41:57 [danbri]
gkellogg: the notion of the virtual col is to have more control
13:42:09 [danbri]
jumbrich: e.g. row1 person has a first name and a last name, …
13:42:18 [danbri]
(example in piratepad)
13:42:52 [danbri]
event example is https://github.com/w3c/csvw/blob/gh-pages/examples/tests/scenarios/events/source/events-listing.csv
13:43:04 [danbri]
expected triples: https://github.com/w3c/csvw/blob/gh-pages/examples/tests/scenarios/events/output/expected-triples.txt
13:44:15 [danbri]
gkellogg: another hacky way to do this
13:44:19 [danbri]
multiple tables
13:44:28 [danbri]
hijack diff cols in diff table mappings
13:44:50 [danbri]
jenit: yes a hack!
13:45:17 [JeniT]
q?
13:45:21 [danbri]
jumbrich: there are also these mapping languages, ...
13:49:50 [danbri]
discussion of a jenit proposal
14:01:59 [danbri]
jenit: option 1, out of scope
14:02:16 [danbri]
option 2, … most common is saying this thing described by row is an Event, Person, etc
14:02:25 [danbri]
so we could have a specialized handling for that
14:02:46 [danbri]
option 3, this stuff is v v useful, best way of doing that is to hook onto existing column based processing
14:02:55 [danbri]
just say we have phantom cols
14:03:18 [danbri]
option 4, we want to do this, but not use phantom cols but extra stuff within a col description
14:03:32 [DavideCeolin]
DavideCeolin has joined #csvw
14:03:40 [danbri]
my prefs: 3 or 4, no pref between them
14:04:17 [danbri]
jenit: either way we'll solicit wider feedback
14:04:21 [JeniT]
2
14:04:26 [jtandy]
3
14:04:27 [ivan]
2
14:04:55 [gkellogg]
3/4
14:05:51 [danbri]
gkellogg: 3 easier to impl
14:05:55 [danbri]
4 more complex
14:06:02 [danbri]
strongly against 2
14:06:18 [jumbrich]
3 or 4 (if we have typed columns or several entities between columns, we need something more)
14:06:35 [danbri]
3 is a hack but it's easy with potentially a huge win
14:06:43 [DavideCeolin]
3/4
14:07:29 [danbri]
jenit: ivan and i preferred it simple, everyone else went for the extra power
14:07:43 [danbri]
… and i accept value of that, esp 3 seems preferred
14:07:48 [danbri]
… let's try it and seek feedback
14:08:02 [danbri]
gkellogg: I think it will probably work
14:08:09 [danbri]
ivan: means at least virtual cols need a name
14:08:23 [danbri]
jenit: we'll pursue investigating use of phantom cols for generating
14:08:37 [danbri]
jenit: create a PR and we'll put in spec saying "we particularly seek feedback on this feature"
14:08:57 [danbri]
ivan: whatever we publish in a month will include phantom cols
14:09:14 [danbri]
jtandy: terminology?
14:09:16 [danbri]
Virtual
14:09:34 [danbri]
rather than Phantom
14:09:38 [JeniT]
PROPOSED: We will implement virtual columns for the next version of the spec, with an explicit request for comments.
14:09:43 [gkellogg]
+1
14:09:50 [jumbrich]
+1
14:09:51 [danbri]
+1
14:09:54 [JeniT]
+1
14:09:54 [ivan]
+1
14:09:56 [DavideCeolin]
+1
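(editorial illustration, not part of the discussion) A rough sketch of the virtual column idea as gkellogg described it: column descriptions that come after the last real CSV column read no cell and take their value from the description itself, for example to say that every row describes a Person. The `virtual` flag and the use of `valueUrl` as the fixed value are assumptions made for this sketch, as are the names and the example.org URL.

```python
def row_triples(cells, columns, subject):
    """Build (subject, property, value) triples for one row.

    Real columns read their value from the matching cell; columns marked
    virtual (a hypothetical flag) use the fixed value in their description.
    """
    triples = []
    for index, col in enumerate(columns):
        if col.get("virtual", False):
            value = col["valueUrl"]
        else:
            value = cells[index]
        triples.append((subject, col["propertyUrl"], value))
    return triples


# Two real columns followed by one virtual column that adds a type statement.
columns = [
    {"name": "given", "propertyUrl": "foaf:givenName"},
    {"name": "family", "propertyUrl": "foaf:familyName"},
    {"name": "type", "propertyUrl": "rdf:type",
     "valueUrl": "foaf:Person", "virtual": True},
]
triples = row_triples(["Ada", "Lovelace"], columns, "http://example.org/person/1")
```

Because the virtual column sits after the real ones, adding it does not shift the index of any cell in the source file.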
14:11:48 [danbri]
backup gist copy of piratepad, https://gist.github.com/danbri/30534e3c337b34520798
14:13:04 [gkellogg]
scribenick: gkellogg
14:13:13 [JeniT]
https://github.com/w3c/csvw/issues/58
14:13:33 [gkellogg]
How should class level qualified properties be transformed to JSON #58
14:14:43 [JeniT]
PROPOSAL: In JSON output, we do not expand property names into URLs.
14:14:48 [ivan]
+1
14:14:54 [gkellogg]
+1
14:15:22 [danbri]
+1
14:15:25 [DavideCeolin]
+1
14:16:18 [ivan]
RESOLUTION: In JSON output, we do not expand property names into URLs.
14:17:15 [JeniT]
gkellogg: right now, the value of a csvw:row is a row URI, but there are now multiple entities for each row…
14:17:39 [JeniT]
ivan: my understanding was that this was homework to work out the details
14:18:53 [JeniT]
https://github.com/w3c/csvw/issues/117
14:19:08 [JeniT]
https://github.com/w3c/csvw/issues/117#issuecomment-72898169
14:19:08 [gkellogg]
topic: Make the datatype mapping more precise #117
14:20:07 [gkellogg]
JeniT: columns describe datatype such as strings, dates, numbers. We could also have XML, HTML, JSON.
14:20:55 [gkellogg]
… embedded XML, HTML, JSON does exist in the wild. Embedded CSV is a nightmare!
14:21:57 [gkellogg]
… In the generation of the RDF, if the datatype is XML, the output should be an rdf:XMLLiteral, HTML: rdf:HTML, JSON: ???
14:23:15 [gkellogg]
… Three options, xsd:string, csvw:JSON, process JSON as we process common properties.
14:23:31 [danbri]
can we emit base64 as data uris, e.g. data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==
14:24:10 [JeniT]
danbri: I think so, through the URL template, yes
14:24:28 [danbri]
(yes, looks doable to me too, just wanted sanity check - thanks)
14:24:31 [gkellogg]
iherman: I spent some time to define a JSON datatype; the problem is that formally speaking, you need to define L2V for the datatype.
14:25:12 [gkellogg]
… there are discussions at IETF on doing this, but there is no universally accepted way to do it.
14:26:05 [gkellogg]
… My feeling is that we shouldn’t define such a datatype.
14:26:32 [gkellogg]
… A datatype means that I have a proper RDF datatype definition, which we can’t do.
14:28:37 [gkellogg]
JeniT: options on table: process JSON as a common property, output in RDF with xsd:string, or with csvw:JSON, where that is defined as a subClass of xsd:string
14:30:00 [gkellogg]
jtandy: usually, when people do this it’s to use in a GUI, and is not intended for interpretation. I don’t want to pick out embedded output.
14:30:11 [gkellogg]
JeniT: this leaves options 2 and 3.
14:32:30 [JeniT]
PROPOSED resolution: datatype: json gets mapped to RDF literal with a datatype of csvw:JSON which is a subtype of xsd:string
14:33:11 [gkellogg]
+1
14:33:30 [jumbrich]
+1
14:33:33 [jtandy]
+1
14:36:41 [danbri]
+1
14:51:51 [danbri]
scribenick: danbri
14:52:02 [danbri]
topic: Overflow Time
14:52:07 [danbri]
topic: conflation
14:52:33 [danbri]
ivan: metadata has an @id etc., considered as a json-ld thing, result is an rdf graph, where everything is hanging on the subject, whose url is this one
14:52:43 [danbri]
jenit: and your understanding is… that the @id is for the graph, … or?
14:52:58 [danbri]
ivan: it is a bunch of rdf statements, whose subject is [this url]
14:53:05 [danbri]
[general agreement so far]
14:53:14 [danbri]
ivan: from this metadata thing, we also generate a bunch of rdf statements, ..
14:53:27 [danbri]
… as we describe, which includes the rows, the things jtandy has described
14:53:47 [danbri]
my understanding is that yesterday we said that the url for this, is … this
14:53:58 [danbri]
in fact we get, for the same subject, …
14:54:19 [danbri]
in current world, … means that we attach on to the same subject, a bunch of additional triples which have nothing to do with what we want here
14:54:35 [danbri]
what i claim is that these two things should be different
14:54:41 [danbri]
we have to have an explicit statement here
14:54:53 [danbri]
that gives a home … to give a subject for what we generate from CSV
14:55:07 [danbri]
jenit: what I don't understand is why you make the assertion that things about blah there, aren't about blah there
14:55:22 [danbri]
ivan: [here this here something missed ]
14:55:41 [danbri]
gkellogg: my u/standing is that all of those properties _are_ the table
14:55:48 [danbri]
and all of those properties are properties of the table
14:56:00 [danbri]
and something similar to inference rules add triples based on interpreting the csv
14:56:08 [danbri]
where i believe ivan is coming from, and i also feel
14:56:17 [danbri]
what the metadata description is, is a description of the table
14:56:21 [danbri]
used to create the tabular data model
14:56:26 [danbri]
which is though, a different entity
14:56:36 [danbri]
therefore when we say Common Properties, and copying them over, …
14:56:56 [danbri]
… i think from Jeni's perspective, you are not copying them, you are just expressing them with some discrimination e.g. skipping notes and schema
14:57:13 [danbri]
gkellogg: whereas my view + i think ivan's, … we could go …. [missed]
14:57:25 [gkellogg]
https://github.com/gkellogg/rdf-tabular/blob/feature/de-conflate-metadata/spec/data/tree-ops.csv-metadata.json
14:57:27 [danbri]
… to be unequivocally of the table and not the metadata
14:57:40 [jtandy]
jtandy has joined #csvw
14:57:59 [danbri]
ivan: we require an explicit thing that is different
14:58:05 [danbri]
… jtandy raised this ages ago
14:58:24 [danbri]
jtandy: i was just happy to establish that things in the @table were about the table
14:58:36 [danbri]
i did not have burning need to talk about the table description itself, who wrote it, etc.
14:59:24 [danbri]
(gkellogg talks us through https://github.com/gkellogg/rdf-tabular/blob/feature/de-conflate-metadata/spec/data/tree-ops.csv-metadata.json )
15:00:36 [danbri]
… "… if we chose to create such a distinction this would be a reasonable way"
15:00:47 [danbri]
jenit: querying this, … url is probably property of the table
15:00:55 [danbri]
… and tableSchema is the schema of the table
15:01:02 [danbri]
gkellogg: now i am understanding your view a bit more
15:01:07 [danbri]
the metadata thing … is the schema
15:01:19 [danbri]
if ivan wanted to make statements about the metadata, it could be in the schema
15:01:26 [danbri]
see orig version of this file, …->
15:01:34 [gkellogg]
https://github.com/gkellogg/rdf-tabular/blob/develop/spec/data/tree-ops.csv-metadata.json
15:02:02 [danbri]
this has url, common properties, and tableSchema
15:02:15 [danbri]
i understand if we put common props in the tableSchema they won't come out via conversions
15:03:06 [danbri]
...
15:04:04 [danbri]
gkellogg: we're not copying over, so much as serializing this alongside rules based on the referenced csv file
15:05:12 [danbri]
ivan: .. what the rdf gen doc does is additional, but common properties are already there
15:05:16 [danbri]
should be made v clear in the doc
15:05:20 [danbri]
for me it was absolutely not clear
15:05:42 [danbri]
...
15:05:54 [danbri]
jenit: if you have dc:title on tableGroup you have it for the whole set, not inherited down
15:06:03 [danbri]
gkellogg: there is no description on the table group as such
15:06:17 [danbri]
ivan: in grander scale, you talk about CSV files as being part of the Linked Data cloud or world, ...
15:06:30 [danbri]
… my view until now, the metadata creates link between that cloud and CSV files which are in some form RDF
15:06:43 [danbri]
but in fact that is not what happens
15:06:58 [danbri]
… what we describe is some sort of an inference
15:07:47 [danbri]
topic: Conversion to RDF
15:07:57 [danbri]
looking at PROV
15:08:32 [danbri]
http://w3c.github.io/csvw/csv2rdf/
15:08:35 [danbri]
section 3.1.x
15:08:48 [danbri]
issue #147
15:08:52 [danbri]
#147
15:09:22 [danbri]
jtandy: as it is useful to understand how a set of info is created, and we discussed including PROV, … this section of csv2rdf is based on a suggestion in those discussions
15:09:48 [danbri]
prov:generated <[RDF Output Location]>;
15:09:51 [danbri]
…hard to know
15:10:02 [danbri]
prov:startedAtTime [Start Time];
15:10:02 [danbri]
prov:endedAtTime [End Time];
15:10:02 [danbri]
… for activities
15:10:14 [danbri]
and it had a usage, which was a csv file, … etc.
15:10:20 [danbri]
see also 2nd example further on.
15:10:40 [danbri]
ivan: see https://github.com/w3c/csvw/issues/174
15:10:51 [danbri]
Slight modification of the provenance structure for RDF output #174
15:11:04 [danbri]
ivan: this shows eg a bit different, … you bind it to table with activity
15:11:09 [danbri]
i was looking at prov vocab and examples
15:11:16 [danbri]
… here it was generated by an activity, ...
15:11:29 [danbri]
whether that info was useful or not is a separate debate
15:11:35 [danbri]
i think that is more correct
15:11:48 [danbri]
davide: … this kind of info was what i was looking for
15:11:53 [danbri]
may not be v useful in many cases
15:12:01 [danbri]
but sometimes can help you find problems
15:12:14 [danbri]
ivan: this is what i generate now
15:12:25 [danbri]
jenit: you mention a way of capturing what metadata files were used
15:12:34 [danbri]
jtandy: you'd have a prov qualifiedUsage block
15:12:39 [danbri]
one for every metadata involved
15:12:47 [danbri]
gkellogg: except for the embedded metadata
15:12:56 [danbri]
ivan: i have here a slightly more complex one
15:13:10 [danbri]
(adding to https://github.com/w3c/csvw/issues/174 )
15:13:41 [danbri]
prov entity has a bunch of csv files
15:13:46 [danbri]
jtandy: so it is a list of entities
15:13:55 [danbri]
jenit: i don't know what the correct usage is
15:14:02 [danbri]
… here this is an activity that has two prov usages
15:14:07 [danbri]
one of which has multiple entities
15:14:42 [danbri]
jenit: even though there are multiple metadata files, ...
15:14:48 [danbri]
ivan: problem is, ...
15:15:51 [danbri]
discussion of whether optional
15:15:54 [danbri]
how to test
15:15:56 [danbri]
esp with times
15:16:55 [danbri]
gkellogg: only thing problematic for automated testing, is inclusion of timestamps
15:17:08 [danbri]
jenit: whether that is problematic or not depends on how we define those tests
15:17:23 [danbri]
gkellogg: we got a lot of rdfa impl feedback that we made testing hard
15:17:46 [danbri]
ivan: here we have 2 metadata files that exist and can be referenced
15:17:57 [danbri]
but default and user metadata, passed on,… how do we describe them
15:18:03 [danbri]
davide: i was thinking about that
15:18:27 [danbri]
gkellogg: maybe a bnode??
15:18:31 [danbri]
danbri: do we have a UC for this?
15:18:45 [danbri]
jenit: are there any specs that generate provenance automatically?
15:19:02 [danbri]
jenit: would it be terrible if left implementation defined
15:19:47 [danbri]
ivan: prov docs can be hard to read but a good primer
15:21:05 [danbri]
danbri: provenance super useful in v detailed scientific scenarios, but we can't define that … let's point them at prov
15:21:15 [danbri]
jenit: to facilitate that, fix some prov roles
15:21:29 [danbri]
csvw:EncodedTabularData and csvw:tabularMetadata
15:21:37 [danbri]
… we may need to think about those more
15:21:42 [JeniT]
https://github.com/w3c/csvw/issues/174
15:21:57 [danbri]
jenit: suggesting that https://github.com/w3c/csvw/issues/174 ("Slight modification of the provenance structure for RDF output") be resolved as …
15:22:28 [danbri]
(discussion that examples are non-normative)
15:22:30 [JeniT]
PROPOSAL: We suggest that implementations may choose to include provenance information and include an example of what it might look like.
15:22:39 [danbri]
rrsagent, pointer?
15:22:39 [RRSAgent]
See http://www.w3.org/2015/02/13-csvw-irc#T15-22-39
15:22:47 [danbri]
+1
15:23:20 [gkellogg]
+1
15:23:38 [danbri]
jenit: the use of the prov info will really determine how much depth needed, … so am inclined to leave it impl-defined.
15:23:50 [danbri]
gkellogg: for testing, implementations should have a way to disable outputting prov
15:23:52 [JeniT]
+1
15:23:57 [jumbrich]
+1
15:23:57 [jtandy]
+1
15:24:00 [ivan]
+1
15:24:19 [DavideCeolin]
DavideCeolin has joined #csvw
15:24:23 [DavideCeolin]
+1
15:24:26 [danbri]
RESOLVED: We suggest that implementations may choose to include provenance information and include an example of what it might look like.
15:24:40 [danbri]
jenit: on to prov roles
15:24:53 [danbri]
jtandy: i feel that is the right way fwd
15:24:59 [danbri]
raises q then about dcat distribution
15:25:09 [danbri]
i think important that we point to the csv where the stuff came from
15:25:13 [JeniT]
https://github.com/w3c/csvw/issues/147
15:25:15 [danbri]
jenit: but prov roles first
15:25:24 [danbri]
Prov roles #147.
15:25:31 [danbri]
"The CSV2RDF doc uses two values for prov:hadRole: csvw:csvEncodedTabularData and csvw:tabularMetadata. This need to be defined as instances of prov:Role in the namespace. Are there other instance types we need to define? TSV, XLS, HTML?"
15:26:22 [danbri]
ivan: q is whether there are other roles
15:26:29 [danbri]
we defined yesterday validation vs generation processors
15:26:40 [danbri]
i used a ref to my own tool saying 'this is the guy that generated that'
15:26:45 [danbri]
maybe the validation is a diff role?
15:27:11 [danbri]
danbri: plugins for R2RML etc?
15:27:16 [danbri]
jenit: no, this is just for our bit
15:27:23 [danbri]
danbri: so they'd do their own prov? fine thanks
15:27:32 [danbri]
ivan: prov's way around reification is interesting
15:28:36 [danbri]
jenit: so on #147 we make it only applicable to the csv2rdf mapping, and assign Davide to the issue, commenting "discussed at f2f…" -> see https://github.com/w3c/csvw/issues/147
15:28:45 [danbri]
… davide and ivan to come up with a list of appropriate roles
15:29:32 [danbri]
https://github.com/w3c/csvw/issues/179 Is the DCAT block useful in the RDF output. #177
15:29:54 [danbri]
jtandy: to give an unambig rel between dataset and outset, i inserted idea of using a dcat:distribution statement
15:30:00 [danbri]
file vs abstract data
15:30:08 [danbri]
however we could simply use the url property
15:30:12 [danbri]
jenit: or dc:source
15:30:22 [danbri]
danbri: also in schema.org
15:30:29 [danbri]
jenit: this is one mech to do it, … there are clearly others, ...
15:30:41 [danbri]
… introducing the dcat stuff gives us some baggage that might make some people flinch
15:31:03 [danbri]
ivan: this in #177 …
15:31:08 [danbri]
… is a json transform
15:31:34 [danbri]
jtandy: I generated it. Idea is that you would, while transforming, insert a bit of json magic
15:31:44 [danbri]
gkellogg: just a json not json-ld?
15:32:03 [danbri]
no, this is the rdf transformation …
15:33:04 [danbri]
ivan: i have no dcat experience
15:33:13 [danbri]
jenit: impl is that the table is a dataset in dcat terminology
15:33:21 [danbri]
which is so flexible as to mean anything
15:33:27 [danbri]
jtandy: you could insert as a common property
15:33:56 [danbri]
jenit: i think this falls under 'it's impl defined how you might define info about the provenance of this output graph'
15:34:04 [danbri]
you could use prov or dc:source or dcat or ...
15:34:13 [danbri]
gkellogg: so goes into same non-normative section
15:34:44 [danbri]
jenit: only thing, … using dcat:distribution def falls under impl-defined, only q is whether we want there to be a CSVW URL property to be in the rdf output
15:35:57 [danbri]
danbri: can't force people to publish e.g. intranet urls
15:36:05 [danbri]
jtandy/jenit: feels more refined than just dc:source
15:36:17 [danbri]
… should csvw:url be a subproperty of dc:source
15:36:29 [danbri]
http://purl.org/dc/terms/source
15:36:36 [danbri]
"A related resource from which the described resource is derived."
15:36:41 [danbri]
"The described resource may be derived from the related resource in whole or in part. Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system."
15:37:23 [danbri]
-1 on subproperty
15:37:41 [danbri]
jenit: Suggestion: don't add dcat:distribution, but do have, in the generated RDF output:
15:37:41 [danbri]
_:table csvw:url <tree-ops.csv> .
15:37:50 [danbri]
jtandy: in json it would just be "url": ...
15:38:40 [danbri]
jenit: proposed - any impl of dcat properties is impl defined, but that we do try to preserve the link to original file through using csvw:url
15:38:43 [danbri]
+1
15:38:53 [danbri]
rrsagent, pointer?
15:38:53 [RRSAgent]
See http://www.w3.org/2015/02/13-csvw-irc#T15-38-53
15:39:03 [ivan]
rrsagent, draft minutes
15:39:03 [RRSAgent]
I have made the request to generate http://www.w3.org/2015/02/13-csvw-minutes.html ivan
15:40:20 [danbri]
(route to http-range-14: "Is the url the id of this thing or a different thing? discuss.")
15:41:23 [danbri]
jenit: two more issues we didn't get through
15:41:24 [danbri]
lists next
15:41:29 [JeniT]
https://github.com/w3c/csvw/issues/107
15:42:05 [danbri]
jenit: when we have a cell with a sequence, e.g. spaces, semicolons, … and the cell value then contains a sequence of values, ...
15:42:48 [danbri]
gkellogg: do we disagree? what about cells being only one triple?
15:42:55 [danbri]
jenit: what to do in these kinds of cases?
15:43:02 [danbri]
… what gets created in the rdf output?
15:43:03 [JeniT]
https://github.com/w3c/csvw/issues/107#issuecomment-72894468
15:43:08 [danbri]
json has arrays which are always ordered
15:43:21 [danbri]
rdf output has possibilities of generating an actual rdf list, … or you generate repeated properties
15:44:29 [danbri]
jtandy: content that lists are lists
15:46:01 [danbri]
danbri: [begs for a parameter for listyness, use case of nationality]
15:47:15 [JeniT]
PROPOSED: when a cell value is a sequence of values, it is converted to a rdf:List if ordered is true, and to multiple values for the same property if ordered is false; the default is that ordered is false
15:47:23 [DavideCeolin]
DavideCeolin has joined #csvw
15:47:24 [danbri]
+1
15:47:49 [JeniT]
PROPOSED: when a column defines a separator, cell values are converted to a rdf:List if ordered is true, and to multiple values for the same property if ordered is false; the default is that ordered is false
15:47:52 [danbri]
+☮
15:47:59 [ivan]
+0.999
15:48:00 [danbri]
+1
15:48:06 [gkellogg]
+1
15:48:07 [DavideCeolin]
+1
15:48:10 [jtandy]
+1
15:48:12 [jumbrich]
+1
15:48:24 [JeniT]
+1
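(editorial illustration, not part of the discussion) The proposal above could be sketched like this: a cell in a column with a separator is split into several values, which become either one ordered collection or repeated values for the same property, depending on `ordered`. This is a minimal sketch under stated assumptions: a Python tuple stands in for an rdf:List, and the function and argument names are invented for illustration.

```python
def cell_values(raw, separator=None):
    """Split a raw cell on the column's separator; no separator means one value."""
    if separator is None:
        return [raw]
    return [value for value in raw.split(separator) if value != ""]


def convert_cell(subject, prop, raw, separator=None, ordered=False):
    """Return triples for one cell, following the proposed ordered/unordered rule."""
    values = cell_values(raw, separator)
    if separator is not None and ordered:
        # One triple whose object is an ordered collection (stand-in for rdf:List).
        return [(subject, prop, tuple(values))]
    # Default (ordered is false): repeat the property once per value.
    return [(subject, prop, value) for value in values]


unordered = convert_cell("ex:alice", "ex:nationality", "British;Dutch", separator=";")
as_list = convert_cell("ex:alice", "ex:nationality", "British;Dutch",
                       separator=";", ordered=True)
```

The nationality example follows danbri's use case, where the order of the values carries no meaning and repeated properties are the natural output.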
15:50:07 [danbri]
(exit davide)
15:50:15 [danbri]
jtandy: who is going to update UC doc?
15:50:23 [danbri]
davide: ok, i'll…
15:51:53 [JeniT]
https://github.com/w3c/csvw/issues/35
15:54:30 [JeniT]
https://github.com/w3c/csvw/issues/94
15:59:48 [jumbrich]
jumbrich has joined #csvw
16:01:03 [JeniT]
[discussion about whether it’s possible/useful to have a default metadata document]
16:01:39 [danbri]
ivan: what we have now...
16:01:49 [danbri]
we normalize each metadata then we 2nd-normalize them
16:01:53 [danbri]
filling in missing bits like name
16:01:54 [JeniT]
ivan: we normalise the metadata files before merge, then we merge, then we add defaults (like name)
16:02:11 [danbri]
gkellogg: that's your view
16:02:30 [danbri]
… what's in there is consistent and does not require us to locate default metadata
16:02:41 [danbri]
ivan: I think more the q of how we define it, … an editorial issue
16:02:45 [danbri]
we do same thing
16:02:58 [danbri]
… i try to put the formulation of whole thing into metadata files,...
16:03:09 [danbri]
… at end of whole process we have another phase of normalization
16:03:18 [danbri]
which seems consistent with the current system
16:03:25 [danbri]
this is an editorial issue
16:03:39 [danbri]
jenit: i think perfectly reasonable to say 'normalization, merge, … '
16:03:45 [danbri]
…'completion' (ivan/jenit)
16:04:14 [danbri]
gkellogg: places we talk about property values to make sure they're [post-completion]
16:04:21 [danbri]
jenit: for each property we say 'if missing assume x'
16:04:27 [danbri]
ivan: name, details of dialect
16:04:43 [danbri]
jenit: we could be more disciplined providing more info throughout
16:05:11 [danbri]
(reminds me of https://en.wikipedia.org/wiki/XML_Schema_(W3C)#Post-Schema-Validation_Infoset …)
16:05:28 [danbri]
jenit: editorial action is to check property definitions are applied consistently
16:05:38 [danbri]
gkellogg: I tried this when looking at property values (in transform doc)
16:05:47 [jumbrich]
jumbrich has joined #csvw
16:05:51 [danbri]
i think it is ok. if not, there is some editor action.
16:06:22 [danbri]
ivan: i can do this, but when? all these changes pending
16:06:32 [danbri]
jenit: process from here is … lots of editor actions
16:06:36 [danbri]
push them all through
16:07:46 [danbri]
ivan: even my implementation needs reworking after all this
16:07:50 [danbri]
gkellogg: also our test cases
16:10:31 [danbri]
jtandy: I'll always have a propertyUrl defined?
16:10:32 [danbri]
(yes)
16:10:45 [danbri]
ivan: conversion docs will be cut by half
16:11:44 [danbri]
jenit: do we want to discuss "Relationship between table group, table and schema" ?
16:11:57 [danbri]
jtandy: that will be resolved based on [other actions/decisions]
16:12:51 [danbri]
-topic cvwr:row
16:13:24 [danbri]
topic: Relationship in RDF output of conversion between csvw:Table and the entities generated from a row
16:13:33 [danbri]
ivan: dealing with lists is ugly
16:13:49 [danbri]
… which is why we pulled away and put in the row number
16:14:47 [danbri]
jenit: table has rows, … rows have row numbers, which describe entities, … the (possibly different/various) about URIs
16:14:54 [danbri]
('describes' or similar)
16:15:21 [danbri]
discussion of using RFC-7111 to point here with fragment IDs
16:16:08 [danbri]
gkellogg: i'm fine so long as i can turn it off
16:16:45 [danbri]
debate on whether we want to explicitly list an option
16:17:55 [danbri]
jenit: may as well be non-normative then, if optional
16:18:21 [danbri]
… related q: is it legal for the rdf conversion to include anything else it wants?
16:18:32 [danbri]
gkellogg: always should be ok, but should also be possible to turn off turnoffable things
16:19:04 [danbri]
levels of conversion-
16:19:16 [danbri]
gkellogg: including "that", rows etc
16:20:12 [danbri]
danbri: [something like named graphs, x3]
16:21:57 [jumbrich]
jumbrich has joined #csvw
16:22:05 [JeniT]
PROPOSAL: there are different levels of output from RDF and from JSON, which can be selected on user option. These are ‘minimal’ that produces only the data from the table, without reification triples, ‘standard’ which includes reification of tables & rows, ‘plus prov’ which includes provenance
16:25:21 [danbri]
+1
16:25:22 [danbri]
+1
16:25:22 [danbri]
+1
16:25:24 [danbri]
+1
16:25:26 [danbri]
+1
16:25:27 [ivan]
+1
16:25:27 [danbri]
+1
16:25:31 [gkellogg]
+1
16:25:34 [jumbrich]
+1
16:25:35 [JeniT]
+1
16:25:59 [JeniT]
RESOLVED: there are different levels of output from RDF and from JSON, which can be selected on user option. These are ‘minimal’ that produces only the data from the table, without reification triples, ‘standard’ which includes reification of tables & rows, ‘plus prov’ which includes provenance
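(A rough sketch of how the three resolved output levels might be selected by a user option. The function convert, the subject URI pattern, and the predicate names csvw:row, csvw:describes and prov:wasGeneratedBy as used here are illustrative assumptions, not the group's agreed vocabulary or any implementation's API.)

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
LEVELS = ("minimal", "standard", "plus prov")


def convert(table_url: str, rows: List[Dict[str, str]],
            level: str = "standard") -> List[Triple]:
    """Emit triples for a table at one of the three output levels."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    triples: List[Triple] = []
    for n, row in enumerate(rows, start=1):
        subject = f"{table_url}#entity-{n}"  # hypothetical about-URI pattern
        # 'minimal': only the data from the table.
        for column, value in row.items():
            triples.append((subject, f"{table_url}#{column}", value))
        if level != "minimal":
            # 'standard': also describe the row and link it to its entity.
            row_node = f"{table_url}#row={n}"
            triples.append((table_url, "csvw:row", row_node))
            triples.append((row_node, "csvw:describes", subject))
    if level != "minimal":
        triples.append((table_url, "rdf:type", "csvw:Table"))
    if level == "plus prov":
        # 'plus prov': add a placeholder provenance statement.
        triples.append((table_url, "prov:wasGeneratedBy", "_:conversion"))
    return triples


if __name__ == "__main__":
    data = [{"name": "a"}, {"name": "b"}]
    for lvl in LEVELS:
        print(lvl, len(convert("http://example.org/t.csv", data, lvl)))
```

Each level is a superset of the previous one, so a consumer that only wants the data can ask for 'minimal' and ignore the table and row descriptions.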
16:26:24 [danbri]
rrsagent, please draft minutes?
16:26:24 [RRSAgent]
I'm logging. Sorry, nothing found for 'please draft minutes'
16:26:24 [ivan]
rrsagent, draft minutes
16:26:24 [RRSAgent]
I have made the request to generate http://www.w3.org/2015/02/13-csvw-minutes.html ivan
16:39:29 [jumbrich]
jumbrich has joined #csvw
16:39:53 [danbri]
topic: wrapup and actions
16:40:19 [danbri]
jenit: … aiming for another set of Working Drafts end of March, early April.
16:42:21 [danbri]
… handling of comments/suggestions
16:42:23 [danbri]
3 buckets:
16:42:38 [danbri]
small grammatical fixes, which should be made immediately with no fuss.
16:43:19 [danbri]
gkellogg/jtandy: let's stick with Pull Requests, just merge immediately
16:43:26 [danbri]
ivan: … and remove the branch?
16:43:34 [danbri]
gkellogg: I have a gk updates branch
16:44:43 [danbri]
jenit: 2nd bucket is where we have a resolved direction but you need someone else to review it please
16:44:58 [danbri]
… suggest create a PR specific for the particular issue
16:45:11 [danbri]
gkellogg: this is where the timeliness comes in, … you get blocked
16:45:17 [danbri]
jenit: no, git is good
16:45:30 [danbri]
debate over how badly things block
16:45:51 [danbri]
jenit: small atomic PRs that are quickly resolved
16:46:04 [danbri]
… if you are not getting the review you need, and waiting will cause a problem with the merge, then just merge it.
16:46:13 [danbri]
… and assign it to somebody
16:46:22 [danbri]
ivan: will I get an automatic email?
16:46:23 [danbri]
yes
16:46:58 [danbri]
gkellogg: "watching" setting for the repo helps
16:47:24 [danbri]
jenit: the other is around 'useful issues'
16:47:34 [danbri]
3rd category is "don't know what to do here"
16:47:41 [danbri]
i.e. there are some options
16:47:45 [danbri]
… keep them small and focussed
16:48:05 [danbri]
… try to provide what the options are, say what your proposed resolution is, … get some number of +1s, sufficient for you to say if it is resolved
16:48:29 [danbri]
(discussion of avoiding digressions)
16:48:43 [danbri]
jenit: please avoid digressions in github
16:48:46 [danbri]
create a new issue
16:48:49 [danbri]
then link it
16:50:03 [danbri]
https://github.com/w3c/csvw/pulls?q=is%3Apr+is%3Aclosed
16:52:32 [danbri]
jenit: how many +1s on a proposed resolution are needed?
16:52:45 [danbri]
jenit: a working week with no -1s
16:53:33 [danbri]
rough consensus to use mailing list if blocked waiting for +1
16:54:19 [danbri]
jenit: let's use the existing 'requires discussion' github label
16:54:27 [danbri]
jenit: when we do our reviews try to have a read
17:38:40 [jumbrich]
jumbrich has joined #csvw
17:52:07 [Zakim]
Zakim has left #csvw
18:11:10 [jumbrich]
jumbrich has joined #csvw