meeting: DXWG DCAT Working Session teleconference 13 February 2019 21:00 UTC
agenda: https://www.w3.org/2017/dxwg/wiki/Meetings:DCAT-Telecon2019.02.13
regrets: Andrea Perego
scribenick: PWinstanley
https://github.com/w3c/dxwg/issues?q=is%3Aopen+is%3Aissue+label%3Adcat+label%3Aversioning
DaveBrowning: there are useful resources in the links - esp the list of relevant github issues
https://github.com/w3c/dxwg/projects/9
... these have been tagged with 'versioning'
... Work done towards the beginning of the WG - alejandra did a review of versioning
... there are other notes
... and notes on using pav
https://lists.w3.org/Archives/Public/public-dxwg-wg/2019Feb/0208.html
... Makx sent a suggestion to the mailing list
+1 to Makx's suggestion
... we need to take the position that it is not for DCAT to determine the point of change from one version to another for a dataset - this is established within the domain
... but we need to provide a mechanism.
proposed: we are not going to talk about why, when or where, but are talking about how
resolved: we are not going to talk about why, when or where, but are talking about how
DaveBrowning: the follow on: do we want to make an explicit statement about this?
DaveBrowning: So the question now is: how much effort does this task require
PWinstanley: Don't gold plate, go for coverage not depth
... simple illustrative case
... that shows something can be done, something that shows it can scale...
... maybe do a more complex case...?
Jaroslav_Pullmann: I am having difficulty understanding whether we have a vision of versions. Do we consider alternative distributions in different languages 'versions' or 'distributions'?
... what are our version properties?
DaveBrowning: last week we agreed to be loose in interpretation or definition of distributions. we are minimising the complexity of 'informational equivalence', leaving this to the publisher
you can catch up on the update of the distribution definition at: https://w3c.github.io/dxwg/dcat/#Class:Distribution
Makx: reacting to the point of languages, this is a common case, but I suggest the set of 6 scenarios, and I think these are for publishers to define what constitutes a version change
... we can say what the properties might be to support their choices
... but we need to leave it open to them to make the design decisions depending on their requirements
https://www.w3.org/TR/hcls-dataset/
https://www.w3.org/TR/hcls-dataset/#datasetdescriptionlevels
alejandra: I agree with Makx - but I thought the work we did with the HCLS profile for data sets might be instructive - see section 5 and the diagram that separates the data set description from the distribution and the version level description that allows one to describe the relations between data versions
alejandra: in the table, for each description level we specify the requirement of properties for each level
SimonCox: this still leaves it quite abstract in terms of scenarios
alejandra: the specific stuff for versioning is within the table 'provenance and change'
alejandra: using dct and pav attributes
riccardoAlbertoni: I like the approach of pav - a potential solution. the other approach is using qualified relations.
... I mention qualified relations because versioning is a relationship between datasets, and in DCAT we are already considering qualified relations. Covering versioning the same way is another possibility.
Makx: alejandra mentioned the HCLS - is the approach there to be the one we direct people to, or do we bring an extra class into DCAT?
alejandra: HCLS is in use, but in niche life sciences areas. Combined with riccardoAlbertoni's comment, qualified relations are referenced
... I think it is better for DCAT to have its own version of this because we are covering a wider user. I'm not suggesting a specific solution ... yet. But my mention of this is to add it to the discussion of options.
We do already have 'is version of' as an example in the Qualified Relationships section: https://w3c.github.io/dxwg/dcat/#qualified-forms
PWinstanley: Might want a more low tech option, as well as the more sophisticated
chair: DaveBrowning
Jaroslav_Pullmann: it might not be possible to combine both approaches - the DCAT document itself might not be the place to describe this information. If I want just summary material, what am I expecting to see?
... we could try to draw out what this constellation might actually look like
... AFAIK the distributions in HCLS are not versioned, only the dataset.
alejandra: the dataset is abstract, the distributions are concrete and can come in different languages/formats/profiles
... the versions are of datasets only. For representation one doesn't need a separate class
Jaroslav_Pullmann: can we consider versioning in terms of effort - it is a lot of effort to describe a dataset. we can approach this on different levels of resolution
... perhaps we should take effort into account
DaveBrowning: Summary: lots of suggestions, but with the exception of riccardoAlbertoni and the qualified relations, we are circling around the problem
... I am still looking for a strong suggestion
riccardoAlbertoni: the example proposed in the google doc is a straw man
... perhaps we should try to keep it simple, suggesting pav as the first attempt. One issue I see is that we had another vocab which is not a W3C standard
... but this could be unproblematic. I also acknowledge that the reference to PAV is easier to realise in the short time we have
... The qualified relations could be part of an incremental approach for which plain PAV is the start. PAV itself uses qualified relations for complex patterns
DaveBrowning: one advantage of that approach is that we start drafting and become more elaborate as we move forward
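The "plain PAV as the start" idea discussed above can be sketched in a few lines. This is only an illustration, not a proposal from the meeting: the `http://example.org/` identifiers and the `latest_version` helper are hypothetical, triples are modelled as plain Python tuples rather than with an RDF library, and the sketch assumes a strictly serial chain in which each version has at most one successor.

```python
# Minimal sketch: three versions of one dataset linked with simple PAV properties.
PAV = "http://purl.org/pav/"
EX = "http://example.org/dataset/"  # hypothetical namespace, for illustration only

triples = [
    (EX + "budget-v1", PAV + "version", "1"),
    (EX + "budget-v2", PAV + "version", "2"),
    (EX + "budget-v2", PAV + "previousVersion", EX + "budget-v1"),
    (EX + "budget-v3", PAV + "version", "3"),
    (EX + "budget-v3", PAV + "previousVersion", EX + "budget-v2"),
]


def latest_version(triples, start):
    """Walk pav:previousVersion links forwards from `start` to the newest version.

    Assumes serial versioning: each version has at most one successor.
    """
    # Map each version to the version that names it as its predecessor.
    successor = {o: s for s, p, o in triples if p == PAV + "previousVersion"}
    current = start
    seen = {current}
    while current in successor:
        current = successor[current]
        if current in seen:  # guard against a cyclic chain
            raise ValueError("cycle in version chain")
        seen.add(current)
    return current


print(latest_version(triples, EX + "budget-v1"))
# prints http://example.org/dataset/budget-v3
```

Nothing here needs a qualified relation; that pattern would only become necessary once the link itself has to carry extra information.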
DaveBrowning: what examples might we want to use?
Jaroslav_Pullmann: Makx already mentioned 2 scenarios where the data sets are versions of the 'summary level'
... but we can have versions at distribution level too as an example
Makx: in my message last week there were plenty of illustrations that we can use to test the current DCAT and evaluate to see if anything is missing
DaveBrowning: are you expecting qualified relations modelling?
Makx: these are 'real world' examples from DCAT-AP work. The language one can be done with different distributions under the same dataset.
It would be easy to add more examples like https://w3c.github.io/dxwg/dcat/#qualified-relationship and make it explicit that the related resources are of type=dcat:Dataset
... but there are annual budgets for different periods - there are many options, but we need to be able to point out which one follows which
... going through these examples will be instructive
... we either model different dataset versions, or different versions of distributions
... we need to discover in real stuff what works and what doesn't. At the moment we are not discussing concrete things, just general stuff
SimonCox: in the contributions I've made I find real examples most helpful - I used the CSIRO data repo
... in most cases it has uncovered niggles
... let's descend to concrete examples
... I also drop examples into the 'examples' folder of github, we can place them there
https://github.com/w3c/dxwg/tree/gh-pages/dcat/examples
https://www.w3.org/TR/hcls-dataset/#appendix_1
alejandra: I agree about the examples. The HCLS example from the chemical compounds database doesn't fit DCAT.
... in addition to examples and UC as proposed by Makx, perhaps we should also consider what queries we would like the metadata to answer
... we can only attach qualified relations or PAV properties to dataset level
(more examples coming when https://github.com/w3c/dxwg/pull/730 is merged :-) )
... we need to determine which domains require these properties
Jaroslav_Pullmann: I think this will remain inconsistent because of the choices of the publishers. I think both (dataset/distribution) levels might be applicable
Makx: I don't see that we are concerned with inconsistency. we can create new datasets, or new distributions under the same dataset. We just need to say which properties need to be used for each case. We don't need a singular view of everything, but we need to say that if you want to do A then do this, and B then do that
Jaroslav_Pullmann: if we have a set of properties that might be used on either dataset or distribution level then the querying might yield confusing results
Makx: people are doing these things, so we have to roll with it.
... but we can suggest routes
riccardoAlbertoni: the discussion suggests to me that we are discussing issue #93
... saying that it is up to the user to determine the subject of the versioning
... it might be any first-class object from the DCAT vocab
alejandra: if we support this then we are leaning towards a solution that will combine properties and qualified relationships
... This needs to be illustrated with our examples
Jaroslav_Pullmann: searching - creating models leads to diversity, but queries will need to be able to establish the type of versioning pattern
considering the time series data, which is one of the use cases that Makx listed, DCAT-AP represents it using hasPart and no reference to versions: https://joinup.ec.europa.eu/release/dcat-ap-how-model-dataset-series
... this is up to the exploration of the patterns applied to the metadata
alejandra: Makx - re: the link of how DCAT-AP does this with annual budget data. AFAIK there is no reference to version, but to dataset parts
... please can you (Makx) point to how DCAT-AP handles versions
"Additionally, DCAT-AP allows relating datasets as 'versions' using dct:hasVersion/dct:isVersionOf but it is not clearly described in which cases to use these properties."
Makx: we looked and couldn't find an agreed approach.
... CKAN thought it was ridiculous to have different distributions. It is only visible on the screen, there is no metadata.
The `dcat:qualifiedRelation` has domain `dcat:Resource` and range `dcat:Relation` which carries the property `dct:relation` which can point to anything
... my point was that W3C was going to resolve it (Us?!!)
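The domain and range noted above for `dcat:qualifiedRelation` can be illustrated with a small sketch. It is not taken from the DCAT draft: triples are plain Python tuples, the `http://example.org/` identifiers and both helper functions are hypothetical, and the use of `dcat:hadRole` to name the kind of relationship is an assumption, since the minutes only mention `dct:relation`.

```python
# Minimal sketch of the qualified-relation pattern: a resource points at an
# intermediate dcat:Relation node, which in turn points at the related resource.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def qualified_relation(source, target, role, node):
    """Return the triples linking `source` to `target` through a dcat:Relation node."""
    return [
        (source, DCAT + "qualifiedRelation", node),
        (node, RDF_TYPE, DCAT + "Relation"),
        (node, DCT + "relation", target),
        (node, DCAT + "hadRole", role),  # assumption: role property not named in the minutes
    ]


def related(triples, source, role):
    """List the resources that `source` is related to under the given role."""
    nodes = {o for s, p, o in triples
             if s == source and p == DCAT + "qualifiedRelation"}
    with_role = {s for s, p, o in triples
                 if s in nodes and p == DCAT + "hadRole" and o == role}
    return sorted(o for s, p, o in triples
                  if s in with_role and p == DCT + "relation")


IS_VERSION_OF = EX + "role/isVersionOf"  # hypothetical role identifier
graph = qualified_relation(
    source=EX + "dataset/budget-v2",
    target=EX + "dataset/budget-v1",
    role=IS_VERSION_OF,
    node=EX + "relation/1",
)
print(related(graph, EX + "dataset/budget-v2", IS_VERSION_OF))
```

Because `dct:relation` can point to anything, the same helper would work unchanged if the target were a distribution or a service rather than a dataset.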
DaveBrowning_: can we summarise the conversation about qualified relations?
riccardoAlbertoni: there are diverse properties that relate to this area between DCAT and PROV, but I am uncertain that it is totally appropriate to our needs
Makx: we need to have the qualified relation to express the exact version. we also need to be cautious about how deep we go into this. in the library world there is this issue of complexity in book revisions. sometimes it is not just version 1,2,3, etc, but sometimes there are additional free notes. we need to ensure that any solution we achieve is reasonable and fits people's needs
... some basic approach might be a good way forward, then to increase the complexity and see how it fares
Jaroslav_Pullmann: we have discussions on 2 levels of solution level
... on the lower slopes then simple properties are enough, but for more complex situations qualified relations
... but we need to agree to what entities we would apply a version. we need to provide hints or definite advice.
... if we cannot easily provide these then this is the problem to solve.
SimonCox: asking Makx - it sounds like you're identifying a gap. Perhaps, with enough examples using the properties we have available, would another property giving a version be something that would meet your requirements?
Makx: in DCAT-AP there is a version indicator and a version note - these meet the simple requirements, but equally they are not the only way to do this. I was wanting to discover how much precision we need to bring into this, because we were working for a long time and agreement was hard to reach
... so perhaps we should not do that effort
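The simple "version indicator plus version note" approach, applied at whichever level the publisher chooses, might look like the sketch below. The property choices are assumptions: the minutes do not name them, and `owl:versionInfo` and `adms:versionNotes` are used here as the properties DCAT-AP is understood to use. The `http://example.org/` identifiers and both helper functions are hypothetical, and triples are plain Python tuples.

```python
# Minimal sketch: attach a version indicator and a version note to either a
# dataset or a distribution, then report which level carries the versioning.
OWL = "http://www.w3.org/2002/07/owl#"
ADMS = "http://www.w3.org/ns/adms#"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def version_triples(resource, indicator, notes):
    """Return a version indicator and a free-text version note for `resource`."""
    return [
        (resource, OWL + "versionInfo", indicator),   # assumed property choice
        (resource, ADMS + "versionNotes", notes),     # assumed property choice
    ]


def versioned_level(triples):
    """Map each resource that carries a version indicator to its rdf:type values."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return {s: sorted(types.get(s, ()))
            for s, p, o in triples if p == OWL + "versionInfo"}


graph = [
    (EX + "dataset/a", RDF_TYPE, DCAT + "Dataset"),
    (EX + "dist/b-csv", RDF_TYPE, DCAT + "Distribution"),
]
# Publisher 1 decides the dataset changed; publisher 2 versions a distribution.
graph += version_triples(EX + "dataset/a", "2.0", "Corrected totals")
graph += version_triples(EX + "dist/b-csv", "1.1", "Re-exported as CSV")

for resource, kinds in versioned_level(graph).items():
    print(resource, kinds)
```

A consumer running `versioned_level` first can tell which pattern a catalogue follows before interpreting the version values.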
SimonCox: anytime there is a property and some explanation, it is another class - a more complicated pattern
more than one property grouped together == a class
alejandra: In the google doc is a diagram - if we have 2 versions of a dataset we want to describe their relations. but these might have different distributions. we need to be able to relate the datasets or the distributions, so as to decide which is the next version
... we want to give people freedom, but we need to give them the properties to express these relationships
the google doc link is: https://docs.google.com/document/d/1fApxJIotapugde-hyS2lmsElNO3mLvoi7nLqDYJQZ7g/edit
Makx: I cannot see the diagram, but rather than saying to people how to do things, the DCMI Terms versionOf does the job.
... we cannot expect people to do what 'we' think
alejandra: yes, but we need to provide guidance
... we need a position on the best, cleanest way of doing this
Makx: I agree, - the examples I have are ones that we might want to say something about
riccardoAlbertoni: I agree with the idea of allowing the user to do what they want.
and let's not forget dcat:Resource and services!
... the user should decide when to apply versioning. On the issue of simplicity I take a different line to Makx - the idea of adding the same qualified pattern will add one pav term to model any possibility. it is just a matter of judicious choice of the term
Jaroslav_Pullmann: my summary - support for simplicity; for the drawing which shows the degree of freedom people have;
... the modelling pattern is the individual decision of the publisher
even if we allow freedom, we should guide through a few patterns
Jaroslav_Pullmann: we can describe options, as alejandra did, which do not break the structures
I just took a look at schema.org - it has fairly weak support for versioning, only a version designator https://schema.org/version, which is not tied into a link to another thing, and https://meta.schema.org/supersededBy which is only one possible versioning relationship
... and is not in the core
... and is only related to model constructs not datasets
DaveBrowning: bringing us back to the recommendation, do we expect to talk much in the rec - or is it there to provide some illustrations of versioning and we accept that publishers will develop their own styles?
alejandra: we could discuss and illustrate riccardoAlbertoni point - there are vocabularies, so let's decide which might be used for our examples
DaveBrowning: tbh, pav not being a W3C standard is an advantage - there is more than one provider, and this shows strength in the approach
Makx: when doing DCAT v1 there was pushback from W3C for using DCT, but PAV is referred to in DWBP so there is no problem referencing it
riccardoAlbertoni: I don't know if using PAV is a problem, but in DWBP PAV is provided as an example only, not a recommendation
Jaroslav_Pullmann: do we have a gap? I suggest we go through examples and that will illustrate any gaps.
+1 to Makx and Jaroslav_Pullmann about examples
DaveBrowning: families of examples: are the ones from Makx good? Nobody suggests otherwise... are there any others?
Makx: there is serial versioning and parallel versioning.
I like Makx categories
... I don't know if it is an issue that we take into account
Jaroslav_Pullmann: I support this vision - it is to do with obsolescence. One obsoletes the other. I want to know what is current
supersedes?
Makx: Jaroslav_Pullmann brings up a number of points - there is sequencing where members are equally valid. each requires a different set of functions.
we see many 'versions' in simulation and forecasting datasets, all of which are 'valid' for different functions
... There can be replacement.
Jaroslav_Pullmann: we are reaching the crucial point of the version - to let the client indicate the current shape of the dataset
... we should support this axis of interest using the most appropriate means
... these may inform the gap analysis.
'supersedes' is more common term than 'obsoletes'
(unless there is a nuance I'm missing)
Jaroslav_Pullmann: the different patterns could be described, and we could give a minimal requirement of how each pattern might be expressed
+1 to sprint if we have examples in the meanwhile
q: are people still getting value from the sprint approach?
I got less value from this sprint, because no concrete proposal on the table
+1 to SimonCox view
... need some wording, a document section ...
Can Jaroslav_Pullmann draft a starting point?
Makx: I think Jaroslav_Pullmann did a concrete proposal. Whatever you do, provide version information, version indicator, version notes. if you think it is the dataset that has changed, then apply to dataset. if distribution, then apply to the distributions.
for the basic structure, we might as well refer to https://www.w3.org/TR/dwbp/#dataVersioning
... going further than that (e.g. annual budget data) then provide examples of handling these more complex cases.
... I think that what Jaroslav_Pullmann proposed takes us the first step of the way.
... but we need some concrete proposals, and if we have those then we will clean up the work quickly
... We don't need the sprint to create the proposal though
alejandra: what Jaroslav_Pullmann proposed is similar to DWBP
1= sprint; 2= meeting as normal
we need a concrete proposal about this
1
no sorry 2
+1 for meeting (2), since too late
1= sprint around a concrete proposal; 2= meeting as normal
2
2
2
2
... until there is a concrete proposal on versioning ready
+1 (if we have proposals to discuss) - +2 otherwise
rrsagent, draft minutes v2
I have made the request to generate https://www.w3.org/2019/02/13-dxwgdcat-minutes.html SimonCox
yes
bye thanks for the interesting discussion
thank you!
we have a section here: https://w3c.github.io/dxwg/dcat/#dataset-versions
thanks!
bye!