<SimonCox> switch items 5 & 6
<SimonCox> https://www.w3.org/2018/06/28-dxwgdcat-minutes
<roba> +0
<SimonCox> +1
+0 (absent, sent late regrets)
<Jaroslav_Pullmann> +1
Resolved: Approve minutes from last meeting
<SimonCox> https://github.com/w3c/dxwg/issues/256
<SimonCox> ack: alejandra
SimonCox: issue discussed 3 weeks ago
<SimonCox> alejandra: what about when distributions are bags-of-files?
<SimonCox> ... do we need to define an entity 'bag of files'
<SimonCox> ... sibling to dcat:Distribution
I meant entity File
rather than bag-of-files
roba: I was going to raise the relationship with other use cases
… the case of SOAP services
… the payload returned is wrapped inside a document
… there is a general need to describe both the packaging and the internal content separately
… one way is to say that a distribution conforms to multiple profiles
… what the wrap containers are
… I'm sure there are other approaches as well
… multiple solutions for this problem
<Jaroslav_Pullmann> Pattern from IDS: [content]->[representation: format + compression etc.]->[artifact: materialization as file]
Jaroslav_Pullmann: in Genoa we talked about a pattern
… from IDS
… representation - the syntax, how data is structured in terms of syntactical data types, media types, compression
… if we are talking about files, we have to note artifacts
… artifacts as materialization as file
what is IDS?
<Zakim> SimonCox, you wanted to comment on how much abstraction vs. solving immediate problem
<Jaroslav_Pullmann> IDS: https://www.fraunhofer.de/en/research/lighthouse-projects-fraunhofer-initiatives/industrial-data-space.html
SimonCox: I'm hearing roba and Jaroslav_Pullmann pointing out that we are talking about a special case of a more general problem
… motivation when proposing this use case was dealing with a legacy issue
… common issue with existing catalogues
… as they weren't designed to distinguish distributions
… in the wild repositories often ask people depositing data to give an archive or a set of files
… I'm a little bit nervous about losing the initial common concern
… alejandra has spotted something important
… the solution I proposed has missed the representation of the entity file
… on further reflection I don't think every distribution would be a file
… what might the relationship between a distribution and a file be?
Jaroslav_Pullmann: my reference to IDS was related to alejandra's concept of file
… can we not describe it as it is done in ADMS?
… it supports nesting of datasets
… a legacy file, why not use this pattern
… dataset that has distribution
… ADMS included asset
<Jaroslav_Pullmann> I was referring to this predicate for purpose of composing "bag" of files: https://www.w3.org/TR/vocab-adms/#adms-includedasset
<SimonCox> alejandra: DCAT does not have granularity required
<Zakim> SimonCox, you wanted to point out that dct:relation could also manage partonomy (dataset) relations
granularity for describing the contents of a distribution
the relationship between bag-of-files and distribution is key
<SimonCox> https://rawgit.com/w3c/dxwg/dcat-dataset-relations-simon/dcat/index.html#class-dataset
as we need clear guidelines on when to use one or the other
and a distribution itself may be a bag-of-files
so potentially we need some recursive representation
<SimonCox> See in usage note "One of the more specific sub-properties should be used if the semantics of the link are known."
<SimonCox> and 'See also:
… dct:conformsTo, dcat:distribution, dct:hasPart, dct:references, dct:requires'
SimonCox: showing the current PR with a potential representation
SimonCox: I'm motivating this from cases I've seen in catalogues
… including some documentation, perhaps a schema, files that are parts of a whole dataset
… as well as alternative representations
… subproperties of dct:relation
<Jaroslav_Pullmann> the usage note provides a sensible explanation, +1 for using "dct:relation" in case we don't know about the details
SimonCox: trying to address alejandra's concern
<Zakim> alejandra, you wanted to remind about the comment we received https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Apr/0001.html
SimonCox: both in the usage note and the notes, the more specific relationship should be used if the semantics are known
… are you looking for a stronger instruction to users
<SimonCox> alejandra: we need specific examples to illustrate recommended patterns
<SimonCox> ... from CKAN, other repositories
<riccardoAlbertoni> +1 to have stronger language
reminder about the comment on the list https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Apr/0001.html
SimonCox: yes, we need to deal with the issue of manifest
… perhaps the solution is to run some experiments
… and using some examples
… and working up with increasing sophistication
roba: there are a couple of overlapping concerns
… how individual distributions bundle things
… needs to be separated from a dataset as a set of files
<riccardoAlbertoni> yes
<riccardoAlbertoni> i think so
roba: is there something saying that distribution is disjoint from dataset?
<riccardoAlbertoni> No i think they are disjoint
roba: is the problem the separate concepts of dataset or distribution
SimonCox: maybe I should have put files
… I'm looking at CKAN and CSIRO data access portal
… I think it is called a collection in DAP
… when a person adds a dataset to a repository, they can add multiple files
… different representations of a dataset as a whole
roba: the issue is that dataset and distribution are conflated
… then surely the packaging is a platform specific choice
… certain platforms can choose a dataset
SimonCox: the issue is that there will be a lot of bags of files
roba: another case of qualified relation problem
SimonCox: there are some first class relations
… dcat:distribution
… subproperties of dct:relation, it might have been done as qualified relations
… if you don't know the semantics of the relationship
… and you're not sure if it is a distribution
… use a dct:relation
SimonCox: we need to give people a recommendation when they don't know what the relationship is
roba: I don't think it is restricted to legacy
… it is a common problem
SimonCox: at the moment, there is nothing in the DCAT spec to tell people how to deal with this common problem
roba: there ought to be a note to say if there is no specific semantics, use a qualified relationship
SimonCox: how to qualify it if you don't know the relationship?
roba: you could put some note
SimonCox: we're trying to provide a mechanism
… alternative to distribution
… CKAN does it wrong
… because we don't tell them how to represent it
… for people that are using dcat:distribution incorrectly
<Zakim> alejandra, you wanted to say about dataset and distribution abstraction and evolution of catalogues
SimonCox: I'd defer the suggestion of a qualified relation
<SimonCox> alejandra: is Distribution actually a kind of Dataset? Did DCAT do a conflation?
<Zakim> SimonCox, you wanted to comment that definition of dcat:Distribution as _representation_ needs clarifying
I raised the issue about evolution of catalogues
what if a dataset was a bag of files
and now the same dataset is given in another representation
SimonCox: Jaroslav_Pullmann in Genoa was discussing tightening up the definition of dcat:Distribution as a representation
… then some of the files I'm talking about in this case, if they are parts of a dataset, it might also be reasonable to model as representations of other datasets
… but the general problem you're discussing goes away if we consider a Distribution as a representation
Jaroslav_Pullmann: this would break a lot of things
… people wouldn't bother about the distinction
… between abstract data and syntax
Jaroslav_Pullmann: the proposed solution was replying to the idea of file
Jaroslav_Pullmann: I think we have a viable solution
… that wouldn't break anything
… it would help people to find files within the catalogue
Jaroslav_Pullmann: what are the use cases for finding datasets
question about evolution of catalogues
<SimonCox> SimonCox asks alejandra: what is relationship between dcat:Distribution and dcat:File?
when you have a dataset as a bag-of-files and then the dataset is expanded with a new representation
Jaroslav_Pullmann: are we not breaking the crucial distinction between abstract concept and concrete file
… we are talking about composites
… we have the wrapper file that is a dataset that is called a boundary
… archive file
… ADMS has further notes on the dataset
… a schema would be a dataset
<SimonCox> When I wrote 'bag of files' in the UC, I meant that there would be links from the Dataset instances to each of the files in the bag, but that the dcat:distribution predicate was incorrect for some members of the bag
Jaroslav_Pullmann: I don't see the problem here if we adopt the distinction between dataset and distribution
SimonCox: I thought we had those cases covered
… hasPart to point to another dataset
… conformsTo to point to a schema
Jaroslav_Pullmann: we should not omit the concept of dataset
<SimonCox> https://rawgit.com/w3c/dxwg/dcat-dataset-relations-simon/dcat/index.html#Property:dataset_part
SimonCox: probably we should go back to alejandra's proposal about giving examples
… graduated set of examples
+1
<Jaroslav_Pullmann> +1 for looking at how this modeling applies to concrete (composite) examples
Action: SimonCox to construct examples to show usage of Dataset -dct:relation etc
Action: Jaroslav_Pullmann to construct examples of relations from real catalogs
Action: alejandra also to develop examples of dct:relation etc
<riccardoAlbertoni> bye, thanks a lot for the interesting discussion
<DaveBrowning> Very valuable, and constructive...
thanks, and bye!
<Jaroslav_Pullmann> bye!