W3C

– DRAFT –
DX WG meeting with DDI Alliance

30 October 2024

Attendees

Present
Achim, albertmeronyo, AndreaPerego, Asif, db, fatma, Franck, pchampin, peter, RiccardoAlbertoni
Regrets
-
Chair
-
Scribe
pchampin

Meeting minutes

introductions

db: director of the technical service of the UK Data Archive, University of Essex. Sitting on the board of DDI Alliance, liaison with W3C. Also on the technical committee of DDI-CDI.

Achim: involved in a lot of DDI Alliance work in the past. Currently still working on the DDI-CDI WG as an "independent expert", another term for "retired". Background in empirical social sciences.

albertmeronyo: assistant professor at King's College London. Did some unofficial work for W3C standards (implementations of CSVW, Linked Data Notifications, RDF Data Cube). Has used Linked Statistical Data for a long time.

AndreaPerego: one of the editors of DCAT, invited expert at W3C. Working on metadata for the last 10 years, different types of metadata. Working on building bridges and enforcing metadata interoperability.
… One of the people involved in developing profiles for DCAT (e.g. GeoDCAT-AP, for geospatial metadata).
… Also profiles for research institutions.

Asif: working at the University of Galway. PhD in different SemWeb technologies, reasoning. Currently in charge of design principles in Data Spaces.

Franck: independent expert / retired from INSEE (French Stats institute). Worked on data-related standards for a number of years, as a practitioner and occasionally as a contributor with DDI Alliance and SDMX.

RiccardoAlbertoni: works at the Italian Council for Research. Research interest in Linked Data. One of the editors of DCAT. Invited Expert in the W3C Dataset Exchange WG.

<Asif> working as a postdoc researcher (data spaces) with Insight, University of Galway, Ireland. Previously PhD in Computer Engg, worked with semantic web technologies, knowledge modeling, reasoning, data interoperability and behavior modeling. Currently dealing with designing design principles for EU Data spaces. Keen to interact with DCAT, as it has become baseline for many data space connectors.

<Franck> Franck Cotton: Retiree from Insee, Worked a lot on standards (statistical standards or data and metadata), as a practitioner and also occasionally contributor, in particular for DDI, UNECE standards and VTL

fatma: I'm a semantic web researcher in the French Geological Survey. Working on data interop, semantic web and how we can use this technology in our programs.

pchampin: staff contact of the DX WG

Variable Description

db: in DDI-CDI, the "variable cascade" is a way to describe variables in datasets.
… a three layer structure to describe all aspects of the variables described in a dataset.
… You don't have to use the 3 classes, but this provides a way to describe variables in a way that allows linkage.
… One of the main components of DDI-CDI which we want to put forward.
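
A minimal sketch of the three-layer idea, assuming the layers correspond to the DDI-CDI classes ConceptualVariable, RepresentedVariable and InstanceVariable (the last is the class linked further down in these minutes). All attribute names, sample values and the `share_concept` helper are hypothetical, and the sketch uses simple composition, which is far simpler than the actual model:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConceptualVariable:
    """Layer 1: what is measured, independent of any representation."""
    name: str
    concept: str


@dataclass(frozen=True)
class RepresentedVariable:
    """Layer 2: adds a representation (datatype, unit) to a concept."""
    conceptual: ConceptualVariable
    datatype: str
    unit: Optional[str] = None


@dataclass(frozen=True)
class InstanceVariable:
    """Layer 3: the variable as it occurs in one concrete dataset."""
    represented: RepresentedVariable
    dataset: str
    column: str


def share_concept(a: InstanceVariable, b: InstanceVariable) -> bool:
    """Hypothetical linkage check: do both columns trace back to the same concept?"""
    return a.represented.conceptual == b.represented.conceptual


# Hypothetical sample data: the same concept represented in two different units.
age = ConceptualVariable(name="age", concept="Age of a person")
age_years = RepresentedVariable(age, datatype="integer", unit="year")
age_months = RepresentedVariable(age, datatype="integer", unit="month")

survey_age = InstanceVariable(age_years, dataset="survey.csv", column="AGE")
registry_age = InstanceVariable(age_months, dataset="registry.csv", column="age_m")

print(share_concept(survey_age, registry_age))
```

In this sketch the two columns differ in name, dataset and unit, yet can still be linked because they resolve to the same conceptual layer.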

pchampin: Variable Description can be complementary with many W3C recs, CSVW, Data Cube, SOSA...

<RiccardoAlbertoni> w3c/dxwg#1426

RiccardoAlbertoni: we have some kind of description in DCAT, but not a way to describe datasets in terms of variables. I agree that there could be some connection.
… See the link above, an issue we raised to define this kind of future work.
… Also, could the Google doc shared in Zoom also be shared on IRC?

<db> https://docs.google.com/document/d/10kpQg8QlZzTjqefDulzSqoVzk8aqQli__EEDxGd0bNg/edit?tab=t.0#heading=h.5dg8g7d5q5vb

fatma: I'm trying to read the doc right now

pchampin: this is very detailed, focus on the summary at the end

Achim: we started with DCAT, which is high level.
… DDI-CDI aims to be cross-domain, domain independent.
… We start by describing variables at a conceptual level, which is not present in, e.g., CSVW.
… It is sometimes useful to describe a variable as more than a header of a CSV file.
… There is a missing piece, which rich variable descriptions could fill.

albertmeronyo: to rephrase, I believe that this could be complementary with other standards.

peter: thanks albertmeronyo for this perspective. The Dataset Usage Vocabulary is a lightweight attempt in that direction.
… Is the goal to provide a lightweight vocabulary that people can reuse with some imagination?

<db> https://ddi-cdi.github.io/ddi-cdi_v1.0-rc3/field-level-documentation/DDICDILibrary/Classes/Conceptual/InstanceVariable.html

peter: Or something more specific, but that may encounter corner cases?
… Is anyone using the Dataset Usage Vocabulary?

<AndreaPerego> https://www.w3.org/TR/vocab-duv/

peter: maybe one of the things that we need to do is to bring people along.
… There are a number of programs (GO FAIR, and others) that are providing training, to help people make their data FAIR.
… We tend to build something new rather than build on what we have to the max.
… One more standard is not gonna work.

Achim: I think Variable Description could be a good addition to DCAT and Data Cube, ideally a new recommendation.
… It might be different from a one-to-one mapping of DDI.
… Some of the parts would be expressed in a different way from CDI.

peter: the notion of profile in DCAT was meant to be quite liberal.
… It could be as flexible as a document, and as specific as a schema / set of SHACL patterns.
… And I don't think that it is used enough.

albertmeronyo: reacting to what peter said about going too far and over-specifying.
… I didn't know about Dataset Usage and DCAT profiles, not sure how these use cases would overlap.
… I fully agree that we need to reach a compromise between saying too little and saying too much.
… But there may be an opportunity to think more about what variables are in datasets.
… Also abstract things like variable dependencies, variable roles.

pchampin: agree that "yet another standard" is not a goal in itself,
… but the variable description links nicely with what we already have, and will help people use the existing standards better

Franck: +1 to what has been said before.
… As a practitioner, using SSN/SOSA, I've encountered the need to better describe variables.
… Some organization needs to tackle this question.

pwin: an important thing about a dataset is the granularity, quality, rights and obligations.
… There is often a big difference between the kinds of data used in different domains.
… We often use the word "data" very broadly.

Minutes manually created (not a transcript), formatted by scribe.perl version 238 (Fri Oct 18 20:51:13 2024 UTC).

Diagnostics

Succeeded: s/spatial DCAT-AP/GeoDCAT-AP, for geospatial metadata/

Succeeded: s/topic: Variable Descriptions/

No scribenick or scribe found. Guessed: pchampin

Maybe present: pwin

All speakers: Achim, albertmeronyo, AndreaPerego, Asif, db, fatma, Franck, pchampin, peter, pwin, RiccardoAlbertoni

Active on IRC: AndreaPerego, Asif, db, Franck, pchampin, pwin, RiccardoAlbertoni