Schemata Discussion - Follow up from TPAC23

Meeting minutes

Intro

ek: We can start
… welcome everyone
… session is called Schemata Follow Up from TPAC2023
… Jan is taking minutes
… will skip the introduction round
… due to large turnout, please introduce yourself before speaking
… meetings are under two policies: the antitrust and competition policy, which encourages competition; furthermore, we encourage a good work environment
… context is to share experience and find a place within W3C for discussion
… required background is some knowledge of SHACL and JSON-LD

Presentation

<kaz> Slides

ek: Maybe we have met before, we have hosted a session with Pierre-Antoine before at TPAC

<betehess> new

<marcelotto> new

<VladimirAlexiev_> new

<pebran> new

ek: if you haven't been there, please write "new" into the IRC
… however, I also prepared a brief intro
… there are a few new people here, I see

ek: We have slides from previous sessions, which you can look at

ek: As quick summary, you have different kinds of schema approaches, and if you have a specification that uses different concepts, it becomes hard to manage
… in WoT TD, for example, we have the spec document itself
… ontology documents
… SHACL shapes
… JSON Schema files
… type and class definitions in TypeScript
… tests and examples
… all need to be managed, updated and published
… we have some tooling, but we still have to do some manual work
… soon, we will also have a registry for Binding Documents, where authors will also face these issues
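
The synchronization burden listed above (spec, ontology, SHACL shapes, JSON Schema, TypeScript types) is what single-source tooling aims to reduce. A minimal Python sketch of the idea follows, assuming a hypothetical toy model format. This is not LinkML and not the WoT tooling; the `MODEL` structure, the `example.org` IRIs, and both function names are made up for illustration.

```python
import json

# Hypothetical single-source model: one class with typed properties.
# The names and IRIs are illustrative only and are not taken from the WoT TD.
MODEL = {
    "name": "Sensor",
    "base_iri": "https://example.org/vocab#",
    "properties": {
        "title": {"type": "string", "required": True},
        "reading": {"type": "number", "required": False},
    },
}

XSD = "http://www.w3.org/2001/XMLSchema#"
XSD_TYPES = {"string": XSD + "string", "number": XSD + "double"}


def to_json_schema(model):
    """Derive a JSON Schema (as a dict) from the toy model."""
    props = model["properties"]
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": model["name"],
        "type": "object",
        "properties": {k: {"type": v["type"]} for k, v in props.items()},
        "required": sorted(k for k, v in props.items() if v["required"]),
    }


def to_jsonld_context(model):
    """Derive a JSON-LD context that maps each key to an IRI and a datatype."""
    base = model["base_iri"]
    return {
        "@context": {
            k: {"@id": base + k, "@type": XSD_TYPES[v["type"]]}
            for k, v in model["properties"].items()
        }
    }


schema = to_json_schema(MODEL)
context = to_jsonld_context(MODEL)
print(json.dumps(schema, indent=2))
print(json.dumps(context, indent=2))
```

Because both artifacts are derived from the same model, a change to a property name only has to be made once; real tooling would also have to cover inheritance, versioning, and publication.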

ek: Previous presentations were given by Chris Mungall and @@@

<VladimirAlexiev_> see json-ld/yaml-ld#19 for more "polyglot modeling" approaches/frameworks

ek: The work so far included an analysis by the WoT WG
… concerning versioning, packaging, and serving resources
… we can discuss how to continue this in the last 10 minutes of this slot

ek: Mahda did most of the work for this presentation actually, but she is currently not available
… all resources are available on GitHub

ek: So far, we were creating a very complicated diagram summarizing the very complicated process that has to be done with every PR
… for example, the JSON Schema needs to be updated or rendering needs to be triggered
… we are not very proud of it, it is quite messy

ek: We have then been looking into alternative tooling to make our lives easier
… and collected metrics and other aspects for comparison in a table
… for example, the handling of different value representations, inheritance, or unknown object keys
… at the moment, we are in favor of using LinkML in the future, but we have not decided yet and this is not the topic of this session
… but we want to collect feedback from the tool authors themselves
… as Vladimir Alexiev has already done, thank you for that
… we want to update our requirements accordingly, to make sure that the process is transparent
… any questions so far regarding the analysis or the diagram?

No questions so far

ek: If you have any questions, then please join the IRC and write "q+"

ek: So we have all of these resources, but there is still a missing point
… so in the WoT WG, we have a repo with GitHub pages available
… after publishing, the W3C team contact adjusts the redirection
… in general, you can consider this uploading software to a web server
… and there is no standardized way to handle this

<VladimirAlexiev_> I've seen many communities that face the same problem: electrical CIM, traceability in trade, GS1 EPCIS in logistics, ACORD in insurance, IFC in AECO etc etc

ek: and this process is too slow for our release cycle

ek: Can we do better?
… we have a PR that tries to address this
… we need better tooling, could rely on package managers
… Klaus Hartke has done some work of using npm for this kind of thing

<VladimirAlexiev_> For the Traceability community, I asked them to consider LinkML: w3c-ccg/traceability-vocab#295

ek: in the JSON Schema world they are using custom registries (?)

ek: So this finishes the summary for now

<VladimirAlexiev_> More importantly, I wrote up some draft Requirements for such tooling: https://github.com/w3c-ccg/traceability-vocab/issues/296. This could complement the comparison table that WoT showed

ek: wanted to keep it brief
… in the TPAC 2023 discussion, there was the question of where the discussion should continue
… not necessarily needed to standardize something like LinkML
… but there needs to be some process or best practices in my opinion, if anyone has other thoughts, please make a comment
… any questions?

<Ege> https://docs.google.com/presentation/d/193OFcFaxD0GqrRuOggwZe5eorgL1C1Epe2cAYN3JEkk/edit?usp=sharing

kaz: A comment regarding logistics: Please paste the link of the slides into the IRC
… another comment: please list the important questions in the slides
… like tooling, versioning, and so on

ek: (Updates the slides)
… if there are any points, we can categorize them accordingly

<VladimirAlexiev_> .. communities: also AAS (industrial Digital Twins, i.e. Industry 4.0 / RAMI)

ek: in the IRC, there were some comments by Vladimir
… saying that others have been suffering from the same issues?

va: I have been seeing the same issues in other communities as well
… many communities want to use both JSON Schema and JSON-LD, and they need to be in sync
… in some cases, they want to borrow schemas from other people and mix and match them
… in some cases, the results are mixed, in others they are very bad
… as many ontologies come with their own baggage
… also difficult if you are relying on a schema in XML
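
The requirement that JSON Schema and JSON-LD stay in sync can at least be checked mechanically. A minimal Python sketch, assuming a flat JSON Schema with top-level `properties` and a JSON-LD context given as a single inline object; the function name `find_drift` and the sample data are hypothetical.

```python
def find_drift(json_schema, jsonld_context):
    """Return the keys that appear in only one of the two artifacts."""
    schema_keys = set(json_schema.get("properties", {}))
    # Keys starting with "@" are JSON-LD keywords, not terms.
    context_terms = {
        k for k in jsonld_context.get("@context", {}) if not k.startswith("@")
    }
    return {
        "only_in_schema": sorted(schema_keys - context_terms),
        "only_in_context": sorted(context_terms - schema_keys),
    }


# Illustrative artifacts that have drifted apart.
schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "unit": {"type": "string"}},
}
context = {
    "@context": {
        "@version": 1.1,
        "title": "https://example.org/vocab#title",
        "reading": "https://example.org/vocab#reading",
    }
}

drift = find_drift(schema, context)
print(drift)
```

Such a check only compares key names. It says nothing about whether the datatypes or the meaning of the terms agree, and it does not handle `@vocab`, remote contexts, or nested schemas.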

va: In particular the Trade Transparency group has been working on related topics
… find this very interesting to see how semantic technologies spread into these communities
… on the other hand, these people need some help
… need guidance how to use RDF properly
… even simple things, how to model triples with literals (?)
… in RDF, we have infinite precision, in the case of conventional JSON numbers, we don't
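
The precision point can be illustrated with a small Python sketch. It assumes the common case in which a JSON parser reads numbers as IEEE 754 doubles, whereas a literal typed as `xsd:decimal` in RDF keeps every digit; the sample value is made up.

```python
import json
from decimal import Decimal

# A value with more digits than a double can hold (illustrative only).
text = '{"value": 12345678901234567890.123456789}'

# Default parsing: the number becomes a Python float (IEEE 754 double).
as_float = json.loads(text)["value"]

# Parsing with Decimal keeps the digits exactly as written,
# which is closer to how an xsd:decimal literal behaves.
as_decimal = json.loads(text, parse_float=Decimal)["value"]

print(as_float)
print(as_decimal)
print(Decimal(as_float) == as_decimal)
```

One common workaround is to carry such values as strings in JSON and attach the datatype elsewhere, for example in a JSON-LD context.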

va: The importance of this discussion is very very high
… question how to get RDF into more communities
… that try to use it, but don't get it right yet
… question how W3C can help these communities

ek: Thank you for your comments, tried to update the presentation with the links from the minutes

<VladimirAlexiev_> The import is: how can the sem web community help other data communities "graduate" into linked data?

ek: could you elaborate on the Trade Transparency group?

va: Will do

ab: Quick introduction:
… work at Netflix on an ontology service
… this group is exactly discussing what we are trying to solve
… try to combine schemas with ontologies (?)
… we tried using SHACL, has been working okay so far
… issue is that people need to learn RDF and SHACL, which is difficult

<VladimirAlexiev_> Another example: AAS is a very important spec in Industrial IoT. But they have fundamental issues that go against the Web Architecture, eg https://github.com/admin-shell-io/aas-specs/issues/383. See admin-shell-io/aas-specs#384 for a list of issues

ab: we have a problem of discovery, how to discover ontologies?
… question how to combine schemas and ontologies
… very much interested in talking to people about standardization, which we would like to see at some point

ek: Thank you, you've mentioned that you get a lot of questions regarding SHACL, do you have problems with adoption?

ab: Problem is that SHACL is way too powerful
… difficult to map to a schema
… tried to create a subset of SHACL that is powerful enough to write meaningful ontologies which you can then project onto GraphQL for example
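
The idea of projecting a restricted SHACL subset onto GraphQL can be sketched in Python as follows. This is not the approach described by the speaker: the dict-based shape format, the `Movie` example, and the function `to_graphql` are hypothetical, and only the constraints corresponding to `sh:path`, `sh:datatype`, `sh:minCount`, and `sh:maxCount` are considered.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed mapping from XSD datatypes to GraphQL built-in scalars.
SCALARS = {
    XSD + "string": "String",
    XSD + "integer": "Int",
    XSD + "boolean": "Boolean",
    XSD + "double": "Float",
}

# Simplified stand-in for a SHACL node shape (illustrative only).
SHAPE = {
    "name": "Movie",
    "properties": [
        {"path": "title", "datatype": XSD + "string", "minCount": 1, "maxCount": 1},
        {"path": "runtimeMinutes", "datatype": XSD + "integer", "maxCount": 1},
        {"path": "genre", "datatype": XSD + "string"},
    ],
}


def to_graphql(shape):
    """Project the shape onto a GraphQL object type in SDL syntax."""
    lines = ["type " + shape["name"] + " {"]
    for prop in shape["properties"]:
        scalar = SCALARS[prop["datatype"]]
        if prop.get("maxCount") == 1:
            # Single-valued: non-null only if at least one value is required.
            gql_type = scalar + ("!" if prop.get("minCount", 0) >= 1 else "")
        else:
            # Otherwise project as a list of non-null values.
            gql_type = "[" + scalar + "!]"
        lines.append("  " + prop["path"] + ": " + gql_type)
    lines.append("}")
    return "\n".join(lines)


sdl = to_graphql(SHAPE)
print(sdl)
```

Even this small subset shows where meaning can be lost: GraphQL's `Int` is a signed 32-bit integer, while `xsd:integer` is unbounded, and cardinalities other than "at most one" collapse into a plain list.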

ek: I think I can relate a bit with that

<VladimirAlexiev_> Another example: the Allotrope community (lab equipment measurements). It is very active and does things right. They use JSON-LD in nice ways

ek: I think a main problem is that @@@
… they have trouble seeing the benefit of switching

ek: There has been some discussion on standardization or not
… in my point of view, there wasn't the need yet
… question: What should be standardized?
… some aspects from the table could be standardized, but the tools are already quite stable
… not sure about the benefits, other than that it would give more credibility

ab: I consider the main issue with this slide related to standardization
… how do we make sure that our projection to GraphQL, for example, is correct and can be injected into an ontology?

ek: From my point of view, it is mostly a tooling question, with LinkML you could go to the other representations

ab: Yeah, for a lot of people it would be about tooling, for us it would be about meaning

ek: Guaranteeing that there is no information loss?

ab: Yeah, and the information is represented correctly

va: I feel very much all of the questions that have been raised as we are facing similar issues
… I tried to create a community for canonical mapping between SHACL and ShEx
… SHACL is very useful to build UIs
… so the question is how to use the most common subset of SHACL. How to use it with GraphQL, translate to SPARQL, and use certain more complex joins
… very important how to transform without losing meaning
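
One way to make "transforming without losing meaning" testable is a round-trip check: project a constraint onto the target representation, recover what the target can express, and compare with the original. A minimal Python sketch, assuming a hypothetical cardinality-only constraint format and a GraphQL-like type string; all function names are made up.

```python
def project(prop):
    """Project a cardinality constraint onto a GraphQL-like type string."""
    if prop.get("maxCount") == 1:
        return "String!" if prop.get("minCount", 0) >= 1 else "String"
    return "[String!]"


def recover(type_string):
    """Rebuild the constraint that the projected type string can express."""
    if type_string == "String!":
        return {"minCount": 1, "maxCount": 1}
    if type_string == "String":
        return {"maxCount": 1}
    return {}


def is_lossless(prop):
    """True if projecting and recovering yields the original constraint."""
    return recover(project(prop)) == prop


print(is_lossless({"minCount": 1, "maxCount": 1}))  # True
print(is_lossless({"maxCount": 1}))                 # True
print(is_lossless({"minCount": 2, "maxCount": 5}))  # False
```

The third case fails because a plain list type cannot carry the bounds 2 and 5. A mapping specification would have to state whether such constraints are rejected, carried in annotations, or dropped.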

<betehess> +1 on being lossless. That's our main concern here at Netflix.

ek: Just one point
… you've mentioned that there are GraphQL implementations that use RDF?
… could you repeat that?

<VladimirAlexiev_> GraphQL implementations over RDF, and benchmarks (there's LinGBM and a couple smaller ones): https://www.zotero.org/groups/5393345/semantic_graphql

va: I have a Zotero library with resources regarding this topic, I will send you a link

ih: I have seen similar problems before
… I am currently participating in the Verifiable Credentials WG, there have been similar issues
… Pierre-Antoine mentioned similar issues, there are slight differences between different languages, making it hard to convert
… one thing we've seen was that SHACL and ShEx cannot model datasets

<VladimirAlexiev_> Holger Knublauch is convening a SHACL 1.2 CG that should address named-graphs

ih: typical problems, can't imagine what we will face with the introduction of RDF 1.2

ih: What I am critical of is the problems JSON-LD introduces
… as it is both sold as a serialization of RDF but also as plain JSON
… inherent problem of JSON-LD, not sure how to solve it
… one thing I've seen in communities was a misunderstanding of JSON-LD context files
… as context files can be seen as a glorified mapping file
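
The "mapping file" view of a context can be made concrete with a deliberately simplified Python sketch. It is not the JSON-LD expansion algorithm: it only renames keys to IRIs and, as in JSON-LD without `@vocab`, drops keys that have no mapping. The context, the document, and the function `expand_keys` are hypothetical.

```python
# Illustrative context: short JSON keys mapped to IRIs.
CONTEXT = {
    "title": "https://example.org/vocab#title",
    "reading": "https://example.org/vocab#reading",
}


def expand_keys(document, context):
    """Rename keys to IRIs; keys without a mapping are silently dropped."""
    return {context[k]: v for k, v in document.items() if k in context}


# A document that a schema would likely reject: "title" is a number,
# "reading" is a string, and "extra" is not a known key.
doc = {"title": 42, "reading": "not a number", "extra": True}

expanded = expand_keys(doc, CONTEXT)
print(expanded)
```

Nothing in this step complains about the wrong value types or the unknown key, which is why a context alone cannot replace a JSON Schema or a SHACL shape.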

<VladimirAlexiev_> "@context" is not an ontology, and it is not a schema: it's only a mapping from JSON to ontology terms

ih: we need to work on making these discrepancies disappear, but I am not really optimistic, as all of these schemata are not the same
… should not create yet another standard (in the XKCD sense)
… but we need to be aware of the discrepancies before we can go to the tools

ek: Thank you, very good points, the double nature of JSON-LD is exactly why we are facing these kinds of problems

ih: I have seen this in many communities before

ab: I agree so much with what Ivan just said
… we are facing the same problems at Netflix
… JSON-LD was not a problem so far, everyone is using Turtle and that is working as expected
… the lack of definition of issues has been an issue, question how to import an asset (?)
… one thing we've noticed is that you can always define a SHACL shape and achieve what you want to do, also with fundamental things
… want to publish something in that regard soon, SHACL is very good for this kind of thing in our experience

<McCool> sorry, ntd

ek: In my experience, there is some tooling to help us if we want to go down the route of using SHACL

Check-out

ek: Now we should fill out the check-out slides
… I think one consensus was that this is an annoying problem
… and is relevant for different communities (not only WoT)
… not sure about the next steps
… should we create a CG or mailing list?
… what can we do to work on this?

ih: What I felt is that we should try to list, gather, and categorize the problems that make these tools so difficult
… for example datasets, as Alexandre mentioned, or RDF 1.2 or literals
… we need to have a clear view of the problem space
… I have the impression that we should not jump into creating new tools, we should first understand the problem, step back

ek: Agree with that
… what should the next step be then?

va: I think we should begin with the UCR
… and then begin with SHACL vs ShEx
… creating mapping, then see what is missing
… similar with GraphQL, if W3C wants to create a CG working on a mapping then this could happen
… not going to solve all problems, as there will always be differences
… what do you do in case of a discrepancy? Maybe best practices are enough, not necessarily need to create a new specification
… the SHACL 1.2 CG is a good approach
… @@@
… so first focus on UCR and then create focus groups to start working on the individual problems

ek: Added creating the catalog to the slides

kaz: I basically agree with Ivan and Vladimir
… we should clarify requirements, what the problem is, then see what solution would fit

ek: I think there is a question of whose requirements these are

kaz: A better word might be expectations

ek: There is a question of ownership, not only the WoT WG is involved

<VladimirAlexiev_> yes: 1. Catalog the problems/features/questions/issues (UCR), 2. focus CGs to work out specific issues: a) SHEX-SHACL mapping, b) RDF-GraphQL best practices, c) maybe YAML/YAML-LD syntax for mixing schemata approaches (JSON Schema and JSONLD Context are first candidates)

kaz: As I mentioned at the beginning of this session, and similar to Ivan's comments, we should clarify and categorize problems
… and then see how they relate to requirements
… WoT WG should clarify its own requirements, then we can contact the others again

<VladimirAlexiev_> 4. Dissemination/proliferation into various communities. Because these are problems that affect widely different communities, it will not be easy to reach/evangelize to them

ek: Question is how the individual groups will form, as I am not part of the groups mentioned in the discussion
… if there is not going to be a new CG, then each group will first have to work on its own

<VladimirAlexiev_> 0. Catalog of tools/practices. I'd be ecstatic if a single tool (eg LinkML) can solve the problems, but I'm doubtful. So we can borrow from the "KGC" CG (who work on RDB/JSON/XML mapping tools, extending R2RML and RML): features from one tool are borrowed as requirements for another

ek: (adds an action item to the slides that each person notes the problems and should do dissemination)
… that concludes our session

<kaz_> [adjourned]

– DRAFT –

12 March 2024
