RDF-star Semantics task force

Meeting minutes

discussion on Enrico's examples in https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Feb/0061.html

olaf: I find the last example (Enrico's birth) puzzling

Souri: representing an N-ary relationship: either as a vertex with N properties, or as an edge with N-2 properties
… but you need to choose which 2 of the properties are represented by the source and destination of the edge
… this is a matter of perspective
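
A minimal sketch of the two representations Souri describes, using plain Python dictionaries rather than any real graph API. The helper names `as_vertex` and `as_edge`, the role names and the values are all hypothetical, chosen only for illustration.

```python
def as_vertex(relation):
    """Model the relationship as a vertex carrying all N roles as properties."""
    return {"kind": "vertex", "properties": dict(relation)}


def as_edge(relation, source_role, target_role):
    """Model the relationship as an edge between two chosen roles.

    The remaining N-2 roles become edge properties.
    """
    properties = {
        role: value
        for role, value in relation.items()
        if role not in (source_role, target_role)
    }
    return {
        "kind": "edge",
        "source": relation[source_role],
        "target": relation[target_role],
        "properties": properties,
    }


# Hypothetical 4-ary relationship; role names and values are illustrative only.
wedding = {"spouse1": "alice", "spouse2": "bob", "place": "paris", "officiant": "carol"}

vertex = as_vertex(wedding)
couple_edge = as_edge(wedding, "spouse1", "spouse2")
place_edge = as_edge(wedding, "spouse1", "place")

print(len(vertex["properties"]))       # 4 properties on the vertex
print(len(couple_edge["properties"]))  # 2 properties on the edge (N-2)
print(place_edge["target"])            # paris
```

The same data yields different edges depending on which two roles are picked as source and destination, which is the "matter of perspective" mentioned above.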

TallTed: RDF is different from the relational model; it is schema last
… you add properties about a node; you don't need to know in advance what properties you need.

ora: we need to think not only of the data, but also of the queries.
… A model of the academic world may say "a thesis has an author and an advisor",

<TallTed> +1 yes, this comes to the fore when articulating a query ... which is *usually* the largest justification for collecting the data in the first place. Sometimes (often) you don't know what question will be the most important/revealing in the end, but you do know you'll be wanting to question.

ora: but I may build a query to get the author's advisor -- I don't care about the thesis.

Souri: data evolves, that's why perspectives are important.
… It is ok to first model a wedding with a binary relation.
… An N-ary relationship could be seen as N*(N-1) binary relations, depending on the focus of the person querying.
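
The N*(N-1) count can be checked with a short sketch: every ordered pair of distinct roles gives one binary view. The function `binary_views` and the sample roles are assumptions made for illustration, not something proposed in the meeting.

```python
from itertools import permutations


def binary_views(relation):
    """Return every ordered (role_a, role_b) projection of an n-ary relation.

    `relation` maps role names to values; each ordered pair of distinct
    roles yields one binary view, so N roles give N*(N-1) views.
    """
    return {
        (a, b): (relation[a], relation[b])
        for a, b in permutations(relation, 2)
    }


# Hypothetical 3-ary relation; names and values are illustrative only.
ceremony = {"spouse1": "alice", "spouse2": "bob", "place": "paris"}
views = binary_views(ceremony)

print(len(views))                   # 3 * 2 = 6
print(views[("spouse1", "place")])  # ('alice', 'paris')
```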

pchampin: My question about this perspective: I would have considered this an inferencing-related problem
… and, then, do I need RDF-star for that?
… the annotation can be seen as provenance
… which may be interesting to keep
… but sometimes I want to infer a binary view of an n-ary relation

pchampin: inferring the binary relations from the N-ary relation is useful, but do we need RDF-star / rdf:reifies for that?

ora: some people claim we need hypergraphs. Maybe we should keep that in mind.

tl: notion of shortcut relation in the cultural domain.
… The actual triples are optimized in the perspective of the use-case.
… Getting a triple, you can find back the N-ary relation this triple came from.

<Zakim> pfps, you wanted to mention expressive power

pfps: at high energy levels, everything looks the same, but that's not where we live.
… RDF is a low energy level environment, where many things (e.g. rules) don't exist.

<niklasl> +1 - choice is key (a predicate is a chosen (low-level) simplification)

<ora> +1 to what pfps said

Souri: true, we have to make choices. Rules may come from outside.

<pfps> in the low-expressivity RDF environment one has to make representational choices, these choices are *not* equivalent, and users see artifacts induced by these choices

Souri: When the data evolves (2-ary to 3-ary relation), we need to ensure that old queries still work.

<pfps> in RDF 1.1, the (sole) way of representing n-ary relationships is reification - creating a node for the relationship and separate links for each of the n parts of the relationship. This is a simple, flexible, and powerful mechanism, but can be cumbersome.

<tl> to counter pfps: we are here because people find the degree of simplification that RDF imposes on them too much

<pfps> In labelled property graphs, there are two ways (but both are flawed). You can do the RDF 1.1 way, or take one of the relationships as primary, use it as an edge and add the other n-1 as attributes (or whatever they are called there).

<TallTed> How much existing RDF has been analyzed to come to this conclusion of "arity of (close to) 2"? There's a *LOT* of RDF data out there in the wild, 20 years into RDF...

niklasl: I think that many people understand the n-ary relation case

<pfps> The notion of arity is entirely dependent on representational choices, at least in low-energy environments. One clear benefit of triples (i.e., binary relationships) is that they are adequate to encode all relationships, precisely because of reifications.

niklasl: should we put that use case in the front?
… maybe not
… RDF-star can be used for many other things as well
… It is like adding post-its

<pfps> So, in some very strong sense, *everything* is binary. One of the clear problems with relational data is its inability to correctly represent unknown values in n-ary relationships. Reification does not have this problem, but is forbidden in most normal forms for relational data bases.

<niklasl> https://hackmd.io/@niklasl/HJ3IudCdp

ora: about what pfps wrote in the chat:
… in PGs you can only add scalar properties on edges, so you can't model hypergraphs.
… In some sense, what we are adding to RDF is more expressive than PGs.
… And in some sense, it is simpler (unified).

<pfps> Certainly RDF-n is more expressive than labelled property graphs because it does not have the limitation that edge decorations cannot point to other nodes.

Souri: I agree that our proposal is more expressive than PG.
… In PG, when you need to change edges into vertices, you are ruining all existing queries.

pfps: yes, but the solutions only address a small part of the changes that one may need to make.
… A large class of changes will require the queries to change.
… If you decide that the most important thing about a marriage is not the two participants but the place where it happened, you need to rewrite your queries.

Souri: if you decide to add additional information to your existing data, *pre-existing* queries should not be forced to change.

<pfps> My point is that only *some* pre-existing queries remain the same when a previously binary relationship is changed to quasi-n-ary. So any argument that *some* queries do not need to be changed has to come with an argument that the other queries do not matter (much).

niklasl: shorthand properties: I've used them by (ab)using OWL property chain axioms.
… but you can only infer the shorthand from the whole path, not the other way around.
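
A rough sketch of the one-directional inference niklasl mentions, in the spirit of an OWL property chain axiom but implemented here as a plain Python loop over tuples. The function `apply_chain` and the property names (`authorOf`, `advisedBy`, `hasAdvisor`) are hypothetical; they loosely echo ora's thesis example.

```python
def apply_chain(triples, chain, shorthand):
    """Infer (s, shorthand, o) whenever s -chain[0]-> x -chain[1]-> o holds."""
    first, second = chain
    inferred = set()
    for s, p, x in triples:
        if p != first:
            continue
        for x2, q, o in triples:
            if q == second and x2 == x:
                inferred.add((s, shorthand, o))
    return inferred


# Hypothetical data: the full path goes through the thesis.
full_path = {
    ("alice", "authorOf", "thesis1"),
    ("thesis1", "advisedBy", "bob"),
}
shortcut = apply_chain(full_path, ("authorOf", "advisedBy"), "hasAdvisor")
print(shortcut)  # {('alice', 'hasAdvisor', 'bob')}

# Starting from the shorthand alone, the chain has nothing to match,
# so the intermediate thesis node cannot be recovered.
print(apply_chain(shortcut, ("authorOf", "advisedBy"), "hasAdvisor"))  # set()
```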

<pfps> There is expense and then there is expense. OWL (often!) requires more computational support but (often?) requires less human support. Balancing these two expenses is a knowledge engineering decision, unfortunately influenced by non-engineering environmental concerns.

niklasl: [example of "contributor" relation, where the exact role is not always known first, then discovered later]
… I hope this use-case can be handled by using RDF-star annotations.

ora: pfps, can you explain the issue you raised with marriages?

pfps: assume you first represented marriages as (groom -married-> bride), which you later annotated with location, date, etc.
… but then you realize that you got it wrong, and that they should be represented (groom -marriedIn-> location), annotated with the bride, date, etc.
… *Then* you need to rewrite your queries.
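
The difference between annotating a triple and re-modelling it can be sketched as follows. This is only an illustration under simplifying assumptions: each asserted triple is paired with a dictionary of annotations, the helper `pairs` stands in for a pre-existing query, and all names and values are hypothetical.

```python
def pairs(data, predicate):
    """Return (subject, object) for every asserted triple using `predicate`."""
    return [(s, o) for (s, p, o), _annotations in data if p == predicate]


# v0: plain binary relation, no annotations yet.
v0 = [(("groom", "married", "bride"), {})]

# v1: the same triple, later annotated with extra information.
v1 = [(("groom", "married", "bride"), {"location": "someCity"})]

# v2: re-modelled around the location; the bride becomes an annotation.
v2 = [(("groom", "marriedIn", "someCity"), {"spouse": "bride"})]

print(pairs(v0, "married"))  # [('groom', 'bride')]
print(pairs(v1, "married"))  # [('groom', 'bride')]  annotation did not break the query
print(pairs(v2, "married"))  # []                    re-modelling did
```

Going from v0 to v1 matches Souri's point that added information should leave pre-existing queries alone; going from v1 to v2 is the case pfps describes, where the query has to be rewritten.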

ora: I think everyone expects that they need to change their queries if they change their data model.

tl: there is no way to completely avoid remodeling forever, but RDF-star gives more leeway
… Yes, RDF is simple, but it is too simple.

<pfps> RDF does *not* require the use of blank nodes for reification.

tl: The popularity of PGs comes from the trade-off they made between simplicity and expressivity.

Souri: with relational data, people will never change the schema from the ground up,
… but they would add new columns

<pfps> Changes to an RDB schema do actually change the results of some queries! Consider SELECT * in SQL. Please don't make RDBs better than they actually are.

Souri: The popularity of PGs comes from edge properties. It makes extending the data easier.
… This is the main reason. There are others (complexity of IRIs, scalar values vs. literals).

niklasl: I agree with tl. Shorthand properties and triple annotations are @@1.
… With RDF-star annotations / marginalia, you can extend your model for a while.
… [discussion about marriages ending, needing to remove the asserted triple]

<Souri> SELECT * is convenient for development but heavily discouraged in production code. :-)

tl: in PGs, there are two levels (properties and relations), while RDF only has triples. RDF is less readable.
… Properties and relations are different primitives when you model your domain.
… We need to explain how to use RDF-star to model PGs.

<pfps> I find LPGs much less readable. Maybe this is simply familiarity, but I find the distinction between properties and relations jarring each time I see it. Of course, RDF 1.2 would have this same problem.

<TallTed> ack

Souri: [discussion about the complexity of SPARQL vs. PG query languages]
… We need to think about it when extending SPARQL.

gkellogg: what do we want to accomplish?
… I don't think that RDF-star is a competitor of PGs, but it helps modelling PGs in RDF.
… The property/relation dichotomy in PGs is confusing for some people (especially with an RDF background):
… how do you decide which one to use?

enrico: recent approaches in conceptual modelling do not distinguish between attributes and entities,
… precisely because it causes problems with data integration.
… Object Role Modeling is now largely used.

pchampin: [asks enrico about the "birth" example where 'location' and 'date' information is represented redundantly]

enrico: related to databases in the 6th normal form with primary key
… the primary key is your reification, and everything else is represented with binary relationship.
… The problem is, in some cases, you don't have an obvious primary key.
… Some databases will focus on who is married to whom, others will focus on who was married where.

Souri: this is similar to what I said earlier. Consider birth as a 3-ary relationship (person, location, date)
… Some people will be interested in the born-in relation, others will be interested in born-on.
… We need a kind of views.

enrico: I will not write things like that. But if someone gives me the triple << :b1 | :enrico :born-in :rome >> :on-date 1962 ,
… and someone sends me << :enrico :born-on 1964 >> :location :rome ,
… I want to make it clear that they are the same reification.

tl: we have to make a decision; do we forbid a Reification to be linked to several triples?

AndyS: I see RDF as a toolbox. I don't think it is for us to say "use it this way or that way".
… We need to provide the basics.

tl: AndyS, if rdf:reifies is many-to-many, doesn't it raise the same issue as the one you were worried about with rdf:subject, rdf:predicate, rdf:object?

AndyS: we want an example that we are all happy with, but that's different from forbidding other things.

<niklasl> :b1 a ex:Birth .

<niklasl> :b1 a :Birth ; :on-date 1962 ; :location :rome .

<niklasl> :b1 rdf:reifies <<( :enrico :born-in :rome )>>, <<(:enrico :born-on 1962)>> .

<niklasl> :t1 a :Fact ; rdf:reifies <<( :enrico :born-in :rome )>> .

<niklasl> :Fact rdfs:subClassOf [ a owl:Restriction ; owl:onProperty rdf:reifies ; owl:cardinality 1 ] .

niklasl: I think we don't need to make a choice. This depends on use-cases.
… In the examples above, :Birth can reify several triples, but :Fact would allow only one.
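
The intent of niklasl's example can be approximated with a small closed-world check: count the distinct triples each reifier points to, and flag any :Fact that does not reify exactly one. Note the assumption: this is a validation-style sketch in plain Python, not OWL reasoning, which under the open-world assumption would draw inferences from the cardinality restriction rather than report a violation. The function `fact_violations` and the tuple encoding are hypothetical.

```python
from collections import defaultdict

# (reifier, reified triple) pairs mirroring the Turtle lines above.
reifies = [
    ("b1", ("enrico", "born-in", "rome")),
    ("b1", ("enrico", "born-on", "1962")),
    ("t1", ("enrico", "born-in", "rome")),
]
types = {"b1": "Birth", "t1": "Fact"}


def fact_violations(reifies, types):
    """Return the reifiers typed Fact that do not reify exactly one triple."""
    reified = defaultdict(set)
    for reifier, triple in reifies:
        reified[reifier].add(triple)
    return sorted(
        reifier
        for reifier, type_name in types.items()
        if type_name == "Fact" and len(reified[reifier]) != 1
    )


print(fact_violations(reifies, types))  # []

# Adding a second triple to t1 makes it violate the Fact constraint,
# while b1 (a Birth) is free to reify several triples.
extended = reifies + [("t1", ("enrico", "born-on", "1962"))]
print(fact_violations(extended, types))  # ['t1']
```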

olaf: another question to Enrico; you use the same predicate :born-in, once with the person as the subject, once with the reification as the subject
… for me that should be two different properties.

– DRAFT –
01 March 2024