W3C

– DRAFT –
RCH Special Topic call on Issue 89

12 May 2023

Attendees

Present
dlehn, dlongley, gkellogg, ivan, phila, yamdan
Regrets
-
Chair
-
Scribe
phila

Meeting minutes

<gkellogg> w3c/rdf-canon#89

gkellogg: I see Issue 89 as addressing the need to support selective disclosure

dlongley: In a selective disclosure piece, the two agents won't have any info other than the selected quads and the mapping.

gkellogg: the verifier does not have the original dataset

dlongley: Yes. Imagine each quad is signed individually

gkellogg: I take a subset that might be required with those original canonical labels and I can calculate the hash for that subset

dlongley: All you need to send is the subset itself and the mapping for that subset

gkellogg: So the subset I'm sending them has the original labels

dlongley: Imagine that the thing you send them has unstable blank node labels
… but you send the mapping so they can generate the relevant subset labels
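
A minimal sketch of the flow being described, with hashes standing in for individual signatures and all names and IRIs below hypothetical: the verifier applies the supplied label mapping to the disclosed subset and rechecks each statement against the signed canonical hashes.

import hashlib

def relabel(quad, mapping):
    # Rewrite blank node labels in a statement using the supplied mapping.
    return tuple(mapping.get(term, term) for term in quad)

def hash_quad(quad):
    # Hash one canonical statement; stands in for an individual signature.
    return hashlib.sha256(" ".join(quad).encode("utf-8")).hexdigest()

# Issuer side: canonical statements, each "signed" (here, hashed) individually.
canonical_quads = [
    ("_:c14n0", "<http://example.org/p>", '"a"'),
    ("_:c14n1", "<http://example.org/p>", '"b"'),
]
signed = {hash_quad(q) for q in canonical_quads}

# Holder side: a disclosed subset with unstable labels, plus the mapping.
disclosed = [("_:b7", "<http://example.org/p>", '"a"')]
label_map = {"_:b7": "_:c14n0"}

# Verifier side: apply the mapping, rehash, and check each disclosed quad.
assert all(hash_quad(relabel(q, label_map)) in signed for q in disclosed)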

gkellogg: Blank nodes have no IDs; they only have them in the context of a concrete serialization
… Any process that creates a subset doesn't allow you to correlate the quads back to the original, because any labels ... there may be implementations, but formally there's no way to do that
… we're trying to create a way... after doing an operation such as a JSON-LD Frame or a SPARQL query that gives you a subset back, that allows you to associate each quad with one in the original

dlongley: Internally within the c14n algorithm, we have some steps where we talk about going through the quads in the input dataset and turning blank nodes into something
… so if those BN IDs don't actually exist...

gkellogg: IMO that's the wrong model. There are no IDs in the input
… only if we serialize those n-quads
… and that results in an ID

dlongley: But we're talking about the input before we get to n-quads
… something I put in Issue 89... one of the other options we have is to make the input ordered quads and say that the algorithm doesn't modify the order
… it requires a bit more work on the outside. But if the order is not modified in the output, they can do any external modifications they need to
… but I worry that when we describe this in another spec, or at review, I don't want someone to say it doesn't work. It does work, but I worry that the formal description isn't quite right

dlongley: regarding making sure BN labels are stable... whatever mechanism is used should be open to innovation. e.g. BNs are skolemized, then a framing operation is used
… then the blank nodes are stable, then we go back to RDF and run the c14n algorithm. We know the de-skolemized mapping, so we know where they map to
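
A minimal sketch of the skolemize/frame/deskolemize flow just described, over statements as plain tuples with a hypothetical Skolem authority: deskolemization restores the blank nodes and yields the mapping back to the original labels.

SKOLEM = "<https://example.org/.well-known/genid/"  # hypothetical authority

def skolemize(quads):
    # Replace each blank node label with a Skolem IRI embedding the label.
    return [tuple(SKOLEM + t[2:] + ">" if t.startswith("_:") else t
                  for t in q) for q in quads]

def deskolemize(quads):
    # Turn Skolem IRIs back into blank nodes, recording the mapping.
    mapping, out = {}, []
    for q in quads:
        terms = []
        for t in q:
            if t.startswith(SKOLEM):
                label = "_:" + t[len(SKOLEM):-1]
                mapping[t] = label
                terms.append(label)
            else:
                terms.append(t)
        out.append(tuple(terms))
    return out, mapping

quads = [("_:b0", "<http://example.org/p>", "_:b1")]
stable = skolemize(quads)               # no blank nodes: safe to frame/query
restored, mapping = deskolemize(stable)
assert restored == quads                # labels survive the round trip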

gkellogg: I'm still hung up on this. I think skolemization is the key. Then you frame; there are no BNs. Then you deskolemize and expect that you can pass that deskolemized set and hope you can work with those labels
… but that means parsing the deskolemized, but there's no way to know that the ... was available

dlongley: For implementations that don't have stable IDs, the deskolemizing produces blank nodes and the abstract dataset and the mapping of what the original labels were.
… [Couldn't keep up with all the words, sorry]

<gkellogg> w3c/rdf-canon#89 (comment)

gkellogg: That is more or less what I have in my comment

gkellogg: Once we're in the algo, we have a dataset; we can't talk about specific blank nodes
… it goes over each quad and maps blank nodes in each quad... we can talk about the label... if we include a deskolemization... each blank node can be correlated

ivan: I am trying to consider... one step back. The abstract RDF model talks about BNodes but doesn't talk about BN IDs
… every implementation I have seen has internal BN IDs
… would it be simpler for the algo if we define a minor extension of the RDF data model saying that each BN has a BN ID and we go from there, no need to skolemize back and forth
… the extension can be used to map back. Any practical implementation will work because the extension makes no change
… but our description is much simpler

gkellogg: The defn of a normalized dataset is one that has stable IDs for its BNs

ivan: Which extends the definition a little

gkellogg: Intellectually, to me, using skolem IDs in there comes across. Any parser now will be able to take in an RDF dataset described with skolem IDs and create IRI nodes that are those IDs
… Then if skolem IDs are turned back into blank nodes they can be turned into IDs

gkellogg: As long as we maintain the mapping we can operate on that

<Zakim> dlongley, you wanted to say can we accomplish the deskolemization somehow by doing it outside of the canonicalization algorithm though?

dlongley: I like what Ivan is saying and I wonder if there's a way to put these two things together and make it simpler to implement without skolemization at the c14n layer
… I wonder if we can do what Ivan is saying and say that one way you can do this is to use skolemization/deskolemization, if you need it

dlongley: This would make very little change to what we have today
… if your implementation doesn't support this already, you can use skolemization

gkellogg: The algo is described for an RDF dataset, not for some document that serializes a dataset
… even if it were... let's say the input is some serialization of an RDF dataset, we'd then have to describe how you parse that serialization to construct the dataset
… and retain the labels of the blank nodes
… nodes in lists would still be blank

dlongley: Is there a way at the beginning of the algorithm to say you could take an input that is a serialized dataset, but you should have the abstract dataset

dlongley: I worry that we're putting in a lot of spec text that some may think doesn't achieve a lot

ivan: When I did my implementation, the parsing and underlying environment I was using was creating a different BN label. But then I realised that there's an option to re-use whatever is in the serialized input

<dlongley> +1 to ivan

ivan: and I think that will be the general case. It's simpler, don't throw it out (paraphrase)

gkellogg: Let's say you have a subject with two values in a list. There are BNs with each of those. Formally, every time I parse I get different blank nodes. I have no labels and no order
… the order can be arbitrary and not repeatable

dlongley: That might be true for some syntaxes. May be some wiggle room... I'm looking for shortcuts to make things easier, but text that allows people who don't have that to still accomplish the same thing
… externally, you can build your own skolemization process

<ivan> +1 to gkellogg

gkellogg: If we said that a serialized input is an n-quads doc, not an arbitrary serialization, we can describe a class of n-quads parser that records the IDs for all the blank nodes

gkellogg: That doesn't seem like a stretch. Doing it for an arbitrary format seems heavyweight
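
A minimal sketch of that class of parser, assuming whitespace-separated terms rather than the full N-Quads grammar: blank node labels are recorded exactly as they appear in the document.

def parse_nquads_keep_labels(doc):
    # Simplified n-quads parser that keeps blank node labels verbatim.
    # Assumes whitespace-separated terms and no spaces inside literals;
    # a real parser would tokenize per the full N-Quads grammar.
    quads, labels = [], set()
    for line in doc.strip().splitlines():
        terms = tuple(line.rstrip(" .").split(" "))
        quads.append(terms)
        labels.update(t for t in terms if t.startswith("_:"))
    return quads, labels

doc = """\
_:b0 <http://example.org/p> _:b1 .
_:b1 <http://example.org/q> "v" .
"""
quads, labels = parse_nquads_keep_labels(doc)
assert labels == {"_:b0", "_:b1"}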

dlongley: If we can say that there's an additional thing that you can pass in here... What's the minimum amount of language we have to put in here to keep it simple but still support complexity if people want it

gkellogg: As long as you're maintaining the fidelity of those labels

<dlongley> +1 to gregg, but that's solvable there with skolemization (externally)

gkellogg: I take a canonical doc... I run it through a SPARQL CONSTRUCT to create my subset which is a graph that can be serialized arbitrarily. I don't think we can say that the process must maintain the labels in the input
… Another way you might accomplish that is to go through the skolemization

<dlongley> +1 to gregg

gkellogg: That's external, but you can do it in a way that preserves the labels.

ivan: We modify the algo description a little to say we start with n-quads and we end with n-quads. That means you also need to provide the exact mapping as an optional output?

dlongley: Yes. We still want to output what we're outputting today (the abstract mapping) and the canonical mapping. If we do that, then people can put the two things together to achieve their goals
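
A minimal sketch of putting the two outputs together, with hypothetical mapping names: composing an application's own label mapping (e.g. from deskolemization) with the canonical mapping issued by the algorithm correlates each original blank node with its canonical label.

# Hypothetical names: the application's mapping from its original labels
# to the labels it handed to the c14n algorithm, and the canonical
# mapping issued by the algorithm for those input labels.
app_mapping = {"_:orig1": "_:b0", "_:orig2": "_:b1"}
canonical_mapping = {"_:b0": "_:c14n1", "_:b1": "_:c14n0"}

# Composing the two correlates each original blank node with its
# canonical label.
end_to_end = {o: canonical_mapping[m] for o, m in app_mapping.items()}
assert end_to_end == {"_:orig1": "_:c14n1", "_:orig2": "_:c14n0"}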

gkellogg: Take the normalized dataset and the map for the blank nodes. I thought the issuer problem might be important but it sounds as if it might not be

dlongley: That will solve the use case for me in the environment I'm in, so I'm all for that. I'm worried that if we don't have that additional abstract mapping for blank node IDs it might create problems in other environments

dlongley: I don't want to deny people the ability to work in the abstract [paraphrase]

ivan: I think the answer to your remark - that's why we have a wide review when we're ready. If no one comes up, well...

gkellogg: One use case for c14n is as an alternative to an isomorphism check. Rather than taking two graphs, you might c14n each and then compare them and see if they're the same. You might get a diff
… c14n doesn't quite get there but it's better than anything else we have.

<dlongley> +1 to what gregg just said ... we should accept either input

gkellogg: Imposing n-quads as input... it sounds like we might want to consider that the input can be either n-quads or a dataset, but if it's a dataset then all BNs are given arbitrary IDs

ivan: I have sympathy for allowing both formats. I have the impression that for the isomorphism use case, in practice there are no issues, because it has internal BN IDs
… What the BN IDs are is uninteresting

ivan: We're not creating a difficulty

<Zakim> dlongley, you wanted to clarify that i don't think we should require the input to be n-quads, it's just an optional input that would guarantee stable identifiers for certain use cases

<ivan> +1

dlongley: I want to agree with allowing either possible input. I don't want to say that blank nodes will be forcibly changed if you input an abstract dataset

gkellogg: We can say that in practice many implementations do maintain the IDs used in the input.
… Only create BN IDs for those that don't have them.

dlongley: That allows some implementations to cut some corners

gkellogg: Can I ask you, yamdan - you tried to come up with a comment that explains your understanding. Are we missing anything?

yamdan: I'm struggling to follow the conversation

yamdan: I was wondering whether ... we already assume some stability about BN IDs in the current c14n algo
… originally I didn't think we had to define an additional input for stability

gkellogg: Looking at step 2 of the algo, I think it needs some work...

gkellogg: There is a BN to quads map...
… that is initialized from the input dataset

gkellogg: There's a point at which we go through the input dataset and we initialize it with BN IDs that abstractly don't exist
… so we're talking about ensuring that there is a pre-step that has some IDs
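
A minimal sketch of such a pre-step, over statements as plain tuples: labels are invented only for blank nodes that lack them (marked None in this toy model, which treats each unlabeled occurrence as a distinct node), and the blank node to quads map is then built from the labeled input.

from collections import defaultdict
from itertools import count

def bnode_to_quads_map(quads):
    # Pre-step sketch: ensure every blank node carries a label, then
    # index each statement by the blank node labels it mentions.
    fresh = (f"_:b{i}" for i in count())
    labeled, bn_map = [], defaultdict(list)
    for quad in quads:
        quad = tuple(next(fresh) if t is None else t for t in quad)
        labeled.append(quad)
        for t in quad:
            if t.startswith("_:"):
                bn_map[t].append(quad)
    return labeled, dict(bn_map)

quads = [(None, "<http://example.org/p>", '"v"')]
labeled, bn_map = bnode_to_quads_map(quads)
assert list(bn_map) == ["_:b0"]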

dlongley: That sounds more useful - we're assuming there are already some BN IDs. If not - go make them

dlongley: I think this fills things in at the backend
… It means that people can continue either way

gkellogg: attempts to summarise...

gkellogg: Yes, I think we're saying that we're assuming that your external env will give you some IDs. But the algorithm says we'll keep IDs stable
… and we'll produce a mapping for you based on what you gave us.

dlongley: Framing does this

gkellogg: Is there some place where we want to describe how you might do this? Reqs for selective disclosure etc

dlongley: We will describe this in another doc in the VCWG

phila: Asks that the consensus be articulated in w3c/rdf-canon#89

gkellogg: Agrees to do it

dlongley: We can maybe make simple mention of selective disclosure without going into details

Minutes manually created (not a transcript), formatted by scribe.perl version 210 (Wed Jan 11 19:21:32 2023 UTC).

Diagnostics

No scribenick or scribe found. Guessed: phila

All speakers: dlongley, gkellogg, ivan, phila, yamdan

Active on IRC: dlehn, dlongley, gkellogg, ivan, phila, yamdan