14:01:37 RRSAgent has joined #rch
14:01:41 logging to https://www.w3.org/2023/09/27-rch-irc
14:01:42 Zakim has joined #rch
14:01:51 meeting: RCH bi-weekly meeting
14:01:51 dlongley has joined #rch
14:01:55 agenda: https://www.w3.org/events/meetings/384826af-9108-48e2-a229-bd169f70995d/20230927T100000/
14:01:55 clear agenda
14:01:55 agenda+ Scribe (most recent first) Manu, Markus, DLongley, Seabass, kazue, PhilA, Gregg, pchampin, Ahmad, TallTed
14:01:55 agenda+ All minutes online available via https://www.w3.org/services/meeting-minutes?channel=rch&num=200
14:01:55 agenda+ -> Issue 176 https://github.com/w3c/rdf-canon/issues/176 - hash parameterization
14:01:57 agenda+ Next steps to CR
14:01:59 agenda+ Explainer update
14:02:00 present+
14:02:02 chair: markus_sabadello
14:02:06 scribe+
14:02:08 present+
14:02:10 present+
14:02:14 present+
14:02:16 present+
14:02:47 yamdan has joined #rch
14:03:23 present+
14:03:26 topic: Hash parameterization
14:03:31 https://github.com/w3c/rdf-canon/issues/176
14:03:38 present+
14:03:54 https://www.w3.org/2023/09/11-rch-minutes.html#r01
14:04:01 markus_sabadello: During TPAC we resolved that implementations must support a parameter to define which hash function is used
14:04:08 ... everyone seemed happy
14:04:40 ... but Seabass raised an issue (via email) pointing out that some aspects may not be sufficiently covered
14:05:22 ... concerns around interop and security as choice of hash function can be controlled by a param. Comments from PA, Dave Longley and Ivan
14:05:32 ... Question: what do we do with that now?
14:05:38 q+
14:05:46 q+
14:06:27 gkellogg: When we're discussing hashing, we're expecting the same function is used internally as well as for the result. But we don't say that.
14:06:35 ... Not sure why we expect that they would be the same
14:06:51 TallTed has joined #rch
14:07:14 ... What is the real purpose of needing to be able to change the hash algorithm within the algorithm, since nothing is exposed and it has features to avoid collisions?
14:07:39 ... What is the need to parameterize the *internal* hash function?
14:08:13 gkellogg: It's outside the text to print out the internal hashes used
14:08:15 ack dlongley
14:08:41 dlongley: I agree with what Gregg just mentioned. Specifying how to express the hash info we've decided is outside the scope of our spec.
14:08:55 present+ TallTed
14:09:39 ... There are a number of external meta methods for expressing hash methods. Those are responsible for talking about which internal steps may be needed. I don't think it's our responsibility to create a new metadata field
14:09:46 ... Multihash exists, for example
14:09:58 ... There's a ?? spec that does something similar
14:10:05 ... There's an RFC for naming things with hashes
14:10:16 ... There are IANA registries for this sort of thing.
14:11:00 ... Good to say non-normatively in our spec: we've said there's a default hash for the internal piece. Could say: don't change this unless you have a good reason to, and maybe document it.
14:11:43 q+ to provide some proposals -- can we keep this "implementation defined" but c14n "has to return hash identifier"
14:12:13 ... To answer Gregg's question - the fact that we call out and use a hash in the algo. Someone may say "you need to use hash function X as a regulation" so I don't think we need to change what we've done, but we need to be able to say how it can be done.
14:12:30 q?
14:12:39 ack me
14:12:48 ack manu
14:12:48 manu, you wanted to provide some proposals -- can we keep this "implementation defined" but c14n "has to return hash identifier"
14:13:20 manu: What we're saying ... why are we even considering this. We have had conversations with some individuals who would object if we didn't allow this kind of flexibility.
14:13:35 ... whether we agree or not, putting some text in the spec mitigates that risk
14:14:21 ... The concrete thing that we could do is to say in the algo, when you canonicalize, right now we output the quads, we could also output the internal hashing algo that was used and we can define maybe 2 function names used (referring to SRI spec).
14:14:36 manu: I think what Ivan wants to do goes a little too far.
14:15:19 ... Problem is that there are 2 things we're trying to express. I don't think what Ivan is proposing expresses the internal hash function used and that's what I think seabass is concerned about
14:15:29 q+
14:15:36 ... So maybe we can define that as one of the output pieces
14:15:40 -1 to returning a value, it's unnecessary, it's an input
14:15:53 +1 to you can encode it however you want, that's not our spec's job
14:15:55 manu: You can encode that however you want
14:16:03 q+
14:16:07 +1 to what dlongley said, it makes it more complicated and is an invariant from the caller's context.
14:16:10 q+ to -1 the concrete proposal to "provide it as an output"
14:16:27 ... Concrete proposal is to allow the hash function to be changed by providing input and you get that same value as part of the output. Implementation specific how that's done
14:16:28 ack gkellogg
14:17:02 gkellogg: I think Dave and I have similar thinking. If the caller is providing the hash function to use, I don't then need it to tell me what hash function I used
14:17:20 That's true, gkellogg -- I retract my concrete proposal to provide the internal hash function as an output.
14:17:24 q-
14:17:56 ... There might be regulatory reason for disallowing use of specific algos. We could use MD5 internally, it really doesn't matter, but if you think it does then, OK.
14:17:58 +1 to gregg's comments generally
14:18:22 ... We already have two things you can get. The blank node map or the C14N representation. We're talking about adding a third thing
14:18:44 I'd be fine w/ explanatory text... saying that how to serialize the hash is implementation specific.
14:19:10 q+ to say just having a hash is always insufficient
14:19:30 markus_sabadello: Since the param is in the input it doesn't seem necessary to have it as an output as well. I think seabass is concerned with not knowing what to do if you just have the hash. I don't think it's our job to define a new metadata mechanism.
14:20:11 ... Some extra text could say that the hash function used in the input is going to be important for uses of the output so it should be preserved or clear from the context or whatever.
14:20:30 q?
14:20:33 ... Some sort of guidance seems worth adding.
14:20:36 ack markus_sabadello
14:20:39 ack dlongley
14:20:39 dlongley, you wanted to say just having a hash is always insufficient
14:20:50 dlongley: You're never going to be able to regenerate a hash if you don't know all the inputs
14:20:59 dlongley: That's true in any system of course.
14:21:33 maybe "The hash function that was used SHOULD be available as an output, e.g., with a +debug flag."?
14:21:45 q+
14:21:46 ... I don't think there's anything normative we need to add. But some informative text could highlight the need for any function to have all its inputs
14:22:01 ack dlehn
14:22:14 dlehn: It seems like a communication issue for how you name what you're doing.
14:22:43 TallTed, no, we don't need that, because you know which hash function is used when you called the function... and this notion that you have only a hash is misguided, that is always insufficient.
14:22:48 -1 to invent new names for every possible hash function in our spec
14:22:56 ^ yes, to that.
14:23:08 ... I made a comment in the original PR, when you're naming... it seems like there's a discussion here about the hash you use in the output on the canonicalized quads and seems beyond the output of the spec.
14:23:27 q+
14:23:44 q+ to mention IANA registry for hash alg identifier https://www.iana.org/assignments/named-information/named-information.xhtml
14:23:46 dlehn: Not sure I'm really understanding the problem.
14:23:53 markus_sabadello: I think you are.
14:24:02 -1 to invent just N-many names in our spec that include hashes in the names as that is too restrictive, but +1 to have non-normative text that says meta data will need to identify input parameters to enable reproduction
14:24:10 dlehn: It's about how to communicate that between systems.
14:24:52 q?
14:24:54 markus_sabadello: Yes and seabass seems to think the way to do that is to only allow one algorithm to be used. But the TAG review said that for future proofing, we need a way to parameterize it.
14:25:01 ack gkellogg
14:25:31 gkellogg: If the principal output is an n-quads doc that is in canonical form with blank node IDs in canonical order
14:26:12 q?
14:26:12 ... If they are taken out of the context where the original function was called, there's no way to add comments to an n-quads doc, for example. I don't think we want a structure to include comments etc.
14:26:17 my earlier comment was here: https://github.com/w3c/rdf-canon/pull/161#issuecomment-1700273717. was wondering if the alg naming needed to include the hash name.
14:26:50 q+ to follow on w/ what gkellogg was saying wrt. spec guidance.
14:27:01 gkellogg: A dataset using RDC and using a non-default hashing function must not allow those results to be used in a way that is separated from the original function.
14:27:10 q?
14:27:17 ack yamdan
14:27:17 yamdan, you wanted to mention IANA registry for hash alg identifier https://www.iana.org/assignments/named-information/named-information.xhtml
14:27:18 i don't think there is any normative language we can put here that's reasonably testable, it's just strongly worded advice we can do.
14:27:48 +1 to what dlongley said (again)
14:27:53 yamdan: I originally thought this is just a naming problem. I think we don't need to invent a new ID for each hash algorithm - we already have a registry
14:28:06 ... we can just pick up an ID from this registry
14:28:40 ... and can just combine it with our name. Like RDFC1.0-SHA256 etc.
14:29:14 yamdan: We always mention H-mac SHA 256 etc.
14:30:04 yamdan: But I may have missed seabass's original intent.
14:30:13 q+
14:30:35 ack manu
14:30:35 manu, you wanted to follow on w/ what gkellogg was saying wrt. spec guidance.
14:30:42 markus_sabadello: Everyone agrees we don't want to invent new hash IDs. Concatenating RDFC1.0 with the hash name is an interesting idea.
14:30:57 manu: -1 to that. This feels like a slippery slope
14:31:53 ... In the data integrity specs, we have tried very hard to stay away from parameterization. We keep it simple. In the algo we say you must call RDFC with *this* hash function. I don't think this is a big deal
14:32:18 ... When you get the result back, you know the hash function used because you provided it. What you do is important but it's outside the spec.
14:32:40 ... This is not a problem in the data integrity specs.
14:33:13 ... So I agree with what Gregg was saying. It feels external to the spec.
14:33:42 ... So providing some guidance that it's right to convey the internal hash used.
14:34:05 manu: Rattles off lost of hash functions
14:34:12 s/lost/list/
14:34:27 manu: It's up to implementations to convey what they've done
14:34:46 markus_sabadello: That seems in line with the idea that the param is important and is preserved in other layers of the application.
14:34:48 IMO, a summary:
14:34:48 1. It would be simpler to not parameterize the hash algorithm.
14:34:48 2. However, we can't do that without creating problems for people who
14:34:48 need to comply with regulations and for future proofing.
14:34:48 q?
14:34:49 3. It's not our job to define meta data expressions.
14:34:50 4. There's no testable normative text we can create here, but
14:34:51 more informative advice could be given to address concerns and we have lots of time to bikeshed that.
14:34:57 ack dlongley
14:36:06 +1
14:36:10 q+
14:36:12 q+ to get some proposals/resolutions down on what we DO NOT want to do?
14:36:33 markus_sabadello: We should summarize this in the GH issue and ask for his help in adding some language
14:36:50 markus_sabadello: I don't think we can make further progress without him present
14:36:51 i think we could indicate that CR is ready to go
14:36:52 ack phila
14:36:59 scribe+
14:38:08 +1 to Phil's comments
14:38:08 phila: I think we could help today by taking a resolution that we could agree that informative text is needed, but no normative text needs to be changed. Therefore, our previous resolution stands.
14:38:08 scribe-
14:38:08 q-
14:38:25 present+
14:38:56 markus_sabadello: Summarizes discussion so far for seabass
14:39:47 scribe+
14:40:30 q+
14:41:01 markus_sabadello: Presses seabass for an answer whether he's happy with the expected outcome
14:41:15 q-
14:41:35 seabass: First impressions: seems dlongley and I spoke last week and went through some of the emails on the list. Having this extra metadata at least solves a couple of the issues.
14:41:50 ... Avoids having to force-try every possible algo.
14:42:03 ... So it seems like an improvement.
14:42:28 ... If we're not going to limit it to one algorithm, how can we best ensure that people use SHA256 rather than using some other one?
14:42:34 q+
14:42:45 ack phila
14:42:51 +1 to non-normative text encouraging use of the default if possible and using as few hash algorithms as possible for interoperability purposes.
14:43:20 scribe+
14:44:31 scribe+
14:44:32 phila: We've got a default, if you don't say what to use, it will use SHA256. We can add informative text to say that you've got to hang onto the parameter if you do provide one and include that with whomever you communicate / share with. We can provide informative guidance that you need to make that information discoverable or available. We can work on informative text over the next few weeks without the pressure of timing and you can be a part
14:44:32 of that and we can send the normative text to CR as it stands now.
14:44:34 scribe-
14:44:51 seabass: ... Looking through the text...
14:45:06 https://www.w3.org/TR/rdf-canon/#dfn-hash-algorithm
14:45:28 gkellogg: This hasn't changed since before TPAC.
14:45:42 seabass: I missed that first TPAC meeting
14:46:08 q+
14:46:25 seabass: Can we make the default a recommendation, not a requirement?
14:46:53 manu: I think it harms things if we remove the default as the default. They'll use it unless there's a reason not to
14:47:19 manu: We said earlier today that the expression of a hash on its own is not enough. We want to add non-normative text to say you can't do that.
14:47:33 ack manu
14:47:40 ... You should never look at a hash output and not know which hash function was used. We want to give that advice to prevent what you have highlighted
14:48:08 ... If you see a hash and only a hash, you shouldn't presume you know which hash function was used
14:48:32 manu: What you suggested was to remove the default. I would be a strong -1 on that. Our test suite is built on that.
14:49:10 ... Also we discussed what the output should be. There was agreement that since you call the function with the hash parameter there's no need to have it in the output.
14:49:46 ... We want to provide strong guidance, albeit non-normative, if there is upstream software, they need to convey which hash function was used.
14:50:28 seabass: When I suggest removing the default, I mean make it mandatory that you say you used SHA256
14:51:18 ... Does that mean implementations should say they use SHA256, or that they use something else?
14:51:18 gkellogg: I'm not quite following you, sorry.
14:51:40 q+
14:51:58 ... This is a normative requirement of people implementing the algorithm. They implement with the default, but provide a mechanism for using an alternative. But it's up to the implementation to make it clear what they used.
14:52:10 we can't say *how* external metadata will be expressed, but we can have our informative text say that it should always be clear what hash algorithm was used internally
14:52:25 (and a "default" here in our spec is orthogonal to that)
14:53:31 q+
14:53:51 q-
14:54:10 q+ to say we're still discussing informative text to add to the spec
14:54:12 markus_sabadello: If an implementation uses SHA 256, do they have to use this parameter? No, because that's the default. So if you invoke without the param, SHA 256 will be used.
14:54:14 ack markus_sabadello
14:55:17 q-
14:55:30 q+ to ask for concrete text that seabass would want.
14:55:31 i think we're talking about informative text changes at this point
14:55:37 Should be RDFC-1.0 to identify the algorithm, though.
14:55:42 markus_sabadello: What if we summarise this in the GH issue.
14:55:45 and we could run Phil's proposal to move onto that
14:56:59 q-
14:57:44 draft proposal: While we continue to discuss Issue 176, there is consensus that there will not be a need for a change to the normative text discussed that the WG resolved to seek transitoion to CR revcently
14:58:07 +1
14:58:09 draft proposal: While we continue to discuss Issue 176, there is consensus that there will not be a need for a change to the normative text discussed that the WG resolved to seek transition to CR recently
14:58:19 I'd +1 that above ^
14:58:19 +1
14:58:28 Proposal: While we continue to discuss Issue 176, there is consensus that there will not be a need for a change to the normative text discussed that the WG resolved to seek transition to CR recently
14:58:31 +1
14:58:31 +1
14:58:34 +1
14:58:35 +1
14:58:41 +1
14:58:58 +1
14:59:02 RESOLVED: While we continue to discuss Issue 176, there is consensus that there will not be a need for a change to the normative text discussed that the WG resolved to seek transition to CR recently
15:00:01 RRSAgent, make logs public
15:00:11 zakim, end meeting
15:00:11 As of this point the attendees have been gkellogg, dlongley, phila, dlehn, markus_sabadello, manu, yamdan, TallTed, seabass
15:00:13 RRSAgent, please draft minutes
15:00:14 I have made the request to generate https://www.w3.org/2023/09/27-rch-minutes.html Zakim
15:00:22 I am happy to have been of service, phila; please remember to excuse RRSAgent. Goodbye
15:00:22 Zakim has left #rch
15:02:56 RRSAgent, please excuse us
15:02:56 I see no action items