14:01:37 <RRSAgent> RRSAgent has joined #rch
14:01:41 <RRSAgent> logging to https://www.w3.org/2023/09/27-rch-irc
14:01:42 <Zakim> Zakim has joined #rch
14:01:51 <phila> meeting: RCH bi-weekly meeting
14:01:51 <dlongley> dlongley has joined #rch
14:01:55 <phila> agenda: https://www.w3.org/events/meetings/384826af-9108-48e2-a229-bd169f70995d/20230927T100000/
14:01:55 <agendabot> clear agenda
14:01:55 <agendabot> agenda+ Scribe (most recent first) Manu, Markus, DLongley, Seabass, kazue, PhilA, Gregg, pchampin, Ahmad, TallTed
14:01:55 <agendabot> agenda+ All minutes online available via https://www.w3.org/services/meeting-minutes?channel=rch&num=200
14:01:55 <agendabot> agenda+ -> Issue 176 https://github.com/w3c/rdf-canon/issues/176 - hash parameterization
14:01:57 <agendabot> agenda+ Next steps to CR
14:01:59 <agendabot> agenda+ Explainer update
14:02:00 <gkellogg> present+
14:02:02 <phila> chair: markus_sabadello
14:02:06 <phila> scribe+
14:02:08 <dlongley> present+
14:02:10 <phila> present+
14:02:14 <dlehn1> present+
14:02:16 <markus_sabadello> present+
14:02:47 <yamdan> yamdan has joined #rch
14:03:23 <manu> present+
14:03:26 <phila> topic: Hash parameterization
14:03:31 <markus_sabadello> https://github.com/w3c/rdf-canon/issues/176
14:03:38 <yamdan> present+
14:03:54 <markus_sabadello> https://www.w3.org/2023/09/11-rch-minutes.html#r01
14:04:01 <phila> markus_sabadello: During TPAC we resolved that implementations must support a parameter to define which hash function is used
14:04:08 <phila> ... everyone seemed happy
14:04:40 <phila> ... but Seabass raised an issue (via email) pointing out that some aspects may not be sufficiently covered
14:05:22 <phila> ... concerns around interop and security as choice of hash function can be controlled by a param. Comments from PA, Dave Longley and Ivan
14:05:32 <phila> ... Questions: what do we do with that now?
14:05:38 <gkellogg> q+
14:05:46 <dlongley> q+
14:06:27 <phila> gkellogg: When we're discussing hashing, we're expecting the same function is used internally as well as for the result. But we don't say that.
14:06:35 <phila> ... Not sure why we expect that they would be the same
14:06:51 <TallTed> TallTed has joined #rch
14:07:14 <phila> ... What is the real purpose of needing to be able to change the hash algorithm within the algorithm since nothing is exposed and has features to avoid collisions
14:07:39 <phila> ... What is the need to parameterize the *internal* hash function?
14:08:13 <phila> gkellogg: It's outside the text to print out the internal hashes used
14:08:15 <markus_sabadello> ack dlongley
14:08:41 <phila> dlongley: I agree with what Gregg just mentioned. Specifying how to express the hash info we've decided is outside the scope of our spec.
14:08:55 <manu> present+ TallTed
14:09:39 <phila> ... There are a number of external meta methods for expressing hash methods. Those are responsible for talking about which internal steps may be needed. I don't think it's our responsibility to create a new metadata field
14:09:46 <phila> ... Multihash exists, for example
14:09:58 <phila> ... There's a ?? spec that does something similar
14:10:05 <phila> ... There's an RFC for naming things with hashes
14:10:16 <phila> ... There are IANA registries for this sort of thing.
14:11:00 <phila> ... Good to say non-normatively in our spec: we've said there's a default hash for the internal piece. Could say: don't change this unless you have a good reason to, and maybe document it.
14:11:43 <manu> q+ to provide some proposals -- can we keep this "implementation defined" but c14n "has to return hash identifier"
14:12:13 <phila> ... To answer Gregg's question - the fact that we call out and use a hash in the algo. Someone may say "you need to use hash function X as a regulation" so I don't think we need to change what we've done, but we need to be able to say how it can be done.
14:12:30 <markus_sabadello> q?
14:12:39 <gkellogg> ack me
14:12:48 <gkellogg> ack manu
14:12:48 <Zakim> manu, you wanted to provide some proposals -- can we keep this "implementation defined" but c14n "has to return hash identifier"
14:13:20 <phila> manu: What we're saying ... why are we even considering this. We have had conversations with some individuals who would object if we didn't allow this kind of flexibility.
14:13:35 <phila> ... whether we agree or not, putting some text in the spec mitigates that risk
14:14:21 <phila> ... The concrete thing that we could do is to say in the algo, when you canonicalize, right now we output the quads, we could also output the internal hashing algo that was used and we can define maybe 2 function names used (referring to SRI spec).
14:14:36 <phila> manu: I think what Ivan wants to do goes a little too far.
14:15:19 <phila> ... Problem is that there are 2 things we're trying to express. I don't think that what Ivan is proposing expresses the internal hash function used, and that's what I think seabass is concerned about
14:15:29 <gkellogg> q+
14:15:36 <phila> ... So maybe we can define that as one of the output pieces
14:15:40 <dlongley> -1 to returning a value, it's unnecessary, it's an input
14:15:53 <dlongley> +1 to you can encode it however you want, that's not our spec's job
14:15:55 <phila> manu: You can encode that however you want
14:16:03 <markus_sabadello> q+
14:16:07 <gkellogg> +1 to what dlongley said, it makes it more complicated and is an invariant from the callers context.
14:16:10 <dlongley> q+ to -1 the concrete proposal to "provide it as an output"
14:16:27 <phila> ... Concrete proposal is to allow the hash function to be changed by providing input and you get that same value as part of the output. Implementation specific how that's done
14:16:28 <markus_sabadello> ack gkellogg
14:17:02 <phila> gkellogg: I think Dave and I have similar thinking. If the caller is providing the hash function to use, I don't then need it to tell me what hash function I used
14:17:20 <manu> That's true, gkellogg -- I retract my concrete proposal to provide the internal hash function as an output.
14:17:24 <dlongley> q-
14:17:56 <phila> ... There might be regulatory reason for disallowing use of specific algos. We could use MD5 internally, it really doesn't matter, but if you think it does then, OK.
14:17:58 <dlongley> +1 to gregg's comments generally
14:18:22 <phila> ... We already have two things you can get. The blank node map or the C14N representation. We're talking about adding a third thing
14:18:44 <manu> I'd be fine w/ explanatory text... saying that how to serialize the hash is implementation specific.
14:19:10 <dlongley> q+ to say just having a hash is always insufficient
14:19:30 <phila> markus_sabadello: Since the param is in the input it doesn't seem necessary to have it as an output as well. I think seabass is concerned with not knowing what to do if you just have the hash. I don't think it's our job to define a new metadata mechanism.
14:20:11 <phila> ... Some extra text could say that the hash function used in the input is going to be important for uses of the output so it should be preserved or clear from the context or whatever.
14:20:30 <markus_sabadello> q?
14:20:33 <phila> ... Some sort of guidance seems worth adding.
14:20:36 <markus_sabadello> ack markus_sabadello
14:20:39 <markus_sabadello> ack dlongley
14:20:39 <Zakim> dlongley, you wanted to say just having a hash is always insufficient
14:20:50 <phila> dlongley: You're never going to be able to regenerate a hash if you don't know all the inputs
14:20:59 <phila> dlongley: That's true in any system of course.
14:21:33 <TallTed> maybe "The hash function that was used SHOULD be available as an output, e.g., with a +debug flag."?
14:21:45 <dlehn> q+
14:21:46 <phila> ... I don't think there's anything normative we need to add. But some informative text could highlight the need for any function to have all its inputs
14:22:01 <markus_sabadello> ack dlehn
14:22:14 <phila> dlehn: It seems like a communication issue for how you name what you're doing.
14:22:43 <manu> TallTed, no, we don't need that, because you know which hash function is used when you called the function... and this notion that you have only a hash is misguided, that is always insufficient.
14:22:48 <dlongley> -1 to invent new names for every possible hash function in our spec
14:22:56 <manu> ^ yes, to that.
14:23:08 <phila> ... I made a comment in the original PR, when you're naming... it seems like there's a discussion here about the hash you use in the output on the canonicalized quads and it seems beyond the output of the spec.
14:23:27 <gkellogg> q+
14:23:44 <yamdan> q+ to mention IANA registry for hash alg identifier https://www.iana.org/assignments/named-information/named-information.xhtml
14:23:46 <phila> dlehn: Not sure I'm really understanding the problem.
14:23:53 <phila> markus_sabadello: I think you are.
14:24:02 <dlongley> -1 to invent just N-many names in our spec that include hashes in the names as that is too restrictive, but +1 to have non-normative text that says meta data will need to identify input parameters to enable reproduction
14:24:10 <phila> dlehn: It's about how to communicate that between systems.
14:24:52 <markus_sabadello> q?
14:24:54 <phila> markus_sabadello: Yes and seabass seems to think the way to do that is to only allow one algorithm to be used. But the TAG review said that for future proofing, we need a way to parameterize it.
14:25:01 <markus_sabadello> ack gkellogg
14:25:31 <phila> gkellogg: If the principal output is an n-quads doc that is in canonical form with blank node IDs in canonical order
14:26:12 <dlongley> q?
14:26:12 <phila> ... If they are taken out of the context where the original function was called, there's no way to add comments to an n-quads doc, for example. I don't think we want a structure to include comments etc.
14:26:17 <dlehn> my earlier comment was here: https://github.com/w3c/rdf-canon/pull/161#issuecomment-1700273717.  was wondering if the alg naming needed to include the hash name.
14:26:50 <manu> q+ to follow on w/ what gkellogg was saying wrt. spec guidance.
14:27:01 <phila> gkellogg: A dataset using RDFC and using a non-default hashing function must not allow the results to be used in a way that is separated from the original function.
14:27:10 <phila> q?
14:27:17 <markus_sabadello> ack yamdan
14:27:17 <Zakim> yamdan, you wanted to mention IANA registry for hash alg identifier https://www.iana.org/assignments/named-information/named-information.xhtml
14:27:18 <dlongley> i don't think there is any normative language we can put here that's reasonably testable, it's just strongly worded advice we can do.
14:27:48 <gkellogg> +1 to what dlongley said (again)
14:27:53 <phila> yamdan: I originally thought this is just a naming problem. I think we don't need to invent a new ID for each hash algorithm - we already have a registry
14:28:06 <phila> ... we can just pick up an ID from this registry
14:28:40 <phila> ... and can just combine it with our name. Like RDFC1.0-SHA256 etc.
14:29:14 <phila> yamdan: We always mention H-mac SHA 256 etc.
14:30:04 <phila> yamdan: But I may have missed seabass's original intent.
14:30:13 <dlongley> q+
14:30:35 <markus_sabadello> ack manu
14:30:35 <Zakim> manu, you wanted to follow on w/ what gkellogg was saying wrt. spec guidance.
14:30:42 <phila> markus_sabadello: Everyone agrees we don't want to invent new hash IDs. Concatenating RDFC1.0 with the hash name is an interesting idea.
14:30:57 <phila> manu: -1 to that. This feels like a slippery slope
14:31:53 <phila> ... In the data integrity specs, we have tried very hard to stay away from parameterization. We keep it simple. In the algo we say you must call RDFC with *this* hash function. I don't think this is a big deal
14:32:18 <phila> ... When you get the result back, you know the hash function used because you provided it. What you do is important but it's outside the spec.
14:32:40 <phila> ... This is not a problem in the data integrity specs.
14:33:13 <phila> ... So I agree with what Gregg was saying. It feels external to the spec.
14:33:42 <phila> ... So providing some guidance that it's right to convey the internal hash used.
14:34:05 <phila> manu: Rattles off lost of hash functions
14:34:12 <phila> s/lost/list/
14:34:27 <phila> manu: It's up to implementations to convey what they've done
14:34:46 <phila> markus_sabadello: That seems in line with the idea that the param is important and is preserved in other layers of the application.
14:34:48 <dlongley> IMO, a summary:
14:34:48 <dlongley> 1. It would be simpler to not parameterize the hash algorithm.
14:34:48 <dlongley> 2. However, we can't do that without creating problems for people who
14:34:48 <dlongley>   need to comply with regulations and for future proofing.
14:34:48 <markus_sabadello> q?
14:34:49 <dlongley> 3. It's not our job to define meta data expressions.
14:34:50 <dlongley> 4. There's no testable normative text we can create here, but
14:34:51 <dlongley>   more informative advice could be given to address concerns and we have lots of time to bikeshed that.
14:34:57 <markus_sabadello> ack dlongley
14:36:06 <gkellogg> +1
14:36:10 <phila> q+
14:36:12 <manu> q+ to get some proposals/resolutions down on what we DO NOT want to do?
14:36:33 <phila> markus_sabadello: We should summarize this in the GH issue and ask for his help in adding some language
14:36:50 <phila> markus_sabadello: I don't think we can make further progress without him present
14:36:51 <dlongley> i think we could indicate that CR is ready to go
14:36:52 <markus_sabadello> ack phila
14:36:59 <gkellogg> scribe+
14:38:08 <dlongley> +1 to Phil's comments
14:38:08 <gkellogg> phila: I think we could help today by taking a resolution that we could agree that informative text is needed, but no normative text needs to be changed. Therefore, our previous resolution stands.
14:38:08 <gkellogg> scribe-
14:38:08 <manu> q-
14:38:25 <seabass> present+
14:38:56 <phila> markus_sabadello: Summarizes discussion so far for seabass
14:39:47 <phila> scribe+
14:40:30 <manu> q+
14:41:01 <phila> markus_sabadello: Presses seabass for an answer whether he's happy with the expected outcome
14:41:15 <manu> q-
14:41:35 <phila> seabass: First impressions: seems dlongley and I spoke last week and went through some of the emails on the list. Having this extra metadata at least solves a couple of the issues.
14:41:50 <phila> ... Avoids having to force-try every possible algo.
14:42:03 <phila> ... So it seems like an improvement.
14:42:28 <phila> ... If we're not going to limit it to one algorithm, how can we best ensure that people use SHA256 rather than using some other one?
14:42:34 <phila> q+
14:42:45 <markus_sabadello> ack phila
14:42:51 <dlongley> +1 to non-normative text encouraging use of the default if possible and using as few hash algorithms as possible for interoperability purposes.
14:43:20 <dlongley> scribe+
14:44:31 <phila> scribe+
14:44:32 <dlongley> phila: We've got a default, if you don't say what to use, it will use SHA256. We can add informative text to say that you've got to hang onto the parameter if you do provide one and include that with whomever you communicate / share with. We can provide informative guidance that you need to make that information discoverable or available. We can work on informative text over the next few weeks without the pressure of timing and you can be a part
14:44:32 <dlongley> of that and we can send the normative text to CR as it stands now.
14:44:34 <dlongley> scribe-
14:44:51 <phila> seabass: ... Looking through the text...
14:45:06 <gkellogg> https://www.w3.org/TR/rdf-canon/#dfn-hash-algorithm
14:45:28 <phila> gkellogg: This hasn't changed since before TPAC.
14:45:42 <phila> seabass: I missed that first TPAC meeting
14:46:08 <manu> q+
14:46:25 <phila> seabass: Can we make the default a recommendation, not a requirement?
14:46:53 <phila> manu: I think it harms things if we remove the default as the default. They'll use it unless there's a reason not to
14:47:19 <phila> manu: We said earlier today that the expression of a hash on its own is not enough. We want to add non-normative text to say you can't do that.
14:47:33 <markus_sabadello> ack manu
14:47:40 <phila> ... You should never look at a hash output and not know which hash function was used. We want to give that advice to prevent what you have highlighted
14:48:08 <phila> ... If you see a hash and only a hash, you shouldn't presume you know which hash function was used
14:48:32 <phila> manu: What you suggested was to remove the default. I would be a strong -1 on that. Our test suite is built on that.
14:49:10 <phila> ... Also we discussed what the output should be. There was agreement that since you call the function with the hash parameter there's no need to have it in the output.
14:49:46 <phila> ... We want to provide strong guidance, albeit non-normative, if there is upstream software, they need to convey which hash function was used.
14:50:28 <phila> seabass: When I suggest removing the default, I mean make it mandatory that you say you used SHA256
14:51:18 <phila> ... Does that mean implementations should say they use SHA256, or that they use something else?
14:51:18 <phila> gkellogg: I'm not quite following you, sorry.
14:51:40 <markus_sabadello> q+
14:51:58 <phila> ... This is a normative requirement of people implementing the algorithm. They implement with the default, but provide a mechanism for using an alternative. But it's up to the implementation to make it clear what they used.
14:52:10 <dlongley> we can't say *how* external metadata will be expressed, but we can have our informative text say that it should always be clear what hash algorithm was used internally
14:52:25 <dlongley> (and a "default" here in our spec is orthogonal to that)
14:53:31 <manu> q+
14:53:51 <manu> q-
14:54:10 <dlongley> q+ to say we're still discussing informative text to add to the spec
14:54:12 <phila> markus_sabadello: If an implementation uses SHA 256, do they have to use this parameter? No, because that's the default. So if you invoke without the param, SHA 256 will be used.
14:54:14 <markus_sabadello> ack markus_sabadello
14:55:17 <dlongley> q-
14:55:30 <manu> q+ to ask for concrete text that seabass would want.
14:55:31 <dlongley> i think we're talking about informative text changes at this point
14:55:37 <gkellogg> Should be RDFC-1.0 to identify the algorithm, though.
14:55:42 <phila> markus_sabadello: What if we summarise this in the GH issue.
14:55:45 <dlongley> and we could run Phil's proposal to move onto that
14:56:59 <manu> q-
14:57:44 <phila> draft proposal: While we continue to discuss Issue 176, there is consensus that there will not be a need for a change to the normative text discussed that the WG resolved to seek transitoion to CR revcently
14:58:07 <gkellogg> +1
14:58:09 <phila> draft proposal: While we continue to discuss Issue 176, there is consensus that there will not be a need for a change to the normative text discussed that the WG resolved to seek transition to CR recently
14:58:19 <manu> I'd +1 that above ^
14:58:19 <seabass> +1
14:58:28 <phila> Proposal: While we continue to discuss Issue 176, there is consensus that there will not be a need for a change to the normative text discussed that the WG resolved to seek transition to CR recently
14:58:31 <manu> +1
14:58:31 <dlongley> +1
14:58:34 <yamdan> +1
14:58:35 <seabass> +1
14:58:41 <phila> +1
14:58:58 <dlehn> +1
14:59:02 <phila> RESOLVED: While we continue to discuss Issue 176, there is consensus that there will not be a need for a change to the normative text discussed that the WG resolved to seek transition to CR recently
15:00:01 <phila> RRSAgent, make logs public
15:00:11 <phila> zakim, end meeting
15:00:11 <Zakim> As of this point the attendees have been gkellogg, dlongley, phila, dlehn, markus_sabadello, manu, yamdan, TallTed, seabass
15:00:13 <Zakim> RRSAgent, please draft minutes
15:00:14 <RRSAgent> I have made the request to generate https://www.w3.org/2023/09/27-rch-minutes.html Zakim
15:00:22 <Zakim> I am happy to have been of service, phila; please remember to excuse RRSAgent.  Goodbye
15:00:22 <Zakim> Zakim has left #rch
15:02:56 <phila> RRSAgent, please excuse us
15:02:56 <RRSAgent> I see no action items