Privacy Interest Group Teleconference -- 29 Nov 2018

Trace Context

christine: moving Trace Context to top of agenda
... [explains what we've been doing re: reviews]

<SergeyKanzhelev> privacy section of Trace Context spec

Sergey: this is nothing new - we're just trying to get people to do the same thing.
... goal is to increase cooperation between different vendors of distributed tracing.
... distributed tracing: is a way to follow execution of code through multiple components
... having multiple components makes troubleshooting hard, e.g. if the first component needs to return results that are calculated on five different machines
... failure in one component may cause the whole to fail. need to be able to tell which failed.
... performance problems even harder to troubleshoot. hard to pinpoint which component slowed.
... everybody feels the need to propagate the identity of a request, so collecting logs from every system, you
... can identify which failure / execution time belongs to which request. you want to correlate data from components, grouped
... by transaction
... historically, many DT vendors do @@@, their own thing[?]. they inject a piece of identity and propagate it forward.
... the problem is when we get more SaaS (software as a service). some components in-house, some from AWS, some from MS, etc.
... you depend on the protocols the vendors provide.
... we felt need for a common spec for interop.
... for privacy, one problem created is : when protocols were vendor-specific and -controlled, ... you now have a case where the identity
... will be propagated by others.
... The spec defines 2 headers: traceparent, tracestate. we split it for privacy - traceparent is strictly controlled identity format
... 128 bits of randomness + 64 bits as another ID - these are two numbers + one bit - that's all you need to correlate.
... tracestate is a relief valve. vendors want to innovate and propagate new (or old!) headers.
... we create a dictionary of name-value pairs
... or whatever you want to put there.
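
A minimal sketch of the traceparent layout described above, assuming the field order of the published Trace Context specification (version, 128-bit trace-id, 64-bit parent-id, flags byte carrying the "one bit"); the draft discussed on this call may have differed in detail. The function name new_traceparent is hypothetical.

```python
import secrets


def new_traceparent(sampled=False):
    """Build a traceparent value: version-traceid-parentid-flags (all hex)."""
    trace_id = secrets.token_hex(16)   # 128 bits of randomness -> 32 hex chars
    parent_id = secrets.token_hex(8)   # 64-bit ID of the caller -> 16 hex chars
    flags = "01" if sampled else "00"  # the single "sampled" bit
    return "-".join(["00", trace_id, parent_id, flags])


if __name__ == "__main__":
    print(new_traceparent(sampled=True))
```

Both IDs come from a cryptographic random source, so the header itself carries nothing beyond what is needed to correlate.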

<christine> https://w3c.github.io/trace-context/#privacy-of-traceparent-field

scribe: in privacy section of doc: for traceparent, we think privacy concerns are e.g. putting IP address in array instead of random.
... and propagators may relay it without realizing. we think this is hypothetical.
... even if they relay it, this is just a byte array, relays don't necessarily know it's an IP address
... #2 is @4
... #3: e.g. Company A provides services to company B, e.g. file storage. if traceparent identifies the transaction as a whole.. even if you haven't exposed info on every single request, company B could deduce how many customers company A has
... if they correlate calls. or how many files company A has.
... @6
... specification suggests not putting personal information into the trace state’s opaque key-value storage
... but privacy sensitive services might consider removing unknown keys to avoid storing extra personal data that you don’t want to
... don’t want propagation of data to happen unintentionally
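
The "remove unknown keys" suggestion can be sketched as follows, assuming tracestate is a comma-separated list of key=value members (as in the published specification). The helper names and the allow-list are hypothetical, and real parsing has more rules than are shown here.

```python
def parse_tracestate(header):
    """Split a tracestate header into (key, value) pairs, skipping malformed members."""
    entries = []
    for member in header.split(","):
        member = member.strip()
        if not member or "=" not in member:
            continue
        key, value = member.split("=", 1)
        entries.append((key.strip(), value.strip()))
    return entries


def drop_unknown_keys(header, allowed_keys):
    """Keep only the members whose key this service explicitly recognizes."""
    kept = [(k, v) for k, v in parse_tracestate(header) if k in allowed_keys]
    return ",".join(k + "=" + v for k, v in kept)


if __name__ == "__main__":
    incoming = "vendora=abc, vendorb=user42,vendorc=x"
    print(drop_unknown_keys(incoming, {"vendora", "vendorc"}))
```

A service that applies this before storing or forwarding the header never persists values it cannot interpret, at the cost of breaking other vendors' correlation downstream.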

<Zakim> jnovak, you wanted to ask about role of the UA in this specification versus this being designed for server-server communication

jnovak: two servers talking to each other, or would this also go through the user agent?

jnovak: will the UA participate by sending the trace data to multiple origins?

SergeyKanzhelev: see browser as initiator of requests, might set the initial identity of a request, which service will propagate through multiple services

SergeyKanzhelev: trace-id is an identifier end-to-end, when the browser initiates a request, it sends the trace-id to Service A, which might subsequently send the same id along to Service B

SergeyKanzhelev: to confirm that it’s all part of the same transaction
... ids are being forwarded “down the line” on the server side
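
A rough sketch of that forwarding step, assuming the traceparent layout of the published specification: Service A keeps the incoming trace-id when it calls Service B and replaces only the parent-id with its own. The function name child_traceparent is hypothetical.

```python
import secrets


def child_traceparent(incoming):
    """Derive the header a service sends downstream from the one it received."""
    version, trace_id, _caller_id, flags = incoming.split("-")
    own_id = secrets.token_hex(8)  # fresh 64-bit ID for this hop
    # trace_id is copied unchanged, which is what ties the hops together.
    return "-".join([version, trace_id, own_id, flags])


if __name__ == "__main__":
    received = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    print(child_traceparent(received))
```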

christine: to clarify, the browser might be the initiator, but is the trace-id generated by the browser or the first service?

SergeyKanzhelev: different scenarios, for example, javascript on a page making ajax calls to a service

SergeyKanzhelev: would like in the future for a browser to include an identifier in the initial page request — but that isn’t happening right now

SergeyKanzhelev: if browsers would help us correlate all the requests for distributed traces, that would be even better

<scribe> scribe: weiler

npdoty: there seems to be some interest in moving this into browser eventually.
... in web privacy model, we think of things being segmented by origin. have we thought about, rather than dividing by
... vendor IDs, maybe divide by origin? Also, do we expect these to go back up the chain?

more complicated things, they might cross origin

sergey: by origin, do you mean domain name?

nick: it's more complicated, but yes.

sergey: we've talked about putting more properties on each key-value pair
... realized that it complicated spec too much. would be better to have a simple proxy
... smarter data collector.... @7 ... as long as you operate nicely, it's the same risk as if someone puts SSN in URL.

<pranjal> +q - just wanted to clarify if the trace id will be reset with every trace?

sergey: even if I didn't want to store xyz data... then don't store it.
... you can protect yourself by not playing nicely and dropping everything.

nick: if headers are sent in HTTP responses, or expected to persist across http request and response pairs, there are potential cross-origin privacy implications

sergey: @9. We have use for that. wg doesn't feel like it's used in the same way, so we didn't spec it.

nick: i think it would create more concerns re: cross-origin tracking, if we expect same identifier to be persisted between ....
... it could violate assumptions we have about privacy model, if browser will take identifier from one party and send to another.

sergey: we could explain that.

moneill: basically, javascript applications in the browser can send XHR, that would include trace identifiers in HTTP request headers
... nothing especially new because XHR can already include all sorts of information in the body, but concern if it creates a new fingerprinting capability or a way for bad actors to circumvent existing privacy controls, but I don’t currently see it as a problem
... don’t see a problem unless browsers are also participating and potentially forwarding on headers more generally. is that something that might happen in the future?

SergeyKanzhelev: we don’t have as much detail on the implications of browser implementation. don’t see how to address it in the specification right now

<Zakim> pranjal, you wanted to clarify if the trace id will be reset with every trace?

pranjal: how will the trace-ids be reset? is that just up to the implementation?

SergeyKanzhelev: generally recommend every single AJAX call should have a separate trace id

pes: motivate the need. could this happen all with server-to-server communication? why does this need to be pushed into the client?

SergeyKanzhelev: in identifying where the source of latency is, debugging prefers to be able to track the transaction from end to end, which includes the client

plh: if there are different trace ids for different ajax calls, can you see that they are all related?

SergeyKanzhelev: if you’re monitoring at a higher level, you might have id’s for multiple pieces of client functionality

npdoty: possible with existing functionality for client-side javascript to just use a cookie or a GET parameter, just that having a standardized header would make it simpler?

SergeyKanzhelev: also helps with interoperability, for example with proxies, load balancers, etc.

weiler: 1) auditing, can we audit whether the information included in trace headers is acceptable (doesn’t include sensitive, PII, etc.)? rather than any random number, limit to hash of a recent timestamp, so that it’s less easily abused

SergeyKanzhelev: an interesting idea to be able to audit, so that people can’t put information within the mostly random identifier
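
One way to read the audit idea, purely as an illustration: if the trace-id must be the hash of a recent timestamp, an auditor can recompute hashes over a short window and reject any ID that does not match. Everything here is an assumption (SHA-256, millisecond timestamps, truncation to 128 bits, the function names); nothing of the sort was specified on the call, and the scheme as written does not address collisions within the same millisecond.

```python
import hashlib


def id_from_timestamp(ts_ms):
    """Derive a 128-bit (32 hex char) ID from a millisecond timestamp."""
    digest = hashlib.sha256(str(ts_ms).encode("ascii")).hexdigest()
    return digest[:32]


def is_auditable(trace_id, now_ms, window_ms):
    """True if trace_id matches the hash of some timestamp in the last window_ms."""
    start = now_ms - window_ms
    return any(id_from_timestamp(t) == trace_id for t in range(start, now_ms + 1))


if __name__ == "__main__":
    issued_at = 1543500000000
    tid = id_from_timestamp(issued_at)
    print(is_auditable(tid, issued_at + 500, 1000))
```

An ID that encodes an IP address or user name would fail the check, which is the property weiler is after; the trade-off is that IDs become guessable from the time of the request.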

weiler: 2) use cases, are these being used all the time, or just being initiated during debugging a problem? who turns it on?

SergeyKanzhelev: want to make distributed tracing part of every application that is deployed, your components never break the distributed trace. try to make it by default in as many places as possible

<npdoty> SergeyKanzhelev and plh, is there a mailing list where you would like us to send more comments?

christine: take-aways: cross-origin issue; audit idea from sam; further discussion re: why is this being done client and not just server-server?
... Feel free to come talk to us any time.
... Thank you!

sergey: thank you for opportunity

Questionnaire

jason: thank you for contributions to questionnaire. updates:
... 1) Lukasz is adding me as an editor on behalf of ping.
... 2) 3 out of 4 of our edits have been merged. 4th set, on mitigations section, we sent PR weeks ago, and he pushed back w/ edits.
... once those are in, we'll be in good shape
... request from TAG to split doc up. we don't want to take that on until we like the whole. tossing around idea of splitting intro/questionnaire and mitigations separately. not sure what that will look like

<npdoty> so anything needed from us right now?

christine: jason and I talked re: deprecating older docs. we think we can start on that now.
... we'll f/u with sam offline on that

<npdoty> yes, +1 for deprecating the older in-progress pieces

Private browsing mode

pete: we had a call a couple of weeks ago re: what a doc would look like
... main sticking point is whether client should advertise to server that it's in private browsing mode.
... a well-meaning site could do good thing and no add'l entropy leak
... counter-argument: extra entropy
... consensus this would be useful.

nick: where is the discussion ongoing?
... where should we discuss it?

christine: we talked about this, starting with MNot's work from when he was on the TAG.
... browsers were less interested in the past.
... I want this IG to discuss what would be useful.

pete: goal on call was not to set bounds on what clients could do. want to help docs describe what to do when browser is in private mode.
... and warn sites "don't assume this will always be available".
... has been in private email.

moneill: I'm interested. there was an issue with Boston Globe detecting private mode.

<npdoty> +1, we have often wanted to give some kind of guidance about what to include in new feature specs regarding how they should interact with a private browsing mode, and it would be great to have more details

moneill: I think flag should not be there.

christine: we could have an ad hoc call next week on this.

<npdoty> (I’m generally on the other side, I think sites can already detect it, and might as well make it explicit)

<moneill> +1

<npdoty> next week works for me

sam: moving the discussion to the IG list would be great!

christine: meeting next week; next full call Dec 20th
