Kleber: There have already been two sessions;
this is the third session on the ad selection APIs from the
Privacy Sandbox.
...: This is a problem I want to solve, and I would like to
convince you that it is a problem worth solving.
... A published study shows a 52% revenue loss
when cookies are removed.
... So some notion of user identity approximately doubles the
amount of revenue a publisher receives from having ads on their
site.
... The details break this out for the top 500 publishers. It
varies by industry; news, for example, loses 62%.
... I would like to make this difference much smaller. However,
I want to do so without enabling cross-site linkability.
... In this morning's privacy threat model discussion we listed
six properties. I think identifying users across sites,
i.e. linking together a user's browsing across the web, is
hugely important.
... So I would like to do this in a way that does not allow the
recreation of the user's browsing history.
... Two motivations. First, I like the web and want sites to be able
to continue making money.
... Secondly, everything we're talking about to improve privacy
could trigger an arms race. If the benefit of winning is
billions of dollars, then someone is going to try and win
it.
... I would like to make the gap between monetisation on the private
web and on the non-private web smaller.
... That's the what and why, finally the how.
... My goal is to figure out how to get back ads targeted at
something not on the current page.
... I have two APIs I would specifically like to talk about,
but mainly I want to discuss the problem and get help.
... Part 1: What kind of information is ok to use? I don't mean
in all circumstances - user consent, state, etc. can change all
this.
... Part 2: once we've decided this information is ok to use,
how do we avoid unintended consequences, e.g. re-enabling
cross-site tracking?
... For example, a user may self-select ten topics of interest.
However if they are the only user interested in these topics
then they are now identified.
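A minimal sketch of the self-selected-topics problem, assuming we could see every user's topic set: count how many users share exactly the same set (the k-anonymity notion raised later in the session). The function names and sample users are hypothetical and not part of either proposal.

```python
from collections import Counter


def anonymity_set_sizes(users: dict[str, frozenset[str]]) -> dict[str, int]:
    """For each user, count how many users share exactly the same topic set."""
    counts = Counter(users.values())
    return {user: counts[topics] for user, topics in users.items()}


def is_k_anonymous(users: dict[str, frozenset[str]], k: int) -> bool:
    """True if every user's topic set is shared by at least k users (including them)."""
    return all(size >= k for size in anonymity_set_sizes(users).values())


users = {
    "alice": frozenset({"news", "kites"}),
    "bob": frozenset({"news", "kites"}),
    "carol": frozenset({"news", "kites", "motorcycles"}),
}
print(anonymity_set_sizes(users))  # {'alice': 2, 'bob': 2, 'carol': 1}
print(is_k_anonymous(users, 2))    # False: carol's set identifies her
```

Adding one extra topic is enough to make carol unique, which is the "only user interested in these topics" case.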
... The two APIs are FLOC and PIGIN.
... FLOC is about letting ads be targeted, not at you
personally, but at a large enough group of people who are like
you. Examples of what the browser might already know are your
browsing history, interests you volunteer, or perhaps what
ML in the browser has learned about your interests.
<dbaron> https://github.com/jkarlin/floc
<toml> +q to note that "Kinda
like you" shared with 999 other people can be a pretty darn
personal bucket, and can be extremely sensitive even if it's
not narrow.
...: The important part of FLOC is the clustering of users
based on that set of signals. This involves some cryptographic black
magic I'd rather not get into, but I'd like to assume we can do
clustering without sending that data to a server.
... So assume there's a way to group users with similar
sets of signals.
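One way such on-device grouping could work, assumed here for illustration only (the session deliberately did not specify a technique), is a locality-sensitive hash such as SimHash: each visited domain votes on every bit of a short ID, so similar histories tend to land on IDs sharing more bits. The domain list and bit width are hypothetical.

```python
import hashlib


def _feature_vector(feature: str, n_bits: int) -> list[int]:
    # Derive a pseudo-random +1/-1 vector for one feature from its SHA-256 digest.
    value = int.from_bytes(hashlib.sha256(feature.encode("utf-8")).digest(), "big")
    return [1 if (value >> i) & 1 else -1 for i in range(n_bits)]


def simhash_cohort(domains: set[str], n_bits: int = 8) -> int:
    """Map visited domains to an n_bits cohort ID, computed locally on the device."""
    if not 0 < n_bits <= 256:
        raise ValueError("n_bits must be between 1 and 256 (SHA-256 output size)")
    totals = [0] * n_bits
    for domain in domains:
        for i, sign in enumerate(_feature_vector(domain, n_bits)):
            totals[i] += sign
    # Bit i of the cohort is set when the summed votes for position i are positive.
    return sum(1 << i for i, total in enumerate(totals) if total > 0)


history = {"news.example", "kites.example", "recipes.example"}
print(simhash_cohort(history))  # same history gives the same cohort on any device
```

Note that this sketch only assigns IDs; it does nothing to guarantee that each cohort is large enough, which would need a separate check.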
Hadley: Question: I don't understand how the browser knows everything I've done, or how it reveals that information to an advertiser.
Kleber: First the browser decides
you're in a particular cluster, e.g. 1234567.
...: Currently the advertiser observes a third-party cookie in
your browser.
Hadley: How is it sent?
Kleber: In a specific header sent
to the advertiser.
...: The advertiser can choose to target ads at a particular
cluster or FLOC.
...: The advertiser can observe how that FLOC behaves in
aggregate, but not you specifically.
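A sketch of what "observing the FLOC in aggregate" could look like on the ad server, assuming the cluster ID arrives in a request header. The header name Sec-Cohort is hypothetical; the discussion only says "a specific header". Nothing here stops linking a cohort with other signals, which is the question raised next.

```python
from collections import defaultdict

# Hypothetical header name: the discussion only says "a specific header".
COHORT_HEADER = "Sec-Cohort"


def aggregate_by_cohort(requests):
    """requests: iterable of (headers, clicked) pairs seen by an ad server.

    Returns {cohort: (impressions, clicks)}. No per-user identifier is read,
    so anything learned here is at cohort granularity.
    """
    stats = defaultdict(lambda: [0, 0])
    for headers, clicked in requests:
        cohort = headers.get(COHORT_HEADER)
        if cohort is None:  # no cohort sent, e.g. the user agent withheld it
            continue
        stats[cohort][0] += 1
        stats[cohort][1] += int(clicked)
    return {cohort: (imps, clicks) for cohort, (imps, clicks) in stats.items()}


log = [
    ({"Sec-Cohort": "1234567"}, True),
    ({"Sec-Cohort": "1234567"}, False),
    ({"Sec-Cohort": "7654321"}, False),
    ({}, True),
]
print(aggregate_by_cohort(log))  # {'1234567': (2, 1), '7654321': (1, 0)}
```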
Hadley: What stops the advertiser linking you to the FLOC?
Kleber: That was a problem we
discussed as part of the Privacy Budget discussion. We need to
constrain the amount of information that leaks about you. The
FLOC is one source of those bits of information that leak out
about you.
...: Any questions?
Tom: You talk about the Privacy
Budget as being the mechanism for preventing the leaking of
information. That covers things like screen
resolution, connection characteristics, etc. However, your FLOC
could be incredibly personal: sexual preference, union
membership, etc. If it's expressed in your browser history then
it can contribute to the FLOC. The Privacy Budget would not
cover this.
... This is a feature of the current advertising model, but I
would like to stop it.
Brad: The Privacy Budget covers identifiability. The explainer talks about both k-anonymity and not deviating from current population demographics.
Tom: What kind of limits are those? If it's homosexuality in 10% of the population, how is that prevented?
Kleber: There are two different
problems being solved - fingerprinting versus sensitive
characteristics.
... So, the issue here is properties of the FLOC being too personal.
... It is plausible to build a clustering that has better privacy
properties than k-anonymity.
... This only works if you identify specific characteristics
that you want to ensure you do not leak, and then build the
clustering mechanism not to reveal those.
Tom: I don't understand what you're saying. I don't understand the difference between a FLOC that identifies people as gay versus saying that's not possible. I'm not sure we can enumerate all the sensitive characteristics, but let's say there were 100.
Kleber: I can't fairly represent the details, but I can point you to papers on this and link them in the notes. However, I agree the list of sensitive characteristics is not something you can just list out.
Tom: Let's say we can definitely enumerate a set of characteristics to avoid. What base information do you need in order to do that?
Kleber: Yes, you need a training set.
Tom: So you would need a whole bunch of people who disclose their browsing history and sexuality and you need to repeat that for every single sensitive characteristic?
Kleber: yes, this is one of the paradoxes of privacy research. To avoid recognising the characteristic you need a system that can recognise the characteristic. T-closeness is the magic word to search on.
<dbaron> .... (K-anonymity and T-closeness)
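A minimal t-closeness check, sketched to make the "training set" point concrete: to verify that no cluster skews a sensitive attribute, you need that attribute's label for every member. The original t-closeness definition uses Earth Mover's Distance; for categorical labels where all values are equally far apart, that reduces to the variational distance used below. The cluster data is hypothetical.

```python
from collections import Counter


def distribution(values: list[str]) -> dict[str, float]:
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def variational_distance(p: dict[str, float], q: dict[str, float]) -> float:
    # Equals Earth Mover's Distance for categorical values at equal ground distance.
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def satisfies_t_closeness(clusters: dict[str, list[str]], t: float) -> bool:
    """clusters maps cluster ID -> sensitive-attribute labels of its members.

    Every non-empty cluster's label distribution must be within t of the
    overall population distribution.
    """
    population = [label for members in clusters.values() for label in members]
    overall = distribution(population)
    return all(
        variational_distance(distribution(members), overall) <= t
        for members in clusters.values()
        if members
    )


mixed = {"A": ["yes", "no"], "B": ["no", "yes"]}
skewed = {"A": ["yes", "yes"], "B": ["no", "no"]}
print(satisfies_t_closeness(mixed, 0.1))   # True
print(satisfies_t_closeness(skewed, 0.1))  # False: each cluster is 0.5 away
```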
<Zakim> toml, you wanted to note that "Kinda like you" shared with 999 other people can be a pretty darn personal bucket, and can be extremely sensitive even if it's not narrow.
Tom: The solution I would propose, though it may be naive, is that we could avoid disclosing so much information by not doing things like analysing browsing history.
<englehardt_> +q
John: We've had some offline
conversations, but I want to bring them up here. I worry a lot
about applying Machine Learning to people's behaviour to decide
things about them. Even as part of a FLOC.
... The browser could actively tell the user, "I think you're
gay" and "I want to broadcast this" and I don't think users
would like this. Why do we want to apply ML to discover
this?
<Zakim> christine, you wanted to ask about risk of "identifying" users with enough FLoCs
Christine: You say in the slide the FLOC can change over time. Is there a risk a user could be identified by linking a number of FLOCs? If I know you're in 43a, 43b, etc.?
Kleber: If there is a first party
site you visit over time where you're signed in, then they will
be able to see the FLOC changing over time. They might be able
to learn more about you over time. Sites you visit over time do
have the opportunity to learn more about you; this is one more
chance for them to do that.
...: This doesn't change the model of first-party
identity. So whether a site can see your FLOC change over time is
dependent on your ambient identity on the web.
Melanie: I share some of the concerns about not knowing what's sensitive, e.g. gender. At Microsoft, we've seen that some people really do like tailored ads. What about an opt-in model? I think that could be more valuable to advertisers.
Kleber: My goal is to figure out
in what circumstances it's ok to use this information. User
consent is needed, and there's a big sliding scale of how involved
users were in that choice.
...: In the case where you are using the browsing history, you
may be fine sharing your top 10 visited sites and that feeds
into a specific cluster. You get to review the information
being used to build the FLOC.
... Likewise the user volunteering interests seems like a
reasonable thing to use.
... However, it doesn't fully address Tom's concern. Those top
10 interests may be giving away something that they didn't
intend.
... Clustering those interests puts you in a cluster of
representative interests, not specifically your interests.
However an advertiser could still use that cluster to derive
something like sexuality.
Tom: That sounds like a good reason not to share any of that information.
Kleber: I think people may have different feelings about sharing seemingly innocuous information that leads to latent discoveries of sensitive information.
Stephen: We are specifically concerned about cross-site tracking that allows advertisers to discover more about users than they expect. This API seems like it still gives away information but is about preventing that arms race. Are we confident this is useful enough?
Kleber: Yes, the balance is this
being useful enough for advertisers and it being information
that the user is happy to volunteer about themselves.
... Now onto PIGIN.
...: FLOC was about interest based ads, or targeting people who
are similar to you.
... PIGIN is about remarketing, aka those ads that follow you
around the web.
<dbaron> https://github.com/michaelkleber/pigin
...: These are ads based on the advertiser having observed you do
something in the past that makes them believe you would be
interested in an ad, e.g. you added something to your basket
and it's still there 2 hours later.
... Currently these ads are based off of cookies.
... PIGIN is about allowing the advertiser to do this without
being able to track you and link your behaviour across
sites.
... The advertiser gets to create interest groups and when they
see the user do something, they can request that the user be
put into the interest group.
... It's up to the user agent what it does with that
request.
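A sketch of the join step, assuming the user agent keeps its own store of memberships and treats each advertiser call as a request it may refuse. The class, method names, and the allow-list policy (standing in for whatever consent the user gave) are all hypothetical, not the PIGIN API.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InterestGroupStore:
    """Hypothetical browser-side store of interest groups.

    The advertiser can only *request* a join; the user agent decides.
    Here the policy is a simple allow-list of owners the user agreed to.
    """
    allowed_owners: set[str]
    memberships: dict[tuple[str, str], float] = field(default_factory=dict)

    def request_join(self, owner: str, group: str, now: Optional[float] = None) -> bool:
        if owner not in self.allowed_owners:
            return False  # the user agent is free to ignore the request
        self.memberships[(owner, group)] = time.time() if now is None else now
        return True


store = InterestGroupStore(allowed_owners={"kites.example"})
print(store.request_join("kites.example", "cart-abandoners", now=0.0))  # True
print(store.request_join("tracker.example", "everyone", now=0.0))       # False
print(sorted(store.memberships))  # [('kites.example', 'cart-abandoners')]
```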
... At some point later when you are browsing around the web,
the browser sends a request to the network, and it sends the
interest group the user is a member of.
... The browser can't reveal all the interest groups. However,
we will assume there is some magic crypto black box service
that can pick the most valuable interest group you are a
member of that is also large enough.
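The selection rule described above, sketched in the clear: among the user's groups, drop any below a size threshold k, then take the highest-value one. In the proposal this happens inside the assumed "crypto black box"; here group sizes and values are plain inputs, and all names and numbers are hypothetical.

```python
from typing import Optional


def pick_interest_group(
    memberships: list[str],
    group_sizes: dict[str, int],
    group_values: dict[str, float],
    k: int,
) -> Optional[str]:
    """Return the highest-value group with at least k members, or None."""
    eligible = [g for g in memberships if group_sizes.get(g, 0) >= k]
    if not eligible:
        return None
    return max(eligible, key=lambda g: group_values.get(g, 0.0))


mine = ["kites", "harley", "rare-hobby"]
sizes = {"kites": 5_000, "harley": 12_000, "rare-hobby": 3}
values = {"kites": 0.4, "harley": 0.9, "rare-hobby": 5.0}
print(pick_interest_group(mine, sizes, values, k=1_000))   # harley
print(pick_interest_group(mine, sizes, values, k=20_000))  # None
```

The size threshold keeps the tiny, high-value group from being revealed, but it says nothing about whether a large group is itself sensitive, which is the objection that follows.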
Tom: Those interest groups still could be very sensitive.
Kleber: Yes, if you visit a site and add something to your cart the site could ask you if it can show you ads. So, if you've been to kites.com and been to Harley Davidson then it may be revealed that you are in both.
Tom: The perceived innocuous
nature of multiple data points is at odds with the conjoint
data that results. We know the user is not able to reason about
the impact of all those data points together.
... They then find they are being targeted on a resulting data
point that they did not realise about themselves. This feels
like we are setting up data leaks based on seemingly innocuous
steps.
Kleber: In the event a person
finds themselves targeted on a surprising characteristic, the
user can tell that they were targeted on those specific
interest groups. We can make it clear that the advertiser must
reveal the types of ads targeted against a specific
group.
...: The browser can help you find out why you were targeted
with those ads.
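A sketch of the browser-side audit trail Kleber describes, assuming the browser records which interest group each ad was targeted on and the ad type the advertiser declared for it. The record fields and class names are hypothetical. As Tom points out next, such a log can only show the groups used, not an inference drawn from combining them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AdAuditEntry:
    """Hypothetical record the browser keeps for each ad it shows."""
    ad_id: str
    advertiser: str
    interest_group: str
    declared_ad_type: str  # what the advertiser disclosed for this group


class AdAuditLog:
    def __init__(self) -> None:
        self._entries: list[AdAuditEntry] = []

    def record(self, entry: AdAuditEntry) -> None:
        self._entries.append(entry)

    def why(self, ad_id: str) -> list[AdAuditEntry]:
        """Answer "why was I shown this ad?" from the browser's own log."""
        return [e for e in self._entries if e.ad_id == ad_id]

    def targeting_summary(self) -> Counter:
        """How often each interest group was used to target the user."""
        return Counter(e.interest_group for e in self._entries)


audit = AdAuditLog()
audit.record(AdAuditEntry("ad-1", "kites.example", "cart-abandoners", "kite sale"))
audit.record(AdAuditEntry("ad-2", "moto.example", "harley-visitors", "motorcycle offers"))
print([e.interest_group for e in audit.why("ad-2")])  # ['harley-visitors']
```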
Tom: I love the idea of creating
an audit trail of what ads were served and why. I don't think
what you've described covers the case where a combination
of innocuous characteristics A, B, C, and D reveals a
sensitive characteristic E. The user sees they are targeted on
the innocuous ones, but they are not told about the
other.
... Inferring things about people is definitely valuable.
Kleber: Would revealing only one of these groups solve this? If you're a New York Times subscriber and see that you're getting Harley ads one week and Kite ads the next, they have all that information.
Tom: I feel like you're making my point - just because I read those articles doesn't mean I want the site to know all these other interests.
Christine: This proposal has the feeling of making the browser more integrated into the ads system and becoming more involved in the ads role.
Kleber: That's fair. It's a new idea of the browser having specific APIs intended for the advertising use case. This is the browser saying that advertising is part of the way the web works and we are giving it first class support.
John: If you're pursuing this it would be good to make the signal to advertisers (broadcast or repository) a separate thing. Other browsers can then choose to do different things, whether the interest is deduced, chosen, or falsified.
Kleber: That's the intent. The browser can do whatever it wants, but it has the responsibility to maintain k-anonymity.
<englehardt_> +q
Kleber: This is an attempt to
solve a few points, but I'd love to keep the conversation going
on wider topics, e.g. how to allow advertisements to show up
in the browser based on more than just the page you're on at
that moment.
...: Some of these are more long-term:
... Moving a block of ads into your browser for private
information retrieval.
... Or maybe when you're on Site A, that site gives the browser
an ad to be displayed at a later date.
... This becomes something more similar to Brave's model where
the ML happens in the browser.
Tom: It's a great system and it's going to save the web.
Kleber: Possible.
Stephen: You still run the risk of leaking information about the user even if selection happens on the device. For example if users click on ads that correlate with users that have diabetes then I can track that.
Kleber: That's unrelated to PIGIN or FLOC though, that can happen with other ads.
Tom: We limit that by requiring just one category on an ad.
Kleber: Summary - heavy scepticism.
Present: toml christine weiler taraw
Scribe: rowan_m