W3C

– DRAFT –
Improving Web Advertising BG

27 April 2021

Attendees

Present
apireno_groupm, AramZS, arnaud_blanchard, bLeparmentier, bmay, Brendan_IAB_eyeo, dinesh, eriktaubeneck, FredBastello, GarrettJohnson, gendler, hober, imeyers, jdelhommeau, joshua_koran, Karen, kleber, kris_chapman, lpilot, Mike_Pisula, mjv, mserrate, nics, nlesko, pedro_alvaradoo, pl_mrcy, seanbedford, wbaker, weiler
Regrets
-
Chair
Wendy
Scribe
Karen

Meeting minutes

<wseltzer> https://github.com/WICG/privacy-preserving-ads/blob/main/MACAW.md (Kelda Anderson)

<wseltzer> https://github.com/w3c/web-advertising/issues/111 (Aram Zucker-Scharff)

Wendy: On today's agenda we are hearing about MaCAW
… from the team that brought us PARAKEET
… Looking at issue that Aram raised on publisher deal types in use case document
… And any other business
… On deck a discussion next week from the PRAM business use cases document
… Any agenda curation?

Agenda-curation, introductions

Wendy: other business to raise for us?

Basile: Next week, we are launching ML Challenge
… I sent a mail

Wendy: the ML challenge
… add that to the agenda

Wendy: Any introductions of new participants to the call?

MaCAW

Wendy: not hearing any, let us go to MaCAW

KAnderson: Please take it away

[slides: Privacy-Preserving Ads - MaCAW]

<wseltzer> https://github.com/WICG/privacy-preserving-ads/blob/main/MACAW.md

Mehul: We will do a quick recap of PARAKEET
… and talk about addition with MaCAW
… everyone curious about networks
… and timelines
… Let me spend a couple minutes on PARAKEET
… this summary won't do it justice, so refer you to the detailed proposal

<wseltzer> PARAKEET

<AramZS> (I approve of the use of cute bird photos being added to illustrations of the various bird proposals.)

Mehul: The whole idea is advertiser side is similar to FLEDGE
… JS on behalf of DSP
… adds feature to local storage, call an S
… when user shows up on publisher site
… browser will add differential private version of feature
… anonymizes features
… SSP endpoint
… gives winning bid back
… high level summary with scrubbed out publisher context
… Second thing is to explain the trade-off between monetization and privacy parameters
… In beginning we will test without differential privacy
… instead of cookie provides a feature in user request
… from the limitation perspective, they are together
… some potential adversarial privacy attack
… in the issues we added items on how to make it non-correlatable
… thank you, Michael for open issues
… second two issues we will focus on
… As we implement the privacy function
… on targeting profile and user features impact bid models, ranking and auction
… second thing is c' impacts brand safety controls
… still misses out on certain tokens
… we do provide some solutions in GitHub
… that is a high level summary of PARAKEET
… before going into MaCAW
… a simplified graphic of PARAKEET
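
[Illustrative sketch, not presented in the meeting: randomized response is one standard way to produce a "differentially private version" of a set of user features. The minutes do not say which mechanism PARAKEET uses; the function names, the 0/1 feature encoding and the per-bit epsilon below are assumptions made only for this example.]

```python
import math
import random


def randomized_response(bits, epsilon, rng=None):
    """Return a noisy copy of a list of 0/1 feature bits.

    Each bit is kept with probability e^eps / (1 + e^eps) and flipped
    otherwise, which gives epsilon-local differential privacy per bit.
    """
    rng = rng or random.Random()
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return [b if rng.random() < keep_prob else 1 - b for b in bits]


def estimate_rate(noisy_bits, epsilon):
    """Unbiased estimate of the true fraction of 1s, given noisy bits.

    Requires epsilon > 0 (otherwise the bits carry no signal).
    """
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(noisy_bits) / len(noisy_bits)
    # E[observed] = (1 - keep_prob) + true_rate * (2 * keep_prob - 1)
    return (observed - (1.0 - keep_prob)) / (2.0 * keep_prob - 1.0)


if __name__ == "__main__":
    # Hypothetical feature vector: one bit per interest category.
    user_features = [1, 0, 0, 1, 0]
    print(randomized_response(user_features, epsilon=1.0))
```

A smaller epsilon flips more bits, so each individual request is less informative, while aggregates over many requests can still be estimated. That is one concrete form of the monetization versus privacy trade-off mentioned above.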

[slide]
… helps to anonymize browser request
… DSP runs the bid model
… SSP runs quality control and auction
… if we consider this as simplified setup, MaCAW adds more
… as part of c' and s' there was some accuracy question
… through user features and publisher context
… runs on first level auction
… response to provide a set of ad instead of one winning ad
… and can specify if they want to [missed]
… for each ad, reach out to DSP end point to compute high-level function
… talk about brand safety running only in publisher context
… and bid model
… ad known to DSP
… knows what ad means
… disfunction
… ends when computation finishes
… will retrieve bid
… outcome still not known
… try to report
… encrypt value
… drop bits that are zero
… take ad and bid and try to compute auction score
… view in plain text
… a lot of auction...depends on ad text
… bid values provided for ranking
… identify ad with highest score
… this is overall proposal
… how does it work
… how does DSP know to implement, how does SSP know how to implement
… What does ad response look?
… This is a strawman proposal
… high level I would focus on this ad response
… steps 4 and 5
… computed between two hosted servers
… focus on ad response in step 3
… we assume ads are coming from diff DSPs
… provide some sort of origin
… model is hosted, or any computation is hosted
… out file
… how it knows how to interact in protocol
… some sort of JS processes the publisher context and standardized structure
… convert some information to fit into model
… a few things for auction
… SSP provides endpoint for computation
… ad quality, auction, ranking
… similarly brand safety and bid @ are in same construct
… how to work with two-party secure compute?
… interesting things will happen
… this is where we provided a compiler
… ML to start
… focus on DSP
… trains the bid model
… outputs some flow
… model format
… then there is a compiler; takes this model in, outputs model.out and model.weight
… assume not share model with anybody
… model.out is high level to think about, some sort of a flow
… to participate in secure compute
… browser service will host this model.out
… keep it cached
… ad server does model.out
… technically look at ad server, doesn't need to figure out ad secure protocols
… compiler takes care of it
… this is ML model
… compiling can write any C code
… most of use cases...sorting function, explainer
… write a C program
… provide a structure to browser service
… Nishant and Divya are developing this compiler
… becomes a generic programming language
… to express any kind of evaluation
… two-party secure protocol for that
… All good so far
… Let's talk about performance
… Evolution we did
… N variable set up
… latency is 200 milliseconds
… computation is 51MB
… ad server where model is hosted
… key concern we have as we grow number of variables, the latency grows steeply
… from @ to 805
… more progress to make
… two key directions to discuss
… First is mixed mode computation
… instead of encrypting all the variables
… like publisher context and features
… chooses notion
… compiler randomly encrypts a part of it so called mixed model
… like GPU side of world
… doesn't need to worry about reach
… if in plain text...negotiates run time
… brings down computation requirement
… Second idea is a non-colluding helper party
… no incentive to participate with ad server
… if third party not possible
… no incentive
… something like that is quite staggering
… brings down latency
… can be further optimized
… that is key situation
… I will take pause here
… there is a lot of content in the explainer
… If you have questions, file an issue
… We will be hosting biweekly calls
… end talk
… key thing
… while we restrict certain flows
… we are trying to re-enable in a privacy-preserving function
… with secure compute
… more robust it becomes, more robust...for ad servers
… this is what I have on MaCAW
… Do you want to walk through PARAKEET Test?
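
[Illustrative sketch, not presented in the meeting: the "mixed mode" idea from the talk, where only some inputs go through the secure protocol and the rest stay in plain text. The linear scoring model, the feature names and the `secure_dot` callable are assumptions made only for this example; the stand-in below counts terms and performs no cryptography.]

```python
from typing import Callable, Dict, List, Set

# A callable that would run the expensive two-party protocol on its inputs.
SecureDot = Callable[[List[float], List[float]], float]


def mixed_mode_score(
    weights: Dict[str, float],
    features: Dict[str, float],
    private_names: Set[str],
    secure_dot: SecureDot,
) -> float:
    """Score a linear model, sending only private features to secure_dot."""
    # Plain-text part: computed directly, no protocol cost.
    clear_part = sum(
        weights[name] * value
        for name, value in features.items()
        if name not in private_names
    )
    # Private part: the only terms that pay for secure computation.
    private = [name for name in features if name in private_names]
    secret_part = (
        secure_dot([weights[n] for n in private], [features[n] for n in private])
        if private
        else 0.0
    )
    return clear_part + secret_part


class CountingStandIn:
    """Stand-in for a real protocol; records how many terms it was given."""

    def __init__(self) -> None:
        self.terms = 0

    def __call__(self, ws: List[float], xs: List[float]) -> float:
        self.terms += len(ws)
        return sum(w * x for w, x in zip(ws, xs))


if __name__ == "__main__":
    weights = {"page_topic": 0.5, "hour": 0.1, "interest_shoes": 2.0}
    features = {"page_topic": 1.0, "hour": 3.0, "interest_shoes": 1.0}
    protocol = CountingStandIn()
    print(mixed_mode_score(weights, features, {"interest_shoes"}, protocol))
    print("terms sent through the secure path:", protocol.terms)
```

The fewer inputs are marked private, the fewer terms reach the secure path, which is the sense in which mixed mode "brings down computation requirement".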

<AramZS> would be great if the slides could be shared? But at least some of them are at https://github.com/WICG/privacy-preserving-ads/blob/main/MACAW.md

<Paul_Farrow> Off-topic question for the group. Are there conversations happening around openRTB standard for transmitting FLoC IDs that I am not aware of?

<jrosewell_> https://www.microsoft.com/en-us/research/project/ezpc-easy-secure-multi-party-computation/

Wendy: Thank you, Mehul

<Zakim> weiler, you wanted to ask why c' and s' need to be in plaintext and to ask re: trusted parties

Wendy: let's go through the queue

Weiler: you said c' had to be in plain text; why is that?

Mehul: had to be available together to DSP
… still encrypted
… for party to consume
… they are together

Weiler: thank you for clarification, but was a bit misleading
… trusted party looks like browser service
… you describe compute offload
… and it is also doing the proxy
… are there any other trusted parties or just that one?

Mehul: just that one
… some interesting proposals about IP proxy
… this is the only trusted party, is the first answer
… proxy can be eliminated if there is [missed]
… Gnatcatcher is one of them
… have some restriction

Weiler: your doc wasn't convincing to offload the computation
… I see where you need the proxy

Mehul: offload is because multiple transactions functioning
… client located at end in network, the latency and bandwidth would be impacted
… why we offload it here
… be done with ad server
… let me clarify for PARAKEET or MaCAW

Weiler: not sure I had distinguished between them

Mehul: proxy part can be done on a client
… possibility of hosting in a browser client
… had quite a bit of peer to peer comms
… once we know differential privacy...
… secure compute is expensive to run

Weiler: how much data are you talking here?

Mehul: for anonymization
… how many requests are going out
… between devices

<weiler> How much data needs to be sent between the computing parties?

Mehul: if it lasts two seconds, how many requests went out to NYTimes

Weiler: how much data gets sent for this secure computation; how many megabytes?

Mehul: this slide [performance; key considerations charts]

Wendy: great if we can get these slides to share

Brian: interesting proposal
… seems to me it's focused on a subset of use cases that require access to both SMC
… consider pure C and pure S channel and one that is privatized
… that don't require both S& C together
… and be unencumbered for multiparty computing

Mehul: great question
… separate C and S
… time-based attack if those parties collude; that was a restriction
… open up separate channel for S and C and combine in browser channel
… time is interesting part of that
… mixed mode doesn't need to encrypt everything
… can be real time random info
… but not assume that info on C
… is an attempt; in step 5
… we had to encrypt the ad
… could be a function in publisher context
… auction requires bid
… there is a time based issue

Brian: for example, one thing to consider
… preventing fraudulent ad request
… do that solely based on who user was
… and proxy could make sure no privacy compromising info was combined

Mehul: good idea, I would like to learn more if you can put that into an issue
… we can pick that up

Brad: wondering if you had plans to upstream the work for Edge
… along the lines if we can get
… these proposals being tested on multiple browsers and test side by side

Erik: We are very interested in that, Brad
… slide about to get to is about our initial experimentation plans
… for rapid evaluation
… we want to start doing commits to browser
… our preference would be to do upstream in Chrome

Brad: look forward to that

Michael_L: I had a couple questions
… first I made note you mentioned private and trusted parties
… only the browser being a trusted

Wendy: come back to Michael

ErikT: slide with performance characteristics
… for two PC
… what security model is this under?
… malicious or semi-honest

Nishanth: this is under semi-honest
… one clarification
… performance number, there is matrix vector computation happening
… 400 cross 400 matrix with vector of 400 plus 1

<Michael_L_> Not sure what happened

Nishanth: performance better if [missed]

<Michael_L_> I'll rejoin

Michael-L: adding third party
… for improvements
… in my experience moving from two party to third party slows things down
… does it provide triples?

Mehul: no two parties will collude
… in this setting, this third party has no input; setting can be greatly improved
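
[Illustrative sketch, not presented in the meeting: the "triples" in the question above are Beaver multiplication triples, which a helper with no input of its own can deal out so that two parties can multiply secret-shared values. The minutes do not record whether MaCAW's helper works this way. The modulus, the function names and the single-process simulation of all three roles are assumptions made only for this example, and it covers the semi-honest setting only.]

```python
import secrets

P = 2**61 - 1  # illustrative prime modulus for the shares


def share(value):
    """Split a non-negative integer into two additive shares mod P."""
    s0 = secrets.randbelow(P)
    return s0, (value - s0) % P


def reconstruct(s0, s1):
    """Recombine two additive shares."""
    return (s0 + s1) % P


def deal_triple():
    """Helper role: shares of random a, b and of c = a * b. It sees no inputs."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a), share(b), share(a * b % P)


def secure_multiply(x_sh, y_sh, triple):
    """Return shares of x * y given shares of x, y and one fresh triple."""
    (a0, a1), (b0, b1), (c0, c1) = triple
    # The parties open d = x - a and e = y - b; both are masked by random a, b.
    d = (x_sh[0] - a0 + x_sh[1] - a1) % P
    e = (y_sh[0] - b0 + y_sh[1] - b1) % P
    # x * y = c + d*b + e*a + d*e; party 0 adds the public d*e term.
    z0 = (c0 + d * b0 + e * a0 + d * e) % P
    z1 = (c1 + d * b1 + e * a1) % P
    return z0, z1


def secure_dot(weights, features):
    """Dot product of two integer vectors, one triple per multiplication."""
    acc0 = acc1 = 0
    for w, f in zip(weights, features):
        z0, z1 = secure_multiply(share(w), share(f), deal_triple())
        acc0, acc1 = (acc0 + z0) % P, (acc1 + z1) % P
    return reconstruct(acc0, acc1)


if __name__ == "__main__":
    print(secure_dot([3, 5, 2], [4, 1, 7]))
```

If the helper colluded with either computing party, the opened values d and e would reveal that party's counterpart input, which is why the no-collusion assumption discussed in this exchange matters.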

erikt: has traditional level of trust

Mehul: still third party with one corruption
… we don't allow collusion between two of the parties

Mehul: what happens if the browser server and helper collude and decrypt
… if that server and helper collude...user features

[too fast]
… no special trust with helper service; assumes no collusion happens

erikt: you might be participating but others might collude behind your back

Mehul: yes

erikt: client here is not the browser

Mehul: one second
… this is the client [points on slide]
… good bandwidth
… number of round...keep it on server side

erikt: might prefer not calling that client
… confusing that is the browser instance
… seems impractical for browser to...

Mehul: assume browser itself in similar network, give same performance

Erik: My feedback is better to say helper one, helper two
… client in this doc makes it feel like it is happening in the browser
… be concerned that browser downloads 50MB for each ad
… communicate this is a requirement
… might clarify the naming to prevent people from misinterpreting

Wendy: Thank you
… Michael are you back?

Michael_L: similar to question just asked
… why you are making assumption that browser and ad server would not collude
… think of one browser and ad server that colludes
… assure you that they collude
… what about breaking the encryption?

Mehul: encryption is one part
… it's browsers and ad servers colluding
… browser knows user data
… part of model
… need to be super setup
… limitation in design

ErikA: A fair question and concern
… browser vendors need to demonstrate they are not colluding
… a few ways to document how to avoid that
… open sourcing and demonstrate that
… a fair concern
… compared to alternatives
… with levels of trust from third parties, harder to do
… onus on us to demonstrate wherever there is helper service to have independent evaluation

Michael_L: have you considered some compliance requirement?

Mehul: have not gone that far yet
… discussion...we can add more to it
… more focus on technical part and start layering more details if you have some thoughts

Michael_L: is this FLEDGE or FLoC adjacent proposal
… timing attack only in FLoC/FLEDGE model
… is this a privacy preserving for use with F/F or ubiquitous

Mehul: first clarify on the timing attack
… not referring to FLoC or FLEDGE
… we are referring to our own PARAKEET models
… from computation perspective
… does all ad serving
… not timed to compare with FLoC, FLEDGE or SWAN
… user visits site; have user features
… could be retargeting, interest based
… not explicitly
… not clear on timing attack

Michael_L: smaller publishers can participate but adds a penalty
… adds 500 msecs to some requests
… 500 msecs is a lot
… have seen removal of publishers because of that

Mehul: Spurious ad request is where to remove parameters
… example of FLEDGE, a separate discussion
… in PARAKEET doing c' and s'
… how many requests...a way to group it
… request model, introduce more request and less scrubbing
… becomes more efficient in steps 4 and 5
… too early to comment on what @ looks like
… referring the spurious request
… Agree with the small publisher comment and we will keep that top of mind
… for monetization opportunities independent of size of publisher

Wendy: Michael Kleber

Kleber: on trust in browser service, an excellent question
… From Chrome's POV
… PARAKEET and MaCAW are a great development
… but our chief concerns are way to run browser service in way to get appropriate level of trust
… in the way that Michael L was talking about
… plus one to the proposal, and need a better answer
… for how to trust the browser and server being run by the same company

Mehul: that is an interesting and valid concern
… trust model to focus on is not ad server
… think about a bid server in future setup
… trust but two parties
… user trust
… bid models, focus more on user role
… you are right, may be right for user
… probably would work with additional guarding

Kleber: interesting question but needs more work

Erik_Anderson: some of this work is in progress
… we are about to publish to Github repo
… close to landing
… we have an early version of PARAKEET service and server to help reason over these flows
… how to do an early registration
… how to get in contact with us
… and have a more formal sign-up page for folks who want to engage and be aware of how you are using the service
… our goal is to have folks try out the API
… and after you have hands-on experience, get that into us early so we can iterate
… this initial one is similar to FLEDGE
… approach in terms of not being quite locked down yet
… not have all the anonymization pieces yet
… not get any more than with cookies today
… get enough traffic going through this prototype
… so anonymity checks
… give a more accurate assessment
… some chicken and egg to get one of them in place first
… that's it at a high level, and we will post more details
… let folks experiment with the service

AramZS: so, I'm curious for the test setup
… to test this the ad server needs to be given certain features
… ad server changes; unclear what those are

Mehul: I can give a quick answer
… key change
… probability to DSPs more than SSPs
… key change there
… we are talking about
… user ID or cookie gets passed around
… comes to ad server
… instead of user ID
… that's the change
… accept user features in that request
… or whatever they are doing
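
[Illustrative sketch, not presented in the meeting: the change described above, where the ad server accepts user features in the request instead of a user ID. Every class and field name below is hypothetical and chosen only for this example; none comes from the PARAKEET explainer or from OpenRTB.]

```python
from dataclasses import asdict, dataclass, field
from typing import Iterable, List


@dataclass
class LegacyAdRequest:
    """Today's shape: the request carries an identifier for the user."""
    page_url: str
    user_id: str  # e.g. a cookie-based ID (hypothetical field name)


@dataclass
class FeatureAdRequest:
    """Sketch of the new shape: features travel instead of an identifier."""
    page_url: str
    user_features: List[str] = field(default_factory=list)


def to_feature_request(legacy: LegacyAdRequest, features: Iterable[str]) -> FeatureAdRequest:
    """Build the feature-based request; the user ID is deliberately dropped."""
    return FeatureAdRequest(page_url=legacy.page_url, user_features=list(features))


if __name__ == "__main__":
    old = LegacyAdRequest(page_url="https://publisher.example/article", user_id="abc123")
    new = to_feature_request(old, ["interest:shoes"])
    print(asdict(new))
```

An ad server written against the second shape has to bid on the features alone, which is the DSP-side change the answer above refers to.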

AramZS: confusion as to who has to do what
… publishers have to make changes in what they send to ad server
… request would have to change; access change; a polyfill
… SSP or DSP receives those features
… you mean JS on publisher page on behalf of advertiser?

Mehul: no,....[missed]
… no change for SSPs
… set up for DSPs only
… DSP first makes ad server changes
… info in user interest
… Google JS as DSP, on Nike.com, user visits shoes
… that is first change

AramZS: you need a combination advertising leveraging the values
… and publisher collecting values and putting them into bid process

Mehul: that is correct, but as developer you can override that slot
… Google developer working on ads team changes DSP script
… goes to Nike.com, adds features in...
… goes to WashPost, write it

AramZS: I understand he is saying that override
… intercept request and use developer tools

Erik_A: Very first phase for a developer to understand model
… how it works end to end
… then you can access polyfill library
… right now just looking for early feedback

AramZS: so how to place stuff in user feature set
… that makes sense
… I wonder if a lot of publishers who do their own remarketing would be interested in this
… that all makes sense

Pranay: I had a question on the test setup
… Anonymization of contextual features?
… I don't see that.

Mehul: the anonymization of context, user features are independent of how much traffic we are seeing
… not part of goal in test setup
… current is to provide a working setup so ad request flows
… and data flows and APIs come together
… get that to parity with system, no privacy scrubbing
… if it comes to parity, put more traffic into it
… then we can test for anonymization
… and have more visibility into that
… and add functionality
… safety controls, etc.
… you are right, this is not the goal right now

Pranay: We will wait for that setup down the line in subsequent testing

Mehul: yes, many ways to do anonymization
… don't want to focus on one right now
… many combination of algorithms

Pranay: the @ has 6-7 data flows

[missed]
… we as publishers want to know in advance what kind of latency this means
… that is a bit far out, but important for us to understand the overall latency

<AramZS> Agreed! Latency is a big concern in what I'm seeing here

Mehul: yes, we are trying to implement on our own ad servers
… in the wild, with many people trying
… many we can know
… zero point one zero percent

Valentino: how do companies submit for the trial?

Mehul: We will publish the Github repo
… and you can try to implement using the API
… we are not there yet to have a portal for signup
… Erik?

Erik_A: more informal Github meetings or send us email
… we are currently working on a portal
… to make it less daunting to register
… let us know your concerns about the process

Wendy: good to hear you are having lively conversations in the WICG
… and in the Github repository there
… sounds like that's where people should go
… We are at the end of our hour
… Aram, do you want to say anything about the issue you raised?

Publisher deal types in use cases

AramZS: a lot of the proposals we have seen

<wseltzer> https://github.com/w3c/web-advertising/issues/111

AramZS: from browsers do not take into account things like direct sales at client side level
… they are not familiar with that process or execution
… header bidding was built for execution outside the ad server
… to make sure these concerns are brought forward more effectively
… I brought up bidding flows, direct sales flows in the repo
… great if we can bring some of these use cases into the doc of use cases and then map them against the proposals
… I want to make it clear how publishers use multiple SSPs in bidding process
… we mentioned speed, but there are many other issues

Wendy: Good, we can add to documents and bring up for future calls

Aleksei: about development
… now only JS library
… next step is to play with browser service, proxy
… who will play with that; how can I run on my local host?
… instead of talking to MS services?

Erik_A: not in the plan, we can talk offline
… point you to our alpha pre-prod thing

<wseltzer> [adjourned]

Erik_A: interesting idea

Wendy: Thank you
… sorry for going over
… thanks for great conversations
… see you next week

Minutes manually created (not a transcript), formatted by scribe.perl version 127 (Wed Dec 30 17:39:58 2020 UTC).

Maybe present: Aleksei, Basile, Brad, Brian, Erik, Erik_A, Erik_Anderson, ErikA, ErikT, KAnderson, Mehul, Michael-L, Michael_L, Nishanth, Pranay, Valentino, Wendy