W3C

– DRAFT –
Improving Web Advertising BG

15 February 2022

Attendees

Present
AramZS, aschlosser, blassey, bmay, dinesh-pubmatic, dmarti, GarrettJohnson_, hober, jeff_burkett_gannett, jrosewell, Karen, kris_chapman, l_pilot, lbasdevant, mjv, npd, pedro_alvarado, weiler, wseltzer
Regrets
-
Chair
-
Scribe
Karen

Meeting minutes

<wseltzer> feedback from the group on the current state of the proposal vs use cases. https://github.com/w3c/web-advertising/issues/134

Wendy: Welcome folks

[Wendy reviews agenda]

Wendy: Lots of good items for discussion
… Let's start with introductions

Introductions and Agenda Curation

Wendy: and agenda curation
… Do we have anyone new who would like to introduce themselves?

<wseltzer> Alexandru: Product manager for CRUMBS

Alexandru Daicu: I work for @
… Glad to be here

Wendy: Anyone else?
… any further agenda requests?

Questions on Topics API and FedCM

Wendy: Let's jump to follow-up from last week's discussion
… we wanted to hear more about the Topics API and Federated Credential Management
… Let's invite the queue
… Join us in irc and "q+" if you have a question

GarrettJohnson: Hi there
… I have one comment and two questions about Topics

<GarrettJohnson_> * 1) Since sites opt in to Topics, the most commercially relevant domains have better options: i.e. contextual targeting & FLEDGE IGs. Thus, one concern is that we should expect most commercially relevant Topic groups to mostly be users assigned to this at random and to find real users in the most anodyne Topics groups. * 2) Topics API incentivizes ad platforms to appear on as many pages as possible: how does this affect page load speed? [CUT OFF]

GarrettJohnson: I'll post

<wseltzer> Minutes from last meeting

[scribe thanks Garrett]
… Last question is researchers can use browsers; can researchers ID these third party domains?

<wseltzer> Topics API

<GarrettJohnson_> 3) Currently, researchers can use the browser to identify the 3rd party domains that browser interacts with and inspect information flows. Will it still be possible for researchers to identify the 3rd party domains and which topics are sent?

JoshKarlin: Let me start with second question first
… how would Topics affect performance
… good discussion to have here
… API only applies
… getting credit for having seen a user on a site about a Topic
… would need to have an iFrame on the origin
… one discussion
… on Github
… could we send Topic requests via Fetch, via iFrame attribute to get Topic as well via XHR
… others do this, like Trust Tokens and of course cookies
… a more lightweight approach for Topics and registering

<npd> there's already a current incentive for third parties to load as many pages as possible, to fingerprint or to drop cookies. it would be nice if we actually improved on that status quo

JoshKarlin: Not settled but under discussion
… I think you are saying for some domains that have

<wseltzer> https://github.com/jkarlin/topics/issues Issue List

JoshKarlin: more commercially relevant
… they would have contextual targeting in FLEDGE and would not need Topics?
… that is not at all obvious
… Seems like something we need to experiment with

Garrett: Shall I comment on that?
… My point is there is a bunch of Topic segments pretty valuable like Car and Finance
… if you are car buyer, doesn't make sense to participate in Topics
… other people can target this group and not need Topics
… set up a FLEDGE IG
… if it's opt-in basis
… if you are sitting on valuable domains, doesn't seem like you would want to participate in Topics

Josh: Interesting Game Theory; I could see that
… possible that one of those sites would be interested to use the API for broader interest

<npd> wouldn't carbuyer.example also have the ability to market access to third parties who want to see the user on that site?

Josh: unclear that that Topic won't show up for that user
… do others have thoughts?

AramZS: wearing my publisher hat for a moment
… I can also see why a site would be incentivized not to activate the Topics API
… not sure why that is a bad thing
… One of problems publishers have currently
… is data leakage can cause users
… to be tagged by various third parties, in this case, the Topics API
… and then retargeted on cheaper sites
… removing from publisher site opps and forcing prices down
… If I have valuable site and Topics
… I would want to not use it, and I don't think that's a bad thing
… As a publisher with a valuable user base
… not a bad choice to not allow my users to be retargeted
… If I am mistaken
… I don't think intention is to force
… users outside of its own context to be tracked

Josh: yes

AramZS: seems to be a working as intended feature, so not sure what the issue is

Garrett: concern that there won't be valuable Topics for users

<npd> GarrettJohnson_ (IRC): I'm not sure we heard as much about your researcher question. is your concern that a researcher or a user won't be able to instrument their browser to see where data is flowing? or a different kind of researcher use case?

<AramZS> Yes, I fully invite advertisers who don't see sufficient scale of a topic on the Topics API to come, instead, and find those users through our direct sales team.

Garrett: that is a concern; I don't see the API changing and forcing publishers to use it

Wendy: Thank you

BrianMay: Looking to me like the issue coming up
… is it worthwhile putting this thing together
… if people with valuable signals, will keep to themselves
… would indicate this is a site I would not want to buy against
… would not be able to get enough of my interest, so it's actually becoming a counter-signal

MichaelKleber: I think that the on-going discussion
… of people wanting to keep signals to themselves and not use Topics API
… is actually helpful
… in today's world of third-party cookies, this same problem is already the case
… a valuable site...
… as soon as you let third parties onto your page
… you are giving away that info in some sense
… the nice thing about Topics
… is the way things work today
… publishers can effectively give away that info without thinking that they are doing it
… the Topics API makes clear and explicit that this info flow is happening
… Now it's happening without explicit intention
… to be in a world similar to today, they can continue

<AramZS> In fact the Topics API is more control for the eTLD+1 to control that data leakage through a CSP I think?

MichaelKleber: if they don't like something, out of lack of thinking or paying attention

<npd> it seems like the publisher has an explicit option today to include or not include the third parties they embed

MichaelKleber: then Topics making it very clear seems like an advantage

BrianMay: A lot of publishers don't have a choice
… publishers with valuable content will make choice

Michael: Then third parties will need to make the pitch as to why they offer value as a result

Kris_Chapman: I was wondering...difference between

<npd> presumably the publisher should also see value in being able to target ads based on user interests

Kris_Chapman: and IG defined by brand or publisher in FLEDGE, and then a Topics
… seems there are three levels
… Topics and a self-defined IG; one for reach, one for targeting whom to reach
… wondering if discussion whether to pick contextual or interest based
… and how to pick self-defined vs. a Topics

Josh: Most obvious difference between them for usability
… Topics you can call from publisher context
… less fine grained data
… with FLEDGE you can make whatever topic you want
… Topics meant to be easy to use

Kris: That makes sense
… is there more institutional logic in FLEDGE on how to define these things
… Michael?

Michael: yes, that is absolutely right
… FLEDGE is enabling logic for anyone to create an audience
… Topics is about straightforward control of logic
… if a party all in on FLEDGE
… and have done all the work to serve ads that way, then maybe you will just use FLEDGE
… and Topics is small part of signals you use
… and figure out when to add people
… I think you are exactly right
… FLEDGE is a lot more work
… it's a heavy lift to start using FLEDGE
… it's a big change to how ad serving works
… Topics is intended to be more usable

Kris: Will tech providers build that logic into their choices, or is there something Google would push about the logic
… from advertiser or brand perspective, how that logic happens
… if done from every adtech platform, you'll have different results

Michael: yes, I expect that is what will happen and adtech firms will compete with each other on how good their platforms are
… Our goal is to be a platform for adtech to do their stuff on top of

Paul_Bannister: going back to conversation
… with Aram and Michael
… Topics is an upgrade from current world where Publisher gets control over what is being shown
… and game theory aspect
… We should design a system that publishers will want to use
… indicative value...like cars.com, and don't want to participate
… and general interest sites with low value, and want to participate
… doesn't seem like a fair give-get
… where sum is greater than the parts
… if there is a more equitable give-get, that would be a goal

<wseltzer> AramZS: Inevitable that there will be some systems that some publishers don't want to use

<wseltzer> ... interesting thought that this could be a negative signal of low quality, which could also be useful

Paul_Bannister: I think there is some additional incentive
… when Topics API applied on basis of more than just domain
… very little incentive for WashPost to implement the Topics API
… to give only news
… don't need to indicate to a user
… even with a more detailed set of Topics
… I imagine there will be smaller and legit publishers who will use it
… and larger publishers with sales teams who will see it as a data leakage
… true for some, but not all publishers
… Don't think there is an API design that could get around that
… Moving to a different topic
… maybe a way to think about Topics API moving to a more positive signal
… specific sites that could be blocked
… or ask API to block particular sites from participating
… flip side of large publishers not using it
… is bad actors would want to use it
… brand safety
… check my ads folks
… have gone out of their way to remove systems to remove hate speech
… this might be a layer to allow those sites to create factors to monetize
… Chrome would have to add a block list
… would be asking a lot for a browser to maintain a list
… or call API and maintain some way to ask API not to apply on particular sites
… curious if this issue has been thought through or how it might be addressed?

Josh: Is concern that there would be a malicious player
… that browser or publisher doesn't want to access the API?

Aram: inverse, that an adtech system may not want to access the API

Josh: Only way for brand or Adtech to access is via an iFrame
… cannot be forced into using the API
… concern with fetch request or iFrame request
… use API and here is your Topic
… then don't have a choice

<npd> I would certainly want my browser to help restrict access to API if it's being used for purposes other than just targeting a current ad

Josh: we have to think that through carefully
… maybe think about server side saying yes
… I definitely hear your concern there

Michael: To be very clear
… the answer is what you are asking for is how we designed the Topics API in the first place
… if as a publisher there are sites you don't want to use for brand safety, then don't invoke the Topics API on that site
… not affect Topics you see on other sites
… if no one calls the Topics API
… no way for site to see...or what things to see on other sites
… some adtech may call on other sites
… some division of web will get different views by their own choice

Aram: I see the other side of it
… the publisher may need some type of header so as not to display
… this implies every adtech provider would need to maintain their own block list
… as we know from current environment
… can end up on site they don't want to be on
… adtech needs to keep a block list on specific URLs or domains
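
The per-provider block list Aram describes could be as simple as a domain check made before an ad tech frame asks for Topics. A minimal sketch under stated assumptions: the function name, the list contents, and the domain-plus-subdomain matching rule are all hypothetical, and nothing here is part of the Topics API itself.

```typescript
// Hypothetical helper: decide whether an ad tech provider should skip
// calling the Topics API on a page, based on its own block list.
// A page is blocked if its hostname equals a listed domain or is a
// subdomain of one.
function isBlocked(hostname: string, blockedDomains: string[]): boolean {
  const host = hostname.toLowerCase();
  return blockedDomains.some((entry) => {
    const domain = entry.toLowerCase();
    return host === domain || host.endsWith("." + domain);
  });
}

// Illustrative list only; a real provider would maintain its own.
const exampleBlockList = ["hate.example"];
```

Matching on a leading dot keeps a lookalike such as `nothate.example` from being caught by an entry for `hate.example`. Blocking individual URLs, which Aram also mentions, would need path matching on top of this.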

Tess: ask for a clarification

<wseltzer> Josh: yes, you can use permissions policy as a gate

Tess: assume, gated on permissions policy
… can prevent third party embedder from @

Josh: can use permissions policy to control who can use it on your site
… or a sub-frame can make decision
… if you don't use the API, then your site is not included at all
… in the user's Topics calculation for that week, and the adtech origin is not aware that the user visited that topic
… what I was referencing earlier
… it's challenging to call iFrame...perhaps send Topics and request headers
… send user's Topics along with that
… being opted in without their say to receive these Topics
… as currently designed, only way to receive Topics is if adtech wants to
… so we are being careful there
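
The permissions policy gate discussed above can be illustrated by building the header value a publisher would send. This is a sketch, not text from the proposal: the feature name `browsing-topics` is the one used in Chrome's Topics documentation and is not stated in these minutes, and the helper name and `adtech.example` origin are hypothetical.

```typescript
// Hypothetical helper: build a Permissions-Policy header value for one feature.
// "none" blocks every origin, "all" allows every origin, and the object form
// allows the publisher itself and/or a list of named third-party origins.
type Allow = "none" | "all" | { self: boolean; origins: string[] };

function permissionsPolicyValue(feature: string, allow: Allow): string {
  if (allow === "none") return feature + "=()";
  if (allow === "all") return feature + "=*";
  const members: string[] = [];
  if (allow.self) members.push("self");
  // Origins are quoted strings in the header's allowlist syntax.
  for (const origin of allow.origins) members.push('"' + origin + '"');
  return feature + "=(" + members.join(" ") + ")";
}
```

A publisher that wants no one to call the API would send `Permissions-Policy: browsing-topics=()`. One that has an agreement with a single provider could send `browsing-topics=(self "https://adtech.example")`.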

<npd> there have been proposals for Client Hints to include an Accept- style response header

Achim: @....Topics on another site
… question for Josh if you can dive a bit more into the classification of @ names
… sub-domain classification; should not be about signal; how to produce meaningful results

Josh: nice to use page content or URL
… where we stand is not quite confident to do that yet, because URLs can contain sensitive information
… use the host name, top level domain, plus host name plus sub-domain
… training data is host names and then human labelled Topics
… and see those names and what Topics are there, and we train on that
… there is a lot of question

<npd> I'm not clear on what the sensitivity difference is between the domain and the path

Josh: X thousands sites, would we include the human labelled data; will be some false positives

<kleber> npd: If you do a search in Google Photos, for example, the search term appears in the URL's path component :-/

Josh: maybe we supplement the model with human-labelled
… can imagine sites...like drive.google.com
… could think "cars"
… why we are thinking of supplementing the top sites
… discussions about gaming this, and how big is that risk
… one site wants to be most valuable Topics, like cars
… so every site adds "my site's topic is cars"
… then it will call a car ad
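
Josh's idea of supplementing the hostname model with human-labelled data for top sites can be sketched as a lookup that takes precedence over the model. Everything below is an assumption for illustration: the function names, the topic label, and the toy keyword "model" stand in for the real classifier, which these minutes do not describe in detail.

```typescript
type Topic = string;
type HostModel = (hostname: string) => Topic[];

// Prefer a human-curated label for the host when one exists;
// otherwise fall back to whatever the model infers from the hostname.
function classifyHost(
  hostname: string,
  overrides: Map<string, Topic[]>,
  model: HostModel
): Topic[] {
  const key = hostname.toLowerCase();
  const curated = overrides.get(key);
  return curated !== undefined ? curated : model(key);
}

// Toy stand-in for the ML model: naive keyword matching on the hostname,
// which mislabels drive.google.com as a cars site.
const toyModel: HostModel = (hostname) =>
  hostname.includes("drive") || hostname.includes("car") ? ["Autos"] : [];

// Hypothetical curated entry that corrects the false positive.
const overrides = new Map<string, Topic[]>([["drive.google.com", []]]);
```

With these definitions, `classifyHost("drive.google.com", overrides, toyModel)` returns an empty list, while `cars.example` still falls through to the model. A curated list like this does not address sites declaring their own topics, which is the gaming risk raised in the discussion.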

<npd> kleber (IRC): paths can absolutely be sensitive! I was just thinking that domains can also be sensitive, and I'm not sure one is inherently less sensitive than the other

Josh: we worry about the API being polluted and users' Topics no longer having value

<Zakim> npd, you wanted to comment on other implementations of the API

Josh: want to make sure they are correctly classified

NickD: something I think is coming up and worth discussing
… as we get towards standardization
… state which things are implementation...or request topics for targeting
… you may have different implementations not inferred from browsing history
… or infer it from 'this ad is good'
… or I'm willing to select or remove Topics based on interest
… how are these inferences going to be drawn on the client?
… Maybe users will pick Topics in different ways and pick browser and UIs as way to do that
… and not just ML on domain names

Wendy: any response?

MichaelK: yes, we completely agree, the API should be a different thing from the logic
… built so you get a separate tag
… to go with the Topics assignment algorithm

<AramZS> I think the likelihood is that sites will try to game the system actively.

<AramZS> Incentives are VERY high

MichaelK: if different browsers make different choices of algorithm...or signal to consumer [did not get all of that]

Wendy: thanks for pointing us to the standardization considerations

Olaf: the discussion has been a bit one-sided
… if I'm car.com
… will attract car people
… or WashPost, I'll attract news
… but maybe someone wants ads for different topics
… not just attract car ads
… attract ads for fashion or travel
… I think we have been one-sided

<npd> I'm not sure you even need to tag with the algorithm for the API consumer. if I tell you what ad topics I'm interested in, I'm not sure I also need to tell you how I decided my own interests :)

Olaf: owners of valuable domains might want to fill up with ads they haven't been able to sell
… Wonder if Topics might be an interesting addition to them
… Am I only one with that opinion?

<blassey> npd: here's another bit of prior art for browsers having users selecting and curating topics they are interested in https://wiki.mozilla.org/Content_services#:~:text=Mozilla%20Developer%20Network.%22-,UP/Intent%20Engine,-UP%20is%20our

<npd> no spec yet that I've seen :)

BrianMay: I'm a little confused about how Topics API gets triggered; seems like from an IFrame
… if publisher has not opted out, then they have tacitly opted in

Josh: Yes
… Topic of that publisher page will be known to that party

Brian: So publisher could opt-in without their knowing it

<npd> but they could use Permission Policy to explicitly say yes or no

Josh: If they are unaware of the adtech, then yes

@: could we prevent unknown publisher?

@: so publishers could just set the policy; since they have no way of knowing

Josh: you can restrict the permissions policy; can say who is allowed

Michael: Maybe what Paul and Achim were getting at

<AramZS> Seeing as the queue is closed I will note that while we are interested in testing Topics API, I'm not sure it would actually have value for us as a large publisher with a significant direct sales group handling our own domain. I can't say for sure without the actual testing. Though I know other large publishers would likely be more definitively uninterested in the implied leakage of user data. And I'm also sure some would.

Michael: one third party we are willing to let call the API...they are offering us money to do that
… and can permit them to add cars.com visitors
… who are interested in cars, from FLEDGE or calling Topics on that site POV
… that is perfectly reasonable
… for adtech companies and publishers to come to an agreement about
… not sure if that is what Achim and Paul were driving at

Achim: yes, that is what I meant

Michael: The FLEDGE API is well-suited for who is allowed to build audiences
… Topics API is not directly designed for it, but permissions policy gives you that control
… and could be used that way if people wanted to

DonMarti: hi, wanted to go back to topic of
… niche or topic-specific sites being on uneven footing with sites of more general interest
… that might not produce as much usable Topics info

<Katherine_wei> Can these privacy sandbox APIs such as Topics and Fledge be used on Chrome mobile apps? Assuming yes but wanted to confirm

DonMarti: understand concern for not using the URL path due to sensitive info
… and understand concerns about not wanting to use Topics that site claims for itself
… could a site be allowed to specifically designate section
… as in this is the sports section of 'big metro newspaper' or this is the business section
… or specific contributors, like this is Bob's or Alice's channel

<npd> subreddits, for example

DonMarti: and spec your own domain for Topics

Josh: A sub-domain is included in a Topic
… so a sports domain
… will assume it will be about sports if labelled correctly
… how it is done today
… I'm all ears for how adtech can make this classification better

<AramZS> I would like this as well on one hand... on the other hand I imagine that this would be very easy to game.

Josh: cannot understand how it avoids problem of gaming...love to hear you on that

<dmarti> https://github.com/jkarlin/topics/issues/17

Josh: suggestions that ML model is in charge, or maybe get extra info
… browser thinks it
… is close enough to Topic on a page
… there is a way to do this

DonM: splitting out subdomains has its own concerns
… understand you don't want a site to say, Topics are such and such
… but all the privacy policy is in its own section so the ML gets trained on legal info, not the actual content

<kris_chapman> it feels like this is pushing advertisers to use specific ad tech/publishers - which works for ad tech/publishers, but not generally liked by advertisers

DonM: I dropped in a link to the issue

<wseltzer> https://github.com/jkarlin/topics/issues/

Wendy: noting an active discussion in the Topics API list, encourage you to head there

Angelina: as you are talking about trying to get accuracy for the Topics

<npd> this sounds like an interesting implementation-specific detail where UAs can come up with their own system to accurately infer interests or learn them from users in different ways

Angelina: and to trust if content is being indexed properly
… Google Search has a history of accurately providing topics and content based on keywords
… getting one's company listed properly may be one suggestion
… my question is more around the number of Topics
… thinking about buy-side, agency side for so long
… amount of content Topics that an advertiser has
… questions around the number of Topics besides one per API caller
… could that result in issuing too many ads
… my habits, consistent week over week, but same time, don't want topics that dominate my online behavior to be the only ones to dictate what I see
… thinking about what else could be assigned to a browser

Josh: Great question
… different sites will receive different Topics
… if only have Topics from Topics API, will have five Topics for this week, all mixed together
… hopefully there is some variety there and not all monotonous
… adtech will likely take advantage of additional topics
… can draw upon older data on what is of interest to the user
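
The behaviour Josh describes (five Topics per week, different sites receiving different Topics, and a caller only getting credit for Topics it has seen) can be sketched as follows. This is a deliberately simplified model under stated assumptions: the data shapes, function names, and the hash used to vary the pick per site are hypothetical, and it omits the random noise and multi-week history in the actual proposal.

```typescript
interface Visit {
  topics: string[];   // topics the browser assigned to the visited host
  callers: string[];  // ad tech origins that called the API on that page
}

// Rank topics by number of visits, counting only pages where the API was called.
function topTopics(visits: Visit[], limit: number = 5): string[] {
  const counts = new Map<string, number>();
  for (const v of visits) {
    if (v.callers.length === 0) continue; // site did not take part
    for (const t of v.topics) counts.set(t, (counts.get(t) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map((entry) => entry[0]);
}

// Simple deterministic string hash so each site gets a stable pick.
function siteHash(site: string): number {
  let h = 0;
  for (let i = 0; i < site.length; i++) h = (h * 31 + site.charCodeAt(i)) >>> 0;
  return h;
}

// Return this week's topic for a caller on a site, or null if the caller
// never observed the user on a page about that topic.
function topicFor(visits: Visit[], caller: string, site: string): string | null {
  const top = topTopics(visits);
  if (top.length === 0) return null;
  const pick = top[siteHash(site) % top.length];
  const observed = visits.some(
    (v) => v.callers.indexOf(caller) !== -1 && v.topics.indexOf(pick) !== -1
  );
  return observed ? pick : null;
}
```

Because the pick depends on the site, two publishers can see different entries from the same weekly top five, and a caller that was never present on a page about the picked topic gets nothing back.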

Angelina: Will users have option to add their own Topics?

Josh: That is entirely up to the browser; I cannot say

Wendy: Thanks for this discussion
… I closed queue to get a brief update on some of the other items here
… the PAT CG had a lively F2F last week
… Aram offered to give us a progress report
… Aram, can you spend a few minutes on that?

PATCG Progress report (AramZS) https://github.com/w3c/web-advertising/issues/133

Aram: high level overlap for this group

<wseltzer> https://github.com/patcg

Aram: PAT CG met last week; six hours of meeting time and went through a whole lot of stuff
… main focus was on measurement
… we are actively spinning out additional topics on measurement
… as things are being talked about and discussed
… our meetings are not as frequent
… so we expect much of discussion to happen in Github and with pull requests
… if you want to get involved, check out the issues
… good place to get engaged
… you don't have to know markdown to interact on an issue
… trying to get everyone involved
… briefly, Topics API and potentially PAT CG talking about it in that context
… there is an issue about it; so please engage if interested
… we will be scheduling another meeting soon
… It is a Community Group, so anyone can get involved
… just sign up via the CG page
… open up to any questions
… if you are interested in the measurement proposals, that is happening in PAT CG so good place to get involved

Wendy: I put the Github repository in irc so people can follow

Rotem: Firstly
… I am part of this group [PAT CG]
… info is shared in nice way; well documented

<AramZS> That's good to hear, thank you!

Rotem: wondering if any plan to make meetings more friendly for those of us based in Europe

Aram: yes
… we will put up a poll and be more friendly to our European participants
… we have participants from all over the globe
… but yes, next meeting will be more European time zone friendly and we will switch back and forth

<kleber> I look forward to the US-at-2am meeting era

BrianMay: We got a number of things in PAT CG that started elsewhere

<npd> applause for equal unhappiness

BrianMay: if talking about Topics in PAT CG, we would also want to talk about it in GitHub for Topics

Aram: yes, question if PAT CG should take up Topics in some way
… hope to resolve this week
… pending proposal authors agreement, to follow the CG model of bringing into PAT repository as an official thing we take up
… discussion about Topics API should happen in Topics Repo, whereas what to do about them happens in PAT CG

<wseltzer> https://github.com/patcg/meetings/issues/32

Brian May: So move into PAT CG so they are all in one place?

Aram: yes, where Privacy CG has taken up things brought up
… and in set of repos that PAT CG manages
… does that answer your question?

<AramZS> That is indeed the correct issue

BrianMay: It does

Wendy: link to issue 32 on taking up Topics API
… if you were referring to a different one, please add link
… to follow
… question for PAT CG on what they will take up and to know where things are going
… and see where people want to participate
… CGs are open to all
… click the join link for the contributor agreement if you want to participate
… and we look forward to hearing and seeing your progress
… This takes us to the end
… Other topics that we will push over to future meetings
… Geo-IP use cases
… requirements of first-party sets
… and other topics you want to add for future meetings
… We next meet on March 1st, two weeks from now

<wseltzer> [adjourned]

Minutes manually created (not a transcript), formatted by scribe.perl version 185 (Thu Dec 2 18:51:55 2021 UTC).

Maybe present: @, Achim, Angelina, Aram, Brian, BrianMay, DonM, DonMarti, Garret, Garrett, GarrettJohnson, Josh, JoshKarlin, Kris, Michael, MichaelK, MichaelKleber, NickD, Olaf, Paul_Bannister, Rotem, Tess, Wendy