<steveb> Hi
<wseltzer> chair: Travis
<scribe> scribenick: slightlyoff
Travis_: proposed by
jrosewell
... opportunity to debate UACH tradeoffs; not sure how we want
to proceed
... do we want to go through open issues?
<AramZS> is there a github repo?
<AramZS> ah here
<AramZS> https://github.com/WICG/ua-client-hints
jrosewell: yoav added tags to some issues asking for feedback? Perhaps start with usecases?
<Travis_> https://github.com/WICG/ua-client-hints/issues?q=is%3Aopen+is%3Aissue+label%3Afeedback_requested
<discussion of which issue to start on>
Travis_: can you give us some background on this, James?
jrosewell: 2 areas: HTTP header
for UA; no definition for how that should be structured, so
conventions have materialised over the years. Problematic for
parsing and structure. Second consideration is re:
fingerprinting
... one of the other uses is fingerprinting and detection;
e.g. for fraud detection
... analytics, etc.
... there's strong overlap with those use-cases and
tracking-prevention policies from Mozilla/Apple/etc.
... Accept-CH is a header that provides extra information
today, currently providing bandwidth, memory, etc. Proposal is
around adding more fields, particularly information currently
part of the UA header. There's some complexity around something
called GREASE
<weiler> that misrepresents GREASE
jrosewell: an issue around the
need to potentially obfuscate.
... another issue around the structure and value of the fields.
Aligning the fields and making them easier to
extract/parse.
... another issue around the first/second request timing
... another document that discusses the entropy that a device
provides in different situations
... <discussion of parties involved>
<Masinter> I wanted to ask if anyone had considered the old IETF work on Media Features
Travis_: does anyone else want to supplement this description of the feature?
<steveb> Perhaps, I think it relates to what the data is for. For example, if Sec-CH-UA is for telling the server what 'user-agent' (i.e. browser) the client is using, GREASE would appear to get in the way of that responsibility.
<Masinter> yes, thanks wendy
jrosewell: <discussion of how/when browsers apply rules/policies>
<AramZS> @MasInter: can you link?
jrosewell: my business provides
services that provide device information services; not in the
grey areas talked about...feature phones in sub-Saharan Africa
making server-side optimisations based on device model
... making heavy use of this
<Masinter> RFC 2506, 2913,2533
Travis_: this is on first request?
jrosewell: yes.
... there was a companion proposal for potentially making this
available on first request
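[Editorial sketch, not part of the discussion: the first-request concern can be illustrated with a hypothetical server-side handler. It assumes the server opts in via an `Accept-CH` response header and that the device model arrives in a `Sec-CH-UA-Model` request header on later requests. The model names and variant labels are invented for illustration.]

```python
# Hypothetical set of device models that should receive a lightweight page.
LOW_END_MODELS = {"ExamplePhone 100", "ExamplePhone 200"}


def choose_response(request_headers: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Pick a page variant and the response headers that request the hint."""
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in request_headers.items()}

    # Always advertise the hint we want; a client that honours it is only
    # expected to send it on subsequent requests, not on the very first one.
    response_headers = {
        "Accept-CH": "Sec-CH-UA-Model",
        "Vary": "Sec-CH-UA-Model",
    }

    # The hint is serialised as a quoted string; strip the surrounding quotes.
    model = headers.get("sec-ch-ua-model", "").strip().strip('"')

    if not model:
        # First request (or the client declined): no model is available yet.
        return "generic", response_headers
    if model in LOW_END_MODELS:
        return "lightweight", response_headers
    return "full", response_headers
```

[On the first request the handler can only return the "generic" variant, which is the gap the companion proposal mentioned above aims to close.]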
<AramZS> For those not familiar with the URL structure:
<AramZS> - https://www.rfc-editor.org/rfc/rfc2506.html
<AramZS> - https://www.rfc-editor.org/rfc/rfc2913.html
<AramZS> - https://www.rfc-editor.org/rfc/rfc2533.html
<AramZS> Thanks Masinter!
jrosewell: lots of firms involved
in analytics use this data too. First party (site owners)
learning how to make their sites better, e.g. based on which
OSes and features are available. Next are aggregated analytics
(e.g. statcounter), comscore, ipsos, etc. Aggregated from
multiple sites
... the aggregated analytics cases are potentially impacted by
this. Tried to collate these in our PR.
Travis_: using the queue for our discussion today
<jyasskin> The HTTPWG at the IETF had a discussion of first-request client hints at https://httpwg.org/wg-materials/interim-20-10/minutes.html#client-hint-reliability.
Masinter: wondering if I'm misunderstanding...have you looked at older work on media features from IETF? Trying to describe capabilities, characteristics, and content-type ("the 3 c's") as a model. Didn't succeed because the client may report something which the server couldn't trust (buggy), so folks moved to UAs instead
Travis_: is there a question there?
Masinter: question is: have you considered that older work?
jrosewell: appreciate the link to that body of work...that original use-case is partially what that information is used for today; trying to understand what that device can do, its capabilities, etc.
yoav: I wasn't aware of that earlier work, Masinter. Perhaps tackles a slightly different problem? CH doesn't try to tackle feature detection. CH as a draft has been in the HTTP WG for several years and is now graduating to an experimental RFC. In the review process nobody raised that earlier work
AramZS: biggest problem that CH
needs to address is the question of fraud and how it's dealt
with
... some feature detection outside of CH is available....two
interests: how Users may restrict data about themselves, and
how a website may restrict data available to third parties (for
lack of better terminology)
<yoav> https://github.com/WICG/ua-client-hints#spam-filtering-and-bot-detection
jrosewell: 2 different scenarios; fraud detection services based on historic interactions resulting in, e.g., a captcha box...and once you get into a page that's loaded and can call APIs you can learn a lot more for identifying fraud
weiler: I don't think I understand these fraud use-cases
<Zakim> weiler, you wanted to ask Aram to say more re: fraud uses
AramZS: as a publisher we have our own fraud detection issues. A few main concerns: the first is fraudulent visitors. Bots (mostly) or some sort of click-farm operation...don't want to serve them some resources if they aren't legit...e.g. ads or the whole site. 3P fraud; advertisers also want to guard themselves, don't want to serve their ads to bots either. Ad networks don't trust publishers to report fraud information. Those networks need
to be able to make assertions.
<weiler> /me may I interrupt?
AramZS: some capability detection, and some are user-agent based. E.g., not on a bot list. Some degree of fingerprinting. Ignoring its valence, it's being used to identify bots today. Seeing them in one place then blocking them on next encounter. Once detected as bots, a UA and other properties cause those bots to be blocked. If an ad is shown to a bot and then the bot is re-classified, there may be an accounting 'make good' to account for
"illegitimate" impressions
weiler: so you're using "fraud" to mean "bot detection" primarily?
(sorry, scribe interrupt for 30 seconds)
AramZS: questions of DDoS...a big
problem for smaller publishers who are reliant on 3p
solutions....a redirect to captcha for too many users (dialed
up)
... the lines are helped to be set by, e.g., 3p join-up of a
cookie for a user that has previously passed a captcha
yoav: for fraud detection, linked
previously to the explainer. Trying to include it in the UACH
proposal. In that section, under "fingerprinting", but agree
with you that it can be considered different in kind.
... question: what parts aren't covered in that use-case
section? How can we do it better?
... is there something that's preventing you from accepting
client-hints, either for your own or for 3p origins? Can you
delegate that effectively w/ CH?
jrosewell: AramZS talked about publisher fraud, and we also see "survey fraud"...folks getting paid a small amount to fill out surveys
Travis_: don't really want to rathole on various sorts of fraud
jrosewell: currently the
language's relationship to privacy budget is unclear. Who gets
to decide?
... very dangerous...who gets to make the decision?
<AramZS> jyasskin: I'm not seeing anything at that link? But yeah, I would love to talk more about that. I played around with it and it looked like it somewhat worked, but wasn't sure
jrosewell: on the first request side of things, if you're making a ping to some environment to get access, there will be a performance impact
<eeeps> jyasskin: AramZS: that delegation is now defined in https://wicg.github.io/client-hints-infrastructure/ (and https://w3c.github.io/webappsec-permissions-policy/)
<Zakim> cpn, you wanted to mention another use case
<AramZS> Ah thank you jyasskin I will examine those.
cpn: just wanted to mention another use-case; similar to jrosewell 's point re: first request. For interactive TV applications, we're targeting non-evergreen environments. Targeting different models and manufacturers of devices to serve javascript that contains workarounds for specific models and devices
<yoav> https://tools.ietf.org/html/draft-davidben-http-client-hint-reliability-01
cpn: looking at CH with some interest to understand if we retain the ability to continue to work around issues
Travis_: jrosewell when you introduced the topic, you mentioned GREASE and analytics...were there other high-level topics you wanted to discuss?
jrosewell: who makes the
decisions. Also, migration strategy. Used in many ways no one
person can understand. Want to see migration done incrementally
over time. Millions of websites are using this feature....if
it's a half-day job for one site, that's millions of
half-days
... the complexity of the new solution; are there alternatives?
Can we tidy up what's there instead?
Travis_: <recaps
topics>
... going back to yoav, did you want to continue on
bot/entropy/fingerprinting?
<AramZS> No need for me to queue for this: but as transitions go, from what I've seen in Canary the switch over does sound fairly reasonable in terms of time request. That said, a version of the rollout where both are simultaneously available with decreasing quality on the old method seems reasonable?
yoav: on first-request, I posted a link to a CH reliability proposal that will address it (an IETF draft)
Travis_: can someone give us an overview of GREASE?
yoav: I can try
<weiler> [I like the critical-CH proposal.]
<AramZS> weiler: link?
<weiler> https://tools.ietf.org/html/draft-davidben-http-client-hint-reliability-01
yoav: protocols tend to ossify; receivers of protocols tend to rely on de facto existing values. Protocol extensibility is in theory valuable, but ends up being irrelevant. In TLS, GREASE is used to exercise *all* the protocol features/values in order to make sure that clients handle all extensions.
Travis_: so a way to keep implementations on their toes?
yoav: a way to keep implementations conformant; hopes to keep protocol from ossifying. In the context of CH, we have seen over the years that various properties/sites rely on the UA in ways that hurt untested browsers. Want to avoid that this time around.
<AramZS> oh this critical-CH proposal is interesting!
yoav: want to make sure that consumers rely on structured headers instead of bad regexes
<MikeSmith> this is https://wicg.github.io/ua-client-hints/#grease I guess
yoav: want to make sure we don't repeat the mistaken abuse that happened with UA. Harder to deal with deliberate blocking, but we can relieve a big compat concern if we ensure folks don't shoot users in the foot accidentally
<AramZS> ahhh so many meetings so little time haha, thank you for the link
<MikeSmith> https://tools.ietf.org/html/rfc8701
steveb: working with jrosewell at
51 Degrees...RFC 8701...struggling to understand how this
extends to CH. CH tells you what the browser is (which the site
can use) or it's not (because it's so randomised that it's not
useful)
... trying to say "it's going to contain this information" but
so random that it doesn't
thanks, MikeSmith
jrosewell: if the goal is to
avoid regexes, we put a lot out in OSS in order to avoid
this
... there are ways around regexes that are working well
<steveb> The relevant RFC for GREASE in TLS is 8701
yoav: where we're currently using
GREASE in the latest UACH impl in Chromium is to add another
value to the brand version set that browsers send
... that added value includes characters that ensure a regex
that isn't a conformant SH parser is likely to fail at some
point
... so the value isn't randomised...the interesting bits of the
value aren't randomised, but to read them you have to use a
conformant parser
... goal is to ensure that implementations aren't awful
... that it also includes an unknown value is also to help
discourage allow-listing known browsers
... which has proven to be a bad practice for web compat
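[Editorial sketch, not part of the discussion: an illustration of why a brand containing delimiter characters trips up naive splitting but not a parser that respects quoted strings. The header value is a made-up example in the style of `Sec-CH-UA`, and `parse_brand_list` is a simplified, hypothetical parser that handles only this list-of-quoted-strings shape, not the full Structured Headers grammar.]

```python
def naive_parse(header: str) -> list[tuple[str, str]]:
    """Naive approach: split on ',' and ';' without honouring quotes."""
    result = []
    for part in header.split(","):
        # Raises ValueError when a brand itself contains ';'.
        name, version = part.strip().split(";")
        result.append((name.strip('"'), version[len('v="'):-1]))
    return result


def parse_brand_list(header: str) -> list[tuple[str, str]]:
    """Quote-aware parse of '"brand";v="version", ...' into (brand, version)."""
    n = len(header)

    def read_quoted(pos: int) -> tuple[str, int]:
        # Reads a double-quoted string starting at pos; returns (text, next index).
        if pos >= n or header[pos] != '"':
            raise ValueError(f"expected opening quote at offset {pos}")
        pos += 1
        out = []
        while pos < n:
            ch = header[pos]
            if ch == "\\" and pos + 1 < n:
                out.append(header[pos + 1])  # keep the escaped character
                pos += 2
            elif ch == '"':
                return "".join(out), pos + 1
            else:
                out.append(ch)
                pos += 1
        raise ValueError("unterminated quoted string")

    brands = []
    i = 0
    while i < n:
        while i < n and header[i] == " ":
            i += 1
        brand, i = read_quoted(i)
        version = ""
        while i < n and header[i] == ";":  # parameters follow the brand
            i += 1
            key_start = i
            while i < n and header[i] not in "=;,":
                i += 1
            key = header[key_start:i].strip()
            value = ""
            if i < n and header[i] == "=":
                value, i = read_quoted(i + 1)
            if key == "v":
                version = value
        brands.append((brand, version))
        while i < n and header[i] == " ":
            i += 1
        if i < n:
            if header[i] != ",":
                raise ValueError(f"expected ',' at offset {i}")
            i += 1
    return brands


if __name__ == "__main__":
    example = '"Not;A Brand";v="99", "Chromium";v="84"'
    print(parse_brand_list(example))
```

[With the example value, `naive_parse` raises because the first brand contains a `;`, while the quote-aware parser still recovers both brands and their versions.]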
jrosewell: not quite sure I'm understanding how, e.g., "Edge" and your regex is looking for "E", "D", "G", "E" is going to solve the problem of a regex looking for that set of chars
yoav: if that's your regex,
that's indeed a hard problem to solve. Trying to attack
problems that are more complex than that
... not sure if that's a realistic example of a regex?
jrosewell: you've got an experience of a browser being blocked?
yoav: you can have conformant SH
parsing that can result in blocking. We're trying to avoid
folks using naive regex impls for detection of browsers
... that's the reason for motion of delimiters in the
serialised value...if you've got other ideas for how to prevent
that, would appreciate them
weiler: steveb characterised this as an identifier of the browser...what the browser is...I'd thought of CH as more of a "what the browser can do"....memory, etc. rather than identity. More about capabilities rather than identity? Can yoav explain this change in direction?
yoav: UA CH is mostly about
capabilities and user environment and that's how CH
started...it's a content negotiation mechanism...various
aspects used in this negotiation. Some, like device memory,
DPR, viewport width, etc...
... ...there are some CH values for netinfo that tell you about
the network situation. UACH are an extension of that...a
different dimension of content negotiation but relying on the
content negotiation mechanism
steveb: one of the main things
about GREASE is that it's meant to prevent ossification of
protocols...every UA has "KHTML" now...most UA now have
"Chromium"...
... you can see a situation where sites may rely on this
Travis_: does anyone else want to
chime in on utility of UACH in lieu of UA?
... let's talk about migration strategy
... how are editors and implementers considering rolling this
out over time?
yoav: not sure I'm the best person to represent this view -- don't own a large web property myself -- but what we had in mind for UACH is to make it available/shipped for a while so that properties can migrate towards it before any sort of information reduction is exercised against the UA string itself
Travis_: are there plans for changes to UAs as well brewing in the background?
yoav: yes. There are plans, but UACH is still being rolled out
jrosewell: I'm relatively new to
all of this...there's an IETF doc that you and a few others
were authoring...experimental stage...
... can you comment on the relationship between these
docs?
... can you talk about document status and how that relates to
mass availability?
<wseltzer> slightlyoff: There is no requirement of formal document status relative to any feature shipping in chromium
<wseltzer> ... governance body, API owners, make decisions about which features launch in our engine
<wseltzer> ... other vendors have similar process
<wseltzer> ... there's no requirement for standards process
<Zakim> weiler, you wanted to answer
weiler: I can give you a rough
approximation at the IETF
... IETF document statuses are more descriptive than
prescriptive....how much of this are we seeing in the
wild?
jrosewell: thanks to Travis_ for
chairing and to the scribe
... thanks to the w3c for arranging the session and thanks to
everyone for discussing...lots to read. Would welcome more
discussion in this forum.
... many issues we only touched on and didn't look into in
detail...access restrictions...what browsers decide..."judge,
jury, executioner"....
... "who gets to do what" question isn't one we touched on
today
... great progress, would like to do this again
<wseltzer> [adjourned]
Present: AramZS hober wseltzer cwilso Travis_ Francois yoav jrosewell slightlyoff Laszlo_Gombos Jemma gendler jyasskin jeff