<inserted> scribe: chaals
QT: We think there are some issues with caching policies for server proxies.
... want to talk about problems when the website comes out of a server cache, and options
(slides)
Issues: cache strategy not clear to the developer.
... there is no clear agreement on how to control where the page might be cached.
... And there is no clear specification for the cache time
CMN: Why robots.txt?
QT: Because the scenario is that the proxy also crawls.
(slide showing that... if robots.txt says don't cache, that information gets into the chain and browser is told to get it fresh)
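For reference, HTTP's Cache-Control response header already carries directives for where a response may be stored and for how long (max-age, s-maxage, private, no-store). The sketch below only shows how an origin might compose such a value; the function name and its parameters are illustrative assumptions, and whether these directives cover the proxy scenario QT describes was not settled in the discussion.

```python
def cache_control(browser_seconds, shared_seconds=None, allow_shared=True):
    """Build a Cache-Control response header value (illustrative helper).

    browser_seconds: freshness lifetime for any cache (max-age).
    shared_seconds:  optional override for shared caches such as proxies (s-maxage).
    allow_shared:    False marks the response as private to the user's browser.
    """
    if browser_seconds <= 0:
        # no-store asks every cache in the chain not to keep a copy.
        return "no-store"
    parts = ["public" if allow_shared else "private", f"max-age={browser_seconds}"]
    if allow_shared and shared_seconds is not None:
        # s-maxage applies to shared caches only and overrides max-age there.
        parts.append(f"s-maxage={shared_seconds}")
    return ", ".join(parts)


print(cache_control(300, shared_seconds=60))   # public, max-age=300, s-maxage=60
print(cache_control(300, allow_shared=False))  # private, max-age=300
print(cache_control(0))                        # no-store
```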
QT: Another problem - if the proxy service caches the page, then you lose statistical information on views.
... there is no way to get a notification of when the user requested the site.
... (Firefox proposes X-Forwarded-For as a basis of a solution)
... use that header to trigger a pingback from the proxy with user data when the page is served out of proxy cache
... or use a meta, and have the browser do the same thing.
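The proxy-side pingback idea can be sketched roughly as follows. This is only an illustration of the proposal as minuted, not an existing mechanism: the function, the pingback endpoint and the X-Served-From-Cache header are hypothetical. The one established convention used is that each hop appends the client address to a comma-separated X-Forwarded-For list.

```python
def build_pingback(pingback_url, client_ip, forwarded_for=None):
    """Describe the request a proxy could send to the origin on a cache hit.

    pingback_url:  hypothetical origin endpoint that records page views.
    client_ip:     address of the user the cached page was served to.
    forwarded_for: X-Forwarded-For value received by the proxy, if any.
    """
    # Keep any addresses recorded by earlier hops, then append this client.
    chain = [hop.strip() for hop in (forwarded_for or "").split(",") if hop.strip()]
    chain.append(client_ip)
    return {
        "method": "HEAD",
        "url": pingback_url,
        "headers": {
            "X-Forwarded-For": ", ".join(chain),
            # Hypothetical marker telling the origin the page came from cache.
            "X-Served-From-Cache": "1",
        },
    }


ping = build_pingback("https://publisher.example/ping", "203.0.113.7", "198.51.100.2")
print(ping["headers"]["X-Forwarded-For"])  # 198.51.100.2, 203.0.113.7
```

The function only builds a description of the request; sending it, and deciding which user data may be forwarded at all, is left open, as it was in the discussion.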
@@1: So the browser would send additional information?
CMN: If the page loaded uses the user data to generate something, how does that get into the page?
QT: The page can use a script
CMN: OK - but then you don't need a pingback because the script can directly ping e.g. onload.
QT: So can we standardise this for cache control?
CMN: What data is not available to a script?
QT: If we can do it in a browser then maybe it will work better than page script.
... some websites will not use it correctly.
CMN: Looking to see if there is something more we can do than script. If the website writes the code wrong, we cannot fix that - they may well handle the pingback wrong too.
... And if it is blocked for user privacy, the browser will probably still block the information going out.
QT: Some websites in AMP/MIP do not show the right URL - it is listed as coming from the AMP server, not the original domain.
... we need to show the correct URL for the browser.
CMN: If you use link rel="canonical" in the page, the browser could use that and show the real URL. But is that a problem elsewhere?
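As a rough illustration of CMN's suggestion, a client could read the canonical link out of the cached page and treat it as the URL to display. The sketch uses Python's standard html.parser; the class and function names are made up for this example, and no browser is known from these minutes to behave this way.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel is a space-separated token list and is case-insensitive.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonical = attr["href"]


def canonical_url(html_text):
    """Return the canonical URL declared by the page, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


page = '<html><head><link rel="canonical" href="https://publisher.example/article"></head></html>'
print(canonical_url(page))  # https://publisher.example/article
```

Since the page itself declares the canonical URL, a client would still need some way to decide whether to trust it, which is the question CMN leaves open.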
@@1: HTTP2 allows for connection coalescing, you can have multiple origins mixed.
QT: shows example of using a query parameter to hold the real URL
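The slide itself is not recorded, so the following is only a guess at the general shape of such a scheme: the cache URL carries the publisher's URL in a query parameter, and a client recovers it from there. The parameter name "url" and the function name are assumptions.

```python
from urllib.parse import parse_qs, urlsplit


def original_url(cache_url, param="url"):
    """Recover the publisher URL carried in a cache URL's query string.

    Returns None when the parameter is missing or is not an absolute
    http(s) URL, so values such as javascript: URLs are rejected.
    """
    values = parse_qs(urlsplit(cache_url).query).get(param)
    if not values:
        return None
    candidate = values[0]
    parts = urlsplit(candidate)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return candidate


print(original_url("https://cache.example/c?url=https%3A%2F%2Fpublisher.example%2Fnews%2F1"))
# https://publisher.example/news/1
```

Anyone can construct such a link, so the recovered value is only a claim about where the content came from.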
XW: Do you have an implementation in the Baidu browser to show the URL issue?
CMN: Same origin issue. But having everything come from one cache means you can more easily build XSS since everything is now on the same origin...
YW: URL fudging is risky
JohnJansen: so the ask is a way to bypass Same Origin when you are getting stuff from a server cache.
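The same-origin point can be made concrete with a small comparison of (scheme, host, port) tuples, which is how an origin is defined for http(s) URLs. The URLs are invented for illustration.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) tuple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def same_origin(a, b):
    return origin(a) == origin(b)


# Two different publishers served from one cache host share an origin...
print(same_origin("https://cache.example/c/publisher-a.example/p",
                  "https://cache.example/c/publisher-b.example/q"))  # True
# ...while the cached copy and the publisher's own page do not.
print(same_origin("https://publisher-a.example/p",
                  "https://cache.example/c/publisher-a.example/p"))  # False
```

The first result is the XSS concern CMN raises; the second is the mismatch QT would like to bridge.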
YW: Most of the re-hosted websites have a CDN provider that has the origin certificates and can serve it from the origin with whatever constraints
... the CDN serves the content with the same constraints as the cache without the origin doing anything, and effectively serves it from the real origin
QT: Another problem - we load the page in an iframe, and change the URL so it is same origin as the cache.
... If we don't have the same origin we cannot provide our transition effects.
... We need to try to find some solutions about the cache origin policy problem.
YW: CSP on the page?
JJ: Implementing something in the browser to allow magical cross-origin insertion from the CDN seems pretty risky
[@@scribefade]
JJ: So what would the browser do, ideally?
QT: Let the browser pass the
iframe URL to the real URL. Or let the browser provide the
navigation transition script.
... problem is the cache has the wrong URL. We need to do
something to translate that to same origin.
YW: Think this needs to be solved
on the server side. I have an elegant proof of concept but it
doesn't fit in the margin
... and I don't think it is a standards problem.
CMN: There is also Yandex (and I
think others) using a subdomain on the AMP server domain, to be
able to do this.
... it feels like a small-scale standards problem
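The per-publisher subdomain approach CMN mentions could look something like the sketch below. The encoding (double each hyphen, then turn dots into hyphens) and the cache domain are assumptions for illustration, not a description of any particular cache's scheme.

```python
def cache_subdomain(publisher_host, cache_domain="cache.example"):
    """Map a publisher host to its own subdomain on a shared cache domain.

    Existing hyphens are doubled first so that the hyphens standing in
    for dots remain unambiguous.
    """
    label = publisher_host.lower().replace("-", "--").replace(".", "-")
    if len(label) > 63:
        # A single DNS label may not exceed 63 characters.
        raise ValueError("encoded host does not fit in one DNS label")
    return f"{label}.{cache_domain}"


print(cache_subdomain("news.example.com"))  # news-example-com.cache.example
print(cache_subdomain("my-site.example"))   # my--site-example.cache.example
```

Each publisher then gets a distinct origin on the cache, which separates publishers from one another but still differs from the publisher's own origin.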
[Big question seems to be finding a forum...]
YW: Just as AMP is not a standard, I can see a solution being developed along similar lines
<angel> https://groups.google.com/forum/#!forum/amphtml-discuss
YW: And solutions that require
protocol or browser changes will break the Web
... you need to figure out how to serve the content from the
origin while applying the cache constraints.
Present: JohnJansen Chaals QingqianTao YoavWeiss mattkot XiaoqianWu AnqiLi(Angel)