<boazsender> :wave:
<gsnedders> RRSAgent: make logs public
<foolip> We're now using this channel to take notes for the TPAC meeting
<boazsender> https://www.w3.org/wiki/TPAC/2017/Testing
<miketaylr> sup
<gsnedders> ScribeNick: boazsender
<miketaylr> boazsender: my issue is that i can't log into the wiki
foolip: I work for google in sweden. I lead the ecosys infra team, trying to make the web platform more predictable. I want to figure out what to do this quarter for wpt.fyi: changing the metrics to incentivize interop. this data is already out there, but the presentation affects behaviour.
kereliuk: I work at google on chromedriver, and how it relates to wpt. I want to talk about figuring out the state of what we can test with the current tooling.
jeffcarp: I work on chrome ops. I work on wpt.fyi. Today I want to understand the needs of wpt.fyi.
johnjansen: I work at microsoft on microsoft web driver, and infra for wpt consumption.
scottlow: I work at microsoft as interop PM. interested in learning more about wpt.fyi and getting plugged in.
clmartin: I work at microsoft as a program manager on webdriver, dev tools, and internal engineering on test infra.
<miketaylr> boazsender: I'm from Bocoup. I have a lot of interest in this meeting.
<miketaylr> boazsender: I'd like to add to the agenda: talk about test discoverability, workflow about figure out where tests should go
<miketaylr> boazsender: test coverage analysis
miketaylr: I work at mozilla on the compat team. I'm here because I think tests are great. I contribute to this project ~4hrs every year. I'm here for emotional support.
<twisniewski> me
twisniewski: I'm here to be a fly on the wall and to get to know everyone intimately, so we can get everyone's tests out of their trees and into wpt.
rbyers: I manage the web platform team in waterloo for google. I want developers to stop saying that the web is hard to build for. The main thing I want to get out of today is what are the biggest bang for the buck we can do on the chrome team such that interop is a natural outgrowth of the engineering discipline we apply to our work.
<Hexcles> me
hexcles: I'm working on chromium import/export for google. I am interested in wpt.fyi. and interested in talking to people about import/export.
gsnedders: I'm a google contractor. I want the web to become interoperable by the end of the meeting.
automatedtester: I work at mozilla. I would like to learn how mozilla can contribute better. I have a similar goal to rick: making web developers' lives better. I see web compat as a bug of the web.
ato: I also work for mozilla. I worked on marionette, webdriver, and geckodriver. My primary goal today is how to expose a privileged api to help with testing, or whether we can instrument the web browser to automate the test cases that are hard to automate. what we discuss on this will have a direct impact on BTT.
jgraham: I work at mozilla. I have done lots of the infra for wpt for a while, and also some internal mozilla stuff like our two way sync. I would like to know what concretely we should be concentrating on in the next year to achieve better interop.
Wilhelm: I am wilhelm. I used to work at opera. I am here lurking.
foolip: it looks like there is no one from apple here.
rbyers: it's worth pointing out that we have two people at google working on webkit (focused on ios)
jgraham: igalia are also contributing to webkit.
<gsnedders> rniwa and youenn are both apparently coming according to who registered
miketaylr: at mozilla we are the compat team. we look at broken websites. I think testing and promoting interop through wpt is the future.
ato: this is how we shift this from reactive to proactive.
twisniewski: if anyone has any questions about our broken website logging, we're here and we're always on IRC.
foolip: let's try to sort out the agenda.
rbyers: can we add to the agenda an update from each browser?
jgraham: +1
foolip: can we front load the 2018 conversation?
rbyers: how do we measure our progress in 2018?
webkit enters.
youenn: I'm youenn working on webkit. I work on SW, and other things. I work on wpt on the weekend.
miketaylr: can you change the name of edgeHTML?
clmartin: no
... i have an agenda item. what's everyone's policy for contributing tests back?
scottlow: new agenda item: what are people's thoughts on how these tests map back to real world use cases? how do we clean up the tests to catch real world problems?
johnjansen: this is about mapping tests to real world problems.
foolip: so failing tests?
johnjansen: I think that fixing those wouldn't move the needle.
jgraham: maybe add this to the agenda.
twisniewski: also proposing clean up.
jgraham: I think we should add this as an agenda item. can we push goals to the end?
youenn: related to that, there is some concern that there may be duplicated tests between webkit's LayoutTests and wpt, and that this may be slow. we would like to improve this.
we also have a breakout tomorrow
rbyers: we'd like to take the consensus of today to the break out tomorrow
foolip: I'm giving a talk about this tomorrow, and the breakout can be an opportunity for questions on that
jgraham: I suggest we start with the status updates
<foolip> https://bit.ly/hackfest-wpt is a talk from which I'll be pasting some links
foolip: so, ecosys infra. the predictability effort in chromium started two blinkons ago (1.5 yrs); ecosys infra is a spinoff of that which focuses mostly on testing.
rbyers: we've done two things - cross cutting work to give advice to every blink engineer, plus a core team working on infra.
foolip: the first thing we did was to work on two way sync, to make sure that chromium engineers could make changes and land them in wpt. jeffcarp did that and it's working.
... i measured the contributions from chromium for the year before and the year since and got a 220% increase. this is because of two way sync.
... people are using it.
... the other half of export is frequent import. we're always importing (a few times per day)
<MikeSmith> gsnedders, how many more you need?
<gsnedders> MikeSmith: currently we're just at 100%, not yet over 100%
foolip: we import them and run them through the cq, and we're working on a system for discovering regressions.
... then we started working on the wpt.fyi dashboard. our goal is to run it for every commit, for as many configurations as we need (16), and do so in 50 minutes. we're not there yet, but we're on our way.
<gsnedders> MikeSmith: thx!
foolip: when we're making changes to wpt, we want to know that we're not breaking anything. james did a stability checker there, and another part we're trying to sort out now is how to tell why a test is flaky.
... inside of blink again, we've started asking for wpt tests as part of intent to ship. it's not required, but you have to show us why a test is not a wpt test if it's not. just asking makes people more likely to write a test as a wpt test.
<rbyers> https://web-confluence.appspot.com
foolip: we also did the web api confluence dashboard. what that is doing is looking at the window object in different browsers and describing what's there, and then using that to help prioritize compat work for browser engineers. and it gives you some things that wpt doesn't.
<rbyers> https://foolip.github.io/day-to-day/
<foolip> https://github.com/foolip/testing-in-standards/blob/master/lifecycle.md
<foolip> https://github.com/foolip/testing-in-standards/blob/master/policy.md
foolip: I've been working on getting testing to be an integrated part of the standards process. I recently rediscovered that rbyers was the first to do this, for pointer events.
... recently we started in whatwg to have a test pr with each spec pr. this worked better than we thought it would.
... the editors doing the work say they are more productive. and we are finding issues we didn't expect to find.
<foolip> https://foolip.github.io/ecosystem-infra-stats/testpolicy.html
foolip: on this graph that I just pasted, I am counting the number of specs that have the policy that we have in whatwg for tests blocking spec prs, in CR+.
... there are about 200 specs in the world that matter. we are past 100. they are the most active 100, I think.
... so we can say that a lot more than 50% of the work is happening this way.
... greg said that all the other wgs besides css that microsoft is involved in are doing things this way.
... I just got webrtc on board, and web sec is coming. css is more tricky.
johnjansen: i know we have agreement from each of the wgs we work with to do this.
foolip: this is step one of 4 steps in my master plan.
wilhelm: plh is making this a soft requirement for new groups.
<foolip> the 4 steps are in https://github.com/whatwg/html/issues/1849
rbyers: we're going to keep ratcheting up the pressure in chromium. once we have the automation going, we'll eliminate all unit tests that are not wpt tests.
foolip: automation is a big deal. we just landed the bits that allow tests to click themselves. the model is that wpt is already controlled by webdriver; the tests will just talk to webdriver and ask it to do things.
... that's all wrapped up in a testdriver.click() api which returns a promise.
... we're talking to second screen, and permissions, about mocking devices.
... I want to have a solution for all automation problems for chromium engineers.
... this may not be possible for bluetooth.
... we're doing some work right now to figure out what all the manual tests are.
... we're working with teams to find their pain points. we tidied up the css tests.
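A minimal sketch of what a test using the testdriver.click() api described above might look like (the element id and test name here are invented, not from the meeting):

    promise_test(async t => {
      const button = document.getElementById("target"); // hypothetical element
      const clicked = new Promise(resolve =>
          button.addEventListener("click", resolve, {once: true}));
      await test_driver.click(button); // ask WebDriver to click on our behalf
      await clicked;                   // the click arrives as a real, trusted event
    }, "test_driver.click() produces a real click");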
<dom> Accelerated Shape Detection in Images
<foolip> 2018 OKRs: https://docs.google.com/document/d/1hCOWgU95pN0LaZLdkoJyjlS9A5sjyQQWW6gh8YaDF8o/edit?usp=sharing
foolip: our three 2018 objectives so far are: 1) the web platform is coherent, and web developers are happier.
rbyers: we should add that we did a survey whose results we can't make public, but it was clear the biggest problem was interop.
foolip: 2) the web platform has high quality shared test suites, and everything about quality, automation, dedupe, etc. falls under this. basically, in 2018 I want to delete all LayoutTests.
... #3 is that shared tests are a first class citizen in chromium.
clmartin: what do you think about users over web developers?
foolip: it's similar to the relationship between engine devs and web devs.
... the value has to do with users being happier.
jgraham: I think that talking about users is more difficult.
... e.g. they might not like the web because of 5mb of tracking on a page, and that's not something we can fix.
... what we can fix is that the websites they visit don't work in certain browsers.
... we can give them choice over browsers.
zcorpan: another thing that affects users, and that I think you can measure, is when websites block a certain browser. that's a real problem that all of us have experienced, and interop reduces that problem.
jgraham: so that's a special case of "the site doesn't work".
johnjansen: from a user perspective: we've discovered, being a new browser in the market, that a lot of the time websites won't look for edge and will say it doesn't work in edge, but it actually does.
rbyers: I think this still follows for interop. at google our products do this. they think interop is such a problem, that of course we would have to test it.
johnjansen: when you go into the js engine or networking stack, there are differences.
rbyers: also, we don't have a user goal, because we don't have an interop problem.
... but I think that FF and edge should measure this.
<kereliuk> me
kereliuk: the further down you go through the pipeline from users to web devs to browser devs, the harder it is to measure.
johnjansen: we used to not have an interop problem either; that is real to keep in mind. we had 98% market share. google does have an interop problem, but you don't notice because you have such market share. that's what we were doing. at microsoft, we did a hard turn (180 degrees, on a cruise liner). it came down from up top. we decided to build a new browser, forked trident, started implementing chrome prefixes, and the primary goal was to build an interoperable engine.
<rachelandrew> this has had a review, but I think someone with write access needs to merge it https://github.com/w3c/web-platform-tests/pull/7364#pullrequestreview-74849500
johnjansen: in edge, we use interop and compat interchangeably.
... we measure interop with a team of people really using the internet in all browsers and scoring their experience on 100 websites (a rolling 100), and we say we should always be within 1% of success; we also measure firefox. if we experience a greater than 1% slip, then a bug gets filed.
... we did a/b testing (edge engine in ie, ie engine in edge) to see how we were doing on this, and after a year, we passed this test.
... so we had to build internal infra to run wpt, because we are tied to our OS and IE, and our infra predates the WPT infra.
... we also have legacy XML material.
... we have a fork of wpt; we run it on 4 branches of edge. it takes about 4 hours.
... you can look at how each branch is doing against public releases, etc.
johnjansen: we have an analyzer that ignores tests known to fail and prioritizes sudden regressions.
... we're trying to get our team to push to our internal github server. this is not happening because our internal DRTs (developer regression tests) are easier for our devs to write.
... we're working to convert these tests, and making better tools.
... i am able to represent the value to our team: 1) the tests already exist. 2) I'm running code coverage analysis.
... it's tough because we have a lot of history.
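A hypothetical sketch of the analyzer workflow johnjansen describes, assuming per-test status maps and a known-failure set (none of these names come from the actual Edge tooling):

    // Surface only sudden regressions, ignoring tests already known to fail.
    function findRegressions(previousRun, currentRun, knownFailures) {
      const regressions = [];
      for (const [test, status] of currentRun) {
        if (knownFailures.has(test)) continue;  // known to fail: ignore
        if (status === "PASS") continue;        // still passing: fine
        if (previousRun.get(test) === "PASS") {
          regressions.push(test);               // passed before, fails now
        }
      }
      return regressions;
    }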
rbyers: how will that two step thing work?
johnjansen: we haven't gotten to this yet.
jgraham: if this is some source transformation thing, will these be super weird?
clmartin: we converted our DRTs to the wpt harness
jgraham: let's assume that we find a way to have internal reviews auto-merge; then if someone fixes a bug, would you see it?
johnjansen: we're going to do a two way sync.
... we plan to delete the DRT when we get it into wpt.
... the interesting thing we just discovered through code coverage is that we are getting only marginally more out of DRTs on top of WPT.
... I'm also looking at bugs logged for this, and I'm finding way more coming through WPT that we had missed.
... we are very close to having VP approval for changing individual contributor behavior.
foolip: have you made the WPT path maximally easy?
johnjansen: no.
... it's easy. the problem we have is a strongly templated approach for DRTs. we need to replicate it for WPT.
jgraham: sounds like there could be a feature req for wpt.
johnjansen: yes
rbyers: what about cultural/legal issues with the public nature of this?
clmartin: I think you see fewer ms engineers talking on wpt because our engineers are not working on it
... there is some fear, but it's changing.
johnjansen: there is also cultural baggage that weighs on us from the past.
tea break
<zcorpan> ScribeNick: wilhelm
<wilhelm_> jgraham: Mozilla are running wpt, more or less. We have two-way sync, which we've had for a couple of years
<boazsender> ScribeNick: wilhelm_
jgraham: Not as automated as the blink system, working on finalizing automation
jgraham: We are working on making our two-way sync more rapid, because that is hopefully going to improve the feedback that devs get, prevent conflicts, and make the experience nicer
... I don't have statistics on how many tests we're writing, but we have some of the problems that other people have mentioned
... There is still a culture of not always writing wpt
... Difficulty of learning a new system; some tests can't be supported. Some of that is solved by click automation
<foolip> https://bit.ly/ecosystem-infra-stats has some stats about contribution origins, and usage for Chromium.
jgraham: I don't know the ergonomics of that vs. gecko-specific tests. It's easier to write a single-browser test.
... Implemented stability testing upstream
... Internal work on stability, for other test types
... Internal infrastructure will catch some problems before we upstream
... We want to improve: telling people when a spec changes and tests start failing
... We want to surface that more obviously. When there is an import, you'll see a bug somewhere that tests started failing
... We can be more responsive to that kind of thing; the dashboard is helpful
... Discussion on a dashboard for each PR
... "We started failing, this all works in Edge"
... We don't run wpt on Android
foolip: Neither do we
<ato> jgraham: (Would be nice if we hooked that up to mozreview/the new review tool.)
jgraham: (Describing legacy systems blocking some progress)
foolip: What is the strategy for wpt default?
jgraham: I'm less confident that we can remove browser-specific tests
... I don't know how to test "schedule a GC at a certain time" in a cross-browser way
... We can't write a testing API for such things
foolip: I don't think we can delete layout tests, but we can move 80% of them. Clean separation of web platform vs. internal tests, for a good reason
jgraham: If you write a non-wpt test for web platform things, a justification is needed
... we have also been doing work directly on upstream, which is more impactful. Test stability
... wpt run makes it easier to run; the unmaintained testrunner is being phased out
... Goal of making upstream ergonomic
<boazsender> Nell from microsoft and the web vr spec editor just joined us
clmartin: We have an XML file that has an inventory, looking at time budgeting. "You have this much time to run your legacy tests"
AutomatedTester: On the flight over I looked at one direction of this: we do have a lot of duplication, a lot of coverage in mochitest. Look at what is usable, migrate across. A lot of work.
... We can't meaningfully automate moving it over
... The tests should be testing something in a meaningful way
AutomatedTester: SW tests have been imported, appreciated not having to write tests
jgraham: We grabbed the SW tests, fixed them, upstreamed them. Now this would not be necessary; Chrome developers upstream due to a better process
AutomatedTester: From the Mozilla point of view, we had a lot fewer mochitests; wpt helped
... Certain bits of WebRTC++ will need browser-specific harnesses
foolip: They are writing more wpt tests by default, requiring test changes for normative changes
... Other areas are a lot worse than WebRTC
boazsender: For new features, there is momentum for convening on shared test suites. For old tests, there is work to do: an analysis of duplication across vendor test directories
jgraham: I don't particularly care if we run the same tests twice
boazsender: two reasons: speed, and the inverse of coverage: it's easier to reason about if we have coverage
jgraham: From the wpt perspective, code coverage from mochitests is not relevant
... If you see coverage from browser-specific tests but not wpt, that is a good candidate for wpt upstreaming
zcorpan: Cost of duplicate tests: maintenance on a spec change. Not only do we need to find the relevant tests, we end up editing five different tests.. man-hours
jgraham: You get a spec change, the wpt test fails, you fix the code, you see the mochitest fail, you delete the mochitest
boazsender: That's awesome, and I think that establishing that for all four teams is the cultural challenge
jgraham: Cultural challenge: if this test fails, delete it
... "Mochitests run on android"
... Traditionally, we run everything on every platform. No real reason why we don't run it on Android; it's just work.
youenn: We are currently slowly importing every test suite, planning to import the test suites for all the features we are implementing
... Once a suite is imported, we refresh it every 2 weeks, by doing painful manual uploading to our infrastructure
... Sometimes we see flakiness, handled manually
... People are using wpt more and more; we're trying to fix the implementation based on this
... We also have wpt serving webkit-specific tests, so we can use the whole set of wpt features
... It may be convenient to use testharness.js++; tests will be easier to upstream
... Now we have a script under review to create a wpt PR from a bugzilla patch
... Long-term plan: one review, on bugzilla
foolip: Blockers?
youenn: Allocating time to do these scripts; we're busy implementing features
rbyers: Reach out to us for help
youenn: If you are moving wpt tests into webkit, future changes will get lost. Working to upstream more
... When we start modifying bugzilla and bots, we need to work with the community working on these bots
... Interested in the latest webdriver things, to run the manual tests through wptrunner
... That's the plan
... Hopefully we can advertise these new features, start thinking about using testharness.js
... wpt is evolving a lot, which is great. Nobody in Webkit has time to see what's happening there. I discovered the Webdriver features by chance last week; was very happy. Click test automation
... Need a way to advertise this to the webkit community
foolip: Would an email summary of what we're doing per quarter be useful?
youenn: Yes
... Nobody in Webkit is following wpt closely; we notice when things start breaking
foolip: Another way for visibility: use priority labels
youenn: If there is adoption needed on the webkit side, CC webkit team members
foolip: Who is primary contact?
youenn: At the moment, me
... I'll try to push things internally
... I'm not in a position to delegate...
foolip: Roadblocks for implementors that block contributions to wpt?
jgraham: We want to hear from MS, Webkit, about what problems they are encountering
JohnJansen: I don't think there is anything inherently problematic about upstreaming; we need a cultural shift, which is in process
... It's down to prioritization
clmartin: General question about automatic upstreaming: should we do a GH PR?
ato: Idea is that if it is reviewed internally, just upstream
jgraham: Historically, the policy has been a linked review
... If we say no, that is a policy change
foolip: Not possible if internal review happens with code review
jgraham: Concern: if people submit crap, that is harmful to everyone
... If you link to a code review, you can document that it happened, audit it
boazsender: can you describe internal code review?
clmartin: In BSO + git, a pull request
... CI
boazsender: Is there a policy for test review?
clmartin: Test review is per team currently; 100x test runs catch flakiness
boazsender: A combination of edge + wpt policy changes could achieve the review goal
foolip: Do you change the source code in the same commit on the same review?
clmartin: Bundled together
foolip: You cannot make wpt first-class without keeping that
... How much information can we upstream? Name of the reviewer?
jgraham: The original policy was a change from the CSS process
... In the spirit of the current policy: hypothetically, the original reviewer approves the PR in Github.
... Can't enforce meaningful comments
foolip: In the spirit of getting Edge contributions: PR, edge export: Who wrote the code, who reviewed it.
twisniewski: Value of review beyond a rubber stamp: what is it we want from the review? Annotation, a link to a spec for why the test exists.
... Even if there is not a person to reach out to, there is a reference to a spec
jgraham: We have deliberately removed this requirement; CSS has it
... It is not enforceable at the same time as we request upstreaming
twisniewski: We have a problem that we have tests not covered by the spec
gsnedders: Plenty of CSS tests with spec links fail for unrelated reasons; not that useful
jgraham: From Edge POV, would this make your job easier?
JohnJansen: No
... Code review is not blocking us from contributing. Linking to spec lines would hurt, make work harder
jgraham: You commit your DRT tests; the code review is not exposed. Proposed change: record that this person reviewed this
gsnedders: Do a second upstream review, second Edge person rubber-stamps
jgraham: That provides no value
JohnJansen: The model we try to build: we check in DRTs written in a way that eases upstreaming, we submit a pull request, and do the review on the actual PR
... Getting DRTs easily upstreamable is our biggest challenge, not code reviews
jgraham: In Gecko and Chrome, it has been important that people can use existing workflows, with magic upstreaming
zcorpan: If there is an internal review, and that is useful to make public, maybe there could be tooling in your review to export individual comments
JohnJansen: WebDriver tests upstreamed first, because we wanted feedback from other vendors
jgraham: For controversial things this is good
ato: Good for spec changes + tests
zcorpan: In some cases you want to export individual comments from test reviews
JohnJansen: I don't know how to flag "this needs review from webkit"
jgraham: Labels
ato: Labels could block integration
<gsnedders> RRSAgent: make minutes
jgraham: maybe we should make policy more relaxed
clmartin: (describing policies for keeping track of known failures on import)
JohnJansen: Who would be on point for bad tests?
... Pretend there are hypothetical bad tests from one vendor. How do we follow up?
rbyers: That is the reason for current policy, to have a way to handle that
boazsender: Inbox workflow: to not discourage test authors, auto-landing, backlog of tests
ato: How is this different from PRs with second review?
jgraham: There is an information visibility problem, dashboard is part of this. If you add tests that fail everywhere, that is more visible
JohnJansen: Challenge: a vendor sees crap tests from another vendor. Visible feedback
gsnedders: You should see the pull request..
jgraham: It should be possible to see this data from history
JohnJansen: This is not to surface on the dashboard. If MS engineers submit crap tests, there should be a way for MS to see this, learn, escalate
jgraham: If there is a systemic problem from one source, people will notice and complain
JohnJansen: I want to make sure our code reviews are sufficient: good quality tests
ato: Removing tests is different
boazsender: Is there a way for you to tell if bad tests are written in your org?
nell: Does some of the flakiness come from the difference between browser engines? May work fine in one browser, only becomes visible once upstreamed
jgraham: We have tooling to assess flakiness in the major browsers
... The question about bad tests is not just flakiness; tests can also be wrong
gsnedders: Two categories of wrong tests: false negatives and false positives.
jgraham: If some test just passes, but is wrong, it's not a big problem. Irrelevant test?
boazsender: We want to find the blockers for Edge
... Establishing that workflow.
... Independently of wpt
jgraham: I see that there isn't a problem. We can adjust the wording of our policy; what you do internally we can't solve
... If you or another vendor submit bad tests, people will notice
foolip: You want to do a good thing; we can accept little information.
... Everything is a PR; the problems are not unique to MS
jgraham: Let's talk about webkit
boazsender: Long-term plan for DRT conversion
<foolip> agenda is at https://www.w3.org/wiki/TPAC/2017/Testing
youenn: Concerns about efficiency; there's a github issue about making test running faster
... Caching, performance improvements would be good
... Easier ways to use tools like wptrunner, stability checks
... From an organizational POV, we are fine with how things are done. Reviewed on webkit, no need for double review
... There is this fear that we have webkit-specific tests and people will not see why they are this way. A test may be redundant for another browser, but for us it is not
jgraham: Generic problem. Skeptical of deleting duplicates for this reason
... Maybe a comment "special case in webkit" would help
foolip: Deduplication efforts must be very careful so as not to destroy existing value
jgraham: Hypothetical: per-test code coverage
foolip: We can do that!
boazsender: We can deduplicate webkit, chrome, firefox
youenn: There are things that are difficult to handle: memory cache. Easy to break the tests. We'd keep those tests internal for now? Maybe just a comment
jgraham: For specific tests, we may need a mechanism for indicating that "this test tests something special in gecko, don't change it without review from Moz engineer"
foolip: We have comments on references to exact bug reports
twisniewski: Some duplicates are just extra tests, may also be useful for other vendors
jgraham: As long as it's not testing browser-specific internals..
foolip: Gecko roadblocks?
jgraham: Covered in status update, not policy things, technical issues that need to be solved, cultural stuff
foolip: For Chromium, automation seems most important
... Make all people who do standards treat that as a focus
... Make wpt the default working mode
... Make layout tests smaller, wpt bigger.
... Needs coordination with webkit
youenn: Issue with license of tests?
foolip: If we move layout tests without coordinating with webkit, problems will arise on webkit side
youenn: There are people working on these issues, can ping them
foolip: Maybe licensing thing as well..
youenn: More priority on getting the export things running
foolip: I asked around about layout; the big pain point was that the CSS test suite was a big mess. gsnedders fixed that.
... Duplication with webkit: that will become webkit's problem if we solve our other problems in the wrong way
youenn: I can try to increase priority on this
foolip: Deduplication must take Chromium + webkit into account at the same time
youenn: Automation: there may be some sharing of GS layer
<gsnedders> testdriver-vendor.js
jgraham: Will you have a way to detect, for testdriver-vendor.js, if you get a different result upstream than when running in your infrastructure?
foolip: I want to solve this by getting content_shell into wpt
jgraham: I want to avoid situation where chromium engineers make things that depend on vendor-specific APIs
foolip: By noticing the differences, we can sort out most of that
boazsender: Can you explain that more?
jgraham: The automation (testdriver.js) works through a common implementation backed by WebDriver. Not all browsers can run that infrastructure, and WebDriver can be slow. Vendors can replace the click method with their internal test API
foolip: content_shell what we use to run our tests, there are some differences
jgraham: If you are running a vendored version, there may be differences to the standard ways
boazsender: Why is it a good idea to write content_shell tests?
jgraham: They don't want to run the full version
foolip: Historical reasons; it could be slower
... We're running everything in content_shell; getting more stuff upstream is a matter of finding out where to go..
jgraham: Testdriver automation is specifically made for this use case, with known risk
... "don't ship stuff in Chrome if it doesn't work in the standard ways.."
zcorpan: To prevent content_shell-specific things leaking in, make content_shell-specific calls log when they happen
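For illustration, a rough sketch of the vendor substitution jgraham describes: wpt's testdriver-vendor.js lets an embedder replace the WebDriver-backed implementation with an internal hook (the internal API named here is made up):

    // testdriver-vendor.js (sketch): override the default click implementation
    // so it uses a vendor-internal event sender instead of WebDriver.
    window.test_driver_internal.click = function(element, coords) {
      return internalEventSender.click(coords.x, coords.y); // hypothetical API
    };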
\o/
Resuming at 13:30
<gsnedders> ScribeNick: jeffcarp
reconvening
boazsender: next item: Assessing the compat impact of tests.
JohnJansen: when prioritizing bugs, EdgeHTML gives high priority to bugs on highly trafficked websites
... but always fix bugs before triaging test failures - a catch-22
... fixing the failure on a live site fixes that site, but might not cover other API invocations that would be covered in a test
boazsender: can we think of examples of tests written specifically for compat issues?
JohnJansen: we see that with <table>s, which is why there's a rejuvenated effort to get <table>s spec'd
JohnJansen: we test interop for the spec, not interop for the web
everyone: what are WPT tests for? should they test the spec or test for the web?
jgraham: we are not focused on tests that test the spec, we hope people upload tests that test the web, even if it does not correspond to a line in the spec
boazsender: would it be useful for us to try to categorize these tests?
jgraham: we want to emphasize that testing to 100% of the spec is not the goal
boazsender: should we have a mission statement / shared set of goals?
foolip: it's a group of many test suites, they might have different goals
zcorpan: the WPT documentation is technical only right now, it doesn't mention organization/test writing goals
jgraham: many interop bugs are from using many web platform features in unison, where do you put those interop tests? which folder in WPT?
boazsender: would it make sense to have a new top level directory for this?
zcorpan: a new directory is the wrong way to do this
... better to have an out-of-band way of annotating a specific test (e.g. "this is testing a web compat problem")
jgraham: one potential approach is to add comments to wpt.fyi
clmartin: putting it in the actual test data is better
tom: if we're making it precise and formalized we'd need to validate it
clmartin: we want to use this to find tests that would improve interop
boazsender: what is that number? (of tests?)
JohnJansen: yes, we want sub-prioritization (not just "interop")
... if we fix 100% of failures in WPT, we don't know if that will actually help real web devs and users
jgraham: you can never tell that; you can tell that a specific test fixes a bug in gmail, e.g.
... a test could be not important at all today, but tomorrow FB could rely on it
zcorpan: it might be possible to find tests that exercise things websites are doing
<boazsender> here's an example of a mapping between a compat issue and a test: https://github.com/w3c/web-platform-tests/pull/8080
zcorpan: if we get a mapping between browser implementation and tests, we can run through httparchive and see which code paths the tests & the sites are exercising
andreastt: which tests are important changes over time
... e.g. when FF transitioned from single-process to multi-process, a different set of tests mattered
miketaylr: one of the difficulties is that websites serve diff. versions to diff. browsers
JohnJansen: we've done the crawling, but typically the browser isn't going to go down the bad code path, because the web devs use browser detection
... it feels like an unsolvable problem
... this breaks the web
boazsender: recency (what tests most recently went from 0 to 3 passing) might be a good metric
JohnJansen: if only Edge fails a case, why would they upstream it if all other platforms pass?
boazsender: we're working on running tests on every PR and collecting the time series data
jeffcarp: we should talk about using the wpt.fyi archives too
(that last one wasn't boazsender - that was me)
gsnedders: we have 213 issues & PRs labeled "infra" in WPT
... is there anything infra-wise that's blocking people?
andreastt: what kind of breakage?
jgraham: mozilla has 100Ks of issues open, having issues open in and of itself isn't bad
foolip: ecosystem infra rotation triages bugs and puts them on roadmap
gsnedders: for quite a lot of test suites, nobody notices the bugs that are filed against them in WPT
clmartin: so we need more resources there for issue triage?
foolip: the problem to be solved is that ppl have a bad first experience
... they create a PR and it doesn't get looked at (you have to know who to ping)
jgraham: we've spent a lot of time with OWNERS files
... can you make sure people are paid to do it?
... it's not specifically part of anyone's job, and it breaks the illusion that you can work on WPT from within your own browser's repository
foolip: is there any part of the test suite that's under good control?
jgraham: in some cases the spec editors are taking control
andreastt: webdriver's doing very well
jgraham: the unique thing about webdriver is that about 80% of the ppl who care about it are in this room
boazsender: this is a major barrier, if you're not working on a web browser, it's difficult to work in WPT
<ato> jeffcarp: Hi, I’m andreastt.
ato: ahh thanks
jgraham: we did Test the Web Forward to teach 100s of ppl to write WPT but got a lot of junk tests
zcorpan: mentoring is important
... we could say if you contribute a test, you should review someone else's test
jgraham: that's hard to do at this level
... people don't have the context to review issues outside their spec (e.g. grids)
clmartin: the process should be painless to upstream, but devs should also care about upstream WPT
... having a shared channel to go to?
jgraham: WPT is unusual as a single repo because it's a conglomeration of 200+ repos
ato: I think we've made great strides
... the individual contributor experience is so much better than a few years ago
... for instance, `wpt run`
... it's beneficial for me to be in the OWNERS file; why aren't others?
jgraham: for every CSS PR there are probably 10 ppl cc'd there
boazsender: there are great things we can do for spec editors to improve enfranchisement
... we can build community for spec editors, e.g. in the context of TPAC
gsnedders: most ppl writing PRs in WPT are 1. web compat ppl who don't work outside their own browser, 2. a few slightly random people, 3. to a lesser degree, spec editors
foolip: we have been talking about per-contributor metrics to reward contributors for awesome contributions
zcorpan: not just onboarding ppl making a PR, also onboarding implementors (how to read the spec, etc.)
JohnJansen: in a prev. meeting, we talked about having a rotating "cop" for maintainer duties
foolip: could build some tooling to give us and people we're onboarding a higher level view of things
zcorpan: could have a shared dashboard for what needs doing
foolip: something that tells you what issues need comment
clmartin: something that pings you if a comment is untriaged
boazsender: an intermittent call, quarterly?
... we are adding more headcount to WPT infra
... we are not a browser, but we are increasing velocity and this will have an impact
<clmartin> 👍
jgraham: there are some things that need to be fixed, like the docs being out of date and the website being less than ideal (apologies to gsnedders)
... it's fun to go make a cool dashboard, but it might be higher impact to update documentation
jgraham: there are people paid to work on infra, so writing docs happens, but looking at random PRs doesn't fall within the scope of what ppl are paid for
foolip: the browser vendors can fix that
jgraham: we don't have many people who prefer technical writing over coding
boazsender: I find that very exciting (JohnJansen +1s)
... there are a lot of moving parts to the user flow in WPT
boazsender: the relationship between the WPT website and infra is not obvious
... thinking about the various personas of people who interact with WPT and laying out the paths we want them to take through this
jgraham: we've made a lot of progress
... there have been a lot of improvements in the WPT workflow in the last year
... e.g. you can run a test in the browser of your choice
... there are other things that haven't been worked on as much, e.g. documentation
boazsender: I flew to California for jgraham to show me an undocumented CLI argument
jgraham: it's easy for me to justify improving docs & workflow
clmartin: for EdgeHTML's internal tests, if it's not being viewed, it's not being managed - could ping relevant spec editors with a mail with a list of bugs/PRs
jgraham: there may not be general buy-in from the spec editing community that they're responsible for that
foolip: for each directory, there's someone who cares
JohnJansen: how do we see issues by directory?
foolip: sort by labels, there's a label per directory
ato: ... is there a bot that pings stale bugs/PRs?
everyone: no
ato: it would be useful
foolip: what I'd like to do is start manually
... ping people by hand
... then if people feel it's repetitive, make a bot for it
JohnJansen: internally the rotation is a week
ato: one of the problems here is that most of us have general familiarity with the web platform but aren't domain experts in specific specs
boazsender: we could make spec buddies
foolip: I'll write a script for myself and then if it's useful others can use it
<scribe> ACTION: foolip will write a script
<scribe> ACTION: follow up on github issue about not following up on github issues
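One plausible shape for the script foolip mentions - a sketch only, with the repo, label, and staleness window all assumptions - using the public GitHub issues API:

    // List open wpt issues with a given label that haven't been touched lately.
    const REPO = "w3c/web-platform-tests"; // assumed target repo
    const LABEL = "infra";                 // assumed label of interest
    const cutoff = Date.now() - 30 * 24 * 60 * 60 * 1000; // 30 days ago

    async function staleIssues() {
      const url = `https://api.github.com/repos/${REPO}/issues` +
                  `?labels=${LABEL}&state=open&per_page=100`;
      const issues = await (await fetch(url)).json();
      return issues.filter(i => Date.parse(i.updated_at) < cutoff);
    }

    staleIssues().then(issues =>
        issues.forEach(i => console.log(`#${i.number} ${i.title}`)));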
foolip: if there are labels that nobody cares about...
boazsender: no takers on the spec buddy idea? (buddying up with spec editors to help triage bugs)
... if everyone in this room befriends one editor
... everyone send an editor a postcard (with a github URL)
... follow spec editors on twitter
... I'll find you a spec buddy and follow up in 3 months
foolip: 3 months!
boazsender: one person from each browser?
JohnJansen: testing this, seeing how it works with a small group would be a great way to start
<boazsender> https://docs.google.com/spreadsheets/d/1CP4yv8bcyHZ5JDmQNiP0so-by1zZpndj_n3QsKoRP1I/edit#gid=0
boazsender: click on this link to sign up for a spec buddy
dknox: introductions: I'm a PM on the Web Platform team on Chrome
More introductions: Alex, working on WebKit at Apple
<miketaylr> harald kirschener
two more people: Andrew & Harald
dknox: the idea for metrics is that we have the WPT dashboard
... we use wpt.fyi to prioritize tests that need fixing, but it's a fairly generic tool; it just lists pass/fail
... thinking about how we can make the tool more useful
... for a while we were shying away from this because we don't want to encourage people to view it as a public interop dashboard
... had an idea: have an "interop score" per spec directory/repo
... it could be a weighted sum of tests passing in 4/4, 3/4, etc
... more generally, how do we quantify the interop for a repo?
... also, how do we help browsers prioritize interop work?
... could create an ordered list per browser for prioritization
dknox: the main goal is helping people working on implementations pick low-hanging fruit
... the main non-goal is making implementors feel pressure
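As an illustration of the weighted-sum idea above (the weights here are invented for the sketch, not a proposal from the meeting):

    // Score a directory from per-test pass counts across the 4 engines.
    const WEIGHTS = [0, 0, 0.25, 0.5, 1.0]; // index = number of engines passing
    function interopScore(passCounts) {
      // passCounts: one entry per test in the directory, e.g. [4, 4, 3, 1]
      if (passCounts.length === 0) return 0;
      const total = passCounts.reduce((sum, n) => sum + WEIGHTS[n], 0);
      return total / passCounts.length; // 1.0 means every test passes in 4/4
    }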
<foolip> WPT Dashboard Interop Stats PRD: https://docs.google.com/document/d/1g22NuZ82RqPUYEOUzihc3quCuWqZ2NcearOGOIYlF20/edit?usp=sharing
dknox: in this meeting we want to find out how to structure metrics to be most useful
jgraham: it seems this is aiding a manager-y point of view (deciding what people will work on)
... one disadvantage is that it could create pressure for people to not submit tests
... another constituency is browser developers who only care about passing tests in their browser
... seeing if a specific browser passes is still valuable
clmartin: like caniuse with more data?
... who are our customers?
foolip: this is for browser engineers?
... a goal for this is that it doesn't become benchmarking
<foolip> that is what I said, as a claim
dknox: if web developers chose to use any of this, they'd want to use it like caniuse
... but we've been thinking of this only with browser vendors in mind
boazsender: I think it will appear to web developers that this is useful information, but the information is not in the format in which web devs use features
... cultural difference
zcorpan: caniuse turns out to be useful for browser devs as well
... a single spec can have lots of features
... there's already a mapping between caniuse and specs (e.g. html) (bikeshed does this)
... being able to show a mapping between tests and features, to see interop status, could be useful
jgraham: what if we forgot about sharing results?
dknox: the thing we've been thinking about most specifically: for each browser, an unsorted list of tests that fail there but pass in other browsers
... also: a top-level view of each directory, with the number of tests passing in all 4 engines
<foolip> is it like https://github.com/w3c/wptdashboard/issues/83#issuecomment-333371334?
dknox: should we exclude tests passing nowhere?
... we don't want to impede ppl from adding new tests
... at a basic level, let people see high-level interop
scottlow: in that case is it even valuable to show individual test results?
jgraham: it's still useful to see for implementors
... the data is relatively new; nobody's being pointed at it, nobody's integrating with it
... it would be useful to pass into a PR/CL a link with test results and what's expected to change
<zcorpan> ScribeNick: zcorpan
foolip: jeffcarp what are the options?
jeffcarp: who is using this dashboard is the best place to start
jgraham: i can give you a list of feature requests
... but that's unrelated to how we present the data
... how we present it is interesting, in that we shouldn't be making it look like an Acid4 thing
clmartin: from our perspective, the current information is a case of the right hand not knowing what the left hand is doing
boazsender: wpt.fyi shared metric, impacts ...
jgraham: i'm against it being gamified
... want to be able to point a dev to detailed results
foolip: i think we need to keep the current presentation
... have both and be able to flip between them
... make it without colors and boring - gray?
... a flip-side view of the thing with the 3/4, 4/4 breakdown
jgraham: the use case i care about is the test file and below, not so much per directory
... when you drill down enough you see how well a particular browser does
... second feature request is the same thing, but for the changes in a particular PR
... want to be able to paste a link showing which tests went from pass to fail
boazsender: we're also working on this at bocoup
... these are things that people are going to leave this meeting and do this quarter
... interop score on the front page, and per-test-file browser breakdown...
... distribution and per-browser breakdown
... does that feel right to you, edge team?
clmartin: if we can react quickly...
jgraham: i agree
boazsender: curious about reflection from webkit
alex: we regularly fix bugs that make tests pass...
... at some point, which browser passes which tests needs to be available; we shouldn't try to hide it
... side-effects like "don't add that test, it will ruin our score" - we should try to avoid that
... but this is a public repo ...
AutomatedTester: there can be cases where people aren't implementing...
... it will reflect negatively on a particular feature
... "why aren't you doing web compat?"
... we are, but we have only so many engineers; prioritization...
AutomatedTester: a number might reflect badly
jgraham: if you don't implement a feature, then that's that
... maybe you have judged that there are more important things to work on
... e.g. methods on the prototype instead of the instance
... might not be important, but affects a lot of tests
gsnedders: it's not important
JohnJansen: another example is acid3
... we didn't implement the last 3 tests
... got lots of bugs about that
... very difficult to look at something that shows a score and not treat it as a benchmark
... having a score at all inherently invites that, which we should try to avoid
... focus on things supported by 4/4 engines
jgraham: the acid3 process was (censored)
... we have intentionally not repeated that
boazsender: the thrust of this effort is to get data between browser teams
... if not drilldown, then how?
JohnJansen: i use the tools
... wpt run
clmartin: i know which tests edge fails that other browsers are passing
jgraham: we don't have that internally, if we have a tool it would be public
JohnJansen: we don't need this, so we're biased against it
boazsender: a new engine could engage in this workflow
... to the extent that we can move your workflow to interact with the material, it would do the web a service
dknox: having ??? already be there fresh can put themselves into that pipeline
boazsender: once a month or so, people who are not in the standards process already can use the tools
JohnJansen: different conversation
... brave browser comes along; they don't want the drill-ins, they want the 4/4 from the top
... "DOM is 100% interop" (LOL)
... you don't care about which tests are failing in edge, ...
jgraham: it'd be useful to have someone from servo
clmartin: drew had a suggestion: what if drilling in just showed the data per test, but not a score?
dknox: one thing i'm hearing: it feels like there's high-level agreement about what we want to do
... james submitted a suggestion a while back ???
... we have 4 browsers with test results; nothing has been done with them
... it seems like we're scarred, which is real, but how about we try something and be ready to pull back quickly
... just have a combined score, and everything else stays the same
AutomatedTester: implementation reports
... people do want drilldowns there
... should that be separate?
... from a spec process point of view
gsnedders: typically out of scope
... spoke to plh about this
... not so useful except for ticking boxes for the process
... it grabs all the results from the dashboard so you don't need to run the tests again
boazsender: that data is available
jgraham: you have to edit the URL carefully, etc.; it's not trivial
foolip: how are we going to present it
alex: why are we trying to remove information?
... shouldn't we embrace that competition improves the web?
foolip: not going to remove it, but it won't be the default view
JohnJansen: it's terrible to be conformant to CSS 2.1 but at the same time be breaking the web
jgraham: want to avoid having people game the system by contributing tests that pass in their browser and fail everywhere else
... don't want it to be a thing you have to run by the marketing department
... not saying the data for browser developers won't still be there...
... yes, that will still be there
... a score would be dishonest because the tests are incomplete
... in the 2dcontext tests, chrome passes some number of tests and firefox passes some other number of tests; that's useless info
... conclusion: try to make incremental changes
foolip: the interop view will be there in a few weeks
dknox: there's general agreement; we'll try something, push it out, and see how it goes
... we can then try something else and discuss that
... we need to try something concrete
foolip: are things percentages or numbers?
jgraham: this discussion isn't going to resolve
foolip: look at mockups after break
boazsender: what would it take for you, for Edge, to use these tools?
... wpt results reporting
JohnJansen: we wouldn't
boazsender: so why are we making this?
JohnJansen: for us, it's an external tool to help us
... we don't like the idea of a scorecard, even at the per-test level
boazsender: data sharing
... how about having shared harvesting of data?
... if we use the same tools, it becomes better
JohnJansen: it's an interesting conversation
<break>
<scribe> ScribeNick: jeffcarp
foolip: let's talk about metrics for 10 more minutes
https://github.com/w3c/wptdashboard/issues/83#issuecomment-333371334
<foolip> https://github.com/w3c/wptdashboard/issues/83
boazsender: also talking about data sharing
foolip: looking at wpt.fyi
... options: we can either have a score, or columns that show 4/4, 3/4, 2/4, etc.
jgraham: we don't know how to compute a meaningful score
... you start focusing on the metrics, not on the higher goal
... since we don't have an idea what the right meaningful score is, we should avoid scores
ato: it also focuses on the purpose of the project - interoperability
foolip: is going from 3/4 to 4/4 the most important metric?
jgraham: we can always imagine edge cases where going from 3/4 to 4/4 is not super useful (or really useful?)
JohnJansen: a challenge is that the denominator is always different
jgraham: to fix that the plan is to get a union of all subtests run
boazsender: we would like to get to a place where the denominator was the same
foolip: this is on our roadmap for this quarter
... NOTRUN is a status a test can have
gsnedders: I think percentages are more helpful
(we're talking about the 4/4, 3/4 columns design)
foolip: it becomes a spec to spec comparison, not a browser to browser comparison
dknox: we're thinking about this as a tool to aid thinking, not replace thinking
... it gives you places to start digging in, whereas now the problem is intractable
twisniewski: is this 4/4 agree? or 4/4 pass?
ato: especially for webdriver, we have a lot of tests that no browser implements
foolip: the browser scores will be the way they are, or more boring
everyone: more boring!
<foolip> ato points out that 0/4 is also needed
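(To make the 4/4 ... 0/4 columns concrete, here is a minimal Python sketch of the bucketing. The input shape is an assumption for illustration, not wpt.fyi's actual code; scoring against the union of every subtest any browser ran addresses the shifting-denominator problem JohnJansen raised, with NOTRUN simply counting as a non-pass.)

from collections import Counter

def interop_buckets(results_by_browser):
    """results_by_browser: {browser: {(test, subtest): status}} (assumed shape)."""
    browsers = list(results_by_browser)
    # The denominator is the union of all subtests any browser ran,
    # so every browser is scored against the same set.
    all_subtests = set()
    for results in results_by_browser.values():
        all_subtests.update(results)
    buckets = Counter()
    for subtest in all_subtests:
        passing = sum(
            1 for b in browsers
            if results_by_browser[b].get(subtest) == "PASS"
        )
        buckets[passing] += 1  # buckets[4]: passing in 4/4; buckets[0]: 0/4
    return buckets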
(we begin to debate the hex values for each color on the dashboard)
foolip: please submit ideas for designs to public-test-infra
mdittmer in Waterloo is implementing
(the color debate will be conducted by raising hands, we'll be doing a binary search, those in favor of #000-#777 raise your hand)
(The topic is Automating manual tests)
gsnedders: we have a variety of harder cases, e.g. geolocation
... on the other hand, webbluetooth has an entire protocol
... the proposal is to do the simple cases of webdriver, only testing JS APIs
... and put the complex cases behind a flag
ato: there's been talk about adding duplex communication to webdriver
... talking about getUserMedia
... as mentioned on the mailing list, gsnedders's code is very cool - if every browser is implementing the webdriver api, why don't we just expose this as a privileged web api?
jgraham: what is the question?
boazsender: I wonder how much having a synthetic interface will really exercise the platform
... how much should be done in the WebDriver spec vs. how much should be done by each feature?
jgraham: we've always wanted things to be in webdriver
... one of the things we've talked about as having more compat impact: easier testing of websites in mobile browsers
... web developers can't (/won't) test in content shell
foolip: the question about permissions was whether it'd be part of the WD spec or an extension to WD
... it'd be easier for implementors to write it in their own spec
gsnedders: what I wanted from this agenda item was whether people are happy with where the line is drawn
... I thought we could do everything through WD
jgraham: there are a lot of specs we don't know how to deal with in a generic way
... we should have conversations with those specs
... for certain APIs we end up with browser-specific flags, like webrtc and webvr
foolip: given we have gsnedders's thing, if you can use WD, do it in your own spec and wrap it in testdriver.js
... if it's deeper than that, maybe go the way of webusb
jgraham: that's a discussion to have with ppl working on specific features
JohnJansen: adding to the WD spec?
foolip: the extension spec
jgraham: the point of testdriver is that it works across browsers
JohnJansen: we'll talk more in the WD wg on Thursday
... webvr is a good example of this
NellWaliczek: the challenge is that each browser has implemented webvr on a different VR platform stack
<foolip> https://github.com/w3c/web-platform-tests/pull/5535 is the PR being discussed
NellWaliczek: how can we allow the tools to make these work across browsers?
... we'd love to have someone join our conversation about how to approach these problems
jgraham: should we be advertising "if you have special testing requirements, meet with us" to working groups?
... I don't think we've solved testing on webvr for anything yet
NellWaliczek: e.g. if we're doing tests about moving the headset
jgraham: how do you test vr apps?
NellWaliczek: finding ways to prevent web developers from needing to buy every single piece of hardware to test their app
... we're not there yet on tooling, but we're on the verge of having these great tools for writing VR tests
... going to start VR testing as a service
... it would be great if from the beginning we could set best testing practices
foolip: the PR mentioned above is very similar to the current issue in getusermedia
<zcorpan> https://www.w3.org/wiki/TPAC/2017/Testing
breakout session will be on the wiki ^^
(The topic is Fuzzy reftests)
jgraham: we're going to talk about fuzzy reftests
... it is not always easy or possible to write pixel-perfect reftests
... e.g. sometimes the GPU doesn't give you the same antialiasing every time you run it
... FF has implemented a system for fuzzy reftests
... WPT may need to implement this (this year)
Hexcles: when Chromium runs tests in content shell, it disables a bunch of AA
rbyers: (we avoid the problem by not testing what we're shipping)
gsnedders: Chrome also avoids the problem by not using GPU in the CQ
jgraham: for Gecko reftests, the default config is no GPU
... there's a subset that people run on GPU
clmartin: we have a lot of tests in this area
gsnedders: my understanding of what Edge does is that fuzziness is allowed within a given rectangle
JohnJansen: that's correct
<gsnedders> I heard of a case where Servo had issues with <span>foo</span> <span>bar</span> and "foo bar" rendering differently
JohnJansen: would using Ahem help?
twisniewski: it wouldn't fix it 100%
... there could be a way to do the fuzzing in a saner manner if we have more control over the tests
boazsender: (demonstrating how the scrollable behavior in an input is rendered differently and is impossible to assert from JavaScript)
<zcorpan> (that should then be a separate test)
<zcorpan> (imo)
jgraham: there are clearly cases where moving to Ahem won't fix certain GPU problems
foolip: in WebKit and Chromium there are pixel tests; for anything that's fuzzy we generate per-platform baselines
... we could get into generating png baselines for wpt.fyi
jgraham: where would we store those? we wouldn't run that in FF infra
foolip: you'd catch accidental changes to the test
... we can't do just pixel tests
... for the shared infra, fuzzy tests seem more palatable
Hexcles: I don't think fuzzy matching has anything to do with pixel tests
clmartin: should baselines be the responsibility of vendors?
boazsender: another example of this is audio
... the webaudio wg was talking about its testing agenda: how to deal with variation in sample amplitude in an ArrayBuffer; there's a reasonable amount of difference between platforms in the way sound works
JohnJansen: image comparison tests are always really flaky
rniwa: sometimes it isn't the browser that's flaky, it's the test that's written flakily
... e.g. gradients can be drawn differently each time, but the test should not assert that the result is exactly the same
... in the case of AA, it is expected that the AA result will differ from run to run
JohnJansen: allowing e.g. 10 pixels to differ will make you miss failures
wilhelm: the threshold should be on a per-test level
jgraham: that might make certain tests impossible
jgraham: the second possibility is that you could have a small set of parameters
... you could have some masking that allows specific regions to flake
JohnJansen: I think the first thing is to use Ahem, a non-AA font, for non-font tests
jgraham: there are not many objections to some sort of globally parameterized thing
zcorpan: one objection is that it misses bugs - on the flip side, if you don't have fuzziness, you're not able to write a test that tests one bug and not two bugs
JohnJansen: if you have two bugs causing one failure, your tests are too general
... you're asking the test author to determine fuzziness, but I think that's on the browser vendor
jgraham: you could have a default fuzziness level and let vendors override it
zcorpan: make it a boolean and make vendors define it
rniwa: if a test is fuzzy in only one browser, do you mark it fuzzy in all browsers?
JohnJansen: how would that work in wpt.fyi
jgraham: could use fuzzy matching
zcorpan: wpt.fyi could store those baselines
jgraham: should we encode params for different browsers in the test? no
foolip: are we going to do fuzzy reftests?
jgraham: don't have specific plans to work on it, but there is some internal expectation at Mozilla that it should start working at some point (maybe 2018)
rniwa: wpt.fyi could run a failed test with fuzzy matching and suggest that the test enable fuzzy matching
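(A minimal Python sketch of the parameterized fuzzy matching under discussion, assuming Pillow is available. The two parameters - a per-channel pixel tolerance and a cap on how many pixels may differ - are an illustration of jgraham's "small set of parameters" idea, not any browser's actual implementation.)

from PIL import Image, ImageChops

def fuzzy_match(expected_path, actual_path,
                max_channel_diff=0, max_differing_pixels=0):
    """Return True if two reftest screenshots match within the tolerances."""
    expected = Image.open(expected_path).convert("RGB")
    actual = Image.open(actual_path).convert("RGB")
    if expected.size != actual.size:
        return False
    diff = ImageChops.difference(expected, actual)
    differing = 0
    for pixel in diff.getdata():
        channel_diff = max(pixel)  # largest per-channel difference
        if channel_diff > max_channel_diff:
            return False  # one pixel differs by more than allowed
        if channel_diff > 0:
            differing += 1
    return differing <= max_differing_pixels

(With both parameters at 0 this degenerates to an exact match, which matches JohnJansen's point that the defaults should stay strict and any loosening should be deliberate.)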
<scribe> New agenda item: Test discoverability and coverage
boazsender: how to onboard new contributors
zcorpan: another way of having test metadata is having it be part of the relevant spec
... sometimes tests fall through the cracks and browser devs only notice when they implement the change
rniwa: I've had many cases where a change to a dom api will affect unrelated tests
Alex: the important thing is that we have simple tests that don't use 10000 features
foolip: could experiment with putting metadata in specs themselves
JohnJansen: CSS wg uses bikeshed for this and it feels very fragile
clmartin: a comment on the top of the file is unmaintainable?
foolip: it's not useful enough
rniwa: in the perf wg, in some of the prefetch cases, we're missing tests for resource types that don't work with CORS
<foolip> https://github.com/tabatkins/bikeshed/issues/1116 is the Bikeshed feature
twisniewski: want to make sure when someone imports a test, the intention of that test is clear
JohnJansen: there are some whatwg tests that should be ported or deduplicated?
jgraham: for some tests you can just write the JS file and it wraps it for you
<JohnJansen> https://wpt.fyi/dom/abort
jgraham: but we shouldn't put extra requirements on every test
... .worker.js - runs the test in a worker
<Hexcles> http://web-platform-tests.org/writing-tests/testharness.html ("Auto-generated test boilerplate")
jgraham: .window.js - runs test in a window
<foolip> JohnJansen: it's implemented in https://github.com/w3c/web-platform-tests/blob/master/tools/serve/serve.py
<foolip> in particular https://github.com/w3c/web-platform-tests/blob/master/tools/serve/serve.py#L131
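(For illustration, a simplified Python sketch of what that auto-generated boilerplate amounts to. The real logic is in the serve.py linked above; this template and function are assumptions, not the actual wrapper.)

# Hypothetical simplification: a request for foo.window.html is answered
# with an HTML document that loads testharness.js plus the author's raw
# foo.window.js, so test authors only write the JS file.
WINDOW_WRAPPER = """<!doctype html>
<meta charset=utf-8>
<script src=/resources/testharness.js></script>
<script src=/resources/testharnessreport.js></script>
<script src=%(js_path)s></script>
"""

def wrap_window_test(js_path):
    """Given e.g. /dom/foo.window.js, produce the HTML page that runs it."""
    return WINDOW_WRAPPER % {"js_path": js_path}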
jgraham: some tests need to write bytes to a socket, e.g.
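(Writing bytes to a socket is what wptserve's Python handlers are for. A minimal sketch follows; the response.writer method names are taken from wptserve as I understand it and should be verified against the wptserve docs.)

# A .py file under the test root acts as a wptserve handler: wptserve
# calls main(request, response) for matching requests.
def main(request, response):
    # Simple case: return (headers, body) and let wptserve finish up.
    if request.GET.first(b"simple", None):
        return [(b"Content-Type", b"text/plain")], b"hello"

    # Low-level case: take over the response writer to craft unusual
    # output, e.g. to test how browsers handle malformed responses.
    response.writer.write_status(200)
    response.writer.write_header(b"Content-Type", b"text/plain")
    response.writer.end_headers()
    response.writer.write_content(b"raw bytes over the wire")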
rniwa: the current requirement of using wptserve is painful because it can be slow
jgraham: random WPT developers are probably not going to add metadata to their tests about whether they need wptserve
... but you could probably get a reasonable approximation using grep
rniwa: (concerns about adding number of necessary processes for using wptserve)
jgraham: wptserve could be faster but not multiple times faster, more like percentages faster
boazsender: wanted to talk about shared goals for WPT project, but might have hit the limit for the day
Adjourning for the day