IRC log of aria-at on 2023-06-05

Timestamps are in UTC.

18:59:41 [RRSAgent]: RRSAgent has joined #aria-at
18:59:46 [RRSAgent]: logging to https://www.w3.org/2023/06/05-aria-at-irc
18:59:46 [Zakim]: RRSAgent, make logs Public
18:59:47 [Zakim]: please title this meeting ("meeting: ..."), jugglinmike
19:00:05 [jugglinmike]: meeting: Bi-Weekly Meeting of Assistive Technology Automation Subgroup of ARIA-AT Community Group
19:00:16 [jugglinmike]: present+ jugglinmike
19:03:06 [mmoss]: mmoss has joined #aria-at
19:05:47 [Sam_Shaw]: Sam_Shaw has joined #aria-at
19:05:47 [Sam_Shaw]: present+
19:06:01 [Sam_Shaw]: scribe+
19:06:36 [Sam_Shaw]: present+ James Scholes
19:06:49 [Matt_King]: Matt_King has joined #aria-at
19:06:58 [Matt_King]: present+
19:07:29 [Sam_Shaw]: Agenda: https://www.w3.org/events/meetings/cff6ac42-bdc2-4a38-8993-7dd46e507741/20230605T120000
19:08:51 [Sam_Shaw]: TOPIC: w3c/aria-at - #945 - Rethinking the wording for assertion verdicts
19:09:02 [Sam_Shaw]: https://github.com/w3c/aria-at/issues/945
19:10:18 [Sam_Shaw]: jugglinmike: The working mode doesn't have the term verdict yet, but it's one we intend to add.
19:10:45 [Sam_Shaw]: jugglinmike: The working mode refers to verdicts as supported, not supported, etc.
19:11:20 [Sam_Shaw]: jugglinmike: automation refers to the verdicts as acceptable, omitted, contradictory
19:11:32 [Sam_Shaw]: jugglinmike: I have a proposal for a new set of terms
19:12:05 [Sam_Shaw]: correction: automation refers to the verdicts as good output, no output, incorrect output
19:12:45 [Sam_Shaw]: the proposed new terms are acceptable, omitted, contradictory
19:15:21 [Sam_Shaw]: JS: I like the new terms you proposed. In terms of bubbling up the results, I wonder if no support, partial support, supported is clearer
19:15:30 [Sam_Shaw]: MK: That's why I wanted to use numbers
19:15:58 [Sam_Shaw]: MK: Partial support could mean anything between a little support to almost fully supported
19:16:27 [Sam_Shaw]: JS: I agree but if something is 90% supported, the remaining 10% could still make it unusable
19:18:27 [Sam_Shaw]: MK: I agree, unless we have multiple layers of assertions we don't need numbers. We also want to be diplomatic
19:19:45 [mmoss]: present+
19:20:26 [Sam_Shaw]: MK: I think your solution is pretty solid
19:21:01 [Sam_Shaw]: MK: We just need to decide if we extend the use of these terms, or bubble them up
19:22:18 [Sam_Shaw]: jugglinmike: Yes, we need to consider bubbling up. In the case where a feature is supported in all respects except one, it's not supported. For verdicts that can be in three states, understanding why it's partially supported is tough. I'm not sure bubbling can work if we are looking for a percent score
19:22:25 [Sam_Shaw]: MK: Yeah supported needs to be binary
19:24:32 [Sam_Shaw]: JS: I think we need all three states
19:25:21 [Sam_Shaw]: MK: What do the responses tell us? Either there is some support there or there isn't. Then the reason is that someone tried, or someone didn't try, to support it
19:25:51 [Sam_Shaw]: MK: If you are measuring something using a percentage, then it needs to be binary
19:26:53 [Sam_Shaw]: JS: For the reports, are there three levels or two of support?
19:27:04 [Sam_Shaw]: MK: Any level of support beyond assertion is a percentage.
19:27:23 [Sam_Shaw]: MK: At the test level and at the AT level, all will be a percentage
19:28:12 [Sam_Shaw]: MK: So we would say, using Mike's terminology, at the assertion level, if the response is omitted or contradictory then that counts as a 0. If it's acceptable then it counts as a 1.
19:28:41 [Sam_Shaw]: MK: There are other reports we could run that say what percent is contradictory and what percent is omitted
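Editorial sketch of the scoring rule described above: each assertion verdict counts as 1 if acceptable and 0 if omitted or contradictory, and a percentage is taken over all assertions. The function names `support_percent` and `verdict_breakdown` are hypothetical and not part of the ARIA-AT app; the verdict names are the terms proposed in this discussion, which the group had not yet adopted.

```python
from collections import Counter

# Verdict terms as proposed in this meeting (assumption: not yet final).
VERDICTS = {"acceptable", "omitted", "contradictory"}
PASSING = {"acceptable"}  # counts as 1; everything else counts as 0


def support_percent(verdicts):
    """Percentage of assertions whose verdict counts as 1."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    unknown = set(verdicts) - VERDICTS
    if unknown:
        raise ValueError(f"unknown verdicts: {sorted(unknown)}")
    passed = sum(1 for v in verdicts if v in PASSING)
    return 100.0 * passed / len(verdicts)


def verdict_breakdown(verdicts):
    """Percentage of assertions per verdict, for the 'other reports' idea."""
    if not verdicts:
        raise ValueError("no verdicts to summarize")
    counts = Counter(verdicts)
    return {name: 100.0 * counts[name] / len(verdicts) for name in sorted(VERDICTS)}


example = ["acceptable", "acceptable", "omitted", "contradictory"]
print(support_percent(example))    # 50.0
print(verdict_breakdown(example))  # acceptable 50.0, contradictory 25.0, omitted 25.0
```

Keeping the binary score and the per-verdict breakdown as separate functions mirrors the point made in the discussion: the percentage needs a binary input, while the three-state detail can still be reported on its own.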
19:29:04 [Sam_Shaw]: MK: I don't know that we need to bubble up these terms in the reports we have now
19:29:35 [Sam_Shaw]: MK: We don't need terms for working mode, it's just level of support
19:29:54 [Sam_Shaw]: jugglinmike: I do think the working mode uses "supported" and "not supported".
19:29:59 [Sam_Shaw]: MK: I can get rid of that
19:31:01 [Sam_Shaw]: MK: I have some other issues for the working mode, particularly 950, I think we need to work on another iteration of the working mode and share it with the community
19:35:24 [Sam_Shaw]: MK: We could have a binary state for assertions, and get rid of contradictory
19:35:33 [Sam_Shaw]: JS: I agree, but we should rewrite the terms
19:35:48 [Sam_Shaw]: JS: Let's add this to the agenda for the CG meeting Thursday
19:36:59 [Sam_Shaw]: jugglinmike: What I'm hearing is, we like the terms I proposed, but we may not need three terms
19:37:18 [Sam_Shaw]: JS: It will make the testing easier if we just have two states/terms
19:38:36 [Sam_Shaw]: MK: Okay but if this task isn't on the critical path, I want to be conscious of that
19:38:43 [Sam_Shaw]: JS: This could speed up the process
19:39:00 [Sam_Shaw]: MK: But it's not a blocker, we can talk about enhancements in the near future
19:39:34 [Sam_Shaw]: Michael Fairchild: Is there a third state where we publish a report with some of the data missing?
19:40:09 [Sam_Shaw]: JS: No, not really, but we need to consider this.
19:40:42 [Sam_Shaw]: JS: If there is a situation where only 50% of tests have been completed, what does that look like for a percent supported?
19:41:19 [Sam_Shaw]: MK: We made a decision to change the working mode, and to get rid of the three output terms
19:42:02 [Sam_Shaw]: MK: The question before we change the UI, is do we go from 3 to 2 states? Acceptable, not, contradictory
19:43:08 [Sam_Shaw]: TOPIC: w3c/aria-at - #946 - Disambiguating 'Test Plan Run' in the Working Mode
19:43:27 [Sam_Shaw]: MK: I'll comment on this issue and we can move it forward outside this meeting
19:43:36 [Sam_Shaw]: TOPIC: Review of rationale for omitting explicit references to automation from the Working Mode - we touched on this during the 2023-05-22 meeting and agreed that Matt's perspective was critical
19:44:24 [Sam_Shaw]: jugglinmike: This came up two weeks ago. We were talking about my task of describing how automation layers onto the working mode, but the working mode doesn't describe automation
19:44:39 [Sam_Shaw]: jugglinmike: James was not convinced of the utility of organizing our work that way
19:44:59 [Sam_Shaw]: jugglinmike: As this has been a theme of my work, I want to make sure we are aligned on our direction
19:45:10 [Sam_Shaw]: JS: Yes I wasn't sure what we were trying to achieve.
19:46:22 [Sam_Shaw]: JS: For our tests, it doesn't matter if the responses are entered by a human or machine. But the results may need to be checked by a human. We are a long way away from automation checking responses, interpreting them, and providing its own verdicts
19:46:43 [Sam_Shaw]: JS: Even if we get to that point, in many many years, I still think it's valuable to have a human check the responses.
19:47:19 [Sam_Shaw]: JS: The automation may be able to say a response is unexpected, but it won't be able to categorize how it's unexpected
19:48:28 [Sam_Shaw]: MK: I asked Boaz about abstracting the working mode. I want to make sure the working mode states how the business things work. It's the process for generating the spec, but it's not an operations manual
19:49:01 [Sam_Shaw]: MK: I think that there are some things about how the group currently uses the app that can be written into documentation.
19:49:52 [Sam_Shaw]: MK: I think later on we can decide what a human does, or a machine does, but that is outside the set of principles of the work
19:50:47 [Sam_Shaw]: JS: That makes sense, but let's make that very clear. For someone new to the project, we want them to understand both angles, not just one dry article that describes who does what
19:50:59 [Sam_Shaw]: MK: I still think we should get the roles out there
19:52:17 [Sam_Shaw]: MK: The working mode does need to specify who does what, e.g. "Directors need to approve this". The scope of the working mode needs to include scope of authority
19:53:14 [Sam_Shaw]: JS: Okay I agree. The work that happens day to day is more practical, how the app works, what it does well, etc. I do think there is a disconnect between the working mode and how we actually do things. This is partially what Mike is bringing up.
19:53:33 [Sam_Shaw]: JS: We need a document that outlines governance, and another document that defines how we work
19:54:02 [Sam_Shaw]: JS: The governance document is more abstract, and you can't go directly to an implementation. There needs to be a step between
19:54:54 [Sam_Shaw]: MK: I agree, we are slowly building towards this. The wiki work I recently did describes how we write tests and how we onboard people. We don't have much in the way of app documentation
19:57:05 [Sam_Shaw]: jugglinmike: There is one thing that comes to mind that is fundamental to the work I'm doing. When we talk about roles, who is responsible for initiating automation? I've been assuming that's a test admin's job; if that's the case then we have to talk about what the test admin is doing. There's another framing, however, that changes what we build, which is that it's the tester's responsibility: can Matt assign Louis some tests and then Louis runs that automatio[CUT]
19:57:27 [Sam_Shaw]: MK: It's feature design, we can say it both ways.
19:58:18 [Sam_Shaw]: MK: We could make a feature where a tester uses AT to generate a response, and then adjusts it to be correct, and submits it as part of a manual test
19:58:55 [Sam_Shaw]: MK: So right now I believe we said our MVP for automation is: somebody, we didn't say who, can collect responses for a test plan run
19:59:20 [Sam_Shaw]: MK: The automation will know what AT to spin up, what the tests are, and run them
19:59:56 [Sam_Shaw]: MK: We can add to that, as MVP Prime: if there is a previous run of that same plan and any of the responses differ, flag those differences
20:00:44 [Sam_Shaw]: MK: That's so we can identify regressions. If a new version of Chrome comes out, automation can recheck everything and say, yep, it's still supported
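Editorial sketch of the "MVP Prime" idea described above: compare the responses collected in a new run against a previous run of the same test plan and flag any that differ. The function name `flag_differences`, the test IDs, and the response strings are all hypothetical; the meeting did not define a data format. Treating a test that is present in only one run as a difference is also an assumption.

```python
def flag_differences(previous, current):
    """Return the sorted test IDs whose collected response differs between runs.

    Both arguments map a test ID to the AT response text collected for it.
    A test present in only one run is flagged too (assumption).
    """
    flagged = []
    for test_id in sorted(set(previous) | set(current)):
        if previous.get(test_id) != current.get(test_id):
            flagged.append(test_id)
    return flagged


# Hypothetical responses from two runs of the same plan.
previous_run = {
    "checkbox-1": "checkbox, not checked",
    "checkbox-2": "checkbox, checked",
}
current_run = {
    "checkbox-1": "checkbox, not checked",
    "checkbox-2": "checked",
}
print(flag_differences(previous_run, current_run))  # ['checkbox-2']
```

An empty result corresponds to the "yep, it's still supported" case; anything flagged would still go to a human for a verdict, in line with the discussion earlier in this log.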
20:01:18 [Sam_Shaw]: jugglinmike: So for the short term for me, can I propose a change to the working mode that would capture a test admin collecting AT responses?
20:01:23 [Sam_Shaw]: MK: I don't think we need that
20:02:00 [Sam_Shaw]: jugglinmike: Right now the working mode just describes running the tests. We need to split up running tests and assigning verdicts. And we need to define the actors who will do these things
20:02:23 [Sam_Shaw]: JS: The more we abstract these details, the more it becomes vague.
20:02:42 [Sam_Shaw]: JS: If we're saying this level of detail needs to go into another document, that's something else
20:03:40 [Sam_Shaw]: MK: So a test admin can run tests, but there's nothing in the working mode that says what a tester needs to do to run a test. It could be that the first thing they do is press a button and the AT runs the test.
20:04:25 [Sam_Shaw]: MK: The working mode doesn't care what buttons to press or what the scope of the test is. Running a test can be, I ran a test and got the same results as the AT. We can write a manual to describe that process, which is what we do now.
20:04:47 [Sam_Shaw]: jugglinmike: So what we are saying is there is no change to the working mode.
20:04:56 [Sam_Shaw]: MK: Yes I don't see a need to change.
20:06:08 [Sam_Shaw]: MK: The working mode says the goal of the work is, make judgements about a test, how are the screen readers behaving, acceptable or not? That is the role of the tester. The test admin role is to make sure they agree with what the testers are doing, and resolve when there are conflicts. The working mode doesn't say what buttons to press or how many characters to enter
20:06:32 [Sam_Shaw]: jugglinmike: So should we give running Automation to just test admin, or to everyone?
20:06:42 [Sam_Shaw]: MK: Whatever you think is better and faster?
20:06:51 [Sam_Shaw]: JS: I don't think human testers need to be involved in that
20:07:43 [Sam_Shaw]: JS: The pattern we follow now, granted testers assign themselves to tests, but for the most part, we gather info on who is willing to do what tests, then we assign the tests and work to resolve conflicts. The test admin is the gatekeeper to make sure everything stays on track
20:08:12 [Sam_Shaw]: JS: We don't want people assigning themselves to things we're not ready to review
20:09:03 [Sam_Shaw]: JS: I see automation in a similar light, once in place it may make this easier. The more we use the system the more it may know what we want, but there still is a manual element of having humans run tests and review conflicts.
20:10:31 [Sam_Shaw]: MK: I'm good with a conservative approach, we should roll out the smallest, simplest, least risky/most useful approach. Let's not give too much power to everyone on day 1
20:11:42 [Sam_Shaw]: jugglinmike: I'm envisioning, the test admin can see who has been assigned to a test plan, but now they have a new ability to say collect new responses for this tester.
20:12:00 [Sam_Shaw]: jugglinmike: As responses came in they would be entered in the correct places.
20:12:19 [Sam_Shaw]: jugglinmike: If we make space for the system to have errors, in that we can retry certain commands
20:13:03 [Sam_Shaw]: jugglinmike: in the case where there is an issue, we can have another tester run a particular assertion and compare results
20:14:07 [Sam_Shaw]: MK: I think so. We may want to do that while the test plan is in the queue: instead of assigning the test, they just press a "run" button that creates an unassigned data set, and when it's done we can assign someone to it who will complete and validate the report.
20:15:42 [Sam_Shaw]: MK: Please put together a design proposal and let's go through it. I think you are on the right track
20:19:59 [jongund]: jongund has joined #aria-at
21:16:09 [jugglinmike]: Zakim, end the meeting
21:16:09 [Zakim]: As of this point the attendees have been jugglinmike, Sam_Shaw, James, Scholes, Matt_King, mmoss
21:16:11 [Zakim]: RRSAgent, please draft minutes
21:16:13 [RRSAgent]: I have made the request to generate https://www.w3.org/2023/06/05-aria-at-minutes.html Zakim
21:16:20 [Zakim]: I am happy to have been of service, jugglinmike; please remember to excuse RRSAgent. Goodbye
21:16:20 [Zakim]: Zakim has left #aria-at
21:20:15 [jugglinmike]: RRSAgent, make log public
21:41:41 [jongund]: jongund has joined #aria-at
22:48:12 [jongund]: jongund has joined #aria-at