19:01:10 <RRSAgent> RRSAgent has joined #aria-at
19:01:14 <RRSAgent> logging to https://www.w3.org/2023/06/26-aria-at-irc
19:01:14 <Zakim> RRSAgent, make logs Public
19:01:15 <Zakim> please title this meeting ("meeting: ..."), Matt_King
19:01:32 <mmoss> mmoss has joined #aria-at
19:01:32 <Matt_King> MEETING: ARIA-AT Automation Subgroup Meeting
19:01:43 <Matt_King> CHAIR: Mike
19:01:49 <Matt_King> present+
19:03:06 <\join_torontomayors> \join_torontomayors has joined #aria-at
19:03:31 <jugglinmike> jugglinmike has joined #aria-at
19:03:48 <jugglinmike> present+ jugglinmike
19:03:53 <Sam_Shaw> scribe+
19:03:59 <mmoss> present+
19:05:29 <Sam_Shaw> Agenda: 1. Update on our 2023-06-14 presentation to the Browser Testing and Tools Working Group 2.Defining the workflow for automated AT response collection in ARIA-AT
19:05:50 <Sam_Shaw> We are not meeting next week as its a Holiday. Our next meeting will be July 17th
19:06:09 <Sam_Shaw> TOPIC: Update on our 2023-06-14 presentation to the Browser Testing and Tools Working Group
19:06:53 <Sam_Shaw> jugglinmike: We presented about the status and future of the AT Driver. Meeting notes can be found here: https://www.w3.org/2023/06/14-webdriver-minutes.html#t06
19:07:24 <Sam_Shaw> jugglinmike: I wanted to know if anyone had any lesson learned they could share from developing web driver bi di.
19:08:52 <Sam_Shaw> jugglinmike: I asked about the possibility of bringing the AT Driver spec into the proposal, so its more formal and gets on the standard track. David burns was very open to this. That working group is re-chartering in august so this is a good time to do this
19:09:45 <Sam_Shaw> jugglinmike: Another member of that group, Sam from Apple, said they liked to see discussion with implementers. I've spoken with Mac from NVDA, and we have it on our to do list to talk to vispero. Lola is going to work on setting up this conversations
19:10:11 <Sam_Shaw> jugglinmike: I'm happy to answer any questions on this now or in the future via email
19:10:20 <Sam_Shaw> MK: How many people attended?
19:10:25 <Sam_Shaw> jugglinmike: I would say 10
19:11:43 <Sam_Shaw> MK: Lets talk offline about how much involvement we need from Implementors. This is a time sensitive topic, we should discuss this with Vispero tomorrow
19:12:31 <Sam_Shaw> MK: We haven't talked about what level of commitment we want from Vispero
19:12:55 <Sam_Shaw> MK: We want to be careful about approaching them from different angles
19:13:17 <Sam_Shaw> jugglinmike: It may then be that we want to hold off and let Lola be more intentional about how to reach out to Vispero
19:13:31 <Sam_Shaw> TOPIC: Defining the workflow for automated AT response collection in ARIA-AT
19:13:57 <Sam_Shaw> jugglinmike: We began discussing this last week at the CG meeting
19:14:18 <Sam_Shaw> Here is the design document we have been iterating on: https://docs.google.com/document/d/10681gMmTM2KmVEw_Je-A4nmRvcqIkB6jgtJ8dt_Pp0M/edit
19:14:39 <Sam_Shaw> jugglinmike: Right now we are focused on the workflow proposal section
19:15:06 <Sam_Shaw> jugglinmike: What it could and should look like to operate the aria AT app in a world where collecting AT Responses is possible
19:15:51 <Sam_Shaw> jugglinmike: Where we paused last meeting, was a suggestion from Matt that instead of making this control of the system a capability of test admin, we should instead implement that as an ability of a tester, who could optionally invoke this system to collect responses
19:16:48 <Sam_Shaw> MK: The primary reason i was thinking that way was longer term strategy, to simplify things. I originally introduced the idea that the Bot is like another person, however then we discussed analyzing responses.
19:17:15 <Sam_Shaw> MK: It really started feeling like to me that this should be another tool available to anyone, and the job of the tool is too collect responses initially
19:17:24 <Sam_Shaw> James Scholes present+
19:18:11 <Sam_Shaw> JS: Right now we assign two or more testers to a test plan. They perform the tests and assign verdicts, this has proved to be prudent because not only do we end up with differing verdict assignments, but also end up with differing AT Responses
19:18:55 <Sam_Shaw> JS: If AT is collecting these responses, we are eliminating a second persons interpretations. We are just collecting two sets of verdicts
19:19:20 <Sam_Shaw> MK: I disagree, I wouldn't want a tester to not run the test themselves.
19:19:36 <Sam_Shaw> JS: So you are saying a job of tester is to compare two sets of test results
19:19:52 <Sam_Shaw> MK: They can say it bots results are correct, if not they can raise an issue
19:20:28 <Sam_Shaw> jugglinmike: If thats an expectation, than as a tester what's my motivation to run the test at all?
19:20:34 <Sam_Shaw> MK: It saves me time.
19:20:53 <Sam_Shaw> MK: I don't have to start the test, pause it, compare results
19:21:38 <Sam_Shaw> JS: I'm not sure thats true, right now the tester runs the test and track output. We would be asking them to run the test and compare the results
19:22:33 <Sam_Shaw> JS: We should assume some testers will want to note thier response and then compare them, and in that case testing would take longer. comparing two strings would take time
19:23:49 <Sam_Shaw> MK: If we wanted to we could certainly have one tester collect responses manually, and one collect AT responses, and not have them do any comparing
19:24:41 <Sam_Shaw> MK: Our test admin should work through the test to make sure the tests are designed to make them efficient
19:25:28 <Sam_Shaw> MK: Maybe the way we release this, is that testers at PAC and Me are the only ones who have the button to automatically collect responses.
19:26:07 <Sam_Shaw> MK: Then if there are conflicts with the manual testers we resolve them. Over time we can reveal this button to collect results with AT to more and more people
19:26:23 <Sam_Shaw> MK: Right now this feature will only be available for NVDA right Mike?
19:26:25 <Sam_Shaw> jugglinmike: Yes
19:27:11 <Sam_Shaw> JS: I just don't couple the time a manual tester to collect results and the time a AT collects result.
19:27:33 <Sam_Shaw> JS: That seems unnessary to have to wait for the AT to complete a test.
19:29:17 <Sam_Shaw> jugglinmike: It its just the Test admin is running tests with the automation thats simpler
19:31:24 <Sam_Shaw> JS: I'm concerned were already asking testers to do alot and this would add more confusion and complexity
19:31:54 <Sam_Shaw> jugglinmike: There also would be an increase in coordination to runs tests, depending on the timing of AT running the tests
19:32:07 <Sam_Shaw> MK: I was assuming running a test plan would take 5 mins?
19:32:17 <Sam_Shaw> JS: I don't know where that is coming from.
19:32:39 <Sam_Shaw> JS: If the tester is just reviewing verdicts, then there is a huge time savings
19:32:57 <Sam_Shaw> MK:  I don't want testers assigning verdicts without running plans themselves
19:33:14 <Sam_Shaw> MK: A tester would have to correctly interpret all of the text output, which seems like a leap
19:34:01 <Sam_Shaw> JS: I think that depends on the tester, some will recognize output that they hear everyday. However, with JAWS output, I wouldn't recognize output because I don't use it daily
19:34:26 <Sam_Shaw> MK: I don't think I can make a judgement about a verdict without running the test and getting the context
19:35:07 <Sam_Shaw> JS: Thats what I'm talking about, I would be comfortable assigning verdicts for NVDA, but not with JAWS.
19:35:58 <Sam_Shaw> JS: I don't think we should jump from a tester reading a response and assigning verdicts. We agree on this. We still have some testers who aren't daily users of AT, we couldn't rely on them to make these verdicts
19:36:58 <Sam_Shaw> MK: I think if the AT could run all the tests, have a test admin use the output to assign all the verdicts
19:37:36 <Sam_Shaw> MK: Then humans get involved with comparing results from AT and human testing. That is where the huge gain is, AT can run tests for every new version of browsers or AT that is released.
19:38:06 <Sam_Shaw> JS: I think we need to be aware of minor differences between how human tester and AT collect response, like a space at the end of the result.
19:39:47 <Sam_Shaw> MK: One way we could fix this fairly quickly, is have automation run new tests, then a test admin review a spreadsheet to compare the responses quickly and determine if there are real conflicts or just editorial ones. Maybe thats the first thing to build Mike
19:40:02 <Sam_Shaw> MK: Maybe the idea isn't to speed up human tests
19:40:25 <Sam_Shaw> MK: The human tests have all sorts of important aspects, we need to have the human fully engaged. Is that where we are aligned James?
19:41:15 <Sam_Shaw> JS: I would love to speed up human testing, but I don't think the proposal says that. I think we are aligned on the advantages of having AT re-running tests, where there are small changes or an update
19:41:22 <Sam_Shaw> MK: Lets agree on the goals of the MVP
19:41:52 <Sam_Shaw> MK: So if was just re-runs of existing plans, and an admin can assign a bot to it
19:42:06 <Sam_Shaw> MK: I'm trying to imagine how this works for the admin
19:42:17 <Sam_Shaw> MK: Lets just consider new AT versions for example
19:42:53 <Sam_Shaw> MK: You add a new AT version, that updates all candidate, draft, or recommended plans that exists to need a retest
19:43:12 <Sam_Shaw> MK: You can identify manually the tests to rerun
19:44:02 <Sam_Shaw> MK: We wouldn't need to automate this, you could review the recommended report column, open the dialog, add the missing ones to the test queue, when you do so you can have the option of running with the bot.
19:44:42 <Sam_Shaw> MK: Then it would show up in the test queue, then if it shows up with no conflicts maybe then we can publish automatically
19:45:08 <Sam_Shaw> MK: Maybe we still need a button to accept as complete
19:45:27 <Sam_Shaw> MK: We could just do that one use case, adding a new version of a test based on a new AT version
19:45:41 <Sam_Shaw> JS: I don't think we should map to a specific test case
19:47:03 <Sam_Shaw> MK: We would then have to figure out how the system know what comparison to make
19:47:31 <Sam_Shaw> MK: I think that it might be intuitively obvious to the system
19:48:27 <Sam_Shaw> MK: Mike is this making you nervous or excited?
19:49:15 <Sam_Shaw> jugglinmike: Its definitely a departure from where we started writing this document. I will need some time to rethink some of these details, I don't think there's to much to change architecturally
19:49:32 <Sam_Shaw> jugglinmike: There is some work to identify "familiar" responses
19:50:09 <Sam_Shaw> jugglinmike: One thing to confirm, acutally we will not be using this automated test collecting in draft state?
19:50:11 <Sam_Shaw> MK: Correct.
19:51:02 <Sam_Shaw> MK: Maybe in the future, when we create a test we first have a bot run the test so everyone can compare results to it. However not now, as we are hesitant about this
19:52:10 <Sam_Shaw> jugglinmike: another question, its clear that responses collected by the system that match previous results, then a verdict is assigned. If they don't match, should the test run that is incomplete, should that have the previous AT response, the automated response or be blank?
19:53:09 <Sam_Shaw> MK: I think it should be matching the original human response.
19:53:53 <Sam_Shaw> MK: We need a way to notate when responses have been collected with AT.
19:54:11 <Sam_Shaw> jugglinmike: Does the existing UI compare results against different versions of AT?
19:54:20 <Sam_Shaw> MK: No, the test plans are agnostics.
19:54:37 <Sam_Shaw> MK: Older tests wont be in the queue
19:55:34 <Sam_Shaw> JS: I think it makes sense to make it test agnostic, if for some reason we wanted run a test plan with an older version of AT that would be great. We shouldn't prioritize changes based on AT version
19:56:18 <Sam_Shaw> MK: I think there is great value for a test admin to run a test with AT, to compare against the results of another tester, but we are telling it not just collect results
19:56:32 <Sam_Shaw> MK: So you wouldn't be able to assign the bot to a test plan that hasn't been run
19:57:01 <Sam_Shaw> jugglinmike: Could a test plan be in the recommended phase with a previous runs?
19:57:16 <Sam_Shaw> MK: We are limiting this to test plans that have complete test plan runs by humans
19:57:33 <Sam_Shaw> JS: In an MVP do we need that limit? As its all test admin abilities
19:58:47 <Sam_Shaw> JS: We may not need to consider if a test plan has been run before, the bot would just not make any verdicts
20:01:24 <Sam_Shaw> MK: I think we need to figure out where the value and  goal of the MVP is to rerun test plans and collect results, what is the best way to scope it so that we are building as little UI as possible?
20:01:40 <Sam_Shaw> MK: My goal is to simplify delivery
20:02:55 <Sam_Shaw> MK: Is that a clear version for the MVP?
20:03:08 <Sam_Shaw> jugglinmike: Yes, I will need to update the design document and touch base with you again
20:12:47 <jugglinmike> Zakim, end the meeting
20:12:47 <Zakim> As of this point the attendees have been Matt_King, jugglinmike, mmoss
20:12:48 <Zakim> RRSAgent, please draft minutes
20:12:49 <RRSAgent> I have made the request to generate https://www.w3.org/2023/06/26-aria-at-minutes.html Zakim
20:12:57 <Zakim> I am happy to have been of service, jugglinmike; please remember to excuse RRSAgent.  Goodbye
20:12:57 <Zakim> Zakim has left #aria-at
20:13:06 <jugglinmike> rrsagent, make minutes public
20:13:06 <RRSAgent> I'm logging. I don't understand 'make minutes public', jugglinmike.  Try /msg RRSAgent help
20:13:37 <jugglinmike> rrsagent, make log public