19:01:10 RRSAgent has joined #aria-at
19:01:14 logging to https://www.w3.org/2023/06/26-aria-at-irc
19:01:14 RRSAgent, make logs Public
19:01:15 please title this meeting ("meeting: ..."), Matt_King
19:01:32 mmoss has joined #aria-at
19:01:32 MEETING: ARIA-AT Automation Subgroup Meeting
19:01:43 CHAIR: Mike
19:01:49 present+
19:03:31 jugglinmike has joined #aria-at
19:03:48 present+ jugglinmike
19:03:53 scribe+
19:03:59 present+
19:05:29 Agenda: 1. Update on our 2023-06-14 presentation to the Browser Testing and Tools Working Group 2. Defining the workflow for automated AT response collection in ARIA-AT
19:05:50 We are not meeting next week as it is a holiday. Our next meeting will be July 17th.
19:06:09 TOPIC: Update on our 2023-06-14 presentation to the Browser Testing and Tools Working Group
19:06:53 jugglinmike: We presented about the status and future of AT Driver. Meeting notes can be found here: https://www.w3.org/2023/06/14-webdriver-minutes.html#t06
19:07:24 jugglinmike: I wanted to know if anyone had any lessons learned they could share from developing WebDriver BiDi.
19:08:52 jugglinmike: I asked about the possibility of bringing the AT Driver spec into the group, so it is more formal and gets on the standards track. David Burns was very open to this. That working group is re-chartering in August, so this is a good time to do it.
19:09:45 jugglinmike: Another member of that group, Sam from Apple, said they would like to see discussion with implementers. I've spoken with Mick from NVDA, and we have it on our to-do list to talk to Vispero. Lola is going to work on setting up these conversations.
19:10:11 jugglinmike: I'm happy to answer any questions on this now, or in the future via email.
19:10:20 MK: How many people attended?
19:10:25 jugglinmike: I would say 10.
19:11:43 MK: Let's talk offline about how much involvement we need from implementers. This is a time-sensitive topic; we should discuss it with Vispero tomorrow.
19:12:31 MK: We haven't talked about what level of commitment we want from Vispero.
19:12:55 MK: We want to be careful about approaching them from different angles.
19:13:17 jugglinmike: It may be, then, that we want to hold off and let Lola be more intentional about how to reach out to Vispero.
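As background for the protocol discussed in this topic, here is a minimal sketch of what collecting a screen reader response over an AT Driver-style WebSocket connection might look like. This is an illustration, not a confirmed API: the port, URL, and message envelope are assumptions modeled on WebDriver BiDi, and the command/event names (interaction.pressKeys, interaction.capturedOutput) follow the draft spec's naming at the time but may change as it matures.

```typescript
// Hypothetical sketch: drive a screen reader over an AT Driver-style
// WebSocket connection and capture its spoken output for one command.
// Session negotiation (e.g. session.new) is omitted for brevity; the
// endpoint, port, and payload shapes below are illustrative assumptions.
import WebSocket from 'ws';

const socket = new WebSocket('ws://localhost:4382/session'); // assumed endpoint

socket.on('open', () => {
  // Ask the AT to press a key; "interaction.pressKeys" mirrors the
  // draft spec's naming but is not guaranteed by the final standard.
  socket.send(
    JSON.stringify({ id: 1, method: 'interaction.pressKeys', params: { keys: ['DownArrow'] } })
  );
});

socket.on('message', data => {
  const message = JSON.parse(data.toString());
  // Draft-spec-style event carrying text the screen reader spoke.
  if (message.method === 'interaction.capturedOutput') {
    console.log('AT said:', message.params.data);
  }
});
```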
19:13:31 TOPIC: Defining the workflow for automated AT response collection in ARIA-AT
19:13:57 jugglinmike: We began discussing this last week at the CG meeting.
19:14:18 Here is the design document we have been iterating on: https://docs.google.com/document/d/10681gMmTM2KmVEw_Je-A4nmRvcqIkB6jgtJ8dt_Pp0M/edit
19:14:39 jugglinmike: Right now we are focused on the workflow proposal section.
19:15:06 jugglinmike: What it could and should look like to operate the ARIA-AT app in a world where collecting AT responses is possible.
19:15:51 jugglinmike: Where we paused last meeting was a suggestion from Matt that instead of making control of this system a capability of a test admin, we should implement it as an ability of a tester, who could optionally invoke the system to collect responses.
19:16:48 MK: The primary reason I was thinking that way was longer-term strategy, to simplify things. I originally introduced the idea that the bot is like another person; however, then we discussed analyzing responses.
19:17:15 MK: It really started feeling to me like this should be another tool available to anyone, and the job of the tool is initially to collect responses.
19:17:24 present+ James_Scholes
19:18:11 JS: Right now we assign two or more testers to a test plan. They perform the tests and assign verdicts. This has proved to be prudent, because we end up not only with differing verdict assignments but also with differing AT responses.
19:18:55 JS: If an AT is collecting these responses, we are eliminating a second person's interpretation. We are just collecting two sets of verdicts.
19:19:20 MK: I disagree; I wouldn't want a tester to not run the test themselves.
19:19:36 JS: So you are saying a job of the tester is to compare two sets of test results.
19:19:52 MK: They can say the bot's results are correct; if not, they can raise an issue.
19:20:28 jugglinmike: If that's the expectation, then as a tester, what's my motivation to run the test at all?
19:20:34 MK: It saves me time.
19:20:53 MK: I don't have to start the test, pause it, and compare results.
19:21:38 JS: I'm not sure that's true. Right now the tester runs the test and tracks output. We would be asking them to run the test and compare the results.
19:22:33 JS: We should assume some testers will want to note their response and then compare, and in that case testing would take longer. Comparing two strings takes time.
19:23:49 MK: If we wanted to, we could certainly have one tester collect responses manually and one collect AT responses, and not have them do any comparing.
19:24:41 MK: Our test admin should work through the test to make sure the tests are designed to make them efficient.
19:25:28 MK: Maybe the way we release this is that testers at PAC and I are the only ones who have the button to automatically collect responses.
19:26:07 MK: Then if there are conflicts with the manual testers, we resolve them. Over time we can reveal this button to collect results with AT to more and more people.
19:26:23 MK: Right now this feature will only be available for NVDA, right, Mike?
19:26:25 jugglinmike: Yes.
19:27:11 JS: I just don't want to couple the time it takes a manual tester to collect results with the time it takes an AT to collect results.
19:27:33 JS: It seems unnecessary to have to wait for the AT to complete a test.
19:29:17 jugglinmike: If it's just the test admin running tests with the automation, that's simpler.
19:31:24 JS: I'm concerned we're already asking testers to do a lot, and this would add more confusion and complexity.
19:31:54 jugglinmike: There would also be an increase in coordination to run tests, depending on the timing of the AT running the tests.
19:32:07 MK: I was assuming running a test plan would take five minutes?
19:32:17 JS: I don't know where that is coming from.
19:32:39 JS: If the tester is just reviewing verdicts, then there is a huge time savings.
19:32:57 MK: I don't want testers assigning verdicts without running plans themselves.
19:33:14 MK: A tester would have to correctly interpret all of the text output, which seems like a leap.
19:34:01 JS: I think that depends on the tester; some will recognize output that they hear every day. With JAWS output, however, I wouldn't recognize it, because I don't use JAWS daily.
19:34:26 MK: I don't think I can make a judgment about a verdict without running the test and getting the context.
19:35:07 JS: That's what I'm talking about. I would be comfortable assigning verdicts for NVDA, but not for JAWS.
19:35:58 JS: I don't think we should jump straight to a tester reading a response and assigning verdicts. We agree on this. We still have some testers who aren't daily users of AT; we couldn't rely on them to make these verdicts.
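A minimal sketch of the reconciliation step being debated here: carry a human verdict forward only when the bot-collected response exactly matches the human-collected one, and leave everything else for a person to resolve. The types and field names are hypothetical, not the aria-at-app schema.

```typescript
// Hypothetical reconciliation: the bot never originates a verdict; it
// only reuses one a human already assigned for an identical response.
interface ScenarioResult {
  scenarioId: string;
  response: string;
  verdict?: 'pass' | 'fail';
}

function reconcile(human: ScenarioResult[], bot: ScenarioResult[]): ScenarioResult[] {
  return bot.map(botResult => {
    const match = human.find(h => h.scenarioId === botResult.scenarioId);
    if (match && match.response === botResult.response) {
      // Responses agree: reuse the human tester's verdict.
      return { ...botResult, verdict: match.verdict };
    }
    // Conflict or no prior run: leave the verdict unassigned for review.
    return { ...botResult, verdict: undefined };
  });
}
```

Under this model the automation saves time only on unchanged responses; any divergence still reaches a human, which matches the concern that testers should not assign verdicts from output alone.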
19:36:58 MK: I think if the AT could run all the tests, we could have a test admin use the output to assign all the verdicts.
19:37:36 MK: Then humans get involved with comparing results from AT and human testing. That is where the huge gain is: AT can run tests for every new version of a browser or AT that is released.
19:38:06 JS: I think we need to be aware of minor differences between how a human tester and an AT collect responses, like a space at the end of the result.
19:39:47 MK: One way we could fix this fairly quickly is to have automation run new tests, then have a test admin review a spreadsheet to compare the responses quickly and determine whether there are real conflicts or just editorial ones. Maybe that's the first thing to build, Mike.
19:40:02 MK: Maybe the idea isn't to speed up human tests.
19:40:25 MK: The human tests have all sorts of important aspects; we need to have the human fully engaged. Is that where we are aligned, James?
19:41:15 JS: I would love to speed up human testing, but I don't think the proposal says that. I think we are aligned on the advantages of having AT re-run tests where there are small changes or an update.
19:41:22 MK: Let's agree on the goals of the MVP.
19:41:52 MK: So it would just be re-runs of existing plans, and an admin can assign a bot to them.
19:42:06 MK: I'm trying to imagine how this works for the admin.
19:42:17 MK: Let's just consider new AT versions, for example.
19:42:53 MK: You add a new AT version; that updates all candidate, draft, or recommended plans that exist to need a retest.
19:43:12 MK: You can identify manually the tests to rerun.
19:44:02 MK: We wouldn't need to automate this. You could review the recommended report column, open the dialog, and add the missing ones to the test queue; when you do so, you can have the option of running with the bot.
19:44:42 MK: Then it would show up in the test queue, and if it shows up with no conflicts, maybe then we can publish automatically.
19:45:08 MK: Maybe we still need a button to accept as complete.
19:45:27 MK: We could just do that one use case: adding a new version of a test based on a new AT version.
19:45:41 JS: I don't think we should map to a specific test case.
19:47:03 MK: We would then have to figure out how the system knows what comparison to make.
19:47:31 MK: I think that might be intuitively obvious to the system.
19:48:27 MK: Mike, is this making you nervous or excited?
19:49:15 jugglinmike: It's definitely a departure from where we started writing this document. I will need some time to rethink some of these details, but I don't think there's too much to change architecturally.
19:49:32 jugglinmike: There is some work to identify "familiar" responses.
19:50:09 jugglinmike: One thing to confirm: we will not actually be using this automated response collection in the draft state?
19:50:11 MK: Correct.
19:51:02 MK: Maybe in the future, when we create a test, we first have a bot run the test so everyone can compare results to it. However, not now, as we are hesitant about this.
19:52:10 jugglinmike: Another question. It's clear that when responses collected by the system match previous results, a verdict is assigned. If they don't match, should the incomplete test run show the previous AT response, the automated response, or be blank?
19:53:09 MK: I think it should be matching the original human response.
19:53:53 MK: We need a way to notate when responses have been collected with AT.
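Following up on the point about trailing spaces and other editorial differences: a sketch of how responses might be normalized before being flagged as real conflicts. The normalization rules here (trim, collapse whitespace, lowercase) are assumptions; the group would need to agree on the actual set, and on how far "familiar" response matching should go beyond it.

```typescript
// Hypothetical screening of "editorial" differences so that only
// substantive mismatches surface as conflicts for a test admin.
function normalize(response: string): string {
  return response.trim().replace(/\s+/g, ' ').toLowerCase();
}

function classifyDifference(
  human: string,
  automated: string
): 'identical' | 'editorial' | 'conflict' {
  if (human === automated) return 'identical';
  if (normalize(human) === normalize(automated)) return 'editorial';
  return 'conflict';
}

// Example: a trailing space is editorial, not a real conflict.
// classifyDifference('menu item, Copy ', 'menu item, Copy') === 'editorial'
```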
19:54:11 jugglinmike: Does the existing UI compare results against different versions of AT?
19:54:20 MK: No, the test plans are agnostic.
19:54:37 MK: Older tests won't be in the queue.
19:55:34 JS: I think it makes sense to make it test agnostic; if for some reason we wanted to run a test plan with an older version of AT, that would be great. We shouldn't prioritize changes based on AT version.
19:56:18 MK: I think there is great value for a test admin to run a test with AT to compare against the results of another tester, but we are telling it to not just collect results.
19:56:32 MK: So you wouldn't be able to assign the bot to a test plan that hasn't been run.
19:57:01 jugglinmike: Could a test plan be in the recommended phase with previous runs?
19:57:16 MK: We are limiting this to test plans that have complete test plan runs by humans.
19:57:33 JS: In an MVP, do we need that limit? These are all test admin abilities anyway.
19:58:47 JS: We may not need to consider whether a test plan has been run before; the bot would just not make any verdicts.
20:01:24 MK: I think we need to figure out where the value is. If the goal of the MVP is to rerun test plans and collect results, what is the best way to scope it so that we are building as little UI as possible?
20:01:40 MK: My goal is to simplify delivery.
20:02:55 MK: Is that a clear vision for the MVP?
20:03:08 jugglinmike: Yes. I will need to update the design document and touch base with you again.
20:12:47 Zakim, end the meeting
20:12:47 As of this point the attendees have been Matt_King, jugglinmike, mmoss
20:12:48 RRSAgent, please draft minutes
20:12:49 I have made the request to generate https://www.w3.org/2023/06/26-aria-at-minutes.html Zakim
20:12:57 I am happy to have been of service, jugglinmike; please remember to excuse RRSAgent. Goodbye
20:12:57 Zakim has left #aria-at
20:13:06 rrsagent, make minutes public
20:13:06 I'm logging. I don't understand 'make minutes public', jugglinmike. Try /msg RRSAgent help
20:13:37 rrsagent, make log public