W3C

– DRAFT –
Bi-Weekly Meeting of Assistive Technology Automation Subgroup of ARIA-AT Community Group

22 May 2023

Attendees

Present
-
Regrets
-
Chair
-
Scribe
Sam_Shaw

Meeting minutes

Murray Moss Present+

James Scholes present+

Today's discussion will focus on some issues Mike raised in GitHub

Should one tester be able to assign verdicts to AT responses collected by another?

https://github.com/w3c/aria-at/issues/936

JS: Would that verdict come from a machine or human?

jugglinmike: Both. We discussed doing this for machines before, but this could also be a verdict created by both a machine and a human

JS: I think that could be useful.

JS: I think it would be helpful to collaborate on tests in general. Does this cause more logistical concern than the time it saves?

JS: Right now, when a conflict is raised by a tester, it creates a lot of tracking down and follow-up discussion to resolve

JS: If we have a test that belongs to more than one human being, we would have to track down multiple people to resolve conflicts

jugglinmike: Can we just track what people are contributing?

JS: Yes, but right now conflicts can occur within a test plan; this would get more granular and we would have conflicts at the command level, etc.

jugglinmike: I'm not suggesting we segment tasks within tests, but that we segment verdicts from tests

jugglinmike: I see it as: one tester would be responsible for collecting all of the test results, then another would be responsible for assigning the verdicts

jugglinmike: so if there is a conflict, it would be precise, and we could easily figure out who to talk to

JS: Okay, so you're not envisioning multiple people would be contributing within a single test?

jugglinmike: Yes, I don't see a use for subdividing within a test plan

jugglinmike: Another question on the agenda today is whether or not we could have multiple people assigning verdicts

jugglinmike: We could specify in the working mode how many people we would want assigning verdicts

JS: If we go down this road, I think we need some fallback plan. Right now, we require two testers for each plan.

JS: To increase that to three or four people would increase the workload required for the project

JS: We could say we need at least two different people to work on a test

JS: The verdicts are the parts that people will be opinionated about

JS: In that case it does become important to have two verdicts to compare

jugglinmike: To your point about getting more help, I think it makes it easier for new people to get involved.

jugglinmike: They wouldn't need to run a screen reader, or get the test plan to run

JS: In the UK we have a process for people to review accessibility policies; we've also seen this implemented in describing images in museums, and it has been very effective.

JS: I think we need to be careful about having people assign verdicts without knowledge of how the AT works

JS: We do have these kinds of expectations and standards for people to run tests. We don't put people through any kind of assessment though. We just take people with their credentials and what kind of AT they use on a daily basis.

JS: How would we assess whether someone is capable of making verdict assignments?

jugglinmike: I'm glad to hear this, James. It wasn't clear to me that that was a consideration, so thank you

JS: Absolutely, across the accessibility industry, one of the hardest things people encounter is finding people who are sufficiently experienced to be opinionated.

JS: Even our testers at PAC who use NVDA daily don't necessarily have opinions on how it should work instead

JS: We should have the expectation that for automated tests, anyone assigning verdicts is a daily user of the screen reader, or has some other equivalent experience

jugglinmike: I've been wondering as well, what it means to revise responses that were collected by automation

jugglinmike: It could be the same as the manual tests from a process perspective, but there is also opinion involved

JS: Even if we believe that automation is collecting full and usable test data, we should have a screen reader user review the output, as they could quickly identify gaps in responses. There is still some subjective feedback required: whether or not the output of the test is good, and whether or not it matches an expectation

JS: I don't think we could have a JAWS user review NVDA output

jugglinmike: I agree.

jugglinmike: I wonder though, even if we don't have a training program, is there a way we can have someone install some software in a way that proves they have the experience required?

JS: It should be noted that we do provide some training to new testers, and after their first test runs we can provide feedback to improve their work and their understanding of how AT or testing works

JS: We haven't ever had to turn someone down from testing. But we also have a small community in which we know each other to some degree.

JS: If we were to scale up, we would need some documentation around all of this

JS: I go back to my initial response though: beyond having the project assess verdicts created by a machine, we will have a ton more work if we need to assess verdicts for human tests as well

jugglinmike: Maybe not; it may be a big lift initially

JS: I think if we made it a requirement, then in order to assign verdicts, the test would need to have been run by that tester themselves or by a machine

JS: It will take some additional work to build a system to assign people to verdicts and track who made each assignment

jugglinmike: In case it helps, the reason I'm approaching it this way is a directive from Matt not to encode automation into the working mode

jugglinmike: I do think that it has value when you consider the implementation; I think it's attractive from a development perspective to treat automation the same as you would any other tester

JS: I would want to discuss that with Matt: why aren't we writing automation into the working mode? It seems like we are going to encounter the issues you are talking about if we don't

JS: I know and agree with Matt on avoiding scope creep. If we can guarantee that a machine will act in a specific way, then we are freed from having to do a whole bunch of tasks like tracking who did which tests, and the overhead of communicating with all the testers

JS: Trying to treat a machine and a human as the same will be convincing in the end.

jugglinmike: I'm not totally sure myself; I've been trying to represent Matt's opinion. The issues I raised are all around this abstraction, and figuring out how to substantiate something like running tests without specifying whether a bot or a human is doing it.

jugglinmike: this is just one aspect of that discussion.

jugglinmike: Another example: do people need more or less experience to run tasks? If we think about the motivations, it may help to split up this work

JS: This is similar to interface design: some people will want a million configurations, some will want specific functionality. If we can differentiate between what we know the project needs and what the project could benefit from, it would be helpful.

JS: We don't currently have a problem where someone is willing to make a verdict but not run a test. We are trying to solve the problem of bandwidth; we should be cautious about trying to solve issues that don't currently exist, unless resolving that issue results in us doing less work

JS: If we end up doing less work and solving a problem we didn't know we had, then great. If we solve problems we don't have yet, then we get into an imbalance and start working on things we don't need to

jugglinmike: The outcomes of these discussions can be: this is something we want eventually, and this is what we need now

jugglinmike: We can consider these things as we are developing other features, to avoid rework

jugglinmike: I agree we need to talk with Matt to discuss the working mode and what his goals are with that

jugglinmike: Whether or not we decide that automation needs to be explicitly mentioned in the working mode is separate from the need to decide some of this stuff for automation to work

JS: This project will not achieve our goals without automation; we need to figure out how to scale to be successful.

JS: So not mentioning automation in the working mode seems problematic

Minutes manually created (not a transcript), formatted by scribe.perl version 196 (Thu Oct 27 17:06:44 2022 UTC).

Diagnostics

No scribenick or scribe found. Guessed: Sam_Shaw

Maybe present: JS, jugglinmike

All speakers: JS, jugglinmike

Active on IRC: jugglinmike, Sam_Shaw