<shimizuyohta> scribe: Yohta
<Jean-Francois_Hector> scribe: Jean-Francois_Hector
MCK: High level plan is:
1. One week of ramp up, where the contractor will work full time to absorb what we've produced so far.
"Valery" / "Valérie" will be the primary contact, she's an engineer.
Matt to help her learn screen reader testing. She will study use cases and other documents.
She will join the meeting next week. We will talk about the kind of things that we need to resolve to start building a prototype.
2. For the following 8 weeks, she will work one day a week, so that we can keep up.
She will look into things like: if we use an open source test management system, what are its customisation possibilities. And other things.
3. After the 8 weeks, she has 3 weeks full time to build something.
Total contract is for 30 days of work from Bocoup. This will mostly be her, and some support from others.
Jon: Sounds good. Is she able to answer questions related to the ARIA Practices?
MCK: No, work related to the ARIA Practices is under a different contract.
... Bocoup have a lot of experience in standards and testing, including experience setting up testing for the ECMAScript standard ahead of implementation.
... Regarding the preparation for onboarding: we want to consolidate information, including documents that are in Google Drive (e.g. Yohta's work on assertions).
YOHTA: Will add a comment on Github issue 5 linking to these documents.
JF: Will share the latest prototype via Github issue 6, and add a comment on the issue so that it's easy for Valerie to get up to speed.
MCK: Would be useful to reference other relevant projects on the wiki, e.g. a11ysupport.io, and the JAWS ARIA bugs site that Steve Faulkner / TPG have set up.
... I will add a section on relevant projects in the 'Background' page of the Wiki.
<mck> JAWS ARIA support:
<mck> https://freedomscientific.github.io/VFO-standards-support/aria.html
MCK: Valerie's name is spelt 'Valerie'.
<mck> From WPT: https://github.com/web-platform-tests/wpt/tree/master/wai-aria
<mck> Power Mapper: https://www.powermapper.com/tests/
MCK: We should do more research / investigation into relevant projects. Will create a page for it in the Github repo. Let's create Github issues when we think that we should investigate another project.
... Hoping that this will increase our momentum, and that we'll have a functional prototype in 3 months' time or less.
During our weekly calls, we will work on the things that we want Valerie to investigate or do related to designing our prototype infrastructure.
E.g. do we want to use an open source test system, one off the shelf, or build one from scratch?
Even spending 1 day attempting to customise a piece of software will give us a lot of knowledge about how easy or hard it is.
So Valerie would investigate routes, then report back and discuss with the group.
Re-using Michael's project (a11ysupport.io) is an option too.
Long term viability / scalability of the project is important.
We will probably need to describe more concrete use cases, as well as revisit the high-level ones that we've already described.
The high level use cases seem strong at the moment for web developers, but we might need to revisit the high-level use cases for testers and screen reader developers.
YOHTA: Sent a spreadsheet to the email list on August 5.
... Using Matt's template as a base, I tested the menu bar using NVDA.
Encountered challenges and confusion that are useful to share.
One confusion was: Having comprehensive key commands
Another confusion was: How to score the result.
Another confusion was: How to interpret the output for different key commands.
JOE: I think that we need different rows for different key commands and results (rather than having different key commands in the same row)
YOHTA: Indeed, in my document, it's hard to know which output corresponds to which key command.
MCK: Our first approach was to have different rows for different key commands. But this made things complex. So we moved to a model where there's one assertion but different ways to test it.
... There's one assertion that a screen reader appropriately supports something in browse mode (and another one in input mode).
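The model MCK describes (one assertion per reading mode, each verifiable via several key commands) could be sketched as data like the following. This is purely illustrative: the field names, IDs, and commands are assumptions, not taken from any actual ARIA-AT schema.

```python
# Hypothetical sketch of "one assertion, many ways to test it".
# All names and values here are invented for illustration.
assertions = [
    {
        "id": "checkbox-state-browse",
        "statement": "Screen reader conveys the checkbox state in browse mode",
        "mode": "browse",
        # Any of these commands should elicit the expected announcement.
        "commands": ["X", "Down Arrow", "Tab"],
    },
    {
        "id": "checkbox-state-input",
        "statement": "Screen reader conveys the checkbox state in input mode",
        "mode": "input",
        "commands": ["Tab"],
    },
]

# A tester records one verdict per assertion, not one per command.
for a in assertions:
    print(a["id"], "-", len(a["commands"]), "command(s) to try")
```

The point of the structure is that the command list reminds the tester what to try, while the pass/fail verdict attaches to the assertion.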
JOE: I believe that the extra level of granularity might be necessary.
MCK: We want to summarise output at an assertion level (rather than at a key command level).
... But also for testers: if there's one assertion for each key command, and they have to document pass or fail for every key command, it might be too much work for them.
JOE: Think that it's better for the tester to record results granularly, so that they can say "I did this set".
MCK: Thinking that testers could capture results at a high level. If the assertion fails, it doesn't really matter how many key commands worked or didn't work. It just doesn't work.
JOE: Having filters on the data might help.
... It would be useful for testers to be able to divide up the work, where one tester does some key commands, and another tester does other key commands.
... We need to make it easy for people who have limited time. What happens if they have only managed to go through 28 tests out of 30? Should they not submit it?
MCK: In practice, we're talking about only a 3-minute difference.
JOE: It seems like it'll be a big challenge to get all this testing done without relying on community support.
... And testing might take more time for some people.
MCK: Our initial approach was to treat every key command as an assertion. If you do that, you might need to go even finer, e.g. boundary conditions.
There's a risk that if we add too much granularity we end up with too many assertions, and it becomes too daunting for testers.
<JoeHumbert> apologies, but I have to jump to another meeting
MCK: We've been trying to simplify things for testers. It'd be good if we could summarise things to 'I can read the checkbox, I can operate the checkbox, in both modes'.
... Lists of keys are a way to remind testers. But if they are presented with a list of 150 assertions and have to do all of them, it might be daunting.
... We do want testers to test every command that we put in the list.
... One of the challenges here is that someone has to assemble all this data (instructions). And the system will need to do a lot of correlation.
JFH: Is it realistic to have per-command data, if the assertions are not per-command?
MCK: All commands for the same assertion should announce the things expected by the assertion.
If the tester says that the assertion has passed, then in our data model, it will be recorded that all commands are supported. But if the tester said 'partial support', we might need to prompt them to say which command failed.
This should be possible to automate.
If the test result is support, then we've saved ourselves a lot of time.
If the test result is partial support, we should be able to go deeper.
It'd be great to record video during testing and attach the video to the tests, so that we could review the veracity of the tests without having to redo them.
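The correlation MCK describes (an assertion-level verdict expanded into per-command results, with extra detail collected only on partial support) could be automated roughly like this. A minimal sketch, assuming invented names and verdict values ("support", "partial") that are not from any real ARIA-AT implementation.

```python
# Hypothetical expansion of an assertion-level verdict into
# per-command support data, as discussed: full support needs no
# extra tester input; partial support prompts for failing commands.
from typing import Dict, List, Optional


def expand_result(commands: List[str],
                  assertion_result: str,
                  failed_commands: Optional[List[str]] = None) -> Dict[str, str]:
    """Derive per-command support from an assertion-level verdict."""
    if assertion_result == "support":
        # Full support implies every command worked.
        return {cmd: "support" for cmd in commands}
    if assertion_result == "partial":
        # Partial support: the tester must identify which commands failed.
        if failed_commands is None:
            raise ValueError("partial support requires the failing commands")
        return {cmd: ("fail" if cmd in failed_commands else "support")
                for cmd in commands}
    # Anything else is treated as no support: every command failed.
    return {cmd: "fail" for cmd in commands}
```

In the "support" case the tester saves time, exactly as MCK notes; only the "partial" case asks them to go deeper.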
JFH: I don't know which approach will work best (i.e. several commands per assertion, or several assertions). But I think that it's worth trying the simpler approach first.
... I like the idea that when an assertion passes (with several commands), we don't need to do any more work (i.e. testing and assertion preparation). Then we can go into more detail when an assertion doesn't pass.
... It'd be great to use automated testing frameworks to perform all the test instructions and give us short video recordings of all the outputs. Then we would watch the outputs and rate support.
Present: Matt-King, JoeHumbert, shimizuyohta, jongund
Scribes: Yohta, Jean-Francois_Hector