19:05:55 RRSAgent has joined #aria-at
19:06:00 logging to https://www.w3.org/2023/09/07-aria-at-irc
19:06:03 rrsagent, make log public
19:06:08 Zakim, start the meeting
19:06:08 RRSAgent, make logs Public
19:06:10 please title this meeting ("meeting: ..."), jugglinmike
19:06:23 present+
19:06:30 MEETING: ARIA and Assistive Technologies Community Group Weekly
19:06:42 Matt_King has joined #aria-at
19:06:51 present+ jugglinmike
19:06:51 present+
19:06:54 scribe+ jugglinmike
19:10:17 Matt_King: No meetings next week because of TPAC
19:10:33 Matt_King: The next meeting will be two weeks from today, September 21
19:10:41 Matt_King: I've updated the W3C calendar, so everyone's calendar should be up-to-date
19:11:29 Topic: Expectations for current automation user interface
19:11:57 jugglinmike: What I want to raise is that there is an aspect of this we haven't discussed and which isn't well supported in the design of the automation UI
19:13:00 jugglinmike: Basically, the case concerns when we use automation in the future to collect results for test plans we already have data for. In these cases, we expect that not all AT responses will match the prior results. We expect a human tester to rerun the test, but only for the tests whose responses have changed
19:13:53 jugglinmike: That is not new and doesn't need changing. In our current UI, we have all been discussing that, at that moment, the test admin would assign the test plan to a bot
19:14:18 jugglinmike: In that moment, we don't have a way to get a second human being to verify those tests
19:14:32 Matt_King: Can we run it again and assign it to another person?
19:14:54 jugglinmike: Yes, that's what I had in mind; it's a lot of extra work for a test admin
19:16:03 Matt_King: I guess my understanding is that this is an MVP, so we may run into issues.
We still don't know how easy it will be for us to determine the difference between the bot's results and a human's results, or how much they won't match in substantive ways
19:16:33 JS: That sounds okay to me
19:16:47 present+ James_Scholes
19:16:55 scribe+ jugglinmike
19:17:11 Topic: Changes to unexpected behavior data collection and reporting
19:17:19 present+ Isabel_Del_Castillo
19:17:34 github: https://github.com/w3c/aria-at-app/issues/738
19:17:52 Matt_King: Before we talk about the solution I've proposed, I want to make sure the problem is really well-understood
19:18:18 Matt_King: The status quo is: after you answer all the questions in the form about the screen reader conveying the role and the name, etc.
19:18:58 Matt_King: ...there's a checkbox asking if other unexpected behaviors occurred. Activating that reveals a bunch of inputs for describing the behavior
19:19:27 Matt_King: In our reports, we have a column for required assertions, a column for optional assertions, and a column for unexpected behaviors
19:19:56 Matt_King: When we surface this data in the APG (but really, any place we want to summarize the data), it lumps "unexpected behavior" into a separate category all on its own
19:20:04 Matt_King: I see this as a problem for two reasons
19:20:39 Matt_King: Number one: we're going to have too many numbers (four in total) once we move to "MUST"/"SHOULD"/"MAY" assertions
19:20:56 Matt_King: Number two: people don't know how to interpret this information
19:21:35 Matt_King: To that second point, there's a wide spectrum of "unexpected behaviors" in terms of how negatively they impact the user experience
19:22:41 mfairchild: I agree that's a problem
19:22:44 present+ mfairchild
19:22:51 present+ Hadi
19:23:14 Hadi: So we don't have a way to communicate "level of annoyance"?
19:23:16 Matt_King: That's right
19:24:10 Matt_King: We might consider encoding the "level of annoyance" (even to the extent of killing the utility of the feature)
19:24:25 mfairchild: This extends even to the AT crashing
19:24:56 Matt_King: As for my proposed solution
19:25:31 Matt_King: There are three levels of "annoyance": high, medium, and low
19:26:16 present+
19:26:17 Matt_King: And there are three assertions associated with every test: "there are no high-level annoyances", "there are no medium-level annoyances", and "there are no low-level annoyances"
19:28:17 Matt_King: That way, what we're today tracking as "unexpected behaviors" separate from assertions would in the future be encoded and reported just like assertions
19:28:52 Matt_King: Ignoring the considerations for data collection, do folks present today think this is a good approach from a reporting standpoint?
19:29:24 Matt_King: Well, let's talk about data collection
19:29:40 Matt_King: Let's say you have excess verbosity. Maybe the name on a radio group is repeated
19:30:39 Matt_King: The way you'd say that on the form is you'd select "yes, an unexpected behavior occurred", then you choose "excess verbosity"...
19:31:02 Matt_King: ...next you choose how negative the impact is ("high", "medium", or "low"), and finally you write out a text description of the behavior
19:32:05 Matt_King: It wouldn't be that each of the negative behaviors always fell into one specific bucket of "high", "medium", or "low".
Instead, it's that each occurrence of an unexpected behavior would require that classification
19:33:11 James_Scholes: I think it's a positive direction, but I wonder about how testers will agree on the severity of these excess behaviors
19:33:46 James_Scholes: I also wonder about sanitizing the plain-text descriptions
19:34:31 Matt_King: I don't think we need to collect a lot of information in the descriptions; I expect they'd be pretty short in most cases
19:34:48 Matt_King: If you look at our data, these things are relatively rare. In a big-picture sense, anyway
19:35:12 Matt_King: If you look at it in the reports, in terms of what we've agreed upon, they're extremely rare
19:35:28 Matt_King: I think that mitigates the impact of the additional work here
19:36:03 Matt_King: But I think we need to work toward really solid guidelines, and be sensitive to this during the training of new testers
19:37:38 Hadi: My general concern is that once we start to categorize the so-called "annoyance" level, we might get into rabbit holes that we might not be able to manage
19:37:59 Hadi: I think anything more than the required assertion should be considered an "annoyance"
19:38:29 Hadi: e.g. if the AT repeats the name twice. Or it repeats even more times. Or it reads through to the end of the page
19:38:41 Hadi: Where do we say it crosses the line?
19:39:33 Matt_King: I'm not prepared today to propose specific definitions for "high", "medium", and "low". We could do that at the meeting on September 21
19:39:47 Matt_King: But I also don't think we need that to move forward with development work
19:44:54 https://datatracker.ietf.org/doc/html/rfc2119
19:46:22 jugglinmike: I'm having trouble thinking of what "MAY NOT" means. And, checking RFC 2119, it doesn't appear to be defined for normative purposes. I'll think offline about the impact this has on our design, if any
19:47:37 Matt_King: We could reduce it to just two: must not and should not. That could be a good simplification.
I'm starting to like that, just having two and not having the third
19:47:43 mfairchild: I like that, too
19:47:53 mfairchild: But does this go beyond the scope of the project?
19:48:32 Matt_King: It's really clear from a perspective of interoperability that, with one screen reader, you can use a certain pattern, and with another screen reader, you can't
19:49:18 mfairchild: I agree. I think where we might get hung up is on the distinction between "minor" and "moderate", or "may" and "should"
19:50:03 mfairchild: I think that, from a perspective of interoperability, we're really focused on severe impediments
19:50:56 mfairchild: It's still subjective to a degree, but if we set the bar high enough, it won't be too distracting to actually make these determinations
19:51:12 Matt_King: There might be some excess verbosity that we don't include in the report
19:51:42 Matt_King: The best way for us to define these things is real-world practice
19:52:40 Hadi: Can you provide an example of "must not"?
19:53:28 Matt_King: For example, there's "must not crash". Or "must not change reading cursor position"
19:54:27 Hadi: For example, imagine you are in a form that has 30 fields. When you tab to each field, it reads the instructions at the top of the page (which are two paragraphs long). That's just as bad as a crash!
19:54:58 Matt_King: James_Scholes, would you be comfortable moving to just two levels: "MUST NOT" and "SHOULD NOT"?
19:55:56 James_Scholes: The fewer categories, the less nuance we'll have in the data. That makes the data less valuable, but it also makes it easier for testers to make these determinations.
I support it
19:57:26 Matt_King: I'm going to make some changes to this issue based on our discussion today and also add more detail
19:58:32 Matt_King: By the time of our next meeting (on September 21), I hope we can be having a discussion on how we categorize unexpected behaviors
19:59:08 Zakim, end the meeting
19:59:08 As of this point the attendees have been howard-e, murray_moss, jugglinmike, Sam_Shaw, James_Scholes, Isabel_Del_Castillo, mfairchild, Hadi, Joe_Humbert
19:59:11 RRSAgent, please draft minutes
19:59:13 I have made the request to generate https://www.w3.org/2023/09/07-aria-at-minutes.html Zakim
19:59:20 I am happy to have been of service, jugglinmike; please remember to excuse RRSAgent. Goodbye
19:59:20 Zakim has left #aria-at