See also: IRC log
Eric: would like to address (1) scope of "web application" in the context of WCAG-EM and (2) how we address web applications in WCAG-EM
... what information is missing and where do we need to add information
Eric: web application is an application rendered via a browser
Ramon: what about Adobe AIR, which is installed locally
Katie: downloaded using HTTP
Shadi: WCAG 2.0 defines web content as that delivered via HTTP and rendered via a browser
Katie: develop a new version of the methodology after WCAG2ICT work is completed?
Shadi: need to make sure we address what WCAG defines as web content, and make sure we do not break usage for other contexts
Ramon: should add example of web applications
... currently no differentiation for web applications
Vivienne: need to spell out what a web application is, because some people consider that a web application is not a website
<David_MacD_Lenovo> http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=23601&section=text Government of Canada Web pages refer to static and dynamic Web pages and Web applications.
Ramon: impact of some success criteria on web applications is different than for more static websites
... for example, screen magnification may impact an application with many widgets more than a document with mostly text
Katie: would not want to single out and separate specific requirements
... screen magnification also impacts other situations
Shadi: maybe makes sense to point out to evaluator particular success criteria that occur more frequently in the context of web applications
Vivienne: use a set of such criteria when evaluating web applications
... more important than sampling because there could be only a few web pages
David: rather than defining web application just use the WCAG approach of calling it content
## Examples of Web Applications ##
calendar widget
online forms
word processor
Ramon: something that performs an action
John: typically over several iterations of interaction
Jason: interaction where the user provides some input and receives a response based on that input
John: the nature of that response is what differentiates them
Jason: sometimes thinks applications as such don't exist
... it is about content and interaction
... more dynamic than traditional static pages
... model not fundamentally different but usability may be different
John: in web applications some of the logic happens on the client side
Jason: discrete set of functionality that serves a specific purpose?
... what about client side scripts that generate a series of pages
David: use "horse power" for cars even though no
horses pull the cars anymore
... maybe want to keep "web page" despite the new paradigms
Ramon: want to avoid that people think WCAG-EM is not applicable to what people perceive as a web application
Vivienne: current descriptions include web applications as part of a website
... maybe only need some more examples
Katie: test all the functionality on a page
Ramon: cannot test all the functionality on an application like Google Docs
... because there are possibly many thousands of individual ones
... usually group the types of functionality
## in a web application lots of functionality and content may be compressed into a single web page
## there may also be lots of repetition of components (blocks)
## requirement for "complete transaction" is also frequently an issue
Vivienne: deciding the parameters of a web application, where it starts and where it ends
### Examples of Web Applications
iTunes homepage is a browser
internet banking part of the bank website
webmail client
hotel or airline booking
(bank may not consider internet banking application as a website in itself)
(booking websites may have distinct search versus booking functionality)
real-time tickers like for scores or the stock market
social networking applications
## discussion about dependency: on a traditional website there may be more easily separable areas (like "the math department") whereas in an application there may be dependencies, like the path to get to a particular part of an application (like an "app" on Facebook)
tax calculator
<Ryladog> scribe:Ryladog
Topic: Revision of the current sampling approach
EV: We do not have a random sample at the moment in our methodology
... We define it in Scope section 3
... 3.3 Step 3: Select a Representative Sample; 3.3.1 Step 3.a: Include Common Web Pages of the Website; 3.3.2 Step 3.b: Include Exemplar Instances of Web Pages; 3.3.3 Step 3.c: Include Other Relevant Web Pages; 3.3.4 Step 3.d: Include Complete Processes in the Sample
3.3 is missing random sampling. We want to add it to 3.3
VC: Performed a test where 25% of the total sample size could be random; 90% of the random pages were already represented in the structured pages
... Had a problem finding random pages that were not already in her structured pages
... That could be very expensive; the random sample would need to change for the next test
DM: Why would you exclude pages because they use a template?
VC: Because they were all so similar
DM: I would not worry about overlap
... 4 things that a random sample will help with:
1. like the tax filing example, they could miss something (to ensure the whole website is covered)
2. inadvertent parts that would otherwise be missed
3. coming....
EV: Why do random pages?
<shadi> http://lists.w3.org/Archives/Public/public-wai-evaltf/2012Sep/0066.html
RC: Newspaper, one news item, it is not really random
EV: The idea is that it should be random
RC: Every three months automatic testing, then every other three months is manual.
DM: Self-evaluation vs. IV&V (independent verification and validation)
SAZ: With an honest evaluator performing a random sampling, have a detective's nose...
... We pick out of a completely structured example, and a random sample should produce the same results... if the choice is truly random
VC: It would be good to compare structured and random results
RC: That could be difficult
... 1 home page, 3 landing pages and 1000 content pages
EV: The random sample should in no way be worse than the structured sample
SAZ: Any random selected pages should never perform worse than structured
VC: I have one example where random won't work.
... Australian Gov says that 10% of the pages need to be tested.
KHS: Should we recommend stating whether random sampling was performed with any methodology testing?
VC: For research, uses a percentage of failures
SAZ: I want to push the sampling responsibility to the evaluator
... The result you provide should prevail
RC: For conformance we need to include sampling
DM: Gov of Canada has example/sample size requirements - 10 or minimum 90%
JK: Department of statistics in Canada put this together. They have gone up to 68 pages for a site of 3,000 pages or more. If the TF is going to go this way, by ±5% or ±10% - that might be reasonable
EV: I am not sure
JK: Standard deviation
VC: Of your sample, 25% should be random
... Gregg V, we think, suggested that number, 25%
... Why random sample? It keeps you honest
EV: Purpose: confirmation of the outcome of your structured sampling approach
... Comparison of 2 or more sites or instances of the same site
VC: Random sampling is better for re-assessments of the same site
SAZ: There are different gradients of sampling: semi-structured selection, under specific criteria - without identifying the pages
Ramon: instructions of how to select the pages including randomness
Shadi: perhaps the term of 'variation'
David: the tax department doesn't give you the opportunity to choose the tax receipts they review - the whole point is not being able to choose; it takes that out of the equation. A lot of people are not as passionate about accessibility - they just want the checkmark. There aren't a lot of people who know
... there should be a component of random sampling - some automated or third-party selected
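## scribe note: a minimal sketch of what automated random selection could look like, assuming a crawled URL list; the URL names, the 25-page structured sample, and k=10 are illustrative only, not WCAG-EM requirements
```python
import random

# Hypothetical inputs: a crawled URL list and the structured sample
# already chosen by the evaluator (all names are illustrative).
all_pages = [f"https://example.org/page{i}" for i in range(1, 1001)]
structured_sample = set(all_pages[:25])

# Draw the random portion from pages not already in the structured
# sample, so the two samples can be compared afterwards.
candidates = [p for p in all_pages if p not in structured_sample]
random_sample = random.sample(candidates, k=10)
```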
VC: What validity does it have for a third party if you do it yourself?
<shadi> http://www.macorr.com/sample-size-calculator.htm
KL: Self assessment can work well
KHS: The results need to be the same - as long as the outcome is the same.
KL: x pages should always be evaluated and x number of pages will be randomly chosen
VC: Australia could use this methodology
RC: We do multi-site evaluations just to compare
<shadi> Number of Pages in the Website / Sample Size
<shadi> 5 / 5
<shadi> 10 / 10
<shadi> 25 / 23
<shadi> 50 / 42
<shadi> 100 / 73
<shadi> 125 / 86
<shadi> 150 / 97
<shadi> 200 / 116
<shadi> 250 / 131
<shadi> 350 / 153
<shadi> 500 / 176
<shadi> 750 / 200
<shadi> 1000 / 214
Not to check conformance - but to see what the people do with the website
<shadi> http://www.macorr.com/sample-size-calculator.htm
<shadi> (85% - 95% confidence)
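## scribe note: the table above appears consistent with Cochran's sample-size formula with finite population correction at roughly 90% confidence, a 5% margin of error, and p = 0.5 - an assumption about the calculator's internals, not something confirmed in the meeting
```python
# Sketch reproducing the table above under the stated assumptions
# (z ~ 1.65 for ~90% confidence, 5% margin of error, p = 0.5).
def sample_size(population: int, z: float = 1.65,
                margin: float = 0.05, p: float = 0.5) -> int:
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)        # finite population correction
    return round(n)

for n_pages in (25, 100, 500, 1000):
    print(n_pages, sample_size(n_pages))        # -> 23, 73, 176, 214
```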
RC: We do random sampling with automated testing
EV: The difference when random sampling is done with automated testing vs. structured testing
... Purposes: confirmation,
DM: Random sampling would be used as a validation of structured testing
SAZ: Current guidance is: use an automated tool and then pick your pages
... How are we going to change that to include 'random'?
... Two things we need to tackle with random sampling: oversight and intentional unearthing of errors
EV: Should we require random sampling
VC: We should require it
JK: We should require it
KHS: We should require it
VC: Over time this matters
RC: Unsure to require it
Sample should be representative
JK: For a small site the entire site is your representative sample
... The whole idea is to avoid the view - in Canada, every site needs to test the home pages, pages with media,
DM: We should require it
<scribe> ACTION: Jason Kiss will check with Canadian Treasury Board Secretariat's web-site statistical audit guidance folks - to help us make a determination on sample size [recorded in http://www.w3.org/2012/10/29-eval-minutes.html#action02]
RC: Small sample size just for PASS/FAIL - not for conformance - then a full evaluation
Group reviews Sampling survey - 29 respondents
JK: Is random sampling used for economical and staffing reasons? Yes, all agree
Eric: Step 5C is the performance score
... only compulsory part is providing documentation
... how do you score? There are a few possibilities - total website, web page, or instance. What would it look like? Do we want to make it mandatory, or if there is a score it must be between 1-10, etc.
Katie: somewhere between green & yellow
Ramon: don't like global scores because they convey whether you have done it well or badly. If you give 90% and you have a disability, even that 90% is bad for those people. It tends to make people complacent - they feel they are pretty good.
Eric: the easiest - fail or not fail
Ramon: we use severity and frequency according to the SC
... the global score tends to be for the visually impaired
Detlev: if the score is based on WCAG, then you measure the score based on the criteria. There could be an argument for a universal score. It may not serve user groups well, because of something like captioning in which case it would fail for them completely.
Ramon: the problem with the approaches in the EM - e.g. keyboard accessibility: it may score 99% accessible, and to most people it seems completely accessible
... but say for an epileptic person some criteria would be a major fail
... very difficult to find out which criteria affect which group
David: that's why we don't use the word priority - it is very political
Ramon: we are part of a foundation for people with disabilities, we cannot discriminate between the different groups. We don't like scoring because of that. Any scoring has to be very clear that it covers all of the possible types of disability.
Detlev: one example - the BITV test - you score each success criterion for each page with a scale. If you have 95% it doesn't mean one SC would fail completely; it means that for 1 criterion you have less than ideal results. E.g. colours could be technically a failure (4.2:1), but it is a near pass. These near passes then add up
... if there are 'accessibility killers', then even when you have a good score and you find a keyboard trap it would be downgraded to inaccessible.
Eric: how did you get to the scale? You are scoring according to severity?
Detlev: we had a number of criteria that are critical. For every SC and every test you can downgrade. We always have 2 testers for a final test - is this vital? Sites which have vital failures will seldom reach 95% anyway.
Ramon: we try to avoid any numbering of the score. We try to give a subjective opinion - you are doing well, badly, almost accessible, terrible. We don't want to put a number, because when the company comes to us and sees they have 10%, this means "our website is so terrible that we can't do anything without spending a lot of money"
Katie: you don't have to use numbers
Ramon: we use 2 columns - severity and frequency - each of them 1-3.
Shadi: what is the severity of an SC?
Ramon: it is a subjective analysis
Katie: these are the 4 critical failure points
Ramon: even if the alternative text is bad, it may not be a problem. But it is subjective.
Katie: we correlate with FQT & QA; we customize - critical/serious/moderate/not as moderate. This has to be fixed now, next build, etc. Because you are tracking, you can use that level. We don't use numbers. Preferably I use critical/serious and moderate. If it is 1 alt text on 1 page, that would be minor.
Eric: do you look at Frequency?
Katie: yes, that comes up in tracking - which level, which SC,
Eric: you don't say green/orange/red/black
Katie: in my world we have laws attached - Section 508 - deals with what you must fix - fix critical first
Ramon: a project asked this specifically for a priority table - we combined severity/frequency with the impact that the fix would have
... we include whether it is easy or hard to fix. The priority may be how great the impact of fixing it would be and how easy it would be to fix.
David: priority, frequency, impact and effort
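## scribe note: a toy illustration of the severity/frequency/effort prioritisation Ramon and David describe; the 1-3 scales come from Ramon's two columns, but the combination rule and the example findings are assumptions, not an agreed formula
```python
# Each finding gets severity and frequency on Ramon's 1-3 scales;
# effort-to-fix is added per David's summary. The product rule below
# is one possible combination, not an established WCAG-EM formula.
findings = [
    {"sc": "1.1.1", "severity": 2, "frequency": 3, "effort": 1},
    {"sc": "2.1.2", "severity": 3, "frequency": 1, "effort": 2},
]
for f in findings:
    # High severity*frequency and low effort float to the top.
    f["priority"] = f["severity"] * f["frequency"] / f["effort"]
findings.sort(key=lambda f: f["priority"], reverse=True)
```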
<Detlev> Vivienne: pass / fail / N.A. / not tested
<Detlev> Vivienne: score of percentages across pages of pass/fail
<Detlev> Vivienne: that was for professional work
<Detlev> Vivienne: for research, create an average score across pages to be able to compare sites
<Detlev> Vivienne: needs quantitative score for libraries, retest to check if repairs have been done over time
<Detlev> Vivienne: so two different worlds
<Detlev> Vivienne: clients love charts
<Detlev> Vivienne: for research, adding up violations (any 4 critical points in conformance criteria) gets extra 5 points to add significance
<Detlev> Vivienne: in reporting, a hint of whether it is a global or individual problem (shared page problem)
<Detlev> Vivienne: for commercial work, per page there will be pass/fail for every SC which then gets aggregated across pages
David: accessibility differs for each client based on their goals. For web applications it does change differently. We had a template with all of the SC, and you had an example the first time you ran into the error; we report that we've found an error - tells the client to go through and fix those issues
... report has an executive summary which summarizes - whether they have the skills to fix things, top/priority issues - if you fix these 5 issues a whole bunch of stuff gets better. Then has a table with 1, 2, 3 level priorities - right away, next - based on effort and impact. Fix these and the site gets better quicker. Don't provide a score - except Government of Canada. They have to report to the courts.
... e.g. 1.3.1 has a huge impact - if it has the same weight it messes up the severity of the impact
Ramon: if they pass most of A, they get a score of 100, but when they try to go to the higher level, it lowers the score
Katie: priority for this methodology should be on compliance level.
Ramon: now the levels are more reflective on the difficulty, the unusualness, ability to comply
David: if you're going to give a score and a percentage, some people are adamant they get a score so they can compare to other organisations. Some points are more important to some people than others. Some people on the WCAG group will question the scoring method.
Shadi: 1.3.1 occurs so frequently. It can come up 100 times on a page - tables/headings etc. Out of those 100 occurrences, how many failed?
David: usually if they just get 1 wrong, they may be doing it all over the site
Shadi: then it will occur systematically so the numbers will go up
Katie: you would still get a fail overall
Shadi: is it realistic that pages are of really good quality and just 1.3.1 is badly marked up?
David: 1.3.1 happens disproportionately in terms of the websites.
Shadi: it occurs so often on a page. The more complexity you put in, the more subjectivity comes in, which actually lowers the value of the outcome. There is the notion that the more complex the scoring system and the more parameters it has, the more inclined it is towards subjectivity - the easier it is to bias.
Katie: maintain simplicity
David: he gave a client a report based on the TBS and they got 95%, and they think it's great. But all of the errors are in 1.3.1.
described adding the errors per page and dividing by the number of pages for an average score per page
David: similar to Government of Canada
Shadi: described the 3 different approaches
... 3rd one - instance: is more like Vivienne's example. For 1.3.1 you could have 100 instances where 70 passed and 30 failed, and you can work out an average of total number of errors over total possible, which gives you a ratio
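## scribe note: Shadi's per-instance example worked through - 100 instances of SC 1.3.1, 70 passing, 30 failing, giving a 0.7 ratio; a sketch of the idea, not a defined WCAG-EM metric
```python
# Per-instance scoring: every occurrence of an SC is counted and the
# score is passed / total. Figures are from Shadi's 1.3.1 example.
instances = {"1.3.1": {"passed": 70, "failed": 30}}
for sc, counts in instances.items():
    total = counts["passed"] + counts["failed"]
    print(sc, counts["passed"] / total)   # 1.3.1 -> 0.7
```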
Detlev: would the last one be a way of differentiating between critical and minor failures?
Shadi: you could have a website that is unusable which could score the same as a website with a lot of decorative images tagged with alt text etc.
... what is the purpose of scoring? How to take to court - depending on score.
... we need to state clearly that the scores are indicative and used to motivate the developer. E.g. you're on orange, you invested $5 and now you're on level x and you can see the value of the money you've invested
Katie: the world I'm in doesn't care about what you do right - they want to know how much trouble they are in
... only clear violations should be counted
Ramon: these approaches have the same problem - if you pass from A to AA, the percentages change and the results look worse
you can state both A and AA scores
Katie: identify which level you are trying for compliance with. You can say 100% for A and 70% for AA
Ramon: what is considered critical - contrast may be critical for me
... you can't say anything is not critical
Detlev: we need to work with WCAG - has to be set according to that
David: there are new tools to enhance the contrast
Shadi: what are you trying to address
Eric: at the moment it is just pass/fail - WCAG already describes it. Do we want to add things like severity and impact?
Shadi: those questions are part of the reporting. First of all you have the conformance - pass/fail. Then you come out with the report - which depends upon you, as the evaluation commissioner, as to how much detail you want - what needs fixing, the frequency of the issues, and as eye-candy, optionally you can get a score; it doesn't mean that the website is 80% accessible.
... this score is just an indicator - this circumstance, on this date, on these pages. So you can compare your own progress. Helps you track your own progress.
Katie: let's add those 4 critical requirements.
Ramon: they interfere with the other content
Katie: doesn't that make them critical then?
Detlev: if we have this kind of separation and keep it simple on the pass/fail basis, then the score is additional and should not be taken as a value of the accessibility of the page. Is there a way of reflecting those imbalances, say with 1.3.1? We have 7 different checkpoints for 1.3.1 - split into several bits so the score has more weight, same with 1.1.1
... they wanted to give a higher weight to some of the checkpoints, say 1.3.1 - tries to give a relative weight. It then aggregates the results.
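## scribe note: a sketch of the BITV-style relative weighting Detlev describes - splitting an SC into weighted checkpoints and aggregating; the checkpoint names, weights, and results are invented for illustration
```python
# Each SC is split into checkpoints with relative weights (BITV-style);
# the page score is the weighted share of points earned. The weights
# and checkpoint results below are illustrative only.
checkpoints = [
    ("1.3.1 headings", 2.0, 1.0),   # (name, weight, result in 0..1)
    ("1.3.1 tables",   2.0, 0.5),
    ("1.1.1 alt text", 1.0, 1.0),
]
earned = sum(weight * result for _, weight, result in checkpoints)
possible = sum(weight for _, weight, _ in checkpoints)
score = earned / possible   # -> 0.8
```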
Katie: we have to make the same assumptions WCAG has made - things change. such as less priority on tables, more on interactive controls. We need to be careful of specific ways things are done.
Shadi: the seriousness can show itself in the number of occurrences - but there are killers, no matter how good the page is, e.g. getting stuck
... the first 2 types - per website and per page - may be too coarse. If you want a score you will have to have a form of evaluation that counts every occurrence.
Detlev: where do you put your time - counting and looking at every image
Eric: we just indicate if there are errors, and give them a few examples.
Katie: you need to decide based on business cost as well
Ramon: we say - global issue - lists without the proper markup. You can go and fix them. If we find 3 wrong in 30 pages, we don't test all of the pages to look for more, we assume that the developers don't know how to do them and they need to go and check them themselves.
David: regarding Shadi's point - you count up the instances and see how many pass or fail. Can you come up with a % number without doing that? Is that where you want to spend your accessibility budget - counting up what is right?
Detlev: it is much more important to be able to hone in on those things that are vital - e.g. a search button image with no alt text.
Shadi: is the conclusion that the first 2 - per website or per page - are too coarse, more misleading than beneficial? The other is per occurrence, but the administrative overhead is too high - you are using budget to count those things that work. Perhaps provide a hybrid - certain checkpoints have more points and have a point system.
David: the 4 points were picked because if you fail them, you can't access other content
Shadi: to drop the scoring completely is the 4th option
Katie: what about per page?
Shadi: at level AA there are 38 possible success criteria per page and you sample 10 pages. On page 1 you fulfilled 14, on page 2 you fulfilled X; you get an average per page.
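## scribe note: Shadi's per-page averaging sketched out; only the 14 for page 1 comes from his example, the fulfilled counts for the other nine pages are made up to complete the illustration
```python
# Per-page scoring at level AA: 38 applicable success criteria per page.
# Page 1 fulfils 14 (Shadi's example); the other counts are hypothetical.
fulfilled = [14, 30, 25, 38, 20, 33, 28, 36, 22, 31]  # one per sampled page
per_page = [n / 38 for n in fulfilled]
average = sum(per_page) / len(per_page)
```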
Detlev: it can severely dilute problems - picking more pages. The more pages you check, the less major the impact of the 1 huge problem on 1 page.
... if it is an issue just on 1 page, then this is an indication of the overall impact
Shadi: we would need to see how sensitive the score is towards changes in the sample.
Detlev: it is an issue if you use a score, because people want to get a seal (90%, or 95% for really good). It gives them an impetus to increase the number of pages tested to water down the results
Shadi: we have to make sure the score is not a measure of conformance and not a measure of accessibility.
Katie: then we have to say very clearly what it is
Shadi: only used for looking at your own performance - we need to be really clear.
Session No. 4: Requirements for uniform accessibility support
Looking at Step 3.1.4, Step 1.d
Eric: example of a range of UA/AT: 5 browsers, 3 types of assistive technology
People may look at different scenarios (UA/AT) to make things work
Ramon: example of using one Screen reader instead of another
Eric: for some websites, you don't have a choice (tax web site)
Ramon: W3C's definition of accessibility support is loose
Katie: WCAG says you choose the technologies and AT to ensure it works across the site
<David_MacD_Lenovo> http://lists.w3.org/Archives/Public/public-wai-evaltf/2012Sep/0020.html my comments are here... do not think consistent support should be required...
Shadi: you could have a site accessible with one set of tools and another with another set of tools .. but it doesn't happen often
Ramon: Explains problem with PDF not being accessible on the Mac - does that constitute sufficient accessibility support (as you may install a virtual machine)?
WCAG was deliberate in not nailing down the required level of support due to fast changing technologies
David: As long as something works it is sufficient (scribe: not sure if that is David's position rendered correctly)
Katie: not offering alternatives for PDF that work on Mac would be a failure
Ramon / David seem to disagree
Vivienne: Australian Government would require an alternative version for PDFs (but this is not the WCAG position)
David: Leaving technologies out was conscious decision because it could otherwise have created disincentives for technology developers
Shadi: Partial lack of support cannot be the benchmark for establishing accessibility support
Vivienne: Australian government has applications within websites that only work in Windows - was a Government decision
Shadi: any other examples with conflicting sets of technologies for different parts of a site?
Vivienne: There were instances where Firefox did things that did not work in other browsers / AT
Detlev: Case of WAI-ARIA not being available to many people at the workplace
David: Makes case for not requiring any technologies
... example of many different web teams at a large department or company; it is difficult to get all teams to do the same things and apply the same set of UA/AT in their tests
Katie: the testing methodology should not only test a site if the level of technology support is even - it is not the role of the methodology to mandate it; results are nevertheless helpful - uniformity not required
Shadi: take back this point to WCAG Working group
Katie: still important to list what has been used for testing
Shadi: Differences in accessibility support that have been discovered should enter reporting (where are the weaknesses)
Create new issue: Develop a concept for reporting accessibility support across the website
Katie: we should mandate a minimum set (number of tools) used
Shadi: Step 1d Define the context of website use - define tools used in testing - that may need to be changed
... you may come to a piece of content that only works in another context - what is the impact of that?
Katie: Multiple operating systems should be considered - may be based on data on the most common OSs, browsers, AT used
Shadi: Step 1d came up to define a baseline for the developer
... Defining techniques is fine, but they may be extended by technologies used in the site as they are discovered
... similar approach for tools - start with the most common, then extend to other tools/AT to see if it works there
Detlev: Does that mean as long as any tool/AT out there supports it, it meets WCAG conformance?
Shadi: yes, technically, though it is not best practice; should be noted in the report
Vivienne: make suggestion for better practice
Detlev: is that the WCAG WG position?
David: at least for one page the same tools should work throughout; seems to be the WCAG WG position
... Discernible sections of a website (sub-sites, etc.) should work together on the same AT
... not swap within tasks
... Uniform AT support across sub-sites / chunks / tasks of a website might be the WG position
Shadi: Maybe WCAG WG can define it more clearly, not do it in Eval TF - ask for WCAG WG opinion here
... agreement that uniform level of AT support is at least per page, and per function / widget, transaction
David: Shadi, write down as bullet points and put it into a WCAG WG survey to clarify this
Ramon: web site owner can bypass the uniformity requirement by commissioning two different evaluations
Vivienne: Library and website may have different levels of AT support; the library could be singled out for testing
Ramon: problem is AT support / tools clash
David: We could make a statement in the Understanding document to explain AT support
... Problem of large organisations where that uniformity is not possible
Shadi: Uniformity may hold back technology development if new parts of a site have to keep in line with the old status
Vivienne: Government agencies purchase parts from others which look and work completely different from the rest
Shadi: We should make sure that a site does not need completely different sets of tools to access the site
<David_MacD_Lenovo> set of Web pages:.. collection of Web pages that share a common purpose and that are created by the same author, group or organization
<David_MacD_Lenovo> Note: Different language versions would be considered different sets of Web pages.
Katie: the market in the end determines what will be used
David: as a proposal to present to WCAG for uniform AT support
Shadi: Question to WCAG WG: What is the WG position on the intent of accessibility support 1) within individual web pages, 2) complete processes, 3) sets of pages, 4) across entire collections of pages (web sites)?
Katie: Examples: a) form on a web site that only works on the Mac, b) a calendar widget that only works in Firefox c) WAI-ARIA roles that are only supported in specific AT
Detlev: for many SC it does not matter which tool/platform has been used
Katie: mandating using more than one tool is still useful
<shadi> issue: Develop a concept for reporting accessibility support across the website
<trackbot> Created ISSUE-10 - Develop a concept for reporting accessibility support across the website ; please complete additional details at http://www.w3.org/WAI/ER/2011/eval/track/issues/10/edit .
<shadi> ACTION: eric to draft question to WCAG WG about the intent of accessibility support [recorded in http://www.w3.org/2012/10/29-eval-minutes.html#action03]
<trackbot> Created ACTION-7 - Draft question to WCAG WG about the intent of accessibility support [on Eric Velleman - due 2012-11-05].
David: operators are mutually talking about APIs so things might improve
Different positions on whether WCAG-EM should be independent of technology change or not
Katie: Developers find new ways of AT support
Eric: Wrap up of today
David: Good process, will feed back into larger group for decisions
Vivienne: Good discussions, more details on how everyone does things and the different views
Katie: Was great, good
... no different expectations for tomorrow; good that we came to formulate clear questions for WCAG WG (about AT support) - a survey of what people use would be interesting
Ramon: Good learning experience, reflects on many things in Technosite, similar to discussions there - only negative point is concern about excluding minorities that have specific requirements - drawing a line may exclude them
Katie: agrees that all disabilities (not just blindness) should be included
Detlev: Lively discussion, can stay that way today
Shadi: was a good discussion in a rather small group, good to have high-level discussion - teleconferencing wouldn't have cut it for that type of discussion - tomorrow we need to focus more on the presentational side of the methodology
... Facilities for CSUN in March
Discussion about CSUN
Eric: Is happy with input and discussions, good food for thought for the editor draft - could have more hands-on work tomorrow
[End of minutes]