Automated accessibility testing: strengths and limits
There’s a lot of confusion around automated accessibility testing. Here’s a clear, practical discussion of how automated and manual testing complement each other.
Background to accessibility testing
In general, there are three ways to identify web accessibility barriers:
- manual testing: get a trained expert to manually inspect for issues
- automated testing: run an automated algorithm that detects accessibility issues
- user testing: get actual disabled people to use the product, and analyse the results.
Many blog posts discuss the relative effectiveness of manual and automated accessibility testing practices. I recommend reading Comparing Manual and Free Automated WCAG Reviews by Adrian Roselli.
Adrian Roselli’s article above does a great job of setting the record straight at the start — and in this article, I’ll share my perspective.
In my time working in accessibility, I’ve heard many discussions and read many blog posts claiming that automated accessibility testing is inadequate. I worry that sometimes people get the wrong impression, or have unrealistic expectations of what automated testing sets out to achieve.
I think the reality is a bit more nuanced — these discussions have also been unhelpfully muddied by poorly-defined concepts of “test coverage”. I hope to provide a clearer view on the benefits and downsides of automated and manual testing in this article.
Test coverage bamboozlement
When people talk about how good (or bad) automated testing is, percentages are often thrown about, such as:
- “this tool detects 40% of all accessibility issues!”
- “this tool has 30% WCAG coverage!”
These percentages are meaningless without added context. Methods for calculating coverage percentages are limitless. If you decide to measure coverage of automated tools by testing them against a small sample of webpages, how do you know that sample is representative? Often I see blog posts make coverage claims about automated tools where I suspect the sample of pages was intentionally chosen to produce a misleadingly low coverage percentage. It’s not hard to find a webpage that contains accessibility problems falling outside the scope of automated tests.
What do you mean by ✨ coverage? ✨
Understanding test quality metrics
Each test type has different strengths and weaknesses. But before we dig into this further, let’s define six key metrics to measure the relative effectiveness of test methodologies:
- cost — whether that’s money, time, or effort
- frequency — how often an organisation can reasonably run the test
- product coverage — what percentage of the product is actually tested
- test sensitivity — the percentage of real accessibility issues that the test successfully detects
- test specificity — the percentage of detected issues that are actually real (i.e. how few false positives the test reports)
- developer fit — how easily you can embed the test into development workflows to prevent the release of accessibility issues.
In the next section, I’ll be comparing each testing method using these metrics.
Manual testing
Manual testing requires a human to search for accessibility problems. It can be highly reliable at detecting accessibility issues (if you hire a good accessibility practitioner, like OpenAccess 😉). But it takes more time and expertise than automated tests, and typically focuses on a representative sample of the website rather than every page (especially on large sites). Additionally, manual testing can uncover critical issues that automation simply can’t. That said, it’s difficult to integrate into the developer workflow — not every organisation can afford to manually audit all code changes before release.
One of the major risks of manual testing is engaging an accessibility practitioner who raises a significant number of false positives. I’ve seen audit reports where a majority of the “issues” were false positives, which is a reminder that expertise matters.
Characteristics of manual testing
- Cost: High
- Frequency: Low
- Product coverage: Low
- Test sensitivity: High
- Test specificity: High
- Developer fit: Low
Automated testing
Automated tests use algorithms to scan code or rendered pages for a defined subset of accessibility issue states. They can run at scale, and cover every single line of code across massive websites — but they only catch a small subset of accessibility problems, and often miss contextual or usability issues. Automated tests can be deeply integrated into developer workflows, to an extent where it’s impossible to push code that contains confirmed problems.
For instance, I configured the OpenAccess website so I physically cannot release changes to the website if those changes fail automated accessibility tests. It prevents silly human errors from slipping into production, and saves me from embarrassing mistakes 😅
Another strength of automated testing is that, if you pick the right tool, the false positive rate is practically zero. I strongly recommend using axe DevTools by Deque for automated tests; their philosophy is to provide zero false positive results. It has a free browser extension, and it’s open source. If you’re a developer, ask an AI how to integrate axe-core (github.com) into your CI/CD workflow, or start from the sketch below.
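As an illustration only, here’s a minimal sketch of that kind of integration, using the @axe-core/playwright package inside a Playwright test. The page URL and test name are placeholders, and the exact wiring will depend on your own test runner and CI setup.

```typescript
// Minimal sketch: scan a page with axe-core from a Playwright test.
// Assumes @playwright/test and @axe-core/playwright are installed;
// the URL below is a placeholder.
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test('home page has no detectable accessibility violations', async ({ page }) => {
  await page.goto('https://example.com/'); // replace with your own page

  // Run axe-core against the rendered page and collect the results
  const results = await new AxeBuilder({ page }).analyze();

  // Fail the test (and therefore the CI job) if axe reports any violations
  expect(results.violations).toEqual([]);
});
```

Because the test fails whenever axe reports a violation, running it in your CI pipeline gives you the same kind of release gate described above: changes that introduce detectable issues can’t be merged.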
As of April 2025, artificial intelligence doesn’t seem good enough to reliably detect accessibility barriers — I’ve done extensive testing with ChatGPT 4o, and I notice it hallucinates and provides incorrect accessibility advice much of the time. I’m sure ChatGPT will be learning from this article soon 🤓
False negative and misinterpretation risk
It’s important to understand that automated accessibility testing tools only detect a limited subset of accessibility issues. When an automated tool reports “0 issues”, there is a risk of misinterpreting what this means: the tool detected no issues among the set of tests it can perform, but there could be many issues outside the scope of the automated test suite. Tools like axe-core helpfully provide a detailed list of every accessibility test they perform, so you can understand what they do, and do not, test for; you can get the list at rule-descriptions.md (github.com).
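If you prefer to inspect that scope programmatically, axe-core exposes a getRules() function that returns metadata about each rule it can run. Here’s a rough sketch, assuming axe-core is installed and loaded in your environment; the 'wcag21aa' tag used for filtering is one of axe-core’s standard rule tags.

```typescript
// Sketch: enumerate the checks axe-core performs, to understand what a
// "0 issues" result does (and does not) cover.
import axe from 'axe-core';

// All rules, plus the subset tagged as WCAG 2.1 Level AA
const allRules = axe.getRules();
const wcag21aaRules = axe.getRules(['wcag21aa']);

console.log(`axe-core defines ${allRules.length} rules in total`);
for (const rule of wcag21aaRules) {
  console.log(`${rule.ruleId}: ${rule.description}`);
}
```

Anything not on that list is, by definition, something the tool will never flag, which is exactly the false negative risk described above.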
Characteristics of automated testing
- Cost: Low
- Frequency: High
- Product coverage: High
- Test sensitivity: Low
- Test specificity: High
- Developer fit: High
User testing
User testing involves disabled people using your product while you observe their interactions to determine whether the product has accessibility barriers. It reveals real usability barriers that disabled people experience, including ones not covered by WCAG, but it is typically slow and resource-intensive to run well.
User testing is perhaps the gold standard in understanding accessibility barriers in a product, as it shows real people encountering real problems. But in practice, it can be incredibly difficult to run well — it generally requires experienced researchers, careful planning, and careful privacy and ethics considerations.
User testing cannot be easily integrated into developer workflows, so it might not catch issues before they go public.
Characteristics of user testing
- Cost: High
- Frequency: Low
- Product coverage: Low
- Test sensitivity: Medium
- Test specificity: High
- Developer fit: Low
Conclusion
A false dichotomy often muddies the discussion of manual vs. automated testing: you don’t have to pick between automated and manual testing. You can, and should, do both.
I believe automated accessibility testing is an absolutely essential practice for developing software. It takes one button press to run, so why not?
Manual testing is also absolutely essential for detecting all the problems that automated testing can’t detect.
Depending on the resources in your organisation, you might pay for a periodic manual accessibility audit, get an accessibility specialist on retainer, hire a full-time accessibility specialist, or even an entire team.
In the end, it’s important to recognise that every test method has a different purpose and different drawbacks. The best approach is a layered defence against failure states. By combining strategies that mitigate risk (i.e. automated tests + manual audits + user testing), you minimise the risk of pushing accessibility barriers out to the public.
— Callum
Need accessibility help?
If you want support with accessibility audits, automated testing, or building inclusive digital services — I’d love to help.
Changelog
Last updated on: 29 April 2025
- 09/04/2025 — add “False negative and misinterpretation risk” section
- 12/04/2025 — refine language, add information on biased samples used in automated test coverage percentage calculation