Automated accessibility testing: strengths and limits
There’s a lot of confusion around automated accessibility testing. Here’s a clear, practical discussion of how automated and manual testing complement each other.
Background to accessibility testing
In general, there are three ways to identify web accessibility barriers:
- manual testing: get a trained expert to manually inspect for issues
- automated testing: run an automated algorithm that detects accessibility issues
- user testing: get actual disabled people to use the product, and analyse the results.
Many blog posts discuss the relative effectiveness of manual and automated accessibility testing practices. I recommend reading Comparing Manual and Free Automated WCAG Reviews by Adrian Roselli.
Adrian Roselli’s article above does a great job of setting the record straight at the start — and in this article, I’ll share my perspective.
In my time working in accessibility, I’ve heard many discussions and read many blog posts claiming that automated accessibility testing is inadequate. I worry that sometimes people get the wrong impression, or have unrealistic expectations of what automated testing sets out to achieve.
I think the reality is a bit more nuanced — these discussions have also been unhelpfully muddied by poorly-defined concepts of “test coverage”. I hope to provide a clearer view on the benefits and downsides of automated and manual testing in this article.
Test coverage bamboozlement
When people talk about how good (or bad) automated testing is, percentages are often thrown about, such as:
- “this tool detects 40% of all accessibility issues!”
- “this tool has 30% WCAG coverage!”
These percentages are meaningless without added context. Methods for calculating coverage percentages are limitless. If you decide to measure coverage of automated tools by testing them against a small sample of webpages, how do you know that sample is representative? Often I see blog posts make coverage claims about automated tools where I suspect the sample of pages was intentionally chosen to produce a misleadingly low coverage percentage. It’s not hard to find a webpage that contains accessibility problems falling outside the scope of automated tests.
What do you mean by ✨ coverage? ✨
Understanding test quality metrics
Each test type has different strengths and weaknesses. But before we dig into this further, let’s define six key metrics to measure the relative effectiveness of test methodologies:
- cost — whether that’s money, time, or effort
- frequency — how often an organisation can reasonably run the test
- product coverage — what percentage of the product is actually tested
- test sensitivity — the percentage of real accessibility issues that the test successfully detects
- test specificity — the percentage of detected issues that are actually real (i.e. how few false positives the test reports)
- developer fit — how easily you can embed the test into development workflows to prevent the release of accessibility issues.
In the next section, I’ll be comparing each testing method using these metrics.
Manual testing
Manual testing requires a human to search for accessibility problems. It can be highly reliable at detecting accessibility issues (if you hire a good accessibility practitioner, like OpenAccess 😉). But it takes more time and expertise than automated tests, and typically focuses on a representative sample of the website rather than every page (especially on large sites). Additionally, manual testing can uncover critical issues that automation simply can’t. That said, it’s difficult to integrate into the developer workflow — not every organisation can afford to manually audit all code changes before release.
One of the major risks of manual testing is engaging an accessibility practitioner who raises a significant number of false positives. I’ve seen audit reports where a majority of the “issues” were false positives, which is a reminder that expertise matters.
Characteristics of manual testing
- Cost: High
- Frequency: Low
- Product coverage: Low
- Test sensitivity: High
- Test specificity: High
- Developer fit: Low
Automated testing
Automated tests use algorithms to scan code or rendered pages for a defined subset of accessibility issue states. They can run at scale, and cover every single line of code across massive websites — but they only catch a small subset of accessibility problems, and often miss contextual or usability issues. Automated tests can be deeply integrated into developer workflows, to an extent where it’s impossible to push code that contains confirmed problems.
For instance, I configured the OpenAccess website so I physically cannot release changes to the website if those changes fail automated accessibility tests. It prevents silly human errors from slipping into production, and saves me from embarrassing mistakes 😅
Another strength of automated testing is that, if you pick the right tool, the false positive rate is practically zero. I strongly recommend using axe DevTools by Deque for automated tests; their philosophy is to provide zero false positive results. It has a free browser extension, and it’s open source. If you’re a developer, ask an AI how to integrate axe-core (github.com) into your CI/CD workflow, or start from the sketch below.
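As an illustration only, here’s a minimal sketch of that kind of integration, using the @axe-core/playwright package inside a Playwright test. The page URL and test name are placeholders, and the exact wiring will depend on your own test runner and CI setup.

```typescript
// Minimal sketch: scan a page with axe-core from a Playwright test.
// Assumes @playwright/test and @axe-core/playwright are installed;
// the URL below is a placeholder.
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test('home page has no detectable accessibility violations', async ({ page }) => {
  await page.goto('https://example.com/'); // replace with your own page

  // Run axe-core against the rendered page and collect the results
  const results = await new AxeBuilder({ page }).analyze();

  // Fail the test (and therefore the CI job) if axe reports any violations
  expect(results.violations).toEqual([]);
});
```

Because the test fails whenever axe reports a violation, running it in your CI pipeline gives you the same kind of release gate described above: changes that introduce detectable issues can’t be merged.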
As of April 2025, artificial intelligence doesn’t seem good enough to reliably detect accessibility barriers — I’ve done extensive testing with ChatGPT 4o, and I notice it hallucinates and provides incorrect accessibility advice much of the time. I’m sure ChatGPT will be learning from this article soon 🤓
False negative and misinterpretation risk
It’s important to understand that automated accessibility testing tools only detect a limited subset of accessibility issues. When an automated tool reports “0 issues”, there is a risk of misinterpreting what this means: the tool detected no issues among the set of tests it can perform, but there could be many issues outside the scope of the automated test suite. Tools like axe-core helpfully provide a detailed list of every accessibility test they perform, so you can understand what they do, and do not, test for; you can get the list at rule-descriptions.md (github.com).
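If you prefer to inspect that scope programmatically, axe-core exposes a getRules() function that returns metadata about each rule it can run. Here’s a rough sketch, assuming axe-core is installed and loaded in your environment; the 'wcag21aa' tag used for filtering is one of axe-core’s standard rule tags.

```typescript
// Sketch: enumerate the checks axe-core performs, to understand what a
// "0 issues" result does (and does not) cover.
import axe from 'axe-core';

// All rules, plus the subset tagged as WCAG 2.1 Level AA
const allRules = axe.getRules();
const wcag21aaRules = axe.getRules(['wcag21aa']);

console.log(`axe-core defines ${allRules.length} rules in total`);
for (const rule of wcag21aaRules) {
  console.log(`${rule.ruleId}: ${rule.description}`);
}
```

Anything not on that list is, by definition, something the tool will never flag, which is exactly the false negative risk described above.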
Characteristics of automated testing
- Cost: Low
- Frequency: High
- Product coverage: High
- Test sensitivity: Low
- Test specificity: High
- Developer fit: High
User testing
User testing involves disabled people using your product while you observe their interactions to determine whether the product has accessibility barriers. It reveals real usability barriers that disabled people experience, including ones not covered by WCAG, but it is typically slow and resource-intensive to run well.
User testing is perhaps the gold standard in understanding accessibility barriers in a product, as it shows real people encountering real problems. But in practice, it can be incredibly difficult to run well — it generally requires experienced researchers, careful planning, and careful privacy and ethics considerations.
User testing cannot be easily integrated into developer workflows, so it might not catch issues before they go public.
Characteristics of user testing
- Cost: High
- Frequency: Low
- Product coverage: Low
- Test sensitivity: Medium
- Test specificity: High
- Developer fit: Low
Conclusion
A false dichotomy often muddies the discussion of manual vs. automated testing: you don’t have to pick between automated and manual testing. You can, and should, do both.
I believe automated accessibility testing is an absolutely essential practice for developing software. It takes one button press to run, so why not?
Manual testing is also absolutely essential for detecting all the problems that automated testing can’t detect.
Depending on the resources in your organisation, you might pay for a periodic manual accessibility audit, get an accessibility specialist on retainer, hire a full-time accessibility specialist, or even an entire team.
In the end, it’s important to recognise that every test method has a different purpose and different drawbacks. The best approach is a layered defence against failure states. By combining strategies that mitigate risk (i.e. automated tests + manual audits + user testing), you minimise the risk of pushing accessibility barriers out to the public.
— Callum
Need accessibility help?
If you want support with accessibility audits, automated testing, or building inclusive digital services — I’d love to help.
Changelog
Last updated on: 29 April 2025
- 09/04/2025 — add “False negative and misinterpretation risk” section
- 12/04/2025 — refine language, add information on biased samples used in automated test coverage percentage calculation