AIMAC Founders Joe Devon, Eamon McErlean Talk First Round of Results in New Interview
Last May, I published a piece here featuring interviews with Joe Devon and Eamon McErlean. The men are co-founders of the AI Model Accessibility Checker (AIMAC), an API described as “[evaluating and comparing] how well coding-focused large language models (LLMs) generate accessible code [by providing] benchmarks for companies to test and demonstrate the accessibility of their models’ output.” The tool, built in collaboration with ServiceNow, where McErlean serves as vice president and global head of digital accessibility and globalization, is “an open-source, extensible evaluation framework” that assesses the accessibility of code generated by AI models such as OpenAI’s ChatGPT, Google’s Gemini, and others.
To kick off 2026, Devon and McErlean have released the results of the first round of AIMAC testing. The impetus was to glean insight into how well (or poorly) AI models generate accessible code for people with disabilities. Notably, the findings reveal a considerable performance gap between the most popular models.
"We prompted the top AI models to build web pages across 28 categories and audited them for accessibility,” the GAAD Foundation wrote in prefacing its findings. “We published every generated page, side by side, so you can see how different models tackled the same design challenge. We even measured em-dash usage.”
The GAAD Foundation, established in 2021, is chaired by Devon. He co-founded Global Accessibility Awareness Day (GAAD) with Jennison Asuncion 15 years ago, in 2011. Global Accessibility Awareness Day for 2026 will be on Thursday, May 21.
In a brief interview conducted over email this week, McErlean told me the AIMAC’s findings show “it’s obvious that some companies are considering accessibility as they build out their LLMs while others are continuing to not prioritize it.” Moreover, he emphasized proper context is crucial: the AIMAC is specifically benchmarking AI systems “designed to generate code.” McErlean also acknowledged chatbots’ potential as bona fide assistive technologies in “advancing accessibility,” noting ServiceNow recently released an internal, accessibility-minded chatbot to help employees get up to speed on answering accessibility questions.
“Historically, technology advanced without fully considering people with disabilities, leaving the accessibility community to retrofit solutions after the fact,” McErlean said. “As new technologies, especially AI, accelerate innovation, accessibility must be treated as a first-class requirement, not an afterthought. To address this, we launched an initiative that meets AI researchers where they are, challenging them against defined accessibility benchmarks that objectively measure the conformance of AI-generated code.”
OpenAI topped the proverbial charts in the AIMAC’s rankings, with the company’s GPT 5.2 Pro, GPT 5.2, and GPT 5.1 Codex models taking the top three positions. In fact, OpenAI placed five models in the top ten. On accessibility, OpenAI is putting its money where its mouth is, a notion that tracks with what I’ve reported about the company in the not-so-distant past. Indeed, OpenAI’s prevalence in the AIMAC’s rankings came as no surprise to Devon.
“When [OpenAI] did the huge ChatGPT 4.0 release, they were launch partners with Be My Eyes,” he said of OpenAI’s strong showing. “It was clear they understood that paying attention to the needs of people with disabilities will result in the best models.”
What did surprise Devon, however, was Google’s lackluster showing, which he called a “real head-scratcher.” Gemini 3.0 Pro ranked dead last, at 36 of 36, though Google’s lighter models fared better. Elsewhere, Devon expressed disappointment and dismay at Claude-maker Anthropic’s performance, saying it was “disappointing” to see Anthropic deliver such a middling result considering the company “openly says that they are leaders in ethical AI.” More broadly, Devon explained to me he’s opened lines of communication with people at AI companies who have bespoke accessibility teams, saying the staff at these places are “open and interested in getting feedback.”
That said, patience is a virtue. Rome wasn’t built in a day, after all.
“We won’t know until [AI companies] release their next models if the message [regarding improvements] trickled up to the right people,” Devon said. “We have to give it a little time, but I suspect we will see in the end that they all pay attention to the benchmark and will get better results.”
When asked what the AIMAC’s initial round of results portends for the future of digital accessibility, McErlean said software developers are using AI to generate code at “unprecedented speed” and, as such, it’s critically important that “accessibility is embedded from the start—otherwise, the negative impact will scale just as quickly.” Equally important, he added, is that designers, engineers, and product managers spend ample time conversing with the disability community in an effort to “truly understand the real-world impact of inaccessible software.” Doing so, he said, will help workers “recognize the barriers created by poor accessibility, but also see [that] when UI and UX are designed to be accessible, everyone benefits.”
For his part, Devon says the AIMAC isn’t sitting still. It’s going to evolve. “We’re prepping for future benchmarks since AI moves so quickly,” he said. “Clearly, we need a benchmark around agentic AI coding as well as how accessible the AI platforms are.”