Apple, Columbia University Researchers Discuss Accessibility-Minded ‘SceneScout’ Project
Marcus Mendes reports for 9to5Mac on a project from Apple and Columbia University aimed at using artificial intelligence to make navigation and street-crossing more accessible to Blind and low vision people. The prototype, called SceneScout, is described on Apple’s Machine Learning Research blog as “a multimodal large language model (MLLM)-driven AI agent that enables accessible interactions with street view imagery.”
The paper was written by Gaurav Jain, Leah Findlater, and Cole Gleason.
“People who are blind or have low vision (BLV) may hesitate to travel independently in unfamiliar environments due to uncertainty about the physical landscape,” the trio wrote in the paper’s introduction. “While most tools focus on in-situ navigation, those exploring pre-travel assistance typically provide only landmarks and turn-by-turn instructions, lacking detailed visual context. Street view imagery, which contains rich visual information and has the potential to reveal numerous environmental details, remains inaccessible to BLV people.”
SceneScout relies upon Apple Maps APIs alongside the aforementioned MLLM to “provide interactive, AI-generated descriptions of street view images,” Mendes wrote.
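Neither Apple nor the researchers have published SceneScout’s code, but the basic shape of the pipeline Mendes describes (a street view image goes in, an AI-generated description comes out) can be sketched. The Python below is purely illustrative: it uses OpenAI’s GPT-4o chat completions API as a stand-in for however the researchers actually invoke the model, and the prompt wording and function name are my own, not SceneScout’s.

```python
# Illustrative sketch only; SceneScout's implementation is not public.
# Sends one street view image to GPT-4o and asks for a pedestrian-focused
# description, roughly the shape of the pipeline Mendes describes.
import base64

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def describe_street_view(image_bytes: bytes) -> str:
    """Return an accessibility-minded description of one street view image."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("Describe this street view for a blind pedestrian: "
                          "sidewalks, crossings, obstacles, and landmarks.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```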
As the researchers explain, SceneScout supports two modes: Route Preview and Virtual Exploration. The former is intended to “[enable] users to familiarize themselves with visual details along a route” while the latter is meant to “[enable] free movement within street view imagery.” Mendes notes SceneScout grounds “a GPT-4o-based agent within real-world map data and panoramic images from Apple Maps.”
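Again strictly as a hypothetical sketch, the two modes might be distinguished like so: Route Preview walks a fixed list of waypoints, while Virtual Exploration lets the user steer freely. This builds on the describe_street_view() sketch above; fetch_panorama() and step() are stand-ins for SceneScout’s unpublished Apple Maps integration, not real APIs.

```python
# Hypothetical sketch of SceneScout's two modes. fetch_panorama() and
# step() are stand-ins for the unpublished Apple Maps integration.
from dataclasses import dataclass

@dataclass(frozen=True)
class Position:
    latitude: float
    longitude: float

def fetch_panorama(position: Position) -> bytes:
    """Stand-in: fetch street view imagery at a coordinate."""
    raise NotImplementedError("SceneScout's imagery pipeline is not public")

def step(position: Position, direction: str) -> Position:
    """Stand-in: move one panorama in the given direction."""
    raise NotImplementedError("SceneScout's movement logic is not public")

def route_preview(waypoints: list[Position]) -> list[str]:
    """Route Preview: describe the visual details along a fixed route."""
    return [describe_street_view(fetch_panorama(p)) for p in waypoints]

def virtual_exploration(start: Position, commands: list[str]) -> list[str]:
    """Virtual Exploration: free movement driven by user commands."""
    position, descriptions = start, []
    for direction in commands:  # e.g. "forward", "left", "right"
        position = step(position, direction)
        descriptions.append(describe_street_view(fetch_panorama(position)))
    return descriptions
```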
According to Mendes, the SceneScout study, comprising 10 Blind and low vision participants, found both modes were highly lauded. The more open-ended Virtual Exploration mode in particular was praised for providing information “they would normally have to ask others about.” All of the study’s participants were well-versed in using screen readers and worked in the tech industry, Mendes wrote.
As ever, however, there were as many shortcomings as there were advances.
“A technical evaluation [of SceneScout] shows that most descriptions are accurate (72%) and describe stable visual elements (95%) even in older imagery, though occasional subtle and plausible errors make them difficult to verify without sight,” the researchers said of the problems that cropped up. “We discuss future opportunities and challenges of using street view imagery to enhance navigation experiences.”
At 30,000 feet, the SceneScout project is encouraging because, dangerous hallucinations aside, it offers further proof of artificial intelligence’s potential as an assistive technology. As SceneScout is iterated on and refined, it’s plausible the technology could be incorporated elsewhere so as to be available to a more “mainstream” contingent of Blind users such as myself. If SceneScout someday enables fuller agency and autonomy in travel for the Blind and low vision community, then the tool will have reached self-actualization in a way that would make Maslow proud. Put another way, SceneScout theoretically could someday be as impactful to the Blind and low vision community for foot travel as Waymo’s autonomous vehicles are today for driving distances. While SceneScout and Waymo diverge in methodology, they undoubtedly converge on a common goal: greater accessibility for disabled people.
It’s also worth mentioning SceneScout’s scope is similar to that of Apple Design Award winner Oko, as well as NYU’s Commute Booster app for navigating New York City’s subway system. Both pieces of software leverage AI to varying degrees in order to make travel (and transit) more accessible to Blind and low vision people. In a nutshell, the Commute Booster app is designed to rectify the myriad issues inherent to the so-called “middle mile”: the oftentimes treacherous part of one’s journey between departure and destination, which can be especially tricky for Blind people to navigate successfully.