Classic brain test reveals AI's biggest weakness

Artificial intelligence systems can write essays, answer questions, and solve complex problems. But new research suggests they may struggle with something humans do every day: staying focused on the task at hand despite distractions.

Researchers led by Suketu Patel tested several leading AI models using a well-known psychological experiment called the Stroop task. The results revealed significant differences between the way AI systems process information and the way the human brain manages attention.

What is a Stroop task?

The Stroop task is a classic psychological test that has been used for decades to study attention, concentration, and self-control.

In the test, color words such as “red,” “blue,” and “green” are displayed in colored ink. Sometimes the word and the ink color match: for example, the word “red” might appear in red ink. At other times they conflict, such as the word “red” printed in blue ink.

Participants are asked to name the ink color rather than read the word itself.

It sounds easy, but it can be difficult because reading words is an automatic habit for most people. Your brain must resist the urge to read the words and instead focus on identifying the color of the ink.

Psychologists often use this task to measure what is known as executive control, a set of mental processes that help people regulate their attention, avoid distractions, and stay focused on a goal.
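
For readers who want to see the setup concretely, the trial structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' code: the names StroopItem, make_item, and make_list, the three-color palette, and the congruent_fraction parameter are all assumptions made for this example.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

# Illustrative palette only; the article mentions these three colors as examples.
COLORS = ["red", "blue", "green"]


@dataclass(frozen=True)
class StroopItem:
    word: str  # the color word that is written
    ink: str   # the color the word is displayed in

    @property
    def congruent(self) -> bool:
        # True when the word and the ink color match.
        return self.word == self.ink


def make_item(congruent: bool, rng: random.Random) -> StroopItem:
    """Build one matching or conflicting word/ink pair."""
    word = rng.choice(COLORS)
    if congruent:
        return StroopItem(word, word)
    ink = rng.choice([c for c in COLORS if c != word])
    return StroopItem(word, ink)


def make_list(length: int, congruent_fraction: float, seed: int = 0) -> list[StroopItem]:
    """Build a shuffled list with the requested share of matching items."""
    rng = random.Random(seed)
    n_congruent = round(length * congruent_fraction)
    flags = [True] * n_congruent + [False] * (length - n_congruent)
    rng.shuffle(flags)
    return [make_item(flag, rng) for flag in flags]


def correct_answers(items: list[StroopItem]) -> list[str]:
    """The right answer is always the ink color, never the word."""
    return [item.ink for item in items]
```

Calling make_list(5, 0.0) would give a short list made up entirely of conflicting items, the kind of list on which reading the word always produces a wrong answer.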

Testing AI’s attention

The researchers wanted to see if modern large language models (LLMs) could handle this challenge as well as humans.

LLMs are the AI systems behind tools like ChatGPT, Claude, and Gemini. They are trained on vast amounts of text, learn the patterns of language, and can often produce surprisingly human-like responses.

When given a short list of five color words, the AI systems generally performed well, even when the words and colors did not match.

However, the situation changed dramatically as the list became longer.

GPT-4o achieved 91% accuracy when processing 5 words. At 10 words, accuracy dropped to 57%. When the list expanded to 40 words, accuracy fell to just 15%.

Claude 3.5 Sonnet maintained stable performance on lists of up to 20 words, but then fell sharply, reaching 24% accuracy on the 40-word list.

Researchers observed a similar pattern with GPT-5, Claude Opus 4.1, and Gemini 2.5.
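
Figures like these come from comparing each model answer with the true ink color and grouping the results by list length. The sketch below shows one way to do that. It is an illustration under stated assumptions: the function name accuracy_by_length is hypothetical, the scoring is assumed to be per item with case-insensitive matching, and the toy responses are made up to show the data shape, not taken from the study.

```python
from collections import defaultdict


def accuracy_by_length(trials):
    """Share of correct answers for each list length.

    trials: iterable of (list_length, correct_ink, model_response) tuples.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for length, ink, response in trials:
        totals[length] += 1
        # Assumed scoring rule: exact match after trimming and lowercasing.
        if response.strip().lower() == ink.lower():
            hits[length] += 1
    return {length: hits[length] / totals[length] for length in sorted(totals)}


# Made-up toy responses, only to show the shape of the data.
toy_trials = [
    (5, "blue", "blue"),
    (5, "red", "Red"),
    (40, "green", "red"),
    (40, "blue", "blue"),
]

print(accuracy_by_length(toy_trials))  # {5: 1.0, 40: 0.5}
```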

When the AI loses focus

The task became even more difficult when matching and mismatched color words appeared together in the same list.

In this situation, performance was even worse. In some cases, accuracy for mismatched items decreased to nearly zero.

According to the researchers, the AI models had a hard time holding on to the instruction to identify the ink color. Instead, they increasingly defaulted to reading the words themselves.

In other words, the systems seemed unable to consistently suppress the response they had been most heavily trained to produce.
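
One way to make this failure visible is to sort each answer on a mismatched item into three bins: the ink color (correct), the written word (a word-reading slip), or something else. The sketch below assumes that simple scheme. The names classify and mismatched_summary are hypothetical, and nothing here is claimed to be the researchers' own analysis.

```python
def classify(word, ink, response):
    """Label a single answer relative to the written word and the ink color."""
    answer = response.strip().lower()
    if answer == ink.lower():
        return "correct"
    if answer == word.lower():
        # The model read the word instead of naming the ink.
        return "word_reading"
    return "other"


def mismatched_summary(trials):
    """Count outcomes on mismatched items only.

    trials: iterable of (word, ink, model_response) tuples.
    """
    counts = {"correct": 0, "word_reading": 0, "other": 0}
    for word, ink, response in trials:
        if word.lower() == ink.lower():
            continue  # matching items cannot reveal a word-reading slip
        counts[classify(word, ink, response)] += 1
    return counts
```

A word_reading count that grows as lists get longer would correspond to the pattern the researchers describe.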

This finding is particularly interesting because humans face a similar conflict. Generally, people are much faster and more practiced at reading words than at naming ink colors. Despite this bias, most people are able to maintain high accuracy and stable performance even when faced with long lists of conflicting words and colors.

Human attention and machine attention

This research highlights important differences between humans and artificial intelligence.

Modern AI systems can display impressive language and reasoning abilities, but the underlying mechanisms differ from the attention processes found in biological brains.

Humans are often able to remain focused on a specific goal while filtering out competing information. The results suggest that current AI models may struggle with this type of cognitive control as tasks become increasingly demanding.

The researchers argue that the performance collapse seen in these experiments points to fundamental limitations of today’s large language models. Although AI may mimic human behavior, the way it maintains attention appears to be very different from the way humans do.

This finding is a reminder that even the most advanced AI systems still have weaknesses. This is especially true for tasks that require resisting distractions and staying focused on long sequences of information.
