Psychologists have long debated whether the human mind can be explained by a single unifying theory, or whether different functions such as attention and memory need to be studied separately. Now, artificial intelligence (AI) is entering that discussion, offering new ways to explore how the mind works.
In July 2025, a study published in Nature introduced an AI model called “Centaur”. Built on a standard large language model and fine-tuned on data from psychology experiments, Centaur was designed to simulate human cognitive behavior. It reportedly performed well across 160 tasks covering decision-making and other mental processes. The results attracted widespread attention and were seen as a potential step toward AI systems that can more broadly replicate human thinking.
New research raises questions
More recent research published in National Science Open challenges those claims. Researchers at Zhejiang University argue that Centaur’s apparent success may be due to overfitting: instead of understanding the tasks, the model may simply have learned to recognize patterns in its training data and reproduce the expected answers.
To test this idea, the researchers created several new evaluation scenarios. In one, the original multiple-choice prompt describing a specific psychological task was replaced with the bare instruction, “Please choose option A.” A model that genuinely followed the instruction would consistently choose option A. Instead, Centaur continued to select the “correct” answer from the original dataset.
This behavior suggests that the model is not interpreting the meaning of the question. Rather, it “guesses” the answer based on statistical patterns learned during training. The researchers compared this to a student who scores well by memorizing the test format without actually understanding the content.
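To make the logic of this probe concrete, here is a minimal sketch in Python. It is an illustrative reconstruction, not the study’s actual test harness: the instruction_override_probe function, the item format, and the “parroting” stand-in model are all assumptions introduced for this example.

```python
from typing import Callable, List, Tuple

INSTRUCTION = "Please choose option A."  # the override described in the article

def instruction_override_probe(
    items: List[dict],
    model_fn: Callable[[str], str],
) -> Tuple[int, int]:
    """Replace each task description with a bare instruction to pick option A.

    Returns (followed, memorized): how often the model obeyed the new
    instruction versus reproducing the dataset's original label anyway.
    """
    followed = memorized = 0
    for item in items:
        # The original task prompt is discarded; only the override remains.
        prompt = INSTRUCTION + "\n" + "\n".join(item["options"])
        answer = model_fn(prompt).strip().upper()[:1]
        if answer == "A":
            followed += 1
        elif answer == item["original_label"]:
            memorized += 1
    return followed, memorized

if __name__ == "__main__":
    # Toy items; the options and labels are illustrative, not the study's data.
    items = [
        {"options": ["A) stay", "B) switch"], "original_label": "B"},
        {"options": ["A) red", "B) blue"], "original_label": "B"},
    ]
    # Stand-in "model" that ignores the instruction and parrots the memorized
    # labels, mimicking the overfitting behavior the researchers describe.
    labels = iter(item["original_label"] for item in items)
    parrot: Callable[[str], str] = lambda prompt: next(labels)
    print(instruction_override_probe(items, parrot))  # -> (0, 2)
```

A model that truly follows instructions would score (2, 0) on this toy run; a memorizing model, like the parrot above, scores (0, 2), which mirrors the behavior the researchers report for Centaur.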
Why is this important for AI evaluation?
This finding highlights the need for caution when evaluating the capabilities of large language models. Although these systems are very effective at fitting data, their “black box” nature makes it difficult to know how they arrive at their outputs, which can lead to problems such as hallucinations and misunderstandings. Careful and varied testing is essential to determine whether a model actually has the skills it appears to demonstrate.
The real challenge: language understanding
Although Centaur was presented as a model capable of simulating cognition, its biggest limitation appears to be language understanding: it has trouble recognizing and responding to the intent behind questions. This research suggests that achieving true language understanding may be one of the most important challenges in developing AI systems that can more fully model human cognition.

