    AI models completely fail classic psychological tests as cognitive demands increase

    By healthadmin, June 23, 2026
    New research provides evidence that although advanced artificial intelligence models process language with remarkable skill, they struggle with tasks that require the kind of sustained focus and conflict resolution that characterize human attention.

    The study, published in PNAS Nexus, shows that as cognitive demands increase, the ability of these models to override automatic responses collapses completely. This finding suggests that current artificial intelligence systems lack the fundamental executive control needed to achieve true artificial general intelligence.

    To understand these findings, it helps to know how modern artificial intelligence works. Programs like ChatGPT are built on the Transformer architecture, which uses an attention mechanism that lets the model assign weights to different parts of the text and predict which word will come next based on statistical patterns.
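    To make "assigning weights to different parts of the text" concrete, here is a minimal pure-Python sketch of scaled dot-product attention, the textbook form of Transformer attention. It is a single-query, single-head toy with made-up vectors; it is not the implementation inside GPT-4o or Claude 3.5 Sonnet, and the function names are illustrative only.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Single-query scaled dot-product attention (toy illustration).

    Each key is scored against the query by a dot product, the scores are
    scaled by sqrt(d) and passed through softmax, and the output is the
    weighted average of the value vectors.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output
```

    The weights are always positive and sum to 1, so every token keeps some influence; attention shifts emphasis rather than switching tokens off entirely.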

    Suketu Patel is a doctoral candidate in Comparative and Cognitive Psychology at the City University of New York Graduate Center. Patel and his colleagues conducted the study in Jin Fan's lab at Queens College in New York. He noted that the enthusiastic public reception of modern language models inspired the team to investigate the software's true cognitive capabilities.

    “When ChatGPT came out, a lot of the excitement centered on its ability to complete tasks, its theory of mind, and its emotional intelligence,” Patel said. “Still, these models were prone to hallucinations and confabulations. LLM performance was strong on some tasks but surprisingly weak on others. We needed a standard attention task to rigorously probe these systems and compare them with biological attention.”

    Human attention is a complex process supported by multiple interconnected brain networks. “The Stroop task is appropriate because the success of LLMs relies on transformer attention mechanisms,” Patel said. “In humans, attention consists of three separate but overlapping systems: vigilance, orienting, and executive control. So we decided to test whether these models had all three.”

    First introduced in the 1930s, the Stroop task measures how well people handle conflicting information. In the classic version, participants see the word “BLUE” printed in red ink and must name the ink color rather than read the word. “It’s worth emphasizing that the Stroop task is not a test of thinking or high-level reasoning,” Patel said. “It specifically targets conflict resolution and control.”

    The automatic human response is to read the word itself, and overriding it requires active inhibition. “The core idea is that word reading in humans is essentially automatic: highly practiced responses become what we call prepotent responses, the ones that fire first and most strongly,” Patel explained. “AI is in a similar position, because it is far better trained to read words than to name colors.”

    The researchers tested two major artificial intelligence models: OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet. The models received an image prompt and were asked either to read the displayed words or to name the ink color in which they were printed. The team tested the models under five conditions: words printed in matching colors, words printed in mismatched colors, a mixed condition, neutral office words, and strings of the letter “X.”

    To test how well the models could sustain attention, the scientists varied the number of words in each image from 1 to 40. “Goal maintenance is the ability to hold on to instructions and keep following them under all circumstances while excluding interfering information,” Patel said. “Humans develop this ability over time. AI can certainly follow instructions and achieve goals, but it does so in fundamentally different ways, and those differences become more pronounced as the context grows longer or contains contradictory information.”
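    The design described above (five conditions crossed with list lengths) can be sketched as a small stimulus generator plus a color-naming scorer. This is a hypothetical illustration, not the authors' materials: the color set, the neutral office words, the function names, and the reading of "mixed" as a random blend of matching and mismatched items are all assumptions, and rendering each (text, ink) pair into an image is omitted.

```python
import random

COLORS = ["red", "blue", "green", "yellow"]  # hypothetical ink/word colors
NEUTRAL_WORDS = ["desk", "stapler", "folder", "pencil"]  # hypothetical office words


def make_item(condition, rng):
    """Return one (displayed_text, ink_color) pair for a given condition."""
    ink = rng.choice(COLORS)
    if condition == "congruent":  # word matches its ink color
        text = ink.upper()
    elif condition == "incongruent":  # word names a different color
        text = rng.choice([c for c in COLORS if c != ink]).upper()
    elif condition == "neutral":  # non-color word
        text = rng.choice(NEUTRAL_WORDS).upper()
    elif condition == "xxxx":  # nonword baseline
        text = "XXXX"
    else:
        raise ValueError(f"unknown condition: {condition}")
    return text, ink


def make_list(condition, n_items, seed=0):
    """Build one stimulus list of n_items.

    Assumption: 'mixed' draws each item at random from congruent/incongruent.
    """
    rng = random.Random(seed)
    items = []
    for _ in range(n_items):
        c = rng.choice(["congruent", "incongruent"]) if condition == "mixed" else condition
        items.append(make_item(c, rng))
    return items


def color_naming_accuracy(items, responses):
    """Fraction of responses that name the ink color (the instructed answer)."""
    correct = sum(resp.lower() == ink for (_, ink), resp in zip(items, responses))
    return correct / len(items)
```

    A sweep such as `for n in (1, 5, 20, 40): make_list("incongruent", n)` mirrors the list lengths reported in the article. On incongruent lists, a responder that simply reads the words scores 0, which is the failure mode the study describes.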

    On short lists of one or five words, the AI models performed nearly as well as humans. They achieved high accuracy on the word-reading task, and their performance dipped only slightly on mismatched color-naming trials. As the lists grew longer, however, both models’ performance in the mismatched condition collapsed completely.

    With five-word lists, GPT-4o correctly named the ink color 91% of the time on mismatched trials. Its accuracy plummeted to just 1% on both the 20- and 40-word lists. Claude 3.5 Sonnet held up somewhat longer, but its accuracy ultimately fell to just 10% on the 40-word mismatched list.

    During these failures, the models abandoned the color-naming instruction entirely and reverted to reading the words. “We were surprised at how quickly accuracy degrades at relatively small context sizes, with lists of around 10 words,” Patel said. “What made this remarkable was the contrast with the nonword condition, XXXX, where accuracy was nearly perfect. This gap shows that the LLMs’ automatic reading behavior, just as in humans, is triggered by meaningful words.”

    The researchers suggest that the models fail this way because their architecture lacks the top-down monitoring found in the human brain. “Our central argument is that this limitation is due to the lack of an explicit mechanism for top-down modulation,” Patel told PsyPost. “That is, a mechanism in which rules or goals actively enforce priorities among competing representations from the outset, so that constraints are maintained by suppressing the prepotent response rather than merely deprioritizing it.”

    Without this kind of override, the models are overwhelmed by their most ingrained habits. “This study shows that the ability to detect and resolve conflict at the signal level is limited because transformer attention can impose only soft constraints on automatic reading, rather than the hard constraints provided by executive control mechanisms,” Patel added.
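    The difference between a soft and a hard constraint can be illustrated with a toy two-response choice. This is only an analogy, not the authors' model: the response names, logit values, and bias are made up. A soft constraint adds a finite bonus to the instructed response, so the prepotent response is down-weighted but never eliminated; a hard constraint masks competitors out entirely before the softmax.

```python
import math


def softmax(logits):
    """Softmax over a dict of response -> logit."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def soft_constraint(logits, goal, bias):
    """Soft control (toy): add a finite bonus to the instructed response.

    Competitors lose weight but always keep a nonzero probability.
    """
    adjusted = dict(logits)
    adjusted[goal] += bias
    return softmax(adjusted)


def hard_constraint(logits, goal):
    """Hard control (toy): gate out every response except the instructed one
    by setting its competitors' logits to -inf before the softmax."""
    masked = {k: (v if k == goal else float("-inf")) for k, v in logits.items()}
    return softmax(masked)


# Hypothetical logits: word reading is prepotent (higher logit).
logits = {"read_word": 4.0, "name_color": 1.0}
soft = soft_constraint(logits, "name_color", bias=2.0)
hard = hard_constraint(logits, "name_color")
```

    With these made-up numbers the soft bias still leaves word reading as the more likely response, while the hard gate makes color naming certain. A larger bias could tip the soft case the other way, but the prepotent response's probability never reaches exactly zero.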

    Newer artificial intelligence systems may try to work around this problem with additional layers of scaffolding. “Scaffolding techniques in modern AI systems include tool use, reasoning (‘thinking’), and code generation to stand in for the missing components, but each is still bolted onto an underlying model that propagates errors,” Patel said.

    Relying on external code to solve the test misses the point of cognitive assessment. “This is why strategies that avoid inhibiting prepotent word reading defeat the purpose of the Stroop task,” Patel explained. “Some of the models we studied were inconsistent about whether they resorted to code, but once they executed code, they tended to solve the task completely.”

    The scientists address this issue at length in their report, arguing that relying on code generation is not true cognitive control. “Shortcutting a task through chain-of-thought reasoning or code generation is really just avoiding it, glossing over signal-level deficiencies that become important as goals become more complex,” Patel said. “Humans can cheat in exactly the same way. They can verbalize their answers, blur their vision, or use tools that prevent them from reading the words. Each of those moves invalidates the measurement.”

    The study has certain methodological limitations, and the researchers note that models could eventually pass similar tests through brute-force pattern recognition. “We do not argue that LLMs cannot perform this task,” Patel said. “With more training data, they could certainly handle even larger contexts.”

    “But it will be a task-specific kind of gating, achieved through sheer exposure, rather than a general form of control that does not rely on extensive training,” Patel added. “It is also noteworthy that very few tasks share the specific dynamics of the Stroop task, in which one response (reading) is so strongly pre-activated that it competes with the instructed response (color naming).”

    These findings challenge current assumptions within the technology industry. “Thus, the Stroop task is not just a measure of task performance, but a diagnostic of the structural constraints of LLMs,” Patel said. “The bitter lesson, and the implicit bet behind scaling to ever-larger models on the path to artificial superintelligence (ASI), is that this gating mechanism, called executive control in neuroscience, will emerge from greater scale and data without a dedicated architecture.”

    Future progress in artificial intelligence may require more than simply adding computing power and expanding training data. “We have started looking at ways to build executive control directly into current AI architectures,” Patel said. “We believe this is an essential component of long-horizon instruction following: the ability to stay on task through complex interactions over time.”

    The study, “Executive Control Deficiencies in Transformer Attention,” was authored by Suketu Chandrakant Patel, Hongbin Wang, and Jin Fan.
