    Health Magazine

    Scientists built the most difficult AI test ever, and the results were surprising

    By healthadmin, March 13, 2026


    Researchers have noticed a growing problem: artificial intelligence systems now score very highly on academic benchmarks that have been in use for years. Tests that once challenged machines are no longer difficult enough. Well-known assessments such as the Massive Multitask Language Understanding (MMLU) exam, once considered demanding, can no longer adequately measure the capabilities of today's advanced AI models.

    To address this, a global group of nearly 1,000 researchers, including faculty at Texas A&M University, developed a new kind of test. Their goal was to build an exam that is broad, difficult, and grounded in human expertise, one that current AI systems still struggle to pass.

    The result is Humanity's Last Exam (HLE), a 2,500-question assessment covering mathematics, the humanities, the natural sciences, ancient languages, and a wide range of highly specialized academic fields. The project is described in a paper published in Nature; additional information about the exam is available at lastexam.ai.

    Among the many contributors is Dr. Tung Nguyen, instructional associate professor in the Department of Computer Science and Engineering at Texas A&M. Nguyen helped write and refine many of the exam's questions.

    “When an AI system starts performing really well on human benchmarks, it’s tempting to think that it’s getting closer to human-level understanding,” Nguyen says. “But HLE reminds us that intelligence is about more than just pattern recognition; it’s about depth, context, and expertise.”

    The exam was not designed to trick or defeat human test takers. Instead, the goal was to pinpoint the areas where AI systems still fall short.

    A global effort to test the limits of AI

    Experts from around the world created and reviewed the questions in Humanity's Last Exam. Each question was designed to have a single, clear, verifiable answer, and each was written so that it cannot be answered quickly with a simple internet search.

    The topics are drawn from advanced academic work. Some questions involve translating ancient Palmyrene inscriptions, others identifying tiny anatomical structures in birds, and still others analyzing detailed features of Biblical Hebrew pronunciation.

    The researchers tested every question against leading AI systems. If a model answered a question correctly, that question was removed from the final exam. This process kept the exam just beyond what current AI systems could reliably solve.
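The filtering step described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the HLE team's actual pipeline: the `adversarial_filter` function, the toy models, and the exact-string answer check are all assumptions chosen to keep the idea visible.

```python
from typing import Callable, Iterable

# Hypothetical: a "model" is any callable that maps a question to an answer string.
Model = Callable[[str], str]


def adversarial_filter(
    candidates: Iterable[tuple[str, str]],
    models: list[Model],
) -> list[tuple[str, str]]:
    """Keep only (question, answer) pairs that no model answers correctly."""
    kept = []
    for question, answer in candidates:
        # Exact string match is a simplifying assumption; real grading may be more lenient.
        solved = any(model(question).strip() == answer.strip() for model in models)
        if not solved:
            kept.append((question, answer))
    return kept


# Two toy "models" that each know one answer (purely illustrative).
def toy_model_a(q: str) -> str:
    return {"2 + 2?": "4"}.get(q, "unknown")


def toy_model_b(q: str) -> str:
    return {"Capital of France?": "Paris"}.get(q, "unknown")


pool = [("2 + 2?", "4"), ("Capital of France?", "Paris"), ("Hard question?", "42")]
final = adversarial_filter(pool, [toy_model_a, toy_model_b])
print(final)  # [('Hard question?', '42')]
```

A question survives only if every model misses it, which is why the resulting exam sits just past what the tested systems can do; a question that any single model answers is dropped.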

    Initial testing confirmed that the strategy works. Even powerful AI models struggled on the exam: GPT-4o scored 2.7 percent, Claude 3.5 Sonnet 4.1 percent, and OpenAI's o1 model slightly better at 8 percent. The most capable systems to date, such as Gemini 3.1 Pro and Claude Opus 4.6, reach accuracy of around 40 to 50 percent.

    Why we need new AI benchmarks

    Nguyen explained that AI systems outgrowing older tests is more than a technical concern. He contributed 73 of the 2,500 questions published in HLE, the second most of any contributor, and wrote the most questions in mathematics and computer science.

    “Without accurate assessment tools, policymakers, developers and users risk misunderstanding what AI systems can actually do,” he said. “Benchmarking provides a basis for measuring progress and identifying risks.”

    The researchers say that high scores on tests originally designed for humans do not necessarily indicate true intelligence. These benchmarks primarily measure how well an AI can complete specific tasks created for human learners, rather than capturing deeper understanding.

    A tool, not a threat

    Despite its dramatic name, Humanity's Last Exam does not suggest that humans are becoming obsolete. Instead, it highlights the vast body of knowledge and expertise that remains uniquely human.

    “This is not a competition with AI,” Nguyen said. “This is a way to understand where these systems are strong and where they struggle. That understanding will help us build safer and more reliable technology. And, importantly, it will remind us why human expertise remains important.”

    Building long-term AI benchmarks

    Humanity's Last Exam is designed to serve as a durable and transparent benchmark for future AI systems. To support this goal, the researchers made some questions public but kept most hidden, so that AI models cannot simply memorize the answers.

    "So far, Humanity's Last Exam is one of the clearest assessments of the gap between AI and human intelligence. Despite rapid advances in technology, that gap remains large," Nguyen said.

    A large-scale international research effort

    Nguyen emphasized that the scale of the project demonstrates the value of collaboration across fields and countries.

    "What made this project extraordinary was its scale," he said. "Experts from almost every field contributed. It wasn't just computer scientists; it was historians, physicists, linguists, medical researchers. It's that diversity that exposes the gaps in today's AI systems. Perhaps ironically, it's humans working together."


