    Artificial intelligence struggles to consistently evaluate scientific facts

    By healthadmin · March 17, 2026 · 7 min read


    Generative artificial intelligence programs can write fluently, but they still struggle to evaluate basic scientific claims accurately and consistently. Recent research shows that asking an artificial intelligence the exact same question multiple times often returns completely different answers. These results, reported in Rutgers Business Review, highlight the limitations of current automated reasoning and the continued need for human oversight.

    Generative artificial intelligence is a type of technology that is trained on large text databases to generate human-like sentences. Today, millions of people use these applications every day for everything from marketing to software development. The software writes in an authoritative tone that often makes it sound right, even when it’s completely wrong. Some well-known consulting firms have even faced public embarrassment for relying on automated reports containing fabricated data.

    Despite these known flaws, many companies are partnering with technology vendors to incorporate these tools into their daily operations. Professionals frequently utilize automated software to analyze data, answer customer questions, and summarize research. Researchers wanted to know whether the logical abilities of these programs actually matched their impressive vocabularies. They designed tests to see if the technology could reliably evaluate rigorous business concepts.

    Mesut Cicek, an associate professor in the Department of Marketing and International Business at Washington State University, led the study. His co-authors include Sevincgul Ulu of Southern Illinois University, Can Uslay of Rutgers University, and Kate Karniouchina of Northeastern University. The team designed an experiment to test the software’s ability to interpret academic literature.

    Researchers collected 719 scientific hypotheses from nine open-access business journals published since 2021. A hypothesis is a formal, testable prediction about how two or more things will interact in the real world. For example, a statement might predict that a certain type of advertising will increase consumer spending.

    The team presented these statements to ChatGPT, a very popular automated text generator. The program was asked to determine whether each statement was ultimately true or false based on actual research data. To test the program’s stability, the researchers sent the exact same prompt for each statement 10 separate times.

    The entire experiment was performed twice to track the progress of the technology over time. The first test was conducted in mid-2024 using an older version of the software. The researchers repeated the entire process using an updated version of the application in mid-2025.

    The results revealed a slight improvement in overall accuracy, but the raw numbers were highly misleading. The software selected the correct answer 76.5 percent of the time in 2024 and 80 percent of the time in 2025. Since there are only two possible answers to a question, a completely blind guess will be correct half of the time.

    When researchers mathematically adjusted the scores to account for random guessing, actual performance dropped significantly: the effective accuracy rate was only around 60 percent. In other words, the software barely earned a passing grade at predicting actual scientific findings.
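    The study does not spell out its exact adjustment formula, but a common way to correct a binary-choice score for guessing (the same idea behind chance-corrected agreement measures such as Cohen's kappa) is to rescale accuracy so that chance-level performance maps to zero. A minimal sketch, using the raw figures reported above:

```python
def chance_corrected(raw_accuracy: float, chance: float = 0.5) -> float:
    """Rescale raw accuracy so random guessing scores 0.0 and perfect scores 1.0.

    For a true/false task, chance = 0.5: a blind guess is right half the time,
    so raw accuracy overstates real skill.
    """
    return (raw_accuracy - chance) / (1.0 - chance)

# Figures reported in the article:
print(chance_corrected(0.765))  # 2024 run -> approximately 0.53
print(chance_corrected(0.80))   # 2025 run -> approximately 0.60
```

    Under this correction, the 2025 score of 80 percent corresponds to roughly 60 percent effective accuracy, consistent with the figure the researchers report.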

    The program performed very poorly when evaluating ideas that the original researchers had found to be false. The software correctly identified these unsupported statements only 16.4 percent of the time in 2025. The program exhibited a strong bias toward agreeing with whatever statement it was given, acting as a compliant assistant rather than an objective analyst. This tendency to confirm existing ideas creates an echo chamber that can mislead decision makers.

    Consistency has proven to be an even bigger problem for automated systems. The software often contradicted itself if you asked the same question 10 times in a row. In some cases, the program would jump back and forth between true and false on successive trials.

    “We’re not just talking about accuracy, we’re talking about consistency, because if you ask the same question over and over again, you’re going to get different answers,” Cicek said. In 2025, the program gave identical answers across all 10 attempts for only 73 percent of the statements. For more than a quarter of the questions, the software gave at least one incorrect answer out of 10 attempts.

    The lack of a stable response pattern makes the software unreliable for one-off queries. After a user asks a question, simply refreshing the page can produce a completely different answer. “There were some cases where five were true and five were false,” Cicek said.

    The researchers also categorized the test questions by logical difficulty. The software handled direct causal relationships best, where one event leads directly to another. It struggled most with conditional statements, hypotheses whose truth depends on the state of a third, moderating variable.

    These results suggest that the program relies on recognizing common word patterns rather than actually understanding concepts. It is possible to imitate the structure of a logical argument without grasping the underlying meaning or context. Although the system has a high degree of linguistic fluency, it lacks true theoretical flexibility. When faced with complex scenarios, the technology cannot adapt its reasoning.

    Software remains tied to pattern recognition rather than true understanding. “They just memorize it and can give you some insight, but they don’t understand what you’re talking about,” Cicek said. The apparent improvement over the past year appears to be due to improved text processing rather than deeper cognitive abilities.

    For managers and analysts, these limitations pose significant risks. The findings reveal that automated systems are currently too shallow to handle high-stakes decisions on their own. As the text produced by these programs becomes smoother, users can easily miss hidden conceptual flaws.

    Researchers advise experts to use artificial intelligence for speed, not substitution. Marketing teams may use text generators to brainstorm ideas or quickly summarize long reports. However, human experts must intervene to verify whether the logic is consistent with real market evidence.

    Experts also need to iterate and validate automated insights. Asking the same questions multiple times can help uncover underlying biases and instabilities in the software. Conclusions generated by artificial intelligence should be treated as diagnostic clues rather than absolute facts.

    The authors advocate building organizational literacy around automation tools. Employees need to understand exactly what these programs do well and where they fail. Organizations must train their staff to audit the reasoning behind automated answers, rather than simply trusting the numerical output.

    The ultimate goal is to create a hybrid system that combines human intelligence and automated speed. In this configuration, the software handles the structural analysis while humans retain interpretive judgment. This balanced approach ensures that technology supports human understanding rather than replacing it.

    The authors noted that the experiment had some minor limitations. This study assumes that all published and peer-reviewed findings are either completely true or false, ignoring the nuances of real-world science. Scientific discoveries can include a variety of results that do not easily fit into strict binary categories.

    The team also limited consistency testing to 10 iterations per question using a single software platform. Future studies will need more repetitions to confirm these patterns. Researchers should also test different artificial intelligence programs to see if the flaw is universal.

    Despite these limitations, research suggests that users should remain vigilant. Human judgment is still required to check these increasingly common digital systems. “Always be skeptical,” Cicek says. “I’m not against AI. I’m using AI. But we have to be very careful.”

    The study, “Unstable Intelligence: GenAI Struggles with Accuracy and Consistency,” was authored by Mesut Cicek, Sevincgul Ulu, Can Uslay, and Kate Karniouchina.


