    Research reveals limitations of large language models in medical diagnosis

    By healthadmin, March 17, 2026

    Artificial intelligence (AI) is rapidly transforming healthcare. AI systems can now detect diabetic eye disease from retinal photographs and analyze CT images for signs of early-stage lung cancer or stroke.

    In hospitals across the country and around the world, specialized algorithms are now quietly assisting doctors, prioritizing urgent scans and alerting them to subtle abnormalities that might otherwise go unnoticed. These specialized AI tools are often trained on millions of accurately labeled medical images and are increasingly being integrated into real-world clinical settings.

    At the same time, another form of AI, large language models (LLMs), is gaining public attention. Widely accessible systems such as ChatGPT and Claude can analyze both text and images. In theory, these capabilities should suit medical tasks, but can general-purpose AI platforms be trusted when it comes to medical diagnosis?

    A new study led by Milan Toma, Ph.D., associate professor at the New York Institute of Technology College of Osteopathic Medicine (NYITCOM), suggests otherwise. As reported in the academic journal Algorithms, Toma and co-authors, including NYITCOM senior development security operations engineer Mihir Matalia and medical student Sungjoon Hon, tested the reliability of five of the world’s most advanced multimodal LLMs: GPT-5, Gemini 3 Pro, Llama 4 Maverick, Grok 4, and Claude Opus 4.5 Extended.

    The researchers provided each AI model with the same CT brain scan, which showed obvious intracranial pathology. The models were then asked to analyze the image as a radiologist would: to identify the imaging technique used, the location of the lesion in the brain, the primary diagnosis, key features, and potential alternative diagnoses. Overall, the findings reveal a fundamental diagnostic error rate of 20% (one of the five models) and concerning variability in interpretation and assessment.

    Initially, the models yielded promising results, with all five correctly identifying the image as a CT brain scan. Four of the models also detected the key finding: an ischemic stroke in the region of the left middle cerebral artery. However, one model made the fundamental mistake of misclassifying the stroke as a hemorrhage on the opposite side of the brain. In actual clinical practice, this error could have serious consequences for the patient, as ischemic stroke and hemorrhagic stroke require different treatments.

    Even among the four AI models that reached the correct diagnosis, the explanations differed considerably. Some offered different interpretations of when the stroke first occurred. Others disagreed on the differential diagnoses, on which additional brain areas were affected, or on the presence of calcifications.

    Next, the researchers introduced a novel twist: they asked each AI model to score the diagnostic descriptions produced by the other models. This cross-evaluation revealed further discrepancies, with some models scoring more harshly than others. One model even concluded that the finding indicated a chronic brain abnormality rather than an acute stroke, and therefore systematically deducted points from the other models’ responses.

    In recent years, Toma has published more than 30 peer-reviewed studies on AI in medical diagnostics and healthcare and two books on the subject.

    “Our research highlights important differences in the AI landscape. Most successful medical AI tools are task-specific algorithms, trained on large datasets of labeled medical images and validated against very specific diagnostic tasks. Large language models, however, are not optimized for diagnostics; they are built for language and conversation. As a result, they produce explanations that sound authoritative even when the underlying interpretations are wrong or contradictory.”

    Dr. Milan Toma, Associate Professor, New York Institute of Technology College of Osteopathic Medicine (NYITCOM)

    Toma and his co-authors conclude that the future of healthcare AI is likely to combine both specialized diagnostic systems and language models. However, while LLMs are useful for clinical documentation, summarizing reports, or communicating with patients, oversight by a medical professional remains non-negotiable for all diagnostic interpretations.

    Source:

    New York Institute of Technology

    Journal reference:

    Hon, S., et al. (2026). Chat is not diagnosis: Diagnostic variability and fundamental errors in multimodal LLM interpretation in radiology. Algorithms. DOI: 10.3390/a19030170. https://www.mdpi.com/1999-4893/19/3/170


