A new large-scale multimodal AI system trained on tens of millions of medical images could unify fragmented radiology tools and help doctors interpret scans and generate reports more efficiently.

Research: MedVersa: A generalist-based model for diverse medical image processing tasks.
In a recent study published in the journal NEJM AI, researchers introduced “MedVersa,” a generalist artificial intelligence (AI) model that can capture and interpret a wide range of medical imaging modalities and task types. Unlike traditional AI models trained for specific, narrow tasks, MedVersa was trained on tens of millions of medical image instances to detect medical conditions and generate reports within a unified analytical framework.
Encouragingly, when comparing MedVersa’s performance to that of human radiologists in blinded evaluations of chest radiograph reports, the model often produced reports that were judged to be clinically equivalent to human-written reports, particularly for scans containing normal findings, and it significantly reduced the time human radiologists spent documenting findings. Taken together, these results suggest that MedVersa is a promising step toward a new generation of integrated multimodal foundation models that may help unify the currently fragmented ecosystem of AI tools used in clinical practice.
Background: Fragmentation of medical artificial intelligence tools
With recent advances in computational power and AI model design, many AI tools have been approved for use in the medical field, but their use is often fragmented. A model trained on an X-ray dataset can accurately detect pneumonia in a patient’s chest radiograph, but cannot use MRI or ultrasound data for overall patient assessment.
These “expert” models often struggle to adapt to complex clinical workflows in which multiple data types are involved in diagnosing a patient. Computational biologists have attempted to address this limitation by introducing the concept of Generalist Medical Artificial Intelligence (GMAI).
Their goal was to create a “foundation model” (similar to the models underlying ChatGPT, Google Gemini, and other large language models (LLMs)) that can handle multimodal input and output. Unfortunately, previous attempts to realize this concept have focused primarily on text-based input and have proven unable to handle the complex visual tasks essential to radiology.
MedVersa Multimodal AI Model Development
This study aimed to address this capability gap by designing a radiology-focused generalist AI model, “MedVersa,” that can capture, annotate, diagnose, report, and document multimodal clinical image data. The model was trained on MedInterp, a large dataset that aggregates 91 public datasets and comprises more than 29 million medical image instances, including images, bounding-box annotations, segmentation masks, captions, and other vision-language supervision signals spanning a variety of image-processing tasks.
The model features a unique architecture that uses a trained LLM as an “orchestrator” to evaluate the user’s requirements (e.g., “Where is the patient’s tumor?”) and dynamically select the appropriate internal vision module within the MedVersa framework to execute the request. Unlike previous GMAIs, which were primarily text-based, MedVersa is designed to generate text responses or deploy specialized “vision modules” for object detection or segmentation.
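The orchestrator pattern described above can be sketched in a few lines. This is an illustrative stand-in only: the module names and the keyword routing below are hypothetical, whereas in MedVersa the routing decision is made by a trained LLM rather than a lookup table.

```python
# Hypothetical sketch of an orchestrator dispatching to vision modules.
# Module names and keyword routing are illustrative assumptions, not
# MedVersa's actual (learned) orchestration logic.

def detect_module(image, query):
    """Stand-in for an object-detection module (e.g., tumor localization)."""
    return {"task": "detection", "boxes": []}

def segment_module(image, query):
    """Stand-in for a segmentation module producing a pixel mask."""
    return {"task": "segmentation", "mask": None}

def report_module(image, query):
    """Stand-in for free-text report generation."""
    return {"task": "report", "text": "No acute findings."}

# A keyword table stands in here for the LLM's learned routing policy.
ROUTES = {
    "where": detect_module,    # "Where is the patient's tumor?"
    "outline": segment_module, # "Outline the left ventricle."
}

def orchestrate(image, query):
    """Inspect the user's request and dispatch to the matching module."""
    for keyword, module in ROUTES.items():
        if keyword in query.lower():
            return module(image, query)
    # Default: respond with generated report text.
    return report_module(image, query)
```

For example, a query such as “Where is the patient’s tumor?” would be dispatched to the detection module, while a plain request for findings falls through to report generation.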
As a result, MedVersa can simultaneously process diverse inputs such as 2D X-rays, 3D CT and MRI scans, and patient history text. Following model training, MedVersa’s performance was validated across nine imaging tasks against two types of traditional competitors: (1) approved specialist AI models, and (2) board-certified radiologists (n = 10).
Evaluation framework and comparative testing
The performance evaluation required an expert (an AI model or a human radiologist) to review reports generated by humans, GPT-4o, and MedVersa for chest X-ray examinations. Importantly, the experts were blinded to the source of each report. Performance was scored on clinical accuracy and on assessment efficiency (the time taken to complete the assessment and generate the report).
Findings: Performance across imaging tasks
Findings show that MedVersa’s GMAI architecture competes with, and often outperforms, traditional “gold standard” expert models on many object detection and segmentation metrics.
In report generation, MedVersa achieved a BLEU-4 score (which measures text similarity; higher is better) of 17.8, compared to MAIRA’s 14.2, BiomedGPT’s 12.0, and Med-PaLM M’s 11.5. On the RadCliQ metric (which measures deviation from human clinical reports; lower is better), MedVersa scored 2.71, compared to MAIRA’s 3.10 and BiomedGPT’s 3.25. Med-PaLM M reported a slightly better RadCliQ score (2.67), which was statistically indistinguishable from MedVersa’s.
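To make the BLEU-4 metric concrete, here is a minimal, self-contained sketch of sentence-level BLEU-4. It uses simple add-one smoothing on the n-gram precisions, so it is a didactic approximation and will not reproduce the study’s published scores, which use the standard corpus-level formulation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(candidate, reference):
    """Sentence-level BLEU-4: geometric mean of modified 1- to 4-gram
    precisions, scaled by a brevity penalty. Add-one smoothing keeps the
    geometric mean defined when an n-gram order has no matches."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, 5):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped overlap: each candidate n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append((overlap + 1) / (total + 1))
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)
```

An identical candidate and reference score 1.0, while unrelated sentences score near zero, which is why higher BLEU-4 indicates closer agreement with the reference report.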
Comparison with human radiologist reports
When compared to human experts, researchers found that MedVersa reports were clinically equivalent to human-written reports in 64% of cases. For scans with normal findings, this equivalence rose to 91%. However, for scans with abnormal findings and more complex pathology, equivalence was much lower, and the reviewing radiologists often preferred the human-written reports.
Researchers also demonstrated that physicians using MedVersa as an assistant could complete report-writing workflows more quickly. This reduced report-creation time and, importantly, produced fewer “urgent” discrepancies (errors requiring immediate attention) than reports produced with GPT-4o (a 20% reduction in the 5–10 minute reporting interval).
Conclusion: Towards an integrated clinical AI assistant
This study shows that MedVersa is an important step toward integrated clinical AI assistants, rather than a continued reliance on traditional, fragmented AI tools. Its architecture, which uses an LLM to coordinate specialized vision tools, allowed the new model to match or exceed specialized AI models across several tasks, while significantly streamlining and accelerating the workflow of expert human radiologists.
However, the study also found that while MedVersa performed well on routine cases, board-certified radiologists were still preferred for unusual cases with complex pathology, highlighting the continued importance of expert supervision. The authors further note that broader generalizability across imaging modalities remains an ongoing challenge, as some of the non-thoracic datasets included in the study were dominated by segmentation tasks rather than full diagnostic interpretation.
Therefore, while this study validates that MedVersa is a strong proof of concept, future GMAI models will need to be trained on expanded datasets that include more modalities (e.g., genetic information and electronic health records (EHRs)) to fully realize the potential of AI-assisted, human expert-mediated patient care.
Journal reference:
- Zhou, H.-Y., Acosta, J. N., Adithan, S., Datta, S., Topol, E. J., & Rajpurkar, P. (2026). MedVersa: A generalist-based model for diverse medical image processing tasks. NEJM AI. DOI: 10.1056/AIoa2500595. https://ai.nejm.org/doi/full/10.1056/AIoa2500595

