Neither radiologists nor multimodal large language models (LLMs) can easily distinguish “deepfake” X-ray images generated by artificial intelligence (AI) from real X-ray images, according to research published today in Radiology, a journal of the Radiological Society of North America (RSNA). The findings highlight the potential risks associated with AI-generated X-ray images, as well as the need for tools and training to protect the integrity of medical images and help medical professionals detect deepfakes.
The term “deepfake” refers to videos, photos, images, or audio recordings that appear real but are created or manipulated using AI.
“Our research shows that these deepfake X-rays are realistic enough to fool radiologists, the most highly trained medical imaging experts, even when they are aware that AI-generated images exist. This creates a high-stakes vulnerability that could expose patients to fraud or litigation if, for example, a fabricated fracture cannot be distinguished from a real one. There are also significant cybersecurity risks if hackers gain access to hospital networks and inject synthetic images to manipulate patient diagnoses, or cause widespread clinical disruption by undermining the fundamental trustworthiness of digital medical records.”
Michael Tordjman, MD, lead study author and postdoctoral fellow at the Icahn School of Medicine at Mount Sinai, New York
Seventeen radiologists from 12 centers in six countries (the United States, France, Germany, Turkey, the United Kingdom, and the United Arab Emirates) participated in the retrospective study. Their professional experience ranged from 0 to 40 years. Half of the 264 X-ray images included in the study were real, and the other half were generated by AI. The radiologists were evaluated on two image sets with no overlap between them. The first dataset included real images and ChatGPT-generated images of multiple anatomical regions. The second dataset contained chest X-ray images; half were real, and the other half were created by RoentGen, an open-source generative AI diffusion model developed by researchers at Stanford Medicine.
The radiologist readers were initially unaware of the true purpose of the study. After rating the technical quality of each ChatGPT image, only 41% spontaneously identified AI-generated images when asked whether they had noticed anything unusual. Once informed that the dataset contained synthetic images, the radiologists distinguished real from synthetic X-rays with an average accuracy of 75%.
Individual radiologists’ accuracy in detecting images generated with ChatGPT ranged from 58% to 92%. Similarly, the accuracy of four multimodal LLMs (GPT-4o and GPT-5 from OpenAI, Gemini 2.5 Pro from Google, and Llama 4 Maverick from Meta) ranged from 57% to 85%. Even GPT-4o, the model used to create the deepfakes, could not detect all of them, although it detected significantly more than the Google and Meta models.
Radiologist accuracy in detecting RoentGen synthetic chest X-rays ranged from 62% to 78%, and LLM performance ranged from 52% to 89%.
A radiologist’s years of experience did not correlate with accuracy in detecting synthetic X-ray images. However, musculoskeletal radiologists were significantly more accurate than other radiology subspecialists.
This study identified common characteristics of synthetic X-rays.
“Deepfake medical images often look too perfect,” says Dr. Tordjman. “The bones are overly smooth, the spine is unnaturally straight, the lungs are overly symmetrical, the pattern of blood vessels is overly uniform, and fractures appear unusually clean and consistent, but are often confined to one side of the bone.”
Recommended safeguards for distinguishing real images from fakes and preventing tampering include advanced digital protection measures, such as invisible watermarks that embed ownership and identity data directly into images, and cryptographic signatures that are automatically attached at the time of capture and linked to the acquiring technologist.
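To make the second idea concrete, the sketch below signs an image’s raw bytes with an Ed25519 key at capture time and verifies the signature later, so that any pixel-level tampering breaks verification. This is a minimal illustration, not the implementation proposed in the study: the Python cryptography package, the ad hoc key handling, and the in-memory image stand-in are all assumptions made here for demonstration.

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Hypothetical stand-in for the raw bytes of a captured X-ray (e.g., a DICOM file).
    image_bytes = b"\x00\x01fake-pixel-data\x02\x03"

    # In practice, this key pair would be provisioned to the acquisition device or
    # technologist; it is generated here only to keep the sketch self-contained.
    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()

    # Sign the image at capture time; the detached signature travels with the image.
    signature = private_key.sign(image_bytes)

    def is_authentic(data: bytes, sig: bytes) -> bool:
        """Return True only if the image bytes are unchanged since signing."""
        try:
            public_key.verify(sig, data)
            return True
        except InvalidSignature:
            return False

    print(is_authentic(image_bytes, signature))         # True: untouched image
    print(is_authentic(image_bytes + b"!", signature))  # False: any alteration fails

In such a scheme, a reading workstation could flag or refuse to display any image whose signature fails to verify; this would catch images injected or altered after capture, though not fakes created before signing.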
“We’re potentially only seeing the tip of the iceberg,” says Dr. Tordjman. “The next logical step in this evolution is AI generation of synthetic 3D images such as CT and MRI. Establishing educational datasets and detection tools is critical now.”
The study authors have published a curated dataset of deepfakes with interactive quizzes for educational purposes.
Source:
Radiological Society of North America
Journal reference:
Tordjman, M., et al. (2026). The rise of deepfake medical images: Diagnostic accuracy of radiologists in detecting ChatGPT-generated radiographs. Radiology. DOI: 10.1148/radiol.252094. https://pubs.rsna.org/doi/10.1148/radiol.252094

