Artificial intelligence (AI) is increasingly being considered as a tool to support clinical decision-making, but its actual performance in pediatric diagnosis remains unclear. A new pediatric study using authentic clinical cases reports that advanced AI models can outperform clinicians in diagnostic accuracy, especially for rare diseases, with combined human-AI approaches achieving the highest overall success. The findings highlight the potential of AI as a complementary tool to improve diagnostic accuracy and patient outcomes.
Accurate diagnosis in pediatric medicine can be particularly challenging, especially when rare diseases present with subtle or overlapping symptoms. Early diagnostic uncertainty can delay treatment and increase the risk of complications. Artificial intelligence (AI) has shown promise in medicine, but most research to date has relied on simplified or curated cases rather than real-world clinical data. This leaves important gaps in our understanding of how large language models perform in everyday clinical settings, where decisions are often made with limited information.
Against this backdrop, a research team led by Dr. Cristian Launes of Hospital Sant Joan de Déu in Barcelona, Spain, evaluated the performance of AI models using real pediatric clinical cases. The study, published in the journal Pediatric Investigation (March 25, 2026), compared four advanced large language models with 78 pediatric clinicians across 50 cases, including both common conditions and rare diseases.
Dr. Launes is a clinical professor and pediatrician at Hospital Sant Joan de Déu in Barcelona, specializing in pediatric infectious diseases, with particular expertise in respiratory viral infections, pediatric epidemiology, and infectious disease research.
To reflect actual clinical practice, the researchers used patient summaries based on the first 72 hours of presentation. Each case was evaluated multiple times to examine both diagnostic accuracy and consistency. Performance was evaluated based on whether the correct diagnosis appeared as the top prediction or within the top five suggestions.
The results showed that the most advanced AI models achieved higher diagnostic accuracy than clinicians overall. This advantage was particularly evident in rare disease cases, where the AI systems were more likely to identify correct diagnoses that clinicians had initially missed. However, clinicians demonstrated strengths in certain complex and context-sensitive scenarios, highlighting differences in how humans and AI approach diagnostic reasoning.
Importantly, this study did not evaluate real-time, interactive “human-AI” diagnostic workflows. Instead, the researchers estimated potential complementarity using a prespecified “combined” approach that asks whether the correct diagnosis appeared in either the clinician’s or the model run’s top five list. Under this estimate, the best-performing pairing reached a top-5 match accuracy of 94.3%, suggesting that clinicians and AI may be able to provide different correct hypotheses for difficult cases, especially for rare diseases. “Our results suggest that AI can be evaluated as a second opinion under clinician supervision, especially in difficult cases involving rare diseases,” said Dr. Launes. “Rather than replacing clinicians, these tools may help broaden the range of differential diagnosis and reduce the likelihood of missed diagnoses, as long as results are interpreted critically and within a robust surveillance framework.”
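To make the "combined" metric concrete, the sketch below scores a handful of invented cases the way the paragraph above describes: a case counts as a hit for the combined approach if the correct diagnosis appears in either the clinician's or the model's top-five list. The case data and diagnoses here are purely illustrative and are not taken from the study.

```python
# Illustrative sketch of top-5 and "combined" top-5 accuracy.
# All case data below is invented for illustration only.

def top5_hit(true_dx, ranked_list):
    """True if the correct diagnosis appears among the top five suggestions."""
    return true_dx in ranked_list[:5]

cases = [
    # (true diagnosis, clinician's ranked list, model's ranked list)
    ("Kawasaki disease",
     ["scarlet fever", "measles", "Kawasaki disease"],
     ["Kawasaki disease", "adenovirus infection"]),
    ("bronchiolitis",
     ["bronchiolitis", "asthma"],
     ["pneumonia", "asthma", "bronchiolitis"]),
    ("Gaucher disease",
     ["leukemia", "immune thrombocytopenia"],
     ["Gaucher disease", "Niemann-Pick disease"]),
]

clin = sum(top5_hit(t, c) for t, c, m in cases)
model = sum(top5_hit(t, m) for t, c, m in cases)
# Combined: a hit if EITHER the clinician OR the model listed the true diagnosis.
combined = sum(top5_hit(t, c) or top5_hit(t, m) for t, c, m in cases)

print(f"clinician top-5: {clin}/{len(cases)}")
print(f"model top-5:     {model}/{len(cases)}")
print(f"combined top-5:  {combined}/{len(cases)}")
```

In this toy data the clinician misses the rare-disease case that the model catches, so the combined score exceeds the clinician's alone, mirroring the complementarity the study estimates.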
From a governance perspective, medical diagnostic decision support systems are generally considered high-risk applications under European Union AI legislation. This classification implies expectations regarding risk management, data governance, transparency, human oversight, and cybersecurity. The authors emphasize that any clinical use is advisory and should include clear accountability, monitoring, and safeguards to address the risk of variability and misleading results.
The researchers also observed that additional clinical information improved diagnostic performance in both groups. Accuracy improved when more detailed data, such as laboratory and imaging test results, was included. This finding highlights the importance of continuous clinical assessment and suggests that AI systems may be most effective when integrated into evolving, information-rich workflows.
“The interaction between data quality and diagnostic performance is important. AI systems perform best when they are part of an ongoing clinical process, in which clinicians iteratively collect, validate, and curate an evolving clinical picture to feed the model, with continuous reassessment and human oversight. This is not a one-time input/output tool.”
– Dr. Cristian Launes, Hospital Sant Joan de Déu, Barcelona
These findings highlight the potential of AI-assisted tools to support earlier and more accurate diagnosis, especially for rare diseases where expertise may be limited. In the long term, integrating AI into clinical workflows could enable more collaborative, data-driven decision-making while also fostering closer collaboration between clinicians, engineers, and policy makers.
Overall, this study demonstrates that advanced AI models can outperform clinicians in certain pediatric diagnostic tasks, especially for rare diseases, with the greatest benefit achieved when AI is used alongside human expertise. Although challenges remain, such as response variability and the need for adequate monitoring, the findings point to a promising role for AI as a support tool in pediatric care.
Source:
Hospital Sant Joan de Déu, Barcelona
Journal reference:
Launes, C., et al. (2026). Large language models for pediatric diagnosis: Performance evaluation using real clinical notes from common and rare cases. Pediatric Investigation. DOI: 10.1002/ped4.70053. https://onlinelibrary.wiley.com/doi/10.1002/ped4.70053

