As artificial intelligence (AI) becomes more commonplace in medicine, from record keeping to assisting with medication decisions, researchers at the Icahn School of Medicine at Mount Sinai are asking an important question: How well can AI withstand intense workloads at health-system scale?
The new study, published online March 9 in npj Health Systems (https://doi.org/10.1038/s44401-026-00077-0), suggests that the answer depends not on the AI itself but on how the AI is designed.
Researchers have found that healthcare AI systems work much better when tasks are distributed across multiple specialized AI “agents” — software systems that can perform complex tasks, learn, and adapt — rather than relying on a single all-purpose agent. The researchers say this multi-agent approach provides stable performance as requests increase and significantly reduces computing costs and latency.
“For healthcare organizations, our findings point to smarter ways to use AI. By assigning tasks such as finding patient information, extracting data, and checking medication doses to specialized AI agents, organizations can help their systems run faster and more reliably while keeping costs down. Ultimately, this type of design could allow healthcare teams to spend less time on administrative tasks and more time focusing on patients.”
Girish N. Nadkarni, MD, MPH, senior study author, Barbara T. Murphy Chair of the Windreich Department of Artificial Intelligence and Human Health, Director of the Hasso Plattner Institute for Digital Health, Irene and Dr. Arthur M. Fishberg Professor of Medicine at the Icahn School of Medicine at Mount Sinai, and Chief AI Officer of the Mount Sinai Health System
As part of the study, researchers compared two approaches to clinical AI. One is a single system responsible for handling many different clinical tasks, and the other is a coordinated network of specialized AI agents overseen by a central “orchestrator.” The team used state-of-the-art language models to assess performance across common clinical functions such as information retrieval, data extraction, and medication calculations under simulated real-world conditions involving up to 80 concurrent tasks.
“What we found is that AI systems behave very similarly to humans,” says study lead author Dr. Eyal Klang, formerly of the Icahn School of Medicine at Mount Sinai. “If you ask one system to do too many different things at once, performance suffers. But when one orchestrator agent divides the work among specialized agents, the system stays accurate, responsive, and much more efficient, even under heavy demand.”
The orchestrated multi-agent system maintained higher accuracy while using far fewer computing resources (up to 65 times fewer) than the single-agent design. According to the researchers, the study simulated real-world clinical “traffic” in which many types of tasks arrive at once and compete for attention.
“Our findings show that smart orchestration is more than just a technical preference,” says Dr. Klang. “It can make the difference between an AI system that continues to function smoothly and one that begins to fail under the pressure of real-world clinical workloads.”
Next, the research team plans to test these orchestrated AI systems directly in clinical settings using real-time patient data. If successful, this approach could help shape how hospitals and health systems scale AI in the future, enabling them to handle peak workloads without sacrificing quality or safety.
The researchers emphasize that these benefits are not automatic: even advanced AI can fall short if the system is poorly designed or implemented. “Medicine does not operate one task at a time,” says Dr. Nadkarni. “Hospitals face ongoing overlapping demands, especially during busy periods. Our findings show that the future of healthcare AI is not a single super-intelligent system, but a coordinated team of focused agents that work together to scale securely, manage costs, and support real-world clinical operations.”
“If a single agent handles everything, you can’t track where things went wrong. The orchestrator records every step: which tool was called, what was returned, and how the answer was assembled. With 80 concurrent tasks, the single agent’s accuracy dropped to 16% while it consumed 65 times as much computing power, and there is no way to understand why. That kind of transparency is not optional in medicine,” said Mahmud Omar, MD, second author and visiting scholar in the Windreich Department. “This matters more than ever. Agentic AI is no longer a research concept. Tools like OpenAI’s Operator Mode, Claude’s Cowork, and similar platforms are putting autonomous agents directly into the hands of clinicians and patients. As adoption accelerates, the architecture behind these systems must be auditable from the start.”
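To make the auditing idea concrete, here is a minimal sketch of an orchestrator that routes each task to a specialized agent and records what was called and what came back. It is an illustration only, not the study's implementation: the class names, the two toy agents, and the sample values are all hypothetical, and real agents would be backed by language models rather than plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    """One audit-trail entry (hypothetical structure)."""
    task_type: str   # which specialized agent was called
    payload: dict    # what the agent was given
    result: object   # what the agent returned


@dataclass
class Orchestrator:
    """Routes tasks to specialized agents and logs every call."""
    agents: dict[str, Callable[[dict], object]]
    trace: list[Step] = field(default_factory=list)

    def handle(self, task_type: str, payload: dict) -> object:
        if task_type not in self.agents:
            raise ValueError(f"no agent registered for {task_type!r}")
        result = self.agents[task_type](payload)
        # Record the step so the answer can be audited later.
        self.trace.append(Step(task_type, payload, result))
        return result


def dose_agent(payload: dict) -> float:
    # Toy stand-in for a medication calculation: mg/kg times weight in kg.
    return payload["mg_per_kg"] * payload["weight_kg"]


def retrieval_agent(payload: dict) -> str:
    # Toy stand-in for information retrieval from an in-memory record store.
    return payload["records"].get(payload["patient_id"], "not found")


orchestrator = Orchestrator(agents={"dose": dose_agent, "retrieve": retrieval_agent})
dose = orchestrator.handle("dose", {"mg_per_kg": 15, "weight_kg": 70})
note = orchestrator.handle(
    "retrieve",
    {"records": {"p1": "allergy: penicillin"}, "patient_id": "p1"},
)

# Each entry shows which agent ran and what it returned.
for step in orchestrator.trace:
    print(step.task_type, "->", step.result)
```

Because every call passes through `handle`, the trace shows exactly which agent produced each piece of an answer, which is the property a single all-purpose agent lacks.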
The paper is titled “Orchestrated multi agents sustain accuracy under clinical-scale workloads compared to a single agent.”
The study’s authors, as listed in the journal, are Eyal Klang, Mahmud Omar, Ganesh Raut, Reem Agbareia, Prem Timsina, Robert Freeman, Lisa Stump, Alexander Charney, Benjamin S. Glicksberg, and Girish N. Nadkarni.
This research was supported in part by Clinical and Translational Science Award (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences. Research reported in this publication was also supported by the Office of Research Infrastructure of the National Institutes of Health under award numbers S10OD026880 and S10OD030463.
Source:
Mount Sinai Health System
Journal reference:
Klang, E., et al. (2026). Orchestrated multi agents sustain accuracy under clinical-scale workloads compared to a single agent. npj Health Systems. DOI: 10.1038/s44401-026-00077-0. https://www.nature.com/articles/s44401-026-00077-0

