According to a major review, AI can help clinicians work faster and more accurately, but only if the system is built around real-world clinical workflows, calibrated trust, and clear accountability.

Study: Human-AI collaboration in healthcare: A scoping review. Image credit: SWK Stock / Shutterstock
A new scoping review, published in the journal npj Digital Medicine, examines recent evidence on the utility of human-artificial intelligence (AI) collaboration in healthcare.
Background
Applications of artificial intelligence (AI) in healthcare are rapidly increasing across clinical tasks such as medical documentation, triage and task prioritization, image interpretation, and care coordination.
However, in medical settings that involve high-stakes decision-making, the usefulness of AI cannot be assessed simply by comparing the performance of an AI system with that of clinicians. Because patient safety, professional accountability, and context-dependent judgment are paramount in healthcare, AI systems need to work in collaboration with humans and under meaningful human supervision.
Health-related policy and regulatory frameworks, including those of the World Health Organization (WHO), the European Union’s AI Act, and the US Food and Drug Administration (FDA), emphasize that the implementation of AI systems in critical healthcare settings should be monitored and guided by qualified experts and supported by human oversight measures to minimize risks to health, safety, and fundamental rights.
In this scoping review, the authors analyzed recent evidence on human-AI collaboration in healthcare, focusing on three areas: the effectiveness of AI across clinical tasks; the technical, human, and organizational determinants of successful collaboration; and the ethics, safety, and governance requirements for responsible collaboration.
Main findings
The review included 140 empirical studies published between January 1, 2015 and October 27, 2025, selected from 17,463 records. Overall, these studies reported several benefits of human-AI collaboration in healthcare. However, the authors noted that these benefits are difficult to compare across settings.
Across the three focus areas, the analysis shows that the effectiveness of human-AI collaboration is task-dependent, with trust, workflow integration, and training being key determinants of success. Notably, the analysis highlights a persistent gap between governance expectations for human oversight and what the studies actually evaluate.
Regarding the effectiveness of AI across clinical tasks, the analysis revealed clear task dependencies. The effectiveness of human-AI collaboration varies with the task context, which means different tasks call for different outcome measures. The analysis also showed that effectiveness is typically assessed using short-term, task-level metrics rather than patient or system outcomes.
Regarding the technical, human, and organizational determinants of successful collaboration, the analysis shows that collaboration is associated with several benefits, including improved performance, faster work, and greater acceptance. The extent of these benefits depends on how well the system fits the clinical workflow and how tasks are divided between clinicians and AI.
The clearest and most consistently reported benefits were observed when AI was used for specific, well-defined tasks, such as prioritizing cases, highlighting areas of interest, and drafting text, and when clinicians remained accountable for the final decisions.
The largest and most standardized body of evidence on human-AI collaboration concerns diagnostic interpretation. The evidence for triage and prioritization, therapeutic decision-making, and management or documentation tasks is smaller and more heterogeneous.
Most studies of diagnostic interpretation reported benefits from human-AI collaboration. Studies in the other clinical task areas also reported positive findings more often than neutral or negative ones.
Among ethics, safety, and governance requirements, the analysis identified accountability and patient safety as the most frequently discussed issues. However, these issues are rarely the main focus of evaluation, highlighting the gap between policy expectations for human oversight and what research actually tests.
Significance
This review highlights the increasing importance of human-AI collaboration as a key means for the safe and effective implementation of AI-based systems in healthcare. However, despite rapid growth, the evidence base remains inconsistent across task types, study designs, and conceptualizations of collaboration.
Based on the review results, the authors recommend that assessments of collaboration effectiveness be made more task- and context-specific. Effectiveness should not be treated as a single construct across clinical and administrative tasks. Future research should evaluate human-AI collaboration using outcome measures that reflect not only accuracy and efficiency, but also workflow impact, cognitive burden, and patient and system outcomes.
Several human and organizational factors can facilitate successful human-AI collaboration, including trust calibration, interface design, workflow integration, and training. Systems that apply AI to specific, well-defined tasks while allowing clinicians to retain ultimate responsibility are the most likely to deliver the benefits of collaboration.
The authors also stated that accountability and patient safety should be treated as central ethical outcomes in future research. Human oversight alone will not be enough unless it is supported by transparency, contestability, traceability, and clear organizational governance regarding how AI influences real-world decision-making.
Because this was a scoping review, the authors did not perform a formal risk-of-bias or publication-bias assessment, which limits the certainty and comparability of the results. The review also does not provide a pooled estimate of clinical effectiveness. The authors further noted that the review was limited to English-language studies, did not include a dedicated gray literature search, and drew heavily on controlled diagnostic interpretation studies, which may have overrepresented positive findings.
Collectively, these findings provide a foundation for more task-specific, longitudinal, and governance-aware evaluations of human-AI collaboration in healthcare.

