Human-AI teams improve healthcare only when clinicians stay in control

News-Medical

A major review finds that AI can help clinicians work faster and more accurately, but only when systems are built around real clinical workflows, calibrated trust, and clear accountability.

Study: Human-AI Collaboration in Healthcare: A Scoping Review. Image credit: SWKStock / Shutterstock

A new scoping review, available as an article in press in the journal npj Digital Medicine, examines recent evidence on the utility of human-artificial intelligence (AI) collaboration in healthcare.

Background

The application of artificial intelligence (AI) in healthcare is rapidly increasing across clinical tasks, including medical documentation, triage and task prioritization, image interpretation, and care coordination.

However, in healthcare settings that involve critical decision-making, the usefulness of AI cannot be assessed solely by comparing the performance of an AI system with that of clinicians. Collaboration between humans and AI systems under meaningful human supervision is therefore required wherever patient safety, professional accountability, and context-specific decision-making matter most.

Health-relevant policy and regulatory bodies and frameworks, including the World Health Organization (WHO), the European Union’s AI Act, and the U.S. Food and Drug Administration (FDA), have emphasized that the implementation of AI systems in critical healthcare settings should be monitored and guided by qualified professionals and supported by human-oversight measures to minimize risks to health, safety, and fundamental rights.

In this scoping review, the authors analyzed recent evidence on human-AI collaboration in healthcare, focusing on the evaluation of AI effectiveness across clinical tasks; the technical, human, and organizational determinants of successful collaboration; and the ethical, safety, and governance requirements for an accountable collaboration.

Key Observations

The review included a total of 140 empirical studies published between 1 January 2015 and 27 October 2025, drawn from 17,463 records. Overall, these studies reported several benefits of human-AI collaboration in healthcare; however, the authors noted that these benefits are hard to compare across settings.

The analysis, which focused on three main domains, revealed that the effectiveness of this collaboration is task-dependent and that trust, workflow integration, and training are major determinants of successful collaboration. Notably, the analysis highlighted a persistent gap between governance expectations for human oversight and what these studies actually evaluate.

Regarding the evaluation of AI effectiveness across clinical tasks, the analysis revealed clear task dependence: the effectiveness of human-AI collaboration varies across task contexts, so different tasks call for different outcome measures. The analysis also indicated that effectiveness was usually assessed using short-term, task-level metrics rather than patient or system outcomes.

Regarding the technical, human, and organizational determinants of successful collaboration, the analysis revealed that collaboration is associated with several benefits, including performance enhancement, faster work, and greater acceptance, depending on the system’s workflow fit and task distribution.

The clearest and most consistently reported benefits were observed when AI was used for specific, well-bounded tasks, such as prioritizing cases, highlighting areas, or drafting text, and clinicians retained responsibility for the final decision.

The largest and most standardized evidence on human-AI collaboration was obtained for diagnostic interpretation, while smaller and more heterogeneous evidence was obtained for screening and prioritizing tasks, therapeutic decision-making, and administrative or documentation tasks.

The majority of studies analyzing diagnostic interpretation reported benefits of implementing human-AI collaboration systems. Studies analyzing other clinical task domains also reported positive findings more often than neutral or negative ones.

Among ethical, safety, and governance requirements, the analysis identified accountability and patient safety as the most frequently discussed matters. However, these matters were rarely part of the studies' main evaluations, highlighting the gap between policy expectations for human oversight and what studies actually test.

Significance

The review highlights the increasing prominence of human-AI collaboration as an important mode for safe and effective implementation of AI-based systems in healthcare. However, despite rapid growth, the evidence base remains inconsistent across task types, study designs, and conceptualizations of collaboration.

Given the review findings, the authors recommended that the evaluation of collaboration effectiveness be more task- and context-specific. Effectiveness should not be considered as a single construct across clinical and administrative tasks. Future studies should evaluate human-AI collaboration using outcome measures that not only reflect accuracy and efficiency but also demonstrate workflow impact, cognitive burden, and patient and system outcomes.

Several human and organizational factors, including trust calibration, interface design, workflow integration, and training, potentially drive the success of human-AI collaboration. Systems that apply AI to specific, well-bounded tasks while allowing clinicians to retain final responsibility are the most likely to deliver the benefits of collaboration.

The authors also stated that accountability and patient safety must be considered as core ethical measures in future studies. Human supervision alone is insufficient unless it is supported by transparency, contestability, traceability, and clear organizational governance over how AI influences decisions in practice.

Because this was a scoping review, the authors did not conduct a formal risk-of-bias or publication-bias assessment, which limits the certainty and comparability of the findings, and the review does not provide a pooled estimate of clinical effectiveness. The authors also noted that the review was restricted to English-language studies, did not include a dedicated grey-literature search, and focused on controlled diagnostic interpretation studies, which may over-represent positive findings.

Overall, these findings provide a foundation for more task-specific, longitudinal, and governance-aware evaluation of human-AI collaboration in healthcare.

Journal reference: