Clinical trial evaluates generative AI support tool in primary care

News-Medical

A large real-world clinical trial has found that a generative AI-powered decision support tool used by frontline clinicians was safe and improved the quality of clinical decision-making, but did not significantly change short-term patient outcomes.

The study, published today in Nature Medicine, is one of the first randomized controlled trials worldwide to test whether generative AI can improve patient-level outcomes, rather than only clinician performance or results on simulated cases.

The trial involved more than 9,600 patients attending 16 primary care clinics in Kenya. It was delivered by experts at the University of Birmingham, with support from the National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre.

Clinicians were randomly assigned to use an electronic medical record system with or without an integrated AI tool that provided real-time diagnostic and treatment suggestions. The tool, known as 'AI Consult', was a clinical decision support system based on a large language model and embedded directly within the existing electronic medical record system.

During consultations, the tool worked in the background by:

  • Analysing information entered by the clinician into the medical record
  • Generating context‑specific diagnostic and treatment suggestions, aligned with Kenyan national clinical guidelines
  • Flagging potential concerns using a simple color‑coded alert system (green, yellow or red)

Clinicians retained full autonomy; they were not required to follow the AI's advice, and retained responsibility for all diagnosis, prescribing and referral decisions. The AI interface was not visible to patients, helping preserve normal patient–clinician interaction.

"What we found is reassuring but also sobering. The technology appears safe and clearly improves aspects of clinical decision-making, but translating those gains into measurable patient benefit is much more challenging, particularly in everyday primary care."

"This is one of the first studies to rigorously ask the hardest question about AI in healthcare: whether it actually improves outcomes for patients."

Serious outcomes such as hospitalisation or death are rare in primary care, meaning extremely large studies – potentially involving more than 100,000 patients – would be needed to detect modest effects.

"What this study shows is that AI can be integrated safely into real clinical workflows, without undermining patient trust or clinician autonomy – which is a critical foundation for any future impact."

Findings: safety, quality and costs

Researchers found no statistically significant difference in the rate of treatment failure within 14 days between patients who received AI-supported care and those who received standard care (2.2% vs 2.0%). The study found no evidence of harm, with similar rates of hospitalisation and death in both groups.

While the AI tool did not produce measurable improvements in short-term patient outcomes, it significantly improved the quality of clinical documentation and treatment planning, as assessed by an independent panel of experienced clinicians who were blinded to whether AI had been used.

Patient satisfaction was the same in both groups, suggesting that AI support did not alter patients' experience of care.

The study also found that, although overall antibiotic prescribing rates were similar, antibiotic‑related costs were lower in the AI‑supported group, due to more cost-conscious prescribing choices.

Although the trial was conducted in Kenya, the researchers emphasize that the findings have global relevance, including for high-income health systems.

Professor Richard Riley, Professor of Biostatistics at the University of Birmingham and senior author, said: "Robust trials like this are so important to establish the real impact of using AI in practice. They help set realistic expectations of what AI can actually contribute within existing care pathways, and help guide where future investment and research effort should be focused. Generalisability of our findings to higher-income settings, where baseline standards of care are already high, needs to be evaluated."

Source:

University of Birmingham
