Study identifies cost-effective strategies for using AI in health systems


A study by researchers at the Icahn School of Medicine at Mount Sinai has identified strategies for using large language models (LLMs), a type of artificial intelligence (AI), in health systems while maintaining cost efficiency and performance.

The findings, published in the November 18 online issue of npj Digital Medicine [DOI: 10.1038/s41746-024-01315-1], provide insights into how health systems can leverage advanced AI tools to automate tasks efficiently, saving time and reducing operational costs while ensuring these models remain reliable even under high task loads.

"Our findings provide a road map for health care systems to integrate advanced AI tools to automate tasks efficiently, potentially cutting costs for application programming interface (API) calls for LLMs up to 17-fold and ensuring stable performance under heavy workloads," says co-senior author Girish N. Nadkarni, MD, MPH, Irene and Dr. Arthur M. Fishberg Professor of Medicine at Icahn Mount Sinai, Director of The Charles Bronfman Institute of Personalized Medicine, and Chief of the Division of Data-Driven and Digital Medicine (D3M) at the Mount Sinai Health System.

Hospitals and health systems generate massive volumes of data every day. LLMs, such as OpenAI's GPT-4, offer promising ways to automate and streamline workflows by assisting with various tasks. However, continuously running these AI models is costly, creating a financial barrier to widespread use, say the investigators.

The study involved testing 10 LLMs with real patient data, examining how each model responded to various types of clinical questions. The team ran more than 300,000 experiments, incrementally increasing task loads to evaluate how the models managed rising demands.

Along with measuring accuracy, the team evaluated the models' adherence to clinical instructions. An economic analysis followed, revealing that grouping tasks could help hospitals cut AI-related costs while keeping model performance intact.

The study showed that by grouping up to 50 clinical tasks, such as matching patients for clinical trials, structuring research cohorts, extracting data for epidemiological studies, reviewing medication safety, and identifying patients eligible for preventive health screenings, LLMs can handle them simultaneously without a significant drop in accuracy. This task-grouping approach suggests that hospitals could optimize workflows and reduce API costs as much as 17-fold, savings that could amount to millions of dollars per year for larger health systems, making advanced AI tools more financially viable.
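The economics behind this grouping effect can be illustrated with a simple back-of-the-envelope model: if every API call must re-send a large fixed block of instructions and patient context, then batching many tasks into one call amortizes that overhead. The sketch below is a hypothetical illustration with assumed token counts and a flat assumed price; these figures are not from the study, and real savings depend on the model's pricing and prompt sizes.

```python
# Hypothetical cost model for grouping clinical tasks into one LLM API call.
# All token counts and prices are illustrative assumptions, not study data.

PRICE_PER_1K_TOKENS = 0.01      # assumed flat price per 1,000 tokens (USD)
SHARED_CONTEXT_TOKENS = 4_000   # assumed instructions + record sent with a call
TOKENS_PER_TASK = 250           # assumed marginal tokens per question + answer


def cost_individual(num_tasks: int) -> float:
    """Each task is a separate call, so the shared context is re-sent every time."""
    tokens = num_tasks * (SHARED_CONTEXT_TOKENS + TOKENS_PER_TASK)
    return tokens / 1000 * PRICE_PER_1K_TOKENS


def cost_grouped(num_tasks: int) -> float:
    """All tasks share one call, so the shared context is sent only once."""
    tokens = SHARED_CONTEXT_TOKENS + num_tasks * TOKENS_PER_TASK
    return tokens / 1000 * PRICE_PER_1K_TOKENS


if __name__ == "__main__":
    n = 50  # batch size the study found workable without accuracy loss
    ratio = cost_individual(n) / cost_grouped(n)
    print(f"Cost ratio (individual / grouped) at {n} tasks: {ratio:.1f}x")
```

Under these assumed numbers the grouped approach is roughly an order of magnitude cheaper per task; larger shared contexts push the ratio higher, toward the multi-fold savings the study reports.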

"Recognizing the point at which these models begin to struggle under heavy cognitive loads is essential for maintaining reliability and operational stability. Our findings highlight a practical path for integrating generative AI in hospitals and open the door for further investigation of LLMs' capabilities within real-world limitations," says Dr. Nadkarni.

One unexpected finding, say the investigators, was that even advanced models like GPT-4 showed signs of strain when pushed to their cognitive limits. Rather than degrading gradually through minor errors, the models' performance dropped sharply and unpredictably under pressure.

Next, the research team plans to explore how these models perform in real-time clinical environments, managing real patient workloads and interacting directly with health care teams. Additionally, the team aims to test emerging models to see if cognitive thresholds shift as technology advances, working toward a reliable framework for health care AI integration. Ultimately, they say, their goal is to equip health care systems with tools that balance efficiency, accuracy, and cost-effectiveness, enhancing patient care without introducing new risks.

Source:

Mount Sinai Health System
