ChatCPR instructor delivers near-perfect CPR guidance in proof-of-concept study
by Dr. Priyom Bose, Ph.D. · News-MedicalA new AI-powered CPR instructor achieved near-perfect guideline adherence in simulated and retrospective 911-call testing, suggesting a scalable tool that could support bystanders during cardiac arrest while still requiring real-world clinical validation.
Study: An Artificial Intelligence–Enabled Cardiopulmonary Resuscitation Instructor. Image Credit: Tualek Photographer / Shutterstock
AI CPR Instruction Background
Each year in the US, there are approximately 350,000 out-of-hospital cardiac arrests (OHCAs), with survival rates at just 9%. Although bystander cardiopulmonary resuscitation (CPR) substantially increases survival, only 41.7% of OHCA cases receive bystander CPR, largely due to inadequate training and a lack of intervention support. While 65% of adults report having received CPR training, only 18% received training in the past 2 years, and 2% in the past year.
Dispatcher-assisted CPR (T-CPR), introduced in the 1980s, increases 30-day survival odds by 60% compared with no bystander CPR, but it is less effective than immediate bystander-initiated CPR. Major barriers to the rapid identification of cardiac arrest and to the delivery of etiology-specific protocols remain. Dispatchers take a median of 75 seconds to recognize OHCA, with chest compressions commencing at a median of 176 seconds after call initiation, primarily due to delays in recognition and in issuing instructions.
Delays in cardiac arrest recognition and variability in dispatcher performance can hinder timely, effective intervention. Furthermore, current systems lack the ability to rapidly adapt to evolving guidelines or consistently deliver tailored, high-quality instructions to diverse populations. AI, especially large language models, provides an opportunity to address these gaps by enabling real-time, standardized, and continuously updatable CPR guidance. By leveraging AI, it may be possible to improve both the speed and quality of bystander intervention, potentially supporting better outcomes in out-of-hospital cardiac arrest.
ChatCPR Simulation Study Design
The current study evaluated six publicly available AI models for their ability to deliver guideline-based CPR instructions in simulated OHCA scenarios. Researchers also developed and preliminarily evaluated ChatCPR, an AI CPR instructor, benchmarking it against human dispatchers using both simulated and archival 911 calls.
CPR instruction quality was measured using a 27-point checklist distilled from professional guidelines, notably the 2024 American Heart Association and American Academy of Pediatrics recommendations. Conditional logic was applied, for example, to assess whether instructors used automated external defibrillators (AEDs) appropriately. Validity was supported through expert review, adherence to best practices, and interrater reliability. Criteria were grouped as 13 minimally viable core actions and the full 27-item maximally effective checklist, representing optimal quality.
The six AI models were tested across five emergency scenarios, including four with confirmed cardiac arrest and one non-arrest scenario, to assess both instruction delivery and the ability to withhold CPR when not indicated. Scripted bystander lines were sequentially submitted to each model via application programming interface (API), with a standard prompt directing step-by-step CPR guidance.
ChatCPR was built using the open-source Llama 3.3 70B Instruct Turbo model for transparency, reproducibility, and scalability. The system was configured with T-CPR training materials for 911 dispatchers, reviewed and refined by physician authors. Prompts were iteratively optimized for clarity and logical sequencing. Performance was repeatedly tested and refined, then guideline adherence was compared to that of human dispatchers using 12 publicly available, transcribed 911 calls in which CPR guidance was requested and provided.
Guideline-Adherent CPR Performance Findings
Interannotator reliability across both simulated and actual 911 T-CPR scenarios was exceptionally high. Among six top generative AI models, the average adherence to minimally viable CPR criteria was 89.7%. Claude and Grok achieved the highest scores at 97.1%, followed by GPT-4o at 94.1%; Gemini trailed at 79.4%. For maximally effective criteria, overall adherence dropped to 69.8%, with Llama lowest at 61.3%, while Claude, Grok, and GPT-4o led with 73.8%, 73.8%, and 75.0%, respectively.
In identical test scenarios, the CPR-focused instructional agent achieved perfect adherence to both criteria sets. Compared to its foundation model, Llama 3.3, it showed a 1.2-fold relative performance gain for minimally viable criteria and a 1.6-fold gain for maximally effective criteria. The greatest gains were observed in AED instruction, CPR positioning, and pediatric CPR technique. The agent also exceeded the top baseline AI by 1.4-fold in terms of maximally effective criteria, consistently providing comprehensive, guideline-compliant instructions. These results highlight that targeted training and customization can substantially improve AI performance for specific clinical tasks.
Dispatcher Comparison and 911 Call Results
In 12 real-world 911 CPR calls, the agent averaged 100% for minimally viable and 98.9% for maximally effective criteria, missing only one item related to the recommended sequence of patient assessment. This finding supports the agent’s performance on held-out real 911 call transcripts, although prospective real-world validation is still needed.
Human dispatchers achieved 84.5% for the minimally viable criteria and 62.8% for the maximally effective criteria. The instructional agent outperformed dispatchers by 15.5 and 36.1 percentage points, corresponding to relative improvements of 20% and 60%, respectively. This indicates that the instructional agent not only surpasses generic AI models but also outperforms trained human dispatchers on checklist-based guideline adherence, particularly for advanced CPR guidance.
The agent’s largest advantages over dispatchers were in patient responsiveness checks, initial compression instructions, compression quality, AED use, full recoil, and correct CPR positioning. In all direct comparisons, evaluators rated the agent higher than dispatchers for both criteria sets during 911 calls.
AI CPR Public Health Implications
An early evaluation of the newly developed AI CPR instructor agent found that it delivered guideline-based instructions that surpassed those of dispatcher-assisted coaching in retrospective, checklist-based comparisons. These results indicate that on-demand, accurate AI coaching could help close critical gaps in bystander readiness during OHCA events. However, the study was text-based and retrospective, and did not test live bystander behavior, survival outcomes, false-positive CPR instructions in non-arrest emergencies, or performance under real-world stress, noise, and connectivity constraints. Ongoing development, clinical validation, and strategic integration into public health will be vital to realizing this technology’s potential to improve survival outcomes.
Download your PDF copy by clicking here.
Journal reference:
- Desai, N. et al. (2026) An Artificial Intelligence–Enabled Cardiopulmonary Resuscitation Instructor. JAMA Internal Medicine. doi:10.1001/jamainternmed.2026.1552, https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2848650