Millions of People Trust AI Chatbots With Health Questions. New Studies Show Why That’s Super Risky
Why polished medical answers from AI can mislead patients in risky ways.
by Tudor Tarita · ZME Science

If you ask an AI chatbot a health question, the replies come fast. They sound calm, polished, and authoritative. That's exactly the problem: they're very convincing, but they're not always right.
A growing body of research suggests that chatbots don’t even need to hallucinate to mislead people. They can steer users wrong by giving answers that feel medically solid while missing the context that safe care requires. Recent studies published in several scientific journals suggest that these systems can appear trustworthy even when their advice is woefully unhelpful.
Real People Ask Messy Questions
At Duke University School of Medicine, Monica Agrawal and colleagues are studying how people actually use health chatbots. Their team built HealthChat-11K, a dataset of 11,000 real-world health conversations spanning 21 medical specialties, to see where these exchanges fall apart.
The trouble, unsurprisingly, isn't with textbook-style questions; the AIs tend to do well on those. But real patients rarely ask textbook questions. They ask messy ones, based on assumptions that may already be wrong. They ask leading questions that quietly smuggle in a bad premise.
For instance, a patient might say, "I think I have this diagnosis. What are the next steps I should take?" Or ask for the dosage of a drug before anyone has established that the drug is the right one. Without realizing it, the user is priming the AI for a bad answer.
Agrawal said the systems are built in a way that makes this worse. “The objective is to provide an answer the user will like,” she said. “People like models that agree with them, so chatbots won’t necessarily push back.”
Then, There’s the Sycophancy
A medical chatbot can validate the user’s framing instead of questioning it.
In one case from the Duke research, a user asked how to perform a medical procedure at home. The chatbot warned that only professionals should do it. Then it gave step-by-step instructions anyway.
A clinician would not handle that exchange the same way.
“When a patient comes to us with a question, we read between the lines to understand what they’re really asking,” Ayman Ali said. “We’re trained to interrogate the broader context. Large language models just don’t redirect people that way.”
That is the gap the Duke team is trying to measure. A chatbot may answer the literal question even when that question is ill-posed; a clinician often notices that the real question is something else entirely.
This fits a broader concern in AI research: sycophancy, the tendency of a chatbot to bend toward the user's biases instead of checking whether the premise is sound.
“Sycophancy essentially means that the model trusts the user to say correct things,” Jasper Dekoninck, a data science PhD student at ETH Zurich, told Nature. Marinka Zitnik, a biomedical informatics researcher at Harvard, warned that AI sycophancy “is very risky in the context of biology and medicine, when wrong assumptions can have real costs.”
Clinical-Sounding Misinformation
A study in The Lancet Digital Health tested that problem at scale. Researchers probed 20 large language models with more than 3.4 million prompts containing false medical content drawn from social media, fabricated scenarios, and real hospital discharge notes that had been edited to include one false recommendation.
Across all models and datasets, the systems accepted fabricated medical content in 31.7% of base prompts. Hospital notes with inserted false advice produced the highest failure rate: 46.1%. Social media misinformation produced a much lower rate, 8.9%.
The pattern shows that style can matter as much as substance. The models were much more likely to go along with false advice when it appeared in formal, clinical language. The paper’s authors wrote that discharge-note text “written in formal, clinical, and declarative language” produced the highest susceptibility rates across models.
That helps explain one of the study’s most disturbing examples: fake recommendations such as drinking cold milk daily for esophageal bleeding or using “rectal garlic insertion for immune support.” When bad advice looked like it belonged in a medical record, the models often treated it as credible.
Oddly enough, many classic internet persuasion tricks had the opposite effect. Logical fallacies such as appeals to popularity often made the models more skeptical, not less. The researchers argue that chatbots have learned to distrust some of the rhetorical patterns common online, but not the polished tone of clinical documentation.
In other words, they are better at spotting sketchy internet language than bad advice dressed up to sound like medicine.
Useful Tool, Bad Authority
A separate study, in Nature Medicine, found that people using chatbots performed no better than a control group relying on ordinary resources such as web searches or their own judgment.
Agrawal’s advice is practical. Use a medical chatbot as a first pass, not a final answer. Check the sources it cites. Trust the underlying source, not the fluent tone of the response. A safer use case is to upload a reliable medical paper or guideline and ask the chatbot to explain it, rather than asking it to generate treatment advice from scratch.
Even Agrawal says she has used AI for quick medical information herself. That's not surprising. These tools are convenient, and convenience is powerful. But you should stay skeptical and treat AI as a tool, not an authority.
At the end of the day, medicine isn’t just about information. It depends on context and judgment.