Anthropic admits it dumbed down Claude when trying to make it smarter

System changes and bugs overlapped to create the impression of general decline

The Register

Claude users who complained about the AI service producing lower-quality responses over the past month weren’t imagining it.

Anthropic on Thursday published the results of a company investigation that found three distinct changes in March and April made things worse for customers using Claude Code, the Claude Agent SDK, and Claude Cowork.

Claude's API, the company says, was not affected.

Claude users complained bitterly about the quality of Claude's output during March and April, and service availability problems only made matters worse.

Anthropic insists it didn't degrade its models intentionally. Rather, several adjustments went awry, and those missteps created the perception of creeping AI incompetence.

First, on March 4, Anthropic adjusted Claude Code's default reasoning effort level from high to medium. The effort level governs how much reasoning the model applies to a given task. Anthropic hoped the change would reduce the latency that comes with longer periods of cogitation.

"This was the wrong tradeoff," the company said. "We reverted this change on April 7 after users told us they'd prefer to default to higher intelligence and opt into lower effort for simple tasks."

Presumably, turning down the default effort level on Opus 4.6 and Sonnet 4.6 would also have lightened the inference burden – models would "think" less and consume fewer tokens, using limited capacity more judiciously.

The latest Claude Code build, v2.1.118, defaults to "xhigh" on Sonnet 4.6.
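One plausible way to picture the effort setting is as a cap on how many thinking tokens the model may spend before answering. The sketch below is purely illustrative: the function name, request shape, and budget numbers are invented for this example and are not Anthropic's actual API or values.

```python
# Hypothetical mapping from an effort level to a thinking-token budget.
# All names and numbers here are illustrative assumptions, not Anthropic's
# real implementation or documented limits.
EFFORT_BUDGETS = {
    "low": 1_024,      # quick answers, minimal reasoning
    "medium": 8_192,   # the short-lived Claude Code default
    "high": 32_768,    # the default restored after user feedback
    "xhigh": 65_536,   # the current Claude Code default on Sonnet 4.6
}

def build_request(prompt: str, effort: str = "high") -> dict:
    """Attach a reasoning budget derived from the chosen effort level."""
    if effort not in EFFORT_BUDGETS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return {
        "prompt": prompt,
        "thinking_budget_tokens": EFFORT_BUDGETS[effort],
    }
```

Under a scheme like this, dropping the default from high to medium trims both latency and token consumption, at the cost of shallower reasoning on hard tasks, which is exactly the tradeoff users rejected.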

Anthropic’s second misfire was a bug introduced on March 26, when a cache optimization change ended up clearing cached session data on every turn of the prompt-and-response cycle.

Claude caches input tokens for an hour, which benefits the user by making sequential API calls faster and cheaper. Company engineers decided they wanted to clear output tokens (thinking sessions) for users who were idle for an hour, since the cache would not be used after that much time.

Anthropic’s motive for the change was to reduce the cost of resuming a session by disposing of old thinking traces that would no longer be relevant. Instead, engineers – Claude? – introduced a bug that cleared thinking sessions with each turn. The result was that Claude became "forgetful and repetitive." This was fixed April 10 for Sonnet 4.6 and Opus 4.6.
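The gap between the intended policy and the regression can be sketched in a few lines. This is a minimal toy, assuming a per-session store of thinking traces; the class, method names, and the `buggy` flag are invented for illustration and don't reflect Anthropic's actual code.

```python
import time

# Toy model of the cache policy. The intended behavior clears thinking
# traces only after an hour of inactivity; the March 26 regression
# effectively cleared them on every turn. Names here are hypothetical.
IDLE_TTL = 3600  # seconds; matches the hour-long input-token cache window

class SessionCache:
    def __init__(self):
        self.traces = []
        self.last_used = time.monotonic()

    def on_turn(self, trace: str, buggy: bool = False):
        now = time.monotonic()
        if buggy:
            # The regression, roughly: wipe the cache every turn, so no
            # earlier thinking trace ever survives into the next turn.
            self.traces.clear()
        elif now - self.last_used > IDLE_TTL:
            # Intended behavior: discard old traces only after an idle
            # hour, when they would no longer be relevant anyway.
            self.traces.clear()
        self.traces.append(trace)
        self.last_used = now
```

With the buggy path, the cache holds exactly one trace after any number of turns, which is a reasonable way to end up with a model that seems "forgetful and repetitive."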

Third, on April 16 Anthropic revised its system prompt, among other measures, in an effort to make Claude models less verbose. The added passage sounds harmless:

"Length limits: keep text between tool calls to ≤25 words. Keep final responses to ≤100 words unless the task requires more detail."

Following several weeks of internal testing, model quality evaluations suggested the change was safe. But after shipping the amended system prompt in conjunction with the release of Opus 4.7, subsequent ablation tests – which involve removing system prompt instructions to measure the effect of their absence – revealed a three percent performance drop for both Opus 4.6 and 4.7. The relevant system prompt adjustment was reverted on April 20.
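The shape of such an ablation test is simple: score the same eval suite with and without the suspect passage and look at the delta. The sketch below uses a toy scoring function with made-up numbers chosen to echo the three-point drop; `ablate`, `toy_eval`, and the scores are assumptions, not Anthropic's evaluation harness.

```python
def ablate(base_prompt: str, passage: str, eval_fn) -> float:
    """Score delta from adding `passage` to the system prompt.

    Negative means the passage hurts performance.
    """
    with_passage = eval_fn(base_prompt + "\n" + passage)
    without_passage = eval_fn(base_prompt)
    return with_passage - without_passage

# Placeholder benchmark scores, chosen to mirror the reported ~3-point drop.
SCORES = {"base": 0.80, "base+limits": 0.77}

def toy_eval(prompt: str) -> float:
    # Stand-in for a real eval run over many tasks.
    return SCORES["base+limits"] if "Length limits" in prompt else SCORES["base"]

delta = ablate("You are Claude.", "Length limits: keep responses short.", toy_eval)
```

The subtlety Anthropic ran into is that the pre-ship evaluations looked clean; only the post-release ablation, removing the passage rather than adding it, surfaced the regression.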

Anthropic is promising it will conduct more internal tests for future public builds of Claude Code, improvements in its Code Review tool, better evaluation of system prompt changes, and a new @ClaudeDevs account on social media site X "to give us the room to explain product decisions and the reasoning behind them in depth."

This came only a day after head of growth Amol Avasare took to X to explain an unannounced A/B test and said the company would try to communicate more directly, so people don't have to hear about issues through social media channels like X and Reddit.

To help customers rediscover the state of being comfortably numb, the AI firm reset account usage levels for everyone.

"This isn't the experience users should expect from Claude Code," the company said. ®