
Anthropic's most powerful model comes with a kill switch aimed at you

by · Boing Boing

Anthropic released its most capable model yet today, and it ships with a downgrade switch pointed at the user. Claude Fable 5 is "state-of-the-art on nearly all tested benchmarks," but when its classifiers detect a question about cybersecurity, biology, chemistry, or model "distillation," your request gets rerouted to a weaker model, Opus 4.8. Anthropic says this happens in "less than 5% of sessions" and admits the filters are tuned to "sometimes catch harmless requests."

The full-strength version, Claude Mythos 5, is the same model with the cyber guardrails removed, and it isn't for the unwashed masses. It goes first to US-government-aligned "cyberdefenders" under something called Project Glasswing, and Anthropic bills it as the model with "the strongest cybersecurity capabilities of any model in the world."

The company now also retains all customer traffic for 30 days so it can hunt for jailbreaks.
