How Fable 5 Refused All 200 Questions Yet Still Ranked First on the Toughest AI Coding Benchmark
Claude Fable 5’s newly added safety guardrails silently downgrade its answers, and on ProgramBench they cause it to refuse every task and score zero. The model nonetheless tops the benchmark’s leaderboard, a paradox that highlights the tension between model capability, safety restrictions, and practical usability.
When Anthropic released Claude Fable 5, the model achieved an impressive 80.3% score on SWE‑Bench Pro, prompting Andrej Karpathy to call for a major version bump and enabling Stripe to complete a massive Ruby code migration in a single day.
However, the model’s built‑in safety guardrail immediately sparked controversy. According to Anthropic’s system card, Fable 5 detects when a user is engaged in frontier AI research—such as training pipelines, distributed training infrastructure, or ML accelerator design—and silently reduces its answer quality without notifying the user, effectively falling back to an Opus 4.8‑level response.
Anthropic implemented this behavior with prompt modifications and vector‑based redirection, making the degradation invisible to users. After community backlash, Anthropic announced a policy change: the model would still downgrade but would now explicitly warn users and switch to Opus 4.8 when the safety filter triggers.
The most striking symptom of this “transparent” downgrade appears on ProgramBench, a benchmark created by the SWE‑Bench authors that asks models to reconstruct source code from compiled binaries. Binary reconstruction registers as a security‑sensitive operation, which triggers Fable 5’s safety classifier. As a result, the model rejected all 200 questions and scored 0% completion.
Despite the complete refusal, ProgramBench’s aggregated ranking still placed Fable 5 at the top, making it the first instance in AI evaluation history where a model that answered nothing secured the leading position.
Technical analysis of the guardrail reveals a two‑stage architecture: a real‑time probe monitors internal activations and flags suspicious traffic, then forwards the flagged request to an independently trained LLM classifier for the final decision. The system blocks requests in domains including cybersecurity, biochemistry, and the aforementioned frontier AI tasks. For example, on the Terminal‑Bench 2.1 suite, about 20.9% of test cases were intercepted and fell back to Opus 4.8.
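The two‑stage flow described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Anthropic’s implementation: the `route` function, the `Decision` type, the threshold value, and both toy stages are hypothetical. It also assumes that only requests flagged by the probe reach the classifier, and it stands in for the activation probe with a simple function over the request text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Decision:
    model: str    # which model answers the request
    warned: bool  # whether the user is told about the fallback


def route(
    request: str,
    probe_score: Callable[[str], float],
    classifier_blocks: Callable[[str], bool],
    threshold: float = 0.5,  # hypothetical cutoff, not a published value
) -> Decision:
    """Two-stage gate: a cheap probe first, a classifier only for flagged requests."""
    # Stage 1: the probe scores every request; unflagged ones pass straight through.
    if probe_score(request) < threshold:
        return Decision(model="Fable 5", warned=False)
    # Stage 2: the classifier makes the final call on flagged requests.
    if classifier_blocks(request):
        # Revised policy from the article: warn the user and fall back.
        return Decision(model="Opus 4.8", warned=True)
    return Decision(model="Fable 5", warned=False)


# Toy stand-ins for the two stages (hypothetical keyword rules, for illustration only).
def toy_probe(request: str) -> float:
    keywords = ("decompile", "exploit", "training pipeline")
    return 1.0 if any(k in request.lower() for k in keywords) else 0.0


def toy_classifier(request: str) -> bool:
    return "decompile" in request.lower()


if __name__ == "__main__":
    for text in ("Write a sorting function", "Decompile this binary"):
        print(text, "->", route(text, toy_probe, toy_classifier))
```

Splitting the gate this way keeps the expensive classifier off the common path, but a coarse first stage or an overly broad second stage will still send legitimate requests to the fallback model, which is the failure mode the article describes.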
Vals AI’s independent testing confirmed that Fable 5’s refusal rate is especially high on security‑related queries, forcing the fallback model to handle those tasks. The coarse granularity of the safety filter also blocks legitimate binary‑reversal exercises, which are common in programming education and security research.
Beyond ProgramBench, UC Berkeley’s RDI lab evaluated Fable 5 on its new ALE (Agents’ Last Exam) benchmark, which covers 55 professions and more than 1,500 real‑world work scenarios. Fable 5 scored 22.0%, second only to GPT‑5.5 (24.0%), but its average cost per question was $15.70, roughly four times GPT‑5.5’s $3.80.
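The cost comparison can be checked with a short calculation using only the figures quoted above. The "cost per solved task" metric is an illustrative addition, not something the ALE authors report, and it assumes the average cost per question applies uniformly to solved and unsolved tasks.

```python
# Figures quoted in the article: (ALE score in percent, average cost per question in USD).
results = {
    "Fable 5": (22.0, 15.70),
    "GPT-5.5": (24.0, 3.80),
}


def cost_per_solved_task(model: str) -> float:
    """Average spend needed to obtain one solved task (illustrative metric)."""
    score_pct, cost_per_question = results[model]
    return cost_per_question / (score_pct / 100.0)


# Ratio of per-question costs, the basis for the "roughly four times" claim.
cost_ratio = results["Fable 5"][1] / results["GPT-5.5"][1]

if __name__ == "__main__":
    print(f"cost ratio: {cost_ratio:.2f}x")
    for name in results:
        print(f"{name}: ${cost_per_solved_task(name):.2f} per solved task")
```

Under that uniform-cost assumption, the gap per solved task comes out somewhat wider than the per-question gap, because Fable 5 is both more expensive per question and slightly lower scoring.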
In the hardest ALE tier, all frontier agents—including Fable 5—failed to solve any task, yielding a 0% pass rate. The article concludes that the paradox of stronger models facing tighter guardrails leads to reduced usability, and Anthropic’s situation exemplifies the industry‑wide tension between capability and safety.
