Anthropic Postmortem: Claude Code Decline Due to Product‑Layer Changes
Anthropic’s detailed postmortem explains that recent user‑perceived declines in Claude Code’s reasoning depth, context retention, and response length stemmed from three product‑layer adjustments (a lowered default reasoning effort, a caching bug that repeatedly cleared prior reasoning tokens, and an overly restrictive system prompt) rather than from any degradation of the underlying model itself.
Anthropic published a comprehensive postmortem describing why users reported that Claude Code had become "dumber," "shorter," and more "forgetful." The company concluded that the experience drop was not caused by model regression but by three separate product‑layer changes.
This Is What Happened
The three issues affected how long Claude can think, how much context it remembers, and how long its answers are:
Reduced default reasoning effort
Cache‑clearing bug that erased prior thinking
System prompt limiting verbosity
Combined, these changes made a previously stable assistant think less, lose more of its conversation history, and produce shorter replies, leading users to feel something was wrong.
First Change: Lowered Default Reasoning Effort
On March 4, Anthropic changed Claude Code’s default reasoning effort from high to medium to reduce latency and token usage. The trade‑off was understandable: high effort sometimes caused long wait times that appeared as a frozen UI. Internally, tests showed a modest intelligence loss but a clear speed gain.
However, many users value the model’s willingness to think longer. When the default was lowered, most did not manually raise the setting, so they perceived a drop in intelligence. The change was rolled back on April 7, restoring xhigh for Opus 4.7 and high for other models.
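To see why a silently lowered default read as lost intelligence, consider how a default interacts with explicit overrides. The following is a minimal sketch, not Anthropic's actual code; the level names mirror the settings mentioned above ("medium", "high", "xhigh"), but the types and functions are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical effort ladder; only the level names come from the postmortem.
EFFORT_LEVELS = ["low", "medium", "high", "xhigh"]

@dataclass
class SessionConfig:
    # None means "use the product default", which is how most users run.
    reasoning_effort: Optional[str] = None

def resolve_effort(config: SessionConfig, product_default: str) -> str:
    """An explicit user setting wins; otherwise fall back to the product default."""
    return config.reasoning_effort or product_default

# Before March 4 the default was "high"; afterwards it was "medium".
# Users who never touched the knob silently dropped one level.
print(resolve_effort(SessionConfig(), product_default="medium"))         # medium
print(resolve_effort(SessionConfig("xhigh"), product_default="medium"))  # xhigh
```

Because the override is opt‑in, changing the fallback value changes behavior for the majority who never set it, which is exactly the population that reported the drop.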
Second Change: Cache‑Optimization Bug
On March 26, a cache optimization intended to speed up long‑idle sessions introduced a bug. When a session had been idle for over an hour, the system cleared old "thinking" tokens on every subsequent turn, not just once.
This caused the symptoms users reported: forgotten earlier statements, repeated actions, odd tool calls, and a sense that Claude was losing track of its own reasoning. The bug also increased cache misses, raising request costs and exhausting usage limits faster. The bug was fixed on April 10.
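The failure described (a one‑time cleanup that keeps firing on every turn) is a classic missing‑state‑reset bug. Below is a minimal sketch of plausible buggy and fixed control flow; the class and field names are assumptions, not Anthropic's code.

```python
import time

IDLE_THRESHOLD_S = 3600  # one hour, per the postmortem

class Session:
    def __init__(self):
        self.history = []            # mixed "thinking" and "text" blocks
        self.last_active = time.time()
        self.went_stale = False

    def prune_thinking(self):
        # Dropping thinking blocks also changes the cached prompt prefix,
        # which is why the bug raised cache misses and per-request cost.
        self.history = [b for b in self.history if b["type"] != "thinking"]

    def on_turn_buggy(self, now: float):
        if now - self.last_active > IDLE_THRESHOLD_S:
            self.went_stale = True
        if self.went_stale:          # bug: the flag is never reset, so the
            self.prune_thinking()    # prune fires on every turn from here on
        self.last_active = now

    def on_turn_fixed(self, now: float):
        # Updating last_active each turn means the idle check is true only
        # on the first turn after a long gap, so the prune runs exactly once.
        if now - self.last_active > IDLE_THRESHOLD_S:
            self.prune_thinking()
        self.last_active = now
```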
Third Change: Overly Restrictive System Prompt
When Opus 4.7 launched on April 16, Anthropic added a system prompt limiting tool‑call text to 25 words and final answers to 100 words. While intended to curb verbosity, the prompt, combined with other adjustments, reduced evaluation scores by about 3%.
The prompt was removed on April 20 after broader ablation testing showed it compressed both reasoning and expression space, harming tasks like code generation that benefit from more detailed explanations.
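Anthropic has not published the prompt wording or its eval harness, so the ablation below is purely illustrative: the constraint string paraphrases the limits reported above, and the scorer is a stub.

```python
import random

BASE_PROMPT = "You are Claude Code, an agentic coding assistant."
VERBOSITY_CONSTRAINT = (
    "Keep tool-call commentary under 25 words and final answers under 100 words."
)

def run_eval_suite(system_prompt: str) -> float:
    """Stub scorer. A real harness would replay a fixed task set against the
    model under `system_prompt` and return the mean score."""
    random.seed(system_prompt)  # deterministic per prompt, demo only
    return random.uniform(0.6, 0.8)

with_limit = run_eval_suite(f"{BASE_PROMPT} {VERBOSITY_CONSTRAINT}")
without_limit = run_eval_suite(BASE_PROMPT)
# With a real scorer, the postmortem reports roughly a 3% loss for the
# constrained prompt.
delta = (with_limit - without_limit) / without_limit
print(f"relative change with constraint: {delta:+.1%}")
```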
Why Users Perceived Overall Degradation
The three changes impacted different user groups at different times, creating a fragmented experience where some felt Claude was shorter, dumber, more forgetful, or that usage limits were draining faster. Internal testing initially failed to reproduce the issues because other unrelated experiments masked the bugs.
Community Reaction and Lessons
Anthropic’s follow‑up tweets acknowledged the problems, clarified that the base model and API were unaffected, and emphasized that the issues lay in the harness layer and system prompts. This admission highlighted a broader industry truth: modern LLM products are shaped not only by model weights but also by default parameters, prompt engineering, caching, and context management.
"The problem lies in Claude Code and the Agent SDK harness layer; the model itself has not regressed, and the Claude API is unaffected."
For developers and product teams, the postmortem underscores the importance of treating prompt changes as high‑risk, aligning internal and external versions, and conducting thorough, incremental evaluations before releasing product‑layer adjustments.
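One way to operationalize that lesson is a regression gate that blocks any prompt change whose eval delta exceeds a fixed budget. The sketch below uses assumed names and an assumed 1% budget; the example numbers approximate the roughly 3% regression reported above.

```python
REGRESSION_BUDGET = 0.01  # block releases costing more than 1% on evals

def gate_prompt_change(baseline_score: float, candidate_score: float) -> bool:
    """Return True only if the candidate prompt is safe to ship."""
    drop = (baseline_score - candidate_score) / baseline_score
    if drop > REGRESSION_BUDGET:
        print(f"BLOCKED: candidate regresses evals by {drop:.1%}")
        return False
    print(f"OK: delta within budget ({drop:+.1%})")
    return True

# A change like the April 16 prompt (~3% drop) would be caught here.
gate_prompt_change(baseline_score=0.72, candidate_score=0.698)
```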
Anthropic’s Planned Remediation
Align internal and public Claude Code versions for all employees.
Improve the internal Code Review tool.
Introduce stricter testing, ablation, and audit processes for system prompt changes.
Extend observation periods and broaden evaluation for any change that could affect intelligence.
Reset usage limits for all subscribed users.
These steps aim to close the gap between internal experimentation and the experience of real users, acknowledging that product‑layer decisions can dramatically affect perceived AI quality.
Conclusion
The Anthropic postmortem demonstrates that apparent "model degradation" often originates from product‑layer modifications. Understanding and transparently communicating these layers is crucial for building trustworthy AI products.
