
Anthropic Postmortem: Claude Code Decline Due to Product‑Layer Changes

Anthropic’s detailed postmortem explains that recent user‑perceived declines in Claude Code’s reasoning depth, context retention, and response length stemmed from three product‑layer adjustments—a lowered default reasoning effort, a caching bug that repeatedly cleared thinking, and an overly restrictive system prompt—rather than any degradation of the underlying model itself.

Anthropic published a comprehensive postmortem describing why users reported that Claude Code had become "dumber," "shorter," and more "forgetful." The company concluded that the experience drop was not caused by model regression but by three separate product‑layer changes.

What Happened

The three issues affected how long Claude can think, how much context it remembers, and how long its answers are.

Reduced default reasoning effort

Cache‑clearing bug that erased prior thinking

System prompt limiting verbosity

Combined, these changes made a previously stable assistant think less, lose more of its conversation history, and produce shorter replies, leading users to feel something was wrong.

First Change: Lowered Default Reasoning Effort

On March 4, Anthropic changed Claude Code’s default reasoning effort from high to medium to reduce latency and token usage. The trade‑off was understandable: high effort sometimes caused long wait times that appeared as a frozen UI. Internally, tests showed a modest intelligence loss but a clear speed gain.

However, many users value the model’s willingness to think longer. When the default was lowered, most did not manually raise the setting, so they perceived a drop in intelligence. The change was rolled back on April 7, restoring the default to xhigh for Opus 4.7 and high for other models.
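The dynamics of a silent default change are easy to sketch. In the snippet below, which is purely illustrative (the function and setting names are hypothetical, not Anthropic’s actual configuration API), a request’s effort level falls back to a product default whenever the user has set nothing, so lowering that default changes behavior for the majority who never touch the setting:

```python
# Illustrative sketch of an effort default; not Anthropic's actual API.
EFFORT_LEVELS = ["low", "medium", "high", "xhigh"]

def resolve_effort(user_setting=None, default="medium"):
    """Fall back to the product default when the user set nothing.

    Most users never override the setting, so changing `default`
    silently changes reasoning depth for most sessions.
    """
    if user_setting is not None and user_setting not in EFFORT_LEVELS:
        raise ValueError("unknown effort level: " + user_setting)
    return user_setting if user_setting is not None else default

# Before March 4 the default was "high"; afterwards it was "medium".
# A user with no explicit setting silently dropped a level:
assert resolve_effort(default="high") == "high"
assert resolve_effort(default="medium") == "medium"
# An explicit override always wins:
assert resolve_effort("xhigh", default="medium") == "xhigh"
```

Because the fallback is invisible in normal use, users noticed only the effect (shallower reasoning), not the cause, which matches the perception gap the postmortem describes.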

Overview of the three problem categories

Second Change: Cache‑Optimization Bug

On March 26, a cache optimization intended to speed up long-idle sessions introduced a bug: once a session had been idle for over an hour, the system began clearing old "thinking" tokens on every subsequent turn rather than only once.

This caused the symptoms users reported: forgotten earlier statements, repeated actions, odd tool calls, and a sense that Claude was losing track of its own reasoning. The bug also increased cache misses, raising request costs and exhausting usage limits faster. The bug was fixed on April 10.
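The failure mode is worth sketching concretely. In this toy reconstruction (the class, flags, and threshold handling are invented for illustration; Anthropic’s actual code is not public), the intended behavior clears prior thinking blocks once when a session resumes after a long idle, while the buggy path repeats the clear on every later turn, so the model’s recent reasoning never accumulates:

```python
IDLE_THRESHOLD_S = 3600  # sessions idle longer than an hour trigger the clear

class Session:
    """Toy chat session; history holds ("thinking", text) and ("answer", text)."""

    def __init__(self, buggy=False):
        self.buggy = buggy
        self.went_idle = False   # session has resumed after a long idle
        self.cleared = False     # thinking was already cleared once
        self.history = []

    def turn(self, thinking, answer, idle_seconds=0):
        if idle_seconds > IDLE_THRESHOLD_S:
            self.went_idle = True
        # Intended: clear old thinking once after the idle resume.
        # Bug: the "already cleared" check is skipped, so every turn clears.
        if self.went_idle and (self.buggy or not self.cleared):
            self.history = [(k, v) for k, v in self.history if k != "thinking"]
            self.cleared = True
        self.history.append(("thinking", thinking))
        self.history.append(("answer", answer))

    def thinking_blocks(self):
        return [v for k, v in self.history if k == "thinking"]

correct, buggy = Session(), Session(buggy=True)
for s in (correct, buggy):
    s.turn("t1", "a1")
    s.turn("t2", "a2", idle_seconds=4000)  # resumes after a long idle
    s.turn("t3", "a3")
    s.turn("t4", "a4")

assert correct.thinking_blocks() == ["t2", "t3", "t4"]  # cleared once
assert buggy.thinking_blocks() == ["t4"]  # reasoning wiped every turn
```

Repeatedly rewriting the context like this also invalidates any cached prompt prefix, which is consistent with the higher cache-miss costs and faster limit exhaustion the postmortem reports.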

Diagram of the repeated clearing of prior thinking

Third Change: Overly Restrictive System Prompt

When Opus 4.7 launched on April 16, Anthropic added a system prompt limiting tool‑call text to 25 words and final answers to 100 words. While intended to curb verbosity, the prompt, combined with other adjustments, reduced evaluation scores by about 3%.

The prompt was removed on April 20 after broader ablation testing showed it compressed both reasoning and expression space, harming tasks like code generation that benefit from more detailed explanations.
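A hard word cap like this is easy to express as a check. The snippet below paraphrases the constraint as the article describes it; the prompt text and the helper function are hypothetical, not the verbatim prompt Anthropic shipped:

```python
# Hypothetical paraphrase of the length-limiting instruction; not the
# verbatim system prompt Anthropic shipped.
VERBOSITY_RULE = ("Keep text accompanying tool calls under 25 words "
                  "and final answers under 100 words.")

TOOL_TEXT_LIMIT = 25
ANSWER_LIMIT = 100

def within_limits(tool_text, final_answer):
    """True if a response fits the word budgets the prompt imposed."""
    return (len(tool_text.split()) <= TOOL_TEXT_LIMIT
            and len(final_answer.split()) <= ANSWER_LIMIT)

assert within_limits("Running the test suite.", "All tests pass.")
assert not within_limits("word " * 26, "ok")  # 26 words of tool text
```

A fixed budget like this leaves no room for tasks where the explanation carries the work, which is why the broader ablation found it compressing both reasoning and expression space.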

Anthropic’s effort explanation in the product UI

Why Users Perceived Overall Degradation

The three changes impacted different user groups at different times, creating a fragmented experience where some felt Claude was shorter, dumber, more forgetful, or that usage limits were draining faster. Internal testing initially failed to reproduce the issues because other unrelated experiments masked the bugs.

Community Reaction and Lessons

Anthropic’s follow‑up tweets acknowledged the problems, clarified that the base model and API were unaffected, and emphasized that the issues lay in the harness layer and system prompts. This admission highlighted a broader industry truth: modern LLM products are shaped not only by model weights but also by default parameters, prompt engineering, caching, and context management.

"The problem lies in Claude Code and the Agent SDK harness layer; the model itself has not regressed, and the Claude API is unaffected."

For developers and product teams, the postmortem underscores the importance of treating prompt changes as high‑risk, aligning internal and external versions, and conducting thorough, incremental evaluations before releasing product‑layer adjustments.

Anthropic’s Planned Remediation

Align internal and public Claude Code versions for all employees.

Improve the internal Code Review tool.

Introduce stricter testing, ablation, and audit processes for system prompt changes.

Extend observation periods and broaden evaluation for any change that could affect intelligence.

Reset usage limits for all subscribed users.

These steps aim to close the gap between internal experimentation and the experience of real users, acknowledging that product‑layer decisions can dramatically affect perceived AI quality.

Conclusion

The Anthropic postmortem demonstrates that apparent "model degradation" often originates from product‑layer modifications. Understanding and transparently communicating these layers is crucial for building trustworthy AI products.

Tags: LLM, prompt engineering, postmortem, Anthropic, Claude Code, AI product engineering
Written by

Design Hub

Periodically delivers AI‑assisted design tips and the latest design news, covering industrial, architectural, graphic, and UX design. A concise, all‑round source of updates to boost your creative work.
