Claude’s Official Report Reveals Three Quality Degradation Issues and Fixes

Anthropic’s recent report details three independent changes that caused quality regressions in Claude Code and the Claude Agent SDK: a lowered default reasoning effort, a caching bug that erased thinking history, and an overly restrictive system prompt. Each was fixed by April 20 in version v2.1.116+, with plans to prevent future incidents.

Anthropic investigated user‑reported quality drops in Claude Code over the past month and traced them to three independent changes affecting Claude Code, Claude Agent SDK, and Claude Cowork, while the core model and API remained unaffected. All three issues were resolved by April 20 (v2.1.116+).

Adjusting Claude Code's default reasoning effort

On March 4 the default reasoning effort for Claude Code was lowered from high to medium to reduce the long latencies of high‑effort mode. Users reported that the lower default felt noticeably less intelligent; most preferred the higher default and switched to lower effort only for simple tasks. The change affected Sonnet 4.6 and Opus 4.6 and was rolled back on April 7, restoring xhigh effort for Opus 4.7 and high effort for the other models.

A cache optimization that lost historical reasoning

On March 26 a cache optimization was deployed: sessions idle for more than one hour would have their older thinking content cleared to lower the cost of resuming. A bug caused the clear operation to run on every subsequent turn of the same session, not just the first turn after resuming, repeatedly discarding the thinking history. This produced forgetful, repetitive behavior and accelerated token consumption, surfacing as user reports of “memory loss” and odd tool selections. The bug was fixed in v2.1.101 on April 10.
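The failure mode described above, a one‑time cleanup that keeps re‑firing, is a classic stale‑flag bug. The sketch below is a hypothetical reconstruction of that pattern, not Anthropic’s actual code; the `Session` class, its fields, and the turn structure are all invented for illustration.

```python
IDLE_THRESHOLD_S = 3600  # one hour of inactivity

class Session:
    """Hypothetical session that prunes old thinking blocks on resume."""

    def __init__(self):
        self.history = []            # turns: {"type": "thinking"|"text", ...}
        self.last_active = 0.0       # timestamp of the previous turn
        self.resumed_after_idle = False

    def begin_turn(self, now: float) -> None:
        # Detect a resume after a long idle period.
        if now - self.last_active > IDLE_THRESHOLD_S:
            self.resumed_after_idle = True
        self.last_active = now

        if self.resumed_after_idle:
            # Intended: drop older thinking blocks once, on resume,
            # to lower the cost of rebuilding context.
            self.history = [t for t in self.history if t["type"] != "thinking"]
            # BUG: the flag is never reset, so every later turn in the
            # same session repeats the purge and discards new thinking.
            # Fix in the spirit of v2.1.101:
            #     self.resumed_after_idle = False
```

With the flag reset missing, thinking content appended after the resume is wiped again on the very next turn, which matches the reported “memory loss” and repetitive behavior.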

A system prompt change to reduce verbosity

On April 16 a new system‑prompt instruction was added limiting text between tool calls to at most 25 words and final responses to at most 100 words. Combined with other prompt tweaks, this severely hurt coding quality for Sonnet 4.6, Opus 4.6, and Opus 4.7. Internal testing failed to surface the regression; subsequent ablation studies showed a 3% performance drop, prompting an immediate rollback on April 20.

“Length limits: keep text between tool calls to ≤25 words. Keep final responses to ≤100 words unless the task requires more detail.”

The investigation was complicated by unrelated internal experiments and edge‑case session behavior, making the bugs hard to reproduce. The caching issue intersected Claude Code’s context management, the Anthropic API, and extended thinking. A week‑long analysis, including code review with Opus 4.7, finally identified the root causes.

To avoid recurrence, Anthropic will align internal and public Claude Code builds, enhance the Code Review tool, enforce stricter controls on system‑prompt changes, run broader model‑specific evaluations, perform line‑by‑line ablations, and adopt soak periods with progressive rollouts. Guidance has been added to CLAUDE.md to limit model‑specific prompt changes.
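A line‑by‑line prompt ablation of the kind mentioned above can be sketched simply: remove one line at a time and re‑score. This is a generic illustration, not Anthropic’s harness; `score` stands in for whatever evaluation is run against each prompt variant, and all names are hypothetical.

```python
def ablate_lines(prompt_lines, score):
    """Score the effect of removing each prompt line individually.

    Returns a mapping from line -> score delta versus the full prompt.
    A positive delta means removing that line improved the score.
    """
    baseline = score("\n".join(prompt_lines))
    deltas = {}
    for i, line in enumerate(prompt_lines):
        variant = prompt_lines[:i] + prompt_lines[i + 1:]
        deltas[line] = score("\n".join(variant)) - baseline
    return deltas
```

Run against a coding evaluation, a harness like this would have flagged the length‑limit line as the one whose removal recovers the lost performance.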

Users can reset usage limits, submit feedback via the /feedback command, and follow updates on X @ClaudeDevs and the GitHub discussion thread.

Written by High Availability Architecture.