Why Claude’s Performance Is Dropping: Data‑Driven Insights into AI Model Degradation

Since early 2024, Claude users have reported shallower reasoning, frequent failures, and soaring token costs. An analysis of 6,852 session logs reveals a 67% drop in reasoning depth, a Plan Mode that no longer engages, and an ≈80× increase in API costs, pointing to a concerning industry-wide trend of silent AI model downgrades.

Observed degradation in Claude Code (Opus 4.6)

From early February 2024, developers reported shallower reasoning, more errors, and frequent stop-hook warnings. AMD AI director Stella Laurenzo collected logs from 6,852 real-world sessions spanning roughly three months and published the dataset on GitHub (issue #42796: https://github.com/anthropics/claude-code/issues/42796).

Quantitative findings

Reasoning depth, measured as the average number of reasoning steps per request, dropped by 67%.

Code-reading frequency fell from 6.6 to 2.0 reads per edit.

Lazy‑hook (stop‑hook) warnings triggered 173 times after 2024‑03‑08, whereas none were recorded before.

API cost increased ≈80× because shallow outputs caused repeated retries and token spikes.
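
For context on how such figures are derived, here is a minimal sketch of the kind of before/after aggregation this analysis involves. The JSONL schema and field names below are hypothetical; the published dataset in issue #42796 may be structured differently.

```python
import json
from statistics import mean

# Hypothetical JSONL schema: one session record per line, e.g.
# {"date": "2024-03-10", "reasoning_steps": 4, "code_reads": 2,
#  "edits": 1, "stop_hook_warning": true}
def summarize(path: str, cutoff: str = "2024-03-08") -> dict:
    before, after = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            # ISO dates compare correctly as strings
            (before if rec["date"] < cutoff else after).append(rec)

    def depth(recs):
        # average reasoning steps per request
        return mean(r["reasoning_steps"] for r in recs)

    def reads_per_edit(recs):
        # code reads divided by edits, guarding against zero edits
        edits = sum(r["edits"] for r in recs)
        return sum(r["code_reads"] for r in recs) / max(edits, 1)

    return {
        "depth_drop_pct": 100 * (1 - depth(after) / depth(before)),
        "reads_per_edit_before": reads_per_edit(before),
        "reads_per_edit_after": reads_per_edit(after),
        "stop_hook_warnings_after": sum(r["stop_hook_warning"] for r in after),
    }
```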

Impact on functionality

The degradation left Claude unable to handle complex engineering tasks. Its built-in Plan Mode, which generates a multi-step plan before making code changes, stopped engaging; projects had to be rewritten manually, often twice.

Anthropic’s configuration changes

Anthropic confirmed two default‑parameter adjustments:

2024‑02‑09: the “adaptive thinking” flag was enabled by default, which reduces the maximum reasoning depth to keep latency low.

2024‑03‑03: the default effort level for Opus 4.6 was changed from “high” to “medium”, limiting the token budget allocated to internal planning and code‑reading phases.

Anthropic described the change as finding a “sweet spot” between intelligence, latency, and cost.
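
API users who want to opt out of adaptive defaults can pin the reasoning budget explicitly. A minimal sketch using the Anthropic Python SDK's extended-thinking parameter; the model ID and 16,000-token budget are illustrative assumptions, and this applies to direct API calls rather than restoring Claude Code's internal defaults.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Request an explicit thinking budget instead of relying on adaptive defaults.
# max_tokens must exceed the thinking budget. The model ID follows the
# article's "Opus 4.6" label and is an assumption.
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=20_000,
    thinking={"type": "enabled", "budget_tokens": 16_000},
    messages=[{"role": "user", "content": "Plan the refactor before editing."}],
)
print(response.content)
```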

Technical consequences

Lower effort reduces the number of internal reasoning cycles and the amount of source code the model scans before generating a response. This directly explains the observed drop in code-reading frequency and the rise in stop-hook warnings: the model aborts earlier when it cannot allocate enough tokens for a safe plan. The higher retry rate inflates token consumption, which accounts for the ≈80× cost increase.
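
A back-of-the-envelope model makes the multiplicative effect concrete. All figures below are hypothetical, chosen only to show how a higher retry rate and larger per-attempt token spikes compound into a roughly 80× cost increase:

```python
# Hypothetical cost model: total token spend scales with retries x tokens per attempt.
baseline_tokens = 4_000   # tokens per attempt before the change (assumed)
attempts_before = 1.2     # occasional retry before the change (assumed)
attempts_after = 8.0      # shallow outputs force repeated retries (assumed)
tokens_after = 48_000     # token spike per attempt after the change (assumed)

cost_before = attempts_before * baseline_tokens   # 4,800 tokens per task
cost_after = attempts_after * tokens_after        # 384,000 tokens per task
print(f"cost multiplier: {cost_after / cost_before:.0f}x")  # -> 80x
```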

Broader implications

The case illustrates how large‑model providers can modify core inference parameters without explicit notice, effectively altering the quality‑of‑service while keeping price and UI unchanged. For users who rely on AI for critical, multi‑step engineering work, such silent “brain‑tax” adjustments can render a previously premium offering unsuitable.

References

GitHub issue with the 6,852 session logs: https://github.com/anthropics/claude-code/issues/42796

Hacker News discussion: https://news.ycombinator.com/item?id=47660925

Twitter thread: https://x.com/om_patel5/status/2041971334553727076

Tags: large language models · Claude · Anthropic · AI performance · AI model degradation
Written by

Java Tech Enthusiast

Sharing computer programming language knowledge, focusing on Java fundamentals, data structures, related tools, Spring Cloud, IntelliJ IDEA... Book giveaways, red‑packet rewards and other perks await!
