Why Claude Code’s Context Caching Suddenly Fails and Costs Skyrocket
After Claude Code was updated to version 2.1.37, developers observed a sharp drop in context-caching hit rates, causing unexpected cost spikes and slower responses. A community investigation found that a per-request header string and random spaces injected by the tool break the model's cache matching.
Developers using Claude Code reported that, despite enabling Context Caching (which should reduce latency and cost), their expenses suddenly surged and response times slowed.
Normally, a long coding conversation benefits from cache reuse: the first request transmits the full context and subsequent requests reuse it, yielding hit rates above 90% on services such as PackyAPI. The article likens this to ordering coffee: once the barista knows your preference, the next order is faster and cheaper.
When Claude Code was upgraded to version 2.1.37, the cache hit rate collapsed dramatically, with some users seeing rates as low as 30% or even 0%. Consequently, every dialogue was treated as a brand‑new session, forcing the system to resend the entire code background each time and inflating costs.
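To make the cost effect concrete, here is a rough back-of-the-envelope sketch. The 0.1 multiplier for cached input tokens is an illustrative assumption, not a figure from the article, and the function name relative_input_cost is hypothetical. Any surcharge for writing to the cache is ignored.

```python
def relative_input_cost(hit_rate, cached_multiplier=0.1):
    """Input cost relative to sending every token uncached (1.0).

    cached_multiplier is an ASSUMED price ratio for cached tokens;
    substitute the real ratio from your provider's price list.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    cached_part = hit_rate * cached_multiplier
    uncached_part = 1.0 - hit_rate
    return cached_part + uncached_part


# Compare the hit rates mentioned in the article: 90%, 30%, and 0%.
for rate in (0.9, 0.3, 0.0):
    cost = relative_input_cost(rate)
    print(f"hit rate {rate:.0%}: relative input cost {cost:.2f}")
```

Under that assumed ratio, falling from a 90% hit rate to 0% makes the input side of each request roughly five times more expensive.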
Twitter user @玩个锤子 captured the network traffic and discovered that Claude Code injects a string with a randomized value into the request's system field, for example:
{
"text": "x-anthropic-billing-header: cc_version=2.1.37.3a3; cc_entrypoint=claude-vscode; cch=xxxxx;"
}
The cch value changes on every request. Because the large-model cache requires an exact match of the preceding context, this random token prevents the hash from matching, causing cache misses and extra computation.
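The mismatch can be illustrated with a toy model. This sketch assumes the cache key behaves like a hash over the system blocks in order; the real caching mechanism is not described in the article, and prefix_key, build_system, and the sample cch values are hypothetical stand-ins.

```python
import hashlib


def prefix_key(system_blocks):
    """Hash the system blocks in order: a stand-in for an exact-match cache key."""
    digest = hashlib.sha256()
    for block in system_blocks:
        digest.update(block.encode("utf-8"))
        digest.update(b"\x00")  # separator so block boundaries affect the key
    return digest.hexdigest()


def build_system(cch):
    """Build a system field whose first block carries the billing string."""
    header = (
        "x-anthropic-billing-header: cc_version=2.1.37.3a3; "
        f"cc_entrypoint=claude-vscode; cch={cch};"
    )
    # The second block stands in for the long, otherwise unchanged context.
    return [header, "You are a coding assistant. <long project context>"]


same_a = prefix_key(build_system("aaaaa"))
same_b = prefix_key(build_system("aaaaa"))
changed = prefix_key(build_system("bbbbb"))

print("identical cch, keys match:", same_a == same_b)
print("different cch, keys match:", same_a == changed)
```

Even though the long context is byte-for-byte identical, changing only the cch value yields a different key, so in this model every request looks new.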
Furthermore, when the model_name is set to a third‑party model such as Kimi or DeepSeek, Claude Code deliberately inserts random spaces into the system prompt, further breaking cache consistency.
Two explanations are offered: an accidental bug where developers added a tracking ID without realizing it would break caching, or an intentional design to penalize users who switch to cheaper models, effectively degrading their experience.
The open‑source community responded quickly: major third‑party API forwarding services filtered out the offending x-anthropic-billing-header and the random spaces, and users can also manually strip the field. As a result, normal caching behavior is restored.
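A filter of the kind described might look like the following minimal sketch. It assumes the system field is a list of blocks with a "text" key, as in the captured example, and that the random spaces appear as runs of two or more spaces. Both are assumptions, and strip_cache_breakers is a hypothetical name rather than code from any forwarding service.

```python
import re

BILLING_PREFIX = "x-anthropic-billing-header:"


def strip_cache_breakers(body):
    """Return a copy of the request body with the billing block removed
    and runs of spaces in the remaining system text collapsed."""
    system = body.get("system")
    if not isinstance(system, list):
        return dict(body)  # nothing to filter in this sketch
    cleaned = []
    for block in system:
        text = block.get("text", "")
        if text.lstrip().startswith(BILLING_PREFIX):
            continue  # drop the block whose cch value changes per request
        # ASSUMPTION: injected spaces show up as runs of 2+ spaces.
        # This is crude and would also flatten intentional indentation.
        cleaned.append({**block, "text": re.sub(r" {2,}", " ", text)})
    return {**body, "system": cleaned}


first = {
    "system": [
        {"type": "text", "text": BILLING_PREFIX + " cc_version=2.1.37.3a3; cch=aaaaa;"},
        {"type": "text", "text": "You are  a coding   assistant."},
    ]
}
second = {
    "system": [
        {"type": "text", "text": BILLING_PREFIX + " cc_version=2.1.37.3a3; cch=bbbbb;"},
        {"type": "text", "text": "You are a  coding assistant."},
    ]
}

print(strip_cache_breakers(first) == strip_cache_breakers(second))
```

After filtering, the two requests carry identical system fields, which is the condition an exact-match cache needs. A production filter would need a more careful whitespace rule, since collapsing every run of spaces can alter prompts that contain code.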
This incident underscores the importance of transparent tooling in an open ecosystem; hidden modifications that sabotage caching can quickly erode trust and increase costs.
