Cut Claude Code Token Costs: 4 Strategies, Real Benchmarks, and Hidden Pitfalls

This guide dissects why Claude Code sessions can waste anywhere from 3k to 30k tokens, explains the four key cost drivers, and provides concrete techniques—such as prompt caching, precise prompts, single‑turn queries, and infrastructure tweaks—backed by detailed token measurements and real‑world examples.


Understanding Where Every Token Is Spent

The Claude API bills input tokens (everything you send) and output tokens (Claude's reply) separately. Input is cheaper per token but accumulates in far larger volumes. The input payload is not just your question; it includes:

System prompt: Claude Code's built‑in system instructions.

CLAUDE.md: Project‑level configuration file.

Extension declarations: MCP / Skills / Plugins metadata.

History: All messages since the start of the session.

Tool results: Outputs from Read / Grep / Bash commands.

Current user message.

These sources can be grouped into four categories:

System side: System prompt, CLAUDE.md, extensions (fixed overhead).

History side: Accumulating conversation history (cost grows over time).

Tool side: Results from tools like Read or Grep (most volatile).

Current side: The immediate user request.

The tool side determines whether a single turn suddenly spikes in cost; the history side drives the gradual, steady increase.

Prompt Cache: A Hidden 50% Cost Saver

If the prefix starting from the system prompt is byte‑for‑byte identical to a recent request, that part of the input is charged at only 10% of the normal rate (a cache read, with no recompute).

Example: a 5k‑token system prompt + CLAUDE.md that hits the cache costs the equivalent of only 500 tokens.
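The arithmetic is worth making explicit. A minimal sketch, assuming the ~10% cache-read rate described above (the exact multiplier varies by model and cache TTL, so verify against current pricing):

```python
# Billed-equivalent input tokens for a request prefix, with and without
# a prompt-cache hit. Assumption: cache reads bill at ~10% of the base
# input rate, per the 10% figure discussed above.

def effective_input_tokens(prefix_tokens: int, cache_hit: bool) -> float:
    """Return the billed-equivalent input tokens for a request prefix."""
    CACHE_READ_RATE = 0.10  # cached prefix costs ~10% of normal input
    return prefix_tokens * CACHE_READ_RATE if cache_hit else float(prefix_tokens)

# A 5k-token system prompt + CLAUDE.md:
print(effective_input_tokens(5_000, cache_hit=False))  # 5000.0 (cold request)
print(effective_input_tokens(5_000, cache_hit=True))   # 500.0  (cache hit)
```

Over a long session that prefix is resent on every turn, which is why a warm cache compounds into a large saving.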

A cache hit requires three conditions, all of which must hold:

Exact byte‑for‑byte prefix match.

Same model (cache is model‑isolated).

Within the TTL (default 5 minutes for Claude Code; the Anthropic API also offers a 1‑hour TTL at a higher cache‑write price).

Pitfall #1 – Pasting Large Unstructured Text

Copy‑pasting an entire 3,000‑line crash log forces Claude to filter the noise itself, yielding almost the same answer as a pre‑filtered 30‑line snippet but at a massive token cost.

Case A (raw log): 35.1k input, 1.2k output, 0 cache read, 61.5k cache write
Case B (grep‑filtered): 636 input, 687 output, 25.6k cache read, 1.5k cache write

The difference is roughly 55× fewer input tokens.
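The pre-filtering in Case B is trivial to do yourself before pasting. A sketch under assumptions: the error-marker patterns below are illustrative placeholders, so adapt them to whatever your logs actually emit:

```python
import re

# Keep only the lines likely to matter (error markers, stack frames)
# instead of pasting an entire multi-thousand-line log into the prompt.
# These keywords are illustrative; tune them to your log format.
PATTERNS = re.compile(r"(ERROR|FATAL|Exception|Traceback|at \S+\.\w+\()")

def prefilter_log(text: str, max_lines: int = 30) -> str:
    """Return at most max_lines of the lines that match an error pattern."""
    hits = [line for line in text.splitlines() if PATTERNS.search(line)]
    return "\n".join(hits[:max_lines])

log = "INFO boot ok\nERROR NullPointerException in OrderService\nINFO heartbeat\n"
print(prefilter_log(log))  # only the ERROR line survives
```

A plain `grep -E 'ERROR|Exception' crash.log | head -30` achieves the same thing from the shell; the point is that thirty relevant lines beat three thousand raw ones.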

Pitfall #2 – Open‑Ended vs Precise Questions

Providing known context up front saves tokens. For the same project, an open‑ended query costs about three times more than a precise one.

Open‑ended: 963 input, 2.5k output, 234.7k cache read, 32.2k cache write
Precise: 435 input, 1.4k output, 83.1k cache read, 29.8k cache write

Cache reads drop from ~235k to ~83k, roughly a 3× reduction.

Pitfall #3 – Multi‑Turn Dialogues vs One‑Shot Requests

Each turn repeats the full history, so five incremental turns quickly outpace a single well‑crafted request.

Per turn (input / output / cache read / cache write):

Turn 1: 415 / 3.1k / 219.3k / 33.7k
Turn 2: 26 / 1.0k / 103.0k / 3.1k
Turn 3: 42 / 2.2k / 189.9k / 4.8k
Turn 4: 82 / 4.5k / 442.7k / 7.4k
Turn 5: 114 / 11.0k / 745.1k / 15.1k
Total (5 turns): 679 / 21.8k / 1.7M / 64.1k

One‑shot request: 540 / 7.6k / 443.4k / 35.6k

The multi‑turn approach consumes about 4× more cache‑read tokens (1.7M vs 443k) and significantly more cache‑write tokens.
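To compare the two runs on a single axis, you can fold everything into billed-equivalent input tokens. A rough sketch under assumptions: cache reads bill at ~10% of the base input rate and 5-minute cache writes at ~1.25×, matching Anthropic's published multipliers at the time of writing (output tokens are priced separately and omitted here):

```python
def input_equivalent(input_t: int, cache_read: int, cache_write: int,
                     read_rate: float = 0.10, write_rate: float = 1.25) -> float:
    """Rough billed-equivalent input tokens for one run.

    The 0.10 / 1.25 multipliers follow Anthropic's published 5-minute
    prompt-cache pricing; verify them for your model before relying on this.
    """
    return input_t + cache_read * read_rate + cache_write * write_rate

# Totals from the two runs above:
multi_turn = input_equivalent(679, 1_700_000, 64_100)
one_shot = input_equivalent(540, 443_400, 35_600)
print(round(multi_turn / one_shot, 1))  # → 2.8 (multi-turn costs ~2.8× more)
```

Even with heavy cache discounting, resending the growing history five times is markedly more expensive than one well-specified request.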

Pitfall #4 – Unintentional Cache Eviction

Resuming a session after the 5‑minute TTL expires, or switching models, forces a full cache rewrite (≈30k tokens). Example: after claude --resume, even sending a bare "hi" triggers a full write.

Switching from sonnet to opus and sending "hi" yields 0 cache read and 63.7k cache write — the cache is model‑isolated, so every switch starts cold.

Claude’s Working Style: Read vs Grep

Reading an entire 1,500‑line file to change a single handler costs far more than grepping for the relevant lines.

Read whole file: 76 input, 1.8k output, 245.4k cache read, 108.6k cache write
Grep locate: 422 input, 1.0k output, 110.8k cache read, 28.7k cache write

This is roughly a 4× saving on cache‑write tokens.

Advanced Tip – Plan Mode for Large Tasks

Plan mode (Shift+Tab) forces Claude to emit a structured plan before coding. For small, well‑defined changes the extra output cost outweighs any savings; for large, ambiguous tasks the plan can avoid tens of thousands of wasted tokens. Guideline:

Small, clear tasks → skip plan.

Large, multi‑file or uncertain tasks → use plan.

Efficiency Tool – Subagents

Offloading bulk file reads to a subagent keeps the main session lightweight. Example:

Main session (after reading 60 files, then 10 more actions): 3.9k input, 23.1k output, 1.7M cache read, 93.6k cache write
Subagent version: 540 input, 14.8k output, 398.8k cache read, 78.6k cache write

Result: about 4× fewer cache reads.

Infrastructure Hidden Costs

Generated files (e.g., src/generated/api-types.ts) can balloon token usage if not excluded.

Without deny rules: 454 input, 1.9k output, 127.2k cache read, 62.2k cache write
After adding .gitignore entries and a .claude/settings.json deny list: 627 input, 1.8k output, 140.1k cache read, 29.4k cache write

Cache writes drop by more than half (62.2k → 29.4k).
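The deny list looks roughly like this. A sketch of a .claude/settings.json permissions block — the glob patterns are illustrative, and you should check the current Claude Code settings reference for the exact rule syntax:

```json
{
  "permissions": {
    "deny": [
      "Read(./src/generated/**)",
      "Read(./dist/**)"
    ]
  }
}
```

Combined with .gitignore entries for the same paths, this keeps generated artifacts out of both Claude's file reads and its repository-wide searches.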

MCP and Skills Overhead

Each enabled MCP server or Skill injects its metadata into the system prompt, consuming context on every request. Installing 50 Skills plus 3 MCP servers can occupy nearly half the context window, adding up to ~1M extra tokens per day.

CLAUDE.md Pitfalls

Too large: A 1,500‑line file adds ~18k input tokens per turn; trimming it to 50 lines reduces that to ~0.5k, saving billions of tokens annually.

Dynamic content: Timestamps or TODO updates change the prefix, breaking cache hits and turning the 10% cached rate back into full price (a ~10× increase).

Best Practices for CLAUDE.md

Store static conventions, directory rules, and command snippets (keep it under ~50 lines).

Avoid timestamps, ongoing TODOs, and long‑tail edge cases that change frequently.
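Putting both rules together, a cache-friendly CLAUDE.md might look like this. The contents are purely illustrative; the point is that everything in it is static, short, and safe to cache:

```markdown
# Project conventions (static — safe to cache)

- TypeScript strict mode; avoid `any`.
- Components live in src/components/, one per file.
- Run tests with: npm test
- Never read or edit files under src/generated/.
- Commit messages follow Conventional Commits.
```

Anything that changes daily — timestamps, sprint TODOs, in-progress notes — belongs in the conversation itself, not in this file, so the cached prefix stays byte-identical across sessions.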

Conclusion – Token Bills Reflect Engineering Habits

Token consumption mirrors how you structure prompts, manage history, configure tools, and maintain project files. Regularly inspecting /cost helps you identify wasteful patterns and refine your workflow.

Tags: Prompt Engineering · Claude · Token Optimization · AI Programming · Cost Management · Prompt Cache
Written by

大转转FE

Regularly sharing the team's thoughts and insights on frontend development
