Cut Claude Code’s Fluff with 8 Lines: Slash Output Tokens by 63%
An eight‑line CLAUDE.md file that suppresses polite openings, repetition of the question, and unnecessary explanation cut Claude Code's output token count by roughly 63% without losing information: code reviews shrank by up to 75% and concept explanations by 64%. Independent benchmarks corroborated the savings.
Claude Code inserts polite openings such as “Sure!” or “Great question!”, repeats the user’s query, and ends with “I hope this helps!”. These fillers consume tokens without adding information.
Developer Drona Gangarapu placed an eight‑line CLAUDE.md file in the project root to suppress the fluff. The file contains the following rules:
1. Think before acting. Read existing files before writing code.
2. Be concise in output but thorough in reasoning.
3. Prefer editing over rewriting whole files.
4. Do not re‑read files you have already read.
5. Test your code before declaring done.
6. No sycophantic openers or closing fluff.
7. Keep solutions simple and direct.
8. User instructions always override this file.

Measured reductions after applying the rules:
Code‑review output: 120 words → 30 words (‑75%).
Concept explanations: 180 words → 65 words (‑64%).
Correction of factual errors: 55 words → 20 words (‑64%).
Total output‑token usage dropped by roughly 63% while preserving the information content.
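The eight rules above are saved verbatim as a CLAUDE.md file in the project root, where Claude Code picks it up automatically. A minimal sketch (wording condensed from the article's list):

```markdown
# CLAUDE.md
1. Think before acting; read existing files before writing code.
2. Be concise in output but thorough in reasoning.
3. Prefer editing files over rewriting them wholesale.
4. Do not re-read files you have already read.
5. Test your code before declaring it done.
6. No sycophantic openers or closing fluff.
7. Keep solutions simple and direct.
8. User instructions always override this file.
```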
Independent third‑party testing on three coding challenges—CSV Reporter, SQLite window functions, and a WebSocket counter—showed that the v8 configuration (seven of the rules plus a budget of 20 tool calls) reduced cost by 17.4% compared with the previously optimal C‑structured approach.
The CLAUDE.md file itself must be supplied as input tokens on every conversation turn. Consequently, net token savings appear only in high‑frequency scenarios such as agent loops, automated pipelines, or workloads exceeding about 100 prompts per day; occasional interactions can become more expensive.
The trade‑off is a substitution of input tokens for output tokens, which must be evaluated when token quotas are tight (e.g., a 2026‑token cap).
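The trade-off can be made concrete with a back-of-the-envelope model. The token counts and per-token prices below are illustrative assumptions, not figures from the article; only the 63% reduction and the reported word counts come from the source:

```python
# Sanity-check the article's reported reductions.
assert round((120 - 30) / 120 * 100) == 75   # code review
assert round((180 - 65) / 180 * 100) == 64   # concept explanation
assert round((55 - 20) / 55 * 100) == 64     # factual correction

def net_cost_delta(output_tokens: int,
                   reduction: float = 0.63,    # from the article
                   overhead_in: int = 150,     # CLAUDE.md size in tokens (assumed)
                   price_in: float = 3e-6,     # $/input token (assumed)
                   price_out: float = 15e-6    # $/output token (assumed)
                   ) -> float:
    """Cost change per turn: positive means the file costs more than it saves."""
    extra_input = overhead_in * price_in
    output_saved = output_tokens * reduction * price_out
    return extra_input - output_saved
```

Under these assumptions a verbose 400-token reply is a clear net win per turn, while a short 20-token reply costs more than it saves, which matches the article's point that occasional, lightweight interactions can get more expensive while high-volume workloads come out ahead.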
Multiple profiles are provided:
Generic version
Coding‑benchmark version
Development‑project version
Agent‑pipeline version
Data‑analysis version
The most aggressive profile, v8, caps tool calls at 20, forcing Claude to plan ahead rather than iterate by trial and error; it is best suited to simple, cost-sensitive tasks.
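The budget idea generalizes beyond Claude Code: any agent loop can enforce the same cap by counting tool invocations and refusing further calls once the budget is spent. A minimal sketch, where `BudgetedAgent` and its methods are hypothetical stand-ins rather than Claude Code APIs:

```python
# Sketch of a hard tool-call budget in a generic agent loop.
# Class and method names are illustrative, not a real Claude Code API.

class ToolBudgetExceeded(Exception):
    """Raised when the agent tries to exceed its tool-call budget."""

class BudgetedAgent:
    def __init__(self, max_tool_calls: int = 20):  # v8 uses a budget of 20
        self.max_tool_calls = max_tool_calls
        self.calls_made = 0

    def call_tool(self, tool, *args):
        """Run a tool, failing fast once the budget is exhausted."""
        if self.calls_made >= self.max_tool_calls:
            raise ToolBudgetExceeded(
                f"budget of {self.max_tool_calls} tool calls exhausted")
        self.calls_made += 1
        return tool(*args)

agent = BudgetedAgent(max_tool_calls=2)
agent.call_tool(len, "hello")     # call 1 of 2
agent.call_tool(sorted, [3, 1])   # call 2 of 2
# A third call raises ToolBudgetExceeded, forcing the agent to answer
# with what it already has instead of looping through trial and error.
```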
Project repository: https://github.com/drona23/claude-token-efficient
AI Engineering
Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).