How to Cut Claude Code, Codex, and OpenCode Token Usage by Up to 80%
The article breaks down why input tokens dominate cost (70‑90%), then details platform‑specific techniques—file filtering, context compression, documentation‑driven prompts, memory management, plan mode, output trimming, and model switching—that together can reduce Claude Code, Codex, and OpenCode token consumption by 60‑90%, with a practical 10‑step checklist.
Source: 我是程序汪
Token‑consumption principles
Cost formula: Total cost = Input Tokens × input price + Output Tokens × output price
Input Tokens (70%–90%): commands, conversation history, project files, tool outputs, system prompts
Output Tokens (10%–30%): code, explanations, logs returned by the AI
Largest black hole: automatic project‑file reading, often 80% of a single interaction’s input
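The cost formula above can be sketched in a few lines; the per-million-token prices used here are illustrative placeholders, not real rates:

```python
# Sketch of the cost formula: total = input tokens x input price + output tokens x output price.
# Prices are illustrative assumptions, not actual vendor rates.
def request_cost(input_tokens, output_tokens,
                 input_price_per_m=3.00, output_price_per_m=15.00):
    """Cost in dollars for one interaction, given per-million-token prices."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# Input dominates even though output is priced higher per token:
heavy = request_cost(150_000, 5_000)  # unfiltered project-file reading
light = request_cost(60_000, 5_000)   # after ignore-file filtering
print(f"before: ${heavy:.3f}  after: ${light:.3f}")
```

With these assumed prices, input accounts for roughly 85% of the heavy request's cost, which is why the rest of this article focuses on trimming input.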
Platform‑specific token‑saving methods
Claude Code (most common, largest optimisation space)
File filtering – .claudeignore : create at project root, syntax identical to .gitignore. Example content:
# Dependencies & builds (largest black hole)
node_modules/
dist/
build/
.next/
__pycache__/
# Lock files / logs
*.lock
package-lock.json
*.log
# VCS / IDE
.git/
.idea/
.vscode/
# Resources / cache
*.png
*.jpg
*.svg
*.ico
.cache/
coverage/
Effect: a single interaction drops from ~150 k tokens to ~60 k (≈60% reduction).
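Before writing the ignore file, it can help to find which directories are the biggest token sinks. A minimal sketch, assuming a rough heuristic of ~4 characters per token (the skip list mirrors the template above):

```python
# Rough estimate of tokens per top-level directory, to spot "black holes"
# worth adding to .claudeignore. The ~4 chars/token ratio is a heuristic.
import os

SKIP = {"node_modules", "dist", "build", ".next", "__pycache__",
        ".git", ".idea", ".vscode", ".cache", "coverage"}

def estimate_tokens(path="."):
    """Return {top-level dir: estimated token count} for files under path."""
    totals = {}
    for root, dirs, files in os.walk(path):
        dirs[:] = [d for d in dirs if d not in SKIP]  # prune ignored dirs
        top = os.path.relpath(root, path).split(os.sep)[0]
        for name in files:
            try:
                size = os.path.getsize(os.path.join(root, name))
            except OSError:
                continue
            totals[top] = totals.get(top, 0) + size // 4  # ~4 chars/token
    return totals
```

Running this on a project and sorting the result by value shows which directories to exclude first.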
Context compression – /compact :
Manual: invoke /compact at logical checkpoints (e.g., after completing a feature).
Command‑guided: pass instructions, e.g. /compact keep code changes and file paths, so compaction discards analysis steps while retaining what you name.
Auto‑compression: enable it via /config (set Auto-compact to enabled); a 25 k‑token history compresses to ~3 k (≈88% saved).
Documentation‑driven prompt – CLAUDE.md : place at project root to describe stack, directory layout, and commands, avoiding exploratory cat/find/grep calls. Example:
# Project Overview
Next.js 14 + TypeScript + Prisma + PostgreSQL SaaS
# Directory Structure
src/app/ # App Router
src/components/ # Components
src/lib/ # Utilities
src/server/ # Server code
# Development Commands
pnpm dev
pnpm build
Effect: reduces useless token usage by >30%.
Memory management – /memory :
Store:
/memory The project uses Next.js 14 + TypeScript; API conventions are in docs/api.md
View: /memory list
Delete: /memory delete [key]
Effect: avoids repeatedly pasting configuration, saving >40% of repeated input.
Plan mode – Shift+Tab : AI first produces an execution plan; after confirmation it runs, preventing wasted exploration. Effect: saves >20% of useless tokens.
Output trimming:
Enable "trim tool output" to drop ANSI color codes, progress bars, and empty lines.
Long‑output truncation: keep only error stacks and failure cases. Example: npm test output reduced from 25 k to 2.5 k tokens (≈90% saved).
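The two trimming steps above can be sketched as a small filter; the 2 000‑character budget is an illustrative assumption, and real tools apply their own limits:

```python
# Minimal sketch of output trimming: strip ANSI escape codes, drop blank
# lines, and keep only the tail of very long output (where error stacks
# and failure cases usually are). The budget value is an assumption.
import re

ANSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def trim_tool_output(text, budget=2_000):
    text = ANSI.sub("", text)                            # drop color codes
    lines = [l for l in text.splitlines() if l.strip()]  # drop empty lines
    text = "\n".join(lines)
    if len(text) > budget:                               # keep the tail
        text = "...[truncated]...\n" + text[-budget:]
    return text
```

Keeping only the tail is a deliberate choice: test runners and build tools almost always print the failing cases and stack traces last.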
Model switching – /model :
Simple tasks (syntax, small functions): /model haiku (lowest price).
Complex tasks (architecture, multi‑file): /model sonnet.
Very complex: /model opus (use only when necessary).
Effect: per‑task cost reduction of 30%–80%.
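The tiering above amounts to a simple routing table. A sketch of the idea, where the task labels and mapping are assumptions for illustration, not a Claude Code API:

```python
# Illustrative model-routing sketch for the tiering described above.
# Task labels and the default tier are assumptions, not a real API.
MODEL_FOR = {
    "simple":  "haiku",   # syntax fixes, small functions
    "complex": "sonnet",  # architecture, multi-file changes
    "hardest": "opus",    # use only when necessary
}

def pick_model(task_kind):
    """Map a task category to a model tier; default to the middle tier."""
    return MODEL_FOR.get(task_kind, "sonnet")
```

The point is to decide the tier before the task starts, rather than defaulting every request to the most expensive model.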
Codex (GitHub Copilot – IDE‑centric)
IDE configuration – limit context files: VS Code → Settings → GitHub Copilot → Max File Context → set to 3–5 files. Effect: input reduced by >50%.
Command simplification: replace verbose natural‑language prompts with concise comment‑driven snippets. Bad prompt:
Write me a backend login endpoint in Node.js + Express, with JWT verification, password hashing, and error handling
Good prompt: // Node.js Express login endpoint JWT bcrypt
Effect: input reduced by >40%.
Disable unnecessary features: turn off auto‑completion, real‑time suggestions, and full‑project indexing when not needed, reducing continuous scanning token consumption.
File‑by‑file development: develop one function per file, avoid large cross‑file logic, manually copy needed snippets instead of auto‑reading. Effect: context size reduced by >60%.
OpenCode (open‑source / self‑hosted, highly configurable)
Precise context limits – config.json :
{
  "model": {
    "name": "deepseek-v3",
    "input_limit": 128000,
    "output_limit": 80000
  }
}
Set input_limit to match the model's actual context window. Effect: fully utilizes the context window, avoids automatic truncation and duplicate requests, saving >30%.
File filtering – .opencodeignore : same syntax as .claudeignore, excludes dependencies, builds, logs, resource files.
Context management – manual history clearing: periodically run /clear to reset context and prevent multi‑task history buildup; use separate sessions for different functionalities. Effect: saves >50% of useless context.
Model choice – low‑cost open models:
Simple tasks: Qwen 7B, Llama 3 8B (local or cheap API).
Complex tasks: DeepSeek V3, Qwen Max (switch as needed).
Effect: per‑task price drops 70%–95%.
Compact comparison of optimisation dimensions
File filtering: .claudeignore / .opencodeignore vs IDE max‑file setting – saves 60%–80%.
Context compression: /compact (Claude) vs manual /clear (OpenCode) – saves 50%–88%.
Documentation‑driven prompt: CLAUDE.md vs custom README – saves 30%–50%.
Memory solidification: /memory vs global config – saves 40%–60%.
Plan mode: Shift+Tab vs manual task breakdown – saves 20%–40%.
Output trimming: tool‑output trimming and log truncation – saves 70%–90%.
Model switching: /model haiku/sonnet/opus vs manual plugin change – saves 30%–80%.
Context upper limit management: auto‑manage via /config (Claude) vs precise config.json (OpenCode) – saves >30%.
Practical 10‑step token‑saving checklist
Create .claudeignore or .opencodeignore at the project root and copy the template.
Add a CLAUDE.md describing the tech stack, directory layout, and commands.
Enable auto‑compression (Claude: /config Auto-compact).
For long dialogs, manually run /compact at logical breakpoints.
Store project conventions with /memory to avoid repeated input.
Use Plan Mode (Shift+Tab) for complex tasks – plan before execution.
Switch models per task: simple → /model haiku, complex → /model sonnet, very complex → /model opus.
Disable unnecessary auto‑features (real‑time suggestions, full‑project scans).
Separate development into distinct sessions or files to prevent history bloat.
Regularly check token usage (/usage) to locate new black holes.
Key reminders
Input is the core cost driver: prioritize trimming file reads, context size, and command length.
Prefer over‑exclusion to under‑exclusion: excluded files can be pasted manually, which is cheaper than automatic scanning.
Timely cleanup: long dialogs and multi‑task sessions require compression or clearing to avoid history inflation.
Model matching: select the appropriate tier for each task instead of defaulting to the highest‑end model.