Cut Token Usage by Up to 80% in Claude Code, Codex, and OpenCode

This article explains how to dramatically reduce token consumption in Claude Code, Codex (GitHub Copilot), and the open-source OpenCode by tightly controlling input: filtering files, compressing context, leveraging memory and planning tools, trimming output, and matching models to tasks. It offers concrete commands, configuration files, and a ten-step checklist that together can cut usage by up to 80%.

IoT Full-Stack Technology

Token consumption fundamentals

Billing formula: Total cost = Input Tokens × input price + Output Tokens × output price.

Input tokens (70–90%): commands, conversation history, project files, tool output, system prompts.

Output tokens (10–30%): code, explanations, and logs returned by the model.

Biggest black hole: automatic project-file reading can consume up to 80% of input tokens per interaction.
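As a sanity check on the formula, here is a minimal cost helper; the per-million-token prices below are placeholders for illustration, not any provider's actual list prices.

```python
def interaction_cost(input_tokens: int, output_tokens: int,
                     input_price_per_mtok: float,
                     output_price_per_mtok: float) -> float:
    """Total cost = input tokens x input price + output tokens x output price."""
    return (input_tokens / 1_000_000) * input_price_per_mtok \
         + (output_tokens / 1_000_000) * output_price_per_mtok

# Placeholder prices ($ per million tokens) -- check your provider's pricing.
cost = interaction_cost(150_000, 20_000,
                        input_price_per_mtok=3.0, output_price_per_mtok=15.0)
print(f"${cost:.2f}")
```

Even with output priced five times higher per token, the 150 k-token input dominates the bill, which is why the techniques below focus on shrinking what the model reads.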

Claude Code token‑saving techniques

1. File filtering with .claudeignore

Create a .claudeignore file in the project root; syntax mirrors .gitignore. Example patterns:

# Dependencies & builds (largest black holes)
node_modules/
dist/
build/
.next/
__pycache__/
# Lock files / logs
*.lock
package-lock.json
*.log
# VCS / IDE
.git/
.idea/
.vscode/
# Assets / caches
*.png
*.jpg
*.svg
*.ico
.cache/
coverage/

Effect: a single interaction drops from 150 k tokens to 60 k (≈60% reduction).
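The savings are easy to estimate offline. A rough sketch, assuming a simplified subset of ignore semantics (directory patterns ending in `/` plus filename globs only) and the common ~4-characters-per-token heuristic; `IGNORE`, `is_ignored`, and `estimate_tokens` are hypothetical helpers, not part of Claude Code:

```python
import os
from fnmatch import fnmatch

# Simplified .claudeignore matcher. Real .gitignore semantics are richer
# (negation, anchoring, nested rules); this is only an estimation sketch.
IGNORE = ["node_modules/", "dist/", "build/", "*.lock", "*.log", "*.png", ".git/"]

def is_ignored(path: str) -> bool:
    parts = path.split(os.sep)
    for pat in IGNORE:
        if pat.endswith("/") and pat.rstrip("/") in parts[:-1]:
            return True  # file lives under an ignored directory
        if fnmatch(parts[-1], pat):
            return True  # filename matches a glob pattern
    return False

def estimate_tokens(root: str) -> int:
    """Rough token count for files the agent would read: ~4 characters per token."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if not is_ignored(rel):
                total += os.path.getsize(os.path.join(dirpath, name)) // 4
    return total
```

Running the estimator before and after adding patterns shows exactly which directories are the black holes.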

2. Context compression with /compact

Manual: run /compact at logical checkpoints (e.g., after completing a feature).

Command‑level: /compact keeps code changes and file paths, discarding analysis.

Automatic: enable Auto-compact via /config. Effect: 25 k → 3 k tokens (88% saved).

3. Documentation‑driven approach with CLAUDE.md

Place a CLAUDE.md at the project root describing overview, tech stack, directory layout, and development commands. Example:

# Project Overview
Next.js 14 + TypeScript + Prisma + PostgreSQL SaaS
# Directory Structure
src/app/       # App Router
src/components/ # Components
src/lib/       # Utilities
src/server/    # Server side
# Development Commands
pnpm dev
pnpm build

Effect: reduces exploratory cat/find/grep operations, saving >30% of unnecessary tokens.

4. Memory management with /memory

Store fixed information:

/memory The project uses Next.js 14 + TypeScript; API conventions are in docs/api.md

View stored items: /memory list

Delete a key: /memory delete [key]

Effect: avoids repeatedly pasting the same context, saving >40% of repeated input.

5. Plan Mode (Shift+Tab)

Ask the AI to produce an execution plan first, confirm it, then run.

Effect: reduces trial‑and‑error, saving >20% useless tokens.

6. Output trimming

Enable tool‑output trimming via /config to strip ANSI colors, progress bars, empty lines.

Truncate long logs, keeping only error stacks and failure cases.

Effect: npm test output shrinks from 25 k to 2.5 k tokens (90% saved).
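The same trimming can be done before a log ever reaches the model. A hedged sketch (`trim_tool_output` is a hypothetical helper; the progress-bar heuristic is intentionally crude):

```python
import re

ANSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")  # CSI escape sequences (colors, cursor moves)

def trim_tool_output(raw: str, keep_tail: int = 40) -> str:
    """Strip ANSI escapes, drop blank and progress-bar lines, keep error lines plus the tail."""
    lines = [ANSI.sub("", ln).rstrip() for ln in raw.splitlines()]
    lines = [ln for ln in lines if ln and not ln.lstrip().startswith("[=")]
    errors = [ln for ln in lines if re.search(r"error|fail|traceback", ln, re.I)]
    seen, out = set(), []
    for ln in errors + lines[-keep_tail:]:  # error lines first, then the last lines of output
        if ln not in seen:
            seen.add(ln)
            out.append(ln)
    return "\n".join(out)
```

Feeding a 25 k-token test log through a filter like this typically leaves only the failure stacks and the final summary.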

7. Model switching with /model

Simple tasks (syntax, small functions): /model haiku (lowest price).

Complex tasks (architecture, multi‑file): /model sonnet.

Ultra‑complex: /model opus (use only when necessary).

Effect: task cost reduced 30%–80%.
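The tier savings come straight from the price gap between models. An illustrative comparison with placeholder prices (substitute your provider's current per-million-token rates):

```python
# Illustrative (input, output) prices in $ per million tokens -- placeholders only.
PRICES = {
    "haiku":  (1.0, 5.0),
    "sonnet": (3.0, 15.0),
    "opus":   (15.0, 75.0),
}

def task_cost(model: str, input_tok: int, output_tok: int) -> float:
    """Apply the billing formula for a single task on the chosen tier."""
    pin, pout = PRICES[model]
    return input_tok / 1e6 * pin + output_tok / 1e6 * pout

# Routing a simple fix (30 k in, 5 k out) to haiku instead of opus:
for model in ("haiku", "sonnet", "opus"):
    print(model, round(task_cost(model, 30_000, 5_000), 3))
```

With any realistic price table the shape is the same: the identical task costs several times more on the top tier, so defaulting everything to the most powerful model wastes most of the budget.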

Codex (GitHub Copilot) token‑saving techniques

1. IDE configuration: limit max file context

In VS Code settings set GitHub Copilot → Max File Context to 3–5 files.

Effect: Input reduced >50%.

2. Prompt shortening with comments

Bad prompt:

Help me write a backend login API with Node.js + Express, JWT, password hashing, error handling

Good prompt:

// Node.js Express login API JWT bcrypt

Effect: Input reduced >40%.

3. Disable unnecessary features

Turn off auto‑completion and real‑time suggestions, enable only when needed.

Disable multi‑file indexing except during refactoring.

Effect: reduces background scanning token consumption.

4. File‑by‑file development

Work on one file at a time, avoid large cross-file logic, and manually copy snippets into the prompt when needed.

Effect: context size reduced >60%.

OpenCode (self‑hosted) token‑saving techniques

1. Precise context limits via config.json

{
  "model": {
    "name": "deepseek-v3",
    "input_limit": 128000,
    "output_limit": 80000
  }
}

Effect: makes full use of the context window while avoiding automatic truncation and duplicate requests, saving >30%.
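On the client side, the same limit can be enforced by trimming history before each request. A sketch assuming an OpenAI-style message list and the ~4-characters-per-token heuristic; `trim_history` is hypothetical, not an OpenCode API:

```python
def trim_history(messages: list[dict], input_limit: int,
                 chars_per_token: int = 4) -> list[dict]:
    """Drop the oldest non-system messages until the estimated token count fits input_limit."""
    def est(msgs: list[dict]) -> int:
        # ~4 chars per token, plus a small per-message overhead for role/framing.
        return sum(len(m["content"]) // chars_per_token + 4 for m in msgs)

    msgs = list(messages)
    while est(msgs) > input_limit and len(msgs) > 1:
        # Keep the system prompt (index 0) if present; drop the oldest turn after it.
        drop = 1 if msgs[0].get("role") == "system" else 0
        del msgs[drop]
    return msgs
```

Trimming proactively like this avoids the server silently truncating context mid-conversation, which is what forces duplicate requests.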

2. File filtering with .opencodeignore

Same pattern list as .claudeignore to exclude dependencies, build artifacts, logs, and resource files.

3. Manual history management

Periodically run /clear to reset context and prevent multi‑task buildup.

Start a new session for each distinct function to avoid mixing histories.

Effect: avoids history bloat, saving >50% useless context.

4. Low‑cost model selection

Simple tasks: Qwen 7B, Llama 3 8B (local or cheap API).

Complex tasks: DeepSeek V3, Qwen Max (switch as needed).

Effect: unit price reduced 70%–95%.

Practical 10‑step token‑saving checklist

Create .claudeignore / .opencodeignore in the project root and copy the template patterns.

Add a CLAUDE.md file describing the tech stack, directory layout, and commands.

Enable automatic compression (Auto-compact via /config in Claude Code).

For long conversations, manually run /compact at logical checkpoints.

Store project configuration and conventions with /memory to avoid repeated input.

Use Plan Mode (Shift+Tab) for complex tasks, planning before execution.

Switch models per task: simple tasks use /model haiku, complex tasks use /model sonnet.

Disable unnecessary automatic features such as real‑time completion and full‑project scanning.

Separate development into distinct sessions or files to prevent history accumulation.

Regularly check token usage (/usage) to locate black holes and adjust settings.
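Steps 1 and 2 of the checklist are easy to script. A minimal sketch; `bootstrap` and the template strings are illustrative, so tailor the patterns and the CLAUDE.md skeleton to your project:

```python
from pathlib import Path

IGNORE_TEMPLATE = """node_modules/
dist/
build/
*.lock
*.log
.git/
"""

CLAUDE_MD_TEMPLATE = """# Project Overview
<one-line description of the stack>
# Directory Structure
<key directories and what lives in them>
# Development Commands
<build / test / dev commands>
"""

def bootstrap(root: str = ".") -> list[str]:
    """Create .claudeignore and CLAUDE.md from templates if they don't already exist."""
    created = []
    for name, body in ((".claudeignore", IGNORE_TEMPLATE),
                       ("CLAUDE.md", CLAUDE_MD_TEMPLATE)):
        path = Path(root) / name
        if not path.exists():  # never clobber an existing, hand-tuned file
            path.write_text(body)
            created.append(name)
    return created
```

Running it once per new project makes the two highest-leverage steps the default rather than an afterthought.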

Key reminders

Input is the core: prioritize optimizing file reads, context size, and prompt length.

Prefer over-exclusion: excluding files is cheaper than scanning them.

Timely cleanup: compress or clear long conversations and multi-task histories to avoid context bloat.

Model matching: choose the appropriate model tier for each task instead of defaulting to the most powerful one.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Prompt Engineering, Claude, AI Coding Assistant, Codex, Token Optimization, OpenCode
Written by IoT Full-Stack Technology

Dedicated to sharing IoT cloud services, embedded systems, and mobile client technology, with no spam ads.