How to Cut Claude Code, Codex, and OpenCode Token Usage by Up to 80%
The article breaks down why input tokens dominate cost (70‑90%), then details platform‑specific techniques—file filtering, context compression, documentation‑driven prompts, memory management, plan mode, output trimming, and model switching—that together can reduce Claude Code, Codex, and OpenCode token consumption by 60‑90%, with a practical 10‑step checklist.
Source: 我是程序汪
Token‑consumption principles
Cost formula: Total cost = Input Tokens × input price + Output Tokens × output price
Input Tokens (70%–90%): commands, conversation history, project files, tool outputs, system prompts
Output Tokens (10%–30%): code, explanations, logs returned by the AI
Largest black hole: automatic project‑file reading, often 80% of a single interaction’s input
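The cost formula above can be sketched in a few lines; the per-million-token prices used here are illustrative placeholders, not real rates:

```python
# Sketch of the cost formula: total = input tokens x input price + output tokens x output price.
# Prices are illustrative assumptions, not actual vendor rates.
def request_cost(input_tokens, output_tokens,
                 input_price_per_m=3.00, output_price_per_m=15.00):
    """Cost in dollars for one interaction, given per-million-token prices."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# Input dominates even though output is priced higher per token:
heavy = request_cost(150_000, 5_000)  # unfiltered project-file reading
light = request_cost(60_000, 5_000)   # after ignore-file filtering
print(f"before: ${heavy:.3f}  after: ${light:.3f}")
```

With these assumed prices, input accounts for roughly 85% of the heavy request's cost, which is why the rest of this article focuses on trimming input.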
Platform‑specific token‑saving methods
Claude Code (most common, largest optimisation space)
File filtering – .claudeignore : create at project root, syntax identical to .gitignore. Example content:
# Dependencies & builds (largest black hole)
node_modules/
dist/
build/
.next/
__pycache__/
# Lock files / logs
*.lock
package-lock.json
*.log
# VCS / IDE
.git/
.idea/
.vscode/
# Resources / cache
*.png
*.jpg
*.svg
*.ico
.cache/
coverage/
Effect: a single interaction drops from ~150 k tokens to ~60 k (≈60% reduction).
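Before writing the ignore file, it can help to find which directories are the biggest token sinks. A minimal sketch, assuming a rough heuristic of ~4 characters per token (the skip list mirrors the template above):

```python
# Rough estimate of tokens per top-level directory, to spot "black holes"
# worth adding to .claudeignore. The ~4 chars/token ratio is a heuristic.
import os

SKIP = {"node_modules", "dist", "build", ".next", "__pycache__",
        ".git", ".idea", ".vscode", ".cache", "coverage"}

def estimate_tokens(path="."):
    """Return {top-level dir: estimated token count} for files under path."""
    totals = {}
    for root, dirs, files in os.walk(path):
        dirs[:] = [d for d in dirs if d not in SKIP]  # prune ignored dirs
        top = os.path.relpath(root, path).split(os.sep)[0]
        for name in files:
            try:
                size = os.path.getsize(os.path.join(root, name))
            except OSError:
                continue
            totals[top] = totals.get(top, 0) + size // 4  # ~4 chars/token
    return totals
```

Running this on a project and sorting the result by value shows which directories to exclude first.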
Context compression – /compact :
Manual: invoke /compact at logical checkpoints (e.g., after completing a feature).
Command‑guided: pass instructions, e.g. /compact keep code changes and file paths, so compaction discards analysis steps while retaining what you name.
Auto‑compression: enable it via /config (set Auto-compact to enabled); a 25 k‑token history compresses to ~3 k (≈88% saved).
Documentation‑driven prompt – CLAUDE.md : place at project root to describe stack, directory layout, and commands, avoiding exploratory cat/find/grep calls. Example:
# Project Overview
Next.js 14 + TypeScript + Prisma + PostgreSQL SaaS
# Directory Structure
src/app/ # App Router
src/components/ # Components
src/lib/ # Utilities
src/server/ # Server code
# Development Commands
pnpm dev
pnpm build
Effect: reduces useless token usage by >30%.
Memory management – /memory :
Store:
/memory The project uses Next.js 14 + TypeScript; API conventions are in docs/api.md
View: /memory list
Delete: /memory delete [key]
Effect: avoids repeatedly pasting configuration, saving >40% of repeated input.
Plan mode – Shift+Tab : AI first produces an execution plan; after confirmation it runs, preventing wasted exploration. Effect: saves >20% of useless tokens.
Output trimming:
Enable "trim tool output" to drop ANSI color codes, progress bars, and empty lines.
Long‑output truncation: keep only error stacks and failure cases. Example: npm test output reduced from 25 k to 2.5 k tokens (≈90% saved).
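The two trimming steps above can be sketched as a small filter; the 2 000‑character budget is an illustrative assumption, and real tools apply their own limits:

```python
# Minimal sketch of output trimming: strip ANSI escape codes, drop blank
# lines, and keep only the tail of very long output (where error stacks
# and failure cases usually are). The budget value is an assumption.
import re

ANSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def trim_tool_output(text, budget=2_000):
    text = ANSI.sub("", text)                            # drop color codes
    lines = [l for l in text.splitlines() if l.strip()]  # drop empty lines
    text = "\n".join(lines)
    if len(text) > budget:                               # keep the tail
        text = "...[truncated]...\n" + text[-budget:]
    return text
```

Keeping only the tail is a deliberate choice: test runners and build tools almost always print the failing cases and stack traces last.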
Model switching – /model :
Simple tasks (syntax, small functions): /model haiku (lowest price).
Complex tasks (architecture, multi‑file): /model sonnet.
Very complex: /model opus (use only when necessary).
Effect: per‑task cost reduction of 30%–80%.
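The tiering above amounts to a simple routing table. A sketch of the idea, where the task labels and mapping are assumptions for illustration, not a Claude Code API:

```python
# Illustrative model-routing sketch for the tiering described above.
# Task labels and the default tier are assumptions, not a real API.
MODEL_FOR = {
    "simple":  "haiku",   # syntax fixes, small functions
    "complex": "sonnet",  # architecture, multi-file changes
    "hardest": "opus",    # use only when necessary
}

def pick_model(task_kind):
    """Map a task category to a model tier; default to the middle tier."""
    return MODEL_FOR.get(task_kind, "sonnet")
```

The point is to decide the tier before the task starts, rather than defaulting every request to the most expensive model.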
Codex (GitHub Copilot – IDE‑centric)
IDE configuration – limit context files: VS Code → Settings → GitHub Copilot → Max File Context → set to 3–5 files. Effect: input reduced by >50%.
Command simplification: replace verbose natural‑language prompts with concise comment‑driven snippets. Bad prompt:
Write me a backend login endpoint in Node.js + Express, with JWT verification, password hashing, and error handling
Good prompt: // Node.js Express login endpoint JWT bcrypt
Effect: input reduced by >40%.
Disable unnecessary features: turn off auto‑completion, real‑time suggestions, and full‑project indexing when not needed, reducing continuous scanning token consumption.
File‑by‑file development: develop one function per file, avoid large cross‑file logic, manually copy needed snippets instead of auto‑reading. Effect: context size reduced by >60%.
OpenCode (open‑source / self‑hosted, highly configurable)
Precise context limits – config.json :
{
  "model": {
    "name": "deepseek-v3",
    "input_limit": 128000,
    "output_limit": 80000
  }
}
Set input_limit to match the model's actual context window. Effect: fully utilizes the context window, avoids automatic truncation and duplicate requests, saving >30%.
File filtering – .opencodeignore : same syntax as .claudeignore, excludes dependencies, builds, logs, resource files.
Context management – manual history clearing: periodically run /clear to reset context and prevent multi‑task history buildup; use separate sessions for different functionalities. Effect: saves >50% of useless context.
Model choice – low‑cost open models:
Simple tasks: Qwen 7B, Llama 3 8B (local or cheap API).
Complex tasks: DeepSeek V3, Qwen Max (switch as needed).
Effect: per‑task price drops 70%–95%.
Compact comparison of optimisation dimensions
File filtering: .claudeignore / .opencodeignore vs IDE max‑file setting – saves 60%–80%.
Context compression: /compact (Claude) vs manual /clear (OpenCode) – saves 50%–88%.
Documentation‑driven prompt: CLAUDE.md vs custom README – saves 30%–50%.
Memory solidification: /memory vs global config – saves 40%–60%.
Plan mode: Shift+Tab vs manual task breakdown – saves 20%–40%.
Output trimming: tool‑output trimming and log truncation – saves 70%–90%.
Model switching: /model haiku/sonnet/opus vs manual plugin change – saves 30%–80%.
Context upper limit management: auto‑manage via /config (Claude) vs precise config.json (OpenCode) – saves >30%.
Practical 10‑step token‑saving checklist
Create .claudeignore or .opencodeignore at the project root and copy the template.
Add a CLAUDE.md describing the tech stack, directory layout, and commands.
Enable auto‑compression (Claude: /config Auto-compact).
For long dialogs, manually run /compact at logical breakpoints.
Store project conventions with /memory to avoid repeated input.
Use Plan Mode (Shift+Tab) for complex tasks – plan before execution.
Switch models per task: simple → /model haiku, complex → /model sonnet, very complex → /model opus.
Disable unnecessary auto‑features (real‑time suggestions, full‑project scans).
Separate development into distinct sessions or files to prevent history bloat.
Regularly check token usage (/usage) to locate new black holes.
Key reminders
Input is the core cost driver: prioritize trimming file reads, context size, and command length.
Prefer over‑exclusion to under‑exclusion: excluded files can be pasted manually, which is cheaper than automatic scanning.
Timely cleanup: long dialogs and multi‑task sessions require compression or clearing to avoid history inflation.
Model matching: select the appropriate tier for each task instead of defaulting to the highest‑end model.