
How Claude Code, Codex, and OpenCode Can Cut Token Usage by Up to 80%

The article explains why input tokens account for 70–90% of LLM costs and gives concrete, platform-specific techniques (file filtering, context compression, documentation-driven context, memory caching, plan mode, output trimming, and model switching) that together can reduce token consumption by 20–90% across Claude Code, Codex, and OpenCode.

Su San Talks Tech

May 31, 2026


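Token billing follows the formula

Total Cost = Input Tokens × Input Price + Output Tokens × Output Price

Output tokens usually cost several times more per token than input tokens, but a coding assistant resends the project context, conversation history, and tool output on every turn, so input volume far exceeds output volume. As a result, input tokens account for 70–90% of the expense, which makes input optimization the primary cost-saving lever.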
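Plugging numbers into the formula shows why input dominates. The sketch below assumes hypothetical prices of $3 per million input tokens and $15 per million output tokens, and a turn that sends 150k tokens of context (the unfiltered figure used later in this article) and gets 10k tokens back; substitute your provider's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Hypothetical USD prices per million tokens; replace with your provider's rates.
    input_per_mtok: float
    output_per_mtok: float


def cost_breakdown(input_tokens: int, output_tokens: int, pricing: Pricing) -> dict[str, float]:
    """Apply Total Cost = Input Tokens x Input Price + Output Tokens x Output Price."""
    input_cost = input_tokens * pricing.input_per_mtok / 1_000_000
    output_cost = output_tokens * pricing.output_per_mtok / 1_000_000
    total = input_cost + output_cost
    return {
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total": total,
        "input_share": input_cost / total if total else 0.0,
    }


if __name__ == "__main__":
    prices = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)
    # One turn: 150k tokens of context sent, 10k tokens generated (illustrative numbers).
    b = cost_breakdown(150_000, 10_000, prices)
    print(f"total=${b['total']:.4f}, input share={b['input_share']:.0%}")
    # prints: total=$0.6000, input share=75%
```

Even though each output token is five times pricier in this example, input still makes up three quarters of the bill.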

1. Claude Code (Claude) Optimizations

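File filtering: create a .claudeignore file (same syntax as .gitignore) at the project root to exclude build artifacts, logs, caches, and other irrelevant files. If your Claude Code version does not honor .claudeignore, the mechanism Anthropic documents for blocking file reads is a deny rule in .claude/settings.json, for example "Read(./node_modules/**)" in the "permissions" → "deny" list. Example .claudeignore content: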

# Dependency and build folders
node_modules/
dist/
build/
__pycache__/
# Lock files and logs (package-lock.json ends in .json, so *.lock does not cover it)
*.lock
package-lock.json
*.log
# VCS and IDE files
.git/
.idea/
.vscode/
# Assets
*.png
*.jpg
*.svg
*.ico
# Caches and coverage reports
.cache/
coverage/

Result: in the author's example, a single interaction dropped from 150k to 60k tokens (a 60% reduction).
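To check what an ignore list buys on your own repository, the sketch below estimates context size before and after filtering. It is an approximation under stated assumptions: it uses the common rule of thumb of about 4 characters per token, and it supports only a simplified subset of .gitignore syntax (directory patterns ending in "/" and file-name globs). Negation, anchored paths, and "**" are not handled, and the function names are hypothetical.

```python
import fnmatch
from pathlib import Path


def load_patterns(ignore_file: Path) -> list[str]:
    """Read non-empty, non-comment lines from a .gitignore-style file."""
    patterns = []
    for line in ignore_file.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(rel_path: Path, patterns: list[str]) -> bool:
    # Simplified matching: "name/" excludes any directory with that name,
    # every other pattern is a glob on the file name. Negation ("!"),
    # anchored paths, and "**" are NOT supported.
    for pattern in patterns:
        if pattern.endswith("/"):
            if pattern.rstrip("/") in rel_path.parts[:-1]:
                return True
        elif fnmatch.fnmatchcase(rel_path.name, pattern):
            return True
    return False


def estimate_tokens(root: Path, patterns: list[str]) -> tuple[int, int]:
    """Return (tokens before filtering, tokens after), assuming ~4 bytes per token."""
    before = after = 0
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        tokens = path.stat().st_size // 4
        before += tokens
        if not is_ignored(path.relative_to(root), patterns):
            after += tokens
    return before, after
```

For example, estimate_tokens(Path("."), load_patterns(Path(".claudeignore"))) returns both totals for the current project.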

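Context compression: run /compact manually (optionally followed by instructions on what to keep), or turn on Auto-compact in /config so that long conversations are summarized automatically as the context window fills up. Compaction replaces the long dialogue with a summary that keeps only the essential code changes. Result: 25k → 3k tokens (about 88% saved).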
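Conceptually, compaction swaps old turns for a summary once the history grows past a budget. The following is a minimal sketch of that idea, not Claude Code's implementation: the Message type, the token budget, keep_recent, and the summarize callback (which in a real tool would be a model call) are all hypothetical, and tokens are again approximated as 4 characters each.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    role: str
    content: str


def approx_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token.
    return max(1, len(text) // 4)


def compact(
    history: list[Message],
    budget: int,
    keep_recent: int,
    summarize: Callable[[list[Message]], str],
) -> list[Message]:
    """Replace older turns with a single summary once the history exceeds the budget."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    used = sum(approx_tokens(m.content) for m in history)
    if used <= budget or len(history) <= keep_recent:
        return history  # under budget, or nothing old enough to fold
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = Message("user", "Summary of earlier conversation:\n" + summarize(old))
    return [summary] + recent
```

Keeping the most recent turns verbatim preserves the working state, while everything older shrinks to whatever the summary retains.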

Documentation-driven context: add a CLAUDE.md at the project root describing the stack, directory layout, and common commands. Claude Code loads this file automatically at the start of a session, so the model gets project context without scanning files to rediscover it.
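Because CLAUDE.md is plain Markdown, a first draft can be generated from a short description of the project. The helper below is a hypothetical sketch; its section names and the example values in the usage note are assumptions drawn from elsewhere in this article, not a required format.

```python
def render_claude_md(stack: list[str], layout: dict[str, str], commands: dict[str, str]) -> str:
    """Build a minimal CLAUDE.md: tech stack, directory layout, common commands."""
    lines = ["# Project context", "", "## Tech stack"]
    lines += [f"- {item}" for item in stack]
    lines += ["", "## Directory layout"]
    lines += [f"- `{path}`: {purpose}" for path, purpose in layout.items()]
    lines += ["", "## Common commands"]
    lines += [f"- `{cmd}`: {purpose}" for cmd, purpose in commands.items()]
    return "\n".join(lines) + "\n"
```

For example, render_claude_md(["Next.js 14", "TypeScript"], {"docs/api.md": "API conventions"}, {"npm test": "run the test suite"}) yields a starting file to write to CLAUDE.md and refine by hand. Short bullet lines keep the file, and therefore the per-session input, small.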

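Memory management: store recurring information with /memory (e.g., /memory The project uses Next.js 14 + TypeScript; API conventions are in docs/api.md), then view it with /memory list or remove an entry with /memory delete [key]. This saves over 40% of repeated input. Command syntax differs between versions: Anthropic's documentation describes /memory as opening the CLAUDE.md memory files for editing, so check /help to see which subcommands your version supports.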
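The idea behind memory is to write a fact down once and reuse it instead of retyping it in every prompt. As a purely hypothetical illustration (it does not reflect how Claude Code stores memory), a tiny key-value store with add, delete, serialization, and a compact rendering step might look like this:

```python
import json
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical key-value store illustrating the memory idea; not Claude Code's implementation."""
    entries: dict[str, str] = field(default_factory=dict)

    def add(self, key: str, fact: str) -> None:
        self.entries[key] = fact

    def delete(self, key: str) -> bool:
        # Returns True if the key existed.
        return self.entries.pop(key, None) is not None

    def as_context(self) -> str:
        # One short line per fact, sorted by key, ready to prepend to a session once.
        return "\n".join(f"- {fact}" for _, fact in sorted(self.entries.items()))

    def dumps(self) -> str:
        return json.dumps(self.entries, ensure_ascii=False, sort_keys=True)

    @classmethod
    def loads(cls, text: str) -> "MemoryStore":
        return cls(dict(json.loads(text)))
```

The saving comes from as_context being sent once per session rather than the same facts being pasted into many individual prompts.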

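Plan mode: press Shift+Tab (which cycles through permission modes) to switch into plan mode, where the model proposes an execution plan before editing files or running code. This avoids tokens wasted on trial and error and saves over 20% of otherwise invalid output.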

Output trimming: use /config to enable concise tool output that strips ANSI colors, progress bars, and blank lines, and truncate long logs to the error stack only. Example: npm test output reduced from 25k to 2.5k tokens (about 90% saved).
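Where a tool offers no built-in option, the same trimming can be applied to a log before pasting it into the conversation. This sketch assumes typical test-runner output: it strips ANSI color codes, keeps only the last redraw of carriage-return lines, drops blank and progress-bar lines, starts at the first line containing Error, FAIL, or Traceback, and caps the result at a hypothetical max_lines. The regular expressions are heuristics, not a complete parser.

```python
import re

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")
# Heuristic for progress-bar lines such as "[=====>    ] 45%".
PROGRESS_LINE = re.compile(r"^\s*\[?[=#>\-\s]*\]?\s*\d{1,3}%\s*$")
ERROR_MARKER = re.compile(r"Error|FAIL|Traceback")


def trim_tool_output(raw: str, max_lines: int = 40) -> str:
    text = ANSI_ESCAPE.sub("", raw)
    lines = []
    for line in text.split("\n"):
        # A progress display redraws itself with "\r"; keep only the final state.
        line = line.rstrip("\r").split("\r")[-1]
        if line.strip() and not PROGRESS_LINE.match(line):
            lines.append(line)
    # Start at the first error marker, if any, so passing-test noise is dropped.
    for i, line in enumerate(lines):
        if ERROR_MARKER.search(line):
            lines = lines[i:]
            break
    if len(lines) > max_lines:
        omitted = len(lines) - max_lines
        lines = lines[:max_lines] + [f"... ({omitted} more lines omitted)"]
    return "\n".join(lines)
```

Cutting everything before the first failure is aggressive; raise max_lines or skip that step when the earlier output matters.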

Model switching: pick the model per task, using /model haiku for simple syntax work, /model sonnet for complex architecture, and /model opus only when necessary. Cost reduction ranges from 30% to 80%.
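The switching rule can be written down as an explicit escalation policy. The thresholds below (more than three files counts as complex; Opus only after Sonnet has failed on a design task) are hypothetical choices for illustration; the returned alias is what you would pass to /model.

```python
from enum import Enum


class Model(Enum):
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


def pick_model(files_touched: int, needs_design: bool, sonnet_failed: bool = False) -> Model:
    """Hypothetical escalation rule: start cheap, escalate only when the task demands it."""
    if needs_design and sonnet_failed:
        return Model.OPUS    # reserve the most expensive tier for stuck design work
    if needs_design or files_touched > 3:
        return Model.SONNET  # multi-file or architectural changes
    return Model.HAIKU       # small, local syntax-level edits


def model_command(model: Model) -> str:
    return f"/model {model.value}"
```

Defaulting to the cheapest tier and escalating on evidence keeps the expensive model out of routine turns.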

2. Codex (GitHub Copilot) Optimizations

Limit the IDE’s maximum file context (set GitHub Copilot → Max File Context to 3–5 files) to prevent full‑project scans, cutting input tokens by >50%.
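A small file budget works best when the few files included are the relevant ones. As a rough illustration only (this is not how Copilot selects context), the sketch ranks files by how often they mention words from the task description and keeps at most k of them; the function names and scoring are hypothetical.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z_][a-z0-9_]*")


def words(text: str) -> list[str]:
    return WORD.findall(text.lower())


def top_k_files(task: str, files: dict[str, str], k: int = 3) -> list[str]:
    """Rank files by how often they mention words from the task; keep at most k."""
    query = set(words(task))
    scores = {}
    for path, content in files.items():
        counts = Counter(words(content))
        scores[path] = sum(counts[w] for w in query)
    # Highest score first; ties broken by path for a stable order.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return [p for p in ranked[:k] if scores[p] > 0]
```

Files with no overlap at all are dropped even when the budget has room, since they would only add input tokens.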

Use short inline prompts instead of verbose natural-language requests; for example, replace a long description with // Node.js Express login endpoint JWT bcrypt. This reduces input tokens by over 40%.

Disable unnecessary features such as real‑time suggestions and automatic multi‑file indexing, and adopt a per‑file development style to shrink context size by >60%.

3. OpenCode (Self‑hosted) Optimizations

Configure .opencodeignore similarly to .claudeignore to filter out irrelevant files.

Set precise context limits in config.json (e.g., "input_limit": 128000, "output_limit": 80000) to fully utilize model capacity and avoid truncation, saving >30%.

Manually clear history with /clear and start new sessions for separate tasks to prevent context bloat, yielding >50% reduction in wasted tokens.

4. Comparative Savings Overview

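Across the three platforms, the techniques yield the following approximate reductions. Each figure applies to the part of the token stream that technique targets, so the percentages do not simply add up: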

File filtering: 60%–80%

Context compression: 50%–88%

Documentation drive: 30%–50%

Memory caching: 40%–60%

Plan mode: 20%–40%

Output trimming: 70%–90%

Model switching: 30%–80%

Context limit tuning: 30%+
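If each technique removes its share of whatever tokens remain after the previous ones, the overall saving is 1 − ∏(1 − rᵢ). The sketch below computes that under the simplifying assumption that the techniques act independently on the same token stream; because real techniques overlap, treat the result as optimistic. For instance, 60% from file filtering followed by 50% from compression gives 1 − 0.4 × 0.5 = 0.8, in line with the 80% in the title.

```python
from math import prod


def combined_reduction(reductions: list[float]) -> float:
    """Overall fraction saved when each reduction applies to what the previous ones left.

    Assumes independent techniques acting on the same token stream; overlapping
    techniques save less than this estimate.
    """
    for r in reductions:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"reduction must be in [0, 1], got {r}")
    return 1.0 - prod(1.0 - r for r in reductions)
```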

5. Practical 10‑Step Token‑Saving Checklist

Create .claudeignore / .opencodeignore at the project root using the provided template.

Add a CLAUDE.md (or README_OPENCODE.md) describing the tech stack, directory structure, and commands.

Enable auto‑compact with /config → Auto‑compact.

For long conversations, manually invoke /compact and periodically clear history.

Store project configuration and standards in /memory to avoid repeated input.

Use Plan Mode ( Shift+Tab) for complex tasks before execution.

Switch models per task: simple → Haiku/low‑cost, complex → Sonnet.

Turn off unnecessary auto‑features (real‑time suggestions, full‑project scans).

Develop in separate sessions or files to keep context size small.

Regularly check token usage (with /usage, or /cost for the current session in Claude Code) to find the remaining token “black holes”.

6. Key Reminders

Prioritize optimizing input: exclude every file the current task does not need.

Prefer over‑exclusion; excluded files can be manually pasted if needed.

Clean up long dialogues and multi‑task histories promptly.

Match the model tier to the task complexity; avoid defaulting to the most expensive model.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
