How to Slash Token Costs on Claude Code, Codex, and OpenCode by Up to 90%

This guide explains why input tokens dominate cost, then details concrete techniques (file filtering, context compression, documentation‑driven prompts, memory management, plan mode, output trimming, and model switching) for Claude Code, Codex (GitHub Copilot), and OpenCode, culminating in a 10‑step checklist that can cut token usage by up to 90%.

Architect's Tech Stack

Token Consumption Overview

The dominant cost in AI‑assisted coding is the input side, which typically accounts for 70%–90% of total token usage. Reducing input tokens yields the greatest savings.

Token Consumption Principle

Input Tokens (70%–90%): commands, conversation history, project files, tool outputs, system prompts.

Output Tokens (10%–30%): code, explanations, logs returned by the model.

Largest Black Hole: automatic project‑file reading, which can consume up to 80% of input tokens in a single interaction.
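To see why input dominates, a back‑of‑envelope cost split helps. The per‑million‑token prices below are illustrative assumptions, not published rates:

```python
# Back-of-envelope cost split for a typical AI coding session.
# Prices are illustrative assumptions, not official rates.
INPUT_PRICE_PER_MTOK = 3.00    # assumed $ per million input tokens
OUTPUT_PRICE_PER_MTOK = 15.00  # assumed $ per million output tokens

def session_cost(input_tokens: int, output_tokens: int) -> dict:
    """Return total cost and the share of spend attributable to input tokens."""
    cost_in = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    cost_out = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    total = cost_in + cost_out
    return {"total": round(total, 4), "input_share": round(cost_in / total, 2)}

# A session that reads many project files: 150k tokens in, 10k out.
# Even though output tokens cost 5x more each, input still dominates spend.
print(session_cost(150_000, 10_000))
```

With these assumed prices, input accounts for roughly three quarters of the bill, which is why the rest of this guide focuses on the input side first.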

Platform‑Specific Token‑Saving Techniques

Claude Code

File Filtering (.claudeignore)

# Dependency & build folders (big black holes)
node_modules/
dist/
build/
.next/
__pycache__/
# Lock files / logs
*.lock
package-lock.json
*.log
# VCS / IDE files
.git/
.idea/
.vscode/
# Resources / cache
*.png
*.jpg
*.svg
*.ico
.cache/
coverage/

Result: Single‑interaction token count drops from ~150 k to ~60 k (≈60% reduction).
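One way to sanity‑check what an ignore file buys you is to total the bytes under the excluded directories and convert to tokens with the common ~4‑characters‑per‑token rule of thumb. This sketch assumes that heuristic and a subset of the directory names from the template above:

```python
import os

# Rough estimate of tokens saved by an ignore file: sum the size of files
# inside excluded directories, assuming ~4 characters per token (a common
# rule of thumb; the exact ratio varies by tokenizer and language).
EXCLUDED_DIRS = {"node_modules", "dist", "build", ".next", ".git", ".cache", "coverage"}
CHARS_PER_TOKEN = 4

def estimated_tokens_saved(root: str) -> int:
    saved_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # A directory is excluded if any path component matches the list.
        parts = set(os.path.relpath(dirpath, root).split(os.sep))
        if parts & EXCLUDED_DIRS:
            for name in filenames:
                try:
                    saved_bytes += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    pass  # skip unreadable files
    return saved_bytes // CHARS_PER_TOKEN
```

Running this against a typical Node.js project makes the "black hole" concrete: `node_modules/` alone often dwarfs the source tree by an order of magnitude.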

Context Compression (/compact)

Manual: invoke /compact at logical checkpoints.

Command‑guided: /compact with options to keep code changes and file paths while discarding analysis.

Automatic: turn on Auto-compact via /config; compression can reduce 25 k tokens to 3 k (≈88% reduction).

Documentation‑Driven Prompts (CLAUDE.md)

# Project Overview
Next.js 14 + TypeScript + Prisma + PostgreSQL SaaS
# Directory Structure
src/app/       # App Router
src/components/ # Components
src/lib/       # Utilities
src/server/    # Server code
# Commands
pnpm dev
pnpm build

Result: Eliminates repeated cat/find/grep scans, saving >30% of input tokens.

Memory Management (/memory)

Store fixed information:

/memory The project uses Next.js 14 + TypeScript; the API spec is in docs/api.md

View stored items: /memory list

Delete an entry: /memory delete [key]

Result: Avoids repetitive pasting, saving >40% of repeated input.

Plan Mode (Shift+Tab)

Activate Plan Mode to let the model generate an execution plan first; confirm before proceeding to avoid wasted exploration.

Result: Reduces trial‑and‑error tokens by >20%.

Output Trimming

Enable output filtering via /config to strip ANSI colors, progress bars, and empty lines.

Truncate long logs, keeping only error stacks and concise summaries.

Result: Test output drops from 25 k to 2.5 k tokens (≈90% reduction).

Model Switching (/model)

Simple tasks: /model haiku (lowest cost).

Complex tasks: /model sonnet.

Very complex: /model opus (use only when necessary).

Result: Task‑specific model choice cuts cost by 30%–80%.
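Task‑based switching can be automated with a simple rule‑based router. The keyword heuristics and relative price weights below are assumptions for illustration, not published pricing:

```python
# Sketch of rule-based model routing: pick the cheapest tier that fits
# the task. Tier names match the article; the keyword lists and relative
# price weights are illustrative assumptions.
PRICE_WEIGHT = {"haiku": 1, "sonnet": 12, "opus": 60}  # assumed relative cost

def pick_model(task: str) -> str:
    t = task.lower()
    if any(k in t for k in ("architecture", "system design", "large refactor")):
        return "opus"   # reserve the top tier for genuinely hard work
    if any(k in t for k in ("debug", "implement", "migrate", "review")):
        return "sonnet"
    return "haiku"      # default to the cheapest tier for simple work

print(pick_model("rename a variable"))       # haiku
print(pick_model("debug the failing test"))  # sonnet
```

Routing the bulk of simple edits to the cheapest tier is where most of the claimed 30%–80% saving comes from, since those tasks dominate day‑to‑day usage.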

Codex (GitHub Copilot)

IDE Configuration

In VS Code set GitHub Copilot → Max File Context to 3–5 files, limiting the amount of code the model sees.

Result: Input tokens reduced by >50%.

Command Shortening

Use concise comments instead of verbose natural‑language prompts, e.g. // Node.js Express login endpoint with JWT and bcrypt rather than a full sentence.

Result: Input tokens reduced by >40%.

Disable Unnecessary Features

Turn off real‑time suggestions and auto‑completion when not needed.

Disable multi‑file indexing except during refactoring.

Result: Eliminates background scanning token consumption.

File‑by‑File Development

Keep each file focused on a single function; manually copy required snippets instead of relying on automatic reads.

Result: Context size reduced by >60%.

OpenCode (Self‑Hosted)

Configuration File (config.json)

{
  "model": {
    "name": "deepseek-v3",
    "input_limit": 128000,
    "output_limit": 80000
  }
}

Set input_limit according to the model's actual context window; note that JSON does not allow inline comments.

Result: Utilizes full context window, avoiding automatic truncation and duplicate requests, saving >30%.

File Filtering (.opencodeignore)

Same syntax as .claudeignore; exclude dependencies, build artifacts, logs, and resource files.

Context Management

Manually clear history with /clear.

Use separate sessions for different functionalities.

Result: Prevents history bloat, saving >50% of unnecessary context.

Memory Storage

Store global directives in configuration files, achieving 40%–60% token savings.

Plan Mode

Implement custom scripts or plugins to emulate plan mode, yielding 20%–40% savings.

Output Trimming

Configure filter rules to remove noisy output, saving 70%–90% of output tokens.

Model Switching

Simple tasks: low‑cost models such as Qwen 7B or Llama 3 8B.

Complex tasks: switch to higher‑capability models like DeepSeek V3 or Qwen Max.

Result: Per‑task model selection reduces unit price by 70%–95%.

Practical 10‑Step Token‑Saving Checklist

Create .claudeignore / .opencodeignore in the project root using the provided template.

Add a CLAUDE.md file that lists the tech stack, directory layout, and common commands.

Enable automatic compression (turn on Auto-compact via /config).

For long conversations, manually invoke /compact at logical breakpoints.

Store recurring project configuration with /memory to avoid repeated input.

Use Plan Mode (Shift+Tab) for complex tasks before execution.

Switch models per task: low‑cost haiku for simple work, sonnet for complex work.

Disable unnecessary auto‑features such as real‑time completion and full‑project scanning.

Separate development into distinct sessions or files to prevent history accumulation.

Regularly review token usage (/usage) to identify and eliminate new black holes.

Key Reminders

Input is king: prioritize trimming file reads, context size, and prompt length.

Prefer exclusion over inclusion: over‑excluding files is safer than missing costly ones.

Timely cleanup: compress or clear long dialogues and multi‑task histories.

Model matching: select the appropriate model tier for each task instead of defaulting to the most powerful option.

Tags: AI, Prompt Engineering, Claude, Codex, OpenCode, Token Optimization
Written by

Architect's Tech Stack

Java backend, microservices, distributed systems, containerized programming, and more.
