Can You Cut Claude Code’s Token Usage by 75%? A Simple Plugin Shows How

The article demonstrates that Claude Code’s verbose responses waste hundreds of tokens, but a free “caveman” plugin can slash token consumption by up to 75% while preserving answer quality, backed by benchmark data and a research paper on concise replies.

DevOps Coach

Claude Code bills every output token, including filler phrases such as “Certainly” or “Sure, I’d be happy to help with that,” so verbose replies cost more without adding information.

In a test on the same Unity UI bug, the default Claude Code response used 1,252 tokens, while the response with the caveman plugin active used only 410. The substance of the answer was the same; the difference of 842 tokens was redundant wording.
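The arithmetic behind that comparison can be checked with a short sketch. The two token counts come from the test above; the function name token_savings is only illustrative and is not part of the plugin.

```python
# Token counts reported in the article's Unity UI bug test.
default_tokens = 1252
caveman_tokens = 410


def token_savings(before: int, after: int) -> tuple[int, float]:
    """Return (tokens saved, percent reduction) for a single response."""
    saved = before - after
    return saved, 100 * saved / before


saved, percent = token_savings(default_tokens, caveman_tokens)
print(f"saved {saved} tokens ({percent:.1f}% reduction)")
```

For this single test the reduction works out to roughly two thirds, which is consistent with the headline figure being an upper bound ("up to 75%") rather than a typical result.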

The caveman plugin is a free GitHub project with over 13,000 stars. To install it, first add the marketplace with claude plugin marketplace add JuliusBrussee/caveman, then run claude plugin install caveman@caveman. Activate it in a session with /caveman.

Before activation, the model replies with a verbose explanation of the authentication middleware bug; after activation, it returns a concise statement that the middleware uses “<” instead of “<=”, directly presenting the fix and noting the cost‑saving benefit.

Contrary to the intuition that shorter replies are less accurate, the paper “Concise Constraints Reverse Performance Rankings” reports a 26% accuracy increase for brief responses on benchmark tests.

The plugin offers three modes—Lite (slightly trimmed, full syntax), Full (default, drops articles and non‑essential words), and Ultra (extreme abbreviation, one word per idea)—plus a Classical Chinese mode for maximal compression.

Benchmark statistics from Julius Brussee show substantial token savings, with the caveman plugin reducing overall costs and the companion caveman-compress tool cutting CLAUDE.md file size by about 45%.

To use the plugin, install it once, then run /caveman in any session for concise output, or /caveman ultra for the most compact style. For deeper token savings, run caveman-compress on CLAUDE.md files.

The plugin is free, developed by Julius Brussee, and the author disclaims any affiliation with Claude or the Caveman project.

Tags: prompt engineering, Claude, Token Optimization, LLM cost reduction, caveman plugin