Tagged articles

token reduction

15 articles · Page 1 of 1

Jul 16, 2026 · Artificial Intelligence

Brake Overthinking in Long‑Reasoning Models by Detecting Semantic Redundancy

Long‑thinking LLMs often waste 41‑52% of tokens after the final answer; the PUMA framework detects when reasoning stops producing new semantic information, enabling early exit that cuts average token usage by 26.2% while keeping accuracy stable and even improving speed across multiple benchmarks.

PUMA · early exit · llm-inference

0 likes · 9 min read

AI Engineering

Jun 26, 2026 · Artificial Intelligence

Headroom: Netflix Engineer’s Open‑Source Context Compression Tool – Does It Save Tokens or Burn More?

Headroom positions itself as a reversible context‑compression layer for AI agents, offering six algorithms and three integration modes that claim up to 92% token savings in benchmarks, yet real‑world tests by engineers show mixed results and occasional token overhead.

AI agents · Context Compression · Headroom

0 likes · 9 min read

java1234

Jun 26, 2026 · Artificial Intelligence

Headroom: Open‑Source AI Agent Context Compression Cuts Token Usage by 60‑95%

Headroom inserts a reversible compression layer between your AI agent and the LLM, trimming irrelevant context such as tool outputs, logs, and RAG results, which can reduce token consumption by 60‑95% while preserving accuracy, as demonstrated on real‑world workloads.

AI agents · Context Compression · Headroom

0 likes · 7 min read

AI Architecture Path

Jun 20, 2026 · Artificial Intelligence

Ultimate Browser Automation for AI Agents: 2.7K+ Stars, Cut Token Use by 90%, Solve Anti‑Scraping, Captcha, and Multi‑Account Issues

BrowserAct v2.0.2 provides a stealthy, CLI‑driven browser automation layer for AI agents that eliminates manual QR logins, bypasses Cloudflare and anti‑bot blocks, isolates multi‑account sessions, auto‑solves captchas, and reduces token consumption by about 90%, with real‑world benchmarks and detailed usage guidance.

AI Agent · BrowserAct · Captcha Solving

0 likes · 16 min read

Java Architect Essentials

Jun 16, 2026 · Artificial Intelligence

Cut Claude Code Token Costs by Up to 90% with This Open‑Source Rust Proxy

RTK is a Rust‑based CLI proxy that filters and compresses shell command output for LLM agents, slashing token usage by 60‑90% with less than 10 ms overhead, supporting over 100 commands, multiple AI tools, and configurable privacy‑safe telemetry.

AI agents · CLI · LLM

0 likes · 5 min read

Frontend AI Walk

Jun 10, 2026 · Artificial Intelligence

How RTK Eliminates 89% of Redundant Tokens in AI Programming

RTK, a Rust‑based CLI filter, removes progress bars, empty lines and other noise from AI coding assistant output, cutting token usage by about 89%, which lowers costs, extends session limits and improves context quality for tools like Claude Code and Cursor.

AI Programming · CLI · Claude Code

0 likes · 11 min read
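
The kind of filtering this summary describes (dropping progress bars, empty lines, and similar noise before output reaches the model) can be illustrated with a minimal Python sketch. This is not RTK's implementation, which is written in Rust; the function name `filter_output` and both regular expressions are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical noise patterns for illustration; these are NOT RTK's actual rules.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")           # color / cursor codes
PROGRESS_BAR = re.compile(r"\[[#=>\-. ]{5,}\]|\b\d{1,3}%\s*\|")  # [####   ] or 45%|

def filter_output(raw: str) -> str:
    """Return command output with common terminal noise removed."""
    kept = []
    for line in raw.split("\n"):
        # Drop a trailing CR (Windows line ending), then keep only the text
        # after the last carriage return, i.e. the final redraw of the line.
        line = line.rstrip("\r").split("\r")[-1]
        line = ANSI_ESCAPE.sub("", line).rstrip()
        if not line:
            continue  # skip empty lines
        if PROGRESS_BAR.search(line):
            continue  # skip progress-bar lines
        kept.append(line)
    return "\n".join(kept)

if __name__ == "__main__":
    sample = (
        "Compiling app\n\n"
        "[#####     ] 50%\r[##########] 100%\n"
        "\x1b[32mtest passed\x1b[0m\n\n"
    )
    print(filter_output(sample))
```

Fewer characters passed to the model generally means fewer input tokens, but the actual saving depends on the tokenizer and on how noisy the command output is.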

Java Tech Enthusiast

Jun 8, 2026 · Artificial Intelligence

How Claude Code, Codex, and OpenCode Can Cut Token Usage by Up to 80%

The article breaks down token billing, shows that input tokens account for 70‑90% of cost, and provides concrete techniques—file filtering, context compression, doc‑driven prompts, memory caching, plan mode, output trimming, and model switching—across Claude Code, Codex, and OpenCode, culminating in a 10‑step checklist and a comparison table that demonstrate up to 80% token savings.

AI Coding · Claude Code · Codex

0 likes · 11 min read

Architect's Tech Stack

Jun 4, 2026 · Artificial Intelligence

How TencentDB Agent Memory Cuts Token Usage by 61% and Boosts Task Success

TencentDB Agent Memory, an open‑source hierarchical memory system for long‑running AI agents, offloads tool calls, structures short‑term and four‑layer long‑term memories, and reduces token consumption by 61% while raising the task success rate by 51% and persona accuracy from 48% to 76%, all running locally with SQLite and no API keys.

AI agents · OpenClaw · TencentDB Agent Memory

0 likes · 4 min read

AI Architecture Path

Jun 3, 2026 · Artificial Intelligence

How Headroom Cuts Claude Code Token Usage by Up to 95% Without Losing Accuracy

Headroom is a locally run, reversible context‑compression layer for Claude Code that reduces input tokens by 60‑95% without sacrificing precision, eliminates context‑limit errors, cuts token costs, protects privacy, and enables seamless memory sharing across multiple AI coding agents, as demonstrated by real‑world benchmarks.

AI Coding · Claude Code · Context Compression

0 likes · 15 min read

Ubiquitous Tech

May 23, 2026 · Artificial Intelligence

How CodeGraph Cuts Token Usage for AI Coding Assistants

The article analyzes why AI coding agents waste tokens while exploring unfamiliar code bases, introduces the open‑source CodeGraph tool that builds a local code knowledge graph, and shows how it reduces API calls from 57 to 5 and speeds up responses from minutes to seconds.

AI Coding · MCP · Node.js

0 likes · 13 min read

AI Engineering

May 16, 2026 · Backend Development

Cut 92% of Claude Code Tool Calls for Large Codebases with CodeGraph

CodeGraph builds a semantic knowledge graph of a codebase so Claude Code can query the graph instead of scanning files, reducing tool calls by an average of 92% and speeding up exploration by 71% across multiple large, multi‑language projects.

AI code assistance · Claude Code · benchmark

0 likes · 6 min read

Machine Heart

May 12, 2026 · Artificial Intelligence

DECS Cuts Overthinking in Models: Halve Inference Tokens and Raise Accuracy

DECS, a novel training framework introduced by researchers from Fudan, Shanghai Jiao Tong, and the Shanghai AI Lab, theoretically exposes the flaws of length‑penalty rewards and, through token‑level reward decoupling and dynamic batch scheduling, reduces inference token counts by over 50% while improving accuracy across multiple benchmarks.

DECS · Reward Design · benchmark evaluation

0 likes · 9 min read

Java Web Project

May 1, 2026 · Artificial Intelligence

How a Single Command Cuts AI Coding Token Usage from 210K to 23K

The article explains why AI coding tools waste hundreds of thousands of tokens on noisy terminal output, presents official data showing a typical two‑hour session generating 210,000 useless tokens, and demonstrates how the open‑source Rust Token Killer (RTK) filters output to save up to 80% of tokens with a single command.

AI Coding · CLI · RTK

0 likes · 4 min read

AI Engineering

Apr 2, 2026 · Artificial Intelligence

Cut Claude Code’s Fluff with 8 Lines: Slash Output Tokens by 63%

By adding an eight‑line CLAUDE.md file that suppresses polite openings, repetitions, and unnecessary explanations, developers reduced Claude Code’s output token count by 63% without losing information, achieving up to 75% shorter code reviews and 64% shorter concept explanations, as verified by independent benchmarks.

Claude · GitHub · LLM prompt

0 likes · 4 min read

AI Open-Source Efficiency Guide

Mar 26, 2026 · Artificial Intelligence

OpenSpace: HKU’s Open‑Source AI Agent Engine Cuts Tokens by 46% and Boosts ROI 4.2×

OpenSpace is an open‑source, self‑evolving AI agent engine that supports major agent frameworks, reduces token consumption by 46%, achieves a 4.2‑fold return on 50 professional tasks across six industries using the Qwen 3.5‑Plus model, and provides auto‑fix, auto‑improve, and auto‑learn capabilities for collective intelligence.

AI Agent · OpenSource · benchmark

0 likes · 9 min read
