Architect
Apr 24, 2026 · Artificial Intelligence

How Hermes Agents Self‑Evolve: What Should Remain After a Task?

The article examines Hermes Agent’s three‑layer memory system—facts, session retrieval, and process assets—detailing how Skills are created, stored, patched, and secured at runtime, and argues that reliable self‑evolution requires disciplined versioning, evaluation, and access controls rather than unchecked automatic skill generation.

AI Skills · Hermes Agent · Process Assets
0 likes · 21 min read
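A purely hypothetical sketch of what "disciplined versioning, evaluation, and access controls" for a stored Skill could look like; the field names, threshold, and promotion rule here are assumptions for illustration, not Hermes Agent's actual schema.

```python
# Hypothetical: a versioned, evaluated, access-controlled process asset.
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class SkillRecord:
    name: str
    version: int
    body: str                  # the reusable procedure itself
    eval_score: float          # regression-eval score gating promotion
    allowed_tools: tuple = ()  # access control: tools this skill may invoke

    @property
    def checksum(self) -> str:
        # content hash so a patched skill cannot masquerade as an old version
        return hashlib.sha256(self.body.encode()).hexdigest()[:12]

def promote(candidate: SkillRecord, current: SkillRecord,
            min_score: float = 0.9) -> SkillRecord:
    """Replace the live skill only if the new version passes evaluation."""
    if candidate.eval_score >= min_score and candidate.version > current.version:
        return candidate
    return current

v1 = SkillRecord("summarize_pr", 1, "diff -> bullet summary", 0.95)
v2_bad = SkillRecord("summarize_pr", 2, "diff -> emoji", 0.40)
assert promote(v2_bad, v1) is v1   # low-scoring candidate is rejected
```

The point of the gate is the article's thesis in miniature: an automatically generated skill becomes the live version only after evaluation, never by default.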
AI Architecture Hub
Apr 24, 2026 · Artificial Intelligence

How Claude Code Achieves a 92% Prompt Caching Hit Rate with Three Unbreakable Engineering Rules

Claude Code’s prompt caching delivers a 92% hit rate, cutting the cost of a 50‑round agent session from $6 to $1.15 by separating stable prefixes from dynamic tails, using a three‑layer cache architecture, exact token‑sequence matching, and three strict engineering rules that keep the cache hot and reliable.

Agent Engineering · Cache Hit Rate · Claude Code
0 likes · 13 min read
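The headline savings follow from cache-pricing arithmetic. A minimal sketch, assuming illustrative token counts and a common pricing scheme (cached reads at 10% of the base input price, cache writes at a 25% premium) rather than Anthropic's actual rates:

```python
# Illustrative cost model for a multi-round agent session where each round
# re-sends the full conversation so far. All numbers are assumptions.
ROUNDS = 50
SYSTEM_TOKENS = 8_000     # stable prefix: system prompt + tool definitions
TURN_TOKENS = 600         # new tokens appended per round
PRICE_PER_TOKEN = 3.0 / 1e6   # assumed base input price, $3 per million
CACHE_READ = 0.10         # cached tokens billed at 10% of base
CACHE_WRITE = 1.25        # writing new tokens to cache costs a 25% premium

def cost_without_cache() -> float:
    total = 0.0
    for r in range(1, ROUNDS + 1):
        prompt = SYSTEM_TOKENS + r * TURN_TOKENS   # full history, full price
        total += prompt * PRICE_PER_TOKEN
    return total

def cost_with_cache() -> float:
    total = 0.0
    for r in range(1, ROUNDS + 1):
        cached = SYSTEM_TOKENS + (r - 1) * TURN_TOKENS   # prefix hits cache
        total += cached * CACHE_READ * PRICE_PER_TOKEN
        total += TURN_TOKENS * CACHE_WRITE * PRICE_PER_TOKEN  # new tail
    return total

full, cached = cost_without_cache(), cost_with_cache()
print(f"no cache: ${full:.2f}  with cache: ${cached:.2f}  "
      f"savings: {1 - cached / full:.0%}")
```

Because the prefix dominates the bill and is rebilled every round, even this crude model lands in the 80–90% savings range the articles report.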
AI Architecture Hub
Apr 23, 2026 · Artificial Intelligence

Why Prompt Caching Is Critical: Lessons from Building Claude Code

Prompt caching, a prefix‑matching technique that reuses prior LLM interactions, proved essential for Claude Code’s low latency and cost, and the article details counter‑intuitive practices such as arranging static prompts first, updating info via messages, avoiding mid‑session model or tool changes, and ensuring cache‑safe context forks.

AI engineering · Claude Code · LLM agents
0 likes · 10 min read
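The counter-intuitive practices listed (static content first, updates delivered as messages, no mid-session model changes) all amount to keeping the request prefix byte-identical across rounds. A minimal sketch, assuming the general shape of the Anthropic Messages API; the model ID and helper are illustrative:

```python
# Sketch of a cache-friendly request layout: the stable system block comes
# first and carries a cache_control breakpoint; dynamic information is only
# ever appended as messages, so the cached prefix never changes.
SYSTEM_PROMPT = "You are a coding agent."  # stable across the session

def build_request(history: list, new_user_msg: str) -> dict:
    """Assemble a request whose prefix is identical every round."""
    return {
        "model": "claude-sonnet-4-20250514",   # never swap mid-session
        "system": [{
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # cache breakpoint
        }],
        # new facts arrive as messages, not as edits to earlier content
        "messages": history + [{"role": "user", "content": new_user_msg}],
    }

req1 = build_request([], "List the repo files.")
req2 = build_request(
    req1["messages"] + [{"role": "assistant", "content": "src/, tests/"}],
    "Open src/main.py.",
)
# the cached prefix (system block + earlier messages) is unchanged
assert req1["system"] == req2["system"]
```

Editing an earlier message in place would invalidate everything after it; appending keeps every prior token cacheable.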
AI Tech Publishing
Apr 20, 2026 · Artificial Intelligence

How Claude Code Achieves 92% Prompt Cache Hit Rate and Cuts Costs by 81% – A Deep Dive

This article explains the mechanics of prompt‑caching for large language models, breaks down static versus dynamic context, details KV‑cache operation and its pricing, and shows how Claude Code’s 30‑minute programming session reached a 92% cache hit rate that reduced inference costs by 81%, concluding with three production‑grade design rules.

AI agents · Anthropic API · Claude Code
0 likes · 13 min read
Tencent Cloud Developer
Apr 15, 2026 · Artificial Intelligence

How Hermes Agent’s Skills System Enables Self‑Learning AI Agents

This article provides an in‑depth technical analysis of Hermes Agent’s Skills closed‑loop system, detailing its lifecycle from experience extraction and knowledge storage to intelligent retrieval, conditional activation, progressive disclosure, security scanning, and self‑improvement, while comparing it to academic prototypes like Voyager.

AI Agent · Hermes Agent · Skills System
0 likes · 27 min read
Machine Heart
Apr 13, 2026 · Artificial Intelligence

What’s the Underlying Logic of Coding Agents and Why Do Claude Code Variants Outperform Others?

The article dissects coding agents by outlining their six core components, explaining how an agent harness orchestrates model inference, repository context, prompt caching, tool validation, context compression, structured memory, and bounded sub‑agents, and shows why these architectural choices give Claude Code a performance edge over plain LLMs.

Agent Harness · Context Compression · LLM
0 likes · 22 min read
AI Tech Publishing
Apr 6, 2026 · Artificial Intelligence

Six Core Components of a Coding Agent Explained with Code

The article systematically breaks down the six essential building blocks of a programming agent—live repository context, prompt shape and cache reuse, structured tool access and validation, context reduction, structured session memory, and bounded sub‑agent delegation—illustrated with a Mini Coding Agent implementation and comparisons to Claude Code, Codex, and OpenClaw.

Context Compression · LLM · Python
0 likes · 15 min read
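A hypothetical skeleton of how several of those building blocks can wire together: a bounded loop over an injected model, structured session memory, validated tool access, and crude context reduction. This is a sketch under assumed interfaces, not the article's actual Mini Coding Agent code.

```python
# Minimal agent-loop sketch: bounded steps, structured memory, tool checks.
import json

def grep_tool(args: dict) -> str:
    # stand-in tool; a real harness validates args against a JSON schema
    return f"grep results for pattern {args['pattern']!r}"

TOOLS = {"grep": grep_tool}  # structured tool access

def run_agent(llm, task: str, max_steps: int = 8) -> str:
    """Drive one bounded loop over an injected `llm` callable."""
    history = [{"role": "user", "content": task}]  # structured session memory
    for _ in range(max_steps):                     # bounded, never open-ended
        reply = llm(history)                       # model inference
        history.append({"role": "assistant", "content": json.dumps(reply)})
        if reply.get("tool") is None:
            return reply["content"]                # final answer ends the loop
        tool = TOOLS.get(reply["tool"])            # tool validation
        result = tool(reply["args"]) if tool else f"unknown tool: {reply['tool']}"
        # crude context reduction: truncate bulky tool output before storing
        history.append({"role": "tool", "content": result[:2000]})
    return "step budget exhausted"

# scripted stand-in model: one tool call, then a final answer
def fake_llm(history: list) -> dict:
    if len(history) == 1:
        return {"tool": "grep", "args": {"pattern": "TODO"}}
    return {"tool": None, "content": "done"}

assert run_agent(fake_llm, "find TODOs in the repo") == "done"
```

Repo context injection and prompt-shape/cache discipline would sit around this loop; they are omitted here to keep the skeleton readable.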
AI Programming Lab
Apr 5, 2026 · Artificial Intelligence

Do You Really Understand Tokens? A Deep Dive Starting from a Claude Code Session

The article explains what tokens are, how different models tokenize text, the role of token embeddings, positional encoding, self‑attention, KV cache, and why output tokens cost far more than input tokens, while also covering pricing differences and prompt‑caching savings across major LLM providers.

KV cache · LLM pricing · Large Language Model
0 likes · 13 min read
Machine Heart
Apr 1, 2026 · Artificial Intelligence

Claude Code Source Leak: Inside the Accidental Open‑Source Release and New Buddy Feature

The accidental exposure of Claude Code’s TypeScript source via an npm source‑map mishap sparked a rapid community deep‑dive that uncovered anti‑distillation safeguards, a hidden Buddy pet, extensive prompt‑caching logic, undercover mode, auto‑compaction thresholds, and broader engineering trade‑offs, while Anthropic and its founder responded to the slip.

AI agents · Claude Code · anti-distillation
0 likes · 20 min read
Architect
Mar 18, 2026 · Artificial Intelligence

Why Prompt Caching Is More Than a Cost‑Saving Trick: It Shapes Agent Architecture

The article explains that prompt caching is not merely a way to reduce token costs, but a fundamental mechanism that forces developers to redesign context management for long‑running AI agents, turning caching considerations into core architectural decisions.

Context Engineering · large language models · prompt caching
0 likes · 25 min read
DataFunTalk
Mar 15, 2026 · Artificial Intelligence

How OpenClaw v2026.3.7 Boosts Enterprise AI Agent Efficiency and Cuts Costs

The OpenClaw v2026.3.7 upgrade introduces webhook compatibility fixes, typing‑feedback support, a 33% prompt‑caching cost reduction, smarter model routing with domestic model integration, and persistent bindings for container deployments, making the platform far more suitable for enterprise AI agent scenarios.

AI agents · Container Deployment · OpenClaw
0 likes · 10 min read
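"Smarter model routing" typically means sending cheap, well-defined requests to a small model and escalating only open-ended work. A sketch of the idea; the model names, threshold, and heuristic are assumptions, not OpenClaw's actual routing logic.

```python
# Illustrative router: tool-call formatting and short classification tasks
# rarely need a frontier model; long open-ended work usually does.
CHEAP_MODEL = "small-local-model"     # hypothetical name
STRONG_MODEL = "frontier-model"       # hypothetical name

def route(task: str, is_tool_call: bool) -> str:
    """Pick a model per request instead of using one model for everything."""
    if is_tool_call or len(task) < 200:
        return CHEAP_MODEL
    return STRONG_MODEL

assert route("classify this support ticket", is_tool_call=False) == CHEAP_MODEL
```

Real routers tend to use task type, expected output length, and past failure rates rather than raw prompt length, but the shape is the same: a pure function from request features to model ID.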
High Availability Architecture
Mar 12, 2026 · Artificial Intelligence

How Claude Code Hits 92% Prompt Cache Rate and Slashes AI Agent Costs by 81%

This article explains the prompt‑caching mechanism used by Claude Code, showing how separating static prefixes from dynamic tails and leveraging KV‑tensor caching reduces the O(n²) complexity of transformer pre‑fill to O(n), achieving a 92% cache hit rate and up to 81% cost savings in long‑running AI agent sessions.

AI agents · Claude · LLM optimization
0 likes · 12 min read
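The O(n²) → O(n) claim can be checked with a back-of-the-envelope operation count: without a cache every token attends to every token before it, while with the prefix's KV tensors stored, only newly appended tokens issue queries. A small counting model (query-key interactions only, ignoring heads and hidden dimensions):

```python
# Count query-key interactions when appending n_new tokens to a context
# whose first n_cached tokens already have KV tensors stored.
def attention_ops(n_new: int, n_cached: int = 0) -> int:
    # token i (1-based among the new tokens) attends to all n_cached
    # cached positions plus the i new positions up to and including itself
    return sum(n_cached + i for i in range(1, n_new + 1))

n = 10_000
cold = attention_ops(n)                  # full prefill: ~n^2 / 2 interactions
warm = attention_ops(100, n_cached=n)    # 100 new tokens on a warm cache
print(cold, warm, cold / warm)
```

Cold prefill of 10,000 tokens costs about 50 million interactions; appending 100 tokens to a warm cache costs about 1 million, linear in the new tokens. That gap is where both the latency and the cost savings come from.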
Code Mala Tang
Mar 9, 2026 · Artificial Intelligence

How Claude’s New Prompt Caching Cuts Token Costs by 90% for Long‑Running Agents

Claude’s API now automatically caches static parts of prompts—system instructions, tool definitions, and context—so repeated calls reuse these sections at only 10% of the standard token price, dramatically reducing costs for multi‑turn agents, but developers must manage prefixes and avoid cache‑breaking changes.

Claude API · LLM engineering · Token Optimization
0 likes · 15 min read
AI Code to Success
Mar 1, 2026 · Artificial Intelligence

How Prompt Caching Supercharges Long‑Running AI Agents: 5 Practical Lessons

This article explains how Claude Code’s Prompt Caching technique dramatically reduces latency and cost for long‑running AI agents, and shares five hard‑won engineering practices—including prompt layout, message‑based updates, avoiding mid‑conversation model or tool changes, and safe context forking—to help developers build efficient, cache‑friendly AI applications.

Context Management · cost optimization · large language models
0 likes · 10 min read
AI Waka
Feb 24, 2026 · Artificial Intelligence

How Claude’s New Auto‑Caching Cuts API Token Costs by 90%

By adding a single field to Claude API requests, developers can automatically cache static prompt parts, reducing token billing to just 10% of the original cost and dramatically lowering expenses for multi‑turn AI agents.

AI agents · Claude API · Token Optimization
0 likes · 13 min read
PaperAgent
Feb 1, 2026 · Artificial Intelligence

Why Clawdbot Burns Millions of Tokens and How to Slash Its Costs

The article provides a deep technical breakdown of the OpenClaw (formerly Clawdbot) AI agent’s token consumption patterns, identifies four major architectural token black holes, explains why they are hard to avoid, and offers concrete mitigation strategies such as prompt caching, workflow engines, context compaction, tool pruning, and model routing to dramatically reduce operational costs.

AI agents · ReAct loop · Token Optimization
0 likes · 12 min read