Architect
Apr 24, 2026 · Artificial Intelligence

How Hermes Agents Self‑Evolve: What Should Remain After a Task?

The article examines Hermes Agent’s three‑layer memory system—facts, session retrieval, and process assets—detailing how Skills are created, stored, patched, and secured at runtime, and argues that reliable self‑evolution requires disciplined versioning, evaluation, and access controls rather than unchecked automatic skill generation.

AI Skills · Hermes Agent · Process Assets
0 likes · 21 min read
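A purely hypothetical sketch of what "disciplined versioning, evaluation, and access controls" for a stored Skill could look like; the field names, threshold, and promotion rule here are assumptions for illustration, not Hermes Agent's actual schema.

```python
# Hypothetical: a versioned, evaluated, access-controlled process asset.
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class SkillRecord:
    name: str
    version: int
    body: str                  # the reusable procedure itself
    eval_score: float          # regression-eval score gating promotion
    allowed_tools: tuple = ()  # access control: tools this skill may invoke

    @property
    def checksum(self) -> str:
        # content hash so a patched skill cannot masquerade as an old version
        return hashlib.sha256(self.body.encode()).hexdigest()[:12]

def promote(candidate: SkillRecord, current: SkillRecord,
            min_score: float = 0.9) -> SkillRecord:
    """Replace the live skill only if the new version passes evaluation."""
    if candidate.eval_score >= min_score and candidate.version > current.version:
        return candidate
    return current

v1 = SkillRecord("summarize_pr", 1, "diff -> bullet summary", 0.95)
v2_bad = SkillRecord("summarize_pr", 2, "diff -> emoji", 0.40)
assert promote(v2_bad, v1) is v1   # low-scoring candidate is rejected
```

The point of the gate is the article's thesis in miniature: an automatically generated skill becomes the live version only after evaluation, never by default.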
AI Architecture Hub
Apr 24, 2026 · Artificial Intelligence

How Claude Code Achieves a 92% Prompt Caching Hit Rate with Three Unbreakable Engineering Rules

Claude Code’s prompt caching delivers a 92% hit rate, cutting the cost of a 50‑round agent session from $6 to $1.15 by separating stable prefixes from dynamic tails, using a three‑layer cache architecture, exact token‑sequence matching, and three strict engineering rules that keep the cache hot and reliable.

Agent Engineering · Cache Hit Rate · Claude Code
0 likes · 13 min read
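The headline savings follow from cache-pricing arithmetic. A minimal sketch, assuming illustrative token counts and a common pricing scheme (cached reads at 10% of the base input price, cache writes at a 25% premium) rather than Anthropic's actual rates:

```python
# Illustrative cost model for a multi-round agent session where each round
# re-sends the full conversation so far. All numbers are assumptions.
ROUNDS = 50
SYSTEM_TOKENS = 8_000     # stable prefix: system prompt + tool definitions
TURN_TOKENS = 600         # new tokens appended per round
PRICE_PER_TOKEN = 3.0 / 1e6   # assumed base input price, $3 per million
CACHE_READ = 0.10         # cached tokens billed at 10% of base
CACHE_WRITE = 1.25        # writing new tokens to cache costs a 25% premium

def cost_without_cache() -> float:
    total = 0.0
    for r in range(1, ROUNDS + 1):
        prompt = SYSTEM_TOKENS + r * TURN_TOKENS   # full history, full price
        total += prompt * PRICE_PER_TOKEN
    return total

def cost_with_cache() -> float:
    total = 0.0
    for r in range(1, ROUNDS + 1):
        cached = SYSTEM_TOKENS + (r - 1) * TURN_TOKENS   # prefix hits cache
        total += cached * CACHE_READ * PRICE_PER_TOKEN
        total += TURN_TOKENS * CACHE_WRITE * PRICE_PER_TOKEN  # new tail
    return total

full, cached = cost_without_cache(), cost_with_cache()
print(f"no cache: ${full:.2f}  with cache: ${cached:.2f}  "
      f"savings: {1 - cached / full:.0%}")
```

Because the prefix dominates the bill and is rebilled every round, even this crude model lands in the 80–90% savings range the articles report.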
AI Architecture Hub
Apr 23, 2026 · Artificial Intelligence

Why Prompt Caching Is Critical: Lessons from Building Claude Code

Prompt caching, a prefix‑matching technique that reuses prior LLM interactions, proved essential for Claude Code’s low latency and cost, and the article details counter‑intuitive practices such as arranging static prompts first, updating info via messages, avoiding mid‑session model or tool changes, and ensuring cache‑safe context forks.

AI engineering · Claude Code · LLM agents
0 likes · 10 min read
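The counter-intuitive practices listed (static content first, updates delivered as messages, no mid-session model changes) all amount to keeping the request prefix byte-identical across rounds. A minimal sketch, assuming the general shape of the Anthropic Messages API; the model ID and helper are illustrative:

```python
# Sketch of a cache-friendly request layout: the stable system block comes
# first and carries a cache_control breakpoint; dynamic information is only
# ever appended as messages, so the cached prefix never changes.
SYSTEM_PROMPT = "You are a coding agent."  # stable across the session

def build_request(history: list, new_user_msg: str) -> dict:
    """Assemble a request whose prefix is identical every round."""
    return {
        "model": "claude-sonnet-4-20250514",   # never swap mid-session
        "system": [{
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # cache breakpoint
        }],
        # new facts arrive as messages, not as edits to earlier content
        "messages": history + [{"role": "user", "content": new_user_msg}],
    }

req1 = build_request([], "List the repo files.")
req2 = build_request(
    req1["messages"] + [{"role": "assistant", "content": "src/, tests/"}],
    "Open src/main.py.",
)
# the cached prefix (system block + earlier messages) is unchanged
assert req1["system"] == req2["system"]
```

Editing an earlier message in place would invalidate everything after it; appending keeps every prior token cacheable.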
AI Tech Publishing
Apr 20, 2026 · Artificial Intelligence

How Claude Code Achieves 92% Prompt Cache Hit Rate and Cuts Costs by 81% – A Deep Dive

This article explains the mechanics of prompt‑caching for large language models, breaks down static versus dynamic context, details KV‑cache operation and its pricing, and shows how Claude Code’s 30‑minute programming session reached a 92% cache hit rate that reduced inference costs by 81%, concluding with three production‑grade design rules.

AI agents · Anthropic API · Claude Code
0 likes · 13 min read
Tencent Cloud Developer
Apr 15, 2026 · Artificial Intelligence

How Hermes Agent’s Skills System Enables Self‑Learning AI Agents

This article provides an in‑depth technical analysis of Hermes Agent’s Skills closed‑loop system, detailing its lifecycle from experience extraction and knowledge storage to intelligent retrieval, conditional activation, progressive disclosure, security scanning, and self‑improvement, while comparing it to academic prototypes like Voyager.

AI Agent · Hermes Agent · Skills System
0 likes · 27 min read
Machine Heart
Apr 13, 2026 · Artificial Intelligence

What’s the Underlying Logic of Coding Agents and Why Do Claude Code Variants Outperform Others?

The article dissects coding agents by outlining their six core components, explaining how an agent harness orchestrates model inference, repository context, prompt caching, tool validation, context compression, structured memory, and bounded sub‑agents, and shows why these architectural choices give Claude Code a performance edge over plain LLMs.

Agent Harness · Context Compression · LLM
0 likes · 22 min read
AI Tech Publishing
Apr 6, 2026 · Artificial Intelligence

Six Core Components of a Coding Agent Explained with Code

The article systematically breaks down the six essential building blocks of a programming agent—live repository context, prompt shape and cache reuse, structured tool access and validation, context reduction, structured session memory, and bounded sub‑agent delegation—illustrated with a Mini Coding Agent implementation and comparisons to Claude Code, Codex, and OpenClaw.

Context Compression · LLM · Python
0 likes · 15 min read
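A hypothetical skeleton of how several of those building blocks can wire together: a bounded loop over an injected model, structured session memory, validated tool access, and crude context reduction. This is a sketch under assumed interfaces, not the article's actual Mini Coding Agent code.

```python
# Minimal agent-loop sketch: bounded steps, structured memory, tool checks.
import json

def grep_tool(args: dict) -> str:
    # stand-in tool; a real harness validates args against a JSON schema
    return f"grep results for pattern {args['pattern']!r}"

TOOLS = {"grep": grep_tool}  # structured tool access

def run_agent(llm, task: str, max_steps: int = 8) -> str:
    """Drive one bounded loop over an injected `llm` callable."""
    history = [{"role": "user", "content": task}]  # structured session memory
    for _ in range(max_steps):                     # bounded, never open-ended
        reply = llm(history)                       # model inference
        history.append({"role": "assistant", "content": json.dumps(reply)})
        if reply.get("tool") is None:
            return reply["content"]                # final answer ends the loop
        tool = TOOLS.get(reply["tool"])            # tool validation
        result = tool(reply["args"]) if tool else f"unknown tool: {reply['tool']}"
        # crude context reduction: truncate bulky tool output before storing
        history.append({"role": "tool", "content": result[:2000]})
    return "step budget exhausted"

# scripted stand-in model: one tool call, then a final answer
def fake_llm(history: list) -> dict:
    if len(history) == 1:
        return {"tool": "grep", "args": {"pattern": "TODO"}}
    return {"tool": None, "content": "done"}

assert run_agent(fake_llm, "find TODOs in the repo") == "done"
```

Repo context injection and prompt-shape/cache discipline would sit around this loop; they are omitted here to keep the skeleton readable.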
AI Programming Lab
Apr 5, 2026 · Artificial Intelligence

Do You Really Understand Tokens? A Deep Dive Starting from a Claude Code Session

The article explains what tokens are, how different models tokenize text, the role of token embeddings, positional encoding, self‑attention, KV cache, and why output tokens cost far more than input tokens, while also covering pricing differences and prompt‑caching savings across major LLM providers.

KV cache · LLM pricing · Large Language Model
0 likes · 13 min read
Machine Heart
Apr 1, 2026 · Artificial Intelligence

Claude Code Source Leak: Inside the Accidental Open‑Source Release and New Buddy Feature

The accidental exposure of Claude Code’s TypeScript source via an npm source‑map mishap sparked a rapid community deep‑dive that uncovered anti‑distillation safeguards, a hidden Buddy pet, extensive prompt‑caching logic, undercover mode, auto‑compaction thresholds, and broader engineering trade‑offs, while Anthropic and its founder responded to the slip.

AI agents · Claude Code · anti-distillation
0 likes · 20 min read
Architect
Mar 18, 2026 · Artificial Intelligence

Why Prompt Caching Is More Than a Cost‑Saving Trick: It Shapes Agent Architecture

The article explains that prompt caching is not merely a way to reduce token costs, but a fundamental mechanism that forces developers to redesign context management for long‑running AI agents, turning caching considerations into core architectural decisions.

Context Engineering · large language models · prompt caching
0 likes · 25 min read
DataFunTalk
Mar 15, 2026 · Artificial Intelligence

How OpenClaw v2026.3.7 Boosts Enterprise AI Agent Efficiency and Cuts Costs

The OpenClaw v2026.3.7 upgrade introduces webhook compatibility fixes, typing‑feedback support, a 33% prompt‑caching cost reduction, smarter model routing with domestic model integration, and persistent bindings for container deployments, making the platform far more suitable for enterprise AI agent scenarios.

AI agents · Container Deployment · OpenClaw
0 likes · 10 min read
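"Smarter model routing" typically means sending cheap, well-defined requests to a small model and escalating only open-ended work. A sketch of the idea; the model names, threshold, and heuristic are assumptions, not OpenClaw's actual routing logic.

```python
# Illustrative router: tool-call formatting and short classification tasks
# rarely need a frontier model; long open-ended work usually does.
CHEAP_MODEL = "small-local-model"     # hypothetical name
STRONG_MODEL = "frontier-model"       # hypothetical name

def route(task: str, is_tool_call: bool) -> str:
    """Pick a model per request instead of using one model for everything."""
    if is_tool_call or len(task) < 200:
        return CHEAP_MODEL
    return STRONG_MODEL

assert route("classify this support ticket", is_tool_call=False) == CHEAP_MODEL
```

Real routers tend to use task type, expected output length, and past failure rates rather than raw prompt length, but the shape is the same: a pure function from request features to model ID.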
High Availability Architecture
Mar 12, 2026 · Artificial Intelligence

How Claude Code Hits 92% Prompt Cache Rate and Slashes AI Agent Costs by 81%

This article explains the prompt‑caching mechanism used by Claude Code, showing how separating static prefixes from dynamic tails and leveraging KV‑tensor caching reduces the O(n²) complexity of transformer pre‑fill to O(n), achieving a 92% cache hit rate and up to 81% cost savings in long‑running AI agent sessions.

AI agents · Claude · LLM optimization
0 likes · 12 min read
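The O(n²) → O(n) claim can be checked with a back-of-the-envelope operation count: without a cache every token attends to every token before it, while with the prefix's KV tensors stored, only newly appended tokens issue queries. A small counting model (query-key interactions only, ignoring heads and hidden dimensions):

```python
# Count query-key interactions when appending n_new tokens to a context
# whose first n_cached tokens already have KV tensors stored.
def attention_ops(n_new: int, n_cached: int = 0) -> int:
    # token i (1-based among the new tokens) attends to all n_cached
    # cached positions plus the i new positions up to and including itself
    return sum(n_cached + i for i in range(1, n_new + 1))

n = 10_000
cold = attention_ops(n)                  # full prefill: ~n^2 / 2 interactions
warm = attention_ops(100, n_cached=n)    # 100 new tokens on a warm cache
print(cold, warm, cold / warm)
```

Cold prefill of 10,000 tokens costs about 50 million interactions; appending 100 tokens to a warm cache costs about 1 million, linear in the new tokens. That gap is where both the latency and the cost savings come from.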
Code Mala Tang
Mar 9, 2026 · Artificial Intelligence

How Claude’s New Prompt Caching Cuts Token Costs by 90% for Long‑Running Agents

Claude’s API now automatically caches static parts of prompts—system instructions, tool definitions, and context—so repeated calls reuse these sections at only 10% of the standard token price, dramatically reducing costs for multi‑turn agents, but developers must manage prefixes and avoid cache‑breaking changes.

Claude API · LLM engineering · Token Optimization
0 likes · 15 min read
AI Code to Success
Mar 1, 2026 · Artificial Intelligence

How Prompt Caching Supercharges Long‑Running AI Agents: 5 Practical Lessons

This article explains how Claude Code’s Prompt Caching technique dramatically reduces latency and cost for long‑running AI agents, and shares five hard‑won engineering practices—including prompt layout, message‑based updates, avoiding mid‑conversation model or tool changes, and safe context forking—to help developers build efficient, cache‑friendly AI applications.

Context Management · cost optimization · large language models
0 likes · 10 min read
AI Waka
Feb 24, 2026 · Artificial Intelligence

How Claude’s New Auto‑Caching Cuts API Token Costs by 90%

By adding a single field to Claude API requests, developers can automatically cache static prompt parts, reducing token billing to just 10% of the original cost and dramatically lowering expenses for multi‑turn AI agents.

AI agents · Claude API · Token Optimization
0 likes · 13 min read
PaperAgent
Feb 1, 2026 · Artificial Intelligence

Why Clawdbot Burns Millions of Tokens and How to Slash Its Costs

The article provides a deep technical breakdown of the OpenClaw (formerly Clawdbot) AI agent’s token consumption patterns, identifies four major architectural token black holes, explains why they are hard to avoid, and offers concrete mitigation strategies such as prompt caching, workflow engines, context compaction, tool pruning, and model routing to dramatically reduce operational costs.

AI agents · ReAct loop · Token Optimization
0 likes · 12 min read