How CodeGenius Re‑engineered Memory to Tame AI Agent Context Bloat
This article explains how the rapid evolution of AI agents caused context explosion and why the original fixed‑window memory failed. It then describes the layered memory system CodeGenius introduced: unloading stale data, deduplicating files, generating structural summaries, and dynamically compressing dialogue to keep prompts stable, reduce token cost, and improve task continuity.
Background
As large‑language‑model (LLM) capabilities increase, code‑focused AI agents are evolving from simple chatbots into autonomous multi‑step executors. They must analyse user intent, read many files, invoke tools, and iteratively refine results. All of this information accumulates in the prompt, causing runaway context growth, higher latency, higher cost, and dilution of critical signals.
Problems with Fixed‑Window Memory
Context breakage – truncating to the last few turns discards essential information.
Cache invalidation – each truncation changes the prompt, preventing reuse of cached LLM responses.
Redundant noise – repeated file contents and outdated tool outputs waste tokens and confuse the model.
Memory System Goals
Control overall context size.
Preserve key semantics.
Improve model stability.
Reduce cost and latency.
Enable truly continuous task execution.
Key Mechanisms
1. Unloading Stale Information
After several dialogue rounds, historical messages often contain redundant data (e.g., code that has already been executed). The system removes tool inputs/outputs older than five turns and stores them as external files, keeping only file paths and brief hints in the prompt. To avoid constant cache loss, unloading is batched: every five turns the oldest data is purged, balancing token reduction with cache reuse.
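The batched unloading policy described above can be sketched as follows. This is an illustrative reconstruction, not the actual CodeGenius code: the `ToolMessage` shape, the `/tmp/ctx/...` paths, and the function names are assumptions.

```typescript
// Sketch: batched unloading of stale tool messages (illustrative only).
// Assumes each message records the dialogue turn that produced it.
interface ToolMessage {
  turn: number;
  tool: string;
  payload: string;        // full tool input/output
  externalPath?: string;  // set once the payload has been offloaded
}

const STALE_AFTER_TURNS = 5; // messages older than this become candidates
const BATCH_EVERY_TURNS = 5; // purge only every fifth turn to keep the prefix stable

function unloadStale(messages: ToolMessage[], currentTurn: number): ToolMessage[] {
  // Only purge on batch boundaries, so between batches the prompt prefix,
  // and therefore the LLM prompt cache, stays byte-for-byte unchanged.
  if (currentTurn % BATCH_EVERY_TURNS !== 0) return messages;

  return messages.map((m) => {
    const isStale = currentTurn - m.turn > STALE_AFTER_TURNS;
    if (!isStale || m.externalPath) return m;
    // Replace the heavy payload with a path plus a brief hint; a real
    // system would write the full content to this external file.
    const externalPath = `/tmp/ctx/turn-${m.turn}-${m.tool}.txt`;
    return { ...m, payload: `[unloaded: see ${externalPath}]`, externalPath };
  });
}
```

The key design point is that purging continuously on every turn would save slightly more tokens but would invalidate the cached prefix each time; purging in batches trades a little extra context for far better cache reuse.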
This matters because Claude‑series models charge 0.1× the base input rate for reading cached tokens but 1.25× for creating cache entries, so keeping the prompt prefix stable directly lowers cost.
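A rough cost calculation shows why cache reuse dominates. The multipliers come from the pricing note above; the base per‑token price and token counts are placeholder assumptions for illustration.

```typescript
// Rough cost model for prompt-caching multipliers:
// cache reads bill at 0.1x the base input price, cache writes at 1.25x.
const CACHE_READ_MULT = 0.1;
const CACHE_WRITE_MULT = 1.25;

function inputCost(
  baseUsdPerToken: number,
  cachedTokens: number, // prefix tokens reused from cache
  freshTokens: number,  // new tokens written into the cache this turn
): number {
  return baseUsdPerToken * (cachedTokens * CACHE_READ_MULT + freshTokens * CACHE_WRITE_MULT);
}

// Assumed base price: $3 per million input tokens (placeholder, not a quote).
const base = 3e-6;
// Reusing a stable 40k-token prefix vs. re-writing all 41k tokens each turn:
const withCacheHit = inputCost(base, 40_000, 1_000);
const withoutCache = inputCost(base, 0, 41_000);
```

Reading a token from cache is 12.5× cheaper than writing it (0.1× vs 1.25×), which is why the unloading policy goes out of its way to avoid perturbing the prefix.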
2. File Deduplication and Summarisation
File contents dominate token usage. The strategy follows an append‑only model: the full file is sent only on the first read; subsequent edits send only diffs; otherwise only the file path is referenced. For large files (>3000 lines), tree‑sitter extracts a concise summary containing type definitions, variable declarations, and function signatures, allowing the model to fetch only the relevant sections.
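The append‑only file strategy above can be sketched as a small decision function. The names (`FileRef`, `referenceFile`) and the diff handling are illustrative assumptions, not the CodeGenius implementation. (The `RuleNode` interface below is the article's example of what a structural summary retains: type definitions and signatures rather than full file bodies.)

```typescript
// Sketch of the append-only file context policy (illustrative names).
type FileRef =
  | { kind: 'full'; path: string; content: string } // first read: full content
  | { kind: 'diff'; path: string; diff: string }    // edited since last read: diff only
  | { kind: 'path'; path: string };                 // already in context: path reference

function referenceFile(
  seen: Set<string>,   // paths already sent in full earlier in the conversation
  path: string,
  content: string,
  diff?: string,       // present only when the file changed since last reference
): FileRef {
  if (!seen.has(path)) {
    seen.add(path);
    return { kind: 'full', path, content }; // pay the full token cost exactly once
  }
  if (diff) return { kind: 'diff', path, diff };
  return { kind: 'path', path };
}
```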
interface RuleNode {
  title: string;
  key: string;
  hitRate: number;
  totalCases: number;
  nodeType: NodeTypeEnum | 'logicGroup';
  logicType?: 'AND' | 'OR';
  isLeaf?: boolean;
  children?: RuleNode[];
}

3. Dynamic Dialogue Summarisation
Even after unloading and deduplication, context still grows with each turn. The system triggers a summarisation step that collapses the entire conversation into a 2‑3 KB structured summary, preserving intent, technical concepts, file references, errors, problem‑solving steps, pending tasks, and current work. The summary follows nine sections (Primary Request, Key Concepts, Files & Code, Errors, Problem Solving, All User Messages, Pending Tasks, Current Work, Optional Next Step) and is generated using the Claude Code Compact prompt.
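The nine sections listed above can be pictured as a typed structure plus a renderer. The field names mirror the section titles from the article; the rendering format and `renderSummary` helper are illustrative assumptions, not the actual Compact prompt output.

```typescript
// The nine-section compact summary, sketched as a typed structure.
interface CompactSummary {
  primaryRequest: string;      // Primary Request
  keyConcepts: string[];       // Key Concepts
  filesAndCode: string[];      // Files & Code (paths plus relevant snippets)
  errors: string[];            // Errors
  problemSolving: string[];    // Problem Solving
  allUserMessages: string[];   // All User Messages, condensed
  pendingTasks: string[];      // Pending Tasks
  currentWork: string;         // Current Work
  optionalNextStep?: string;   // Optional Next Step
}

// Render into the ~2-3 KB text block that replaces the full history.
function renderSummary(s: CompactSummary): string {
  const section = (title: string, body: string) => `## ${title}\n${body}`;
  return [
    section('Primary Request', s.primaryRequest),
    section('Key Concepts', s.keyConcepts.join('\n')),
    section('Files & Code', s.filesAndCode.join('\n')),
    section('Errors', s.errors.join('\n')),
    section('Problem Solving', s.problemSolving.join('\n')),
    section('All User Messages', s.allUserMessages.join('\n')),
    section('Pending Tasks', s.pendingTasks.join('\n')),
    section('Current Work', s.currentWork),
    ...(s.optionalNextStep ? [section('Optional Next Step', s.optionalNextStep)] : []),
  ].join('\n\n');
}
```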
4. Compression Triggers
When context usage reaches ~70 % of the model window, compression runs pre‑emptively.
If a new user topic is unrelated to the existing context, the system compresses history to free space for the fresh task.
Compression only occurs when the token saving exceeds the cost of generating the summary (typically when history >3 K tokens).
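The three triggers above combine into one gate. This is a minimal sketch: the 70 % fraction and the 3 K‑token floor come from the text, while the function shape and how `topicChanged` is detected are assumptions.

```typescript
// Sketch of the compression gate (thresholds from the article, names illustrative).
const WINDOW_FRACTION = 0.7;      // pre-emptive trigger at ~70% of the model window
const MIN_HISTORY_TOKENS = 3_000; // below this, the summary costs more than it saves

function shouldCompress(opts: {
  usedTokens: number;    // tokens currently occupying the context
  windowTokens: number;  // model context window size
  historyTokens: number; // tokens that compression could reclaim
  topicChanged: boolean; // new user request unrelated to existing context
}): boolean {
  const { usedTokens, windowTokens, historyTokens, topicChanged } = opts;
  // Cost-benefit floor: never compress when the saving can't exceed
  // the cost of generating the summary itself.
  if (historyTokens <= MIN_HISTORY_TOKENS) return false;
  // Trigger 1: approaching the window limit. Trigger 2: topic switch.
  return usedTokens >= windowTokens * WINDOW_FRACTION || topicChanged;
}
```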
Observed Benefits
Significant increase in prompt‑cache hit rate, lowering inference cost.
Improved generation quality for complex, multi‑file, multi‑step development tasks.
Average token consumption dropped due to deduplication, diff‑only updates, and structural summaries.
Future Directions
Context isolation via Sub‑Agent mechanisms to prevent unrelated tasks from contaminating the main context.
Hierarchical memory tiers: short‑term prompt, mid‑term structured summaries, long‑term external knowledge bases.
Dynamic policy optimisation that automatically adjusts thresholds and compression intensity based on context size and task complexity.
References
https://github.com/Yuyz0112/claude-code-reverse/blob/main/results/prompts/compact.prompt.md
https://drive.google.com/file/d/1QGJ-BrdiTGslS71sYH4OJoidsry3Ps9g/view
https://aider.chat/2023/10/22/repomap.html
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.