Agent Architecture Design Part 1: Context Compression Strategies and Their Use Cases
The article explains why large‑model agents need context compression, outlines five engineering‑level schemes (both lossless and lossy), demonstrates each with concrete XML snippets and step‑by‑step reasoning, and advises using lossless methods before resorting to lossy prompt‑driven compression.
1. Why Context Compression?
Large‑model context windows are limited; even models with million‑token windows cannot hold hundreds of Agent Loop rounds or multi‑page PDFs. Moreover, Chroma's "Context Rot" research shows that LLM performance degrades as context length grows, so compression is required for stable agents.
2. Compression Schemes Overview
The article presents five engineering‑level compression methods, divided into lossless and lossy categories.
2.1 Lossless Scheme 1 – Omit Redundant I/O
When the input and output of a function share the same structure (e.g., create_todo_list), the repeated parts can be removed. The original XML snippet of round 0 is reduced to a compact form that keeps only the essential parameters and a brief result message.
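This deduplication rule can be sketched in a few lines of Python; the helper name and record shape below are hypothetical illustrations, not code from the article:

```python
def compress_round(func_name, params, result):
    """Drop result fields whose values merely echo the call's parameters."""
    deduped = {k: v for k, v in result.items() if params.get(k) != v}
    return {"function": func_name, "parameters": params, "result": deduped}

# The created list's ID already appears in the parameters, so only the
# short confirmation message survives in the stored round.
round0 = compress_round(
    "create_todo_list",
    {"id": "todo001", "steps": ["search trends", "organize results", "write report"]},
    {"id": "todo001", "message": "To-do list created, ID is todo001"},
)
```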
<Round0>
<ExecutionID>op00000</ExecutionID>
<Function>create_todo_list</Function>
<Parameters>{"id":"todo001","steps":["Search for the latest AI development trends","Organize the search results","Write a summary report"]}</Parameters>
<Result>{"message":"To-do list created, ID is todo001"}</Result>
</Round0>
2.2 Lossless Scheme 2 – Omit Long‑Text Results After n+3 Rounds
For functions that return large text (e.g., search_with_google), the result is replaced with a placeholder once the loop has advanced three or more rounds past it (the "n+3" rule). The placeholder tells the model that the full result can still be retrieved with view_return_value(execution_id).
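A minimal sketch of the n+3 rule, assuming each round is stored as a plain dict (the record shape and thresholds are illustrative assumptions):

```python
PLACEHOLDER = ("Result omitted; if needed, use "
               "view_return_value(execution_id) to view it")

def elide_stale_results(rounds, current_round, window=3, max_len=500):
    """Swap in the placeholder for long results more than `window` rounds old."""
    compressed = []
    for r in rounds:
        stale = current_round - r["round"] > window
        if stale and len(str(r["result"])) > max_len:
            r = {**r, "result": PLACEHOLDER}  # copy, don't mutate the log
        compressed.append(r)
    return compressed
```

Because the raw return values remain retrievable via view_return_value, nothing is lost; only the live context shrinks.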
<Round1>
<ExecutionID>op12345</ExecutionID>
<Function>search_with_google</Function>
<Parameters>{"query":"latest AI development trends","num_results":5}</Parameters>
<Result>Result omitted; if needed, use view_return_value(execution_id) to view it</Result>
</Round1>
2.3 Lossless Scheme 3 – Segment‑Read Long Documents
The file_reader tool can be instructed to read only a line range (e.g., begin_line to end_line) with a maximum line count, preventing the entire document from exploding the context.
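A sketch of such a tool; the signature mirrors the article's parameters, while the implementation (and the extra total_lines field, which helps the agent plan follow-up reads) is an assumption:

```python
def file_reader(file_path, begin_line=0, end_line=None, max_range=10):
    """Return a bounded slice of a file instead of the whole document."""
    with open(file_path, encoding="utf-8") as f:
        lines = f.readlines()
    if end_line is None:
        end_line = begin_line + max_range
    # max_range caps the slice even if the caller asks for a wider window.
    end_line = min(end_line, begin_line + max_range, len(lines))
    return {"content": "".join(lines[begin_line:end_line]),
            "total_lines": len(lines)}
```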
<Round2>
<ExecutionID>op67890</ExecutionID>
<Function>file_reader</Function>
<Parameters>{"file_path":"ai_trends_summary.txt","begin_line":0,"end_line":5,"max_range":10}</Parameters>
<Result>{"content":"Summary of 2024 AI development trends: ..."}</Result>
</Round2>
2.4 Lossy Scheme 1 – On‑the‑Fly Summarisation (ask_document)
Instead of returning the full text, the agent calls ask_document with a specific question (e.g., “extract the core content in ≤20 characters”). The LLM returns a concise summary, discarding most details.
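ask_document can be sketched as a thin wrapper around any chat-completion call; `llm` below is a stand-in callable, not a specific vendor API:

```python
def ask_document(file_path, question, llm):
    """Answer a question about a document rather than returning its full text.

    `llm` is any callable mapping a prompt string to a completion string.
    """
    with open(file_path, encoding="utf-8") as f:
        document = f.read()
    prompt = (f"Document:\n{document}\n\n"
              f"Question: {question}\n"
              "Answer concisely; the answer will replace the document in context.")
    return {"content": llm(prompt)}
```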
<Round2>
<ExecutionID>op678900</ExecutionID>
<Function>ask_document</Function>
<Parameters>{"file_path":"ai_trends_summary.txt","question":"Extract the article's core content in no more than 20 characters"}</Parameters>
<Result>{"content":"Summary of 2024 AI development trends: content innovation, edge AI, etc."}</Result>
</Round2>
2.5 Lossy Scheme 2 – Prompt‑Driven Compression
When the accumulated Agent Loop context exceeds a threshold (e.g., 60 000 tokens), a custom “compression prompt” is sent to the LLM together with the full context. The prompt asks the model to keep all round identifiers, IDs, function names, parameters, and core result content while removing redundancy, preserving the original XML structure.
Compress the following operation log, preserving the key information of every round (including Round 0 and Rounds 2–4), such as the execution ID, executed function, parameters, and the core content of the execution results, but simplify the descriptions to reduce redundancy. The compressed output should keep the original XML format structure, ...
The example produced by DeepSeek reduces the original ~4000‑token log to about 1200 tokens, demonstrating the trade‑off between size reduction and information loss.
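The trigger logic can be sketched as follows; the token counter is a crude stand-in for the model's real tokenizer, and the prompt is paraphrased from the article:

```python
TOKEN_BUDGET = 60_000  # threshold cited in the article

COMPRESSION_PROMPT = (
    "Compress the following operation log. Keep every round's execution ID, "
    "function, parameters, and the core content of each result; simplify "
    "descriptions to reduce redundancy; preserve the original XML structure.\n\n{log}"
)

def count_tokens(text):
    # Rough heuristic (~4 characters per token); swap in a real tokenizer.
    return len(text) // 4

def maybe_compress(context, llm):
    """Ask the model to rewrite its own history once it exceeds the budget."""
    if count_tokens(context) <= TOKEN_BUDGET:
        return context
    return llm(COMPRESSION_PROMPT.format(log=context))
```

This step is lossy by design: the model decides what "core content" means, which is why the article recommends exhausting the lossless schemes first.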
Conclusion
The article enumerates five representative context‑compression techniques for autonomous agents, recommending lossless methods first and resorting to lossy approaches when the context still exceeds practical limits. Future posts will extend the toolbox with additional strategies.
AI Tech Publishing
In the fast-evolving AI era, we thoroughly explain stable technical foundations.