Mastering Context Engineering for AI Agents: KV-Cache, Tool Management, and Error Handling

Peak, co‑founder of Manus, shares practical lessons on building AI agents through context engineering, emphasizing KV‑cache optimization, stable prompt prefixes, controlled tool selection, file‑system memory, attention‑directed todo lists, and preserving error traces to improve robustness and scalability.

DataFunTalk

Jul 19, 2025

01. Designing Around KV‑Cache

KV‑cache hit rate is the single most important metric for production‑grade AI agents because it directly affects both latency and cost. Context grows with each iteration while outputs remain short, giving a high input‑to‑output token ratio (≈100:1 for Manus). Because consecutive requests share a long identical prefix, reusing the KV‑cache for that prefix cuts time‑to‑first‑token and inference cost sharply: with Claude Sonnet, for example, cached input tokens cost $0.30 per million versus $3.00 per million uncached, a 10x difference.
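The effect of hit rate on cost can be checked with simple arithmetic. The sketch below is only an estimate: it uses the two Claude Sonnet prices quoted above, ignores any cache‑write surcharge and output tokens, and the 10‑million‑token workload is a made‑up figure for illustration.

```python
# Prices per million input tokens, taken from the Claude Sonnet figures quoted above.
CACHED_PRICE_PER_MTOK = 0.30
UNCACHED_PRICE_PER_MTOK = 3.00


def input_cost(total_input_tokens: int, cache_hit_rate: float) -> float:
    """Estimated input cost in USD for a cache hit rate between 0.0 and 1.0."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = total_input_tokens * cache_hit_rate
    uncached = total_input_tokens - cached
    total = cached * CACHED_PRICE_PER_MTOK + uncached * UNCACHED_PRICE_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: 10 million input tokens over an agent run.
no_cache = input_cost(10_000_000, 0.0)       # every token billed at the uncached price
mostly_cached = input_cost(10_000_000, 0.9)  # 90% of tokens served from cache
print(f"no cache: ${no_cache:.2f}, 90% hit rate: ${mostly_cached:.2f}")
```

Under these assumptions the input bill falls from about $30 to about $5.70, which is why anything that invalidates the shared prefix is expensive.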

02. Constraining Action Selection via Masking Instead of Removal

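As the agent’s capability set expands, the action space can explode. Dynamically adding or removing tools during a run is costly for two reasons: tool definitions usually sit near the front of the context, so changing them invalidates the KV‑cache for everything that follows, and earlier actions and observations may then refer to tools that are no longer defined, which can lead the model to hallucinate or select invalid actions. Manus instead uses a context‑aware state machine that masks the logits of disallowed tool tokens during decoding, keeping the tool definitions in the context but preventing their selection.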
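A minimal sketch of the masking idea follows. It is a simplification, not the Manus implementation: real decoders mask token‑level logits, whereas this example scores whole tool names, and the tool names, states, and scores are hypothetical.

```python
import math

# Hypothetical agent states and the tools each state allows.
ALLOWED_BY_STATE = {
    "awaiting_user_reply": set(),                   # no tool call permitted
    "browsing": {"browser_open", "browser_click"},
    "coding": {"shell_exec", "file_write"},
}


def mask_logits(logits: dict, state: str) -> dict:
    """Set the score of every disallowed tool to -inf; definitions stay untouched."""
    allowed = ALLOWED_BY_STATE[state]
    return {
        name: (score if name in allowed else -math.inf)
        for name, score in logits.items()
    }


def pick_tool(logits: dict, state: str):
    """Greedy choice over the masked scores; None if nothing is allowed."""
    masked = mask_logits(logits, state)
    best = max(masked, key=masked.get)
    return best if masked[best] != -math.inf else None


# Made-up scores: the model prefers shell_exec, but the state forbids it.
scores = {"browser_open": 1.0, "browser_click": 0.5, "shell_exec": 3.0, "file_write": 2.0}
print(pick_tool(scores, "browsing"))  # browser_open
```

The full tool list is the same in every state, so the prompt prefix that contains the definitions never changes; only the decoding constraint does.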

Typical function‑call modes (illustrated with NousResearch’s Hermes format) are:

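Auto – the model may or may not call a function. Implemented by prefilling only the reply prefix: <|im_start|>assistant

Required – the model must call a function, but the choice of function is unconstrained. Implemented by prefilling up to the tool‑call token: <|im_start|>assistant<tool_call>

Specified – the model must call a function from a specific subset. Implemented by prefilling up to the beginning of the function name, e.g.,

<|im_start|>assistant<tool_call>{"name": "browser_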
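The three modes amount to choosing how much of the assistant reply to prefill. The helper below is an illustrative sketch: the function name `reply_prefill` is an assumption, and exact whitespace around the tokens depends on the chat template in use. In the specified mode the prefill deliberately stops partway through the tool name, so the model can only complete it with a name that shares the prefix.

```python
def reply_prefill(mode: str, name_prefix: str = "") -> str:
    """Return the text to prefill at the start of the assistant turn."""
    base = "<|im_start|>assistant"
    if mode == "auto":
        # The model decides whether to answer directly or call a tool.
        return base
    if mode == "required":
        # The model must emit a tool call but may choose any tool.
        return base + "<tool_call>"
    if mode == "specified":
        # The model must complete a tool name that starts with name_prefix.
        if not name_prefix:
            raise ValueError("specified mode needs a non-empty name_prefix")
        return base + '<tool_call>{"name": "' + name_prefix
    raise ValueError(f"unknown mode: {mode}")


print(reply_prefill("specified", "browser_"))
```

This works only when related tools share a naming prefix, as the browser_ example suggests.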

03. Treating the File System as Unlimited Context

Modern LLMs support 128K+ tokens, yet real‑world agents still exceed this limit with large observations (web pages, PDFs). Instead of aggressive compression, Manus externalizes memory to a persistent file system, allowing the model to read/write files on demand, effectively turning the file system into an external, structured memory.
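One way to picture this is to write each large observation to disk and keep only a small, restorable reference in the context. The sketch below assumes a hypothetical `externalize`/`restore` pair and a temporary directory standing in for the agent's workspace; it is not the Manus implementation.

```python
import hashlib
import tempfile
from pathlib import Path


def externalize(observation: str, workspace: Path, preview_chars: int = 200) -> dict:
    """Write the full observation to a file and return a compact reference."""
    digest = hashlib.sha256(observation.encode("utf-8")).hexdigest()[:12]
    path = workspace / f"obs_{digest}.txt"
    path.write_text(observation, encoding="utf-8")
    # Only this small dictionary needs to stay in the model's context.
    return {
        "path": str(path),
        "chars": len(observation),
        "preview": observation[:preview_chars],
    }


def restore(reference: dict) -> str:
    """Read the full observation back when the agent needs it again."""
    return Path(reference["path"]).read_text(encoding="utf-8")


workspace = Path(tempfile.mkdtemp())                 # stand-in for the agent sandbox
page = "<html>" + "lorem ipsum " * 5000 + "</html>"  # fake large web page
ref = externalize(page, workspace)
print(ref["path"], ref["chars"])
```

Because the path is kept, nothing is lost: the context shrinks, but the full content can be read back on demand.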

04. Re‑writing Todo Lists to Control Attention

Manus creates a todo.md file for each complex task and updates it step by step, checking off completed items as it goes. By repeatedly rewriting the list, the agent recites its objectives at the end of the context, which keeps the overall goal within the model’s most recent attention span and prevents “mid‑task drift.”
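A minimal sketch of the recitation step, assuming a simple checklist layout for todo.md (the file format and the helper names `render_todo` and `recite` are illustrative, not taken from Manus):

```python
def render_todo(goal: str, steps: list) -> str:
    """Render the goal and (text, done) steps as a checklist."""
    lines = [f"# Goal: {goal}", ""]
    for text, done in steps:
        mark = "x" if done else " "
        lines.append(f"- [{mark}] {text}")
    return "\n".join(lines)


def recite(context: list, goal: str, steps: list) -> list:
    """Append the current todo list to the end of the context.

    Earlier entries are left untouched, so the existing prefix is unchanged.
    """
    return context + [render_todo(goal, steps)]


goal = "Compare three laptop models"
steps = [("Collect spec sheets", True), ("Build comparison table", False)]
context = ["<system prompt>", "<earlier actions and observations>"]
context = recite(context, goal, steps)
print(context[-1])
```

The latest copy of the list always sits at the tail of the context, where the goal is hardest for the model to lose track of.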

05. Preserving Errors Instead of Hiding Them

Agents inevitably make mistakes: hallucinations, tool failures, or unexpected edge cases. Erasing these failures removes the evidence the model needs to adapt. Manus keeps failed actions and their error traces in the context, so the model can update its beliefs and becomes less likely to repeat the same mistake.
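In code, this means recording the failure as an ordinary observation rather than retrying silently. The sketch below uses a hypothetical `run_tool` wrapper and a toy `divide` tool; the context entry layout is an assumption made for illustration.

```python
import traceback


def run_tool(tool, args: dict, context: list) -> None:
    """Run a tool and record the outcome, including failures, in the context."""
    context.append({"role": "action", "tool": tool.__name__, "args": args})
    try:
        result = tool(**args)
        context.append({"role": "observation", "ok": True, "content": str(result)})
    except Exception as exc:
        # Keep the error text instead of hiding it, so the model can see what failed.
        message = "".join(traceback.format_exception_only(type(exc), exc)).strip()
        context.append({"role": "observation", "ok": False, "content": message})


def divide(a: float, b: float) -> float:
    """Toy tool used only for this example."""
    return a / b


context = []
run_tool(divide, {"a": 6, "b": 3}, context)
run_tool(divide, {"a": 1, "b": 0}, context)  # fails, and the failure is kept
for entry in context:
    print(entry)
```

The failed call and its error message remain visible on the next turn, which gives the model a reason to try something different.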

06. The Few‑Shot Backfire

Few‑shot prompting can unintentionally lock the agent into repetitive patterns. When the context contains many similar action‑observation pairs, the model tends to mimic that pattern even if it’s suboptimal. Manus mitigates this by injecting small, structured variations—different serialization templates, alternative phrasings, or order changes—to break monotony and keep the agent robust.
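One possible form of such variation is to rotate between a few equivalent serialization templates when new action‑observation pairs are written. The templates and the `serialize` helper below are hypothetical; the source does not say which templates Manus uses.

```python
import json
import random

# Hypothetical templates that carry the same information in different shapes.
TEMPLATES = [
    lambda action, obs: f"Action: {action}\nObservation: {obs}",
    lambda action, obs: f"[tool] {action}\n[result] {obs}",
    lambda action, obs: json.dumps({"action": action, "observation": obs}),
]


def serialize(action: str, observation: str, rng: random.Random) -> str:
    """Serialize one action-observation pair with a randomly chosen template."""
    template = rng.choice(TEMPLATES)
    return template(action, observation)


rng = random.Random(0)  # seeded so runs are reproducible
history = [
    serialize(f"review_resume_{i}", f"score={i}", rng)
    for i in range(5)
]
print("\n\n".join(history))
```

Only newly appended entries are varied; entries already in the context are not rewritten, so the variation does not disturb the existing prefix.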

Conclusion

Context engineering remains a nascent discipline but is essential for reliable AI agents. Even as models become faster and cheaper, raw capability cannot replace external memory, environment interaction, and feedback loops. How you shape the context determines the speed, recoverability, and scalability of your agents.
