Efficient AI Agent Design: Context, Tool Loading & Loop Strategies
This article analyses the architectural choices behind modern AI agents such as OpenClaw and Claude Code, covering context management (append‑only, compression, task isolation), tool loading (tools field vs. prompt embedding, console vs. remote MCP), tool discovery methods (full injection, incremental loading, sub‑agents, vector search, Skill layer), and the trade‑offs between dialogue‑driven and task‑driven main loops, concluding with practical recommendations for building cost‑effective agents.
Introduction
Building an AI agent requires a series of architectural decisions: how to manage context, how to load tools, how to find the right tool, and how to design the main loop. These choices have no single correct answer, and each trade‑off has clear cost implications.
1. Context Management
1.1 Append‑only Context
OpenClaw and Claude Code both use an append‑only context: the entire conversation history is kept in an ever‑growing array and sent as the prompt for every LLM call. This simple design yields high cache‑hit rates, easy implementation, and full conversational continuity, but it forces the model to resend all previous tokens even when the current request is trivial, leading to high token‑based cost.
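The cost dynamics are easy to see in a minimal sketch. Everything here is illustrative: `llm_call` is a stand-in for a real LLM API, and the 4-characters-per-token estimate is a rough heuristic, not either product's actual accounting.

```python
# Minimal append-only agent loop: every call resends the full history.
# llm_call is a placeholder for a real LLM client; token math is a rough estimate.

def estimate_tokens(messages):
    # Crude heuristic: roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def llm_call(messages):
    # Placeholder: a real client would send `messages` to the model here.
    return {"role": "assistant", "content": "ok " * 200}

history = [{"role": "system", "content": "You are a coding agent. " * 50}]
costs = []
for turn in range(3):
    history.append({"role": "user", "content": f"request {turn}"})
    costs.append(estimate_tokens(history))  # the full history is billed on every call
    history.append(llm_call(history))

# Input cost grows monotonically even though each new request is tiny.
assert costs[0] < costs[1] < costs[2]
```

The upside of this pattern is that the prompt prefix is byte-identical across calls, which is exactly what provider-side prompt caching rewards.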
Round 1 LLM call:
┌──────────────────────────────────────────────┐
│ System Prompt │ User: "Refactor the login module" → LLM → reply + tool call │
└──────────────────────────────────────────────┘
~2K tokens
Round 2 LLM call (after the tool result returns):
┌──────────────────────────────────────────────────────────────────┐
│ System Prompt │ User │ Assistant │ Tool Call │ Tool Result │ ... │ → LLM │
└──────────────────────────────────────────────────────────────────┘
~8K tokens
... after 20 rounds of tool calls ...
Round N LLM call:
┌─────────────────────────────────────────────────────────────────────────────┐
│ System │ U │ A │ T │ R │ A │ T │ R │ A │ T │ R │ ... │ User: "Hi" │
└─────────────────────────────────────────────────────────────────────────────┘
│◄──────────────── 80K tokens of history ──────────────────────►│◄─ 2 tokens ──►│
        all resent                                                actual new content
1.2 Compression Strategies
When the context window approaches its limit, both products trigger compression. Claude Code trims tool outputs and prepends a structured summary, keeping recent turns intact. OpenClaw first writes a memory file to disk, then performs staged compression, preserving the memory across sessions.
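A hedged sketch of the Claude Code-style strategy follows. The truncation limit, summary text, and number of preserved turns are made-up parameters for illustration; a real system would generate the summary with the LLM itself.

```python
# Sketch of compression in the Claude Code style: trim old tool outputs,
# prepend a structured summary, keep the most recent turns verbatim.
# The summary string is a placeholder; real systems generate it with the model.

def compress(history, keep_recent=4, tool_result_limit=200):
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = "[Summary of earlier work: tasks completed, current goal]"
    trimmed = [
        {**m, "content": m["content"][:tool_result_limit] + "…"}
        if m["role"] == "tool" and len(m["content"]) > tool_result_limit
        else m
        for m in old
    ]
    return [{"role": "system", "content": summary}] + trimmed + recent

history = (
    [{"role": "user", "content": "start"}]
    + [{"role": "tool", "content": "x" * 5000} for _ in range(10)]
    + [{"role": "user", "content": "latest question"}]
)
compressed = compress(history)
assert compressed[-1]["content"] == "latest question"  # recent turns kept intact
assert len(compressed[2]["content"]) == 201            # old tool output trimmed
```

OpenClaw's variant would additionally write the summary to a memory file before compressing, so the knowledge survives across sessions.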
Claude Code, before compression:
┌─────────────────────────────────────────────────┐
│ System │ U │ A │ T │ R │ ... │ U │ A │ T │ R │
└─────────────────────────────────────────────────┘
~167K tokens
Claude Code, after compression:
┌─────────────────────────────────────────────────┐
│ System │ [compressed summary: X done, Y in progress, last few turns] │
└─────────────────────────────────────────────────┘
~10‑20K tokens
OpenClaw instead writes memory to disk and compresses in stages, persisting it across sessions.
1.3 Task Isolation
Instead of compressing after the fact, tasks can be isolated from the start. Each task receives its own independent context window, preventing unrelated history from polluting the prompt. The downside is the loss of cross‑task continuity, which must be restored via explicit sharing mechanisms.
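One way to sketch this pattern is below. The explicit sharing mechanism (a shared notes store) is an assumption of this sketch; real systems might use memory files or structured handoffs instead.

```python
# Each task gets its own private context window; cross-task continuity must
# be restored explicitly -- here via a shared notes dict (an illustrative choice).

class TaskIsolatedAgent:
    def __init__(self):
        self.contexts = {}      # task_id -> private message history
        self.shared_notes = {}  # explicit cross-task sharing channel

    def run_turn(self, task_id, user_message):
        ctx = self.contexts.setdefault(
            task_id, [{"role": "system", "content": "You are a coding agent."}]
        )
        ctx.append({"role": "user", "content": user_message})
        return ctx  # a real agent would call the LLM with only this context

    def publish(self, task_id, key, value):
        # The only way information crosses task boundaries.
        self.shared_notes[key] = (task_id, value)

agent = TaskIsolatedAgent()
agent.run_turn("refactor-login", "rewrite the auth flow")
agent.run_turn("fix-ci", "the build is red")
# Each context stays small, unpolluted by the other task's history.
assert len(agent.contexts["fix-ci"]) == 2
agent.publish("refactor-login", "auth_module_path", "src/auth.py")
```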
2. Tool Loading
2.1 Conflict Between tools Field and Prompt Cache
The top‑level tools field is serialized separately from messages, but it sits near the front of the cached prompt prefix. Changing the tool list therefore invalidates the prompt cache, forcing the entire message history to be re‑processed.
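Why this happens can be shown with a toy prefix-cache model. Providers differ in the details; the hashing scheme below is illustrative only, not any vendor's actual implementation.

```python
# Toy model of prefix caching: each cache key is a hash over everything up to
# a given point in the request. Because `tools` serializes before `messages`,
# changing the tool list changes every downstream prefix hash.
import hashlib, json

def prefix_hashes(system, tools, messages):
    hashes, h = [], hashlib.sha256()
    for part in [system, json.dumps(tools)] + messages:
        h.update(part.encode())
        hashes.append(h.hexdigest())  # hash of the prefix ending at this part
    return hashes

msgs = [f"msg{i}" for i in range(100)]
a = prefix_hashes("sys", ["t1", "t2", "t3", "t4", "t5"], msgs)
b = prefix_hashes("sys", ["t1", "t2", "t3", "t4", "t5", "t6"], msgs)

# One extra tool: not a single message prefix is reusable.
reusable = sum(x == y for x, y in zip(a, b))
assert reusable == 1  # only the bare system-prompt prefix still matches
```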
Request A (5 tools):
┌───────────────────────────────────────┐
│ system prompt │ tools(5) │ msg1 … msg100 │
└───────────────────────────────────────┘
✅ full cache hit
Request B (1 tool added → 6 tools):
┌────────────────────────────────────────┐
│ system prompt │ tools(6) │ msg1 … msg100 │
└────────────────────────────────────────┘
                    ↑ tools changed, so the whole chain is invalidated
❌ 100K tokens of cached messages invalidated
2.2 Moving Tool Descriptions into the Prompt
Embedding tool descriptions inside messages keeps the cache stable when new tools are added, but sacrifices the structured guarantee of the tools field and requires custom parsing.
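A sketch of the embedding approach is below. The tag format and the call syntax the parser expects are invented for illustration; there is no standard here, which is precisely the drawback of giving up the structured tools field.

```python
# Tool descriptions travel inside messages, so adding a tool only appends to
# the context instead of rewriting the tools field. The cost: tool calls come
# back as plain text and need custom parsing (the format below is made up).
import re

def tool_message(name, description):
    return {"role": "system", "content": f"<tool name='{name}'>{description}</tool>"}

def parse_tool_call(reply_text):
    # Expects e.g.: CALL get_weather(city=Beijing)
    m = re.match(r"CALL (\w+)\((.*)\)", reply_text)
    if not m:
        return None
    args = dict(kv.split("=", 1) for kv in m.group(2).split(",") if kv)
    return {"name": m.group(1), "args": args}

messages = [tool_message("get_weather", "Fetch current weather for a city")]
# Cache-safe: a new tool is appended, so all earlier prefixes survive.
messages.append(tool_message("query_db", "Run a read-only SQL query"))

call = parse_tool_call("CALL get_weather(city=Beijing)")
assert call == {"name": "get_weather", "args": {"city": "Beijing"}}
```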
Current approach (tools field injection):
┌───────────────────────────────────────┐
│ system │ tools [A,B,C] │ messages… │
└───────────────────────────────────────┘
            ↑ change here → everything after it is invalidated
Alternative (tool descriptions embedded in the prompt):
┌───────────────────────────────────────┐
│ system │ messages… │ [tool A description] [tool B description] │
└───────────────────────────────────────┘
                          ↑ append a new tool → all earlier cache entries survive ✅
2.3 Console vs. Remote MCP
A local “console” tool that executes shell commands (e.g., execute) never changes, preserving cache while handling most frequent operations. Remote MCP provides dynamic discovery for complex APIs but incurs cache invalidation when its tool list changes.
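A minimal console tool might look like the sketch below. The allow-list is an illustrative safety choice, not something either product documents; a production agent would need real sandboxing.

```python
# One stable `execute` tool covers many capabilities via shell commands.
# The allow-list is an illustrative guardrail, not a substitute for a sandbox.
import shlex, subprocess

ALLOWED = {"echo", "curl", "psql", "grep", "cat", "sendmail"}

def execute(command: str) -> str:
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        return f"error: '{argv[0] if argv else ''}' is not permitted"
    result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    return result.stdout or result.stderr

# The tool list never changes, so the prompt-cache prefix stays valid forever:
tools = [{
    "name": "execute",
    "description": "Run a shell command and return its output",
    "parameters": {"command": "string"},
}]

assert execute("echo weather for Beijing").strip() == "weather for Beijing"
assert "not permitted" in execute("rm -rf /")
```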
MCP approach (requires N tool definitions):
┌───────────────────────────────────────┐
│ tools: [get_weather, query_db, send_email, …] │
└───────────────────────────────────────┘
Each addition → entire cache invalidated
Console approach (always exactly 1 tool):
┌───────────────────────┐
│ tools: [execute] │ ← never changes, always a cache hit
└───────────────────────┘
Need the weather → execute: curl wttr.in/Beijing
Need a database query → execute: psql -c "SELECT …"
Need to send email → execute: sendmail …
3. Tool Finding
3.1 Four Approaches
1. Full injection of the tools field – simple, but incurs huge token overhead as the list grows.
2. Incremental context loading – preserves cache, but accumulates stale tool descriptions.
3. Sub‑agent lookup – isolates the main context, but may miss broader conversational information.
4. Vector or keyword search – token‑free, yet less accurate at large scale.
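The fourth approach can be sketched with simple keyword scoring. A real system would use embeddings; the tool corpus and scoring function here are invented for illustration.

```python
# Keyword-based tool search: zero tokens spent on tool definitions up front,
# but retrieval quality limits accuracy at scale. The corpus is illustrative.

TOOLS = {
    "get_weather": "fetch current weather forecast for a city",
    "query_db": "run sql query against the postgres database",
    "send_email": "send an email message to a recipient",
}

def find_tools(query, top_k=1):
    q = set(query.lower().split())
    scored = sorted(
        TOOLS,
        key=lambda name: len(q & set(TOOLS[name].split())),
        reverse=True,
    )
    return scored[:top_k]  # only the winners' definitions enter the context

assert find_tools("current weather forecast for Beijing") == ["get_weather"]
```

Only the selected tool's definition is then injected into the prompt, keeping the tools field small and the cache stable.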
3.2 Skill Layer
A “Skill” groups related tools by functionality (e.g., database management) and provides a concise description of how to combine them. Skills act as a cache of tool‑calling knowledge, reducing the search space from dozens of individual APIs to a handful of high‑level capabilities.
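A minimal sketch of skill selection follows. The registry and overlap-based matching are naive illustrations; a real Skill layer would likely match with embeddings and load the chosen skill file (like the database example below) into context.

```python
# Sketch of a Skill layer: skills are short capability descriptions matched to
# a task first; only the chosen skill's tool knowledge is then loaded into
# context. The registry contents and matching logic are illustrative.

SKILLS = {
    "database-management": "manage postgresql databases query write schema import export",
    "log-analysis": "search and summarize application logs for errors",
}

def select_skill(task: str) -> str:
    task_words = set(task.lower().split())
    def overlap(name):
        return len(task_words & set(SKILLS[name].split()))
    # Search space: a handful of skills instead of dozens of individual APIs.
    return max(SKILLS, key=overlap)

assert select_skill("export the users table from postgresql") == "database-management"
assert select_skill("summarize the application logs") == "log-analysis"
```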
# Database Management Skill
## Capabilities
Manage PostgreSQL databases: queries, writes, schema management, data import/export.
## Available Tools
- psql: command‑line client for SQL queries and administration
- pg_dump / pg_restore: backup and restore
## Common Usage
> psql -h $HOST -d $DB -c "SELECT * FROM users WHERE active = true"
> psql -h $HOST -d $DB -c "\COPY (SELECT ...) TO STDOUT CSV HEADER" > output.csv
> psql -h $HOST -d $DB -c "\d+ tablename"
## Notes
- Confirm before write operations to avoid accidental modification
- Add LIMIT to large queries
4. Main Loop Design
4.1 Dialogue‑Driven Loop
All current agents (OpenClaw, ChatGPT, Claude) treat the chat window as both input and output, making the interaction barrier low but tying every action to the conversation context.
4.2 Task‑Driven Loop
The agent perceives events (chat, cron, webhook), thinks (plan, select tools), and acts (execute, call MCP, reply). The core output becomes the reasoning chain rather than the final reply, improving observability and debuggability.
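The perceive–think–act cycle can be sketched as follows. The event sources, handler names, and trace structure are all illustrative assumptions.

```python
# Task-driven main loop: events from any source (chat, cron, webhook) become
# tasks; the reasoning chain, not the final reply, is the primary output.

def perceive(event_queue):
    # Events may come from chat, cron jobs, webhooks, etc.
    return event_queue.pop(0) if event_queue else None

def think(task, trace):
    trace.append(("think", f"plan next action for: {task['goal']}"))

def act(task, trace):
    trace.append(("act", f"execute step for: {task['goal']}"))
    task["done"] = True  # a real agent would decide this from tool results

def run(event_queue):
    trace = []
    while (event := perceive(event_queue)) is not None:
        task = {"goal": event["goal"], "done": False}
        while not task["done"]:
            think(task, trace)
            act(task, trace)
    return trace  # the full reasoning chain, for observability and debugging

trace = run([{"source": "cron", "goal": "rotate logs"},
             {"source": "chat", "goal": "investigate error"}])
assert [kind for kind, _ in trace] == ["think", "act", "think", "act"]
```

Because the trace records every think/act step, failures can be debugged by replaying the chain rather than guessing from a final reply.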
Task‑driven example:
Task: investigate a production error
Think: need to inspect the error log first
Act: execute("grep -n ERROR app.log")
Think: found a database connection timeout; check the connection‑pool configuration
Act: execute("cat config/db.yaml")
Think: the pool's max_idle is set too low; inform the user
Act: chat.send("Found the problem: …")
Task complete
4.3 Practical Trade‑offs
Large language models are fine‑tuned for dialogue, so they naturally prefer answering directly over invoking tools. Achieving a pure task‑driven loop requires system‑prompt engineering and future model training that emphasizes tool usage.
Conclusion
The four key design decisions—context management, tool loading, tool finding, and main‑loop architecture—are inter‑dependent. Task isolation influences context strategy; loading choices affect how Skills are organized; the loop design determines which sources can generate tasks. Understanding these trade‑offs is essential for building efficient, cost‑effective AI agents.
vivo Internet Technology
Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.
