2025’s Hottest Agent Architecture Patterns: A Deep Technical Summary
The article surveys emerging 2025 agent architecture patterns—including giving agents a computer, multi‑layer action spaces, progressive disclosure, context offloading, caching, sub‑agent isolation, evolving context, and multi‑agent coordination—backed by citations from Meta, Anthropic, and open‑source projects.
1. Giving agents a computer
Agents are defined as LLM‑driven systems that can autonomously direct their own actions [8]. Providing file‑system access and a shell gives agents persistent context and the ability to run built‑in tools, CLIs, preset scripts, or custom code. Claude Code runs on a real computer, while Manus uses a virtual computer [9][10]. The core abstraction for coding agents is the command‑line interface (CLI), because agents need OS‑level access; interpreting Claude Code as an “AI of the operating system” captures this view [11].
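To make the computer layer concrete, here is a minimal sketch of a single atomic shell tool; the schema shape, the /workspace path, and the truncation limit are assumptions for illustration, not any particular vendor's implementation.

```python
import subprocess

# Hypothetical atomic "bash" tool: one definition gives the agent OS-level
# access to the shell, the file system, CLIs, preset scripts, and custom code.
BASH_TOOL = {
    "name": "bash",
    "description": "Run a shell command in the agent's workspace and return its output.",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}

def run_bash(command: str, workdir: str = "/workspace", timeout: int = 120) -> str:
    """Execute the command on the agent's (real or virtual) computer."""
    result = subprocess.run(
        command, shell=True, cwd=workdir,
        capture_output=True, text=True, timeout=timeout,
    )
    # Truncate so one noisy command cannot flood the context window.
    return (result.stdout + result.stderr)[-4000:]
```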
2. Multi‑layer action space
Agents invoke tools to act. Model Context Protocol (MCP) makes tool definitions easy to add [12], but scaling MCP overloads the context window, and intermediate tool results consume extra tokens. A GitHub MCP server with 35 tools occupies ~26k tokens [14]. Overlapping tools can also confuse the model [15]. Popular general‑purpose agents keep tool counts low: Claude Code uses about a dozen tools [16], Manus fewer than 20 [17], and Amp Code curates a small set [18].
One solution pushes actions from the tool‑call layer down to the computer layer. Manus employs a hierarchical action space: a few atomic tools (e.g., a bash tool) run on a virtual computer, which can in turn invoke shell tools, CLIs, or execute code. The CodeAct paper shows that agents can chain many actions by writing and executing code, saving tokens because intermediate results stay in the sandbox instead of flowing back through the context window [19][20]. Claude Code examples illustrate this approach [21].
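A rough sketch of the CodeAct‑style pattern, assuming a Python sandbox and the GitHub `gh` CLI as an example tool; the variable names and the specific command are illustrative only.

```python
import subprocess
import sys

# Code the agent might emit as a single action. Instead of three separate tool
# calls whose intermediate results all re-enter the context window, the chain
# runs inside the sandbox and only the compact final answer is printed back.
AGENT_EMITTED_CODE = """
import json, subprocess

# Intermediate result stays in a local variable, not in the model context.
issues = json.loads(subprocess.run(
    ["gh", "issue", "list", "--json", "number,title", "--limit", "100"],
    capture_output=True, text=True, check=True).stdout)

# Filter locally instead of asking the model to re-read 100 issues.
bugs = [i for i in issues if "bug" in i["title"].lower()]

# Only this short summary is returned to the agent's context.
print(f"{len(bugs)} bug-like issues:", [i["number"] for i in bugs])
"""

def execute_in_sandbox(code: str) -> str:
    """Run agent-written code on the (virtual) computer and capture its output."""
    proc = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, timeout=120
    )
    return proc.stdout or proc.stderr
```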
3. Progressive disclosure
Instead of loading all tool definitions into context up front, progressive disclosure exposes only the necessary information initially and reveals more detail on demand. Some agents index tool definitions and retrieve them via a search tool [22][23]. For shell tools, Manus lists the available tools in the agent instruction and uses the --help flag to discover a tool’s full interface only when it is needed [24].
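A small sketch of --help‑based discovery; the CLI names and the output cap are placeholders.

```python
import subprocess

# The agent instruction only lists tool names; full usage text is pulled on
# demand, so unused tools never cost context tokens.
AVAILABLE_CLIS = ["ffmpeg", "pandoc", "jq"]  # short list placed in the system prompt

def discover_tool(name: str) -> str:
    """Fetch a tool's full interface only when the agent decides to use it."""
    if name not in AVAILABLE_CLIS:
        raise ValueError(f"unknown tool: {name}")
    out = subprocess.run([name, "--help"], capture_output=True, text=True)
    # Some CLIs print help to stderr; cap what flows back into the context.
    return (out.stdout or out.stderr)[:2000]
```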
MCP tools can likewise be pushed from the tool‑call layer down to the computer layer, enabling progressive disclosure. Cursor Agent synchronizes MCP tool descriptions to a folder, giving agents a short list of usable tools and letting them read the full descriptions only when required [25]. Anthropic and Cloudflare discuss similar MCP management strategies [26][27]. Anthropic’s “skill standard” loads SKILL.md files on demand, with YAML metadata surfaced in the instruction and the full file read only when needed [28][29][30].
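The skill pattern can be sketched roughly as follows, assuming SKILL.md files with a YAML front‑matter block containing `name` and `description` fields; the directory layout and the hand‑rolled front‑matter parsing are simplifications.

```python
from pathlib import Path

def skill_index(skills_dir: str = "skills") -> list[dict]:
    """Collect lightweight metadata only; this is all the agent sees up front."""
    index = []
    for path in Path(skills_dir).glob("*/SKILL.md"):
        text = path.read_text()
        if not text.startswith("---"):
            continue
        header = text.split("---", 2)[1]  # naive YAML front-matter extraction
        meta = dict(
            line.split(":", 1) for line in header.strip().splitlines() if ":" in line
        )
        index.append({
            "name": meta.get("name", "").strip(),
            "description": meta.get("description", "").strip(),
            "path": str(path),
        })
    return index

def load_skill(path: str) -> str:
    """Read the full SKILL.md body only when the agent actually needs the skill."""
    return Path(path).read_text()
```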
4. Offloading context
Agents can offload context to the file system. Manus writes old tool results to files and only summarizes when the marginal benefit declines [31]. Cursor Agent similarly offloads tool results and agent trajectories to the file system, reading them back when needed [32]. This mitigates concerns about “context compression” losing useful information [33][34]. Storing plans or progress in files also enables long‑running agents to reinforce goals or verify their work [35][36].
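A minimal sketch of the offloading pattern; the directory, preview size, and naming scheme are arbitrary choices for illustration.

```python
import hashlib
from pathlib import Path

OFFLOAD_DIR = Path(".agent/tool_results")  # illustrative location in the workspace

def offload_result(tool_name: str, output: str, preview_chars: int = 500) -> str:
    """Persist a large tool output to disk; only a pointer plus a short preview
    is appended to the message list, and the agent can read the file back later."""
    OFFLOAD_DIR.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha1(output.encode()).hexdigest()[:8]
    path = OFFLOAD_DIR / f"{tool_name}-{digest}.txt"
    path.write_text(output)
    return (
        f"[full result saved to {path}; {len(output)} chars]\n"
        f"preview:\n{output[:preview_chars]}"
    )
```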
5. Caching context
Agents typically manage context as a linear message list, appending each action. Modifying the chat history by adding or removing blocks is a promising direction, but without prompt caching the costs can become prohibitive. Prompt caching lets agents reuse a previously processed prompt prefix instead of recomputing it [38]. Manus notes that cache‑hit rate is the most important metric for production agents, and that high‑capacity models with caching can be cheaper than low‑cost models without it [39]. Without caching, the cost of coding agents like Claude Code would be prohibitive [40].
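As one example, a cache‑friendly request shape modeled on Anthropic‑style prompt caching might look like the sketch below; the model name, placeholder variables, and exact field values should be treated as illustrative rather than prescriptive.

```python
import anthropic

client = anthropic.Anthropic()

LONG_STABLE_INSTRUCTION = open("system_prompt.md").read()  # never edited mid-session
TOOL_DEFINITIONS: list[dict] = []  # the agent's small, fixed tool set

def agent_turn(messages: list[dict]):
    """One agent step. The long, stable prefix (tools + system prompt) is marked
    cacheable; the message list is append-only so the prefix stays byte-identical
    across turns and keeps hitting the cache."""
    return client.messages.create(
        model="claude-sonnet-4-5",          # illustrative model name
        max_tokens=2048,
        system=[{
            "type": "text",
            "text": LONG_STABLE_INSTRUCTION,
            "cache_control": {"type": "ephemeral"},  # cache boundary after the stable prefix
        }],
        tools=TOOL_DEFINITIONS,
        messages=messages,                  # append-only: no reordering or deletion mid-session
    )
```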
6. Isolating context
Many agents delegate tasks to sub‑agents with independent context windows, tools, and instructions, enabling parallelizable work. Claude Code uses sub‑agents for code‑review tasks, a MapReduce‑style pattern [41]. Long‑running agents also isolate context: the “Ralph Wiggum” loop repeatedly runs an agent until a plan is satisfied, storing progress in files and communicating via git history [42][43]. Anthropic describes a version of the Ralph loop where a parent agent sets up the environment and sub‑agents handle individual tasks [44]. Claude Code validates work after each Ralph iteration using stop hooks [45].
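A rough sketch of a Ralph‑style loop, assuming a PLAN.md checklist and the Claude Code CLI's non‑interactive -p mode; the prompt wording and iteration cap are made up for illustration.

```python
import subprocess

def ralph_loop(plan_path: str = "PLAN.md", max_iterations: int = 50) -> None:
    """Re-run a fresh agent against the same plan file until every task is
    checked off; progress lives in files and git history, not in any single
    context window."""
    for i in range(max_iterations):
        plan = open(plan_path).read()
        if "- [ ]" not in plan:  # no unchecked boxes left
            print("plan complete")
            return
        # Each iteration starts with a clean context; the prompt points at the plan.
        subprocess.run(
            ["claude", "-p",
             f"Read {plan_path}, complete the next unchecked task, "
             f"mark it done, and stage your changes."],
            check=False,
        )
        subprocess.run(["git", "add", "-A"], check=False)
        subprocess.run(["git", "commit", "-m", f"ralph iteration {i}"], check=False)
```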
7. Evolving context
Continual learning aims to let agents improve over time by updating their context (not model weights). Deployment failures often stem from an inability to adapt [46]. Letta AI discusses “continual learning in token space,” where agents reflect on past trajectories to update context [47]. The GEPA method collects agent trajectories, scores them, reflects on failures, and generates task‑specific prompt variants for further testing [48]. Open‑memory learning follows similar patterns, distilling sessions into diary entries and updating files such as CLAUDE.md or SKILL.md [49][50][51]. Skill‑learning examples show agents extracting reusable programs from trajectories and saving them as new skills [52][53].
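The diary‑style variant can be sketched roughly as below, assuming a CLAUDE.md memory file; `summarize` is a stand‑in for whatever reflection prompt or model a real system would use.

```python
from datetime import date
from pathlib import Path

def summarize(trajectory: str) -> str:
    """Placeholder for an LLM reflection call over the session trajectory."""
    return trajectory[-500:]  # stub: a real system would distill lessons, not truncate

def distill_session(trajectory: str, memory_file: str = "CLAUDE.md") -> None:
    """Append a dated diary entry so later sessions start with the lesson in context."""
    lesson = summarize(trajectory)
    entry = f"\n## Lessons learned ({date.today().isoformat()})\n{lesson}\n"
    with Path(memory_file).open("a") as f:
        f.write(entry)
```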
8. Future directions
Emerging patterns include giving agents a computer, pushing actions to the computer layer, offloading context with progressive disclosure, isolating context via sub‑agents, evolving context for memory or skill learning, and caching to save cost and latency. Numerous unresolved challenges are expected to persist over the next year.
9. Learning‑oriented context management
Context management may involve handcrafted compression prompts, generated sub‑agents, deciding when and what to offload, and evolving context over time. The “bitter lesson” predicts that scaling compute and model size will outpace handcrafted methods [54]. Jeffrey Huber proposes ideas beyond compression [55]. “Sleep‑time computation” shows that agents can reflect on their context offline, revisiting past sessions to update memory or skills [56].
10. Multi‑agent coordination
As agents take on larger tasks, many agents working concurrently will become common. Current agents struggle to share context, so agents acting in parallel without explicit communication can make conflicting decisions [57]. The “Gas Town” project demonstrates a multi‑agent coordinator with git‑backed work tracking, a “mayor” agent that holds full workspace context, and role‑specialized Claude Code instances coordinated via a merge queue [58].
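A toy sketch of merge‑queue coordination; the branch handling and conflict policy are invented for illustration and are not how Gas Town itself is implemented.

```python
import subprocess

def merge_queue(branches: list[str], target: str = "main") -> None:
    """Worker agents each commit to their own branch; a single coordinator merges
    the branches one at a time so parallel agents never write to the shared
    workspace concurrently."""
    for branch in branches:
        subprocess.run(["git", "checkout", target], check=True)
        merged = subprocess.run(["git", "merge", "--no-ff", branch], check=False)
        if merged.returncode != 0:
            # Conflict: abort and hand the branch back to its agent (or a reviewer agent).
            subprocess.run(["git", "merge", "--abort"], check=False)
            print(f"{branch}: conflict, returned for rework")
        else:
            print(f"{branch}: merged into {target}")
```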
11. Abstractions for long‑running agents
Long‑running agents need new infrastructure: observability, human‑review hooks, and graceful degradation. Claude Code uses stop hooks to verify work after each iteration [59], and the Ralph loop tracks progress via git history. No consensus yet exists on generic debugging interfaces or on monitoring human‑machine collaboration, which points to a need for new abstractions as agent runs grow longer.
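One piece of that infrastructure can be sketched simply: an append‑only trace of every agent step, so a human (or a reviewer agent) can replay and debug a session long after the context window that produced it is gone. The file name and event fields here are arbitrary.

```python
import json
import time
from pathlib import Path

TRACE_FILE = Path("agent_trace.jsonl")  # illustrative location

def record_step(step_type: str, payload: dict) -> None:
    """Append one structured event per agent step for later inspection."""
    event = {"ts": time.time(), "type": step_type, **payload}
    with TRACE_FILE.open("a") as f:
        f.write(json.dumps(event) + "\n")

# Example usage:
# record_step("tool_call", {"tool": "bash", "command": "pytest -q"})
# record_step("stop_hook", {"validation": "tests", "passed": True})
```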