Artificial Intelligence 17 min read

Learning Agent Architecture from Giants: Blueprint of Hermes and Claude Code

The article breaks down a six‑layer agent architecture—entry, core loop, tool ecosystem, memory & learning, scheduling & orchestration, and output delivery—illustrating how Hermes and Claude Code implement each layer and offering guidance on choosing the right framework for specific needs.

AI Step-by-Step

May 24, 2026

Learning Agent Architecture from Giants: Blueprint of Hermes and Claude Code

AI agents such as Claude Code, Codex, OpenClaw, and Hermes have demonstrated strong capabilities, prompting the need for a systematic way to design new agents by learning from these mature implementations.

Six‑Layer Agent Architecture Overview

The author extracts a six‑layer model from Hermes and the open‑source Claude Code, where each layer addresses a distinct engineering concern:

0. Entry Layer – how tasks reach the agent.

1. Core Loop – prompt assembly, model invocation, tool execution, and result injection.

2. Tool Ecosystem – built‑in tools, MCP integration, and tool registry.

3. Memory & Learning – persistent memory, session storage, and skill system.

4. Scheduling & Orchestration – background processes, cron jobs, and delegate tasks.

5. Output Delivery – returning results via gateways, file system, or TTS.

1. Entry Layer

Agents can receive tasks through several channels:

CLI / TUI – Hermes provides a zero‑latency terminal interface with commands such as hermes config set, hermes tools and hermes setup.

Gateway (multi‑platform) – supports Telegram, Discord, Slack, WhatsApp, Teams, SMS, WebChat, etc., normalising message formats, media handling, acknowledgements and rate limits.

ACP (Agent Communication Protocol) – a standardized agent‑to‑agent protocol that lets Hermes call external coding agents (e.g., Claude Code) as sub‑agents.

API / Cron – an HTTP server for REST calls and a cron scheduler that triggers non‑interactive sessions and routes results back through the appropriate gateway.

2. Core Loop

The heart of the agent is a closed loop that repeats for each user message:

Prompt Builder → Provider Resolution → Model Call → Tool Parsing → Tool Execution → Result Injection → Prompt Builder . The loop may iterate multiple times when tools are invoked.

Prompt Builder assembles a stable system‑prompt prefix containing persona, tool rules, frozen snapshots of MEMORY.md and USER.md, skill indexes, project context, timestamps and platform hints. A stable prefix enables prompt caching on Anthropic and OpenAI, reducing latency and cost. Claude Code uses a more aggressive fixed system prompt, injecting project‑level preferences via CLAUDE.md.

Context Manager guards the token budget. When history exceeds the limit, it compresses the middle of the conversation while preserving the head and tail, and tracks prompt‑caching hit rates. Hermes follows a “protect head & tail, compress middle” policy; Claude Code and Codex trigger a simple automatic compression near the context limit.

Provider Resolution selects the model channel (Anthropic API, OpenAI‑compatible endpoint, OpenRouter, DeepSeek, local llama.cpp, etc.) based on explicit runtime flags, config.yaml, environment variables, or defaults, and assembles the request format, base URL, credentials, and fallback chain, avoiding hidden overrides.

3. Tool Ecosystem

Agents gain capability through tools, sourced from three avenues:

Built‑in tools – Hermes ships with over 70 tools covering terminal, file I/O, browser automation, web search, git, GitHub PR/Issue/Review, memory, session search, skill management, cron jobs, delegation, vision analysis, text‑to‑speech, etc., each with a defined input‑output schema and invoked via function calling.

MCP integration – the Model Context Protocol (Anthropic) standardises external tool servers. Hermes supports stdio and HTTP transports, configurable in config.yaml, and can filter exposed tools for safety.

Tool Registry – registers, groups (toolsets), and controls permissions for tools. Hermes organises tools into domains (browser, terminal, file, web, search, skills, memory) and enables per‑scenario activation, limiting exposure for security.

Skill encapsulation – higher‑level workflows that chain multiple tool calls. For example, a “Write WeChat article” skill orchestrates topic classification, web search, source filtering, style checking, and HTML generation, injected as context for the agent.

4. Memory & Learning

Modern agents differ from the “reset each turn” model by persisting knowledge.

Persistent memory – MEMORY.md stores factual experience (e.g., "project uses pytest with xdist"), while USER.md stores user preferences (e.g., "prefer concise replies"). These files are snapshot‑injected at the start of each round and automatically updated when the agent learns.

Session storage – after each round, the full conversation is written to a SQLite database with FTS5 full‑text search. The session_search tool can retrieve history by keyword, phrase, or boolean expression, avoiding the need to keep all history in the context window.

Skill system – records procedural memory. Successful task executions are saved as skill files; subsequent similar tasks load the skill directly, and skills can be patched on‑the‑fly when new edge cases are discovered, forming a self‑growing learning loop.

Claude Code relies on a file‑system‑based memory ( CLAUDE.md) and lacks a native factual memory layer, making it better suited for one‑off coding tasks rather than long‑term collaboration.

5. Scheduling & Orchestration

Beyond turn‑by‑turn interaction, agents need temporal autonomy.

Background process management – Hermes terminal tools support background=true, assigning a session_id. Agents can poll, log, wait, submit input, or kill the process, and receive automatic notifications on completion.

Cron scheduling – the cronjob tool creates timed tasks (e.g., every 30 minutes monitoring or daily summary). Each job runs in an isolated session with its own skill set and model configuration; no_agent mode runs pure scripts without consuming LLM tokens.

Delegate tasks & workflow – the delegate_task tool implements a sub‑agent pattern, packaging a goal, context, and toolset for an independent child agent. Up to three child agents can run in parallel. Claude Code offers a similar sub‑agent mode.

Three time scales – real‑time (background processes), scheduled (cron), and parallel (sub‑agent distribution).

6. Output Delivery

The final layer returns results to the user through the appropriate channel:

CLI – direct terminal echo.

Gateway – platform‑specific formatting (Telegram HTML, Discord rich embeds, Slack Block Kit) and media routing.

API – HTTP response payload.

Cron – results posted back via the gateway.

File system – generated code files, PDFs, CSVs, or HTML pages for downstream consumption.

Text‑to‑speech – built‑in text_to_speech tool using Edge, OpenAI, or ElevenLabs providers, delivering audio messages especially useful on mobile chat apps.

Conclusion

When evaluating or building an agent framework, consider the trade‑offs at each of the six layers. Hermes invests heavily in memory and scheduling, while Claude Code excels in a lean core loop and tool invocation. Selecting a framework therefore depends on which layer requires the most depth for your use case.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Memory management AI agents scheduling Hermes agent architecture Claude Code tool ecosystem

Written by

AI Step-by-Step

Sharing AI knowledge, practical implementation records, and more.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.