Artificial Intelligence 24 min read

Deep Dive into Hermes Agent: Self‑Improving AI Agent Architecture with 110K+ Stars

Hermes Agent, an open‑source self‑improving AI agent framework that has amassed over 110 K GitHub stars, introduces a native closed‑learning loop, a unified single‑process agent cycle, self‑registering tools, pluggable context compression, multi‑API model support, and a scalable multi‑platform gateway, all built on Python 3.11+, SQLite + WAL, and extensive modular design.

Shuge Unlimited

Apr 23, 2026

Deep Dive into Hermes Agent: Self‑Improving AI Agent Architecture with 110K+ Stars

Project Positioning and Core Differentiation

Hermes Agent is positioned as a self‑improving AI agent . Unlike most frameworks that focus on making large language models (LLMs) better at invoking tools, Hermes Agent aims to make the agent smarter the more it is used. This goal drives a five‑stage closed‑learning loop:

Planning memory – after a task finishes the agent decides what to remember.

Skill creation – recurring patterns are turned into Markdown skill files.

Skill self‑improvement – failed skills are automatically patched.

FTS5 retrieval – historical dialogue is recalled via SQLite full‑text search.

User modeling – behavior is used to infer preferences.

The loop maps to three cognitive memory types: situational (conversation history), semantic (persistent facts in MEMORY.md), and procedural (skill files).

Core Architecture Overview

The repository follows a clear directory‑to‑responsibility mapping: run_agent.py – AIAgent class, core agent loop. tools/ – 40+ tool implementations (terminal, browser, file, vision, etc.). toolsets.py – definition and composition of toolsets. agent/prompt_builder.py – system prompt assembly. agent/memory_manager.py – dual‑provider memory management. agent/context_engine.py – abstract ContextEngine interface. agent/context_compressor.py – default ContextCompressor implementation. hermes_state.py – SQLite state storage with FTS5 full‑text search. gateway/run.py – multi‑platform message gateway. hermes_cli/ – CLI interface (51 modules). environments/ – execution back‑ends (local, Docker, SSH, etc.).

Key design decisions include running the agent loop and tool execution in the same process (no micro‑service split), a self‑registration mechanism for tools via registry.register(), a pluggable context compressor, and SQLite + WAL for lightweight multi‑reader/single‑writer state storage.

Agent Loop Deep Dive

The loop in run_agent.py follows this sequence:

receive user message → build request (system prompt + memory + context) → call LLM → parse response → if tool call → execute tool → inject tool result → repeat until no tool call → return final response

Iteration Budget Control

Parent agent default budget: 90 iterations.

Child agent default budget: 50 iterations.

This prevents infinite tool‑call loops and token exhaustion.

Parallel Tool Execution

The method _should_parallelize_tool_batch() classifies tools into three categories: _NEVER_PARALLEL_TOOLS – never parallel (e.g., clarify which requires user interaction). _PARALLEL_SAFE_TOOLS – read‑only and safe to run concurrently (e.g., web_search, read_file). _PATH_SCOPED_TOOLS – path‑isolated tools that can run in parallel only when they operate on different files (e.g., read_file, write_file, patch).

Maximum worker threads are hard‑coded to 8 ( _MAX_TOOL_WORKERS). Read‑only tools are fully parallel; write‑related tools require path isolation to avoid conflicts.

Interrupt and Steering Mechanism

Two flags, _interrupt_requested and _pending_steer, ensure that an ongoing tool batch finishes before a new user directive is injected, preserving atomicity of operations such as write_file.

Multi‑API Model Support

AIAgent

supports four API modes – chat_completions, codex_responses, anthropic_messages, and bedrock_converse – and automatically routes to providers such as OpenAI, Anthropic, Bedrock, OpenRouter, and Copilot ACP. Switching models requires creating a new AIAgent instance; no code changes are needed.

Prompt Builder

agent/prompt_builder.py

assembles a multi‑layer system prompt, injecting identity, platform hints, skill index, memory snapshots, and guidance sections: MEMORY_GUIDANCE – how memory should be used. TOOL_USE_ENFORCEMENT_GUIDANCE – rules for tool usage.

PromptBuilder caches the skill index with an in‑memory LRU plus a disk snapshot to speed up look‑ups.

Tool System: Self‑Registration and Toolset Composition

Tool registration lives in tools/registry.py. Each tool module calls registry.register() at import time, declaring its schema, handler, and toolset. Registration uses a thread‑safe threading.RLock() and provides snapshot reads.

Discovery is automatic via AST analysis of .py files that contain registry.register(), eliminating a manual tool list. toolsets.py implements a composable toolset system where toolsets can include other sets, enabling recursive resolution. The core tool list _HERMES_CORE_TOOLS contains 63 tools across categories (Web, Terminal, File, Vision, Skills, Browser, Planning, Code Execution, Scheduling, Others). Platform‑specific toolsets (e.g., hermes‑cli, hermes‑telegram, hermes‑discord) extend the core set, allowing new platforms to add only their unique tools while reusing shared implementations.

Memory and Learning Closed Loop

Memory management uses a dual‑provider architecture: an internal provider handling MEMORY.md (personal notes, environment facts, tool tricks) and USER.md (user preferences, communication style). Memory injection uses a <memory‑context> fence to keep the model from treating memory as fresh user input. Entries are separated by the section sign § to avoid collisions with normal text.

Freeze‑Snapshot Mode

At session start, MemoryManager snapshots current memory into the system prompt. Subsequent memory writes update the disk files but do not refresh the prompt, preserving KV‑cache effectiveness and avoiding repeated token recomputation.

Skill System

After completing a complex task, the agent automatically detects reusable patterns, generates a Markdown skill file (trigger conditions, steps, cautions), and stores it under ~/.hermes/skills/. If a skill fails, it is patched automatically. PromptBuilder includes SKILLS_GUIDANCE to steer creation and update, and employs a two‑level cache (LRU + disk) for fast skill lookup.

Session Search

Full‑text search across all sessions is powered by SQLite FTS5 ( SessionDB). The flow is: session_search (FTS5 recall) → LLM summarises results → injects summary with memory snapshots → continues the main loop. This constitutes a multi‑level retrieval design.

Context Management: Pluggable Compression Engine

agent/context_engine.py

defines the abstract ContextEngine. The default implementation ContextCompressor applies a configurable compression strategy with parameters such as threshold_percent (0.75), protect_first_n (3 messages), and protect_last_n (6 messages). The process protects head and tail messages, uses an auxiliary LLM to summarise the middle portion, applies a summarisation template (solved problems, pending items, active tasks), trims tool outputs, and distributes token budget proportionally.

Additional compressors like trajectory_compressor.py support specialised batch trajectory generation for research scenarios.

Prompt Builder Security Scan

The private method _scan_context_content() checks context files ( .hermes.md, AGENTS.md, CLAUDE.md, .cursorrules) for prompt‑injection patterns, prioritising files in the order listed.

Sub‑Agent Delegation Mechanism

Implemented in tools/delegate_tool.py, sub‑agents are isolated: they lack parent history, have independent terminal sessions, and are restricted from using certain tools via DELEGATE_BLOCKED_TOOLS. Delegation depth is tracked by _delegate_depth (default MAX_DEPTH=1), and concurrency is limited to three sub‑agents using a ThreadPoolExecutor. Roles can be leaf (execute only) or orchestrator (can delegate further).

Multi‑Platform Gateway Architecture

gateway/run.py

provides a single‑process message gateway handling 17+ platforms (Telegram, Discord, Slack, WhatsApp, Signal, Email, SMS, Home Assistant, Mattermost, Matrix, DingTalk, Feishu, WeChat, WeCom, QQ, BlueBubbles, Webhook). Agent instances are cached (capacity 128) with LRU eviction and a 1‑hour idle TTL. The gateway maps incoming messages to the appropriate cached or newly created agent, processes the message through the agent loop, and routes the response back to the originating platform.

State Persistence: SQLite + FTS5

hermes_state.py

implements SessionDB using SQLite in WAL mode, supporting multi‑reader/single‑writer concurrency. Full‑text search is enabled via FTS5. Schema versioning is at v8 with automatic migrations. Write contention is mitigated by a random‑back‑off retry (15 attempts, 20‑150 ms jitter) to avoid convoy effects.

Strengths and Limitations

Native closed‑learning loop integrated from day 1 (planning memory, skill creation, skill self‑improvement, FTS5 retrieval, user modeling).

Self‑registering tools with composable toolsets simplify platform extensions.

Pluggable context compression with configurable thresholds and protection zones.

Iteration‑budget safeguards and interrupt‑steering mechanism prevent runaway loops.

Unified multi‑platform gateway with LRU+TTL agent caching.

SQLite + FTS5 provides lightweight, file‑based state with full‑text search.

Limitations include a sub‑agent concurrency cap of three, rapid release cadence (four major versions in three weeks), Windows incompatibility without WSL2, and ongoing controversy over alleged architectural plagiarism.

Overall, Hermes Agent implements Mitchell Hashimoto’s “Harness Engineering” five‑component model (instruction, constraint, feedback, memory, orchestration) as built‑in capabilities, making it one of the most complete open‑source AI agent frameworks available.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

MCP AI Agent SQLite Multi‑Platform Context Compression Tool Registry Hermes Agent Closed Loop Learning

Written by

Shuge Unlimited

Formerly "Ops with Skill", now officially upgraded. Fully dedicated to AI, we share both the why (fundamental insights) and the how (practical implementation). From technical operations to breakthrough thinking, we help you understand AI's transformation and master the core abilities needed to shape the future. ShugeX: boundless exploration, skillful execution.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

Project Positioning and Core Differentiation

Core Architecture Overview

Agent Loop Deep Dive

Iteration Budget Control

Parallel Tool Execution

Interrupt and Steering Mechanism

Multi‑API Model Support

Prompt Builder

Tool System: Self‑Registration and Toolset Composition

Memory and Learning Closed Loop

Freeze‑Snapshot Mode

Skill System

Session Search

Context Management: Pluggable Compression Engine

Prompt Builder Security Scan

Sub‑Agent Delegation Mechanism

Multi‑Platform Gateway Architecture

State Persistence: SQLite + FTS5

Strengths and Limitations

Shuge Unlimited

How this landed with the community

Was this worth your time?

0 Comments

State Persistence: SQLite + FTS5