Artificial Intelligence 20 min read

How Hermes Agent Structures Persistent Memory, Skills, and Session Search

This article dissects Hermes Agent's three‑layer persistence model, skill discovery mechanisms, tool registration and scheduling, session‑search retrieval, and automated skill evolution, highlighting design trade‑offs, concurrency handling, and practical pitfalls for building robust AI‑driven agents.

Architecture and Beyond

Apr 19, 2026

How Hermes Agent Structures Persistent Memory, Skills, and Session Search

Three‑Layer Persistence

Hermes Agent stores long‑term information in three distinct layers. Memory lives in ~/.hermes/memories/MEMORY.md (environment facts, work conventions, tool temperament) and USER.md (user preferences, communication style). Skills reside under ~/.hermes/skills/ and contain procedural knowledge – essentially "how to do a task". Session Search uses SQLite + FTS5 to index past conversations, allowing selective retrieval of historical context without flooding the current prompt.

The design assigns stable facts to Memory, on‑demand procedural steps to Skills, and cross‑session recollection to Session Search, reducing prompt size while requiring three separate storage and access pathways.

Skill Discovery

System‑prompt layer : During AIAgent._build_system_prompt() the function build_skills_system_prompt() builds a compact index of available skills (category, name, short description, visibility conditions) and injects it into the system prompt.

Memory provides stable facts for the prompt.

Skills supply programmatic knowledge, loaded as needed.

Session Search offers cross‑session recall only when required.

Index entries are not static; they are filtered by platform, disabled flags, and required tools using front‑matter keys such as platforms, metadata.hermes.requires_tools, requires_toolsets, and fallback_for_tools. A two‑level cache (in‑process LRU and on‑disk .skills_prompt_snapshot.json) avoids rebuilding the index on every start.

Progressive Disclosure

After the model knows a skill exists, the runtime layer lazily loads the full skill content via two tools: skills_list – returns minimal metadata for further filtering. skill_view – loads SKILL.md and associated references/, templates/, scripts/, assets/ when needed.

The process includes locating the skill (local, external repo, or plugin namespace), performing safety checks (platform match, disable filters, path‑traversal protection), and returning the body plus any auxiliary files. If a skill declares environment variables or credential files, those readiness details are also returned.

Command Recall

Hermes Agent maps each skill to a dynamic slash command (e.g., /skill-name) via agent/skill_commands.py. Triggering the command injects an "activation message" into the current message stream, offering a faster, switch‑like interaction compared with typing a full prompt.

Three discovery entry points (system‑prompt index, skills_list, and slash command) can diverge if their caching or filtering logic drifts, leading to inconsistencies such as a skill visible in the prompt but not callable via command.

Skill Distribution (Skills Hub)

The Skills Hub acts as a lightweight package manager. Its entry points ( skills_hub.py and tools/skills_hub.py) support multiple source adapters (official catalog, arbitrary GitHub repos, third‑party markets). Installation follows a quarantine‑then‑move workflow with security scanning and a lockfile that records source and version.

Bundled skills are synchronized by skills_sync.py, which compares bundled hashes, local hashes, and manifest origins, preserving user‑modified versions across upgrades.

Tool Invocation Architecture

Each tool registers itself via registry.register(), declaring schema, handler, toolset, and availability checks (see model_tools.py). The agent first parses toolsets (e.g., web, skills, browser) to flatten definitions and filter out unavailable tools using each tool's check_fn.

During dispatch, schema‑level corrections remove cross‑references to unavailable tools, preventing hallucinated calls.

Scheduling Details

When a model emits a tool call, Hermes Agent intercepts special agent‑owned tools ( todo, memory, session_search, delegate_task) before handing the rest to handle_function_call() in the registry. Agent‑owned tools need access to session state (e.g., MemoryStore, SessionDB).

Two‑layer scheduling separates agent‑owned tools from registry‑owned tools, but duplicated logic currently exists in run_agent.py and should be unified under invoke_agent_tool().

Concurrency Execution

Batch tool calls are parallelized when safe: read‑only or non‑overlapping file operations run in a thread pool, while dependent or state‑mutating calls fall back to serial execution. Concurrency demands that handlers be thread‑safe, respecting file locks in MemoryStore and WAL in SessionDB.

Memory Management

MemoryStore

maintains live entries (updated immediately after a tool call) and a frozen snapshot injected into the system prompt at session start. The snapshot is immutable during the session to keep the prompt cache stable.

Memory files enforce character limits, entry‑level deduplication, atomic writes with file locks, and content scanning to block prompt injection or leakage of sensitive data.

Pre‑Compression Flush

Before context compression, Hermes Agent calls flush_memories(). It temporarily adds a system‑type user message prompting the model to store important content, invokes the memory tool, and then removes the flush marker from the message list.

Session Search Layer

Session Search stores "what happened in the past" rather than "what must be remembered every round". It queries the messages_fts table in state.db, aggregates by session, excludes the current session, selects top‑N results, truncates to the most relevant fragments, and passes them to an auxiliary model for summarization.

Sanitization prevents malformed SQLite MATCH queries, and a fallback raw preview ensures a non‑empty result when summarization fails.

Skill Evolution

The skill_manage tool enables six actions: create, edit, patch, delete, write_file, and remove_file. The patch action is preferred because incremental edits are cheaper and safer than full rewrites; fuzzy matching tolerates minor formatting differences.

After a skill is written, the system runs a security scan and clears the skills system‑prompt cache so the next index build sees the change immediately.

Automatic Review

Two automatic review mechanisms keep the skill base healthy:

Nudge : after a configurable number of memory updates or tool iterations, a background review agent examines the session for content worth persisting to memory or converting into a skill.

Guidance : the system prompt explicitly advises the model to create or patch skills when it encounters complex, repeatable, or outdated workflows.

While this reduces manual upkeep, the nudges are soft reminders, and without observable logs the effectiveness of the review agent is hard to measure.

Costs and Pitfalls

Key challenges include:

Multiple discovery entry points can drift, causing mismatches between prompt‑visible and command‑invokable skills.

Frozen memory snapshots improve cache stability but create a perception gap where newly written facts are invisible until the next session.

Skill metadata lacks lifecycle details (creator, reason for patch, last verification), increasing maintenance overhead as the skill library grows.

Automatic review actions are not observable without dedicated event logging, making it difficult to assess their impact.

Agent‑owned tool implementations duplicate logic across serial and concurrent paths, risking divergence as the codebase expands.

Addressing these issues—unifying the skill catalog service, enriching skill metadata, adding observability to review agents, and consolidating scheduling logic—will improve robustness and scalability of Hermes Agent.

software architecture memory management AI agents skill discovery tool orchestration session search

Written by

Architecture and Beyond

Focused on AIGC SaaS technical architecture and tech team management, sharing insights on architecture, development efficiency, team leadership, startup technology choices, large‑scale website design, and high‑performance, highly‑available, scalable solutions.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.