How Hermes Agent Self‑Evolves: Memory, Skills, and Offline Training Pipelines
This article dissects Hermes Agent’s self‑evolution mechanism, explaining how stable facts are stored in memory, reusable procedures become skills, and rollout trajectories are turned into training data through background review, context compression, and OPD‑based token‑level distillation.
Overview
Hermes Agent’s "evolution" happens in two layers: the context layer and the training data layer. The system does not rely on hidden tricks; it simply records experience, reuses it, and feeds the recorded trajectories back into an offline training loop.
1. Online Chain – Making the Agent Handier Over Time
1.1 System Prompt Construction
When building the system prompt, Hermes separates stable content from temporary content. Stable parts are cached in a prefix and reused across turns, while temporary parts are injected only for the current API call.
```python
def _build_system_prompt(self, system_message: str = None) -> str:
    # 1. Agent identity
    # 2. User / gateway system prompt
    # 3. Persistent memory (frozen snapshot)
    # 4. Skills guidance
    # 5. Context files
    if self._memory_store:
        if self._memory_enabled:
            mem_block = self._memory_store.format_for_system_prompt("memory")
            if mem_block:
                prompt_parts.append(mem_block)
        if self._user_profile_enabled:
            user_block = self._memory_store.format_for_system_prompt("user")
            if user_block:
                prompt_parts.append(user_block)
    if self._memory_manager:
        _ext_mem_block = self._memory_manager.build_system_prompt()
        if _ext_mem_block:
            prompt_parts.append(_ext_mem_block)
    has_skills_tools = any(
        name in self.valid_tool_names
        for name in ["skills_list", "skill_view", "skill_manage"]
    )
    if has_skills_tools:
        skills_prompt = build_skills_system_prompt(
            available_tools=self.valid_tool_names,
            available_toolsets=avail_toolsets,
        )
        if skills_prompt:
            prompt_parts.append(skills_prompt)
```

The format_for_system_prompt() method returns a frozen snapshot of the memory file, not the live state, so the cached prefix remains stable and inference cost does not explode.
1.2 Memory Stores Stable Facts
Hermes uses two files as its built-in memory:
MEMORY.md: environment facts, project conventions, tool quirks, and other stable discoveries.
USER.md: user preferences, communication habits, and repeatedly corrected requirements.
These files are read, deduplicated, and frozen into a snapshot at session start. Writes are atomic, scanned for injections, locked, and persisted without altering the already‑cached system prompt.
```python
class MemoryStore:
    def load_from_disk(self):
        mem_dir = get_memory_dir()
        mem_dir.mkdir(parents=True, exist_ok=True)
        self.memory_entries = self._read_file(mem_dir / "MEMORY.md")
        self.user_entries = self._read_file(mem_dir / "USER.md")
        # Deduplicate while preserving insertion order
        self.memory_entries = list(dict.fromkeys(self.memory_entries))
        self.user_entries = list(dict.fromkeys(self.user_entries))
        self._system_prompt_snapshot = {
            "memory": self._render_block("memory", self.memory_entries),
            "user": self._render_block("user", self.user_entries),
        }
```

Crucially, the system prompt never receives live updates; only the frozen snapshot is injected, keeping the prompt size predictable.
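The freeze-at-load pattern is easy to reproduce in isolation. The class below is a simplified, self-contained sketch (the class name and rendering format are illustrative, not Hermes' actual implementation):

```python
from pathlib import Path


class MiniMemoryStore:
    """Simplified illustration of the dedup-and-freeze pattern."""

    def __init__(self, mem_dir: Path):
        entries = []
        mem_file = mem_dir / "MEMORY.md"
        if mem_file.exists():
            entries = [ln.strip() for ln in mem_file.read_text().splitlines() if ln.strip()]
        # Deduplicate while preserving order, then freeze a snapshot.
        self.entries = list(dict.fromkeys(entries))
        self._snapshot = "\n".join(f"- {e}" for e in self.entries)

    def add(self, entry: str):
        # Later writes update live state but never the frozen snapshot,
        # so the cached system-prompt prefix stays byte-identical.
        self.entries.append(entry)

    def format_for_system_prompt(self) -> str:
        return self._snapshot
```

Because the snapshot string never changes during a session, any prefix cache keyed on the system prompt keeps hitting.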
1.3 Skills Store Procedural Knowledge
Skills are stored under ~/.hermes/skills/. Each skill is a directory that may contain SKILL.md, references/, templates/, scripts/, and assets/. The skill manager builds a compact index of all visible skills and injects it into the system prompt, allowing the model to decide whether to load a particular skill.
Directory layout for user skills:
```
~/.hermes/skills/
├── my-skill/
│   ├── SKILL.md
│   ├── references/
│   ├── templates/
│   ├── scripts/
│   └── assets/
```

The index is cached in an in-memory LRU; only a cache miss triggers a full disk scan, preventing linear cost growth as the skill library expands.
1.4 Background Review Agent
After each main response, Hermes spawns a silent review agent that re‑examines the conversation and decides whether to store new memory or create/update a skill. Two prompts guide this process:
```python
_MEMORY_REVIEW_PROMPT = (
    "Review the conversation above and consider saving to memory if appropriate.\n"
    "Focus on: 1) User preferences worth remembering? 2) Expected behavior?\n"
    "If something stands out, save it using the memory tool. Otherwise, say 'Nothing to save.'"
)

_SKILL_REVIEW_PROMPT = (
    "Review the conversation above and consider saving or updating a skill if appropriate.\n"
    "Focus on: non-trivial approaches, trial-and-error lessons, or changed strategies.\n"
    "If a relevant skill exists, update it; otherwise create a new skill."
)
```

If the final response is not interrupted and either review flag is true, the background review is triggered:
```python
if final_response and not interrupted and (_should_review_memory or _should_review_skills):
    self._spawn_background_review(
        messages_snapshot=list(messages),
        review_memory=_should_review_memory,
        review_skills=_should_review_skills,
    )
```

2. Offline Chain – Turning Rollouts into Training Signals
2.1 Trajectory Export
Both interactive and batch runs export a trajectory JSONL record via agent/trajectory.py. The record contains the full conversation, timestamps, model name, and a completion flag.
```python
def save_trajectory(trajectory, model, completed, filename=None):
    if filename is None:
        filename = "trajectory_samples.jsonl" if completed else "failed_trajectories.jsonl"
    entry = {
        "conversations": trajectory,
        "timestamp": datetime.now().isoformat(),
        "model": model,
        "completed": completed,
    }
    with open(filename, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

Batch runs deliberately disable persistent memory and context files (skip_memory=True, skip_context_files=True) to avoid contaminating the dataset with local artefacts.
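Reading the exported records back for training is symmetric; a minimal loader for the JSONL format above (load_trajectories is a hypothetical helper, not part of Hermes):

```python
import json


def load_trajectories(path: str, completed_only: bool = True) -> list[dict]:
    """Stream a trajectory JSONL file, optionally keeping only completed runs."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            # Filter out failed rollouts when building a clean training set.
            if completed_only and not entry.get("completed", False):
                continue
            records.append(entry)
    return records
```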
2.2 Reward Function Decomposition
Instead of a single binary score, the reward combines three signals: correctness, efficiency, and tool usage. For example, in AgenticOPDEnv:
```python
if exit_code == 0 and "passed" in output.lower():
    correctness = 1.0
elif exit_code == 0:
    correctness = 0.8
elif "assert" in output.lower() and "error" in output.lower():
    correctness = 0.2
else:
    correctness = 0.1

if turns_used <= 3:
    efficiency = 1.0
elif turns_used <= max_turns // 2:
    efficiency = 0.8

# tool usage bonus
if "terminal" in tools_used and ("write_file" in tools_used or "patch" in tools_used):
    tool_usage = 1.0

reward = min(1.0, max(0.0, cfg.correctness_weight * correctness
                           + cfg.efficiency_weight * efficiency
                           + cfg.tool_usage_weight * tool_usage))
```

This granular feedback lets the model learn not only whether it succeeded, but also how efficiently it acted and whether it used the right tools.
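The weighted combination at the end can be isolated into a small helper; the weight values below are illustrative defaults, not Hermes' actual configuration:

```python
from dataclasses import dataclass


@dataclass
class RewardConfig:
    # Hypothetical weights; a policy like "correctness dominates".
    correctness_weight: float = 0.6
    efficiency_weight: float = 0.2
    tool_usage_weight: float = 0.2


def combine(cfg: RewardConfig, correctness: float, efficiency: float, tool_usage: float) -> float:
    # Weighted sum of the three component signals, clamped to [0, 1].
    raw = (cfg.correctness_weight * correctness
           + cfg.efficiency_weight * efficiency
           + cfg.tool_usage_weight * tool_usage)
    return min(1.0, max(0.0, raw))
```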
2.3 Online Policy Distillation (OPD)
OPD extracts "hindsight" from each tool result. For every (assistant_turn, next_state) pair, a judge model generates a hint describing how the previous assistant action could be improved. The hint is then appended to the original context, forming an enhanced prompt that a stronger teacher model evaluates to produce top‑k token distributions.
```python
# Step 1: extract (assistant_turn, next_state) pairs
turn_pairs = self._extract_turn_pairs(messages)

# Step 2: generate a hindsight hint for each pair
hint = await self._extract_hint(
    pair["assistant_text"], pair["next_state_text"], pair["next_state_role"]
)

# Step 3: append the hint to the original context
enhanced_messages = _append_hint_to_messages(pair["context_messages"], hint)

# Step 4: the teacher model provides a token-level distribution
logprob_result = await self.server.get_logprobs(input_ids=enhanced_ids, top_k=k, split="eval")
```

The resulting distill_token_ids and distill_logprobs are stored alongside the original trajectory, giving the student model dense, token-level supervision instead of a single scalar reward.
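Downstream, stored top-k log-probabilities are typically consumed as a forward-KL distillation loss. A dependency-free sketch of that consumption (a real pipeline would operate on tensors; the function and its argument layout are hypothetical):

```python
import math


def distill_loss(teacher_topk: list[dict[int, float]],
                 student_logprobs: list[dict[int, float]]) -> float:
    """Mean per-position forward KL against the teacher's top-k distribution.

    teacher_topk[t] maps token_id -> teacher log-prob at position t;
    student_logprobs[t] maps the same token_ids -> student log-probs.
    """
    total, n = 0.0, 0
    for t_dist, s_dist in zip(teacher_topk, student_logprobs):
        # KL(p || q) restricted to the teacher's top-k tokens:
        # sum over tokens of p * (log p - log q).
        kl = sum(math.exp(t_lp) * (t_lp - s_dist[tok]) for tok, t_lp in t_dist.items())
        total += kl
        n += 1
    return total / max(n, 1)
```

The loss is zero when the student matches the teacher exactly and grows as the student's distribution diverges, which is the dense signal a scalar reward cannot provide.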
3. Context Compression – Structured Hand‑off
When a session grows too long, Hermes does not simply truncate. It first runs flush_memories() to save any remaining stable facts, then inserts a sentinel user message that triggers a short memory tool call. Afterwards, ContextCompressor creates a structured handoff summary using a detailed template (goal, constraints, completed actions, active state, blockers, etc.). This summary replaces the middle of the conversation, preserving continuity while keeping token usage low.
```python
def compress(self, messages, current_tokens=None, focus_topic=None):
    # 1. prune old tool results
    messages, pruned_count = self._prune_old_tool_results(...)
    # 2. determine compression boundaries
    compress_start = self.protect_first_n
    compress_start = self._align_boundary_forward(messages, compress_start)
    compress_end = self._find_tail_cut_by_tokens(messages, compress_start)
    # 3. summarize the middle turns
    turns_to_summarize = messages[compress_start:compress_end]
    summary = self._generate_summary(turns_to_summarize, focus_topic=focus_topic)
    # 4. assemble the compressed message list
    compressed = [...]
    compressed = self._sanitize_tool_pairs(compressed)
    return compressed
```

The summary template includes sections such as Goal, Constraints, Completed Actions, Active State, Blocked, Resolved Questions, Pending User Asks, Relevant Files, and Remaining Work. On the second compression pass, the previous summary is used as a starting point, ensuring that earlier knowledge is not overwritten.
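The boundary step matters because a careless cut can orphan a tool result from the assistant call that produced it. A plausible stand-in for _align_boundary_forward (Hermes' real heuristics may differ):

```python
def align_boundary_forward(messages: list[dict], start: int) -> int:
    """Move a compression boundary forward so it never separates an
    assistant tool call from the tool result that answers it."""
    while start < len(messages) and messages[start].get("role") == "tool":
        # A tool result must stay with the assistant turn that produced it,
        # so push the boundary past any messages that would be orphaned.
        start += 1
    return start
```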
4. Session Lineage and Cross‑Session Recall
After compression, the current session is closed in the SQLite sessions table and a new child session is created with parent_session_id pointing to the old one. This preserves a full lineage, allowing session_search to walk back through history.
```python
self._session_db.end_session(self.session_id, "compression")
old_session_id = self.session_id
self.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
self._session_db.create_session(
    session_id=self.session_id,
    source=self.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
    model=self.model,
    parent_session_id=old_session_id,
)
```

Search is powered by an FTS5 virtual table over the messages table. The query is sanitized, matched, and the top sessions are retrieved. Each matched session transcript is truncated around the hit, then summarized by a lightweight model (e.g., Gemini Flash) before being returned.
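Sanitization comes first because raw user text can contain FTS5 operators. An illustrative stand-in for _sanitize_fts5_query (Hermes' actual rules may differ):

```python
import re


def sanitize_fts5_query(query: str) -> str:
    """Quote each term so FTS5 operators in user input (AND, OR, NEAR, *,
    stray double quotes) cannot break the MATCH expression."""
    terms = re.findall(r"\w+", query)
    # FTS5 treats a double-quoted string as a literal token/phrase,
    # so quoting every extracted term neutralizes operator syntax.
    return " ".join(f'"{t}"' for t in terms) if terms else '""'
```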
```python
def search_messages(self, query, source_filter=None, exclude_sources=None,
                    role_filter=None, limit=20, offset=0):
    query = self._sanitize_fts5_query(query)
    where_clauses = ["messages_fts MATCH ?"]
    # additional filters omitted for brevity
    sql = f"""
        SELECT m.id, m.session_id, m.role,
               snippet(messages_fts, 0, '>>>', '<<<', '...', 40) AS snippet,
               s.source, s.model, s.started_at AS session_started
        FROM messages_fts
        JOIN messages m ON m.id = messages_fts.rowid
        JOIN sessions s ON s.id = m.session_id
        WHERE {' AND '.join(where_clauses)}
        ORDER BY rank
        LIMIT ? OFFSET ?
    """
    # execution omitted
```

5. Delegation and Observation Isolation
Child agents are created with skip_memory=True and skip_context_files=True, preventing them from writing directly to the shared memory store. After a delegated task finishes, the parent records the result as an observation in the memory provider, ensuring a single source of truth.
```python
DELEGATE_BLOCKED_TOOLS = frozenset(["delegate_task", "clarify", "memory", "send_message", "execute_code"])

child = AIAgent(..., skip_context_files=True, skip_memory=True, ...)

parent_agent._memory_manager.on_delegation(
    task=_task_goal,
    result=entry.get("summary", ""),
    child_session_id=...,
)
```

6. Practical Takeaways
Stable facts are kept in MEMORY.md and USER.md as frozen snapshots.
Reusable procedures become skills stored under ~/.hermes/skills/.
Full rollout histories are persisted as sessions in SQLite, enabling cross‑session recall.
When context length approaches the model limit, Hermes first flushes memories, then compresses the middle with a structured hand‑off summary.
Offline training pipelines turn each rollout into a trajectory, compute a multi‑component reward, and optionally apply OPD to generate token‑level distillation data.
7. Key Files to Explore
run_agent.py – main loop, system-prompt assembly, memory flush, compression, background review.
tools/memory_tool.py – file-backed memory implementation.
agent/memory_manager.py – unified interface for built-in and external memory providers.
tools/skill_manager_tool.py – skill creation, modification, directory layout.
agent/prompt_builder.py – how memory, skills, and context files are injected into the system prompt.
agent/context_compressor.py – algorithm for pruning, summarizing, and sanitizing messages.
hermes_state.py – SQLite schema, FTS5 virtual table, session lineage.
tools/session_search_tool.py – bridge from FTS5 matches to focused summaries.
batch_runner.py – large-scale trajectory generation with memory and context disabled.
environments/hermes_base_env.py – rollout execution, reward calculation, sandbox verification.
environments/agentic_opd_env.py – hindsight hint extraction and token-level distillation.