Cross‑Session Retrieval with SQLite FTS5 and LLM Summaries – Hermes Agent’s Four‑Layer Architecture
This article dissects Hermes Agent’s four‑layer cross‑session retrieval system, covering persistent storage, dual‑table FTS5 indexing for CJK and English, a three‑path search strategy, intelligent truncation for LLM prompts, structured summarisation, and a holographic retrieval layer that blends FTS5, Jaccard similarity and HRR vector algebra.
Four‑Layer Cross‑Session Retrieval
Hermes stores all raw messages in a SQLite state.db file. The sessions table holds session metadata; the messages table holds each message.
Two FTS5 virtual tables index messages.content in parallel: messages_fts – uses the unicode61 tokenizer for English and standard Unicode. messages_fts_trigram – uses a trigram tokenizer for CJK text.
Layer 3 ( session_search_tool) retrieves candidate messages, groups them by session, loads surrounding context, and runs an auxiliary LLM to produce a structured summary.
Layer 4 (Holographic Retrieval) combines three scoring dimensions – FTS5 text match, Jaccard set similarity, and HRR vector algebra – and applies trust‑weight and time‑decay adjustments.
SQLite Schema and Dual‑Table FTS5 Index
-- sessions table: session metadata
CREATE TABLE sessions (
id TEXT PRIMARY KEY,
parent_id TEXT, -- points to compressed child session
created_at TIMESTAMP,
updated_at TIMESTAMP,
title TEXT,
model TEXT
);
-- messages table: each message
CREATE TABLE messages (
id TEXT PRIMARY KEY,
session_id TEXT REFERENCES sessions(id),
role TEXT, -- user/assistant/system/tool
content TEXT,
created_at TIMESTAMP
);Two FTS5 tables are required because the unicode61 tokenizer splits Chinese characters into single‑character tokens, producing noisy matches. The trigram table preserves three‑character sliding windows (e.g., "大别山") and dramatically improves Chinese search precision.
# Python – create dual FTS5 tables
FTS_SQL = """
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts
USING fts5(content, content_rowid=rowid);
"""
FTS_TRIGRAM_SQL = """
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts_trigram
USING fts5(content, tokenize='trigram');
"""Triggers keep the virtual tables in sync with messages inserts, updates, and deletes.
# Insert trigger
CREATE TRIGGER IF NOT EXISTS messages_fts_insert
AFTER INSERT ON messages BEGIN
INSERT INTO messages_fts(rowid, content) VALUES (new.rowid, new.content);
INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.rowid, new.content);
END;
# Delete trigger
CREATE TRIGGER IF NOT EXISTS messages_fts_delete
AFTER DELETE ON messages BEGIN
INSERT INTO messages_fts(messages_fts, rowid, content) VALUES('delete', old.rowid, old.content);
INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.rowid, old.content);
END;
# Update trigger
CREATE TRIGGER IF NOT EXISTS messages_fts_update
AFTER UPDATE ON messages BEGIN
INSERT INTO messages_fts(messages_fts, rowid, content) VALUES('delete', old.rowid, old.content);
INSERT INTO messages_fts(rowid, content) VALUES (new.rowid, new.content);
INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.rowid, old.content);
INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.rowid, new.content);
END;search_messages – Three‑Path Dispatch
The search_messages entry point selects a search path based on the presence of CJK characters in the query.
# Python implementation
def search_messages(self, query: str, limit: int = 50) -> list<SearchResult>:
cjk_chars = [c for c in query if '\u4e00' <= c <= '\u9fff']
if not cjk_chars:
# Path 1 – pure English → FTS5 standard MATCH
return self._search_fts5_standard(query, limit)
elif len(cjk_chars) >= 3:
# Path 2 – CJK ≥ 3 characters → trigram table
return self._search_fts5_trigram(query, limit)
else:
# Path 3 – short CJK → LIKE fallback
return self._search_like_fallback(query, limit)
def _search_fts5_standard(self, query: str, limit: int) -> list<SearchResult>:
sql = """
SELECT m.id, m.session_id, m.role, m.created_at,
snippet(messages_fts, 0, '>>>', '<<<', '...', 40) AS snippet,
rank
FROM messages_fts
JOIN messages m ON messages_fts.rowid = m.rowid
WHERE messages_fts MATCH ?
ORDER BY rank
LIMIT ?
"""
return self.db.execute(sql, [query, limit]).fetchall()
def _search_fts5_trigram(self, query: str, limit: int) -> list<SearchResult>:
sql = """
SELECT m.id, m.session_id, m.role, m.created_at,
snippet(messages_fts_trigram, 0, '>>>', '<<<', '...', 40) AS snippet,
rank
FROM messages_fts_trigram
JOIN messages m ON messages_fts_trigram.rowid = m.rowid
WHERE messages_fts_trigram MATCH ?
ORDER BY rank
LIMIT ?
"""
return self.db.execute(sql, [query, limit]).fetchall()
def _search_like_fallback(self, query: str, limit: int) -> list<SearchResult>:
sql = """
SELECT id, session_id, role, created_at, content AS snippet
FROM messages
WHERE content LIKE ?
ORDER BY created_at DESC
LIMIT ?
"""
return self.db.execute(sql, [f'%{query}%', limit]).fetchall()Each result also includes one message before and after the hit, providing context for downstream processing.
session_search_tool – From Raw Hits to Structured Summaries
The tool operates in two modes:
recent mode (no keyword) – lists the most recent N sessions with title and timestamp.
search mode (keyword supplied) – performs full‑text search, groups hits by session, truncates relevant windows, and runs parallel LLM summarisation.
# Python – full search workflow
async def search_sessions(self, query: str, max_results: int = 5) -> str:
# Step 1: FTS5 search for candidate messages
hits = self.db.search_messages(query, limit=50)
if not hits:
return "No matching sessions found."
# Step 2: Group by session, exclude current lineage
session_groups: dict = {}
current_lineage = self._get_session_lineage(self.current_session_id)
for hit in hits:
parent_id = self._resolve_to_parent(hit.session_id)
if parent_id in current_lineage:
continue
session_groups.setdefault(parent_id, []).append(hit)
# Step 3: Top‑N sessions by hit count
top_sessions = sorted(session_groups.items(), key=lambda x: len(x[1]), reverse=True)[:max_results]
# Step 4: Parallel LLM summarisation
tasks = []
for session_id, hits in top_sessions:
conversation = self.db.get_messages_as_conversation(session_id)
truncated = self._truncate_around_matches(conversation, query, max_chars=100_000)
tasks.append(self._summarize_session(truncated, query, session_id))
summaries = await asyncio.gather(*tasks)
return self._format_results(summaries)Intelligent Truncation Algorithm
When a session contains tens of thousands of characters, only the most relevant window around the query terms is kept for the LLM.
# Python – window selection around matches
def _truncate_around_matches(self, text: str, query: str, max_chars: int = 100_000) -> str:
if len(text) <= max_chars:
return text
positions = []
query_lower = query.lower()
text_lower = text.lower()
for term in query_lower.split():
start = 0
while True:
idx = text_lower.find(term, start)
if idx == -1:
break
positions.append(idx)
start = idx + 1
if not positions:
return text[:max_chars]
positions.sort()
window_size = max_chars
best_start = 0
best_count = 0
for pos in positions:
window_start = max(0, pos - window_size // 4) # bias start 1/4 before match
window_end = window_start + window_size
count = sum(1 for p in positions if window_start <= p <= window_end)
if count > best_count:
best_count = count
best_start = window_start
return text[best_start:best_start + max_chars]LLM Summarisation Prompt
SUMMARIZE_PROMPT = """Summarize this conversation excerpt.
Focus on content related to: {query}
Requirements:
- Use past tense (this conversation already happened)
- Preserve specific details: names, numbers, commands, file paths, URLs
- Focus on the search topic, skip unrelated parts
- Keep the summary under 300 words
- Structure: what was discussed → what was decided → what was the outcome
Conversation:
{truncated_text}"""Holographic Retrieval – Beyond Text Matching
FTS5 finds messages containing the keywords, but some queries need semantic understanding (e.g., “What was my opinion on Kubernetes?”). The holographic module adds two extra scoring dimensions.
Three‑Way Mixed Scoring
# Python – FactRetriever.search()
class FactRetriever:
async def search(self, query: str, limit: int = 10, hrr_mode: str = "probe") -> list<ScoredFact>:
# 1️⃣ FTS5 loose recall (3× candidates)
candidates = self.db.fts_search(query, limit=limit * 3)
# 2️⃣ Jaccard set similarity boost
query_tokens = set(tokenize(query))
for fact in candidates:
fact_tokens = set(tokenize(fact.content))
jaccard = len(query_tokens & fact_tokens) / len(query_tokens | fact_tokens)
fact.score += jaccard * JACCARD_WEIGHT
# 3️⃣ HRR vector algebra similarity
query_hrr = self._encode_hrr(query)
for fact in candidates:
hrr_sim = self._hrr_similarity(query_hrr, fact.hrr_vector, mode=hrr_mode)
fact.score += hrr_sim * HRR_WEIGHT
# Trust‑weight
for fact in candidates:
fact.score *= fact.trust_weight
# Time decay
now = time.time()
for fact in candidates:
age_days = (now - fact.timestamp) / 86400
decay = math.exp(-age_days / HALF_LIFE_DAYS)
fact.score *= decay
candidates.sort(key=lambda f: f.score, reverse=True)
return candidates[:limit]HRR Query Modes
probe – direct similarity (e.g., "Memory about Docker").
related – associated concepts (e.g., "Technologies related to Docker").
reason – causal inference (e.g., "Why Docker was chosen").
contradict – contradiction detection (e.g., "Memories that conflict with previous statements").
# Python – HRR similarity implementation
def _hrr_similarity(self, query_vec: np.ndarray, fact_vec: np.ndarray, mode: str) -> float:
if mode == "probe":
return np.dot(query_vec, fact_vec) / (np.linalg.norm(query_vec) * np.linalg.norm(fact_vec))
elif mode == "related":
unbound = np.fft.ifft(np.fft.fft(fact_vec) / np.fft.fft(query_vec)).real
return np.max(unbound)
elif mode == "reason":
causal_vec = self._get_causal_binding(query_vec)
return np.dot(causal_vec, fact_vec) / (np.linalg.norm(causal_vec) * np.linalg.norm(fact_vec))
elif mode == "contradict":
neg_query = -query_vec
return np.dot(neg_query, fact_vec) / (np.linalg.norm(neg_query) * np.linalg.norm(fact_vec))Memory Provider – Lifecycle Hooks
# Python – abstract MemoryProvider
class MemoryProvider(ABC):
@abstractmethod
async def prefetch(self, session_id: str, context: dict) -> str:
"""Called before each turn to preload relevant memories into the system prompt"""
pass
@abstractmethod
async def on_session_end(self, session_id: str, messages: list) -> None:
"""Extract facts at the end of a session and store them permanently"""
pass
@abstractmethod
async def sync_turn(self, session_id: str, message: dict) -> None:
"""Synchronise after each turn for real‑time index updates"""
pass
@abstractmethod
async def on_pre_compress(self, session_id: str, messages: list) -> None:
"""Ensure important information is retained before compression"""
passThe prefetch workflow loads relevant history from FTS5/Holographic retrieval, generates a short memory injection, and adds it to the system prompt, giving the agent a human‑like associative start.
Common Pitfalls and Fixes
Pitfall 1 – CJK Tokenisation Disaster
Using only unicode61 splits "大别山项目进度" into seven single‑character tokens, causing noisy matches. Routing queries with three or more CJK characters to the trigram table yields exact three‑character sequence matches.
# Wrong – only unicode61
# Search "大别山" matches "别忘了山上的大石头" ❌
# Correct – trigram for CJK ≥3 chars
# Search "大别山" matches only messages containing the exact sequence ✅Pitfall 2 – Child Session Lineage
After compression, a logical conversation may be split into multiple child sessions. Resolve the true parent recursively with _resolve_to_parent.
def _resolve_to_parent(self, session_id: str) -> str:
"""Recursively resolve to the top‑level parent session"""
current = session_id
visited = set()
while True:
if current in visited:
break # prevent loops
visited.add(current)
parent = self.db.get_session_parent(current)
if parent is None:
return current
current = parent
return currentPitfall 3 – Excluding Current Session
When the user asks about a topic already present in the current dialogue, the search may return the current session itself. Compute the full lineage (ancestors + descendants) and filter it out.
def _get_session_lineage(self, session_id: str) -> set[str]:
"""Collect all ancestors and descendants of a session"""
lineage = set()
current = session_id
while current:
lineage.add(current)
current = self.db.get_session_parent(current)
self._collect_descendants(session_id, lineage)
return lineagePitfall 4 – FTS5 Syntax Errors
User queries may contain special characters ("*", "\"", "OR") that break the FTS5 parser. The implementation catches sqlite3.OperationalError and falls back to a safe LIKE search.
def search_messages(self, query: str, limit: int = 50) -> list<SearchResult>:
try:
return self._search_fts5(query, limit)
except sqlite3.OperationalError:
escaped = query.replace('%', r'\%').replace('_', r'\_')
return self._search_like_fallback(escaped, limit)Summary of Core Design Points
Dual‑table FTS5 is essential for accurate CJK full‑text search.
The three‑path dispatch balances precision (English) and coverage (CJK long vs. short).
Aggregating messages into sessions matches user intent – they care about "which conversation" rather than a single line.
Intelligent truncation selects the densest window around matches, fitting LLM token limits while preserving information.
LLM summarisation turns raw hits into structured, past‑tense memories with concrete details.
Holographic Retrieval extends beyond keyword matching by mixing FTS5, Jaccard similarity, and HRR vector algebra, plus trust weighting and temporal decay.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
James' Growth Diary
I am James, focusing on AI Agent learning and growth. I continuously update two series: “AI Agent Mastery Path,” which systematically outlines core theories and practices of agents, and “Claude Code Design Philosophy,” which deeply analyzes the design thinking behind top AI tools. Helping you build a solid foundation in the AI era.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
