Cross‑Session Retrieval with SQLite FTS5 and LLM Summaries – Hermes Agent’s Four‑Layer Architecture

This article dissects Hermes Agent’s four‑layer cross‑session retrieval system, covering persistent storage, dual‑table FTS5 indexing for CJK and English, a three‑path search strategy, intelligent truncation for LLM prompts, structured summarisation, and a holographic retrieval layer that blends FTS5, Jaccard similarity and HRR vector algebra.

James' Growth Diary
James' Growth Diary
James' Growth Diary
Cross‑Session Retrieval with SQLite FTS5 and LLM Summaries – Hermes Agent’s Four‑Layer Architecture

Four‑Layer Cross‑Session Retrieval

Hermes stores all raw messages in a SQLite state.db file. The sessions table holds session metadata; the messages table holds each message.

Two FTS5 virtual tables index messages.content in parallel: messages_fts – uses the unicode61 tokenizer for English and standard Unicode. messages_fts_trigram – uses a trigram tokenizer for CJK text.

Layer 3 ( session_search_tool) retrieves candidate messages, groups them by session, loads surrounding context, and runs an auxiliary LLM to produce a structured summary.

Layer 4 (Holographic Retrieval) combines three scoring dimensions – FTS5 text match, Jaccard set similarity, and HRR vector algebra – and applies trust‑weight and time‑decay adjustments.

SQLite Schema and Dual‑Table FTS5 Index

-- sessions table: session metadata
CREATE TABLE sessions (
    id TEXT PRIMARY KEY,
    parent_id TEXT,            -- points to compressed child session
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    title TEXT,
    model TEXT
);

-- messages table: each message
CREATE TABLE messages (
    id TEXT PRIMARY KEY,
    session_id TEXT REFERENCES sessions(id),
    role TEXT,               -- user/assistant/system/tool
    content TEXT,
    created_at TIMESTAMP
);

Two FTS5 tables are required because the unicode61 tokenizer splits Chinese characters into single‑character tokens, producing noisy matches. The trigram table preserves three‑character sliding windows (e.g., "大别山") and dramatically improves Chinese search precision.

# Python – create dual FTS5 tables
FTS_SQL = """
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts
USING fts5(content, content_rowid=rowid);
"""

FTS_TRIGRAM_SQL = """
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts_trigram
USING fts5(content, tokenize='trigram');
"""

Triggers keep the virtual tables in sync with messages inserts, updates, and deletes.

# Insert trigger
CREATE TRIGGER IF NOT EXISTS messages_fts_insert
AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.rowid, new.content);
    INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.rowid, new.content);
END;

# Delete trigger
CREATE TRIGGER IF NOT EXISTS messages_fts_delete
AFTER DELETE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, content) VALUES('delete', old.rowid, old.content);
    INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.rowid, old.content);
END;

# Update trigger
CREATE TRIGGER IF NOT EXISTS messages_fts_update
AFTER UPDATE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, content) VALUES('delete', old.rowid, old.content);
    INSERT INTO messages_fts(rowid, content) VALUES (new.rowid, new.content);
    INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.rowid, old.content);
    INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.rowid, new.content);
END;

search_messages – Three‑Path Dispatch

The search_messages entry point selects a search path based on the presence of CJK characters in the query.

# Python implementation
def search_messages(self, query: str, limit: int = 50) -> list<SearchResult>:
    cjk_chars = [c for c in query if '\u4e00' <= c <= '\u9fff']
    if not cjk_chars:
        # Path 1 – pure English → FTS5 standard MATCH
        return self._search_fts5_standard(query, limit)
    elif len(cjk_chars) >= 3:
        # Path 2 – CJK ≥ 3 characters → trigram table
        return self._search_fts5_trigram(query, limit)
    else:
        # Path 3 – short CJK → LIKE fallback
        return self._search_like_fallback(query, limit)

def _search_fts5_standard(self, query: str, limit: int) -> list<SearchResult>:
    sql = """
        SELECT m.id, m.session_id, m.role, m.created_at,
               snippet(messages_fts, 0, '>>>', '<<<', '...', 40) AS snippet,
               rank
        FROM messages_fts
        JOIN messages m ON messages_fts.rowid = m.rowid
        WHERE messages_fts MATCH ?
        ORDER BY rank
        LIMIT ?
    """
    return self.db.execute(sql, [query, limit]).fetchall()

def _search_fts5_trigram(self, query: str, limit: int) -> list<SearchResult>:
    sql = """
        SELECT m.id, m.session_id, m.role, m.created_at,
               snippet(messages_fts_trigram, 0, '>>>', '<<<', '...', 40) AS snippet,
               rank
        FROM messages_fts_trigram
        JOIN messages m ON messages_fts_trigram.rowid = m.rowid
        WHERE messages_fts_trigram MATCH ?
        ORDER BY rank
        LIMIT ?
    """
    return self.db.execute(sql, [query, limit]).fetchall()

def _search_like_fallback(self, query: str, limit: int) -> list<SearchResult>:
    sql = """
        SELECT id, session_id, role, created_at, content AS snippet
        FROM messages
        WHERE content LIKE ?
        ORDER BY created_at DESC
        LIMIT ?
    """
    return self.db.execute(sql, [f'%{query}%', limit]).fetchall()

Each result also includes one message before and after the hit, providing context for downstream processing.

session_search_tool – From Raw Hits to Structured Summaries

The tool operates in two modes:

recent mode (no keyword) – lists the most recent N sessions with title and timestamp.

search mode (keyword supplied) – performs full‑text search, groups hits by session, truncates relevant windows, and runs parallel LLM summarisation.

# Python – full search workflow
async def search_sessions(self, query: str, max_results: int = 5) -> str:
    # Step 1: FTS5 search for candidate messages
    hits = self.db.search_messages(query, limit=50)
    if not hits:
        return "No matching sessions found."
    # Step 2: Group by session, exclude current lineage
    session_groups: dict = {}
    current_lineage = self._get_session_lineage(self.current_session_id)
    for hit in hits:
        parent_id = self._resolve_to_parent(hit.session_id)
        if parent_id in current_lineage:
            continue
        session_groups.setdefault(parent_id, []).append(hit)
    # Step 3: Top‑N sessions by hit count
    top_sessions = sorted(session_groups.items(), key=lambda x: len(x[1]), reverse=True)[:max_results]
    # Step 4: Parallel LLM summarisation
    tasks = []
    for session_id, hits in top_sessions:
        conversation = self.db.get_messages_as_conversation(session_id)
        truncated = self._truncate_around_matches(conversation, query, max_chars=100_000)
        tasks.append(self._summarize_session(truncated, query, session_id))
    summaries = await asyncio.gather(*tasks)
    return self._format_results(summaries)

Intelligent Truncation Algorithm

When a session contains tens of thousands of characters, only the most relevant window around the query terms is kept for the LLM.

# Python – window selection around matches
def _truncate_around_matches(self, text: str, query: str, max_chars: int = 100_000) -> str:
    if len(text) <= max_chars:
        return text
    positions = []
    query_lower = query.lower()
    text_lower = text.lower()
    for term in query_lower.split():
        start = 0
        while True:
            idx = text_lower.find(term, start)
            if idx == -1:
                break
            positions.append(idx)
            start = idx + 1
    if not positions:
        return text[:max_chars]
    positions.sort()
    window_size = max_chars
    best_start = 0
    best_count = 0
    for pos in positions:
        window_start = max(0, pos - window_size // 4)  # bias start 1/4 before match
        window_end = window_start + window_size
        count = sum(1 for p in positions if window_start <= p <= window_end)
        if count > best_count:
            best_count = count
            best_start = window_start
    return text[best_start:best_start + max_chars]

LLM Summarisation Prompt

SUMMARIZE_PROMPT = """Summarize this conversation excerpt.
Focus on content related to: {query}

Requirements:
- Use past tense (this conversation already happened)
- Preserve specific details: names, numbers, commands, file paths, URLs
- Focus on the search topic, skip unrelated parts
- Keep the summary under 300 words
- Structure: what was discussed → what was decided → what was the outcome

Conversation:
{truncated_text}"""

Holographic Retrieval – Beyond Text Matching

FTS5 finds messages containing the keywords, but some queries need semantic understanding (e.g., “What was my opinion on Kubernetes?”). The holographic module adds two extra scoring dimensions.

Three‑Way Mixed Scoring

# Python – FactRetriever.search()
class FactRetriever:
    async def search(self, query: str, limit: int = 10, hrr_mode: str = "probe") -> list<ScoredFact>:
        # 1️⃣ FTS5 loose recall (3× candidates)
        candidates = self.db.fts_search(query, limit=limit * 3)
        # 2️⃣ Jaccard set similarity boost
        query_tokens = set(tokenize(query))
        for fact in candidates:
            fact_tokens = set(tokenize(fact.content))
            jaccard = len(query_tokens & fact_tokens) / len(query_tokens | fact_tokens)
            fact.score += jaccard * JACCARD_WEIGHT
        # 3️⃣ HRR vector algebra similarity
        query_hrr = self._encode_hrr(query)
        for fact in candidates:
            hrr_sim = self._hrr_similarity(query_hrr, fact.hrr_vector, mode=hrr_mode)
            fact.score += hrr_sim * HRR_WEIGHT
        # Trust‑weight
        for fact in candidates:
            fact.score *= fact.trust_weight
        # Time decay
        now = time.time()
        for fact in candidates:
            age_days = (now - fact.timestamp) / 86400
            decay = math.exp(-age_days / HALF_LIFE_DAYS)
            fact.score *= decay
        candidates.sort(key=lambda f: f.score, reverse=True)
        return candidates[:limit]

HRR Query Modes

probe – direct similarity (e.g., "Memory about Docker").

related – associated concepts (e.g., "Technologies related to Docker").

reason – causal inference (e.g., "Why Docker was chosen").

contradict – contradiction detection (e.g., "Memories that conflict with previous statements").

# Python – HRR similarity implementation
def _hrr_similarity(self, query_vec: np.ndarray, fact_vec: np.ndarray, mode: str) -> float:
    if mode == "probe":
        return np.dot(query_vec, fact_vec) / (np.linalg.norm(query_vec) * np.linalg.norm(fact_vec))
    elif mode == "related":
        unbound = np.fft.ifft(np.fft.fft(fact_vec) / np.fft.fft(query_vec)).real
        return np.max(unbound)
    elif mode == "reason":
        causal_vec = self._get_causal_binding(query_vec)
        return np.dot(causal_vec, fact_vec) / (np.linalg.norm(causal_vec) * np.linalg.norm(fact_vec))
    elif mode == "contradict":
        neg_query = -query_vec
        return np.dot(neg_query, fact_vec) / (np.linalg.norm(neg_query) * np.linalg.norm(fact_vec))

Memory Provider – Lifecycle Hooks

# Python – abstract MemoryProvider
class MemoryProvider(ABC):
    @abstractmethod
    async def prefetch(self, session_id: str, context: dict) -> str:
        """Called before each turn to preload relevant memories into the system prompt"""
        pass

    @abstractmethod
    async def on_session_end(self, session_id: str, messages: list) -> None:
        """Extract facts at the end of a session and store them permanently"""
        pass

    @abstractmethod
    async def sync_turn(self, session_id: str, message: dict) -> None:
        """Synchronise after each turn for real‑time index updates"""
        pass

    @abstractmethod
    async def on_pre_compress(self, session_id: str, messages: list) -> None:
        """Ensure important information is retained before compression"""
        pass

The prefetch workflow loads relevant history from FTS5/Holographic retrieval, generates a short memory injection, and adds it to the system prompt, giving the agent a human‑like associative start.

Common Pitfalls and Fixes

Pitfall 1 – CJK Tokenisation Disaster

Using only unicode61 splits "大别山项目进度" into seven single‑character tokens, causing noisy matches. Routing queries with three or more CJK characters to the trigram table yields exact three‑character sequence matches.

# Wrong – only unicode61
# Search "大别山" matches "别忘了山上的大石头" ❌

# Correct – trigram for CJK ≥3 chars
# Search "大别山" matches only messages containing the exact sequence ✅

Pitfall 2 – Child Session Lineage

After compression, a logical conversation may be split into multiple child sessions. Resolve the true parent recursively with _resolve_to_parent.

def _resolve_to_parent(self, session_id: str) -> str:
    """Recursively resolve to the top‑level parent session"""
    current = session_id
    visited = set()
    while True:
        if current in visited:
            break  # prevent loops
        visited.add(current)
        parent = self.db.get_session_parent(current)
        if parent is None:
            return current
        current = parent
    return current

Pitfall 3 – Excluding Current Session

When the user asks about a topic already present in the current dialogue, the search may return the current session itself. Compute the full lineage (ancestors + descendants) and filter it out.

def _get_session_lineage(self, session_id: str) -> set[str]:
    """Collect all ancestors and descendants of a session"""
    lineage = set()
    current = session_id
    while current:
        lineage.add(current)
        current = self.db.get_session_parent(current)
    self._collect_descendants(session_id, lineage)
    return lineage

Pitfall 4 – FTS5 Syntax Errors

User queries may contain special characters ("*", "\"", "OR") that break the FTS5 parser. The implementation catches sqlite3.OperationalError and falls back to a safe LIKE search.

def search_messages(self, query: str, limit: int = 50) -> list<SearchResult>:
    try:
        return self._search_fts5(query, limit)
    except sqlite3.OperationalError:
        escaped = query.replace('%', r'\%').replace('_', r'\_')
        return self._search_like_fallback(escaped, limit)

Summary of Core Design Points

Dual‑table FTS5 is essential for accurate CJK full‑text search.

The three‑path dispatch balances precision (English) and coverage (CJK long vs. short).

Aggregating messages into sessions matches user intent – they care about "which conversation" rather than a single line.

Intelligent truncation selects the densest window around matches, fitting LLM token limits while preserving information.

LLM summarisation turns raw hits into structured, past‑tense memories with concrete details.

Holographic Retrieval extends beyond keyword matching by mixing FTS5, Jaccard similarity, and HRR vector algebra, plus trust weighting and temporal decay.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

LLMSQLiteagent architectureFTS5Cross-Session RetrievalHolographic RetrievalHRR
James' Growth Diary
Written by

James' Growth Diary

I am James, focusing on AI Agent learning and growth. I continuously update two series: “AI Agent Mastery Path,” which systematically outlines core theories and practices of agents, and “Claude Code Design Philosophy,” which deeply analyzes the design thinking behind top AI tools. Helping you build a solid foundation in the AI era.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.