Inside Harness’s Super‑Powerful Three‑Level Memory Architecture: Context, History Layers, and Fact Lists
The article provides a detailed, source‑code‑backed walkthrough of Harness’s three‑level memory system (user context, historical layering, and a structured fact list), explaining each layer’s purpose, update frequency, and lifecycle, and showing how the surrounding middleware, queue, updater, storage, and injection modules cooperate to deliver real‑time, persistent, and searchable memory for AI agents.
Overview of the Three‑Level Memory Architecture
Harness’s memory is organised into three distinct layers: user (short‑term/active memory), history (mid‑to‑long‑term memory), and facts (a structured long‑term knowledge base). Each layer stores information at a different granularity and update cadence, enabling the agent to retain immediate context, track past behaviour, and retrieve discrete knowledge points.
User layer – refreshed on every dialogue turn, holds the current work context, personal preferences, and top‑of‑mind topics. It acts like a real‑time notification bar, overwriting previous values within hours or days.
History layer – updated monthly or quarterly, records recent months, earlier context, and long‑term background. It captures the user’s activity timeline and stable background information.
Facts layer – low‑frequency updates, stored permanently. Each fact includes a category (preference, knowledge, context, behavior, goal) and is designed for fast retrieval.
Key Data Model
The memory skeleton is created by _create_empty_memory() in updater.py:
from datetime import datetime
from typing import Any


def _create_empty_memory() -> dict[str, Any]:
    return {
        "version": "1.0",
        "lastUpdated": datetime.utcnow().isoformat() + "Z",
        "user": {
            "workContext": {"summary": "", "updatedAt": ""},
            "personalContext": {"summary": "", "updatedAt": ""},
            "topOfMind": {"summary": "", "updatedAt": ""},
        },
        "history": {
            "recentMonths": {"summary": "", "updatedAt": ""},
            "earlierContext": {"summary": "", "updatedAt": ""},
            "longTermBackground": {"summary": "", "updatedAt": ""},
        },
        "facts": [],
    }

Fact Entry Structure
When new facts are generated, they are normalised into a uniform schema:
import uuid

# Context from the surrounding updater code: `fact` is one LLM‑extracted item,
# `confidence` its score, `now` the current ISO timestamp, and `thread_id`
# the originating conversation.
fact_entry = {
    "id": f"fact_{uuid.uuid4().hex[:8]}",
    "content": fact.get("content", ""),
    "category": fact.get("category", "context"),
    "confidence": confidence,
    "createdAt": now,
    "source": thread_id or "unknown",
}
current_memory["facts"].append(fact_entry)

Module‑Level Architecture
The memory system follows a clear, layered pipeline:
MemoryMiddleware (interaction layer) – intercepts every user‑assistant exchange, forwards cleaned messages to the update queue.
MessageProcessing (pre‑processing layer) – filters out non‑essential content (tool calls, uploaded file tags) and extracts key signals such as user corrections or positive feedback.
MemoryUpdateQueue (asynchronous buffer) – debounces rapid dialogue events and batches them to avoid excessive LLM calls (see the debounce sketch after this list).
MemoryUpdater (core engine) – formats the conversation, builds a prompt (MEMORY_UPDATE_PROMPT), and calls the LLM to obtain a JSON update; it then applies the updates via _apply_updates(), strips residual upload mentions, and finally writes the result atomically with _save_memory_to_file() (a sketch of the merge step follows the list).
MemoryStorage (persistence layer) – abstracts load/reload/save operations; the concrete FileMemoryStorage writes JSON to disk, while the design allows swapping in vector stores or distributed databases.
MemoryInjection (prompt injection layer) – formats the persisted memory according to the token budget, prioritising high‑confidence facts, and injects the formatted snippet into the agent’s system prompt for the next turn (see the budgeted‑injection sketch below).
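To make the pipeline concrete, here is a minimal sketch of the debouncing behaviour described for MemoryUpdateQueue. The class name comes from the article; the asyncio timer, the enqueue()/_flush_later() methods, and the five‑second window are illustrative assumptions, not the project’s actual implementation.

import asyncio
from collections.abc import Awaitable, Callable
from typing import Any


class MemoryUpdateQueue:
    """Sketch: collapse bursts of dialogue events into a single batched update."""

    def __init__(
        self,
        process_batch: Callable[[list[Any]], Awaitable[None]],
        debounce_seconds: float = 5.0,
    ) -> None:
        self._process_batch = process_batch
        self._debounce_seconds = debounce_seconds
        self._pending: list[Any] = []
        self._flush_task: asyncio.Task | None = None

    def enqueue(self, messages: list[Any]) -> None:
        # Buffer the new messages and restart the debounce timer.
        self._pending.extend(messages)
        if self._flush_task is not None and not self._flush_task.done():
            self._flush_task.cancel()
        self._flush_task = asyncio.create_task(self._flush_later())

    async def _flush_later(self) -> None:
        # Cancelled and restarted whenever enqueue() fires again inside the
        # window, so only the last burst triggers an expensive LLM update.
        await asyncio.sleep(self._debounce_seconds)
        batch, self._pending = self._pending, []
        await self._process_batch(batch)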
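The updater’s merge step can be pictured as follows. This is only a guess at the shape of _apply_updates() based on the data model shown earlier: the function name apply_updates_sketch and the merge rules are assumptions, and the real code presumably also validates the LLM’s JSON and normalises facts into the schema shown above.

import json
from datetime import datetime
from typing import Any


def apply_updates_sketch(current_memory: dict[str, Any], llm_response: str) -> dict[str, Any]:
    # Illustrative only: parse the LLM's JSON reply and merge it into the
    # three-level structure created by _create_empty_memory().
    updates = json.loads(llm_response)
    now = datetime.utcnow().isoformat() + "Z"

    # Layer summaries are overwritten in place, with a fresh timestamp.
    for layer in ("user", "history"):
        for section, new_summary in updates.get(layer, {}).items():
            if section in current_memory[layer]:
                current_memory[layer][section] = {"summary": new_summary, "updatedAt": now}

    # New facts are appended; existing facts are never rewritten in place.
    for fact in updates.get("facts", []):
        current_memory["facts"].append({**fact, "createdAt": now})

    current_memory["lastUpdated"] = now
    return current_memory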
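Finally, injection under a token budget might look like the sketch below. The function name, the choice of which summaries to include, and the crude words‑as‑tokens approximation are all assumptions for illustration; only the priority given to high‑confidence facts is stated by the article.

from typing import Any


def format_memory_for_injection(memory: dict[str, Any], token_budget: int = 800) -> str:
    # Rough sketch: spend the budget on layer summaries first, then on facts
    # sorted by confidence, approximating tokens as whitespace-separated words.
    parts = [
        f"Work context: {memory['user']['workContext']['summary']}",
        f"Recent months: {memory['history']['recentMonths']['summary']}",
    ]
    facts = sorted(memory["facts"], key=lambda f: f.get("confidence", 0.0), reverse=True)
    used = sum(len(p.split()) for p in parts)
    for fact in facts:
        cost = len(fact["content"].split())
        if used + cost > token_budget:
            break  # stay within the prompt's memory budget
        parts.append(f"- [{fact['category']}] {fact['content']}")
        used += cost
    return "\n".join(parts)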
Conversation Formatting Example
The helper format_conversation_for_update() truncates overly long messages and produces a concise text block for the LLM:
from typing import Any


def format_conversation_for_update(messages: list[Any]) -> str:
    lines = []
    for msg in messages:
        role = getattr(msg, "type", "unknown")
        content = getattr(msg, "content", str(msg))
        if len(str(content)) > 1000:
            content = str(content)[:1000] + "..."  # truncate oversized messages
        if role == "human":
            lines.append(f"User: {content}")
        elif role == "ai":
            lines.append(f"Assistant: {content}")
    return "\n".join(lines)

Benefits of the Design
The separation of concerns yields three major engineering advantages: high maintainability (each layer can be altered independently), performance optimisation (hot short‑term memory stays in RAM while cold facts are persisted), and extensibility (storage back‑ends can be swapped, new middleware can be added without touching the core updater).
Overall, the three‑level architecture transforms a naïve “concatenate‑all‑history” approach into a structured, searchable, and updatable memory system that prevents “agent forgetfulness” and supports enterprise‑scale, multi‑turn interactions.