Agent Memory: From Theory to Practical Implementation

The article explains how AI agents can acquire long‑term memory by combining three functions—coherence, context, and learning—with four memory types, describes the full retrieval‑store loop, and provides a step‑by‑step Python implementation using OpenAI embeddings, ChromaDB, and forgetting strategies.

AI Architecture Hub

May 19, 2026

1. What Is Agent Memory?

Agent memory is not a single feature but a backend system that includes different storage mechanisms, retrieval methods, and management strategies, allowing an agent to retain context over long periods.

It serves three distinct functions:

Coherence : remembers identity, preferences, and shared outcomes so each interaction is not a fresh start.

Context : records the current task, recent tool results, and next steps to keep multi‑step workflows smooth.

Learning : aggregates successful and failed actions to continuously improve decision making.

2. Four Memory Types

In‑context memory : the context window the model sees on a single call. It includes system prompts, conversation history, tool call results, retrieved snippets, and temporary drafts. Capacity is limited, and the contents are discarded when the session ends.

External memory : storage outside the model, such as PostgreSQL, Redis, SQLite for exact queries, or vector stores like Pinecone, Chroma, pgvector for semantic retrieval. Retrieval speed and relevance are the main performance bottlenecks.

Episodic memory : structured logs of past events (task, approach, outcome, duration, token cost, quality score, notes). Retrieving similar episodes lets the agent reuse them as few‑shot examples in the prompt.

Parametric memory : knowledge baked into model weights during pre‑training (world facts, language rules, reasoning strategies). It cannot be updated without re‑training or fine‑tuning, and because it may be outdated or simply wrong, it is a common source of hallucinations.
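
The episodic record described above maps naturally onto a small data class. The sketch below is only an assumption about its shape: the field names follow the list in the text, while the types, the defaults, and the to_document helper are illustrative and not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One past task attempt, stored as an episodic memory (illustrative shape)."""
    task: str                   # what the agent was asked to do
    approach: str               # how it went about it
    outcome: str                # e.g. "success" or "failure"
    duration_ms: int = 0        # wall-clock time spent
    token_cost: int = 0         # tokens consumed by the attempt
    quality_score: float = 0.0  # assumed to lie in the range 0.0-1.0
    notes: str = ""

    def to_document(self) -> str:
        """Flatten the text fields into one string suitable for embedding."""
        return (
            f"Task: {self.task}\n"
            f"Approach: {self.approach}\n"
            f"Outcome: {self.outcome}\n"
            f"Notes: {self.notes}"
        )


episode = Episode(
    task="Summarize the Q3 sales report",
    approach="Chunked the PDF, summarized each chunk, merged the summaries",
    outcome="success",
    duration_ms=4200,
    token_cost=3100,
    quality_score=0.9,
)
print(episode.to_document())
```

Keeping the numeric fields out of the embedded text and storing them as metadata instead makes it possible to filter or sort on them later without re-embedding.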

3. Memory Flow in an Agent Loop

Before each model invocation, the system retrieves relevant memories and injects them into the prompt; after the response, it writes new information back to the store. The LLM itself remains stateless, but the loop around it makes the agent behave as if it were stateful.
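
The retrieve, invoke, store cycle can be sketched in a few lines. Everything here is a stand-in chosen for illustration: ListMemory uses keyword overlap instead of embeddings, and echo_model replaces a real LLM call so the example runs offline.

```python
from typing import Callable, List


class ListMemory:
    """Toy in-memory store: keyword overlap stands in for semantic retrieval."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def recall(self, query: str, k: int = 3) -> List[str]:
        words = set(query.lower().split())
        scored = [(len(words & set(item.lower().split())), item) for item in self.items]
        scored = [pair for pair in scored if pair[0] > 0]  # drop items with no overlap
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:k]]

    def remember(self, text: str) -> None:
        self.items.append(text)


def run_turn(user_message: str, memory: ListMemory, model: Callable[[str], str]) -> str:
    # 1. Retrieve: look up memories relevant to the incoming message.
    recalled = memory.recall(user_message)
    # 2. Invoke: the model only ever sees what is placed in the prompt.
    prompt = (
        "Relevant memories:\n"
        + "\n".join(f"- {m}" for m in recalled)
        + f"\n\nUser: {user_message}"
    )
    reply = model(prompt)
    # 3. Store: write the new exchange back for future turns.
    memory.remember(f"User said: {user_message}")
    memory.remember(f"Agent replied: {reply}")
    return reply


def echo_model(prompt: str) -> str:
    """Stand-in for an LLM call; reports how many memory lines it was given."""
    count = sum(1 for line in prompt.splitlines() if line.startswith("- "))
    return f"(saw {count} recalled memories)"


memory = ListMemory()
first = run_turn("My favorite language is Rust", memory, echo_model)
second = run_turn("Which language is my favorite?", memory, echo_model)
print(first)
print(second)
```

The second turn receives the statement stored during the first one, which is the whole point of the loop: state lives in the store, not in the model.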

4. Building a Memory Layer in Python

We implement a MemoryStore class that uses chromadb for persistent vector storage and OpenAI embeddings. The class provides remember, recall, and forget methods.

import chromadb
from openai import OpenAI
from datetime import datetime
import json, uuid

class MemoryStore:
    """AI agent persistent vector memory"""
    def __init__(self, agent_id: str, persist_dir: str = "./memory_db"):
        self.agent_id = agent_id
        self.openai = OpenAI()
        self.client = chromadb.PersistentClient(path=persist_dir)
        self.collection = self.client.get_or_create_collection(
            name=f"agent_{agent_id}_memories",
            metadata={"hnsw:space": "cosine"}
        )
    # _embed, remember, recall, forget methods omitted for brevity
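
The article omits the method bodies. As a rough sketch of what they might look like, the functions below take the Chroma collection and an embedding callable as parameters. The calls collection.add, collection.query, and collection.delete are ChromaDB's collection API; the function signatures, the default memory_type of "fact", the created_at metadata key, and the shape of the returned dictionaries are assumptions, chosen to match the keys (content, metadata, relevance) that the agent code reads later.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional

Embed = Callable[[str], List[float]]  # text -> embedding vector


def remember(collection: Any, embed: Embed, content: str,
             memory_type: str = "fact",
             metadata: Optional[Dict[str, Any]] = None) -> str:
    """Embed `content`, store it, and return the generated memory id."""
    memory_id = str(uuid.uuid4())
    meta = {"type": memory_type,
            "created_at": datetime.now(timezone.utc).isoformat()}
    meta.update(metadata or {})
    collection.add(ids=[memory_id], documents=[content],
                   embeddings=[embed(content)], metadatas=[meta])
    return memory_id


def recall(collection: Any, embed: Embed, query: str, k: int = 4) -> List[Dict[str, Any]]:
    """Return up to k memories closest to `query`, most relevant first."""
    result = collection.query(query_embeddings=[embed(query)], n_results=k)
    # Chroma returns one inner list per query; exactly one query was sent.
    hits = zip(result["ids"][0], result["documents"][0],
               result["metadatas"][0], result["distances"][0])
    return [
        # With "hnsw:space": "cosine", distance = 1 - cosine similarity.
        {"id": i, "content": doc, "metadata": meta, "relevance": round(1 - dist, 3)}
        for i, doc, meta, dist in hits
    ]


def forget(collection: Any, memory_id: str) -> None:
    """Delete a single memory by id."""
    collection.delete(ids=[memory_id])
```

Inside MemoryStore, the embedding callable could wrap self.openai.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding. Passing the collection and the embedder in as arguments also makes the logic easy to exercise with fakes, without network access.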

An EpisodicLogger stores episode records, and a MemoryAugmentedAgent combines the two, retrieving both semantic memories and similar episodes before constructing the system prompt.

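class EpisodicLogger:
    """Stores task episodes in the shared MemoryStore and retrieves similar ones."""
    def __init__(self, memory_store: MemoryStore):
        self.store = memory_store
    def log(self, episode: "Episode"):
        # Episode is assumed to be a record (e.g. a dataclass) exposing the
        # attributes read below; its definition is not shown in the article.
        doc = (
            f"Task: {episode.task}\n"
            f"Approach: {episode.approach}\n"
            f"Outcome: {episode.outcome}\n"
            f"Notes: {episode.notes}"
        )
        # Text fields are embedded; numeric fields travel as metadata.
        self.store.remember(content=doc, memory_type="episode", metadata={
            "outcome": episode.outcome,
            "quality_score": episode.quality_score,
            "duration_ms": episode.duration_ms,
            "token_cost": episode.token_cost,
        })
    def recall_similar(self, task: str, k: int = 2) -> list:
        # Assumed helper: MemoryAugmentedAgent calls it, but the article omits it.
        # Over-fetch from the store, then keep only entries typed "episode".
        hits = self.store.recall(task, k=k * 3)
        return [h for h in hits if h["metadata"].get("type") == "episode"][:k]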

import anthropic

class MemoryAugmentedAgent:
    def __init__(self, agent_id: str):
        self.client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        self.memory = MemoryStore(agent_id)
        self.episodes = EpisodicLogger(self.memory)
    def _build_memory_context(self, user_message: str) -> str:
        """Render recalled memories and similar episodes as text for the system prompt."""
        memories = self.memory.recall(user_message, k=4)
        episodes = self.episodes.recall_similar(user_message, k=2)
        parts = []
        if memories:
            parts.append("## Relevant memories\n" + "\n".join(
                f"- [{m['metadata']['type']}] {m['content']} (relevance: {m['relevance']})"
                for m in memories))
        if episodes:
            # Truncate each episode to 200 characters to protect the token budget.
            parts.append("## Similar past tasks\n" + "\n".join(
                f"- {e['content'][:200]}..." for e in episodes))
        # Joining an empty list yields "", so no special case is needed.
        return "\n\n".join(parts)
    # run method omitted for brevity

5. Vector Database and Similarity Retrieval

Embeddings are 1536‑dimensional float arrays generated by OpenAI’s text‑embedding‑3‑small. Cosine similarity finds the nearest vectors, enabling semantic search even without exact keyword overlap.

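import numpy as np

def cosine_similarity(a: list, b: list) -> float:
    """Range is [-1, 1]: 1.0 = same direction (near-identical meaning), values near 0.0 = unrelated."""
    a, b = np.array(a), np.array(b)
    # Dot product divided by the product of the two vector lengths.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))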
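
To see how similarity turns into retrieval, here is a brute-force top-k search over a handful of vectors. It is a toy illustration only: the 3-dimensional vectors are hand-made stand-ins for real 1536-dimensional embeddings, the top_k helper is hypothetical, and the cosine function is rewritten with the standard library so the example has no dependencies. A vector database performs the same ranking, but with an approximate index instead of scanning every vector.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity using only the standard library."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query: List[float], vectors: Dict[str, List[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours: score every vector, keep the k best."""
    scored = [(name, cosine(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hand-made 3-d vectors; the axes loosely mean (pets, cooking, finance).
memories = {
    "user owns a cat": [0.9, 0.1, 0.0],
    "user likes baking bread": [0.1, 0.9, 0.1],
    "user holds index funds": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for the embedding of "what pets do I have?"

ranked = top_k(query, memories)
for name, score in ranked:
    print(f"{score:.2f}  {name}")
```

The query shares no words with "user owns a cat", yet that memory ranks first because the two vectors point in nearly the same direction.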

Local development can start with ChromaDB; production may switch to pgvector, Pinecone, or Qdrant for larger scale.

6. Forgetting and Consolidation Strategies

Time‑decay scoring : combines relevance, importance, and recency (e.g., memory_score formula from Park et al., 2023).

Importance scoring at write‑time : let an LLM rate the future value of a piece of information (0.0–1.0).

Periodic consolidation : merge highly similar memories into a summary, analogous to human sleep‑time memory consolidation.

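import math
from datetime import datetime, timezone

def memory_score(relevance, importance, created_at, recency_weight=0.3, decay_factor=0.995):
    """Inspired by "Generative Agents" (Park et al., 2023)"""
    # created_at must be a timezone-aware UTC datetime.
    hours_old = (datetime.now(timezone.utc) - created_at).total_seconds() / 3600
    # Exponential decay per hour: about 0.89 after one day with the default factor.
    recency = math.pow(decay_factor, hours_old)
    # Weighted sum; the 0.4 and 0.3 weights are fixed, only the recency weight is tunable.
    return relevance * 0.4 + importance * 0.3 + recency * recency_weight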
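
Periodic consolidation, the third strategy in the list above, can be approximated with a greedy grouping pass. This is a minimal sketch under stated assumptions, not the article's implementation: the 0.9 threshold, the group_similar and consolidate helpers, and the summarize callable (a stand-in for an LLM summarization call) are all hypothetical, and the 2-dimensional vectors are toy embeddings.

```python
import math
from typing import Callable, List, Tuple

Memory = Tuple[str, List[float]]  # (text, embedding)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def group_similar(memories: List[Memory], threshold: float = 0.9) -> List[List[Memory]]:
    """Greedy single pass: join the first group whose seed memory is similar enough."""
    groups: List[List[Memory]] = []
    for memory in memories:
        for group in groups:
            if cosine(memory[1], group[0][1]) >= threshold:
                group.append(memory)
                break
        else:  # no existing group matched, so start a new one
            groups.append([memory])
    return groups


def consolidate(memories: List[Memory],
                summarize: Callable[[List[str]], str],
                threshold: float = 0.9) -> List[str]:
    """Replace each multi-member group with one summary; keep singletons as-is."""
    merged = []
    for group in group_similar(memories, threshold):
        texts = [text for text, _ in group]
        merged.append(texts[0] if len(texts) == 1 else summarize(texts))
    return merged


memories = [
    ("user prefers dark mode", [0.9, 0.1]),
    ("user asked for a dark theme again", [0.88, 0.15]),
    ("user's timezone is UTC+2", [0.1, 0.95]),
]
# Stand-in for an LLM summarization call: just join the texts.
result = consolidate(memories, summarize=lambda texts: " / ".join(texts))
print(result)
```

Greedy grouping depends on input order and compares only against each group's first member, so it is cheap but coarse; a production job might use proper clustering and re-embed the summaries before writing them back.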

7. Takeaway

Memory transforms an LLM from a stateless tool into a collaborative partner that can understand, adapt, and evolve over time. Designing the right mix of parametric, external, episodic, and in‑context memories, together with effective forgetting policies, is the key to building robust AI agents.
