Agent Memory: From Theory to Practical Implementation
The article explains how AI agents can acquire long‑term memory by combining three functions—coherence, context, and learning—with four memory types, describes the full retrieval‑store loop, and provides a step‑by‑step Python implementation using OpenAI embeddings, ChromaDB, and forgetting strategies.
1. What Is Agent Memory?
Agent memory is not a single feature but a backend system that includes different storage mechanisms, retrieval methods, and management strategies, allowing an agent to retain context over long periods.
It serves three distinct functions:
Coherence: remembers identity, preferences, and shared outcomes so each interaction is not a fresh start.
Context: records the current task, recent tool results, and next steps to keep multi‑step workflows smooth.
Learning: aggregates successful and failed actions to continuously improve decision making.
2. Four Memory Types
In‑context memory: the token window the model sees on each call. It includes system prompts, conversation history, tool call results, retrieved snippets, and temporary drafts. Capacity is limited, and the contents are cleared when the session ends.
External memory: storage outside the model, such as PostgreSQL, Redis, or SQLite for exact queries, and vector stores like Pinecone, Chroma, or pgvector for semantic retrieval. Retrieval speed and relevance are the main performance bottlenecks.
Episodic memory: structured logs of past events (task, approach, outcome, duration, token cost, quality score, notes). Retrieving similar episodes lets the agent reuse them as few‑shot examples.
Parametric memory: knowledge baked into model weights during pre‑training (world facts, language rules, reasoning strategies). It cannot be updated without retraining or fine‑tuning, and it can be outdated or wrong, which surfaces as hallucinations.
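To make the episodic record concrete, here is a minimal sketch of what one episode might look like as a Python dataclass. The field names follow the list above and the metadata keys used later in the article; the helper methods `to_document` and `to_metadata`, the default for `notes`, and the value ranges in the comments are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class Episode:
    """One structured record of a past task."""
    task: str
    approach: str
    outcome: str          # e.g. "success" or "failure" (assumed vocabulary)
    duration_ms: int
    token_cost: int
    quality_score: float  # assumed to lie in [0.0, 1.0]
    notes: str = ""

    def to_document(self) -> str:
        """Flatten the text fields into one string suitable for embedding."""
        return (
            f"Task: {self.task}\n"
            f"Approach: {self.approach}\n"
            f"Outcome: {self.outcome}\n"
            f"Notes: {self.notes}"
        )

    def to_metadata(self) -> dict:
        """Return the fields a store could filter or sort on."""
        data = asdict(self)
        keys = ("outcome", "quality_score", "duration_ms", "token_cost")
        return {key: data[key] for key in keys}
```

Splitting the record this way keeps free text in the embedded document and keeps numeric fields in metadata, where they can be filtered without a semantic search.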
3. Memory Flow in an Agent Loop
Before each model invocation the system retrieves relevant memories; after the response it writes new information back. This makes the otherwise stateless LLM behave as a stateful, aware agent.
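The retrieve-then-write cycle can be sketched in a few lines. This is only an illustration under simplifying assumptions: `ListMemory` is a hypothetical toy store that ranks by keyword overlap rather than embeddings, `run_turn` is a hypothetical helper, and `llm` is any callable that maps a prompt string to a reply string.

```python
from typing import Callable, List


class ListMemory:
    """Toy memory that ranks by keyword overlap (illustration only)."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def recall(self, query: str, k: int = 3) -> List[str]:
        words = set(query.lower().split())
        scored = [(len(words & set(m.lower().split())), m) for m in self.items]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]

    def remember(self, text: str) -> None:
        self.items.append(text)


def run_turn(memory: ListMemory, llm: Callable[[str], str], user_message: str) -> str:
    # 1. Retrieve relevant memories before the model call.
    recalled = memory.recall(user_message)
    prompt = "Known facts:\n" + "\n".join(recalled) + "\n\nUser: " + user_message
    # 2. Invoke the stateless model with the enriched prompt.
    reply = llm(prompt)
    # 3. Write new information back after the response.
    memory.remember(f"User said: {user_message}")
    memory.remember(f"Agent replied: {reply}")
    return reply
```

The model itself never changes between turns; all of the apparent state lives in what `recall` injects into the prompt and what `remember` stores afterwards.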
4. Building a Memory Layer in Python
We implement a MemoryStore class that uses chromadb for persistent vector storage and OpenAI embeddings. The class provides remember, recall, and forget methods.
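The article's listing omits the bodies of remember, recall, and forget. As a rough, dependency-free stand-in for experimenting with the same interface, the sketch below keeps everything in a dictionary and replaces the OpenAI embedding with a sparse bag-of-words vector. The argument names (`content`, `memory_type`, `metadata`, `k`) and the result keys (`content`, `metadata`, `relevance`) mirror how the article's later code calls the store; `InMemoryStore`, `toy_embed`, the `"fact"` default, and the extra `id` key are hypothetical and not part of the original implementation.

```python
import math
import uuid
from collections import Counter
from typing import Dict, List, Optional


def toy_embed(text: str) -> Counter:
    """Sparse bag-of-words vector; a placeholder for a real embedding call."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Non-persistent stand-in with remember / recall / forget."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def remember(self, content: str, memory_type: str = "fact",
                 metadata: Optional[dict] = None) -> str:
        memory_id = str(uuid.uuid4())
        self._rows[memory_id] = {
            "content": content,
            "embedding": toy_embed(content),
            "metadata": {"type": memory_type, **(metadata or {})},
        }
        return memory_id

    def recall(self, query: str, k: int = 4) -> List[dict]:
        q = toy_embed(query)
        hits = [
            {"id": mid, "content": row["content"], "metadata": row["metadata"],
             "relevance": round(cosine(q, row["embedding"]), 3)}
            for mid, row in self._rows.items()
        ]
        hits.sort(key=lambda h: h["relevance"], reverse=True)
        return hits[:k]

    def forget(self, memory_id: str) -> bool:
        """Delete one memory; returns False if the id was unknown."""
        return self._rows.pop(memory_id, None) is not None
```

Because the toy vector only counts shared words, it cannot match paraphrases the way a learned embedding can; it is meant for wiring up and testing the surrounding agent code, not for judging retrieval quality.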
import chromadb
from openai import OpenAI
from datetime import datetime
import json, uuid

class MemoryStore:
    """AI agent persistent vector memory"""

    def __init__(self, agent_id: str, persist_dir: str = "./memory_db"):
        self.agent_id = agent_id
        self.openai = OpenAI()
        self.client = chromadb.PersistentClient(path=persist_dir)
        self.collection = self.client.get_or_create_collection(
            name=f"agent_{agent_id}_memories",
            metadata={"hnsw:space": "cosine"}
        )

    # _embed, remember, recall, forget methods omitted for brevity

An EpisodicLogger stores episode records, and a MemoryAugmentedAgent combines the two, retrieving both semantic memories and similar episodes before constructing the system prompt.
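class EpisodicLogger:
    # Episode is a record type exposing the fields used below; its definition is not shown here.

    def __init__(self, memory_store: MemoryStore):
        self.store = memory_store

    def log(self, episode: "Episode"):
        # Text fields become the embedded document; numeric fields go into metadata.
        doc = (
            f"Task: {episode.task}\n"
            f"Approach: {episode.approach}\n"
            f"Outcome: {episode.outcome}\n"
            f"Notes: {episode.notes}"
        )
        self.store.remember(content=doc, memory_type="episode", metadata={
            "outcome": episode.outcome,
            "quality_score": episode.quality_score,
            "duration_ms": episode.duration_ms,
            "token_cost": episode.token_cost,
        })

    # recall_similar (called by MemoryAugmentedAgent) is not shown in the source

import anthropic  # required by MemoryAugmentedAgent; missing from the original listing

class MemoryAugmentedAgent: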
    def __init__(self, agent_id: str):
        self.client = anthropic.Anthropic()
        self.memory = MemoryStore(agent_id)
        self.episodes = EpisodicLogger(self.memory)

    def _build_memory_context(self, user_message: str) -> str:
        """Collect relevant memories and similar past episodes for the system prompt."""
        memories = self.memory.recall(user_message, k=4)
        episodes = self.episodes.recall_similar(user_message, k=2)
        parts = []
        if memories:
            parts.append("## Relevant memories\n" + "\n".join([
                f"- [{m['metadata']['type']}] {m['content']} (relevance: {m['relevance']})"
                for m in memories]))
        if episodes:
            parts.append("## Similar past tasks\n" + "\n".join([
                f"- {e['content'][:200]}..." for e in episodes]))
        return "\n".join(parts) if parts else ""

    # run method omitted for brevity

5. Vector Database and Similarity Retrieval
Embeddings are 1536‑dimensional float arrays generated by OpenAI’s text‑embedding‑3‑small. Cosine similarity finds the nearest vectors, enabling semantic search even without exact keyword overlap.
import numpy as np

def cosine_similarity(a: list, b: list) -> float:
    """1.0 = same direction (closest meaning), 0.0 = unrelated; negative values are possible."""
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

Local development can start with ChromaDB; production may switch to pgvector, Pinecone, or Qdrant for larger scale.
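One practical detail when a collection is configured with cosine space: a vector store such as Chroma reports a distance rather than a similarity, and for cosine space that distance is 1 minus the cosine similarity. The sketch below converts a query result into the content/metadata/relevance records used earlier. The nested-list layout of `sample` follows the shape Chroma returns for a single query as far as I know, but the data is hand-written, and `hits_from_query_result` is a hypothetical helper.

```python
from typing import Any, Dict, List


def hits_from_query_result(result: Dict[str, Any]) -> List[dict]:
    """Flatten a single-query result and turn cosine distance into similarity."""
    docs = result["documents"][0]
    metas = result["metadatas"][0]
    dists = result["distances"][0]
    return [
        {"content": doc, "metadata": meta, "relevance": round(1.0 - dist, 3)}
        for doc, meta, dist in zip(docs, metas, dists)
    ]


# Hand-written sample in the assumed result shape (one inner list per query).
sample = {
    "ids": [["m1", "m2"]],
    "documents": [["user prefers dark mode", "deploy failed on Friday"]],
    "metadatas": [[{"type": "preference"}, {"type": "episode"}]],
    "distances": [[0.12, 0.61]],
}
hits = hits_from_query_result(sample)
```

Smaller distances mean closer matches, so the first hit here ends up with the higher relevance.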
6. Forgetting and Consolidation Strategies
Time‑decay scoring: combines relevance, importance, and recency into a single score (e.g., the memory_score function inspired by Park et al., 2023).
Importance scoring at write‑time: let an LLM rate the future value of a piece of information (0.0–1.0).
Periodic consolidation: merge highly similar memories into a summary, analogous to human sleep‑time memory consolidation.
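Periodic consolidation can be sketched as a greedy grouping pass. This is an illustrative assumption rather than the article's method: `jaccard` word overlap stands in for embedding similarity, the default summarizer simply keeps the longest text in a group where a real system would more likely ask an LLM for a summary, and the `threshold` of 0.6 is arbitrary.

```python
from typing import Callable, List, Optional


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for cosine similarity on embeddings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def consolidate(
    memories: List[str],
    similarity: Callable[[str, str], float] = jaccard,
    summarize: Optional[Callable[[List[str]], str]] = None,
    threshold: float = 0.6,
) -> List[str]:
    """Group near-duplicate memories and replace each group with one summary."""
    if summarize is None:
        # Placeholder summarizer: keep the longest member of the group.
        summarize = lambda group: max(group, key=len)
    groups: List[List[str]] = []
    for text in memories:
        for group in groups:
            # Compare against the first member, which acts as the representative.
            if similarity(text, group[0]) >= threshold:
                group.append(text)
                break
        else:
            groups.append([text])
    return [g[0] if len(g) == 1 else summarize(g) for g in groups]
```

Passing a different `similarity` or `summarize` callable swaps in embeddings or an LLM without changing the grouping logic.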
import math
from datetime import datetime

def memory_score(relevance, importance, created_at, recency_weight=0.3, decay_factor=0.995):
    """Inspired by "Generative Agents" (Park et al., 2023)"""
    # created_at is assumed to be a naive UTC datetime, matching datetime.utcnow()
    hours_old = (datetime.utcnow() - created_at).total_seconds() / 3600
    recency = math.pow(decay_factor, hours_old)  # exponential decay per hour of age
    return relevance * 0.4 + importance * 0.3 + recency * recency_weight

7. Takeaway
Memory transforms an LLM from a stateless tool into a collaborative partner that can understand, adapt, and evolve over time. Designing the right mix of parametric, external, episodic, and in‑context memories, together with effective forgetting policies, is the key to building robust AI agents.
AI Architecture Hub
Focused on sharing high-quality AI content and practical implementation, helping people learn with fewer missteps and become stronger through AI.