How to Build Intelligent Contextual Memory for AI Agents
The article examines why naïvely feeding all dialogue history to large language models is costly and unreliable, and it walks through rolling context windows, inverted‑index pruning, semantic vector search, and GraphRAG as complementary techniques for creating efficient, reasoning‑capable AI agent memory.
Why Not Send All History to the LLM
Sending the entire conversation history with each request is the simplest approach and works for short, small-scale tasks, but it quickly leads to problems: irrelevant context overwhelms the model, outputs become inconsistent, debugging gets harder, token usage and cost rise sharply, and latency grows.
Rolling Context Window
A common alternative is to limit the number of recent interactions sent to the model, known as a rolling context window. By keeping only the last N messages, developers can control token consumption, reduce latency, and keep costs predictable. This works well when the task mainly depends on recent dialogue, but it discards older information that may be needed for accurate reasoning.
from typing import Annotated, List, TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

WINDOW_SIZE = 6

# Wrapper around LangGraph's add_messages reducer that keeps only
# the most recent WINDOW_SIZE messages
def rolling_add_messages(old, new):
    combined = add_messages(old, new)
    return combined[-WINDOW_SIZE:]

class AgentState(TypedDict):
    messages: Annotated[List[BaseMessage], rolling_add_messages]
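Before wiring this into a graph, the reducer can be sanity-checked on its own; a minimal sketch, with placeholder HumanMessage contents:
from langchain_core.messages import HumanMessage

history = [HumanMessage(content=f"msg {i}") for i in range(8)]
trimmed = rolling_add_messages(history[:6], history[6:])
print(len(trimmed))  # 6 -- only the newest WINDOW_SIZE messages survive
Inverted Index for Smarter Context Pruning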
An inverted index maps terms directly to the documents or messages where they appear, enabling fast full‑text retrieval.
Building an inverted index over chat history allows the system to retrieve only messages that contain query terms, dramatically shrinking the context sent to the LLM while preserving relevance, even for very early conversation turns.
from collections import defaultdict

def build_inverted_index(messages):
    # Map each lowercase term to the set of message IDs that contain it
    index = defaultdict(set)
    for i, msg in enumerate(messages):
        for word in msg.lower().split():
            index[word].add(i)
    return index

def search_context(query, index, messages):
    # Collect every message that shares at least one term with the query
    matched_ids = set()
    for word in query.lower().split():
        if word in index:
            matched_ids.update(index[word])
    # Sort so retrieved messages stay in chronological order
    return [messages[i] for i in sorted(matched_ids)]
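A quick usage sketch (the chat history below is made up for illustration):
history = [
    "let's schedule the quarterly review for friday",
    "the vendor payment went out yesterday",
    "can you move the quarterly review to monday?",
]
index = build_inverted_index(history)
print(search_context("quarterly review", index, history))
# -> the two messages that mention the quarterly review
Semantic Context Retrieval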
Semantic search interprets user intent and retrieves context based on meaning rather than exact keywords.
Sentences are tokenized and embedded with a lightweight model, then stored as vectors in a FAISS index. At query time, the query is embedded the same way and compared against the stored vectors; a high cosine similarity (e.g., above 0.9) indicates strong semantic relevance. In the snippet below the embeddings are normalized, so FAISS's exact L2-distance index ranks results the same way cosine similarity would.
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

sentences = ["schedule meeting tomorrow", "send project report", "prepare meeting agenda"]

# Normalized embeddings make L2 ranking equivalent to cosine-similarity ranking
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(sentences, normalize_embeddings=True)

index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(np.array(embeddings, dtype=np.float32))

query = "plan meeting"
query_vector = model.encode([query], normalize_embeddings=True)
_, indices = index.search(np.array(query_vector, dtype=np.float32), k=2)
print([sentences[i] for i in indices[0]])  # the two meeting-related sentences
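To apply a similarity threshold like the 0.9 mentioned above, an inner-product index is handier, since with normalized embeddings the returned scores are cosine similarities directly. A sketch under that assumption, reusing the variables from the previous snippet:
# With normalized vectors, inner-product scores equal cosine similarity
ip_index = faiss.IndexFlatIP(embeddings.shape[1])
ip_index.add(np.array(embeddings, dtype=np.float32))

scores, ids = ip_index.search(np.array(query_vector, dtype=np.float32), k=2)
for score, i in zip(scores[0], ids[0]):
    if score > 0.9:  # keep only strongly related sentences
        print(f"{sentences[i]} (cosine similarity: {score:.2f})")
Semantic Experience Memory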
Instead of treating each user query as a clean slate, the system can retrieve semantically similar past interactions and combine them with the recent rolling window. This hybrid context gives the agent a sense of user preferences and prior knowledge, producing more consistent and personalized responses.
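A minimal sketch of this hybrid lookup, reusing the embedding model and FAISS index from above; past_interactions, recent_window, and k are illustrative assumptions:
def build_hybrid_context(query, model, index, past_interactions, recent_window, k=3):
    # Retrieve the k most semantically similar past interactions...
    q = model.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.array(q, dtype=np.float32), k=k)
    retrieved = [past_interactions[i] for i in ids[0] if i != -1]
    # ...then merge them with the recent rolling window, dropping duplicates
    seen, context = set(), []
    for msg in retrieved + recent_window:
        if msg not in seen:
            seen.add(msg)
            context.append(msg)
    return context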
GraphRAG
When relevant knowledge is scattered across many interactions or documents, pure semantic similarity may miss logically related details. GraphRAG represents information as a graph—nodes are documents or concepts, edges are relationships—enabling multi‑hop reasoning.
Typical steps to build a GraphRAG pipeline (a toy sketch follows the list):
1. Identify nodes (entities or concepts).
2. Identify relationships between nodes.
3. Define the graph schema.
4. Populate the graph with data.
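As an illustration of these steps, here is a tiny graph built with networkx; the entities, relations, and query are invented for the example, and a real pipeline would extract them with an LLM:
import networkx as nx

g = nx.DiGraph()

# Steps 1-2: nodes are entities, edges are typed relationships
g.add_edge("Invoice #1042", "Acme Corp", relation="billed_to")
g.add_edge("Acme Corp", "Payment #77", relation="paid_via")
g.add_edge("Payment #77", "Bank compliance hold", relation="delayed_by")

# Steps 3-4 collapse here: the schema is just typed edges, populated inline.
# Multi-hop reasoning: walk from the invoice to the cause of the delay.
path = nx.shortest_path(g, "Invoice #1042", "Bank compliance hold")
for a, b in zip(path, path[1:]):
    print(f"{a} --{g.edges[a, b]['relation']}--> {b}")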
Microsoft’s GraphRAG library automates entity and relation extraction and graph construction.
Step 1: Initialize the project
pip install graphrag
graphrag init --root ./graphrag_demo
Directory layout:
graphrag_demo/
├── input/
├── settings.yaml
└── prompts/
# Place source documents in input/
Step 2: Build the knowledge graph
graphrag index --root ./graphrag_demo
Step 3: Query the graph
graphrag query \
--root ./graphrag_demo \
--method global \
"Why was the payment delayed?"The system performs cross‑graph retrieval and supports multi‑hop inference, going beyond simple similarity matches.
Conclusion
Modern AI agents need more than keyword matching or isolated semantic similarity. Inverted indexes efficiently trim context, semantic retrieval preserves intent, and GraphRAG adds structured, relational knowledge for multi-hop reasoning. Combining rolling windows, semantic similarity, experience memory, and graph-based retrieval yields agents that not only recall information but also reason over it, producing responses that are more complete, less prone to hallucination, and more personalized.
by Mudassir Fazal