Personalizing AI Agents: Memory, Rolling Context, and Advanced Retrieval Techniques
The article explains how AI agents use memory to retain conversation context, why sending the full history to large language models is inefficient, and presents rolling context windows, inverted‑index pruning, semantic embedding retrieval, and GraphRAG as complementary strategies to build more accurate and personalized agents.
Memory is the foundational component of any effective AI agent, storing past user interactions and context so the agent can respond accurately over time.
Sending the entire conversation history to a large language model (LLM) on every turn quickly inflates token usage: the prompt grows with each exchange, which raises cost and latency, spreads the model's attention over irrelevant text, and makes failures harder to debug.
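To put rough numbers on that growth, here is a back-of-the-envelope sketch. It assumes one new message per request and a fixed average of 50 tokens per message; both figures are illustrative assumptions, not measurements from the article.

```python
def prompt_tokens_full(turns: int, tokens_per_message: int) -> int:
    """Total prompt tokens over `turns` requests when each request carries the whole history."""
    # Request t carries t messages, so the total is 1 + 2 + ... + turns messages.
    return tokens_per_message * turns * (turns + 1) // 2


def prompt_tokens_windowed(turns: int, tokens_per_message: int, window: int) -> int:
    """Total prompt tokens when each request carries at most `window` recent messages."""
    return sum(tokens_per_message * min(t, window) for t in range(1, turns + 1))


TOKENS_PER_MESSAGE = 50  # assumed average, for illustration only
full = prompt_tokens_full(100, TOKENS_PER_MESSAGE)
windowed = prompt_tokens_windowed(100, TOKENS_PER_MESSAGE, window=6)
print(full, windowed)  # 252500 29250
```

Under these assumptions the full-history total grows quadratically with the number of turns, while a capped history grows linearly.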
Rolling Context Window
A rolling context window limits the number of recent interactions sent to the model, controlling token consumption, latency, and cost, though it may discard older but relevant information.
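The trade-off can be seen in a framework-free sketch that uses `collections.deque` with a `maxlen`. The window size of 3 and the sample messages are hypothetical, chosen only to make the eviction visible.

```python
from collections import deque

WINDOW_SIZE = 3  # hypothetical window, kept small so the effect is visible

history = deque(maxlen=WINDOW_SIZE)  # the oldest entry is evicted automatically
for message in [
    "user: my account ID is 4471",
    "assistant: thanks, noted",
    "user: what plans do you offer?",
    "assistant: basic and pro",
    "user: upgrade my account",
]:
    history.append(message)

context = list(history)  # what would be sent to the model
account_id_kept = any("4471" in m for m in context)
print(context)
print(account_id_kept)  # False: the account ID fell out of the window
```

The final request ("upgrade my account") arrives after the account ID has already been dropped, which is the kind of older-but-relevant loss described above.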
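from typing import Annotated, List, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

WINDOW_SIZE = 6  # number of most recent messages kept in state

def rolling_add_messages(old, new):
    # Merge with LangGraph's standard reducer, then keep only the tail.
    combined = add_messages(old, new)
    return combined[-WINDOW_SIZE:]

class AgentState(TypedDict):
    messages: Annotated[List[BaseMessage], rolling_add_messages]

Inverted Index for Smarter Context Pruning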
Building an inverted index over chat history enables retrieval of only those messages that contain query terms, dramatically shrinking the context sent to the LLM while preserving relevance.
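As a toy illustration of how much context this can remove, the sketch below indexes a four-message history and keeps only the messages that share a term with the query. The punctuation stripping, the small stop-word list, and the sample messages are assumptions added for illustration; without some stop-word filtering, common words such as "the" would match nearly every message.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "was", "why", "a", "is", "for", "on", "to"}


def tokenize(text):
    # Lowercase, keep alphanumeric runs (so "delayed?" matches "delayed"), drop stop words.
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def prune_history(query, messages):
    """Return the messages sharing at least one token with the query, in original order."""
    index = defaultdict(set)
    for position, message in enumerate(messages):
        for token in tokenize(message):
            index[token].add(position)
    matched = set()
    for token in tokenize(query):
        matched |= index.get(token, set())
    return [messages[i] for i in sorted(matched)]


history = [
    "The invoice for March was sent on Monday.",
    "Lunch options near the office are limited.",
    "Payment was delayed because the invoice number was wrong.",
    "The team offsite is planned for June.",
]
selected = prune_history("Why was the payment delayed?", history)
print(selected)  # only the third message survives
```

Here one message out of four is forwarded to the model. Exact-term matching still misses paraphrases, which is the gap semantic retrieval addresses.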
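from collections import defaultdict

def build_inverted_index(messages):
    # Map each lowercase word to the positions of the messages containing it.
    # Messages are assumed to be plain strings.
    index = defaultdict(set)
    for i, msg in enumerate(messages):
        for word in msg.lower().split():
            index[word].add(i)
    return index

def search_context(query, index, messages):
    matched_ids = set()
    for word in query.lower().split():
        if word in index:
            matched_ids.update(index[word])
    # Sort so the selected messages keep their chronological order.
    return [messages[i] for i in sorted(matched_ids)]

Semantic Retrieval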
Semantic retrieval uses lightweight sentence‑embedding models and vector similarity (cosine) to fetch context that matches the user’s intent, avoiding reliance on exact keyword matches.
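The ranking step itself is small. The sketch below computes cosine similarity in plain Python over toy three-dimensional vectors; the vectors are made-up stand-ins for real embeddings (which typically have hundreds of dimensions), and the helper names are illustrative.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Indices of the k vectors most similar to the query, best first."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:k]


# Made-up vectors; the comments name the sentence each one stands in for.
docs = [
    [0.9, 0.1, 0.0],  # "schedule meeting tomorrow"
    [0.0, 0.2, 0.9],  # "send project report"
    [0.8, 0.3, 0.1],  # "prepare meeting agenda"
]
query = [1.0, 0.2, 0.0]  # "plan meeting"
ranked = top_k(query, docs, k=2)
print(ranked)  # [0, 2]
```

Both meeting-related vectors rank above the report vector even though the query shares no exact wording requirement with them; the match comes from direction in the vector space, not from keywords.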
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

sentences = ["schedule meeting tomorrow", "send project report", "prepare meeting agenda"]
model = SentenceTransformer("all-MiniLM-L6-v2")

# Unit-length embeddings make the inner product equal to cosine similarity.
embeddings = model.encode(sentences, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype="float32"))

query = "plan meeting"
query_vector = model.encode([query], normalize_embeddings=True)
scores, indices = index.search(np.asarray(query_vector, dtype="float32"), k=2)
print([sentences[i] for i in indices[0]])

GraphRAG
GraphRAG represents knowledge as a graph where nodes are documents or concepts and edges are relationships, enabling multi‑hop reasoning across dispersed information.
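To show what multi-hop means in practice, here is a toy traversal over a hand-written graph. This is not how the GraphRAG library works internally; the graph contents, node names, and the `find_path` helper are all hypothetical and serve only to illustrate chaining relationships across separate facts.

```python
from collections import deque

# Hypothetical knowledge graph: node -> list of (relation, neighbor).
GRAPH = {
    "payment #881": [("blocked_by", "invoice #42")],
    "invoice #42": [("has_issue", "wrong tax ID")],
    "wrong tax ID": [("entered_by", "vendor portal")],
    "vendor portal": [],
}


def find_path(graph, start, goal):
    """Breadth-first search returning the chain of (node, relation, node) hops, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


hops = find_path(GRAPH, "payment #881", "wrong tax ID")
for source, relation, target in hops:
    print(f"{source} --{relation}--> {target}")
```

No single node states why the payment was delayed; the answer emerges only by following two edges, which is the kind of question keyword or embedding lookup over isolated messages tends to miss.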
pip install graphrag
graphrag init --root ./graphrag_demo
# Add the source documents under ./graphrag_demo/input and configure model credentials before indexing.
graphrag index --root ./graphrag_demo
graphrag query --root ./graphrag_demo --method global "Why was the payment delayed?"

By combining rolling context windows, semantic similarity search, experience‑based memory, and graph‑based retrieval, developers can build agents that not only recall relevant facts but also reason across them, resulting in more complete answers, fewer hallucinations, and a more personalized user experience.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
