Add Long-Term Memory to Your Agent with Lightweight RAG (Lesson 5)
This tutorial shows how to equip an AI agent with long‑term memory using Retrieval‑Augmented Generation (RAG), covering the concepts of vector embeddings, FAISS indexing, building and querying a knowledge base, and providing complete Python code examples.
Lesson Recap
The previous lesson introduced skill tools that let the Agent call specialized skills, but the Agent still starts from scratch for every answer and has no memory of past interactions.
Why Retrieval‑Augmented Generation (RAG)
Without RAG, the Agent can only answer from its training data, has no access to private documents, and essentially guesses.
With RAG, the Agent retrieves relevant information from a private knowledge base, answers based on real documents, and functions like a “second brain”.
RAG Definition
RAG = Retrieval‑Augmented Generation.
Retrieval: Find relevant documents in a knowledge base.
Augmentation: Insert the retrieved documents into the prompt.
Generation: Let the LLM generate answers based on the provided documents.
Feeding the entire knowledge base to the LLM would explode token counts and cost; selecting only the relevant documents and sending just those is the core idea of RAG.
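Sketched concretely, the augmentation step just splices retrieved text into the prompt (this template and the variable names are illustrative, not from the lesson's code):

retrieved_docs = [
    "Rust is known for its safety",
    "Python is a high-level programming language",
]
question = "Which language is known for its safety?"

# The LLM sees only the selected documents, never the whole knowledge base.
prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n---\n".join(retrieved_docs)
    + f"\n\nQuestion: {question}"
)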
Lightweight RAG Stack
Vector database: FAISS (a local, in-process library: C++ core with Python bindings)
Embedding model: Sentence‑Transformers – all-MiniLM-L6-v2 (384‑dimensional vectors)
FAISS provides zero‑configuration local indexing, the embedding model is open‑source and free, and the whole stack runs with only Python dependencies.
One‑Sentence RAG Flow
Split documents into chunks → encode to vectors → store in a vector database → at query time, use vector similarity to retrieve semantically similar documents.
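That flow fits in a dozen lines; here is a minimal sketch with placeholder documents (the rest of the lesson grows it into a reusable Agent tool):

import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["Python is a high-level language", "Rust is known for safety"]  # chunks

vectors = model.encode(docs).astype("float32")   # encode to vectors
faiss.normalize_L2(vectors)                      # unit length: inner product == cosine
index = faiss.IndexFlatIP(vectors.shape[1])      # store in a vector index
index.add(vectors)

query = model.encode(["which language is safe?"]).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, k=1)           # retrieve the most similar chunk
print(docs[ids[0][0]], scores[0][0])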
What Is a Vector?
A vector (embedding) is a numeric representation of text that captures its meaning; semantically similar texts produce nearby vectors.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
texts = ["apple", "iPhone", "banana"]
vectors = model.encode(texts)
print(vectors.shape) # (3, 384)
print(vectors[0][:10]) # first 10 dimensions

Vector similarity (e.g., cosine similarity) shows that semantically related sentences have high similarity scores.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
v1 = model.encode("Apple phones are great to use")
v2 = model.encode("The iPhone experience is excellent")
v3 = model.encode("The weather is sunny today")
from sklearn.metrics.pairwise import cosine_similarity
sim = cosine_similarity([v1], [v2, v3])
print(sim)  # e.g. [[0.89, 0.12]] -- v1 and v2 are close, v3 is not

Lesson Goal
Implement a RAG-enabled Agent with three components:

rag: Build the knowledge base and retrieve documents (FAISS + Sentence-Transformers)
mcp: Call external services (e.g., Context7, GitHub) via API
terminate: Stop the Agent loop and return the final answer
Demo Tasks
Store a document in the vector database.
Ask the Agent a related question.
The Agent retrieves from the knowledge base and answers.
Code Implementation
Install Dependencies
uv pip install faiss-cpu sentence-transformers

RAG Tool Core Logic
The tool supports two actions (full code in exercise/05_light_rag/tools/rag.py):

build: Construct the knowledge base from a list of documents.
query: Search the knowledge base with a question string.
import faiss
from sentence_transformers import SentenceTransformer

class RAGTool(BaseTool):  # BaseTool comes from the lesson's agent framework
    def __init__(self):
        self.model = SentenceTransformer("all-MiniLM-L6-v2")  # lightweight embedding model
        self.dimension = 384   # output size of all-MiniLM-L6-v2
        self.index = None      # FAISS index, created lazily on first build
        self.metadata = []     # original documents, aligned with index rows

    def execute(self, **kwargs) -> tuple[bool, str]:
        action = kwargs.get("action", "")
        if action == "build":
            documents = kwargs.get("documents", [])
            return self._build_index(documents)
        elif action == "query":
            question = kwargs.get("question", "")
            return self._query(question)
        return False, f"Unknown action: {action}"

Vectorization & Retrieval Steps
Encode documents to vectors: self.model.encode(texts)
Create the FAISS index: faiss.IndexFlatIP(self.dimension)
Normalize the vectors: faiss.normalize_L2(vectors)
Add vectors to the index: self.index.add(vectors)
Search for similar documents: self.index.search(question_vector, k)

    def _build_index(self, documents: list[dict]) -> tuple[bool, str]:
        texts = [doc.get("content", "") for doc in documents]
        vectors = self.model.encode(texts).astype("float32")
        if self.index is None:
            self.index = faiss.IndexFlatIP(self.dimension)
        faiss.normalize_L2(vectors)  # unit length, so inner product == cosine
        self.index.add(vectors)
        self.metadata.extend(documents)
        return True, f"Indexed {len(documents)} documents"
    def _query(self, question: str) -> tuple[bool, str]:
        if self.index is None or self.index.ntotal == 0:
            return False, "Knowledge base is empty; run build first"
        question_vector = self.model.encode([question]).astype("float32")
        faiss.normalize_L2(question_vector)
        k = min(3, self.index.ntotal)  # top-3 at most
        distances, indices = self.index.search(question_vector, k)
        context_parts = []
        for i, idx in enumerate(indices[0]):
            if idx >= 0:  # FAISS pads missing results with -1
                meta = self.metadata[idx]
                context_parts.append(
                    f"[{meta['source']}] (score: {distances[0][i]:.4f})\n{meta['content']}"
                )
        if not context_parts:
            return False, "No relevant documents found"
        return True, "Relevant context:\n" + "\n---\n".join(context_parts)
Running Example

Build a knowledge base with three simple sentences:
uv run python 05_light_rag/main.py --task "Build a knowledge base from these documents: 1. Python is a high-level programming language 2. JavaScript is the language of web development 3. Rust is known for its safety"
# Output: Indexed 3 documents

Query the knowledge base:
uv run python 05_light_rag/main.py --task "Which language is known for its safety?"
# The Agent retrieves "Rust is known for its safety" and answers from it.

Advanced: RSS News Knowledge Base
# Simulated RSS news data
news_articles = [
{"content": "OpenAI 发布 GPT-5,性能提升 50%", "source": "HN"},
{"content": "Claude 3.5 发布,新增计算机使用能力", "source": "HN"},
{"content": "Google 发布 Gemini 2.0", "source": "Reddit"},
]
rag(action="build", documents=news_articles)
rag(action="query", question="最近有什么 AI 大新闻?")RAG Advanced Tips
Chunking Strategies
Fixed size: general-purpose splitting by a fixed token or word count.
By paragraph: for documents with clear structure.
By heading: for Markdown or HTML documents.
def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into fixed-size chunks (measured in words here)."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks
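The other two strategies are just as short; a sketch of both, assuming blank-line-separated paragraphs and Markdown-style "#" headings:

import re

def chunk_by_paragraph(text: str) -> list[str]:
    """Split on blank lines; suits documents with clear structure."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def chunk_by_heading(markdown: str) -> list[str]:
    """Split before each Markdown heading; suits Markdown/HTML docs."""
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown)
    return [p.strip() for p in parts if p.strip()]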
Hybrid Retrieval

Combining keyword retrieval (e.g., BM25) with vector retrieval can improve precision; the current implementation keeps it simple with vector retrieval only. A sketch of one common recipe is shown below.
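This sketch fuses normalized BM25 scores with cosine similarity via a weighted sum (it uses the third-party rank_bm25 package; the 50/50 weighting is an assumption worth tuning):

from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["Rust is known for safety", "Python is a high-level language"]
model = SentenceTransformer("all-MiniLM-L6-v2")

bm25 = BM25Okapi([d.lower().split() for d in docs])  # keyword index
doc_vecs = model.encode(docs)

def hybrid_search(query: str, alpha: float = 0.5) -> list[tuple[float, str]]:
    kw = bm25.get_scores(query.lower().split())      # BM25 keyword scores
    kw = kw / (kw.max() or 1.0)                      # scale into [0, 1]
    vec = cosine_similarity(model.encode([query]), doc_vecs)[0]
    combined = alpha * kw + (1 - alpha) * vec        # weighted fusion
    return sorted(zip(combined, docs), reverse=True)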
Re‑ranking
After an initial top‑k vector search, a stronger model can re‑rank the results (not implemented in this lesson).
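For reference, sentence-transformers ships a CrossEncoder class that makes this a few lines (cross-encoder/ms-marco-MiniLM-L-6-v2 is a commonly used public checkpoint):

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(question: str, candidates: list[str], top_n: int = 3) -> list[str]:
    # Score each (question, document) pair jointly -- slower than
    # comparing precomputed vectors, but usually more accurate.
    scores = reranker.predict([(question, c) for c in candidates])
    ranked = sorted(zip(scores, candidates), reverse=True)
    return [doc for _, doc in ranked[:top_n]]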
RAG vs. Context Memory
RAG answers “what I know” from an external knowledge base using vector similarity.
Context Memory remembers “what we talked about” by keeping the dialogue history in chronological order.
Summary of Components
Vector Database: FAISS (Facebook AI Similarity Search)
Embedding Model: Sentence-Transformers (all-MiniLM-L6-v2)
Vector Dimension: 384
Learning Outcomes
Understand the core principle of RAG.
Build a local FAISS knowledge base.
Enable an Agent to answer questions based on private documents.
Next Steps
Level 2: Explore chunking strategies and hybrid retrieval.
Level 3: Build a full RSS news knowledge base.
Open‑source repository: https://github.com/HUANGLIWEN/mini-manus