Add Long-Term Memory to Your Agent with Lightweight RAG (Lesson 5)

This tutorial shows how to equip an AI agent with long‑term memory using Retrieval‑Augmented Generation (RAG). It covers vector embeddings, FAISS indexing, and building and querying a knowledge base, with complete Python code examples.

AI Tech Publishing

Lesson Recap

The previous lesson introduced skill tools that let the Agent call specialized skills, but the Agent still starts from scratch for every answer and has no memory of past interactions.

Why Retrieval‑Augmented Generation (RAG)

Without RAG the Agent can only answer from its training data, has no access to private documents, and essentially guesses.

With RAG the Agent retrieves relevant information from a private knowledge base, answers based on real documents, and functions like a “second brain”.

RAG Definition

RAG = Retrieval‑Augmented Generation.

Retrieval: Find relevant documents in a knowledge base.

Augmentation: Insert the retrieved documents into the prompt.

Generation: Let the LLM generate answers based on the provided documents.

Feeding the entire knowledge base to the LLM would cause token and cost explosion; selecting only relevant documents is the core idea of RAG.
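The augmentation step is plain string assembly: the retrieved documents are pasted into the prompt ahead of the user's question. A minimal sketch (the template wording and function name are illustrative, not from the lesson's code):

```python
def build_augmented_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Insert retrieved documents into the prompt (the 'A' in RAG)."""
    context = "\n\n---\n\n".join(retrieved_docs)
    return (
        "Answer the question using ONLY the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_augmented_prompt(
    "Which language is known for safety?",
    ["Rust is known for safety", "Python is a high-level programming language"],
)
print(prompt)
```

Because only the top-k retrieved chunks go into the prompt, token usage stays bounded no matter how large the knowledge base grows.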

Lightweight RAG Stack

Vector database: FAISS (pure Python/C++)

Embedding model: Sentence‑Transformers – all-MiniLM-L6-v2 (384‑dimensional vectors)

FAISS provides zero‑configuration local indexing, the embedding model is open‑source and free, and the whole stack runs with only Python dependencies.

One‑Sentence RAG Flow

Split documents into chunks → encode to vectors → store in a vector database → at query time, use vector similarity to retrieve semantically similar documents.

What Is a Vector?

A vector (embedding) is a numeric representation of text that captures its meaning; semantically similar texts produce nearby vectors.

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
texts = ["apple", "iPhone", "banana"]
vectors = model.encode(texts)
print(vectors.shape)  # (3, 384)
print(vectors[0][:10])  # first 10 dimensions

Vector similarity (e.g., cosine similarity) shows that semantically related sentences have high similarity scores.

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
v1 = model.encode("The Apple phone is great to use")
v2 = model.encode("The iPhone experience is excellent")
v3 = model.encode("The weather is sunny today")
from sklearn.metrics.pairwise import cosine_similarity
sim = cosine_similarity([v1], [v2, v3])
print(sim)  # e.g. [[0.89, 0.12]]: the similar pair scores high, the unrelated pair low
</print>

Lesson Goal

Implement a RAG‑enabled Agent with three tools:

rag: Build the knowledge base and retrieve documents (FAISS + Sentence‑Transformers).

mcp: Call external services (e.g., Context7, GitHub) via API.

terminate: Stop the Agent loop and return the final answer.

Demo Tasks

Store a document in the vector database.

Ask the Agent a related question.

The Agent retrieves from the knowledge base and answers.

Code Implementation

Install Dependencies

uv pip install faiss-cpu sentence-transformers

RAG Tool Core Logic

The tool supports two actions (full code in exercise/05_light_rag/tools/rag.py):

build: Construct the knowledge base from a list of documents.

query: Search the knowledge base with a question string.

class RAGTool(BaseTool):
    def __init__(self):
        self.model = SentenceTransformer("all-MiniLM-L6-v2")  # lightweight embedding model
        self.dimension = 384
        self.index = None
        self.metadata = []

    def execute(self, **kwargs) -> tuple[bool, str]:
        action = kwargs.get("action", "")
        if action == "build":
            documents = kwargs.get("documents", [])
            return self._build_index(documents)
        elif action == "query":
            question = kwargs.get("question", "")
            return self._query(question)
        return False, f"Unknown action: {action}"

Vectorization & Retrieval Steps

Encode documents to vectors: self.model.encode(texts)

Create FAISS index: faiss.IndexFlatIP(self.dimension)

Normalize vectors: faiss.normalize_L2(vectors)

Add vectors to index: self.index.add(vectors)

Search similar documents: self.index.search(question_vector, k)
def _build_index(self, documents: list[dict]) -> tuple[bool, str]:
    texts = [doc.get("content", "") for doc in documents]
    vectors = self.model.encode(texts).astype("float32")
    if self.index is None:
        self.index = faiss.IndexFlatIP(self.dimension)
    faiss.normalize_L2(vectors)
    self.index.add(vectors)
    self.metadata.extend(documents)
    return True, f"Indexed {len(documents)} documents"

def _query(self, question: str) -> tuple[bool, str]:
    if self.index is None or self.index.ntotal == 0:
        return False, "Knowledge base is empty; run build first"
    question_vector = self.model.encode([question]).astype("float32")
    faiss.normalize_L2(question_vector)
    k = min(3, self.index.ntotal)
    distances, indices = self.index.search(question_vector, k)
    context_parts = []
    for i, idx in enumerate(indices[0]):
        if idx >= 0:
            meta = self.metadata[idx]
            context_parts.append(
                f"[{meta['source']}] (score: {distances[0][i]:.4f})\n{meta['content']}"
            )
    if not context_parts:
        return False, "No relevant documents found"
    return True, "Relevant context:\n\n" + "\n\n---\n\n".join(context_parts)
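Under the hood, IndexFlatIP.search is a brute-force scan: it computes the inner product of the query against every stored vector and returns the top-k. The same logic sketched in NumPy (toy 4‑dimensional unit vectors, not real embeddings):

```python
import numpy as np

def search_top_k(index_vectors: np.ndarray, query: np.ndarray, k: int):
    """Brute-force inner-product search, as FAISS IndexFlatIP does."""
    scores = index_vectors @ query    # one dot product per stored vector
    top = np.argsort(-scores)[:k]     # indices of the k largest scores
    return scores[top], top

# On L2-normalized vectors, inner product equals cosine similarity.
docs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.0],
])
query = np.array([1.0, 0.0, 0.0, 0.0])

scores, ids = search_top_k(docs, query, k=2)
print(ids)     # [0 2]: doc 0 matches exactly, doc 2 partially
print(scores)  # [1.  0.6]
```

FAISS does the same computation in optimized C++, which is why it scales to millions of vectors while keeping the zero-configuration setup.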

Running Example

Build a knowledge base with three simple sentences:

uv run python 05_light_rag/main.py --task "Build a knowledge base from these documents: 1. Python is a high-level programming language 2. JavaScript is the language of web development 3. Rust is known for its safety"
# Output: Indexed 3 documents

Query the knowledge base:

uv run python 05_light_rag/main.py --task "Which language is known for its safety?"
# Agent retrieves "Rust is known for its safety" and returns the answer.

Advanced: RSS News Knowledge Base

# Simulated RSS news data
news_articles = [
    {"content": "OpenAI releases GPT-5 with a 50% performance boost", "source": "HN"},
    {"content": "Claude 3.5 released, adding computer-use capability", "source": "HN"},
    {"content": "Google releases Gemini 2.0", "source": "Reddit"},
]
rag(action="build", documents=news_articles)
rag(action="query", question="What are the big recent AI headlines?")

RAG Advanced Tips

Chunking Strategies

Fixed size: General token‑based splitting.

By paragraph: Documents with clear structure.

By heading: Markdown or HTML documents.

def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into fixed-size chunks (here by word count; token-based splitting works the same way)"""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunk = " ".join(words[i:i+chunk_size])
        chunks.append(chunk)
    return chunks
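For documents with clear structure, splitting on paragraph boundaries keeps related sentences together. A minimal sketch of the paragraph strategy (splitting on blank lines and merging small paragraphs, an assumption not taken from the lesson's code):

```python
def chunk_by_paragraph(text: str, max_chars: int = 500) -> list[str]:
    """Split on blank lines, then merge adjacent paragraphs up to max_chars."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)   # current chunk is full; start a new one
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

doc = "First paragraph.\n\nSecond paragraph.\n\n" + "X" * 600
print(chunk_by_paragraph(doc, max_chars=100))
```

Note that an oversized single paragraph is not split further here; a production chunker would fall back to fixed-size splitting for such cases.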

Hybrid Retrieval

Combining keyword (e.g., BM25) and vector retrieval can improve precision; the current implementation keeps it simple with vector‑first retrieval.
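One simple way to combine keyword and vector rankings without tuning score scales is Reciprocal Rank Fusion (RRF): each document's fused score sums 1/(k + rank) over the ranked lists it appears in. A sketch in pure Python (the document IDs and the conventional k = 60 constant are illustrative):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs; k dampens the weight of the very top ranks."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]  # e.g. BM25 order
vector_hits = ["doc1", "doc5", "doc3"]   # e.g. FAISS order
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['doc1', 'doc3', 'doc5', 'doc7']: docs appearing in both lists rise to the top
```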

Re‑ranking

After an initial top‑k vector search, a stronger model can re‑rank the results (not implemented in this lesson).

RAG vs. Context Memory

RAG answers “what I know” from an external knowledge base using vector similarity.

Context Memory remembers “what we talked about” by keeping the dialogue history in chronological order.
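The distinction can be seen as two data structures: context memory is an append-only message list trimmed to a window, while RAG is a searchable store. A minimal sketch of the memory side (the class and method names are illustrative, not from the lesson's code):

```python
class ContextMemory:
    """Keep the dialogue history in chronological order, trimmed to a window."""
    def __init__(self, max_messages: int = 10):
        self.max_messages = max_messages
        self.messages: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Drop the oldest messages once the window is exceeded.
        self.messages = self.messages[-self.max_messages:]

memory = ContextMemory(max_messages=2)
memory.add("user", "What is RAG?")
memory.add("assistant", "Retrieval-Augmented Generation.")
memory.add("user", "And FAISS?")
print([m["content"] for m in memory.messages])
# ['Retrieval-Augmented Generation.', 'And FAISS?']
```

Context memory forgets whatever scrolls out of the window; RAG keeps everything and retrieves by meaning, which is why the two complement each other.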

Summary of Components

Vector Database: FAISS (Facebook AI Similarity Search)

Embedding Model: Sentence‑Transformers (all-MiniLM-L6-v2)

Vector Dimension: 384

Learning Outcomes

Understand the core principle of RAG.

Build a local FAISS knowledge base.

Enable an Agent to answer questions based on private documents.

Next Steps

Level 2: Explore chunking strategies and hybrid retrieval.

Level 3: Build a full RSS news knowledge base.

Open‑source repository: https://github.com/HUANGLIWEN/mini-manus

Written by

AI Tech Publishing

In the fast-evolving AI era, we thoroughly explain stable technical foundations.
