Building AI Agents with LangGraph: Implementing RAG and Long‑Term Memory

This tutorial walks through adding Retrieval‑Augmented Generation (RAG) and persistent long‑term memory to a LangGraph AI agent, with step‑by‑step code for document loading, vector store creation, prompt construction, and memory management, plus common pitfalls and best practices.

Data STUDIO

What are RAG and Long‑Term Memory?

RAG (Retrieval‑Augmented Generation) equips an LLM with an external "knowledge shelf" that retrieves relevant documents before generating answers, overcoming the limitation of relying solely on pre‑trained knowledge. Long‑term memory extends conversation context across sessions, allowing the agent to recall past dialogues like a "memory palace".

Adding RAG Capability to a LangGraph Agent

Step 1: Load and Split Documents

We use WebBaseLoader to fetch the React Native ExecuTorch documentation and split it into ~1000‑character chunks with a 200‑character overlap.

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load document
loader = WebBaseLoader("https://docs.swmansion.com/react-native-executorch/")
docs = loader.load()
print(f"Loaded {len(docs)} documents, total characters: {len(docs[0].page_content)}")

# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)
print(f"Split into {len(all_splits)} text chunks")
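To see what chunk_size and chunk_overlap actually control, here is a toy fixed‑size splitter. This is not the library's algorithm (RecursiveCharacterTextSplitter also tries to split on separators like paragraphs and sentences first), just a minimal sketch of the sliding‑window idea:

```python
def naive_split(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slide a fixed-size window over the text. Consecutive chunks share
    `chunk_overlap` characters, so a sentence cut at a chunk boundary
    survives intact in at least one chunk."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = naive_split("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Note how each chunk repeats the last two characters of its predecessor; that redundancy is the price paid so that no sentence is ever lost to a boundary.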

Step 2: Create a Vector Store

We generate embeddings with sentence‑transformers/all‑MiniLM‑L6‑v2 and store the vectors in an in‑memory vector store.

from langchain_core.vectorstores import InMemoryVectorStore
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = InMemoryVectorStore(embeddings)
vector_store.add_documents(documents=all_splits)
print("Vector store ready!")
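Conceptually, `similarity_search` embeds the query with the same model and ranks stored chunks by cosine similarity. A stdlib‑only sketch of that ranking, using tiny hand‑made 2‑dimensional vectors in place of real 384‑dimensional MiniLM embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, doc_vecs, k=2):
    """Rank named document vectors by similarity to the query vector."""
    scored = sorted(doc_vecs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]

# Hypothetical chunk names and embeddings, purely for illustration
docs = {"install": [0.9, 0.1], "models": [0.2, 0.8], "faq": [0.6, 0.5]}
print(top_k([1.0, 0.0], docs, k=2))  # ['install', 'faq']
```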

Step 3: Wrap the Query Function with RAG

The function first retrieves the top‑k relevant chunks, builds a context string, and then prompts the LLM.

from langchain_core.messages import HumanMessage

def ask_llm_with_rag(state):
    """Enhanced ask: retrieve, then generate."""
    user_query = input("Enter your question: ")
    retrieved_docs = vector_store.similarity_search(user_query, k=3)
    print(f"Retrieved {len(retrieved_docs)} relevant snippets")
    context = "\n\n---\n\n".join(doc.page_content for doc in retrieved_docs)
    user_message = HumanMessage(
        f"""Please answer based on the following context. If the context does not contain relevant information, say you don't know.

Context:
{context}

User question:
{user_query}

Provide a concise answer:"""
    )
    answer_message = model.invoke(state["messages"] + [user_message])
    print(f"\n🤖 AI answer: {answer_message.content}\n")
    return {"messages": [user_message, answer_message]}

Step 4: Full RAG‑Enabled Graph

We assemble the graph with StateGraph, add the ask node, set it as the entry point, and compile.

import os
from typing import TypedDict

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

os.environ["OPENAI_API_KEY"] = "your-api-key-here"
model = ChatOpenAI(model="gpt-3.5-turbo")

class State(TypedDict):
    """Shared graph state: the running message list and a turn counter."""
    messages: list
    iteration: int

graph_builder = StateGraph(State)
graph_builder.add_node("ask", ask_llm_with_rag)
graph_builder.set_entry_point("ask")
graph_builder.add_edge("ask", END)
graph = graph_builder.compile()

print("=" * 50)
print("Test question: What is React Native ExecuTorch?")
print("=" * 50)
initial_state = {"messages": [], "iteration": 0}
result = graph.invoke(initial_state)

Adding Long‑Term Memory

Step 1: Set Up Memory Storage

from langgraph.checkpoint.memory import InMemorySaver
from langgraph.store.memory import InMemoryStore

checkpointer = InMemorySaver()
store = InMemoryStore()
# Re-compile from the builder: an already compiled graph cannot be compiled again
workflow = graph_builder.compile(checkpointer=checkpointer, store=store)

Step 2: Manage Sessions with Thread IDs

config = {
    "recursion_limit": 100,
    "configurable": {"thread_id": "user_123_session_1"}
}

# First conversation
print("=== First conversation ===")
workflow.invoke({"messages": [], "iteration": 0}, config=config)
current_state = workflow.get_state(config)
print(f"Current round: {current_state.values['iteration']}")

# Second conversation (same thread_id, so it resumes from the saved checkpoint).
# Re-supply the checkpointed values as input: get_state returns a StateSnapshot,
# not an input dict, so pass snapshot.values fields rather than the snapshot itself.
print("\n=== Second conversation (continuation) ===")
workflow.invoke(
    {"messages": current_state.values["messages"],
     "iteration": current_state.values["iteration"]},
    config=config,
)

Step 3: Implement Persistent Memory Manager

A simple class stores summarized memories in a JSON file, keeps the latest 50 entries, and provides retrieval.

import json
from datetime import datetime

class LongTermMemoryManager:
    """Long‑term memory manager"""
    def __init__(self, storage_path="memory_storage.json"):
        self.storage_path = storage_path
        self.memories = self.load_memories()
    def load_memories(self):
        try:
            with open(self.storage_path, "r", encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}
    def save_memory(self, user_id, conversation_summary, key_points):
        if user_id not in self.memories:
            self.memories[user_id] = []
        entry = {"timestamp": datetime.now().isoformat(), "summary": conversation_summary, "key_points": key_points}
        self.memories[user_id].append(entry)
        if len(self.memories[user_id]) > 50:
            self.memories[user_id] = self.memories[user_id][-50:]
        self.save_to_disk()
    def save_to_disk(self):
        with open(self.storage_path, "w", encoding="utf-8") as f:
            json.dump(self.memories, f, ensure_ascii=False, indent=2)
    def get_user_memories(self, user_id, limit=5):
        return self.memories.get(user_id, [])[-limit:]
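The file the manager writes maps user IDs to lists of timestamped entries. A minimal round‑trip sketch of that layout (field names match `save_memory`; the summary and key points are hypothetical examples):

```python
import json
from datetime import datetime

# One entry, shaped exactly as save_memory builds it
entry = {
    "timestamp": datetime.now().isoformat(),
    "summary": "User asked how to load a model with React Native ExecuTorch.",
    "key_points": ["model loading"],  # hypothetical key points
}
memories = {"user_001": [entry]}

# Round-trip through JSON the same way save_to_disk / load_memories do
restored = json.loads(json.dumps(memories, ensure_ascii=False, indent=2))
print(restored["user_001"][0]["summary"])
```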

Smart Document Assistant (RAG + Memory)

The final class combines the RAG pipeline with the long‑term memory manager, builds a workflow, and runs an interactive chat loop.

class SmartDocumentAssistant:
    """RAG + long‑term memory assistant"""
    def __init__(self, document_url):
        self.vector_store = self.setup_rag(document_url)
        self.memory_manager = LongTermMemoryManager()
        self.model = ChatOpenAI(model="gpt-3.5-turbo")
        self.workflow = self.build_workflow()
    def setup_rag(self, document_url):
        loader = WebBaseLoader(document_url)
        docs = loader.load()
        splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
        splits = splitter.split_documents(docs)
        embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
        store = InMemoryVectorStore(embeddings)
        store.add_documents(documents=splits)
        return store
    def build_workflow(self):
        builder = StateGraph(State)
        builder.add_node("smart_ask", self.smart_ask_with_memory)
        builder.set_entry_point("smart_ask")
        builder.add_edge("smart_ask", END)
        return builder.compile(checkpointer=InMemorySaver(), store=InMemoryStore())
    def smart_ask_with_memory(self, state: State) -> State:
        user_id = "current_user"  # a real app would take this from the session
        past = self.memory_manager.get_user_memories(user_id)
        memory_context = (
            "\n\nPrevious highlights:\n"
            + "\n".join(f"- {m['summary'][:100]}..." for m in past)
            if past else ""
        )
        user_query = input("\n💬 Your question: ")
        if user_query.strip().lower() == "exit":
            return {"messages": [{"role": "user", "content": "exit"}]}
        retrieved = self.vector_store.similarity_search(user_query, k=3)
        rag_context = "\n\n".join(doc.page_content for doc in retrieved)
        prompt = f"""{memory_context}
Relevant docs:
{rag_context}

User question:
{user_query}

Answer based on the above. If no info, say so."""
        response = self.model.invoke(prompt)
        print(f"\n🤖 Assistant: {response.content}\n")
        if "important" in user_query.lower() or "remember" in user_query.lower():
            # Actually persist the exchange instead of only announcing it
            self.memory_manager.save_memory(
                user_id,
                conversation_summary=f"Q: {user_query[:100]} | A: {response.content[:200]}",
                key_points=[user_query],
            )
            print("(Marked as important and stored in long-term memory)")
        return {"messages": [{"role": "user", "content": user_query}, {"role": "assistant", "content": response.content}]}
    def chat(self, user_id="default_user"):
        config = {"recursion_limit": 50, "configurable": {"thread_id": user_id}}
        print("=" * 60)
        print("Smart Document Assistant started!")
        print("I can: 1) Answer doc questions 2) Remember important dialogs")
        print("Type 'exit' to quit")
        print("=" * 60)
        while True:
            result = self.workflow.invoke({"messages": [], "iteration": 0}, config=config)
            messages = result.get("messages", [])
            if messages and messages[0]["content"].strip().lower() == "exit":
                break

if __name__ == "__main__":
    assistant = SmartDocumentAssistant("https://docs.swmansion.com/react-native-executorch/")
    assistant.chat("user_001")

Pitfalls & Best Practices

RAG Common Issues

Inaccurate retrieval: the retriever returns unrelated snippets. Fix: tune chunk_size, choose a stronger embedding model, or add metadata filters.

Context too long: the assembled prompt exceeds the model's token limit. Fix: summarize retrieved chunks, retrieve more selectively, or paginate results.
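One way to apply the metadata‑filter idea is to tag chunks at load time (e.g. with the doc section they came from) and drop off‑topic hits after retrieval. A stdlib sketch over (text, metadata) pairs standing in for LangChain Documents — the section names here are hypothetical:

```python
def filter_by_metadata(hits, **required):
    """Keep only retrieved hits whose metadata matches every required key/value.
    `hits` is a list of (text, metadata) pairs, a stand-in for Documents."""
    return [
        (text, meta) for text, meta in hits
        if all(meta.get(k) == v for k, v in required.items())
    ]

hits = [
    ("Install with npm...",  {"section": "setup"}),
    ("The useLLM hook...",   {"section": "api"}),
    ("Troubleshooting...",   {"section": "setup"}),
]
print(filter_by_metadata(hits, section="setup"))
```

Filtering after retrieval is the simplest variant; vector stores that support native metadata filters can do the same thing before similarity scoring, which is cheaper at scale.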

Long‑Term Memory Tips

Summarize instead of storing raw dialogs to keep memory compact.

Organize memories by topic for efficient lookup.

Privacy: clearly inform users what is stored and provide a way to delete memories.
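The topic‑organization tip can be sketched as a small index keyed by topic instead of one flat per‑user list, so lookup touches only the relevant memories. A minimal stdlib sketch (class and topic names are illustrative, not part of the tutorial's code):

```python
from collections import defaultdict

class TopicMemory:
    """Group memory summaries under a topic key for direct lookup by subject."""
    def __init__(self):
        self.by_topic = defaultdict(list)

    def remember(self, topic: str, summary: str):
        self.by_topic[topic].append(summary)

    def recall(self, topic: str, limit: int = 5):
        # Only the requested topic's entries are scanned, newest last
        return self.by_topic[topic][-limit:]

mem = TopicMemory()
mem.remember("installation", "User installs via npm on iOS.")
mem.remember("models", "User prefers quantized models.")
print(mem.recall("models"))  # ['User prefers quantized models.']
```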

Conclusion

By following the steps above, you equip a LangGraph agent with two powerful capabilities: RAG for up‑to‑date factual answers from arbitrary documents, and long‑term memory for seamless multi‑turn interactions across sessions.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
