Building AI Agents with LangGraph: Implementing RAG and Long‑Term Memory
This tutorial walks through adding Retrieval‑Augmented Generation (RAG) and persistent long‑term memory to a LangGraph AI agent, covering the core concepts, step‑by‑step code for document loading, vector store creation, prompt engineering, and memory management, plus common pitfalls and best practices.
What are RAG and Long‑Term Memory?
RAG (Retrieval‑Augmented Generation) equips an LLM with an external "knowledge shelf" that retrieves relevant documents before generating answers, overcoming the limitation of relying solely on pre‑trained knowledge. Long‑term memory extends conversation context across sessions, allowing the agent to recall past dialogues like a "memory palace".
Adding RAG Capability to a LangGraph Agent
Step 1: Load and Split Documents
We use WebBaseLoader to fetch the React Native ExecuTorch documentation and split it into ~1000‑character chunks with a 200‑character overlap.
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Load document
loader = WebBaseLoader("https://docs.swmansion.com/react-native-executorch/")
docs = loader.load()
print(f"Loaded {len(docs)} documents, total characters: {len(docs[0].page_content)}")
# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)
print(f"Split into {len(all_splits)} text chunks")Step 2: Create a Vector Store
We generate embeddings with sentence‑transformers/all‑MiniLM‑L6‑v2 and store the vectors in an in‑memory vector store.
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = InMemoryVectorStore(embeddings)
vector_store.add_documents(documents=all_splits)
print("Vector store ready!")Step 3: Wrap the Query Function with RAG
Step 3: Wrap the Query Function with RAG
The function first retrieves the top‑k relevant chunks, builds a context string, and then prompts the LLM.
from langchain_core.messages import HumanMessage

def ask_llm_with_rag(state):
    """Enhanced ask: retrieve relevant chunks, then generate an answer"""
    user_query = input("Enter your question: ")
    # Retrieve the top-k most similar chunks from the vector store
    retrieved_docs = vector_store.similarity_search(user_query, k=3)
    print(f"Retrieved {len(retrieved_docs)} relevant snippets")
    context = "\n---\n".join([doc.page_content for doc in retrieved_docs])
    user_message = HumanMessage(
        f"""Please answer based on the following context. If the context does not contain relevant information, say you don't know.

Context:
{context}

User question:
{user_query}

Provide a concise answer:"""
    )
    # `model` is the chat model instantiated in Step 4 below
    answer_message = model.invoke(state["messages"] + [user_message])
    print(f"\n🤖 AI answer: {answer_message.content}\n")
    return {"messages": [user_message, answer_message]}
Step 4: Full RAG‑Enabled Graph
We assemble the graph with StateGraph, add the ask node, set it as the entry point, and compile.
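The graph below references a State schema defined when the base agent was built in the earlier part of this tutorial. If you are starting from this article alone, a minimal sketch that matches the fields used here (a message list plus an iteration counter) might look like this:

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph.message import add_messages

class State(TypedDict):
    # Conversation history; the add_messages reducer appends rather than overwrites
    messages: Annotated[list, add_messages]
    # Simple turn counter carried through the graph
    iteration: int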
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
import os

os.environ["OPENAI_API_KEY"] = "your-api-key-here"  # replace with your actual key
model = ChatOpenAI(model="gpt-3.5-turbo")

graph_builder = StateGraph(State)
graph_builder.add_node("ask", ask_llm_with_rag)
graph_builder.set_entry_point("ask")
graph_builder.add_edge("ask", END)
graph = graph_builder.compile()

print("=" * 50)
print("Test question: What is React Native ExecuTorch?")
print("=" * 50)
initial_state = {"messages": [], "iteration": 0}
result = graph.invoke(initial_state)
Adding Long‑Term Memory
Step 1: Set Up Memory Storage
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.store.memory import InMemoryStore

# Checkpointer persists per-thread conversation state; store holds cross-thread data
checkpointer = InMemorySaver()
store = InMemoryStore()
# Note: compile from the builder; an already-compiled graph cannot be recompiled
workflow = graph_builder.compile(checkpointer=checkpointer, store=store)
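The store gives the agent a key-value memory that survives across threads. As a quick illustration of its put/get interface (the namespace and key below are hypothetical), you can write and read entries directly:

# Hypothetical namespace/key, just to show the interface
store.put(("user_123", "preferences"), "language", {"value": "English"})
item = store.get(("user_123", "preferences"), "language")
print(item.value)  # {'value': 'English'}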
Step 2: Manage Sessions with Thread IDs
config = {
    "recursion_limit": 100,
    "configurable": {"thread_id": "user_123_session_1"}
}

# First conversation
print("=== First conversation ===")
workflow.invoke({"messages": [], "iteration": 0}, config=config)
current_state = workflow.get_state(config)
print(f"Current round: {current_state.values['iteration']}")

# Second conversation (same thread): pass the saved values, not the snapshot object
print("\n=== Second conversation (continuation) ===")
workflow.invoke(current_state.values, config=config)
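Each thread_id keeps its own checkpointed history, so a different ID starts from a clean slate. A small sketch (the second thread ID is illustrative):

# A different thread_id gets a fresh conversation history
other_config = {"configurable": {"thread_id": "user_456_session_1"}}
workflow.invoke({"messages": [], "iteration": 0}, config=other_config)
fresh_state = workflow.get_state(other_config)
print(f"New thread starts at round: {fresh_state.values['iteration']}")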
Step 3: Implement Persistent Memory Manager
A simple class stores summarized memories in a JSON file, keeps the latest 50 entries per user, and provides retrieval.
import json
from datetime import datetime

class LongTermMemoryManager:
    """Long‑term memory manager backed by a JSON file"""

    def __init__(self, storage_path="memory_storage.json"):
        self.storage_path = storage_path
        self.memories = self.load_memories()

    def load_memories(self):
        try:
            with open(self.storage_path, "r", encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def save_memory(self, user_id, conversation_summary, key_points):
        if user_id not in self.memories:
            self.memories[user_id] = []
        entry = {
            "timestamp": datetime.now().isoformat(),
            "summary": conversation_summary,
            "key_points": key_points,
        }
        self.memories[user_id].append(entry)
        # Keep only the 50 most recent entries per user
        if len(self.memories[user_id]) > 50:
            self.memories[user_id] = self.memories[user_id][-50:]
        self.save_to_disk()

    def save_to_disk(self):
        with open(self.storage_path, "w", encoding="utf-8") as f:
            json.dump(self.memories, f, ensure_ascii=False, indent=2)

    def get_user_memories(self, user_id, limit=5):
        return self.memories.get(user_id, [])[-limit:]
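A quick round trip shows how the manager behaves (the user ID and summary text are made up for illustration):

memory_manager = LongTermMemoryManager()
memory_manager.save_memory(
    "user_001",
    conversation_summary="Asked about ExecuTorch model loading",
    key_points=["prefers concise answers"],
)
print(memory_manager.get_user_memories("user_001", limit=3))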
Smart Document Assistant (RAG + Memory)
The final class combines the RAG pipeline with the long‑term memory manager, builds a workflow, and runs an interactive chat loop.
class SmartDocumentAssistant:
    """RAG + long‑term memory assistant"""

    def __init__(self, document_url):
        self.vector_store = self.setup_rag(document_url)
        self.memory_manager = LongTermMemoryManager()
        self.model = ChatOpenAI(model="gpt-3.5-turbo")
        self.workflow = self.build_workflow()

    def setup_rag(self, document_url):
        # Load, split, embed, and index the target document
        loader = WebBaseLoader(document_url)
        docs = loader.load()
        splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
        splits = splitter.split_documents(docs)
        embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
        store = InMemoryVectorStore(embeddings)
        store.add_documents(documents=splits)
        return store

    def build_workflow(self):
        builder = StateGraph(State)
        builder.add_node("smart_ask", self.smart_ask_with_memory)
        builder.set_entry_point("smart_ask")
        builder.add_edge("smart_ask", END)
        return builder.compile(checkpointer=InMemorySaver(), store=InMemoryStore())

    def smart_ask_with_memory(self, state: State) -> State:
        user_id = "current_user"
        # Pull recent long‑term memories and fold them into the prompt
        past = self.memory_manager.get_user_memories(user_id)
        memory_context = (
            "\nPrevious highlights:\n"
            + "\n".join([f"- {m['summary'][:100]}..." for m in past])
            if past else ""
        )
        user_query = input("\n💬 Your question: ")
        retrieved = self.vector_store.similarity_search(user_query, k=3)
        rag_context = "\n".join([doc.page_content for doc in retrieved])
        prompt = f"""{memory_context}

Relevant docs:
{rag_context}

User question:
{user_query}

Answer based on the above. If no info, say so."""
        response = self.model.invoke(prompt)
        print(f"\n🤖 Assistant: {response.content}\n")
        if "important" in user_query.lower() or "remember" in user_query.lower():
            # Persist the exchange so future sessions can recall it
            self.memory_manager.save_memory(
                user_id,
                conversation_summary=user_query,
                key_points=[response.content[:200]],
            )
            print("(Marked as important and stored in long‑term memory)")
        return {"messages": [{"role": "user", "content": user_query}, {"role": "assistant", "content": response.content}]}

    def chat(self, user_id="default_user"):
        config = {"recursion_limit": 50, "configurable": {"thread_id": user_id}}
        print("=" * 60)
        print("Smart Document Assistant started!")
        print("I can: 1) Answer doc questions 2) Remember important dialogs")
        print("Type 'exit' to quit")
        print("=" * 60)
        # Each invoke runs one question‑and‑answer round on the same thread
        while input("\nPress Enter to ask a question (or type 'exit'): ").strip().lower() != "exit":
            self.workflow.invoke({"messages": [], "iteration": 0}, config=config)

if __name__ == "__main__":
    assistant = SmartDocumentAssistant("https://docs.swmansion.com/react-native-executorch/")
    assistant.chat("user_001")
Pitfalls & Best Practices
RAG Common Issues
Inaccurate retrieval: the retriever returns unrelated snippets. Fix: tune chunk_size, choose a stronger embedding model, or add metadata filters (see the sketch after this list).
Context too long: retrieved context exceeds the model's token limit. Fix: summarize chunks, use more selective retrieval, or paginate results.
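As an illustration of the metadata-filter idea, one option with InMemoryVectorStore is passing a filter callable at query time; the source-URL check below is a hypothetical example:

# Hypothetical filter: only consider chunks from a specific docs site
hits = vector_store.similarity_search(
    "How do I load a model?",
    k=3,
    filter=lambda doc: "react-native-executorch" in doc.metadata.get("source", ""),
)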
Long‑Term Memory Tips
Summarize instead of storing raw dialogs to keep memory compact (see the sketch after this list).
Organize memories by topic for efficient lookup.
Privacy: clearly inform users what is stored and provide a way to delete memories.
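One way to follow the summarize-first tip is to let the model compress the exchange before saving it; a minimal sketch reusing the classes above (the prompt wording is illustrative):

def summarize_and_store(memory_manager, model, user_id, user_query, answer):
    # Ask the model for a one-sentence summary instead of storing the raw dialog
    summary = model.invoke(
        f"Summarize this exchange in one sentence:\nQ: {user_query}\nA: {answer}"
    ).content
    memory_manager.save_memory(user_id, conversation_summary=summary, key_points=[])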
Conclusion
By following the steps above, you equip a LangGraph agent with two powerful capabilities: RAG for up‑to‑date factual answers from arbitrary documents, and long‑term memory for seamless multi‑turn interactions across sessions.