Building Persistent Long‑Term Memory for LLM Agents with LangGraph – A Complete Guide

This article explains how to give large language model agents lasting memory by combining short‑term and long‑term storage in LangGraph, covering concepts, implementation details, database persistence, tool integration, semantic search, memory‑management strategies, checkpoint handling, and a multi‑agent supervisor example.

Tencent Technical Engineering

Why Persistent Memory Matters for LLM Agents

Agents need to retain information across interactions to act like humans, enabling context‑aware responses, personalization, and learning from feedback. The article introduces the core challenges and outlines a solution using LangGraph’s memory framework.

Agent Memory Basics

Memory is divided into short‑term (session‑level) and long‑term (persistent) stores. Short‑term memory keeps recent dialogue for immediate context, while long‑term memory holds durable knowledge that can be retrieved across sessions.
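
The distinction can be pictured with a toy model in plain Python (illustrative dicts, not the LangGraph API): short‑term state is keyed by conversation thread, while long‑term facts are keyed by a namespace and survive across threads.

```python
# Toy model (not the LangGraph API): short-term memory is scoped to a
# conversation thread; long-term memory is scoped to a (namespace, key)
# pair and survives across threads.
short_term = {}   # thread_id -> list of messages
long_term = {}    # (namespace, key) -> stored value

def remember_turn(thread_id, message):
    short_term.setdefault(thread_id, []).append(message)

def save_fact(namespace, key, value):
    long_term[(namespace, key)] = value

remember_turn("thread-1", "Hi, my name is Ada")
save_fact(("memories", "user-1"), "name", "Ada")

# a new thread starts with empty short-term context,
# but long-term facts remain retrievable
```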

Short‑Term Memory Implementation

Using InMemorySaver as the checkpointer, the following code keeps conversation state in process memory:

from langchain.chat_models import init_chat_model
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.prebuilt import create_react_agent

BASE_URL = ""
TOKEN = ""
MODEL_NAME = ""

model = init_chat_model(
    model=MODEL_NAME,
    model_provider="openai",
    base_url=BASE_URL,
    api_key=TOKEN,
    temperature=0,
)

checkpointer = InMemorySaver()
agent = create_react_agent(model=model, tools=[], checkpointer=checkpointer)

Messages are sent to the model, and the checkpointer preserves the state for each thread_id, so the agent can recall a user's name when the same thread is reused.
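
What the checkpointer does per thread can be sketched with a hypothetical stand‑in (plain dicts, not the real InMemorySaver): a second call on the same thread sees earlier turns, while a new thread starts fresh.

```python
# Hypothetical stand-in for a checkpointer: state is saved and restored
# per thread_id, so a later call on the same thread sees earlier turns.
class ToyCheckpointer:
    def __init__(self):
        self._threads = {}

    def load(self, thread_id):
        return list(self._threads.get(thread_id, []))

    def save(self, thread_id, messages):
        self._threads[thread_id] = list(messages)

def run_turn(checkpointer, thread_id, user_message):
    history = checkpointer.load(thread_id)   # restore prior state
    history.append(user_message)
    checkpointer.save(thread_id, history)    # persist the new state
    return history

cp = ToyCheckpointer()
run_turn(cp, "thread-1", "My name is Ada")
turn2 = run_turn(cp, "thread-1", "What is my name?")
# same thread: both turns visible; a different thread starts empty
```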

Long‑Term Memory with Databases

For production, a persistent store such as PostgreSQL is used. The example installs the required packages, connects to the database, and defines a PostgresStore alongside a PostgresSaver checkpoint:

pip install -U "psycopg[binary,pool]" langgraph-checkpoint-postgres

DB_URI = "postgresql://postgres:postgres@localhost:5432/postgres?sslmode=disable"

import uuid

from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.store.postgres import PostgresStore

with PostgresStore.from_conn_string(DB_URI) as store, \
     PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    store.setup()          # create the store tables on first use
    checkpointer.setup()   # create the checkpoint tables on first use

    # define a node that reads/writes the store
    def call_model(state, config, *, store):
        # look up memories for the current user
        user_id = config["configurable"]["user_id"]
        namespace = ("memories", user_id)
        memories = store.search(namespace, query=str(state["messages"][-1].content))
        # optionally write new memory
        if "remember" in state["messages"][-1].content.lower():
            store.put(namespace, str(uuid.uuid4()), {"data": "User name is Ada"})
        response = model.invoke(state["messages"])
        return {"messages": [response]}

Two tables are created: checkpoints (per‑thread state snapshots) and checkpoint_writes (the individual values, including message content, written at each step). The same pattern works with Redis or MongoDB backends.
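
The split between snapshots and per‑step writes can be illustrated with stdlib sqlite3; this is a simplified schema for intuition only, not the one langgraph-checkpoint-postgres actually creates.

```python
import json
import sqlite3

# Illustrative only: a minimal version of the two-table layout, where
# `checkpoints` holds one state snapshot per step and `checkpoint_writes`
# holds the individual channel values written at that step.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checkpoints (thread_id TEXT, checkpoint_id TEXT, snapshot TEXT)")
conn.execute("CREATE TABLE checkpoint_writes (thread_id TEXT, checkpoint_id TEXT, channel TEXT, value TEXT)")

conn.execute("INSERT INTO checkpoints VALUES (?, ?, ?)",
             ("thread-1", "ckpt-1", json.dumps({"step": 1})))
conn.execute("INSERT INTO checkpoint_writes VALUES (?, ?, ?, ?)",
             ("thread-1", "ckpt-1", "messages", json.dumps("Hi, my name is Ada")))

# replaying a thread means reading its writes back in order
rows = conn.execute(
    "SELECT channel, value FROM checkpoint_writes WHERE thread_id = ?", ("thread-1",)
).fetchall()
```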

Memory in Subgraphs

When nesting graphs, memory can be inherited (default) or isolated by passing checkpointer=True to the subgraph compile step. Example:

subgraph_builder = StateGraph(State)
subgraph = subgraph_builder.compile(checkpointer=True)

Tools Accessing Memory

Tools can read the current state via InjectedState and write back using Command(update={...}). Example of reading user info:

from typing import Annotated
from langgraph.prebuilt import InjectedState

def get_user_info(state: Annotated[CustomState, InjectedState]):
    user_id = state["user_id"]
    return f"User name is {user_id}"

And writing new information:

from langchain_core.messages import ToolMessage
from langgraph.config import get_store
from langgraph.types import Command

def update_user_info(tool_call_id, config):
    store = get_store()  # access the store configured on the graph
    user_id = config["configurable"]["user_id"]
    store.put(("users",), user_id, {"name": "ada"})
    return Command(update={"messages": [ToolMessage("Saved", tool_call_id=tool_call_id)]})

Semantic Search with Custom Embeddings

A custom embedding class calls an external API to generate vectors, which are then used by InMemoryStore for similarity search:

class SelfAPIEmbeddings(Embeddings):
    def embed_documents(self, texts):
        # call external service and return one vector per text
        ...

    def embed_query(self, text):
        # embed a single query with the same service
        return self.embed_documents([text])[0]

After initializing the store with the embedding index, a semantic query retrieves relevant memories even when keywords differ.
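
How the vector index ranks memories can be sketched with plain cosine similarity over made‑up embedding vectors: the memory closest in direction to the query wins, regardless of shared keywords.

```python
import math

# Toy semantic search: rank stored memories by cosine similarity of
# (made-up) embedding vectors, the way a vector-indexed store would.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

memories = {
    "likes pizza": [0.9, 0.1, 0.0],
    "works at Tencent": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "what food does the user enjoy?"

best = max(memories, key=lambda text: cosine(memories[text], query_vec))
```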

Memory‑Management Strategies

Trim messages using trim_messages to stay within token limits.

Delete specific messages with RemoveMessage.

Summarize history via SummarizationNode from the langmem library, preserving essential information while reducing token count.

Example of trimming:

from langchain_core.messages.utils import trim_messages, count_tokens_approximately

messages = trim_messages(
    state["messages"],
    strategy="last",
    token_counter=count_tokens_approximately,
    max_tokens=128,
    include_system=True,
)
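
The effect of the "last" strategy can be sketched in plain Python, approximating tokens as words (a crude stand‑in for count_tokens_approximately): keep the system message, then as many of the most recent messages as fit the budget.

```python
def approx_tokens(message):
    # crude stand-in for count_tokens_approximately: one token per word
    return len(message["content"].split())

def trim_last(messages, max_tokens, include_system=True):
    # keep the system message (optionally), then as many of the most
    # recent messages as fit in the budget -- the "last" strategy
    system = [m for m in messages if m["role"] == "system"] if include_system else []
    budget = max_tokens - sum(approx_tokens(m) for m in system)
    kept = []
    for m in reversed([m for m in messages if m["role"] != "system"]):
        cost = approx_tokens(m)
        if cost > budget:
            break
        kept.append(m)
        budget -= cost
    return system + list(reversed(kept))

msgs = [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "one two three four five"},
    {"role": "user", "content": "six seven"},
]
trimmed = trim_last(msgs, max_tokens=6)
```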

Example of summarization:

from langchain_core.messages.utils import count_tokens_approximately
from langmem.short_term import SummarizationNode

summarization_node = SummarizationNode(
    token_counter=count_tokens_approximately,
    model=model.bind(max_tokens=128),
    max_tokens=256,
    initial_summary_prompt=summary_prompt,
    existing_summary_prompt=update_summary_prompt,
)
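
The idea behind summarization can be modeled with a toy compactor that folds older messages into a single summary message; fake_summarize is a placeholder for the LLM call SummarizationNode would make.

```python
def fake_summarize(messages):
    # placeholder for the LLM call a summarization node would make
    return "summary of " + str(len(messages)) + " earlier messages"

def compact_history(messages, keep_last=2):
    # fold everything but the newest messages into one summary message,
    # mirroring the idea behind langmem's SummarizationNode
    if len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    return [{"role": "system", "content": fake_summarize(older)}] + recent

history = [{"role": "user", "content": f"msg {i}"} for i in range(5)]
compacted = compact_history(history)
```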

Checkpoint Utilities

LangGraph provides helpers to inspect and manage checkpoints:

graph.get_state(config=config)             # latest short‑term state
graph.get_state_history(config=config)     # full history for a thread
checkpointer.delete_thread(thread_id)      # reset a conversation

Multi‑Agent Supervisor with MCP Protocol

The article builds a realistic multi‑agent system using the Model Context Protocol (MCP). Three sub‑agents are created: a search assistant, a hotel‑booking assistant, and a booking‑query assistant that requires admin verification. The supervisor routes user inputs to the appropriate agent and handles interruptions.

Key steps include:

Install MCP adapters: pip install langchain-mcp-adapters

Initialize the LLM model.

Create InMemoryStore and InMemorySaver for short‑ and long‑term memory.

Define tools (search, booking, query) and agents with create_react_agent.

Implement an authentication node that calls interrupt() to request an admin ID before accessing long‑term booking data.

Compile the subgraph and give it a name (e.g., booking_info_assistant).

Build the supervisor workflow with a prompt that directs the user to the appropriate assistant.

Sample interaction shows the supervisor remembering earlier queries, performing a hotel reservation, and then, after an admin interruption, retrieving the stored booking information.
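
The interrupt‑and‑resume flow can be modeled with a Python generator: the run pauses, surfaces a request to a human, and continues with their reply. This is a toy analogy for interrupt(), with hypothetical IDs and messages, not the real LangGraph API.

```python
def booking_info_assistant():
    # Toy model of an interrupt: the run pauses, surfaces a request to a
    # human, and resumes with their reply (hypothetical data, not the
    # real langgraph interrupt() API).
    admin_id = yield "Please provide an admin ID to view booking data"
    if admin_id == "admin-42":
        yield "Booking found: Grand Hotel, 2 nights"
    else:
        yield "Access denied"

run = booking_info_assistant()
request = next(run)             # graph pauses and asks for credentials
result = run.send("admin-42")   # a human resumes the run with an answer
```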

Future Directions

More intelligent memory‑management policies (automatic forgetting, relevance‑based updates) and alternative multi‑agent architectures (hierarchical, custom) are suggested to scale beyond the supervisor pattern.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Python, LLM, Agent Memory, long-term memory, LangGraph
Written by Tencent Technical Engineering

Official account of Tencent Technology. A platform for publishing and analyzing Tencent's technological innovations and cutting-edge developments.