LangChain Memory Best Practices: Avoid Common Pitfalls and Choose the Right Module
This article dissects the most frequent LangChain Memory pitfalls—missing placeholders, wrong memory type, shared instances, and multi‑process issues—provides correct code patterns, compares the five built‑in memory classes, introduces the new RunnableWithMessageHistory approach, and offers a production‑ready checklist.
01 Understanding Memory in LangChain
Memory injects historical messages into the prompt on each call. The execution flow is:
user input
↓
Memory.loadMemoryVariables() ← read history, inject into the prompt
↓
LLM inference
↓
Memory.saveContext() ← write history
↓
return output
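The two methods in the diagram can also be driven by hand, which makes the cycle concrete. A minimal sketch, using BufferMemory (the LangChain.js export corresponding to ConversationBufferMemory):
import { BufferMemory } from "langchain/memory";
const memory = new BufferMemory({ memoryKey: "history" });
// Write one turn (the chain does this after every call)
await memory.saveContext(
  { input: "Hi, I'm Zhang San" },
  { output: "Nice to meet you, Zhang San!" }
);
// Read it back (the chain does this before formatting the prompt)
const vars = await memory.loadMemoryVariables({});
console.log(vars.history);
// Human: Hi, I'm Zhang San
// AI: Nice to meet you, Zhang San!
02 Most Frequent Pitfall: Memory Not Invoked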
Wrong code omits the {history} placeholder, so no history is injected:
import { BufferMemory } from "langchain/memory"; // LangChain.js name for ConversationBufferMemory
import { ChatOpenAI } from "@langchain/openai";
import { LLMChain } from "langchain/chains";
import { ChatPromptTemplate } from "@langchain/core/prompts";
const memory = new BufferMemory();
const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo" });
// ❌ Prompt lacks {history}
const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a helpful assistant."],
  ["human", "{input}"],
]);
const chain = new LLMChain({ llm, prompt, memory });
await chain.call({ input: "My name is Zhang San" });
await chain.call({ input: "What is my name?" }); // forgets!
The correct version adds a MessagesPlaceholder("history") and aligns the memory key:
import { BufferMemory } from "langchain/memory";
import { ChatOpenAI } from "@langchain/openai";
import { LLMChain } from "langchain/chains";
import { ChatPromptTemplate, MessagesPlaceholder } from "@langchain/core/prompts";
const memory = new BufferMemory({
  returnMessages: true,
  memoryKey: "history",
});
const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo" });
const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a helpful assistant."],
  new MessagesPlaceholder("history"), // ← required
  ["human", "{input}"],
]);
const chain = new LLMChain({ llm, prompt, memory });
await chain.call({ input: "My name is Zhang San" });
const result = await chain.call({ input: "What is my name?" });
console.log(result.text); // Output: Your name is Zhang San
03 Five Built-In Memory Types and When to Use Them
LangChain provides five built-in memory classes, each suited to different conversation lengths and resource constraints (LangChain.js exports the first two as BufferMemory and BufferWindowMemory):
ConversationBufferMemory – stores the full history; appropriate for short dialogues (<20 turns).
ConversationBufferWindowMemory – keeps only the most recent K turns; useful for medium‑length dialogues such as customer‑service Q&A.
ConversationTokenBufferMemory – trims the buffer by token count to stay within a token limit; fits resource‑sensitive deployments.
ConversationSummaryBufferMemory – retains recent messages verbatim and summarizes earlier turns; ideal for very long sessions where precision matters.
VectorStoreRetrieverMemory – performs similarity search over a vector store to retrieve relevant past snippets; suited for knowledge‑intensive agents.
Typical production usage finds about 90% of scenarios satisfied by ConversationSummaryBufferMemory; the sketch below shows how the windowed variant is configured, for comparison.
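A minimal configuration sketch (BufferWindowMemory is the LangChain.js export for ConversationBufferWindowMemory):
import { BufferWindowMemory } from "langchain/memory";
// Keep only the last 5 turns; anything older is dropped outright
const windowMemory = new BufferWindowMemory({
  k: 5,
  returnMessages: true,
  memoryKey: "history",
});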
04 ConversationSummaryBufferMemory: The Underrated Choice
This memory keeps recent messages raw while compressing earlier turns into an LLM‑generated summary, mimicking human memory.
import { ConversationSummaryBufferMemory } from "langchain/memory";
import { ChatOpenAI } from "@langchain/openai";
const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo" });
const memory = new ConversationSummaryBufferMemory({
  llm,
  maxTokenLimit: 2000, // triggers summarization when exceeded
  returnMessages: true,
  memoryKey: "history",
});
As the dialogue grows, early turns are compressed into a summary while recent turns stay raw. maxTokenLimit is the threshold that triggers compression; the actual prompt may still exceed it slightly, because the summary itself consumes tokens. A practical setting is roughly 30% of the model's context window, leaving room for user input and the LLM's output: about 1,200 tokens for a 4K-context model, or about 5,000 for a 16K one.
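A quick way to watch the compression kick in, as a sketch that reuses the llm, prompt, and chain pattern from section 02 (the exact summary wording will vary by model):
const chain = new LLMChain({ llm, prompt, memory });
for (let i = 0; i < 30; i++) {
  await chain.call({ input: `This is message number ${i}` });
}
const { history } = await memory.loadMemoryVariables({});
// With returnMessages: true, the rolling summary (if any) arrives as a
// system message, followed by the recent raw turns
console.log(history[0]);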
05 Fatal Multi‑User Pitfall: Shared Memory Instance
Using a global singleton memory mixes the histories of different users, a bug that typically surfaces only under concurrent load.
// ❌ Dangerous: global singleton shared by all users
const sharedMemory = new BufferMemory();
app.post("/chat", async (req, res) => {
  const chain = new LLMChain({ llm, prompt, memory: sharedMemory });
  const result = await chain.call({ input: req.body.message });
  res.json({ reply: result.text }); // User A sees User B's history!
});
The correct approach: create a separate memory per session and optionally clean up idle sessions.
import { ConversationSummaryBufferMemory } from "langchain/memory";
import { ChatOpenAI } from "@langchain/openai";
const memoryStore = new Map<string, ConversationSummaryBufferMemory>();
const lastActivity = new Map<string, number>(); // sessionId → last-seen timestamp
function getMemory(sessionId: string) {
  if (!memoryStore.has(sessionId)) {
    memoryStore.set(sessionId, new ConversationSummaryBufferMemory({
      llm: new ChatOpenAI({ modelName: "gpt-3.5-turbo" }),
      maxTokenLimit: 2000,
      returnMessages: true,
      memoryKey: "history",
    }));
  }
  return memoryStore.get(sessionId)!;
}
app.post("/chat", async (req, res) => {
const { sessionId, message } = req.body;
const memory = getMemory(sessionId); // ← per‑user isolation
const chain = new LLMChain({ llm, prompt, memory });
const result = await chain.call({ input: message });
res.json({ reply: result.text });
});
// Simple TTL cleanup: evict sessions idle for more than 30 minutes
setInterval(() => {
  const now = Date.now();
  for (const [sid, ts] of lastActivity) {
    if (now - ts > 30 * 60 * 1000) {
      memoryStore.delete(sid);
      lastActivity.delete(sid);
    }
  }
}, 5 * 60 * 1000);
06 New Recommended Pattern: RunnableWithMessageHistory (LCEL)
Since LangChain v0.3, the library recommends using RunnableWithMessageHistory instead of coupling Memory to a Chain. This separates history management from the LLM logic and integrates cleanly with LangGraph.
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate, MessagesPlaceholder } from "@langchain/core/prompts";
import { RunnableWithMessageHistory } from "@langchain/core/runnables";
import { InMemoryChatMessageHistory } from "@langchain/core/chat_history";
const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo" });
const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a helpful assistant."],
  new MessagesPlaceholder("history"),
  ["human", "{input}"],
]);
const chain = prompt.pipe(llm);
const store: Record<string, InMemoryChatMessageHistory> = {};
function getHistory(sessionId: string) {
  if (!store[sessionId]) store[sessionId] = new InMemoryChatMessageHistory();
  return store[sessionId];
}
const chainWithHistory = new RunnableWithMessageHistory({
  runnable: chain,
  getMessageHistory: getHistory,
  inputMessagesKey: "input",
  historyMessagesKey: "history",
});
const result = await chainWithHistory.invoke(
  { input: "My name is Zhang San" },
  { configurable: { sessionId: "user-123" } } // ← selects this session's history
);
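To verify the wiring, a second call with the same sessionId should recall the first turn:
const followUp = await chainWithHistory.invoke(
  { input: "What is my name?" },
  { configurable: { sessionId: "user-123" } }
);
console.log(followUp.content); // the model answers "Zhang San"
This pattern aligns with LangGraph's checkpoint mechanism; when using LangGraph, its own persistence can replace the explicit Memory module.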
07 Production Self‑Check Checklist
□ Prompt contains MessagesPlaceholder or {history}
□ Memory instances are isolated per sessionId (no global singleton)
□ maxTokenLimit is set (≈30 % of context window)
□ SummaryBufferMemory’s llm parameter is specified
□ In multi‑process/K8s deployments, use external storage (Redis/DB) for history
□ TTL cleanup prevents unbounded Map growth
□ Tested with ≥20 turns, confirming the summary preserves key facts
□ Multi-user concurrency tests confirm no cross-talk
When deploying across multiple processes or containers, in-memory Maps are not shared; use an external store such as RedisChatMessageHistory:
import { RedisChatMessageHistory } from "@langchain/redis";
import { createClient } from "redis";
// Assumes a REDIS_URL environment variable pointing at your Redis instance
const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();
function getHistory(sessionId: string) {
  return new RedisChatMessageHistory({
    sessionId,
    client: redisClient,
    sessionTTL: 1800, // 30 min expiration
  });
}
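This Redis-backed getHistory drops into the RunnableWithMessageHistory setup from section 06 unchanged:
const chainWithHistory = new RunnableWithMessageHistory({
  runnable: chain,
  getMessageHistory: getHistory, // now backed by Redis and shared across processes
  inputMessagesKey: "input",
  historyMessagesKey: "history",
});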
James' Growth Diary
I am James, focusing on AI Agent learning and growth. I continuously update two series: “AI Agent Mastery Path,” which systematically outlines the core theory and practice of agents, and “Claude Code Design Philosophy,” which analyzes the design thinking behind top AI tools, to help you build a solid foundation in the AI era.
