Fast Guide to LangChain DeepAgents: Using Summarization Middleware to Optimize Agent Memory
This article explains how LangChain DeepAgents' Summarization middleware automatically compresses conversation history to overcome large‑model context window limits, detailing its core mechanism, applicable scenarios, configuration parameters (trigger, keep, model, summary_prompt), and step‑by‑step Python examples that illustrate its integration and internal message flow.
The article focuses on the Summarization middleware built into LangChain DeepAgents, which addresses the common problem of token overflow when an Agent accumulates long conversation histories or high‑redundancy tool outputs. By summarizing older messages before they exceed the model's context window, the middleware preserves recent critical messages while compressing earlier content into a concise summary.
1. Why Summarization Is Needed
Large‑model agents must send the entire message list with each model call, but context windows are limited (e.g., 8K or 32K tokens). In long‑running tasks, multi‑turn dialogs, or when tools return verbose data, the message list quickly exhausts the token budget, causing information loss or task interruption.
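To make the budget concrete, here is a rough back‑of‑the‑envelope check using the common heuristic of about 4 characters per token (an approximation; real tokenizers vary by model and language):

```python
def rough_token_count(text: str) -> int:
    """Very rough token estimate: ~4 characters per token (heuristic)."""
    return max(1, len(text) // 4)

# A verbose tool output of ~40,000 characters is roughly 10,000 tokens,
# which alone overflows an 8K-token context window.
tool_output = "x" * 40_000
context_window = 8_000

print(rough_token_count(tool_output) > context_window)  # True: overflow
```

Even a single noisy web‑scrape can blow the budget, which is why summarization has to run before the model call rather than after.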
2. Applicable Scenarios
Long‑text processing : Summarize accumulated page content before it fills the context.
Multi‑turn conversations : Collapse early chit‑chat or confirmations to keep focus on the current issue.
High‑redundancy tool calls : Extract key information from noisy tool outputs such as web‑scraping results.
3. Core Mechanism
The middleware inherits from AgentMiddleware and overrides the before_model hook, executing its logic before each model invocation. Its workflow is:
Check message list : Retrieve the full list of historical, user, and tool messages.
Determine trigger : Evaluate the user‑defined trigger (e.g., message count, total tokens, or fraction of the context window) to decide whether summarization is required.
Execute summarization : Preserve the newest keep messages, send the remaining older messages to a summary model, and obtain a concise summary.
Reassemble list : Wrap the summary in a HumanMessage, combine it with the retained recent messages, and pass the new shortened list to the model.
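The four steps above can be sketched in plain Python, independent of the library. This is a simplified illustration of the hook's logic, not LangChain's actual implementation: the `Message` class and `summarize` stub are hypothetical stand‑ins, and the trigger is reduced to a simple message count.

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str      # "human", "ai", or "tool"
    content: str

def summarize(messages):
    """Stand-in for the summary-model call (hypothetical)."""
    return "Summary of %d earlier messages." % len(messages)

def before_model(messages, trigger=5, keep=3):
    """Simplified sketch of the middleware's pre-model hook."""
    # 1) Check the message list and evaluate the trigger condition.
    if len(messages) <= trigger:
        return messages                      # no summarization needed
    # 2) Preserve the newest `keep` messages.
    recent = messages[-keep:]
    older = messages[:-keep]
    # 3) Send the older messages to the summary model.
    summary = Message("human", summarize(older))
    # 4) Reassemble: summary wrapped as a HumanMessage, then recent messages.
    return [summary] + recent

history = [Message("human", f"turn {i}") for i in range(7)]
shortened = before_model(history)
print(len(shortened))  # 4: one summary message + 3 retained messages
```

The key design point survives the simplification: the model only ever sees the shortened list, while the full history remains available to the middleware for future summaries.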
4. Quick Start – Configuring the Middleware
First install the required dependencies (e.g., langchain>=1.0.5, Python ≥ 3.12) and import the necessary classes:
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware
from langchain_deepseek import ChatDeepSeek
from dotenv import load_dotenv
from langchain.messages import HumanMessage, AIMessage, ToolMessage
import os
from typing import Literal
from langchain_core.tools import tool
from tavily import TavilyClient

Define a simple tool for internet search and a calculator tool, then instantiate the model:
load_dotenv()
tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

@tool
def internet_search(query: str, max_results: int = 5, topic: Literal["general", "news", "finance"] = "general", include_raw_content: bool = False):
    """Run an internet search for the query and return the results."""
    search_docs = tavily_client.search(query, max_results=max_results, include_raw_content=include_raw_content, topic=topic)
    return search_docs
@tool
def calculate(expression: str) -> str:
    """Perform mathematical calculations and return the result."""
    # eval is convenient for a demo but unsafe on untrusted input
    return str(eval(expression))
model = ChatDeepSeek(model="deepseek-chat")

Configure the Summarization middleware with three key parameters: trigger — summarize when the message count exceeds 5 (e.g., ("messages", 5)); keep — retain the latest 3 messages after summarization (e.g., ("messages", 3)); model — the model used to generate the summary.
agent = create_agent(
model=model,
tools=[internet_search, calculate],
middleware=[
SummarizationMiddleware(
model=model,
trigger=("messages", 5),
keep=("messages", 3),
),
],
)

Run a test where the initial state contains four historical messages, then add a new user query and invoke the agent. The result shows that the first HumanMessage in the reply is the generated summary, confirming that the middleware performed the compression.
status = {"messages": [
    HumanMessage(content='What is the latest news from DeepSeek?'),
    AIMessage(content=''),
    ToolMessage(content='...', tool_call_id='...'),
    AIMessage(content='...'),
]}
status["messages"].append(HumanMessage("What are the features and breakthroughs of DeepSeek's new models?"))
result = agent.invoke(status)
print(result)

5. Detailed Internal Flow
Initial state : 4 history messages + 1 new query = 5 messages, exactly at the trigger threshold (no summarization yet).
First model call : The model decides to invoke the internet_search tool.
Tool response : A ToolMessage is added, raising the total to 7 messages, which exceeds the trigger.
Pre‑second call : The middleware detects 7 > 5, triggers summarization.
Summarization step : Keeps the latest 3 messages, sends the older ones to the summary model, receives a concise summary.
Reassembly : The summary becomes a new HumanMessage and is combined with the retained 3 messages, forming a 4‑message list for the final model call.
Final reply : The model generates the answer based on the shortened list.
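The count arithmetic in the steps above can be traced in a few lines, using the article's settings of trigger = 5 and keep = 3:

```python
trigger, keep = 5, 3

messages = 4                     # initial history
messages += 1                    # new user query -> 5, equal to the trigger
assert messages <= trigger       # first model call proceeds unsummarized

messages += 1                    # AI message carrying the tool call -> 6
messages += 1                    # ToolMessage with the search result -> 7
assert messages > trigger        # middleware fires before the second call

retained = keep                  # the newest 3 messages survive
summarized = messages - keep     # the older 4 collapse into one summary
final_list = 1 + retained        # summary HumanMessage + retained messages
print(final_list)                # 4
```

Note that the threshold is only checked before a model call, so the list can temporarily exceed the trigger (7 messages here) while tools are running.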
6. Configuration Details
The middleware offers four main parameters:
trigger : Defines when summarization occurs. It can be a single tuple or a list of tuples (OR logic). Supported condition types include message count, token count, and fraction of the context window. Example: ("messages", ">=", 10), ("tokens", ">=", 3000), ("fraction", ">=", 0.7). Multiple conditions can be combined, e.g., trigger = [("messages", ">=", 8), ("fraction", ">=", 0.8)].
keep : Specifies how many of the newest original messages to retain after summarization. It must be a single tuple, such as keep = ("messages", "=", 5) or keep = ("tokens", "<=", 1000). Lists are not allowed for keep because the retention policy must be deterministic.
model : The language model used for generating the summary. It can be a model name string (e.g., "gpt-3.5-turbo") or an instantiated model object. Lightweight models are recommended to avoid consuming the main Agent model's resources.
summary_prompt (optional): Custom prompt template for the summary model. The template must contain the {messages} placeholder. Example:
from langchain_core.prompts import PromptTemplate

custom_prompt = PromptTemplate.from_template(
    "Condense the following conversation into a concise summary, "
    "focusing on facts and action items:\n{messages}"
)

By configuring these parameters, developers can fine‑tune when and how the Agent's memory is compressed, enabling efficient handling of long‑running or data‑heavy tasks.
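The trigger and keep semantics described above can be sketched in plain Python. This is an illustration of the documented behavior, not the library's code: the operator‑tuple format follows the examples in this section, and the 4‑characters‑per‑token estimate is an assumption.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, "=": operator.eq,
       ">": operator.gt, "<": operator.lt}

def should_summarize(trigger, *, n_messages, n_tokens, window):
    """Evaluate a trigger spec: a single tuple or a list of tuples (OR logic)."""
    conditions = [trigger] if isinstance(trigger, tuple) else trigger
    metrics = {"messages": n_messages, "tokens": n_tokens,
               "fraction": n_tokens / window}
    return any(OPS[op](metrics[kind], limit) for kind, op, limit in conditions)

def keep_newest_by_tokens(messages, max_tokens):
    """Walk backwards from the newest message, retaining messages until
    the token budget is spent (rough 4-chars-per-token estimate)."""
    kept, budget = [], max_tokens
    for msg in reversed(messages):
        cost = max(1, len(msg) // 4)
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))

# OR logic: fires because the context is ~84% full, despite only 6 messages.
print(should_summarize([("messages", ">=", 8), ("fraction", ">=", 0.8)],
                       n_messages=6, n_tokens=27_000, window=32_000))  # True

# keep = ("tokens", "<=", 1000): retain the newest messages within the budget.
msgs = ["a" * 2000, "b" * 2000, "c" * 2000]   # ~500 tokens each
print(len(keep_newest_by_tokens(msgs, 1000)))  # 2
```

Walking backwards from the newest message is what makes the token‑based keep policy deterministic: the cutoff point is fully determined by the budget and the message sizes.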
7. Conclusion
The Summarization middleware automatically compresses dialogue history when configured thresholds are reached, preserving recent context while summarizing older messages. This “memory subtraction” helps Agents stay within token limits and maintain task continuity. The article demonstrated the middleware’s inheritance, hook type, workflow, and detailed configuration, providing a practical template for integrating summarization into LangChain DeepAgents projects.
Fun with Large Models
Master's graduate from Beijing Institute of Technology, published four top‑journal papers, previously worked as a developer at ByteDance and Alibaba. Currently researching large models at a major state‑owned enterprise. Committed to sharing concise, practical AI large‑model development experience, believing that AI large models will become as essential as PCs in the future. Let's start experimenting now!