How to Build a Multi‑Turn LangChain Chatbot with Memory Storage (Part 4)

This tutorial revisits LangChain basics, demonstrates how to add multi‑turn memory to a chatbot, shows streaming output via the stream method, and walks through creating a full‑stack web interface with Gradio, providing complete Python code examples for each step.

Recap and Goal

The article reviews the first three parts of the LangChain AI Agent series—basic concepts, large‑model integration, and chain construction—then guides the reader to build a full‑stack chatbot that supports multi‑turn dialogue and memory storage.

1. Single‑Turn Chatbot Example

The basic workflow for a LangChain chatbot includes importing dependencies, initializing a prompt, creating a model with init_chat_model, linking the prompt and model with LCEL syntax, and invoking the chain:

from langchain_core.output_parsers import StrOutputParser
from langchain.chat_models import init_chat_model
from langchain.prompts import ChatPromptTemplate

# Prompt with a fixed system persona and a single {input} variable
chatbot_prompt = ChatPromptTemplate.from_messages([
    ("system", "你叫苍井空,是日本著名女演员。"),
    ("user", "{input}")
])

# Initialize the model via the OpenAI-compatible SiliconFlow endpoint
model = init_chat_model(
    model="Qwen/Qwen3-8B",
    model_provider="openai",
    base_url="https://api.siliconflow.cn/v1/",
    api_key="你注册的硅基流动api key"  # the SiliconFlow API key you registered
)

# Compose prompt -> model -> string parser with LCEL
basic_qa_chain = chatbot_prompt | model | StrOutputParser()

question = "你好,请你介绍一下你自己。"
result = basic_qa_chain.invoke({"input": question})  # pass the prompt variable explicitly
print(result)

The execution result (shown in the first screenshot) confirms that the model returns a correct answer.

2. Adding Multi‑Turn Memory

To enable multi‑turn dialogue, a message list is manually managed and passed to a MessagesPlaceholder in the prompt. The prompt is defined as:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

# The MessagesPlaceholder injects the running message list into the prompt
prompt = ChatPromptTemplate.from_messages([
    SystemMessage(content="你叫苍井空,是日本著名女演员。"),
    MessagesPlaceholder(variable_name="messages")
])

During each turn, the user message is appended to messages_list, the chain is invoked with the full list, and the assistant reply is appended back. The core loop looks like:

# Append the new user turn, run the chain over the full history,
# then store the assistant reply back into the history
messages_list.append(HumanMessage(content=user_query))
assistant_reply = chain.invoke({"messages": messages_list})
print("小苍:", assistant_reply)
messages_list.append(AIMessage(content=assistant_reply))
messages_list = messages_list[-50:]  # keep only the 50 most recent messages
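
For reference, the pieces above assemble into a runnable interactive loop roughly as follows. This is a minimal sketch that reuses the prompt defined above and the model from section 1; the chain construction and the exit condition (stop on "exit", "quit", or empty input) are illustrative assumptions rather than part of the original listing.

from langchain_core.output_parsers import StrOutputParser

# Assumes `prompt` and `model` are defined as shown earlier
chain = prompt | model | StrOutputParser()
messages_list = []

while True:
    user_query = input("你: ").strip()
    if user_query in ("", "exit", "quit"):  # assumption: stop on empty input or exit/quit
        break
    messages_list.append(HumanMessage(content=user_query))
    assistant_reply = chain.invoke({"messages": messages_list})
    print("小苍:", assistant_reply)
    messages_list.append(AIMessage(content=assistant_reply))
    messages_list = messages_list[-50:]  # keep recent 50 messages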

The resulting multi‑turn interaction is displayed in the second screenshot.

3. Streaming Output

Standard LangChain calls wait for the entire response before printing, which feels slow. Replacing invoke with stream (or astream for async) yields token‑by‑token output. The modified section is:

# 2) Call the model with streaming: stream() yields string chunks
#    because the chain ends with StrOutputParser
assistant_reply = ''
print('苍老师:', end=' ')
for chunk in chain.stream({"messages": messages_list}):
    assistant_reply += chunk
    print(chunk, end="", flush=True)
print()
messages_list.append(AIMessage(content=assistant_reply))

The streaming effect is illustrated in the third screenshot.
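
The article also mentions astream for asynchronous use. A minimal sketch of the async variant, assuming the same chain and messages_list as above (the asyncio wrapper is added here purely for illustration), could look like this:

import asyncio

async def stream_reply(messages_list):
    # Same logic as the loop above, using the async astream API
    assistant_reply = ''
    print('苍老师:', end=' ')
    async for chunk in chain.astream({"messages": messages_list}):
        assistant_reply += chunk
        print(chunk, end="", flush=True)
    print()
    messages_list.append(AIMessage(content=assistant_reply))

asyncio.run(stream_reply(messages_list))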

4. Full‑Stack Web Chatbot with Gradio

The front-end uses Gradio (installed via pip install gradio==5.23.0) to create an interactive UI. The key components are:

- gr.State() stores the message list.
- Event binding connects the input box, submit button, and state to the async respond function.
- The respond function handles empty input, appends the user message, streams the model response with qa_chain.astream, updates the chat history, and trims the history to the latest 50 entries.
- A clear_history function resets the state.

import gradio as gr
from langchain.chat_models import init_chat_model
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

# Model, parser, and prompt are the same building blocks as in the earlier sections
model = init_chat_model(
    model="Qwen/Qwen3-8B",
    model_provider="openai",
    base_url="https://api.siliconflow.cn/v1/",
    api_key="你注册的硅基流动api_key"  # the SiliconFlow API key you registered
)
parser = StrOutputParser()
chatbot_prompt = ChatPromptTemplate.from_messages([
    SystemMessage(content="你叫苍老师,是日本著名女演员。"),
    MessagesPlaceholder(variable_name="messages")
])
qa_chain = chatbot_prompt | model | parser

CSS = """
.main-container {max-width: 1200px; margin: 0 auto; padding: 20px;}
.header-text {text-align: center; margin-bottom: 20px;}
"""

def create_chatbot():
    with gr.Blocks(title="聊天机器人", css=CSS) as demo:
        with gr.Column(elem_classes=["main-container"]):
            gr.Markdown("# 🤖 LangChain智能对话机器人系统", elem_classes=["header-text"])
            chatbot = gr.Chatbot(height=500, show_copy_button=True,
                               # user / bot avatars (Twemoji icons via jsDelivr, unversioned path)
                               avatar_images=(
                                   "https://cdn.jsdelivr.net/gh/twitter/twemoji/assets/72x72/1f464.png",
                                   "https://cdn.jsdelivr.net/gh/twitter/twemoji/assets/72x72/1f916.png"
                               ))
            msg = gr.Textbox(placeholder="请输入您的问题...", container=False, scale=7)
            submit = gr.Button("发送", scale=1, variant="primary")
            clear = gr.Button("清空", scale=1)
        state = gr.State([])

        async def respond(user_msg: str, chat_hist: list, messages_list: list):
            # Ignore empty or whitespace-only input
            if not user_msg.strip():
                yield "", chat_hist, messages_list
                return
            # Record the user turn and show it in the chat window right away
            messages_list.append(HumanMessage(content=user_msg))
            chat_hist = chat_hist + [(user_msg, None)]
            yield "", chat_hist, messages_list
            # Stream the reply, updating the last chat entry as chunks arrive
            partial = ""
            async for chunk in qa_chain.astream({"messages": messages_list}):
                partial += chunk
                chat_hist[-1] = (user_msg, partial)
                yield "", chat_hist, messages_list
            # Store the finished reply and keep only the 50 most recent messages
            messages_list.append(AIMessage(content=partial))
            messages_list = messages_list[-50:]
            yield "", chat_hist, messages_list

        def clear_history():
            # Reset the chatbot display, the input box, and the stored message list
            return [], "", []

        # Bind the Enter key, the send button, and the clear button to their handlers
        msg.submit(respond, [msg, chatbot, state], [msg, chatbot, state])
        submit.click(respond, [msg, chatbot, state], [msg, chatbot, state])
        clear.click(clear_history, outputs=[chatbot, msg, state])
    return demo

demo = create_chatbot()
demo.launch(server_name="0.0.0.0", server_port=7860, share=False, debug=True)

Running the application launches a web page where users can converse with the chatbot in real time; the GIF at the end of the article shows the interactive multi‑turn dialogue.
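
One implementation note: recent Gradio releases (including the 5.x line) mark the tuple-based chat history used above as deprecated in favor of an OpenAI-style "messages" format. The sketch below is a hedged illustration of how the affected lines might change; only the modified pieces are shown, they reuse the names from the full example, and they are not part of the original tutorial code.

# In create_chatbot(): ask the Chatbot component for the "messages" format
chatbot = gr.Chatbot(height=500, show_copy_button=True, type="messages")

# In respond(): history entries become role/content dicts instead of tuples
chat_hist = chat_hist + [{"role": "user", "content": user_msg},
                         {"role": "assistant", "content": ""}]
yield "", chat_hist, messages_list
partial = ""
async for chunk in qa_chain.astream({"messages": messages_list}):
    partial += chunk
    chat_hist[-1] = {"role": "assistant", "content": partial}
    yield "", chat_hist, messages_list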

Conclusion

The tutorial demonstrates how to combine LangChain's chain and memory mechanisms with Gradio's UI components to create a full‑stack, streaming, multi‑turn chatbot. It also hints at future topics such as function calling and tool integration.

Figures: single-turn chatbot output; multi-turn chatbot output; streaming output; web UI interaction.
Tags: Python, LangChain, Memory, Chatbot, Multi-turn Conversation, Gradio
Written by

Fun with Large Models

Master's graduate from Beijing Institute of Technology, published four top‑journal papers, previously worked as a developer at ByteDance and Alibaba. Currently researching large models at a major state‑owned enterprise. Committed to sharing concise, practical AI large‑model development experience, believing that AI large models will become as essential as PCs in the future. Let's start experimenting now!
