How to Build a Multi‑Turn LangChain Chatbot with Memory Storage (Part 4)
This tutorial revisits LangChain basics, demonstrates how to add multi‑turn memory to a chatbot, shows streaming output via the stream method, and walks through creating a full‑stack web interface with Gradio, providing complete Python code examples for each step.
Recap and Goal
The article reviews the first three parts of the LangChain AI Agent series—basic concepts, large‑model integration, and chain construction—then guides the reader to build a full‑stack chatbot that supports multi‑turn dialogue and memory storage.
1. Single‑Turn Chatbot Example
The basic workflow for a LangChain chatbot includes importing dependencies, initializing a prompt, creating a model with init_chat_model, linking the prompt and model with LCEL syntax, and invoking the chain:
from langchain_core.output_parsers import StrOutputParser
from langchain.chat_models import init_chat_model
from langchain_core.prompts import ChatPromptTemplate

chatbot_prompt = ChatPromptTemplate.from_messages([
    ("system", "Your name is Sora Aoi, a famous Japanese actress."),
    ("user", "{input}")
])

model = init_chat_model(
    model="Qwen/Qwen3-8B",
    model_provider="openai",
    base_url="https://api.siliconflow.cn/v1/",
    api_key="your SiliconFlow API key"
)

basic_qa_chain = chatbot_prompt | model | StrOutputParser()

question = "Hello, please introduce yourself."
# The prompt declares an {input} variable, so the chain is invoked with a dict
result = basic_qa_chain.invoke({"input": question})
print(result)
The execution result (shown in the first screenshot) confirms that the model returns a correct answer.
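Conceptually, the LCEL pipe chatbot_prompt | model | StrOutputParser() just feeds each stage's output into the next stage's input. A minimal plain-Python sketch of that composition, with stub functions standing in for the real prompt, model, and parser so it runs without LangChain or an API key:

```python
# Stub stages: each mimics the role of its LCEL counterpart.
def fake_prompt(inputs: dict) -> list:
    """Like ChatPromptTemplate: turns variables into a message list."""
    return [("system", "Your name is Sora Aoi."), ("user", inputs["input"])]

def fake_model(messages: list) -> dict:
    """Like the chat model: consumes messages, produces a raw message."""
    return {"content": f"Reply to: {messages[-1][1]}"}

def fake_parser(message: dict) -> str:
    """Like StrOutputParser: extracts plain text from the raw message."""
    return message["content"]

def chain(inputs: dict) -> str:
    """Equivalent of fake_prompt | fake_model | fake_parser."""
    return fake_parser(fake_model(fake_prompt(inputs)))

print(chain({"input": "Hi"}))  # -> Reply to: Hi
```

The pipe operator saves you from writing this nesting by hand; each `|` wires one stage's output to the next stage's input.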
2. Adding Multi‑Turn Memory
To enable multi‑turn dialogue, a message list is manually managed and passed to a MessagesPlaceholder in the prompt. The prompt is defined as:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

prompt = ChatPromptTemplate.from_messages([
    SystemMessage(content="Your name is Sora Aoi, a famous Japanese actress."),
    MessagesPlaceholder(variable_name="messages")
])
During each turn, the user message is appended to messages_list, the chain is invoked with the full list, and the assistant reply is appended back. The core loop looks like:
messages_list.append(HumanMessage(content=user_query))
assistant_reply = chain.invoke({"messages": messages_list})
print("Sora:", assistant_reply)
messages_list.append(AIMessage(content=assistant_reply))
messages_list = messages_list[-50:]  # keep the most recent 50 messages
The resulting multi-turn interaction is displayed in the second screenshot.
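Putting those steps together, the turn-by-turn bookkeeping can be sketched as a small helper. A stub chain (a plain function, a hypothetical stand-in for chatbot_prompt | model | StrOutputParser()) replaces the real model so the sketch runs offline, and plain dicts stand in for HumanMessage/AIMessage:

```python
MAX_HISTORY = 50  # mirrors the messages_list[-50:] trim from the loop above

def stub_chain(inputs: dict) -> str:
    """Pretend model: reports how many messages it was given."""
    return f"(reply to {len(inputs['messages'])} messages)"

def chat_turn(chain, messages_list: list, user_query: str):
    """One dialogue turn: append user message, invoke, append reply, trim."""
    messages_list.append({"role": "user", "content": user_query})
    assistant_reply = chain({"messages": messages_list})
    messages_list.append({"role": "assistant", "content": assistant_reply})
    return messages_list[-MAX_HISTORY:], assistant_reply

history = []
history, r1 = chat_turn(stub_chain, history, "Hello")
history, r2 = chat_turn(stub_chain, history, "Tell me more")
print(len(history), r2)  # 4 (reply to 3 messages)
```

Each turn grows the history by two entries (user plus assistant), and the trailing slice keeps the prompt from growing without bound.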
3. Streaming Output
Standard LangChain calls wait for the entire response before printing, which feels slow. Replacing invoke with stream (or astream for async) yields token‑by‑token output. The modified section is:
# 2) Call the model with streaming
assistant_reply = ''
print('Sora:', end=' ')
for chunk in chain.stream({"messages": messages_list}):
    assistant_reply += chunk
    print(chunk, end="", flush=True)
print()
messages_list.append(AIMessage(content=assistant_reply))
The streaming effect is illustrated in the third screenshot.
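The same pattern works asynchronously with astream inside an async for loop, as noted above. Here is a hedged sketch of that shape, with a stub async generator standing in for chain.astream so it runs without a model:

```python
import asyncio

async def stub_astream(inputs: dict):
    """Pretend streaming chain: yields the reply in small chunks."""
    for token in ["Hel", "lo", "!"]:
        yield token

async def stream_reply(messages_list: list) -> str:
    assistant_reply = ""
    async for chunk in stub_astream({"messages": messages_list}):
        assistant_reply += chunk
        print(chunk, end="", flush=True)  # tokens appear as they arrive
    print()
    return assistant_reply

full = asyncio.run(stream_reply([]))
print(full)  # Hello!
```

This async form is what the Gradio respond handler in the next section uses, since Gradio can consume async generators directly.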
4. Full‑Stack Web Chatbot with Gradio
The front-end uses Gradio (installed via pip install gradio==5.23.0) to create an interactive UI. The key components are:
- gr.State() stores the message list.
- Event binding connects the input box, submit button, and state to the async respond function.
- The respond function handles empty input, appends the user message, streams the model response with qa_chain.astream, updates the chat history, and trims the history to the latest 50 entries.
- A clear_history function resets the state.
import gradio as gr
from langchain.chat_models import init_chat_model
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

model = init_chat_model(
    model="Qwen/Qwen3-8B",
    model_provider="openai",
    base_url="https://api.siliconflow.cn/v1/",
    api_key="your SiliconFlow API key"
)
parser = StrOutputParser()

chatbot_prompt = ChatPromptTemplate.from_messages([
    SystemMessage(content="Your name is Sora Aoi, a famous Japanese actress."),
    MessagesPlaceholder(variable_name="messages")
])
qa_chain = chatbot_prompt | model | parser

CSS = """
.main-container {max-width: 1200px; margin: 0 auto; padding: 20px;}
.header-text {text-align: center; margin-bottom: 20px;}
"""

def create_chatbot():
    with gr.Blocks(title="Chatbot", css=CSS) as demo:
        with gr.Column(elem_classes=["main-container"]):
            gr.Markdown("# 🤖 LangChain Intelligent Chatbot", elem_classes=["header-text"])
            chatbot = gr.Chatbot(
                height=500, show_copy_button=True,
                avatar_images=(
                    "https://cdn.jsdelivr.net/gh/twitter/[email protected]/assets/72x72/1f464.png",
                    "https://cdn.jsdelivr.net/gh/twitter/[email protected]/assets/72x72/1f916.png"
                )
            )
            msg = gr.Textbox(placeholder="Type your question...", container=False, scale=7)
            submit = gr.Button("Send", scale=1, variant="primary")
            clear = gr.Button("Clear", scale=1)
            state = gr.State([])

            async def respond(user_msg: str, chat_hist: list, messages_list: list):
                if not user_msg.strip():
                    yield "", chat_hist, messages_list
                    return
                messages_list.append(HumanMessage(content=user_msg))
                chat_hist = chat_hist + [(user_msg, None)]
                yield "", chat_hist, messages_list  # clear the input box immediately
                partial = ""
                async for chunk in qa_chain.astream({"messages": messages_list}):
                    partial += chunk
                    chat_hist[-1] = (user_msg, partial)
                    yield "", chat_hist, messages_list
                messages_list.append(AIMessage(content=partial))
                messages_list = messages_list[-50:]  # keep the most recent 50 messages
                yield "", chat_hist, messages_list

            def clear_history():
                return [], "", []

            msg.submit(respond, [msg, chatbot, state], [msg, chatbot, state])
            submit.click(respond, [msg, chatbot, state], [msg, chatbot, state])
            clear.click(clear_history, outputs=[chatbot, msg, state])
    return demo

demo = create_chatbot()
demo.launch(server_name="0.0.0.0", server_port=7860, share=False, debug=True)
Running the application launches a web page where users can converse with the chatbot in real time; the GIF at the end of the article shows the interactive multi-turn dialogue.
Conclusion
The tutorial demonstrates how to combine LangChain's chain and memory mechanisms with Gradio's UI components to create a full‑stack, streaming, multi‑turn chatbot. It also hints at future topics such as function calling and tool integration.
Fun with Large Models
A master's graduate of Beijing Institute of Technology with four papers in top journals, formerly a developer at ByteDance and Alibaba, now researching large models at a major state-owned enterprise. Committed to sharing concise, practical experience in AI large-model development, in the belief that large models will become as essential as the PC. Let's start experimenting now!