Build a Retrieval‑Augmented Generation (RAG) Chatbot with LangChain and Streamlit
This guide walks through building a RAG-powered question-answering bot with LangChain, Streamlit, and vector-store embeddings, covering the theory and architecture as well as data loading, chunking, vector indexing, retrieval, LLM integration, and a complete code implementation with practical examples.
RAG concepts
RAG (Retrieval‑Augmented Generation) combines a retriever that selects relevant document chunks with a generator (LLM) that produces answers using those chunks.
Typical RAG pipeline
Load documents → split into chunks → embed chunks → store vectors in a vector store → retrieve relevant chunks for a query → feed retrieved chunks into LLM prompt → generate final answer.
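The pipeline above can be sketched end to end with a toy in-memory example. This is a sketch only: a bag-of-words term-frequency `Counter` stands in for the real embedding model, and the "generation" step just assembles the prompt the LLM would receive.

```python
from collections import Counter
import math

def embed(text):
    # Toy "embedding": a bag-of-words term-frequency vector.
    # A real pipeline would call an embedding model here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Steps 1-2 (load + split): pretend these are chunks from a document.
chunks = [
    "RAG combines a retriever with a generator",
    "Streamlit builds simple web UIs in Python",
    "Chroma is an in-memory vector store",
]
# Steps 3-4 (embed + store).
index = [(chunk, embed(chunk)) for chunk in chunks]

# Step 5 (retrieve): most similar chunk to the query.
query = "what is a vector store"
best_chunk, _ = max(index, key=lambda item: cosine(embed(query), item[1]))

# Step 6 (generate): insert the retrieved chunk into the LLM prompt.
prompt = f"Answer using this context:\n{best_chunk}\n\nQuestion: {query}"
print(best_chunk)  # → Chroma is an in-memory vector store
```

The real implementation below replaces the toy pieces with OllamaEmbeddings, Chroma, and an LLM, but the data flow is the same.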
Indexing (vector store creation)
Load data with LangChain document loaders (e.g., PyPDFLoader, TextLoader, UnstructuredImageLoader).
Split documents using RecursiveCharacterTextSplitter (chunk size 1000, overlap 200).
Generate embeddings with an OllamaEmbeddings model (e.g., shaw/dmeta-embedding-zh) and store them in a Chroma vector store.
Embedding model role
Embedding models map unstructured, high‑dimensional content (text, images, video) to fixed‑length numeric vectors, so that semantically similar items land close together and can be found with efficient similarity search.
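The fixed-dimensionality property can be illustrated with the hashing trick (a toy sketch only — a real model such as shaw/dmeta-embedding-zh learns vectors that place semantically similar texts close together, which this does not):

```python
import hashlib

def toy_embed(text, dims=8):
    # Toy fixed-length "embedding" via the hashing trick: every token is
    # hashed into one of `dims` buckets, so any input maps to a vector
    # of the same length. Demonstrates dimensionality only, not semantics.
    vec = [0.0] * dims
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    return vec

short_vec = toy_embed("hello")
long_vec = toy_embed("a much longer piece of text with many more tokens")
# Both inputs land in the same 8-dimensional space.
assert len(short_vec) == len(long_vec) == 8
```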
Retrieval and generation
Convert user query to a vector with the same embedding model.
Search the vector store for the top‑k most similar chunks.
Insert those chunks into the LLM prompt and generate the final answer.
Implementation details
Environment setup
# Pull models via Ollama
ollama pull deepseek-r1:7b
ollama pull shaw/dmeta-embedding-zh:latest
# Python dependencies (versions used)
pip install streamlit==1.39.0
pip install langchain==0.3.21
pip install langchain-chroma==0.2.2
pip install langchain-community==0.3.20
pip install langchain-ollama==0.2.3
Streamlit UI (bot_chat.py)
Key sections:
File uploader – sidebar widget that accepts .txt files.
import streamlit as st
st.set_page_config(page_title="RAG测试问答", layout="wide")
st.title("RAG测试问答")
upload_file = st.sidebar.file_uploader(label="上传文件", type=["txt"])
if not upload_file:
    st.info("请上传 txt 文件")
    st.stop()
Knowledge‑base construction – cached function that writes the uploaded file to /tmp, loads it with TextLoader, splits, embeds, and creates a Chroma retriever.
@st.cache_resource(ttl="1h")
def get_knowledge_base(uploaded_file):
    import tempfile, os
    from langchain_community.document_loaders import TextLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_ollama.embeddings import OllamaEmbeddings
    from langchain_chroma import Chroma
    temp_dir = tempfile.TemporaryDirectory(dir="/tmp")
    temp_path = os.path.join(temp_dir.name, uploaded_file.name)
    with open(temp_path, "wb") as f:
        f.write(uploaded_file.getvalue())
    docs = TextLoader(temp_path, encoding="utf-8").load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = splitter.split_documents(docs)
    embeddings = OllamaEmbeddings(base_url="http://127.0.0.1:11434",
                                  model="shaw/dmeta-embedding-zh")
    chroma_db = Chroma.from_documents(splits, embeddings)
    return chroma_db.as_retriever()
retriever = get_knowledge_base(upload_file)
Chat history handling – session state stores the messages; Streamlit re-renders them with st.chat_message.
if "messages" not in st.session_state or st.sidebar.button("清空聊天记录"):
    st.session_state["messages"] = [{"role": "assistant",
                                     "content": "我是测试 RAG 问答小助手"}]
for msg in st.session_state["messages"]:
    st.chat_message(msg["role"]).write(msg["content"])
user_query = st.chat_input(placeholder="请输入要测试的问题")
Retriever tool and ReAct agent – creates a LangChain tool wrapping the retriever, defines a system prompt, instantiates an OllamaLLM (deepseek‑r1:7b), and builds a ReAct agent.
from langchain.tools.retriever import create_retriever_tool
from langchain.prompts import PromptTemplate
from langchain_ollama import OllamaLLM
from langchain.agents import create_react_agent, AgentExecutor
tool = create_retriever_tool(retriever=retriever,
                             name="文档检索",
                             description="根据关键词检索相关文档")
tools = [tool]
instruction = """你是一个设计用于查询文档回答问题的代理...如果从文档找不到任何信息,返回'非常抱歉,这个问题暂时没有录入到知识库中。'"""
base_template = """{instruction}
TOOLS:
{tools}
...
{input}
{agent_scratchpad}"""
prompt = PromptTemplate.from_template(base_template).partial(instruction=instruction)
llm = OllamaLLM(base_url="http://127.0.0.1:11434", model="deepseek-r1:7b")
agent = create_react_agent(llm=llm, prompt=prompt, tools=tools)
agent_executor = AgentExecutor(agent=agent,
                               tools=tools,
                               memory=None,
                               verbose=True,
                               handle_parsing_errors="从知识库没找到对应内容或者答案")
User query execution – appends the query to the history, runs the agent with a StreamlitCallbackHandler to show intermediate reasoning, and displays the final answer.
if user_query:
    st.session_state["messages"].append({"role": "user", "content": user_query})
    st.chat_message("user").write(user_query)
    with st.chat_message("assistant"):
        from langchain.callbacks import StreamlitCallbackHandler
        callback = StreamlitCallbackHandler(st.container())
        response = agent_executor.invoke({"input": user_query},
                                         config={"callbacks": [callback]})
        answer = response["output"]
        st.session_state["messages"].append({"role": "assistant", "content": answer})
        st.write(answer)
Running the app
streamlit run bot_chat.py
Key observations
A chunk size of 1000 characters with a 200‑character overlap balances retrieval relevance against vector store size (RecursiveCharacterTextSplitter measures length in characters by default, not tokens).
Using OllamaEmbeddings locally avoids external API latency.
Chroma runs in memory here, which is fine for prototyping; for production, persist the index (Chroma's persist_directory) or use a dedicated vector database (e.g., Pinecone, Milvus).
The ReAct agent’s prompt explicitly forces a retrieval step even when the LLM “knows” the answer, ensuring consistency with the knowledge base.
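The chunking observation above can be illustrated with a simplified fixed-size splitter. This is not RecursiveCharacterTextSplitter itself (which also tries to break on separators like paragraphs and sentences), only a sketch of how the overlap makes consecutive chunks share context:

```python
def naive_split(text, chunk_size=1000, overlap=200):
    # Fixed-size character windows: each chunk starts
    # (chunk_size - overlap) characters after the previous one,
    # so consecutive chunks share `overlap` characters.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "".join(str(i % 10) for i in range(2500))
chunks = naive_split(text)
print([len(c) for c in chunks])      # → [1000, 1000, 900, 100]
# The last 200 characters of one chunk reappear at the start of the next,
# so a sentence cut at a chunk boundary still appears whole in one chunk.
assert chunks[0][-200:] == chunks[1][:200]
```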