How to Build a Local RAG Knowledge Base with DeepSeek‑R1 and Ollama
This article walks through setting up a local Retrieval‑Augmented Generation (RAG) system using the open‑source DeepSeek‑R1 model run via Ollama, covering installation, model selection, PDF ingestion with LangChain, semantic chunking, FAISS vector store creation, RetrievalQA chain construction, and a Streamlit UI for querying.
Introduction
This guide shows how to build a Retrieval‑Augmented Generation (RAG) system that answers questions from a PDF using the open‑source inference model DeepSeek‑R1 and the local model runtime Ollama.
Why DeepSeek‑R1?
Focused retrieval: each answer is generated from only the three most relevant document fragments.
Strict prompting: the model is instructed to reply "I don’t know" when uncertain, reducing hallucinations.
Local execution: no latency or cost from external APIs, and documents never leave your machine.
1. Install Ollama
Ollama provides a local server for running DeepSeek‑R1 and other models.
ollama run deepseek-r1  # Runs the default 7B model
2. Choose a DeepSeek‑R1 Variant
DeepSeek‑R1 is available in sizes from 1.5B to 671B parameters. This lightweight RAG demo uses the 1.5B variant.
ollama run deepseek-r1:1.5b
Larger models (e.g., 70B) provide stronger reasoning but require more memory.
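As a rough planning aid for choosing a variant (not an official Ollama figure), a 4‑bit quantized model needs on the order of half a byte per parameter for its weights, plus headroom for the KV cache and runtime overhead. A back‑of‑envelope sketch under that assumption:

```python
# Back-of-envelope weight-memory estimate for picking a DeepSeek-R1 variant.
# Assumption: ~0.5 bytes per parameter for 4-bit quantized weights; real
# usage is higher once the KV cache and activations are included.

def approx_weight_gb(params_billions: float, bytes_per_param: float = 0.5) -> float:
    """Approximate size of the quantized weights, in GB."""
    return params_billions * bytes_per_param

for size in (1.5, 7, 70):
    print(f"deepseek-r1:{size}b -> ~{approx_weight_gb(size):.1f} GB of weights")
```

By this estimate the 1.5B variant fits comfortably on a laptop, while 70B calls for a workstation‑class machine.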
3. Import Required Libraries
Python packages used:
LangChain – document loading, chunking, embeddings, and vector stores.
Streamlit – web interface for user interaction.
import streamlit as st
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, RetrievalQA, StuffDocumentsChain
4. Upload and Process PDF
Streamlit’s file uploader saves the uploaded PDF temporarily and loads its raw text with PDFPlumberLoader.
# Streamlit file uploader
uploaded_file = st.file_uploader("Upload a PDF file", type="pdf")
if uploaded_file:
    with open("temp.pdf", "wb") as f:
        f.write(uploaded_file.getvalue())
    loader = PDFPlumberLoader("temp.pdf")
    docs = loader.load()
5. Semantic Chunking
Text is split into coherent semantic chunks using SemanticChunker backed by HuggingFaceEmbeddings. Semantic chunking keeps related sentences together and avoids breaking tables or figures, which improves retrieval quality.
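To make the idea concrete before wiring in LangChain, here is a toy version of the algorithm: embed each sentence, then start a new chunk wherever similarity to the previous sentence drops below a threshold. The 3‑dimensional vectors below are hand‑made stand‑ins for real embeddings such as those from HuggingFaceEmbeddings.

```python
# Toy semantic chunking: split a list of sentences wherever the cosine
# similarity between neighbouring sentence vectors drops below a threshold.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def semantic_split(sentences, vectors, threshold=0.5):
    """Group consecutive sentences into chunks; open a new chunk when the
    similarity to the previous sentence falls below `threshold`."""
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks

sentences = ["Cats purr.", "Cats meow.", "GPUs are fast.", "GPUs use VRAM."]
vectors = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0.1, 0.9, 0]]
print(semantic_split(sentences, vectors))
# → ['Cats purr. Cats meow.', 'GPUs are fast. GPUs use VRAM.']
```

SemanticChunker applies the same principle with real embedding distances, which is why topically related sentences end up in the same chunk.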
# Split text into semantic chunks
text_splitter = SemanticChunker(HuggingFaceEmbeddings())
documents = text_splitter.split_documents(docs)
6. Create a Knowledge Base (FAISS Index)
Embeddings are generated for each chunk and stored in a FAISS vector store. The retriever is configured to return the top‑3 most relevant chunks.
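Conceptually, the retriever is doing a nearest‑neighbour search: score every chunk vector against the query vector and keep the k best. A minimal pure‑Python sketch of that search, with hand‑made 2‑D vectors standing in for real embeddings (FAISS does the same thing at scale with optimized indexes):

```python
# Toy top-k retrieval: rank chunks by cosine similarity to the query vector.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

def top_k(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose vectors are most similar to the query."""
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine(query_vec, cv[1]),
                    reverse=True)
    return [c for c, _ in scored[:k]]

chunks = ["intro", "methods", "results", "appendix"]
vecs = [[1, 0], [0.7, 0.7], [0, 1], [-1, 0]]
print(top_k([0, 1], vecs, chunks, k=3))
# → ['results', 'methods', 'intro']
```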
# Generate embeddings and build FAISS index
embeddings = HuggingFaceEmbeddings()
vector_store = FAISS.from_documents(documents, embeddings)
retriever = vector_store.as_retriever(search_kwargs={"k": 3})  # fetch top 3 chunks
7. Configure DeepSeek‑R1 RetrievalQA Chain
An Ollama LLM instance points to the 1.5B DeepSeek‑R1 model. A strict prompt template forces the model to answer only from the provided context.
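At query time the chain fills the template's {context} slot with the retrieved chunks and {question} with the user's query before the string reaches the model. The formatting step amounts to plain Python string substitution; the sample strings below are illustrative only:

```python
# What filling the prompt template looks like at query time: join the
# retrieved chunks into {context} and drop the user's query into {question}.

prompt = """
1. Use ONLY the context below.
2. If unsure, say "I don't know".
3. Keep answers under 4 sentences.
Context: {context}
Question: {question}
Answer:
"""

retrieved = ["DeepSeek-R1 runs locally via Ollama.",
             "The 1.5B variant is lightweight."]
filled = prompt.format(context="\n".join(retrieved),
                       question="Which variant is lightweight?")
print(filled)
```

The strict instructions travel with every request, which is what keeps the model anchored to the retrieved context.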
llm = Ollama(model="deepseek-r1:1.5b")
prompt = """
1. Use ONLY the context below.
2. If unsure, say "I don’t know".
3. Keep answers under 4 sentences.
Context: {context}
Question: {question}
Answer:
"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(prompt)
8. Build the RAG Pipeline
# Chain 1: generate answers
llm_chain = LLMChain(llm=llm, prompt=QA_CHAIN_PROMPT)
# Chain 2: format document chunks
document_prompt = PromptTemplate(
    template="Context:\ncontent:{page_content}\nsource:{source}",
    input_variables=["page_content", "source"]
)
# Final RetrievalQA pipeline
qa = RetrievalQA(
    combine_documents_chain=StuffDocumentsChain(
        llm_chain=llm_chain,
        document_prompt=document_prompt,
        document_variable_name="context"  # prompt slot that receives the chunks
    ),
    retriever=retriever
)
This pipeline retrieves the most relevant chunks, feeds them to the LLM, and returns a concise answer grounded in the PDF.
9. Launch the Streamlit User Interface
# Streamlit UI for querying
user_input = st.text_input("Ask your PDF a question:")
if user_input:
    with st.spinner("Thinking..."):
        response = qa(user_input)["result"]
    st.write(response)
Any query triggers the RAG chain, which returns an answer based on the uploaded PDF.
Conclusion
DeepSeek‑R1 provides a cost‑effective foundation for local RAG applications. The source code for the full example is available at: https://gist.github.com/lisakim0/0204d7504d17cefceaf2d37261c1b7d5
