How to Build a Local RAG Knowledge Base with DeepSeek‑R1 and Ollama

This article walks through setting up a local Retrieval‑Augmented Generation (RAG) system using the open‑source DeepSeek‑R1 model run via Ollama, covering installation, model selection, PDF ingestion with LangChain, semantic chunking, FAISS vector store creation, RetrievalQA chain construction, and a Streamlit UI for querying.


Introduction

This guide shows how to build a Retrieval‑Augmented Generation (RAG) system that answers questions from a PDF using the open‑source inference model DeepSeek‑R1 and the local model runtime Ollama.

Why DeepSeek‑R1?

Focused retrieval: each answer is generated from only the three most relevant document chunks.

Strict prompting: the model is instructed to reply "I don’t know" when uncertain, reducing hallucinations.

Local execution: everything runs locally, so there is no latency from (or data sent to) external APIs.

1. Install Ollama

Ollama provides a local server for running DeepSeek‑R1 and other models.
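
Ollama itself can be installed from ollama.com (installers for macOS and Windows); on Linux the project's documented one-line install script is typically used. Verify the command against the current instructions on the site:

curl -fsSL https://ollama.com/install.sh | sh  # installs the Ollama server and CLI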

ollama run deepseek-r1  # Runs the default 7B model

2. Choose a DeepSeek‑R1 Variant

DeepSeek‑R1 is available in sizes from 1.5B to 671B parameters. For a lightweight RAG demo, the 1.5B variant is used:

ollama run deepseek-r1:1.5b

Larger models (e.g., 70B) provide stronger reasoning but require more memory.
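
Optionally, the chosen variant can be downloaded ahead of time and the locally available models listed with the standard Ollama CLI commands:

ollama pull deepseek-r1:1.5b  # download the 1.5B variant without starting a chat
ollama list                   # show models available locally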

3. Import Required Libraries

Python packages used:

LangChain – document loading, chunking, embeddings, and vector stores.

Streamlit – web interface for user interaction.
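
The imports below also rely on several supporting packages (pdfplumber for PDF parsing, faiss-cpu for the vector index, sentence-transformers for HuggingFaceEmbeddings). A minimal install sketch, with package names assumed from the imports rather than stated in the original:

pip install streamlit langchain langchain-community langchain-experimental pdfplumber faiss-cpu sentence-transformers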

import streamlit as st
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, RetrievalQA, StuffDocumentsChain

4. Upload and Process PDF

Streamlit’s file uploader saves the uploaded PDF temporarily and loads its raw text with PDFPlumberLoader.

# Streamlit file uploader
uploaded_file = st.file_uploader("Upload a PDF file", type="pdf")
if uploaded_file:
    with open("temp.pdf", "wb") as f:
        f.write(uploaded_file.getvalue())
    loader = PDFPlumberLoader("temp.pdf")
    docs = loader.load()
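
To confirm the PDF loaded correctly, the returned documents can be inspected. PDFPlumberLoader yields one Document per page, each with page_content and metadata; a minimal sketch (these lines would sit inside the same if uploaded_file: block):

    st.write(f"Loaded {len(docs)} pages")      # one Document per PDF page
    st.write(docs[0].page_content[:200])       # preview the first page's text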

5. Semantic Chunking

Text is split into coherent semantic chunks using SemanticChunker backed by HuggingFaceEmbeddings. Instead of cutting at a fixed character count, the splitter places boundaries where the embedding similarity between adjacent sentences drops, so related sentences stay together and tables or figure captions are less likely to be broken apart, which improves retrieval quality.

# Split text into semantic chunks
text_splitter = SemanticChunker(HuggingFaceEmbeddings())
documents = text_splitter.split_documents(docs)
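
Chunk granularity can be tuned through the splitter's breakpoint settings; the values below are illustrative rather than taken from the original article:

# Optional: control how aggressively the text is split
text_splitter = SemanticChunker(
    HuggingFaceEmbeddings(),
    breakpoint_threshold_type="percentile",   # also: "standard_deviation", "interquartile"
    breakpoint_threshold_amount=90,           # lower percentile => more, smaller chunks
)
documents = text_splitter.split_documents(docs)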

6. Create a Knowledge Base (FAISS Index)

Embeddings are generated for each chunk and stored in a FAISS vector store. The retriever is configured to return the top‑3 most relevant chunks.

# Generate embeddings and build FAISS index
embeddings = HuggingFaceEmbeddings()
vector_store = FAISS.from_documents(documents, embeddings)
retriever = vector_store.as_retriever(search_kwargs={"k": 3})  # fetch top 3 chunks
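
The retriever can be sanity-checked with a throwaway query, and the index optionally persisted so large PDFs do not have to be re-embedded on every run (the query string and folder name are placeholders):

# Quick check: fetch the top-3 chunks for a sample question
for doc in retriever.invoke("What is this document about?"):
    print(doc.metadata.get("source"), doc.page_content[:80])

# Optional: persist the index and reload it later without re-embedding
vector_store.save_local("faiss_index")
# vector_store = FAISS.load_local("faiss_index", embeddings,
#                                 allow_dangerous_deserialization=True)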

7. Configure DeepSeek‑R1 RetrievalQA Chain

An Ollama LLM instance points to the 1.5B DeepSeek‑R1 model. A strict prompt template forces the model to answer only from the provided context.

llm = Ollama(model="deepseek-r1:1.5b")
prompt = """
1. Use ONLY the context below.
2. If unsure, say "I don’t know".
3. Keep answers under 4 sentences.
Context: {context}
Question: {question}
Answer:
"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(prompt)
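
Before assembling the full chain, the prompt and model can be exercised directly as a quick sanity check; the context and question here are made up for illustration:

# Sanity check: format the prompt with a dummy context and query the model directly
test_prompt = QA_CHAIN_PROMPT.format(
    context="DeepSeek-R1 is served locally through Ollama.",
    question="How is DeepSeek-R1 served?",
)
print(llm.invoke(test_prompt))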

8. Build the RAG Pipeline

# Chain 1: generate answers
llm_chain = LLMChain(llm=llm, prompt=QA_CHAIN_PROMPT)
# Chain 2: format document chunks
document_prompt = PromptTemplate(
    template="Context:\ncontent:{page_content}\nsource:{source}",
    input_variables=["page_content", "source"]
)
# Final RetrievalQA pipeline
qa = RetrievalQA(
    combine_documents_chain=StuffDocumentsChain(
        llm_chain=llm_chain,
        document_prompt=document_prompt,
        document_variable_name="context"  # matches {context} in the QA prompt
    ),
    retriever=retriever
)

This pipeline retrieves the most relevant chunks, feeds them to the LLM, and returns a concise answer grounded in the PDF.
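
Outside of Streamlit, the assembled chain can be exercised the same way the UI will call it; RetrievalQA takes its input under the "query" key and returns the answer under "result" (the question below is only an example):

# Ask the chain a question directly
answer = qa.invoke({"query": "What is the main topic of this PDF?"})["result"]
print(answer)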

9. Launch the Streamlit User Interface

# Streamlit UI for querying
user_input = st.text_input("Ask your PDF a question:")
if user_input:
    with st.spinner("Thinking..."):
        response = qa(user_input)["result"]
        st.write(response)

Any query triggers the RAG chain, which returns an answer based on the uploaded PDF.
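
Assuming the snippets above are collected into a single script (app.py is a placeholder name), the app is launched with Streamlit's standard run command:

streamlit run app.py  # serves the UI at http://localhost:8501 by default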

Conclusion

DeepSeek‑R1 provides a cost‑effective foundation for local RAG applications. The source code for the full example is available at: https://gist.github.com/lisakim0/0204d7504d17cefceaf2d37261c1b7d5

