How to Run DeepSeek R1 Locally and Build a RAG System with Ollama and LangChain

This guide walks you through installing Ollama, pulling the open‑source DeepSeek R1 model, and using LangChain and Streamlit to create a locally hosted Retrieval‑Augmented Generation (RAG) system that can answer questions from uploaded PDFs without any cloud API.

Core Concepts

Ollama is a framework for running large language models locally without cloud APIs. It enables downloading, running, and interacting with AI models offline.

LangChain is a Python/JS library that connects LLMs to external data sources, APIs, and memory, supporting applications such as chatbots, document processing, and retrieval‑augmented generation (RAG).

RAG retrieves external data (e.g., PDFs, databases) and injects it into LLM responses to improve accuracy and reduce hallucinations.
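To make the retrieve-then-inject pattern concrete, here is a self-contained toy sketch in plain Python; it uses naive keyword overlap in place of real embeddings, and every name in it is illustrative rather than taken from the article's code:

def score(question: str, chunk: str) -> int:
    # Crude relevance proxy: count words shared between question and chunk.
    # Real RAG systems use embedding similarity instead.
    return len(set(question.lower().split()) & set(chunk.lower().split()))

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Return the k chunks that score highest against the question.
    return sorted(chunks, key=lambda c: score(question, c), reverse=True)[:k]

def build_prompt(question: str, context: list[str]) -> str:
    # The core RAG step: inject retrieved context ahead of the question.
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

chunks = [
    "Ollama runs large language models locally.",
    "FAISS performs fast vector similarity search.",
    "Streamlit builds simple web UIs in Python.",
]
question = "How do I run models locally?"
print(build_prompt(question, retrieve(question, chunks)))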

DeepSeek R1 is an open‑source model optimized for reasoning, problem‑solving, and factual retrieval. Its strong logical capabilities make it suitable for RAG, and it can be run locally via Ollama.

Benefit Comparison (Local DeepSeek R1 vs Cloud Model)

Privacy: Local execution keeps data 100% on-device; cloud models send data to external servers.

Speed: Local inference avoids API and network latency, though throughput depends on your hardware; cloud models add a network round trip to every request.

Cost: After setup, the local stack costs nothing beyond hardware and electricity; cloud services charge per request or per token.

Customization: Full model control locally versus limited fine‑tuning in the cloud.

Deployment: Offline local deployment versus cloud‑dependent operation.

Step‑by‑Step Setup

Step 1 – Install Ollama

Download the installer for macOS, Linux, or Windows from https://ollama.com/download.

Run the installer and follow the OS‑specific instructions.

Step 2 – Run DeepSeek R1 with Ollama

$ ollama serve

Starts the Ollama service.

$ ollama pull deepseek-r1:1.5b

Downloads the 1.5B-parameter DeepSeek R1 model.

$ ollama run deepseek-r1:1.5b

Initializes the model and opens an interactive prompt for queries.
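Once the model is pulled, you can sanity-check it from Python with the ollama client package (installed in Step 3). This is a quick verification snippet, not part of the article's app:

import ollama  # Python client for the local Ollama service

# Send a single test message to the locally served DeepSeek R1 model.
response = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[{"role": "user", "content": "In one sentence, what is RAG?"}],
)
print(response["message"]["content"])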

Step 3 – Install Python Packages for RAG

$ pip install -U langchain langchain-community
$ pip install streamlit
$ pip install pdfplumber
$ pip install semantic-chunkers
$ pip install open-text-embeddings
$ pip install ollama
$ pip install prompt-template
$ pip install langchain_experimental
$ pip install sentence-transformers
$ pip install faiss-cpu

Note: faiss-cpu provides the FAISS vector index; do not pip install faiss, which is an unrelated, unofficial PyPI package.
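Before moving on, you can optionally confirm that the key packages import cleanly; this smoke test is a convenience addition, not from the original article:

# Smoke test: these imports cover the loader, chunker, embeddings,
# vector store, and LLM wrapper used by the RAG app.
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS
from langchain_experimental.text_splitter import SemanticChunker

print("All RAG dependencies imported successfully.")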

Step 4 – Create the RAG Project

$ mkdir rag-system && cd rag-system
$ touch app.py

Copy the app.py script from the repository https://github.com/lengrongfu/study-demo/blob/main/llm-study/rag-system/app.py into the file.
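If you prefer to write the file yourself, the following is a minimal sketch of what such an app.py can look like, assuming the stack from Step 3 (PDFPlumber loader, SemanticChunker, HuggingFace embeddings, FAISS, and the Ollama LLM wrapper); the repository version may differ in its details:

import tempfile

import streamlit as st
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS
from langchain_experimental.text_splitter import SemanticChunker

st.title("Local RAG with DeepSeek R1")

uploaded = st.file_uploader("Upload a PDF", type="pdf")
if uploaded:
    # Streamlit gives an in-memory file; the loader expects a path on disk.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(uploaded.read())
        pdf_path = tmp.name

    # Extract text from the PDF, one Document per page.
    docs = PDFPlumberLoader(pdf_path).load()

    # Split pages into semantically coherent chunks using embeddings.
    embeddings = HuggingFaceEmbeddings()  # sentence-transformers under the hood
    chunks = SemanticChunker(embeddings).split_documents(docs)

    # Index the chunks in a local FAISS vector store for similarity search.
    retriever = FAISS.from_documents(chunks, embeddings).as_retriever()

    # Local DeepSeek R1, served by Ollama, answers over the retrieved chunks.
    llm = Ollama(model="deepseek-r1:1.5b")
    qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)

    question = st.text_input("Ask a question about the PDF")
    if question:
        result = qa.invoke({"query": question})
        st.write(result["result"])

The sketch uses LangChain's default "stuff" chain, which simply concatenates the retrieved chunks into the prompt; for very long documents a map-reduce chain may work better.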

Step 5 – Run the Streamlit Application

$ streamlit run app.py

Open the local URL Streamlit prints in the terminal (http://localhost:8501 by default; Streamlit picks the next free port, such as 8502, if 8501 is taken) in a browser, upload a PDF, and ask questions. The demo was verified on a macOS M3 machine; make sure the machine has enough free RAM for the model, embeddings, and index.

Demo screenshot

Usage Flow

Ollama runs DeepSeek R1 locally.

LangChain connects the model to external data.

RAG retrieves relevant information to augment model responses.

References

[1] Ollama download: https://ollama.com/download

[2] app.py source: https://github.com/lengrongfu/study-demo/blob/main/llm-study/rag-system/app.py
