How to Run DeepSeek R1 Locally and Build a RAG System with Ollama and LangChain
This guide walks you through installing Ollama, pulling the open‑source DeepSeek R1 model, and using LangChain and Streamlit to create a locally hosted Retrieval‑Augmented Generation (RAG) system that can answer questions from uploaded PDFs without any cloud API.
Core Concepts
Ollama is a framework for running large language models locally without cloud APIs. It enables downloading, running, and interacting with AI models offline.
LangChain is a Python/JS library that connects LLMs to external data sources, APIs, and memory, supporting applications such as chatbots, document processing, and retrieval‑augmented generation (RAG).
RAG retrieves external data (e.g., PDFs, databases) and injects it into LLM responses to improve accuracy and reduce hallucinations.
DeepSeek R1 is an open‑source model optimized for reasoning, problem‑solving, and factual retrieval. Its strong logical capabilities make it suitable for RAG, and it can be run locally via Ollama.
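The retrieve-then-augment loop behind RAG can be sketched in a few lines of plain Python. This is an illustrative toy (naive word-overlap scoring over a made-up document list), not the LangChain pipeline built later in this guide:

```python
# Toy RAG: score documents by word overlap with the query,
# then splice the best matches into the prompt sent to the LLM.
def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    q_words = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def augment_prompt(query: str, docs: list[str]) -> str:
    context = "\n".join(retrieve(query, docs))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Ollama runs large language models locally.",
    "FAISS is a library for vector similarity search.",
]
print(augment_prompt("What does Ollama do?", docs))
```

A real system replaces the word-overlap scorer with embedding similarity (FAISS in this guide), but the shape of the loop is the same: retrieve relevant text, inject it into the prompt, generate.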
Benefit Comparison (Local DeepSeek R1 vs Cloud Model)
Privacy: Local execution keeps data 100% on-device; cloud models send data to external servers.
Speed: Local inference avoids API and network round-trip latency, though actual throughput depends on your hardware; cloud models add network delay to every request.
Cost: After setup the local stack is free; cloud services charge per request.
Customization: Full model control locally versus limited fine‑tuning in the cloud.
Deployment: Offline local deployment versus cloud‑dependent operation.
Step‑by‑Step Setup
Step 1 – Install Ollama
Download the installer for macOS, Linux, or Windows from https://ollama.com/download.
Run the installer and follow the OS‑specific instructions.
Step 2 – Run DeepSeek R1 with Ollama
$ ollama serve
Starts the Ollama service.
$ ollama pull deepseek-r1:1.5b
Downloads the 1.5 B-parameter DeepSeek R1 model.
$ ollama run deepseek-r1:1.5b
Initializes the model and opens an interactive prompt for queries.
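Once `ollama serve` is running, the model can also be queried programmatically over Ollama's REST API (default port 11434) with nothing but the Python standard library. A minimal sketch:

```python
# Query the local DeepSeek R1 model through Ollama's REST API.
# Requires `ollama serve` to be running on the default port 11434.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "deepseek-r1:1.5b") -> dict:
    # stream=False makes Ollama return one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt: str) -> str:
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (with the server running):
#   print(ask("Explain RAG in one sentence."))
```

This is the same HTTP endpoint the LangChain `Ollama` integration talks to under the hood.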
Step 3 – Install Python Packages for RAG
$ pip install -U langchain langchain-community
$ pip install streamlit
$ pip install pdfplumber
$ pip install semantic-chunkers
$ pip install open-text-embeddings
$ pip install ollama
$ pip install prompt-template
$ pip install langchain_experimental
$ pip install sentence-transformers
$ pip install faiss-cpu
Step 4 – Create the RAG Project
$ mkdir rag-system && cd rag-system
$ touch app.py
Copy the app.py script from the repository https://github.com/lengrongfu/study-demo/blob/main/llm-study/rag-system/app.py into the file.
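The pipeline inside app.py can be approximated from the packages installed in Step 3. The sketch below is a hypothetical reconstruction, not the actual repository script; the heavy imports live inside the function so the small helper stays importable even without LangChain installed:

```python
def format_docs(docs) -> str:
    # Join retrieved chunk texts into one context string for the prompt
    return "\n\n".join(getattr(d, "page_content", str(d)) for d in docs)

def answer_from_pdf(pdf_path: str, question: str) -> str:
    # Imports kept local so format_docs works without these packages installed
    from langchain_community.document_loaders import PDFPlumberLoader
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.llms import Ollama
    from langchain_community.vectorstores import FAISS
    from langchain_experimental.text_splitter import SemanticChunker

    docs = PDFPlumberLoader(pdf_path).load()            # 1. extract PDF text
    embeddings = HuggingFaceEmbeddings()                # sentence-transformers model
    chunks = SemanticChunker(embeddings).split_documents(docs)  # 2. semantic chunking
    store = FAISS.from_documents(chunks, embeddings)    # 3. build vector index
    hits = store.similarity_search(question, k=3)       # 4. retrieve top chunks
    prompt = ("Answer the question using only this context:\n"
              f"{format_docs(hits)}\n\nQuestion: {question}")
    return Ollama(model="deepseek-r1:1.5b").invoke(prompt)  # 5. generate locally
```

The repository version wraps an equivalent flow in Streamlit widgets (a file uploader and a question box); consult the linked app.py for the authoritative implementation.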
Step 5 – Run the Streamlit Application
$ streamlit run app.py
Open http://localhost:8502/ in a browser, upload a PDF, and ask questions. The demo was verified on a macOS M3 machine; sufficient RAM is required.
Usage Flow
Ollama runs DeepSeek R1 locally.
LangChain connects the model to external data.
RAG retrieves relevant information to augment model responses.
References
[1] Ollama download: https://ollama.com/download
[2] app.py source: https://github.com/lengrongfu/study-demo/blob/main/llm-study/rag-system/app.py