How to Deploy DeepSeek R1 Locally: A Step‑by‑Step Guide for AI Enthusiasts
This guide explains what DeepSeek R1 is, compares its full and distilled versions, details hardware requirements for Linux, Windows, and macOS, and provides step‑by‑step instructions for local deployment using Ollama, LM Studio, Docker, and visual interfaces like Open‑WebUI and Dify.
1. What is DeepSeek R1
DeepSeek‑R1, released on 2025‑01‑20, is DeepSeek AI's first-generation reasoning model, designed for complex reasoning tasks such as mathematics, code generation, and logic. It is offered as a full version (671 B parameters) and as distilled versions ranging from 1.5 B to 70 B parameters.
The full version requires at least 350 GB of VRAM or system memory; the distilled versions run on modest consumer hardware.
Full version (671 B): requires ≥350 GB VRAM/memory, suitable for professional servers.
Distilled versions: fine‑tuned from open‑source models (Qwen, Llama) with 1.5 B–70 B parameters, suitable for local deployment.
2. Model Variants and Hardware Requirements
Linux
Minimum: NVIDIA GTX 1660 6 GB or AMD RX 5500 4 GB, 16 GB RAM, 50 GB storage.
Recommended: NVIDIA RTX 3060 12 GB or AMD RX 6700 10 GB, 32 GB RAM, 100 GB NVMe SSD.
High‑Performance: NVIDIA A100 40 GB or AMD MI250X 128 GB, 128 GB RAM, 200 GB NVMe SSD.
Windows
Minimum: NVIDIA GTX 1650 4 GB or AMD RX 5500 4 GB, 16 GB RAM, 50 GB storage.
Recommended: NVIDIA RTX 3060 12 GB or AMD RX 6700 10 GB, 32 GB RAM, 100 GB NVMe SSD.
High‑Performance: NVIDIA RTX 3090 24 GB or AMD RX 7900 XTX 24 GB, 64 GB RAM, 200 GB NVMe SSD.
Mac
Minimum: M2 MacBook Air (8 GB RAM).
Recommended: M2/M3 MacBook Pro (16 GB RAM).
High‑Performance: M2 Max/Ultra Mac Studio (64 GB RAM).
3. Local Installation of DeepSeek R1
Example environment: M2/M3/M4 MacBook Pro with 16 GB+ RAM, using the model deepseek-r1:8b. Local deployment offers the following advantages:
Privacy: Data stays on the local device.
Offline use: No internet required after download.
Cost‑effective: No API fees.
Low latency: Direct access eliminates network delay.
Customizable: Full control over model parameters.
3.1 Deployment Tools
Supported tools include Ollama, LM Studio, and Docker.
Ollama
Available on Windows, Linux, and macOS. Command to run a 7 B model:
ollama run deepseek-r1:7b
LM Studio
Desktop application for Windows and macOS with a visual interface, supports CPU+GPU hybrid inference.
Docker
Run the container with GPU support:
docker run -d --gpus=all -p 11434:11434 --name ollama ollama/ollama
For low‑end hardware, LM Studio can be used as an alternative.
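Whether Ollama runs natively or inside the container above, it serves a REST API on port 11434. The sketch below calls the /api/generate endpoint from Python; the payload fields follow Ollama's documented API, and the model name and prompt are illustrative.

```python
import json
import urllib.request

# Default Ollama endpoint; also the port mapped by the docker run command above.
OLLAMA_URL = "http://localhost:11434"

def build_generate_request(model: str, prompt: str) -> bytes:
    """Encode a non-streaming payload for Ollama's /api/generate route."""
    return json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")

def extract_answer(raw: bytes) -> str:
    """Pull the generated text out of Ollama's JSON response."""
    return json.loads(raw)["response"]

def ask(model: str, prompt: str) -> str:
    """Send one prompt to a locally running Ollama and return the answer."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=build_generate_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_answer(resp.read())

# Usage (requires a running Ollama with the model already pulled):
#   print(ask("deepseek-r1:7b", "Why is the sky blue? One sentence."))
```

Because the request is plain HTTP on localhost, the same function works against either the native install or the Docker container.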
4. Visual Interfaces
Two self‑hosted UI options are described.
Open‑WebUI
Provides a web UI for interacting with LLMs without an API. Install via Docker:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Access http://localhost:3000/ and create an account.
Dify
An LLM application platform that supports API integration, RAG, AI agents, and various databases. After starting Dify, add Ollama as the model provider (e.g., http://host.docker.internal:11434).
Once configured, you can select the locally installed deepseek-r1:8b model and interact with it.
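To confirm which models Dify can see before configuring the provider, you can query Ollama's /api/tags endpoint, which lists locally installed model tags. A small Python sketch, assuming the Docker networking setup above:

```python
import json
import urllib.request

def model_names(raw: bytes) -> list[str]:
    """Extract model tags from Ollama's /api/tags JSON response."""
    return [m["name"] for m in json.loads(raw)["models"]]

def list_local_models(host: str = "http://host.docker.internal:11434") -> list[str]:
    """Return the model tags a running Ollama instance exposes."""
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        return model_names(resp.read())

# Usage (with Ollama running; from the host, use http://localhost:11434):
#   assert "deepseek-r1:8b" in list_local_models("http://localhost:11434")
```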
5. Usage Experience
The distilled version works well for reasoning and text generation, though code generation may be less reliable. For full‑scale performance, the official DeepSeek API remains an affordable option, usable via VS Code plugins such as Continue.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.