Deploy DeepSeek‑R1 Locally with Ollama and Connect It to Webman AI
This guide walks you through installing Ollama, selecting the appropriate DeepSeek‑R1 model size based on GPU memory, running the model locally, and optionally integrating it with Webman AI for a richer user experience.
Overview
DeepSeek‑R1 is an open‑source large language model released under the MIT license, permitting unrestricted commercial use. Its benchmark results are comparable to leading proprietary models.
Why Deploy Locally?
Public DeepSeek endpoints have suffered intermittent outages due to external attacks. Running the model on your own GPU keeps access under your control, independent of the public service's availability.
Prerequisites
A computer equipped with a GPU. The amount of VRAM determines which model size can be run efficiently.
Install Ollama
Download the installer from https://ollama.com/download and follow the platform‑specific installation steps.
Select Model Version
Choose a model size that matches your GPU’s VRAM. Use the corresponding ollama run command to launch the model.
1.5B model – ~4 GB VRAM (e.g., GTX 1050) – ollama run deepseek-r1:1.5b
8B model – 8–10 GB VRAM (e.g., GTX 1660) – ollama run deepseek-r1:8b
14B model – ≥12 GB VRAM, 16 GB recommended (e.g., RTX 3060) – ollama run deepseek-r1:14b
32B model – ≥16 GB VRAM, 21 GB recommended (e.g., RTX 3060) – ollama run deepseek-r1:32b
70B model – ≥24 GB VRAM, 40 GB recommended (e.g., RTX 3090 / RTX 4090) – ollama run deepseek-r1:70b
671B model – ~1.34 TB VRAM (e.g., 16 × NVIDIA A100 80GB) – ollama run deepseek-r1:671b

If the GPU does not meet the VRAM requirement, Ollama falls back to mixed CPU-and-GPU execution, which works but is very slow (roughly 1–2 characters per second).
Run a Conversation
After Ollama is installed, start a chat session from the command line, substituting the version you installed:
ollama run deepseek-r1:14b

Optional: Integrate with Webman AI
If you have a Webman AI instance, you can register the locally‑deployed DeepSeek‑R1 model to obtain a web UI and improve user interaction.
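Under the hood, Webman AI (or any other client) reaches the local model through Ollama's HTTP API, which listens on http://localhost:11434 by default. A minimal sketch of one chat turn against the native /api/chat endpoint; the helper names are mine, while the endpoint, port, and JSON fields are Ollama's:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default listen address

def build_chat_request(model: str, user_message: str) -> dict:
    """Assemble the JSON body that Ollama's /api/chat endpoint expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,  # one complete JSON reply instead of a token stream
    }

def chat(model: str, user_message: str) -> str:
    """Send one message to the local model and return its reply text."""
    body = json.dumps(build_chat_request(model, user_message)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Example (needs the Ollama server running and the model already pulled):
#   print(chat("deepseek-r1:14b", "Hello!"))
```

Substitute whichever model tag you pulled earlier; if this call works from the machine hosting Webman AI, the integration below should too.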
Integration Steps
Open the Webman AI admin console.
Select “Add Model” and enter the external model name (e.g., deepseek-r1).
Create a role for the model.
Test the connection to verify that the model responds.