Deploying DeepSeek‑R1 Locally with Ollama and Accessing It via Spring Boot and Spring AI
This guide explains how to install Ollama, download and run the open‑source DeepSeek‑R1 language model locally, configure GPU acceleration, and integrate the model into a Spring Boot application using Spring AI to provide an API service for AI inference.
With the rapid development of large language models, developers are increasingly interested in running these powerful inference models locally. DeepSeek‑R1 is an open‑source AI model that offers high performance at low cost and supports local deployment, making it attractive for privacy‑preserving applications.
1. DeepSeek‑R1 Overview
DeepSeek‑R1 delivers inference capabilities comparable to leading models such as GPT‑4, and its open‑source nature allows customization and optimization for specific use cases. The model can be run locally via Ollama, reducing reliance on cloud services.
2. Ollama – the Local Runtime
Ollama simplifies the deployment of large language models on personal computers and supports Windows, macOS, and Linux. It provides a straightforward command‑line interface for downloading and running models.
3. Environment Setup
Step 1: Install Ollama
Download the appropriate installer from the Ollama website and run it. On Linux you can install with:
curl -fsSL https://ollama.com/install.sh | sh
Step 2: Verify Installation
Check the version to ensure a successful install:
ollama --version
Step 3: Download DeepSeek‑R1
Run the following command to pull the model (ollama run downloads the model automatically on first use; download time depends on network speed):
ollama run deepseek-r1
Step 4: Run DeepSeek‑R1
Start the model with the same command; Ollama will launch the inference service locally.
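Before wiring up Spring, it can help to sanity-check the local service over Ollama's HTTP API, which listens on port 11434 by default. The sketch below builds a request against the /api/generate endpoint; the --send flag is our own convention so the program can also run without a live server, and the model tag and prompt are illustrative:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaSmokeTest {

    // Builds a request against Ollama's /api/generate endpoint.
    // "stream": false asks for a single JSON object instead of a token stream.
    static HttpRequest buildGenerateRequest(String model, String prompt) {
        String body = String.format(
                "{\"model\": \"%s\", \"prompt\": \"%s\", \"stream\": false}",
                model, prompt);
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildGenerateRequest("deepseek-r1:1.5b", "Why is the sky blue?");
        System.out.println(request.uri());
        // Pass --send to actually contact the locally running Ollama instance.
        if (args.length > 0 && "--send".equals(args[0])) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }
}
```

If the model is running, the response body contains the generated text plus timing metadata.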
4. GPU Acceleration
For faster inference you can enable GPU support.
4.1 NVIDIA GPU
Specify the visible devices:
export CUDA_VISIBLE_DEVICES=0,1,2,3
Or for a single GPU:
export CUDA_VISIBLE_DEVICES=0
4.2 AMD GPU
Set the HIP environment variable:
export HIP_VISIBLE_DEVICES=0
4.3 Hardware Requirements
| Model Name | Size | Run Command | Hardware Config |
| --- | --- | --- | --- |
| DeepSeek‑R1 | 671B | ollama run deepseek-r1:671b | Very high requirements, >336 GB VRAM |
| DeepSeek‑R1‑Distill‑Qwen‑1.5B | 1.5B | ollama run deepseek-r1:1.5b | Minimum 8 GB RAM, no GPU needed |
| DeepSeek‑R1‑Distill‑Qwen‑7B | 7B | ollama run deepseek-r1:7b | 16 GB RAM, 8 GB VRAM (GPU) |
| DeepSeek‑R1‑Distill‑Llama‑8B | 8B | ollama run deepseek-r1:8b | 16 GB RAM, 8 GB VRAM (GPU) |
| DeepSeek‑R1‑Distill‑Qwen‑14B | 14B | ollama run deepseek-r1:14b | 32 GB RAM, 26 GB VRAM (GPU) |
| DeepSeek‑R1‑Distill‑Qwen‑32B | 32B | ollama run deepseek-r1:32b | 64 GB RAM, 64 GB VRAM (GPU) |
| DeepSeek‑R1‑Distill‑Llama‑70B | 70B | ollama run deepseek-r1:70b | 128 GB RAM, 140 GB VRAM (GPU) |
5. Integrating with Spring Boot and Spring AI
After Ollama and the model are ready, you can call the model from a Spring Boot application.
5.1 Create a Spring Boot project using Spring Initializr and add the spring-ai-ollama-spring-boot-starter dependency:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>
5.2 Configure Ollama in application.properties:
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=deepseek-r1:1.5b
5.3 Write Java code to call the model. Example test class:
import org.junit.jupiter.api.Test;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest(classes = DemoApplication.class)
public class TestOllama {

    @Autowired
    private OllamaChatModel ollamaChatModel;

    @Test
    public void testChatModel() {
        // "请将以下英文翻译成中文:" means
        // "Please translate the following English into Chinese:"
        String prompt = "请将以下英文翻译成中文:";
        String message = "Ollama now supports tool calling with popular models such as Llama 3.1.";
        String result = ollamaChatModel.call(prompt + " " + message);
        System.out.println(result);
    }
}
5.4 Test result – the API returns a JSON payload, for example:
{
"response": "Ollama现在支持使用如Llama 3.1等流行模型进行工具调用。",
"error": null
}
6. Conclusion
By combining Ollama with Spring Boot, developers can quickly set up a local AI inference service based on DeepSeek‑R1, benefit from GPU acceleration when needed, and expose the model through a standard REST API, enabling efficient and privacy‑preserving AI applications.
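The REST exposure mentioned in the conclusion is typically a thin controller wrapping the auto-configured OllamaChatModel bean. A minimal sketch (the ChatController class name and /ai/chat path are our own, not from the article; it assumes the Spring Web and Spring AI Ollama starters are on the classpath):

```java
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/ai")
public class ChatController {

    private final OllamaChatModel ollamaChatModel;

    // The starter's auto-configuration provides the OllamaChatModel bean.
    public ChatController(OllamaChatModel ollamaChatModel) {
        this.ollamaChatModel = ollamaChatModel;
    }

    @GetMapping("/chat")
    public String chat(@RequestParam("message") String message) {
        // Forwards the user message to the locally running DeepSeek-R1 model.
        return ollamaChatModel.call(message);
    }
}
```

With the application running, GET http://localhost:8080/ai/chat?message=... returns the model's reply.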