Deploying DeepSeek‑R1 Locally with Ollama and Accessing It via Spring Boot and Spring AI

This guide explains how to install Ollama, download and run the open‑source DeepSeek‑R1 language model locally, configure GPU acceleration, and integrate the model into a Spring Boot application using Spring AI to provide an API service for AI inference.


With the rapid development of large language models, developers are increasingly interested in running these powerful inference models locally. DeepSeek‑R1 is an open‑source AI model that offers high performance at low cost and supports local deployment, making it attractive for privacy‑preserving applications.

1. DeepSeek‑R1 Overview

DeepSeek‑R1 delivers inference capabilities comparable to leading models such as GPT‑4, and its open‑source nature allows customization and optimization for specific use cases. The model can be run locally via Ollama, reducing reliance on cloud services.

2. Ollama – the Local Runtime

Ollama simplifies the deployment of large language models on personal computers and supports Windows, macOS, and Linux. It provides a straightforward command‑line interface for downloading and running models.

3. Environment Setup

Step 1: Install Ollama

Download the appropriate installer from the Ollama website and run it. On Linux you can install with:

curl -fsSL https://ollama.com/install.sh | sh

Step 2: Verify Installation

Check the version to ensure a successful install:

ollama --version

Step 3: Download DeepSeek‑R1

Pull the model with the following command (download time depends on network speed and the model size you choose):

ollama pull deepseek-r1

Step 4: Run DeepSeek‑R1

Start an interactive session with the model; Ollama launches the inference service locally, and on first use ollama run also downloads the model automatically:

ollama run deepseek-r1
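Before wiring up Spring, it can help to confirm that the Ollama HTTP server is actually listening (it serves on port 11434 by default). A minimal probe, assuming the default port:

```shell
# Probe the local Ollama server (default port 11434).
# --max-time keeps the check fast when nothing is listening.
if curl -fsS --max-time 2 http://localhost:11434/api/version >/dev/null 2>&1; then
  STATUS=running
else
  STATUS=unreachable
fi
echo "ollama server: $STATUS"
```

Once the server is up, you can also exercise the model directly over HTTP, e.g. curl http://localhost:11434/api/generate -d '{"model": "deepseek-r1", "prompt": "Hello"}'.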

4. GPU Acceleration

For faster inference, Ollama uses a supported GPU automatically; the environment variables below control which devices are visible to it. Set them before starting the Ollama service.

4.1 NVIDIA GPU

Specify the visible devices:

export CUDA_VISIBLE_DEVICES=0,1,2,3

Or for a single GPU:

export CUDA_VISIBLE_DEVICES=0

4.2 AMD GPU

Set the HIP environment variable:

export HIP_VISIBLE_DEVICES=0

4.3 Hardware Requirements

| Model Name | Size | Run Command | Hardware Config |
|---|---|---|---|
| DeepSeek‑R1 | 671B | ollama run deepseek-r1:671b | Very high requirements, >336 GB VRAM |
| DeepSeek‑R1‑Distill‑Qwen‑1.5B | 1.5B | ollama run deepseek-r1:1.5b | Minimum 8 GB RAM, no GPU needed |
| DeepSeek‑R1‑Distill‑Qwen‑7B | 7B | ollama run deepseek-r1:7b | 16 GB RAM, 8 GB VRAM (GPU) |
| DeepSeek‑R1‑Distill‑Llama‑8B | 8B | ollama run deepseek-r1:8b | 16 GB RAM, 8 GB VRAM (GPU) |
| DeepSeek‑R1‑Distill‑Qwen‑14B | 14B | ollama run deepseek-r1:14b | 32 GB RAM, 26 GB VRAM (GPU) |
| DeepSeek‑R1‑Distill‑Qwen‑32B | 32B | ollama run deepseek-r1:32b | 64 GB RAM, 64 GB VRAM (GPU) |
| DeepSeek‑R1‑Distill‑Llama‑70B | 70B | ollama run deepseek-r1:70b | 128 GB RAM, 140 GB VRAM (GPU) |

5. Integrating with Spring Boot and Spring AI

After Ollama and the model are ready, you can call the model from a Spring Boot application.

5.1 Create a Spring Boot project using Spring Initializr and add the spring-ai-ollama-spring-boot-starter dependency:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>

5.2 Configure the Ollama endpoint and model in application.properties:

spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.model=deepseek-r1:1.5b
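Beyond the model name, Spring AI exposes the common Ollama generation options as properties under the spring.ai.ollama.chat.options.* prefix; the value below is illustrative:

```properties
# Sampling temperature (lower values give more deterministic output)
spring.ai.ollama.chat.options.temperature=0.7
```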

5.3 Write Java code to call the model. Example test class:

import org.junit.jupiter.api.Test;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest(classes = DemoApplication.class)
public class TestOllama {

    @Autowired
    private OllamaChatModel ollamaChatModel;

    @Test
    public void testChatModel() {
        // Prompt (in Chinese): "Translate the following English into Chinese:"
        String prompt = "请将以下英文翻译成中文:";
        String message = "Ollama now supports tool calling with popular models such as Llama 3.1.";
        // call(String) sends the text to the configured model and returns its reply
        String result = ollamaChatModel.call(prompt + " " + message);
        System.out.println(result);
    }
}
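One practical note: DeepSeek‑R1 models typically emit their chain of thought wrapped in <think>…</think> tags before the final answer. If you only want the answer text, you can strip that block from the reply before returning it. A minimal sketch (the class and method names are illustrative):

```java
import java.util.regex.Pattern;

public class ThinkTagStripper {

    // Matches a <think>...</think> block, including newlines inside it.
    private static final Pattern THINK_BLOCK =
            Pattern.compile("<think>.*?</think>\\s*", Pattern.DOTALL);

    // Removes the reasoning block and trims the remaining answer text.
    public static String strip(String modelOutput) {
        return THINK_BLOCK.matcher(modelOutput).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String raw = "<think>The user wants a translation...</think>\nFinal answer here.";
        System.out.println(strip(raw));
    }
}
```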

5.4 Test result – running the test prints the model's reply to the console; exposed through a REST endpoint, the same result can be returned as a JSON payload, for example:

{
  "response": "Ollama现在支持使用如Llama 3.1等流行模型进行工具调用。",
  "error": null
}
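The test class above only prints to the console; to expose the model as an API service, you can inject the same OllamaChatModel bean into a controller. A minimal sketch, assuming the auto-configuration above (the class name and /chat route are illustrative):

```java
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final OllamaChatModel ollamaChatModel;

    // Constructor injection of the auto-configured Ollama chat model
    public ChatController(OllamaChatModel ollamaChatModel) {
        this.ollamaChatModel = ollamaChatModel;
    }

    // POST /chat with a plain-text body; returns the model's reply as text
    @PostMapping("/chat")
    public String chat(@RequestBody String message) {
        return ollamaChatModel.call(message);
    }
}
```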

6. Conclusion

By combining Ollama with Spring Boot, developers can quickly set up a local AI inference service based on DeepSeek‑R1, benefit from GPU acceleration when needed, and expose the model through a standard REST API, enabling efficient and privacy‑preserving AI applications.

Tags: GPU Acceleration, Spring Boot, DeepSeek-R1, AI model deployment, Ollama
Written by Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
