Create a Java RAG System Using DeepSeek R1, Milvus, and Spring
This guide walks through building a Java RAG system with DeepSeek R1, Milvus, and Spring, covering environment setup, vector model integration via OpenAI protocol, Maven dependencies, data embedding, and a chat endpoint that combines semantic retrieval with LLM generation.
DeepSeek R1 is popular for its chain‑of‑thought capabilities, but mainstream frameworks such as Spring AI lack full support for it, especially for retaining chain‑of‑thought content and for streaming output. deepseek4j 1.4 adds vector model support.
Background
deepseek4j provides a set of powerful APIs covering Reasoner, Function Calling, and JSON parsing, simplifying DeepSeek integration for developers.
DeepSeek does not provide a vector model, so the initial design of this tool did not consider vector search integration.
Current Situation
deepseek4j fully supports DeepSeek's Reasoner, Function Calling, and JSON parsing features.
Demand for private knowledge bases on the R1 model is growing, with many developers wanting to build private knowledge stores on top of DeepSeek.
After thorough technical evaluation, we chose an elegant solution: integrate vector model capabilities by conforming to the OpenAI protocol standard. This approach offers three advantages:
Zero extra dependencies: No new libraries are required, keeping the framework lightweight.
Perfect compatibility: Seamlessly fits the existing architecture and ensures backward compatibility.
Standardized access: Uses the industry‑wide OpenAI protocol, reducing the learning curve.
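Conforming to the OpenAI protocol means the embedding client simply POSTs a small JSON body to the `/embeddings` path under the configured base URL, which OpenAI-compatible servers such as Ollama accept. The sketch below builds that request body by hand to show the wire format; the field names come from the OpenAI embeddings API, and the model name matches this article's setup (the class and method are illustrative, not part of deepseek4j):

```java
// Illustrative sketch of the OpenAI-style embeddings request body
// that an OpenAI-compatible backend (e.g. Ollama at /v1) understands.
public class EmbeddingRequestSketch {

    // Build the JSON body for POST {base-url}/embeddings.
    // Per the OpenAI protocol, only "model" and "input" are required.
    static String embeddingsBody(String model, String input) {
        return "{\"model\":\"" + model + "\",\"input\":\""
                + input.replace("\"", "\\\"") + "\"}";
    }

    public static void main(String[] args) {
        // The server responds with a JSON object containing the vector
        // under data[0].embedding; deepseek4j's EmbeddingClient handles
        // both sides of this exchange for you.
        System.out.println(embeddingsBody("bge-m3:latest", "contract law"));
    }
}
```

Because the format is standard, swapping Ollama for any other OpenAI-compatible embedding provider is only a configuration change.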
Quick Start
This article leads you from scratch to build a basic RAG system. By coding in a white‑box manner, you can deeply understand RAG core principles and flexibly adjust each component to suit real‑world needs.
1. Environment Preparation
Before constructing the RAG system, prepare the following environment:
1.1 Ollama Model Preparation
Install Ollama and download the required models:
<code># Download the inference model – understands questions and generates answers
ollama run deepseek-r1:14b

# Download the embedding model – used for text vectorization
# (pull rather than run: embedding models have no interactive chat mode)
ollama pull bge-m3:latest
</code>
1.2 Vector Database Preparation
We use Milvus as the vector database. Choose one of the two installation methods:
Use a hosted Milvus test environment via Zilliz Cloud (https://cloud.zilliz.com.cn) and obtain the connection endpoint and token.
Docker installation:
<code># 1. Download the install script
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh

# 2. Start the Docker container
bash standalone_embed.sh start
</code>
Note: If you choose the Docker method, ensure your network can access GitHub.
1.3 Project Dependencies
Add the following Maven dependencies to your project:
<code><!-- deepseek4j starter -->
<dependency>
    <groupId>io.github.pig-mesh.ai</groupId>
    <artifactId>deepseek-spring-boot-starter</artifactId>
    <version>1.4.0</version>
</dependency>

<!-- Milvus SDK -->
<dependency>
    <groupId>io.milvus</groupId>
    <artifactId>milvus-sdk-java</artifactId>
    <version>2.5.3</version>
</dependency>
</code>
application.yml Configuration
<code># Inference model configuration
deepseek:
  base-url: http://127.0.0.1:11434/v1
  model: deepseek-r1:14b
  api-key: ollama-local

# Embedding model configuration
embedding:
  api-key: ${deepseek.api-key}
  base-url: ${deepseek.base-url}
  model: bge-m3:latest
</code>
2. Initialize the Private Knowledge Base
The first step of building a RAG system is converting existing knowledge into vectors and storing them in the vector database.
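Before vectorization, long documents are usually split into fixed-size chunks (the example in 2.2 uses Hutool's `StrUtil.split` with 400-character chunks). A common refinement, not used in the original example, is to overlap adjacent chunks so a sentence cut at a boundary still appears whole in at least one chunk. A minimal sketch of that idea:

```java
import java.util.ArrayList;
import java.util.List;

// Fixed-size chunking with overlap: each chunk starts
// (chunkSize - overlap) characters after the previous one, so the last
// `overlap` characters of one chunk repeat at the start of the next.
public class Chunker {

    static List<String> split(String text, int chunkSize, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // reached the end of the text
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 4-character chunks with 1 character of overlap
        System.out.println(split("abcdefghij", 4, 1)); // [abcd, defg, ghij]
    }
}
```

Chunk size and overlap are tuning knobs: smaller chunks retrieve more precisely but lose context, larger ones keep context but dilute relevance.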
2.1 Create Connection Client
<code>// Connect to the Milvus server
ConnectConfig connectConfig = ConnectConfig.builder()
        .uri(CLUSTER_ENDPOINT)  // Milvus endpoint obtained earlier
        .token(TOKEN)           // Milvus token
        .build();

MilvusClientV2 milvusClientV2 = new MilvusClientV2(connectConfig);
</code>
2.2 Prepare Data and Upload Vectors
The example below processes plain‑text data. For Office documents, images, PDFs, audio, or video, deepseek4j offers a complete solution (see the office2md project). Note that the `deepseek4j_test` collection used below must already exist in Milvus, with a vector field whose dimension matches the embedding model (1024 for bge-m3).
<code>@Autowired
EmbeddingClient embeddingClient;

{
    // Read the source document and split it into 400-character chunks
    String law = FileUtil.readString("/Users/lengleng/Downloads/law.txt", Charset.defaultCharset());
    String[] lawSplits = StrUtil.split(law, 400);

    List<JsonObject> data = new ArrayList<>();
    for (String lawSplit : lawSplits) {
        // Vectorize each chunk with the embedding model
        List<Float> floatList = embeddingClient.embed(lawSplit);
        JsonObject jsonObject = new JsonObject();
        JsonArray jsonArray = new JsonArray();
        for (Float value : floatList) {
            jsonArray.add(value);
        }
        jsonObject.add("vector", jsonArray);       // the embedding vector
        jsonObject.addProperty("text", lawSplit);  // the original chunk text
        data.add(jsonObject);
    }

    // Insert vectors plus original text into the collection
    InsertReq insertReq = InsertReq.builder()
            .collectionName("deepseek4j_test")
            .data(data)
            .build();
    milvusClientV2.insert(insertReq);
}
</code>
3. Create the RAG Endpoint
<code>@GetMapping(value = "/chat", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ChatCompletionResponse> chat(String prompt) {
    MilvusClientV2 milvusClientV2 = new MilvusClientV2(connectConfig);

    // 1. Vectorize the user's question
    List<Float> floatList = embeddingClient.embed(prompt);

    // 2. Retrieve the 3 most similar chunks from Milvus
    SearchReq searchReq = SearchReq.builder()
            .collectionName("deepseek4j_test")
            .data(Collections.singletonList(new FloatVec(floatList)))
            .outputFields(Collections.singletonList("text"))
            .topK(3)
            .build();
    SearchResp searchResp = milvusClientV2.search(searchReq);

    List<String> resultList = new ArrayList<>();
    for (List<SearchResp.SearchResult> results : searchResp.getSearchResults()) {
        for (SearchResp.SearchResult result : results) {
            resultList.add(result.getEntity().get("text").toString());
        }
    }

    // 3. Combine the question and retrieved context, then stream the answer
    ChatCompletionRequest request = ChatCompletionRequest.builder()
            .model("deepseek-r1:14b")
            .addUserMessage(String.format(
                    "Answer the user's question: %s \n\n Refer to the following content: %s \n\n Produce the final answer",
                    prompt, resultList))
            .build();
    return deepSeekClient.chatFluxCompletion(request);
}
</code>
Frontend Test
Conclusion
This article built a basic RAG system through the following core steps:
Environment preparation: deploy inference and vector models.
Knowledge base construction: vectorize and store data.
Retrieval augmentation: fetch relevant knowledge via semantic search.
Inference generation: combine context with LLM to produce final answers.
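The semantic search step above amounts to comparing embedding vectors by a similarity metric. Milvus does this at scale over an index; the toy sketch below only illustrates the cosine metric itself (it is not how Milvus is invoked):

```java
// Cosine similarity between two equal-length vectors:
// dot(a, b) / (|a| * |b|). A score near 1 means the texts behind the
// embeddings are semantically close; near 0 means unrelated.
public class CosineSimilarity {

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];   // dot product
            na += a[i] * a[i];    // squared norm of a
            nb += b[i] * b[i];    // squared norm of b
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        System.out.println(cosine(new float[]{1, 0}, new float[]{1, 0})); // 1.0
        System.out.println(cosine(new float[]{1, 0}, new float[]{0, 1})); // 0.0
    }
}
```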
To make the RAG system production‑ready, each stage requires further optimization:
Retrieval strategy: combine keyword and semantic search for higher recall.
Re‑ranking: secondary sorting to ensure the most relevant results appear first.
Prompt engineering: refine prompt templates for more accurate model outputs.
Knowledge base management: regularly update and maintain data freshness.
Performance tuning: optimize vector search and model inference latency.
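For the retrieval-strategy point, one simple and widely used way to combine a keyword-search ranking with a semantic-search ranking is reciprocal rank fusion (RRF). A minimal sketch, assuming the two rankings are already available as ordered lists of document IDs (the constant k = 60 is the commonly used default):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Reciprocal rank fusion: each document scores 1 / (k + rank) in every
// ranking it appears in, and the summed scores produce the fused order.
public class RrfFusion {

    static List<String> fuse(List<String> keywordRanked, List<String> semanticRanked, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : List.of(keywordRanked, semanticRanked)) {
            for (int rank = 0; rank < ranking.size(); rank++) {
                // rank is 0-based here, so the denominator is k + rank + 1
                scores.merge(ranking.get(rank), 1.0 / (k + rank + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        fused.sort((x, y) -> Double.compare(scores.get(y), scores.get(x))); // descending
        return fused;
    }

    public static void main(String[] args) {
        // "b" and "c" appear in both rankings, so they rise to the top
        System.out.println(fuse(List.of("a", "b", "c"), List.of("b", "c", "d"), 60));
    }
}
```

Documents found by both retrievers bubble up even when neither ranks them first, which is exactly the recall boost hybrid search is after.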
Official documentation: https://javaai.pig4cloud.com/deepseek
Issue feedback: https://github.com/pig-mesh/deepseek4j
Java Architecture Diary
Committed to sharing original, high‑quality technical articles; no fluff or promotional content.