Create a Java RAG System Using DeepSeek R1, Milvus, and Spring
This guide walks through building a Java RAG system with DeepSeek R1, Milvus, and Spring, covering environment setup, vector model integration via OpenAI protocol, Maven dependencies, data embedding, and a chat endpoint that combines semantic retrieval with LLM generation.
DeepSeek R1 is popular for its chain‑of‑thought capabilities, but mainstream frameworks such as Spring AI lack full support for it, especially for retaining chain‑of‑thought content and for streaming output. deepseek4j 1.4 adds vector model support.
Background
deepseek4j provides a set of powerful APIs covering Reasoner, Function Calling, and JSON parsing, simplifying DeepSeek integration for developers.
DeepSeek does not provide a vector model, so the initial design of this tool did not consider vector search integration.
Current Situation
deepseek4j fully supports DeepSeek's Reasoner, Function Calling, and JSON parsing features.
Demand for private knowledge bases on the R1 model is growing, with many developers wanting to build private knowledge stores on top of DeepSeek.
After thorough technical evaluation, we chose an elegant solution: integrate vector model capabilities by conforming to the OpenAI protocol standard. This approach offers three advantages:
Zero extra dependencies: No new libraries are required, keeping the framework lightweight.
Perfect compatibility: Seamlessly fits the existing architecture and ensures backward compatibility.
Standardized access: Uses the industry‑wide OpenAI protocol, reducing the learning curve.
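Conforming to the OpenAI protocol means the embedding client simply POSTs a small JSON body to the `/embeddings` path under the configured base URL, which OpenAI-compatible servers such as Ollama accept. The sketch below builds that request body by hand to show the wire format; the field names come from the OpenAI embeddings API, and the model name matches this article's setup (the class and method are illustrative, not part of deepseek4j):

```java
// Illustrative sketch of the OpenAI-style embeddings request body
// that an OpenAI-compatible backend (e.g. Ollama at /v1) understands.
public class EmbeddingRequestSketch {

    // Build the JSON body for POST {base-url}/embeddings.
    // Per the OpenAI protocol, only "model" and "input" are required.
    static String embeddingsBody(String model, String input) {
        return "{\"model\":\"" + model + "\",\"input\":\""
                + input.replace("\"", "\\\"") + "\"}";
    }

    public static void main(String[] args) {
        // The server responds with a JSON object containing the vector
        // under data[0].embedding; deepseek4j's EmbeddingClient handles
        // both sides of this exchange for you.
        System.out.println(embeddingsBody("bge-m3:latest", "contract law"));
    }
}
```

Because the format is standard, swapping Ollama for any other OpenAI-compatible embedding provider is only a configuration change.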
Quick Start
This article leads you from scratch to build a basic RAG system. By coding in a white‑box manner, you can deeply understand RAG core principles and flexibly adjust each component to suit real‑world needs.
1. Environment Preparation
Before constructing the RAG system, prepare the following environment:
1.1 Ollama Model Preparation
Install Ollama and download the required models:
<code># Download the inference model – understands questions and generates answers
ollama run deepseek-r1:14b

# Download the embedding model – used for text vectorization
# (pull rather than run: embedding models have no interactive chat mode)
ollama pull bge-m3:latest
</code>
1.2 Vector Database Preparation
We use Milvus as the vector database. Choose one of the two installation methods:
Use a hosted Milvus test environment via Zilliz Cloud (https://cloud.zilliz.com.cn) and obtain the connection endpoint and token.
Docker installation:
<code># 1. Download the install script
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh

# 2. Start the Docker container
bash standalone_embed.sh start
</code>
Note: If you choose the Docker method, ensure your network can access GitHub.
1.3 Project Dependencies
Add the following Maven dependencies to your project:
<code><!-- deepseek4j starter -->
<dependency>
    <groupId>io.github.pig-mesh.ai</groupId>
    <artifactId>deepseek-spring-boot-starter</artifactId>
    <version>1.4.0</version>
</dependency>

<!-- Milvus SDK -->
<dependency>
    <groupId>io.milvus</groupId>
    <artifactId>milvus-sdk-java</artifactId>
    <version>2.5.3</version>
</dependency>
</code>
application.yml Configuration
<code># Inference model configuration
deepseek:
  base-url: http://127.0.0.1:11434/v1
  model: deepseek-r1:14b
  api-key: ollama-local

# Embedding model configuration
embedding:
  api-key: ${deepseek.api-key}
  base-url: ${deepseek.base-url}
  model: bge-m3:latest
</code>
2. Initialize the Private Knowledge Base
The first step of building a RAG system is converting existing knowledge into vectors and storing them in the vector database.
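Before vectorization, long documents are usually split into fixed-size chunks (the example in 2.2 uses Hutool's `StrUtil.split` with 400-character chunks). A common refinement, not used in the original example, is to overlap adjacent chunks so a sentence cut at a boundary still appears whole in at least one chunk. A minimal sketch of that idea:

```java
import java.util.ArrayList;
import java.util.List;

// Fixed-size chunking with overlap: each chunk starts
// (chunkSize - overlap) characters after the previous one, so the last
// `overlap` characters of one chunk repeat at the start of the next.
public class Chunker {

    static List<String> split(String text, int chunkSize, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // reached the end of the text
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 4-character chunks with 1 character of overlap
        System.out.println(split("abcdefghij", 4, 1)); // [abcd, defg, ghij]
    }
}
```

Chunk size and overlap are tuning knobs: smaller chunks retrieve more precisely but lose context, larger ones keep context but dilute relevance.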
2.1 Create Connection Client
<code>// Connect to the Milvus server
ConnectConfig connectConfig = ConnectConfig.builder()
        .uri(CLUSTER_ENDPOINT)  // Milvus endpoint obtained earlier
        .token(TOKEN)           // Milvus token
        .build();

MilvusClientV2 milvusClientV2 = new MilvusClientV2(connectConfig);
</code>
2.2 Prepare Data and Upload Vectors
The example below processes plain‑text data. For Office documents, images, PDFs, audio, or video, deepseek4j offers a complete solution (see the office2md project). Note that the `deepseek4j_test` collection used below must already exist in Milvus, with a vector field whose dimension matches the embedding model (1024 for bge-m3).
<code>@Autowired
EmbeddingClient embeddingClient;

{
    // Read the source document and split it into 400-character chunks
    String law = FileUtil.readString("/Users/lengleng/Downloads/law.txt", Charset.defaultCharset());
    String[] lawSplits = StrUtil.split(law, 400);

    List<JsonObject> data = new ArrayList<>();
    for (String lawSplit : lawSplits) {
        // Vectorize each chunk with the embedding model
        List<Float> floatList = embeddingClient.embed(lawSplit);
        JsonObject jsonObject = new JsonObject();
        JsonArray jsonArray = new JsonArray();
        for (Float value : floatList) {
            jsonArray.add(value);
        }
        jsonObject.add("vector", jsonArray);       // the embedding vector
        jsonObject.addProperty("text", lawSplit);  // the original chunk text
        data.add(jsonObject);
    }

    // Insert vectors plus original text into the collection
    InsertReq insertReq = InsertReq.builder()
            .collectionName("deepseek4j_test")
            .data(data)
            .build();
    milvusClientV2.insert(insertReq);
}
</code>
3. Create the RAG Endpoint
<code>@GetMapping(value = "/chat", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ChatCompletionResponse> chat(String prompt) {
    MilvusClientV2 milvusClientV2 = new MilvusClientV2(connectConfig);

    // 1. Vectorize the user's question
    List<Float> floatList = embeddingClient.embed(prompt);

    // 2. Retrieve the 3 most similar chunks from Milvus
    SearchReq searchReq = SearchReq.builder()
            .collectionName("deepseek4j_test")
            .data(Collections.singletonList(new FloatVec(floatList)))
            .outputFields(Collections.singletonList("text"))
            .topK(3)
            .build();
    SearchResp searchResp = milvusClientV2.search(searchReq);

    List<String> resultList = new ArrayList<>();
    for (List<SearchResp.SearchResult> results : searchResp.getSearchResults()) {
        for (SearchResp.SearchResult result : results) {
            resultList.add(result.getEntity().get("text").toString());
        }
    }

    // 3. Combine the question and retrieved context, then stream the answer
    ChatCompletionRequest request = ChatCompletionRequest.builder()
            .model("deepseek-r1:14b")
            .addUserMessage(String.format(
                    "Answer the user's question: %s \n\n Refer to the following content: %s \n\n Produce the final answer",
                    prompt, resultList))
            .build();
    return deepSeekClient.chatFluxCompletion(request);
}
</code>
Frontend Test
Conclusion
This article built a basic RAG system through the following core steps:
Environment preparation: deploy inference and vector models.
Knowledge base construction: vectorize and store data.
Retrieval augmentation: fetch relevant knowledge via semantic search.
Inference generation: combine context with LLM to produce final answers.
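The semantic search step above amounts to comparing embedding vectors by a similarity metric. Milvus does this at scale over an index; the toy sketch below only illustrates the cosine metric itself (it is not how Milvus is invoked):

```java
// Cosine similarity between two equal-length vectors:
// dot(a, b) / (|a| * |b|). A score near 1 means the texts behind the
// embeddings are semantically close; near 0 means unrelated.
public class CosineSimilarity {

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];   // dot product
            na += a[i] * a[i];    // squared norm of a
            nb += b[i] * b[i];    // squared norm of b
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        System.out.println(cosine(new float[]{1, 0}, new float[]{1, 0})); // 1.0
        System.out.println(cosine(new float[]{1, 0}, new float[]{0, 1})); // 0.0
    }
}
```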
To make the RAG system production‑ready, each stage requires further optimization:
Retrieval strategy: combine keyword and semantic search for higher recall.
Re‑ranking: secondary sorting to ensure the most relevant results appear first.
Prompt engineering: refine prompt templates for more accurate model outputs.
Knowledge base management: regularly update and maintain data freshness.
Performance tuning: optimize vector search and model inference latency.
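For the retrieval-strategy point, one simple and widely used way to combine a keyword-search ranking with a semantic-search ranking is reciprocal rank fusion (RRF). A minimal sketch, assuming the two rankings are already available as ordered lists of document IDs (the constant k = 60 is the commonly used default):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Reciprocal rank fusion: each document scores 1 / (k + rank) in every
// ranking it appears in, and the summed scores produce the fused order.
public class RrfFusion {

    static List<String> fuse(List<String> keywordRanked, List<String> semanticRanked, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : List.of(keywordRanked, semanticRanked)) {
            for (int rank = 0; rank < ranking.size(); rank++) {
                // rank is 0-based here, so the denominator is k + rank + 1
                scores.merge(ranking.get(rank), 1.0 / (k + rank + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        fused.sort((x, y) -> Double.compare(scores.get(y), scores.get(x))); // descending
        return fused;
    }

    public static void main(String[] args) {
        // "b" and "c" appear in both rankings, so they rise to the top
        System.out.println(fuse(List.of("a", "b", "c"), List.of("b", "c", "d"), 60));
    }
}
```

Documents found by both retrievers bubble up even when neither ranks them first, which is exactly the recall boost hybrid search is after.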
Official documentation: https://javaai.pig4cloud.com/deepseek
Issue feedback: https://github.com/pig-mesh/deepseek4j
Java Architecture Diary
Committed to sharing original, high‑quality technical articles; no fluff or promotional content.