Build a RAG-Powered Knowledge Base with Spring Boot, Milvus, and Ollama
This guide walks through building a Retrieval-Augmented Generation (RAG) system with Spring Boot 3.4.2, the Milvus vector database, and the bge-m3 embedding model served by Ollama. It covers environment setup, dependency configuration, vector store operations, and integration with a large language model (Qwen via Alibaba DashScope) that answers questions using the documents retrieved by similarity search.
1. Introduction
1.1 What is RAG?
Retrieval‑Augmented Generation (RAG) combines a large language model (LLM) with an external knowledge base to improve the accuracy and relevance of generated text.
At query time, a RAG system retrieves documents relevant to the question from an external collection and adds them to the prompt. The model can then draw on that information instead of relying solely on its pre-trained knowledge.
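The retrieve, augment, generate flow can be sketched without any framework. Everything below is hypothetical: the word-overlap retriever stands in for the embedding-based similarity search used later in this article, and the `llm` parameter stands in for a real model call. The prompt wording mirrors the template used in section 2.4.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the RAG flow: retrieve -> augment -> generate.
// No real vector store or LLM is used here.
public class RagFlowSketch {

    // Retrieve: stand-in that keeps documents sharing at least one word with the question.
    // A real RAG system ranks documents by embedding similarity instead.
    public static List<String> retrieve(List<String> knowledgeBase, String question) {
        List<String> questionWords = List.of(question.toLowerCase().split("\\W+"));
        return knowledgeBase.stream()
                .filter(doc -> List.of(doc.toLowerCase().split("\\W+")).stream()
                        .anyMatch(questionWords::contains))
                .toList();
    }

    // Augment: add the retrieved text to the prompt that the LLM will see.
    public static String augment(String question, List<String> context) {
        return question + "\nUse the following information to answer the question:\n"
                + String.join("\n", context);
    }

    // Generate: `llm` is any prompt -> answer function (a hypothetical stand-in here).
    public static String answer(List<String> knowledgeBase, String question,
                                Function<String, String> llm) {
        return llm.apply(augment(question, retrieve(knowledgeBase, question)));
    }

    public static void main(String[] args) {
        List<String> kb = List.of("Milvus listens on port 19530",
                "Ollama serves bge-m3 embeddings");
        // An "LLM" that echoes its input shows exactly what the model would receive
        System.out.println(answer(kb, "Which port does Milvus use?", prompt -> prompt));
    }
}
```

Using an echoing function as the "LLM" is deliberate: it makes the augmented prompt visible, which is the part RAG actually changes.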
1.2 What is a vector database?
A vector database stores embeddings (numeric vectors) and performs similarity search instead of exact matching. It allows you to find items with similar semantic meaning, such as images or sentences.
Example: an embedding model converts the text "I love Spring full-stack case source code" into a vector such as [0.24, -0.56, 0.89, ...]. These values are illustrative; bge-m3, the model used below, produces 1024-dimensional vectors.
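Similarity search compares vectors with a similarity or distance metric. The sketch below assumes cosine similarity (one common choice; Milvus also supports others such as L2 and inner product) and uses made-up 3-dimensional vectors. Its `topK` method scans every stored vector, which is the brute-force version of the result a vector database returns.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative sketch: how "similar meaning" becomes a number. The vectors are
// made-up 3-dimensional toys; real bge-m3 embeddings have 1024 dimensions.
public class CosineSimilarity {

    // cos(a, b) = (a . b) / (|a| * |b|): 1 = same direction, 0 = orthogonal, -1 = opposite
    public static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Brute-force top-K: score every stored vector against the query and keep the best K.
    // Vector databases typically use approximate nearest-neighbour indexes instead of a full scan.
    public static List<String> topK(Map<String, double[]> store, double[] query, int k) {
        return store.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> cosine(e.getValue(), query)).reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        Map<String, double[]> store = Map.of(
                "banana", new double[] {0.9, 0.1, 0.0},
                "apple", new double[] {0.8, 0.2, 0.1},
                "Java", new double[] {0.0, 0.2, 0.9});
        double[] fruitQuery = {0.85, 0.15, 0.05};
        System.out.println(topK(store, fruitQuery, 2)); // prints [banana, apple]
    }
}
```

The dimension check matters in practice too: a Milvus collection fixes its vector dimension when it is created, so the stored vectors and the query vector must come from the same embedding model.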
1.3 Milvus Overview
Milvus is a popular open-source vector database. This tutorial only covers what is needed to use it; the full documentation is available at https://milvus.io/docs/zh.
2. Practical Example
2.1 Environment Preparation
Install Milvus (standalone) with the official script, which runs Milvus in a Docker container:
```shell
# Download the script
$ curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
# Start the container
$ bash standalone_embed.sh start
```
Then install the bge-m3 embedding model with `ollama pull bge-m3:latest`.
2.2 Project Configuration
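Add the following Maven dependencies. The two Spring AI starters carry no explicit version; presumably it comes from the Spring AI BOM (org.springframework.ai:spring-ai-bom), which should match the Spring AI release that spring-ai-alibaba-starter 1.0.0-M6.1 is built on (1.0.0-M6).
```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-milvus-store-spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>com.alibaba.cloud.ai</groupId>
    <artifactId>spring-ai-alibaba-starter</artifactId>
    <version>1.0.0-M6.1</version>
</dependency>
```
Key configuration (YAML, e.g. in application.yml):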
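```yaml
spring:
  ai:
    dashscope:
      api-key: sk-xxxooo   # placeholder: use your own DashScope API key
      base-url: https://dashscope.aliyuncs.com/compatible-mode/v1
      chat:
        options:
          model: qwen-turbo
      embedding:
        enabled: false
---
spring:
  ai:
    ollama:
      chat:
        enabled: false
      base-url: http://localhost:11111
      embedding:
        enabled: true
        model: bge-m3:latest
---
spring:
  ai:
    vectorstore:
      milvus:
        client:
          host: localhost
          port: 19530
          username: root
          password: root
        initialize-schema: true
        embeddingDimension: 1024
```
DashScope (qwen-turbo) acts as the chat model and Ollama (bge-m3) as the embedding model. Disabling DashScope embeddings and Ollama chat presumably leaves exactly one ChatModel and one EmbeddingModel bean, so ChatClient.Builder and the Milvus vector store can be auto-configured without ambiguity. embeddingDimension must equal the length of the vectors the embedding model produces, which is 1024 for bge-m3, and initialize-schema: true lets Spring AI create the Milvus collection at startup. Ollama listens on port 11434 by default, so set base-url to wherever your Ollama instance actually runs. If bge-m3 is not picked up, note that the Spring AI reference documents the model property as spring.ai.ollama.embedding.options.model.
2.3 Vector Store Operations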
Service to save documents and perform similarity search:
```java
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class DocumentService {

    private final VectorStore vectorStore;

    public DocumentService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    // Embed the sample texts (via bge-m3) and store them in Milvus
    public void save() {
        List<Document> documents = List.of(
                new Document("banana"),
                new Document("apple"),
                new Document("orange"),
                new Document("strawberry"),
                new Document("Java"),
                new Document("python"),
                new Document("C#"),
                new Document("tiger"));
        this.vectorStore.add(documents);
    }

    // Return the topK stored documents most similar to the prompt
    public List<Document> query(String prompt, int topK) {
        SearchRequest request = SearchRequest.builder()
                .query(prompt)
                .topK(topK)
                .build();
        return this.vectorStore.similaritySearch(request);
    }
}
```
Controller exposing endpoints:
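```java
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/rag")
public class RagController {
    private final DocumentService documentService;

    public RagController(DocumentService documentService) {
        this.documentService = documentService;
    }

    // Store the sample documents in Milvus
    @GetMapping("/save")
    public ResponseEntity<String> save() {
        this.documentService.save();
        return ResponseEntity.ok("success");
    }

    // Similarity search: GET /rag/{topK}?prompt=...
    @GetMapping("/{topK}")
    public ResponseEntity<List<Document>> query(@PathVariable Integer topK,
                                                @RequestParam String prompt) {
        return ResponseEntity.ok(this.documentService.query(prompt, topK));
    }
}
```
2.4 Combine with LLM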
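Configure a ChatClient bean. SimpleLoggerAdvisor logs each request and response at DEBUG level, so set logging.level.org.springframework.ai.chat.client.advisor=DEBUG to see its output:
```java
import java.util.List;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatConfig {
    // ChatClient.Builder is auto-configured on the single active ChatModel (DashScope qwen-turbo)
    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder.defaultAdvisors(List.of(new SimpleLoggerAdvisor())).build();
    }
}
```
Endpoint (added to RagController) that retrieves relevant documents, builds a prompt, and calls the LLM: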
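```java
// Add to RagController, which must also have VectorStore and ChatClient fields injected
// through its constructor. Extra imports: java.util.Map, java.util.stream.Collectors,
// org.springframework.ai.chat.client.ChatClient, org.springframework.ai.chat.prompt.Prompt,
// org.springframework.ai.chat.prompt.PromptTemplate, org.springframework.ai.vectorstore.*
@GetMapping("/query/{topK}")
public ResponseEntity<String> queryLLM(@PathVariable Integer topK,
                                       @RequestParam String prompt) {
    // 1. Retrieve the topK stored documents most similar to the question
    SearchRequest request = SearchRequest.builder()
            .query(prompt)
            .topK(topK)
            .build();
    List<Document> docs = this.vectorStore.similaritySearch(request);
    // 2. Pass only the document texts, not the whole Document objects (ids, metadata)
    String contents = docs.stream()
            .map(Document::getText)
            .collect(Collectors.joining("\n"));
    // 3. Build the augmented prompt (text block: a plain "..." literal cannot span lines)
    PromptTemplate template = new PromptTemplate("""
            {userMessage}
            Use the following information to answer the question:
            {contents}
            """);
    Prompt finalPrompt = template.create(Map.of("userMessage", prompt, "contents", contents));
    // 4. Ask the chat model and return its answer
    String result = this.chatClient.prompt(finalPrompt).call().content();
    return ResponseEntity.ok(result);
}
```
Calling /rag/save stores the sample texts in Milvus; /rag/{topK}?prompt=... returns the topK stored documents most similar to the prompt; and /rag/query/{topK}?prompt=... additionally sends those documents to the LLM, which filters out irrelevant matches and formats the answer.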
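To exercise the three endpoints from plain Java, a small client along these lines could be used. It is a hypothetical helper that assumes the application runs on Spring Boot's default port 8080 (change `BASE_URL` if not). The one subtle point is that the free-text `prompt` must be URL-encoded before it goes into the query string.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the /rag endpoints; assumes the app listens on localhost:8080.
public class RagClient {

    static final String BASE_URL = "http://localhost:8080/rag";

    // Build /rag{path}{topK}?prompt=..., URL-encoding the free-text prompt
    public static URI queryUri(String path, int topK, String prompt) {
        String encoded = URLEncoder.encode(prompt, StandardCharsets.UTF_8);
        return URI.create(BASE_URL + path + topK + "?prompt=" + encoded);
    }

    // Send a GET request and print the status code and body
    static void send(HttpClient client, URI uri) throws Exception {
        HttpResponse<String> response = client.send(
                HttpRequest.newBuilder(uri).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(uri + " -> " + response.statusCode() + "\n" + response.body());
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // 1. Store the sample documents
        send(client, URI.create(BASE_URL + "/save"));
        // 2. Plain similarity search: the 3 stored documents closest to "fruit"
        send(client, queryUri("/", 3, "fruit"));
        // 3. Retrieval plus LLM answer
        send(client, queryUri("/query/", 3, "Which of these are fruits?"));
    }
}
```

URLEncoder turns spaces into `+` and `?` into `%3F`, both of which the servlet container decodes back when Spring binds the `prompt` parameter.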
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please get in touch and we will review it promptly.
Spring Full-Stack Practical Cases
Full-stack Java development with Vue 2/3 front-end suite; hands-on examples and source code analysis for Spring, Spring Boot 2/3, and Spring Cloud.
