Build a RAG-Powered Knowledge Base with Spring Boot, Milvus, and Ollama

This guide walks through creating a Retrieval‑Augmented Generation (RAG) system using Spring Boot 3.4.2, Milvus vector database, and the bge‑m3 embedding model via Ollama, covering environment setup, dependency configuration, vector store operations, and integration with a large language model to deliver refined, similarity‑based answers.

Spring Full-Stack Practical Cases

1. Introduction

1.1 What is RAG?

Retrieval‑Augmented Generation (RAG) combines a large language model (LLM) with an external knowledge base to improve the accuracy and relevance of generated text.

RAG enables the model to retrieve relevant documents from a set of files and incorporate that information into its responses, rather than relying solely on its pre‑trained knowledge.

1.2 What is a vector database?

A vector database stores embeddings (numeric vectors) and performs similarity search instead of exact matching. It allows you to find items with similar semantic meaning, such as images or sentences.

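Example: the text "I love Spring full‑stack case source code" is converted by an embedding model into a vector like [0.24, -0.56, 0.89]. The vector is shortened to three values here; the bge‑m3 model used later in this guide is configured with 1024 dimensions.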
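To make "similarity search" concrete, here is a minimal sketch in plain Java that ranks stored vectors by cosine similarity to a query vector. The class name, the three-dimensional vectors, and the in-memory map are illustrative assumptions; Milvus uses indexed search rather than this brute-force scan, and its metric is configurable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class CosineSearchSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means same direction.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the keys of the topK stored vectors most similar to the query.
    static List<String> topK(Map<String, double[]> store, double[] query, int topK) {
        List<String> keys = new ArrayList<>(store.keySet());
        keys.sort(Comparator.comparingDouble((String k) -> cosine(store.get(k), query)).reversed());
        return keys.subList(0, Math.min(topK, keys.size()));
    }

    public static void main(String[] args) {
        // Made-up 3-dimensional vectors standing in for real embeddings.
        Map<String, double[]> store = Map.of(
                "banana", new double[] {0.9, 0.1, 0.0},
                "apple", new double[] {0.8, 0.2, 0.1},
                "Java", new double[] {0.0, 0.1, 0.9});
        double[] query = {1.0, 0.0, 0.0}; // pretend embedding of a fruit-related query
        System.out.println(topK(store, query, 2)); // prints [banana, apple]
    }
}
```

The two fruit vectors point in nearly the same direction as the query, so they rank ahead of "Java" even though no text is compared directly.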

1.3 Milvus Overview

Milvus is a popular open‑source vector database. For this tutorial we only need to know how to use it; detailed documentation is available at https://milvus.io/docs/zh .

2. Practical Example

2.1 Environment Preparation

Install Milvus (standalone) with the official script, which runs Milvus in a Docker container, then pull the bge‑m3 embedding model with Ollama:

<code># Download the Milvus standalone script
$ curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
# Start the Milvus container
$ bash standalone_embed.sh start
# Pull the embedding model
$ ollama pull bge-m3:latest
</code>
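Before starting the Spring Boot application, it can help to confirm that the Milvus port is reachable. The following is a small sketch in plain Java, assuming Milvus listens on localhost:19530 (the host and port used in the configuration in section 2.2); the class and method names are hypothetical. A successful TCP connection only shows that something is listening, not that Milvus is fully initialized.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheckSketch {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unknown host.
            return false;
        }
    }

    public static void main(String[] args) {
        // 19530 is the Milvus port used in this guide's configuration.
        System.out.println("Milvus reachable: " + isOpen("localhost", 19530, 1000));
    }
}
```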

2.2 Project Configuration

Add the following Maven dependencies:

<code><dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-milvus-store-spring-boot-starter</artifactId>
</dependency>
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>
<dependency>
  <groupId>com.alibaba.cloud.ai</groupId>
  <artifactId>spring-ai-alibaba-starter</artifactId>
  <version>1.0.0-M6.1</version>
</dependency>
</code>

Key configuration (YAML style):

<code>spring:
  ai:
    dashscope:
      api-key: sk-xxxooo
      base-url: https://dashscope.aliyuncs.com/compatible-mode/v1
      chat:
        options:
          model: qwen-turbo
      embedding:
        enabled: false
---
spring:
  ai:
    ollama:
      chat:
        enabled: false
      # Ollama listens on port 11434 by default; adjust to match your setup
      base-url: http://localhost:11111
      embedding:
        enabled: true
        model: bge-m3:latest
---
spring:
  ai:
    vectorstore:
      milvus:
        client:
          host: localhost
          port: 19530
          username: root
          password: root
        initialize-schema: true
        embedding-dimension: 1024
</code>

2.3 Vector Store Operations

Service to save documents and perform similarity search:

<code>import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class DocumentService {
  private final VectorStore vectorStore;
  public DocumentService(VectorStore vectorStore) {
    this.vectorStore = vectorStore;
  }
  // Save sample texts; the vector store embeds each one before writing it to Milvus
  public void save() {
    List<Document> documents = List.of(
        new Document("banana"),
        new Document("apple"),
        new Document("orange"),
        new Document("strawberry"),
        new Document("Java"),
        new Document("python"),
        new Document("C#"),
        new Document("tiger"));
    this.vectorStore.add(documents);
  }
  // Similarity search: return the topK documents closest to the prompt
  public List<Document> query(String prompt, int topK) {
    SearchRequest request = SearchRequest.builder()
        .query(prompt)
        .topK(topK)
        .build();
    return this.vectorStore.similaritySearch(request);
  }
}
</code>

Controller exposing endpoints:

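<code>import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/rag")
public class RagController {
  private final DocumentService documentService;
  public RagController(DocumentService documentService) {
    this.documentService = documentService;
  }
  // Store the sample documents in Milvus
  @GetMapping("/save")
  public ResponseEntity<String> save() {
    this.documentService.save();
    return ResponseEntity.ok("success");
  }
  // Return the topK documents most similar to the prompt
  @GetMapping("/{topK}")
  public ResponseEntity<List<Document>> query(@PathVariable Integer topK,
                                              @RequestParam String prompt) {
    return ResponseEntity.ok(this.documentService.query(prompt, topK));
  }
}
</code>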
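The endpoints can be exercised from any HTTP client. Below is a minimal sketch using the JDK's built-in HttpClient, assuming the application runs on Spring Boot's default port 8080; the class name, the helper method, and the sample prompt are hypothetical. The prompt is URL-encoded because it travels as a query parameter.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class RagClientSketch {

    // Builds the URI for GET /rag/{topK}?prompt=..., URL-encoding the prompt.
    static URI searchUri(String baseUrl, int topK, String prompt) {
        String encoded = URLEncoder.encode(prompt, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/rag/" + topK + "?prompt=" + encoded);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the application runs on Spring Boot's default port 8080.
        String baseUrl = "http://localhost:8080";
        HttpClient client = HttpClient.newHttpClient();

        // Store the sample documents first.
        HttpRequest save = HttpRequest.newBuilder(URI.create(baseUrl + "/rag/save")).GET().build();
        System.out.println(client.send(save, HttpResponse.BodyHandlers.ofString()).body());

        // Then ask for the 3 documents closest to the prompt.
        HttpRequest search = HttpRequest.newBuilder(searchUri(baseUrl, 3, "programming language")).GET().build();
        System.out.println(client.send(search, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```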

2.4 Combine with LLM

Configure a ChatClient bean:

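<code>import java.util.List;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatConfig {
  @Bean
  ChatClient chatClient(ChatClient.Builder builder) {
    // SimpleLoggerAdvisor logs each request and response (visible at DEBUG level)
    return builder.defaultAdvisors(List.of(new SimpleLoggerAdvisor()))
                  .build();
  }
}
</code>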

Endpoint that retrieves relevant documents, builds a prompt, and calls the LLM:

<code>// Add this method to RagController. It assumes VectorStore and ChatClient
// fields are constructor-injected alongside DocumentService.
@GetMapping("/query/{topK}")
public ResponseEntity<String> queryLLM(@PathVariable Integer topK,
                                        @RequestParam String prompt) {
  // 1. Retrieve the topK documents most similar to the prompt
  SearchRequest request = SearchRequest.builder()
      .query(prompt)
      .topK(topK)
      .build();
  List<Document> docs = this.vectorStore.similaritySearch(request);
  // 2. Combine the user's question with the retrieved documents
  PromptTemplate template = new PromptTemplate("{userMessage}\n\n Use the following information to answer the question:\n {contents}");
  Prompt finalPrompt = template.create(Map.of("userMessage", prompt, "contents", docs));
  // 3. Let the LLM produce the final answer
  String result = this.chatClient.prompt(finalPrompt).call().content();
  return ResponseEntity.ok(result);
}
</code>

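Calling /rag/save stores the sample texts in Milvus. /rag/{topK}?prompt=... returns the topK most similar documents. /rag/query/{topK}?prompt=... passes those documents to the LLM together with the question, and the LLM filters them and phrases the final answer.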
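The prompt-building step can be looked at in isolation. The sketch below uses plain Java string replacement instead of Spring AI's PromptTemplate and takes the retrieved texts as plain strings, so the class and method names are hypothetical. In the real endpoint the texts would come from each Document's text accessor (getText() in recent Spring AI milestones, which is worth checking against your version) rather than from the documents' default string form.

```java
import java.util.List;
import java.util.stream.Collectors;

class PromptContextSketch {

    // Same wording as the PromptTemplate in the endpoint above.
    static final String TEMPLATE =
            "{userMessage}\n\n Use the following information to answer the question:\n {contents}";

    // Joins the retrieved texts into a bulleted block and fills both placeholders.
    static String buildPrompt(String userMessage, List<String> retrievedTexts) {
        String contents = retrievedTexts.stream()
                .map(text -> "- " + text)
                .collect(Collectors.joining("\n"));
        return TEMPLATE
                .replace("{userMessage}", userMessage)
                .replace("{contents}", contents);
    }

    public static void main(String[] args) {
        // Pretend these three texts came back from the similarity search.
        System.out.println(buildPrompt("Which items are fruits?", List.of("banana", "apple", "Java")));
    }
}
```

Passing only the document text keeps ids and metadata out of the prompt, which leaves the model less noise to sort through.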

Figure: Spring AI RAG
Figure: Milvus data view
Figure: LLM prompt result