Spring AI RAG: Concepts, Hands‑On Implementation, and Full Code
This article explains the limitations of large language models, introduces Retrieval‑Augmented Generation (RAG) and its four‑step workflow, details Spring AI's RAG components and vector‑store options, and provides complete, runnable Java code—including Maven, configuration, and service classes—to build a local knowledge‑base Q&A system.
RAG Overview
Large language models (LLMs) suffer from knowledge staleness, hallucinations, and limited domain expertise. Retrieval‑Augmented Generation (RAG) mitigates these problems by first retrieving relevant information from an external knowledge source and then feeding that context to the LLM for answer generation.
Four‑step RAG workflow
Ingestion: Load raw documents (PDF, TXT, etc.) and split them into small chunks suitable for embedding. The example uses DocumentReader (specifically TikaDocumentReader) and TokenTextSplitter.
Embedding & Store: Convert each chunk into a high‑dimensional vector with an EmbeddingModel and store the vectors in a vector database. The demo uses SimpleVectorStore (in‑memory) but mentions persistent alternatives such as PgVector, Elasticsearch, Milvus, Weaviate, and Chroma.
Retrieval: When a user asks a question, the query is embedded and a similarity search returns the most relevant chunks.
Generation: Retrieved chunks are injected into the prompt (via ChatClient and a PromptTemplate) and the LLM generates a precise answer.
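To make the Generation step concrete, here is a minimal plain-Java sketch of how retrieved chunks might be merged into a prompt. It deliberately does not use Spring AI's PromptTemplate. The class name PromptAssemblySketch, the template wording, and the sample strings are illustrative assumptions; only the placeholder names {query} and {question_answer_context} come from this article.

```java
import java.util.List;

/** Illustrative only: naive string substitution, not Spring AI's PromptTemplate. */
final class PromptAssemblySketch {

    // Hypothetical template text; the two placeholder names are the ones this article mentions.
    static final String TEMPLATE =
            "Context information is below.\n"
            + "---------------------\n"
            + "{question_answer_context}\n"
            + "---------------------\n"
            + "Answer using only the context above.\n"
            + "Question: {query}\n";

    /** Joins the retrieved chunks with newlines and substitutes both placeholders. */
    static String render(String query, List<String> retrievedChunks) {
        String context = String.join("\n", retrievedChunks);
        return TEMPLATE
                .replace("{query}", query)
                .replace("{question_answer_context}", context);
    }

    public static void main(String[] args) {
        System.out.println(render("Which models does the demo use?",
                List.of("chunk one", "chunk two")));
    }
}
```

Plain substitution like this does no escaping, so a chunk that happens to contain a placeholder string would be mangled; a real template engine handles that case.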
Vector store semantics
Vector stores enable semantic search; for example, the vectors for “Apple phone” and “iPhone” are close, allowing matching despite different keywords. Spring AI abstracts access through the VectorStore interface.
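As a rough illustration of what "close" means here, the sketch below computes cosine similarity between small hand-written vectors. The three-dimensional vectors are made up for the example (real embeddings from nomic-embed-text have 768 dimensions), and the class name CosineSimilaritySketch is hypothetical.

```java
/** Illustrative cosine similarity over toy vectors; values are invented for the example. */
final class CosineSimilaritySketch {

    /** Returns the cosine of the angle between a and b, or 0.0 if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up vectors: the first two point in nearly the same direction.
        double[] applePhone = {0.90, 0.10, 0.80};
        double[] iphone     = {0.85, 0.15, 0.75};
        double[] banana     = {0.10, 0.90, 0.05};
        System.out.println("Apple phone vs iPhone: " + cosine(applePhone, iphone));
        System.out.println("Apple phone vs banana: " + cosine(applePhone, banana));
    }
}
```

With these toy values the first pair scores much higher than the second, which is the property a similarity search relies on.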
Key Spring AI components
Document – represents a piece of content and its metadata (class org.springframework.ai.document.Document).
DocumentReader – loads documents from a source such as the file system or classpath (implementations: JsonReader, TextReader, PagePdfDocumentReader, TikaDocumentReader).
TextSplitter – splits long text into chunks (implementation: TokenTextSplitter).
EmbeddingModel – transforms text chunks into vectors (provided here by Ollama, e.g., nomic-embed-text).
VectorStore – stores and retrieves vectors (implementations: SimpleVectorStore, ElasticsearchVectorStore, etc.).
QuestionAnswerAdvisor – intercepts user requests, performs retrieval, and injects context before generation (built with QuestionAnswerAdvisor.builder(vectorStore)).
Maven dependencies (Spring AI 1.1.2)
<properties>
    <java.version>17</java.version>
    <spring-ai.version>1.1.2</spring-ai.version>
</properties>
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-ollama</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-advisors-vector-store</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-tika-document-reader</artifactId>
    </dependency>
</dependencies>
VectorStoreConfig – document loading & vector store initialization
package com.badao.ai.config;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.document.DocumentReader;
import org.springframework.ai.reader.tika.TikaDocumentReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.SimpleVectorStore;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;

import java.util.List;

@Configuration
public class VectorStoreConfig {

    private static final Logger logger = LoggerFactory.getLogger(VectorStoreConfig.class);

    @Value("classpath:knowledge-base/badao-internal.txt")
    private Resource knowledgeResource;

    @Bean
    public VectorStore vectorStore(EmbeddingModel embeddingModel) {
        // In‑memory store for demo purposes
        return SimpleVectorStore.builder(embeddingModel).build();
    }

    @Bean
    public CommandLineRunner loadDocuments(VectorStore vectorStore) {
        return args -> {
            // 1. Load documents (auto‑detect format)
            DocumentReader reader = new TikaDocumentReader(knowledgeResource);
            List<Document> documents = reader.get();
            logger.info("Loaded {} documents", documents.size());

            // 2. Split into chunks (max 300 tokens, min 50 chars, keep separator)
            TokenTextSplitter splitter = TokenTextSplitter.builder()
                    .withChunkSize(300)
                    .withMinChunkSizeChars(50)
                    .withMinChunkLengthToEmbed(5)
                    .withKeepSeparator(true)
                    .build();
            List<Document> chunks = splitter.apply(documents);
            logger.info("Split into {} chunks", chunks.size());

            // 3. Vectorize and store
            vectorStore.add(chunks);
            logger.info("Vector store initialized");
        };
    }
}
RagConfig – registering the RAG advisor
package com.badao.ai.config;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RagConfig {

    @Bean
    public ChatClient chatClient(ChatModel chatModel, VectorStore vectorStore) {
        return ChatClient.builder(chatModel)
                .defaultAdvisors(
                        QuestionAnswerAdvisor.builder(vectorStore)
                                .searchRequest(SearchRequest.builder()
                                        .similarityThreshold(0.7)
                                        .topK(3)
                                        .build())
                                .build())
                .build();
    }
}
Service and controller layers
package com.badao.ai.service;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
public class RagService {

    private final ChatClient chatClient;

    public RagService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    public String ask(String question) {
        return chatClient.prompt()
                .user(question)
                .call()
                .content();
    }
}

package com.badao.ai.controller;

import com.badao.ai.service.RagService;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api")
public class RagController {

    private final RagService ragService;

    public RagController(RagService ragService) {
        this.ragService = ragService;
    }

    @PostMapping("/rag")
    public ChatResponse rag(@RequestBody ChatRequest request) {
        String result = ragService.ask(request.message());
        return new ChatResponse(200, "success", result);
    }

    public record ChatRequest(String message) {}

    public record ChatResponse(int code, String msg, String data) {}
}
Application configuration (application.yml)
server:
  port: 886
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        model: qwen2.5:7b-instruct
        options:
          temperature: 0.3
      embedding:
        model: nomic-embed-text
        options:
          num-batch: 4
logging:
  level:
    org.springframework.ai.rag: DEBUG
    org.springframework.ai.vectorstore: DEBUG
Embedding model download
ollama pull nomic-embed-text
Model selection notes
The chat model qwen2.5:7b-instruct supports tool calling and handles Chinese well; alternatives include deepseek-r1:8b and llama3.1:8b. The embedding model nomic-embed-text produces 768‑dimensional vectors and is free to use; alternatives such as bge-m3 (1024‑dim) or mxbai-embed-large (1024‑dim) also work, provided the vector store's dimension setting is adjusted to match. Both the chat and embedding models must be pulled into the local Ollama installation before the application starts (the chat model is pulled the same way, with ollama pull qwen2.5:7b-instruct). For vector stores that declare a fixed dimension, that value must equal the embedding model's output dimension.
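The dimension constraint can be illustrated with a small guard. This is a hypothetical sketch, not part of Spring AI: the class DimensionCheckedStore is invented for the example and simply rejects vectors whose length differs from the dimension it was created with, which is roughly how a store with a fixed-dimension schema would fail after switching from a 768‑dim model to a 1024‑dim one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory store that enforces a fixed embedding dimension. */
final class DimensionCheckedStore {

    private final int dimension;
    private final List<float[]> vectors = new ArrayList<>();

    DimensionCheckedStore(int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive");
        }
        this.dimension = dimension;
    }

    /** Stores a defensive copy of the vector, or fails fast on a dimension mismatch. */
    void add(float[] vector) {
        if (vector.length != dimension) {
            throw new IllegalArgumentException(
                    "expected " + dimension + " dimensions but got " + vector.length);
        }
        vectors.add(vector.clone());
    }

    int size() {
        return vectors.size();
    }

    public static void main(String[] args) {
        DimensionCheckedStore store = new DimensionCheckedStore(768); // nomic-embed-text size
        store.add(new float[768]);
        try {
            store.add(new float[1024]); // e.g. a vector from a 1024-dim model
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```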
ETL model explanation
The RAG pipeline follows an Extract‑Transform‑Load (ETL) pattern: Extract reads documents from the knowledge base; Transform splits them into chunks and converts each chunk to a vector via the embedding model; Load writes the vectors into the vector database. These steps are typically executed at application startup. At runtime, a user query is embedded, the vector store is searched, and the retrieved documents are injected into the LLM prompt.
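The Extract, Transform, Load sequence and the runtime lookup can be sketched without any framework. The example below is a toy under loud assumptions: the "embedding" is just a lower-cased term-count map rather than a neural model, the splitter cuts on sentence boundaries, and the class name ToyEtlPipeline and the sample sentences are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy ETL + retrieval pipeline; term counts stand in for real embeddings. */
final class ToyEtlPipeline {

    /** One stored chunk together with its toy vector. */
    static final class StoredChunk {
        final String text;
        final Map<String, Integer> vector;

        StoredChunk(String text, Map<String, Integer> vector) {
            this.text = text;
            this.vector = vector;
        }
    }

    private final List<StoredChunk> store = new ArrayList<>();

    /** Transform, part 1: naive sentence splitter standing in for a real text splitter. */
    static List<String> split(String raw) {
        List<String> chunks = new ArrayList<>();
        for (String sentence : raw.split("(?<=[.!?])\\s+")) {
            if (!sentence.isBlank()) {
                chunks.add(sentence.strip());
            }
        }
        return chunks;
    }

    /** Transform, part 2: toy embedding that counts lower-cased word occurrences. */
    static Map<String, Integer> embed(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine similarity over sparse term-count vectors. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Extract (the raw text is passed in), Transform (split + embed), Load (append to store). */
    void ingest(String rawDocument) {
        for (String chunk : split(rawDocument)) {
            store.add(new StoredChunk(chunk, embed(chunk)));
        }
    }

    /** Runtime: embed the question and return the most similar chunk, or "" if nothing overlaps. */
    String retrieveBest(String question) {
        Map<String, Integer> queryVector = embed(question);
        String best = "";
        double bestScore = 0.0;
        for (StoredChunk chunk : store) {
            double score = cosine(queryVector, chunk.vector);
            if (score > bestScore) {
                bestScore = score;
                best = chunk.text;
            }
        }
        return best;
    }
}
```

Calling ingest once and retrieveBest per question mirrors the split between startup-time loading and per-request retrieval described above.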
Common optimization strategies
Increase similarity threshold – filters out less‑relevant documents (e.g., similarityThreshold(0.8)).
Decrease similarity threshold – retrieves more candidates for large corpora (e.g., similarityThreshold(0.5)).
Control return count – set topK to avoid exceeding the model’s context window (e.g., topK(3)).
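Chunk size & overlap – balance precision and recall (e.g., a chunk size of 300 tokens with an overlap of 50). Note that the TokenTextSplitter configuration in VectorStoreConfig sets withChunkSize(300) and configures no overlap.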
Dynamic filtering – filter by metadata such as type or date (e.g., .param(QuestionAnswerAdvisor.FILTER_EXPRESSION, "type == 'manual'")).
Custom prompt template – control how context and question are concatenated using placeholders {query} and {question_answer_context}.
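How the threshold, topK, and metadata filter interact can be seen in a small framework-free sketch. Everything here is hypothetical: the Candidate class, the scores, and the "manual"/"faq" type values are invented, and the sketch assumes a candidate is kept when its score is greater than or equal to the threshold.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Illustrative selection of retrieval candidates by metadata, threshold, and topK. */
final class RetrievalTuningSketch {

    /** A scored search hit with one metadata field; invented for this example. */
    static final class Candidate {
        final String text;
        final String type;
        final double score;

        Candidate(String text, String type, double score) {
            this.text = text;
            this.type = type;
            this.score = score;
        }
    }

    /** Applies the metadata filter, drops low scores, then keeps the topK best by score. */
    static List<Candidate> select(List<Candidate> candidates,
                                  double similarityThreshold,
                                  int topK,
                                  Predicate<Candidate> metadataFilter) {
        return candidates.stream()
                .filter(metadataFilter)
                .filter(c -> c.score >= similarityThreshold)
                .sorted(Comparator.comparingDouble((Candidate c) -> c.score).reversed())
                .limit(topK)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Candidate> hits = List.of(
                new Candidate("a", "manual", 0.91),
                new Candidate("b", "faq", 0.82),
                new Candidate("c", "manual", 0.78),
                new Candidate("d", "manual", 0.74),
                new Candidate("e", "faq", 0.55));
        // A stricter threshold returns fewer hits even though topK allows three.
        System.out.println(select(hits, 0.7, 3, c -> true).size());
        System.out.println(select(hits, 0.8, 3, c -> true).size());
    }
}
```

Raising the threshold trades recall for precision, while topK caps how much context reaches the model regardless of how many candidates pass the threshold.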
Pipeline summary
Document loading – TikaDocumentReader auto‑detects PDF, Word, TXT, etc.
Text chunking – TokenTextSplitter with withChunkSize(300) and withMinChunkSizeChars(50), as configured in VectorStoreConfig.
Embedding – Ollama nomic-embed-text (768‑dim).
Vector store – SimpleVectorStore for demo; replace with PgVector, Elasticsearch, etc., for production.
Retrieval + generation – QuestionAnswerAdvisor with similarityThreshold and topK to control quality.
Enhanced generation – ChatClient automatically injects retrieved chunks into the prompt.
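The chunk size and overlap trade-off mentioned among the optimization strategies can be sketched as a sliding window. This is not how TokenTextSplitter works internally; it is a simplified illustration that treats whitespace-separated words as tokens, and the class name OverlapChunkerSketch and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sliding-window chunker; words approximate tokens. */
final class OverlapChunkerSketch {

    /**
     * Splits text into chunks of at most chunkSize words, where each chunk
     * repeats the last `overlap` words of the previous one.
     */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require chunkSize > overlap >= 0");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] words = trimmed.split("\\s+");
        int step = chunkSize - overlap;
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // the final window already reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (String c : chunk("a b c d e f g h i j", 4, 1)) {
            System.out.println(c);
        }
    }
}
```

A larger overlap reduces the chance that a sentence is cut in half at a chunk boundary, at the cost of storing and embedding more redundant text.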
The Dominant Programmer
Resources and tutorials for programmers' advanced learning journey. Advanced tracks in Java, Python, and C#. Blog: https://blog.csdn.net/badao_liumang_qizhi