Building a Private Document Vector Search with SpringBoot, LangChain4j, and Ollama RAG
This guide walks through why Retrieval‑Augmented Generation (RAG) is needed for large language models, explains the three‑step indexing and query workflow, details LangChain4j’s core components, and provides a complete SpringBoot example—including Maven setup, configuration, service code, and troubleshooting—to create a private document‑vector search system powered by Ollama.
Large language models (LLMs) suffer from two main drawbacks: knowledge becomes outdated after training and the models may hallucinate when uncertain, especially on proprietary data. Retrieval‑Augmented Generation (RAG) addresses these issues by first retrieving relevant information from a knowledge base and then generating answers based on that factual context.
RAG Core Process
RAG consists of two phases: indexing and retrieval‑generation. The indexing phase transforms raw documents into searchable vectors through the pipeline Load → Parse → Split → Embed → Store. Loading reads files (PDF, Word, TXT, web pages); parsing extracts plain text; splitting cuts long texts into short TextSegment chunks to fit the model’s context window; embedding converts each chunk into a high‑dimensional vector (e.g., 768‑dim float array); storing saves vectors and original text in a vector database.
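To make the indexing pipeline concrete, here is a deliberately tiny, library-free sketch of the Split → Embed → Store steps. The `embed` function is a stand-in for a real embedding model, and the chunking is naive fixed-size splitting; LangChain4j's own splitters are more sophisticated, so treat this as an illustration of the idea, not the library code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class IndexingSketch {
    // One stored entry: the chunk's vector plus its original text.
    public record Entry(float[] vector, String text) {}

    // Split the document into fixed-size chunks with overlap, embed each
    // chunk, and keep vector and text together (the Store step).
    public static List<Entry> ingest(String doc, Function<String, float[]> embed,
                                     int chunkSize, int overlap) {
        List<Entry> store = new ArrayList<>();
        int step = chunkSize - overlap; // advance by chunk size minus overlap
        for (int start = 0; start < doc.length(); start += step) {
            int end = Math.min(start + chunkSize, doc.length());
            String chunk = doc.substring(start, end);
            store.add(new Entry(embed.apply(chunk), chunk));
            if (end == doc.length()) break; // reached the last chunk
        }
        return store;
    }
}
```

The overlap means consecutive chunks share their boundary characters, so a sentence cut at a chunk edge still appears whole in at least one chunk.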
During query time, the user question is also embedded, the most similar K segments are retrieved (e.g., by cosine similarity), combined with the original question to form an augmented prompt, and finally sent to the LLM for answer generation.
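The retrieval step can likewise be sketched without any framework: cosine similarity scores each stored vector against the query vector, and the top-K indices are kept. This is illustrative only; real vector stores use approximate-nearest-neighbor indexes rather than a full scan:

```java
import java.util.ArrayList;
import java.util.List;

public class SimilaritySketch {
    // Cosine similarity between two embedding vectors of equal length.
    public static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Indices of the k segment vectors most similar to the query vector,
    // most similar first.
    public static List<Integer> topK(float[] query, List<float[]> segments, int k) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < segments.size(); i++) order.add(i);
        order.sort((i, j) -> Double.compare(cosine(query, segments.get(j)),
                                            cosine(query, segments.get(i))));
        return order.subList(0, Math.min(k, order.size()));
    }
}
```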
Key LangChain4j Components for RAG
Document : represents a raw file with text and metadata.
DocumentParser : parses specific formats such as PDF or DOCX.
DocumentSplitter : splits a document into TextSegment chunks.
EmbeddingModel : turns text into vectors.
EmbeddingStore : abstract interface for a vector database.
EmbeddingStoreIngestor : automates Split → Embed → Store.
ContentRetriever : fetches relevant segments for a query.
RetrievalAugmentor : injects retrieved content into the AI request.
AiServices : builds the AI service with RAG capabilities.
Implementation Steps
1. Maven Dependencies (pom.xml)
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>3.2.5</version>
</parent>
<groupId>com.example</groupId>
<artifactId>spring-langchain4j-ollama-rag</artifactId>
<version>1.0</version>
<properties>
    <java.version>17</java.version>
    <langchain4j.version>1.0.0-beta4</langchain4j.version>
</properties>
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-bom</artifactId>
            <version>${langchain4j.version}</version>
            <!-- type/scope are required for a BOM import to take effect -->
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Ollama auto-configuration for the langchain4j.ollama.* properties below;
         if your BOM does not manage it, set <version>${langchain4j.version}</version> -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-ollama-spring-boot-starter</artifactId>
    </dependency>
</dependencies>
2. Pull the Embedding Model
ollama pull nomic-embed-text
3. Application Configuration (application.yml)
langchain4j:
  ollama:
    chat-model:
      base-url: http://localhost:11434
      model-name: qwen2:7b
      temperature: 0.7
      timeout: PT600S         # 10 minutes
      connect-timeout: PT300S # 5 minutes
      read-timeout: PT300S    # 5 minutes
      log-requests: true
      log-responses: true
    embedding-model:
      base-url: http://localhost:11434
      model-name: nomic-embed-text
The chat model generates the final answers, while the embedding model converts documents and queries into vectors; they can be different models because embedding models are usually much lighter.
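The indexing service below autowires an EmbeddingStore&lt;TextSegment&gt; bean that is never defined in the article. A minimal sketch using LangChain4j's in-memory store (class and package names as of LangChain4j 1.0.0-beta4; verify against your version):

```java
package com.badao.ai.config;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EmbeddingStoreConfig {

    // In-memory store: fine for demos, but vectors are lost on restart
    // (see the troubleshooting section on persistent stores).
    @Bean
    public EmbeddingStore<TextSegment> embeddingStore() {
        return new InMemoryEmbeddingStore<>();
    }
}
```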
4. Document Indexing Service
package com.badao.ai.service;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import jakarta.annotation.PostConstruct;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.ClassPathResource;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

@Service
public class DocumentIndexingService {

    @Autowired
    private EmbeddingModel embeddingModel;

    @Autowired
    private EmbeddingStore<TextSegment> embeddingStore;

    @PostConstruct
    public void init() throws IOException {
        // Note: getFile() only works from an exploded classpath,
        // not from inside a packaged jar.
        ClassPathResource resource = new ClassPathResource("knowledge/knowledge.txt");
        Path docPath = resource.getFile().toPath();
        Document document = FileSystemDocumentLoader.loadDocument(docPath);
        List<Document> documents = List.of(document);
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .documentSplitter(DocumentSplitters.recursive(500, 50)) // 500-char chunks, 50-char overlap
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build();
        ingestor.ingest(documents);
        System.out.println("Documents indexed: " + documents.size() + " file(s).");
    }
}
The ingestor encapsulates the full Split → Embed → Store flow; DocumentSplitters.recursive(500, 50) splits each document into 500-character chunks with a 50-character overlap so that information at chunk boundaries is not lost.
5. AI Service Configuration with Retrieval Augmentation
package com.badao.ai.config;

import com.badao.ai.service.RAGAssistant;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.store.embedding.EmbeddingStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RagAiConfig {

    @Bean
    public RAGAssistant ragAssistant(ChatModel chatModel,
                                     EmbeddingStore<TextSegment> embeddingStore,
                                     EmbeddingModel embeddingModel) {
        var contentRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(embeddingStore)
                .embeddingModel(embeddingModel)
                .maxResults(3)  // return up to 3 relevant chunks
                .minScore(0.7)  // minimum similarity threshold
                .build();
        var retrievalAugmentor = DefaultRetrievalAugmentor.builder()
                .contentRetriever(contentRetriever)
                .build();
        return AiServices.builder(RAGAssistant.class)
                .chatModel(chatModel)
                .retrievalAugmentor(retrievalAugmentor)
                .build();
    }
}
Key components: EmbeddingStoreContentRetriever fetches similar segments; DefaultRetrievalAugmentor inserts them into the prompt; AiServices.builder(...).retrievalAugmentor(...) binds the augmentor to the AI service.
6. Assistant Interface
package com.badao.ai.service;

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;

public interface RAGAssistant {

    @SystemMessage("You are a knowledge-base assistant. Answer based on the provided context. If the answer is not found, state that clearly.")
    String chat(@UserMessage String userMessage);
}
7. REST Controller
package com.badao.ai.controller;

import com.badao.ai.service.RAGAssistant;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/rag")
public class RAGController {

    private final RAGAssistant ragAssistant;

    public RAGController(RAGAssistant ragAssistant) {
        this.ragAssistant = ragAssistant;
    }

    @GetMapping("/ask")
    public String ask(@RequestParam String question) {
        return ragAssistant.chat(question);
    }
}
8. Testing the System
Create knowledge.txt under src/main/resources/knowledge with sample policies, e.g., work hours and annual leave rules. Start the SpringBoot application and call:

http://localhost:885/rag/ask?question=员工几点下班?

(The query asks, in Chinese, "What time do employees get off work?") The LLM returns an answer grounded in the indexed policy.
Common Issues and Solutions
Local embedding model fails to load (AccessDeniedException): switch to Ollama’s nomic-embed-text model to avoid DLL problems.
Irrelevant retrieval results: adjust chunk size or try a different embedding model such as bge-m3 or all-MiniLM-L6-v2.
LLM ignores retrieved context: add explicit instructions in the @SystemMessage to “answer based on the provided material”.
Slow responses: use a faster embedding model, enable GPU, or cache frequent retrieval results.
Data loss after restart: switch from in‑memory storage to a persistent vector database like Redis or Chroma.
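Before adopting a full vector database, the in-memory store can be snapshotted to disk as a stopgap. The serialize/deserialize methods below belong to LangChain4j's InMemoryEmbeddingStore; verify they exist with these signatures in your version:

```java
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import java.nio.file.Files;
import java.nio.file.Path;

public class StorePersistence {
    // Save the in-memory store on shutdown so re-indexing is not needed
    // after every restart. A real vector database (Redis, Chroma, ...)
    // remains the better long-term option.
    public static void save(InMemoryEmbeddingStore<TextSegment> store, Path file) {
        store.serializeToFile(file);
    }

    // Reload a previously saved store on startup, or create a fresh one.
    public static InMemoryEmbeddingStore<TextSegment> loadOrCreate(Path file) {
        return Files.exists(file)
                ? InMemoryEmbeddingStore.fromFile(file)
                : new InMemoryEmbeddingStore<>();
    }
}
```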
The Dominant Programmer
Resources and tutorials for programmers' advanced learning journey. Advanced tracks in Java, Python, and C#. Blog: https://blog.csdn.net/badao_liumang_qizhi