Building a Private Document Vector Search with SpringBoot, LangChain4j, and Ollama RAG
This guide walks through why Retrieval‑Augmented Generation (RAG) is needed for large language models, explains the three‑step indexing and query workflow, details LangChain4j’s core components, and provides a complete SpringBoot example—including Maven setup, configuration, service code, and troubleshooting—to create a private document‑vector search system powered by Ollama.
Large language models (LLMs) suffer from two main drawbacks: knowledge becomes outdated after training and the models may hallucinate when uncertain, especially on proprietary data. Retrieval‑Augmented Generation (RAG) addresses these issues by first retrieving relevant information from a knowledge base and then generating answers based on that factual context.
RAG Core Process
RAG consists of two phases: indexing and retrieval‑generation. The indexing phase transforms raw documents into searchable vectors through the pipeline Load → Parse → Split → Embed → Store. Loading reads files (PDF, Word, TXT, web pages); parsing extracts plain text; splitting cuts long texts into short TextSegment chunks to fit the model’s context window; embedding converts each chunk into a high‑dimensional vector (e.g., 768‑dim float array); storing saves vectors and original text in a vector database.
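To make the indexing pipeline concrete, here is a deliberately tiny, library-free sketch of the Split → Embed → Store steps. The `embed` function is a stand-in for a real embedding model, and the chunking is naive fixed-size splitting; LangChain4j's own splitters are more sophisticated, so treat this as an illustration of the idea, not the library code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class IndexingSketch {
    // One stored entry: the chunk's vector plus its original text.
    public record Entry(float[] vector, String text) {}

    // Split the document into fixed-size chunks with overlap, embed each
    // chunk, and keep vector and text together (the Store step).
    public static List<Entry> ingest(String doc, Function<String, float[]> embed,
                                     int chunkSize, int overlap) {
        List<Entry> store = new ArrayList<>();
        int step = chunkSize - overlap; // advance by chunk size minus overlap
        for (int start = 0; start < doc.length(); start += step) {
            int end = Math.min(start + chunkSize, doc.length());
            String chunk = doc.substring(start, end);
            store.add(new Entry(embed.apply(chunk), chunk));
            if (end == doc.length()) break; // reached the last chunk
        }
        return store;
    }
}
```

The overlap means consecutive chunks share their boundary characters, so a sentence cut at a chunk edge still appears whole in at least one chunk.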
During query time, the user question is also embedded, the most similar K segments are retrieved (e.g., by cosine similarity), combined with the original question to form an augmented prompt, and finally sent to the LLM for answer generation.
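The retrieval step can likewise be sketched without any framework: cosine similarity scores each stored vector against the query vector, and the top-K indices are kept. This is illustrative only; real vector stores use approximate-nearest-neighbor indexes rather than a full scan:

```java
import java.util.ArrayList;
import java.util.List;

public class SimilaritySketch {
    // Cosine similarity between two embedding vectors of equal length.
    public static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Indices of the k segment vectors most similar to the query vector,
    // most similar first.
    public static List<Integer> topK(float[] query, List<float[]> segments, int k) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < segments.size(); i++) order.add(i);
        order.sort((i, j) -> Double.compare(cosine(query, segments.get(j)),
                                            cosine(query, segments.get(i))));
        return order.subList(0, Math.min(k, order.size()));
    }
}
```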
Key LangChain4j Components for RAG
Document : represents a raw file with text and metadata.
DocumentParser : parses specific formats such as PDF or DOCX.
DocumentSplitter : splits a document into TextSegment chunks.
EmbeddingModel : turns text into vectors.
EmbeddingStore : abstract interface for a vector database.
EmbeddingStoreIngestor : automates Split → Embed → Store.
ContentRetriever : fetches relevant segments for a query.
RetrievalAugmentor : injects retrieved content into the AI request.
AiServices : builds the AI service with RAG capabilities.
Implementation Steps
1. Maven Dependencies (pom.xml)
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>3.2.5</version>
</parent>
<groupId>com.example</groupId>
<artifactId>spring-langchain4j-ollama-rag</artifactId>
<version>1.0</version>
<properties>
    <java.version>17</java.version>
    <langchain4j.version>1.0.0-beta4</langchain4j.version>
</properties>
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-bom</artifactId>
            <version>${langchain4j.version}</version>
            <!-- type/scope are required for a BOM import to take effect -->
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Ollama auto-configuration for the langchain4j.ollama.* properties below;
         if your BOM does not manage it, set <version>${langchain4j.version}</version> -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-ollama-spring-boot-starter</artifactId>
    </dependency>
</dependencies>
2. Pull the Embedding Model
ollama pull nomic-embed-text
3. Application Configuration (application.yml)
langchain4j:
  ollama:
    chat-model:
      base-url: http://localhost:11434
      model-name: qwen2:7b
      temperature: 0.7
      timeout: PT600S         # 10 minutes
      connect-timeout: PT300S # 5 minutes
      read-timeout: PT300S    # 5 minutes
      log-requests: true
      log-responses: true
    embedding-model:
      base-url: http://localhost:11434
      model-name: nomic-embed-text
The chat model generates the final answers, while the embedding model converts documents and queries into vectors; they can be different models because embedding models are usually much lighter.
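The indexing service below autowires an EmbeddingStore&lt;TextSegment&gt; bean that is never defined in the article. A minimal sketch using LangChain4j's in-memory store (class and package names as of LangChain4j 1.0.0-beta4; verify against your version):

```java
package com.badao.ai.config;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EmbeddingStoreConfig {

    // In-memory store: fine for demos, but vectors are lost on restart
    // (see the troubleshooting section on persistent stores).
    @Bean
    public EmbeddingStore<TextSegment> embeddingStore() {
        return new InMemoryEmbeddingStore<>();
    }
}
```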
4. Document Indexing Service
package com.badao.ai.service;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import jakarta.annotation.PostConstruct;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.ClassPathResource;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

@Service
public class DocumentIndexingService {

    @Autowired
    private EmbeddingModel embeddingModel;

    @Autowired
    private EmbeddingStore<TextSegment> embeddingStore;

    @PostConstruct
    public void init() throws IOException {
        // Note: getFile() only works from an exploded classpath,
        // not from inside a packaged jar.
        ClassPathResource resource = new ClassPathResource("knowledge/knowledge.txt");
        Path docPath = resource.getFile().toPath();
        Document document = FileSystemDocumentLoader.loadDocument(docPath);
        List<Document> documents = List.of(document);
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .documentSplitter(DocumentSplitters.recursive(500, 50)) // 500-char chunks, 50-char overlap
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build();
        ingestor.ingest(documents);
        System.out.println("Documents indexed: " + documents.size() + " file(s).");
    }
}
The ingestor encapsulates the full Split → Embed → Store flow; DocumentSplitters.recursive(500, 50) splits each document into 500-character chunks with a 50-character overlap so that information at chunk boundaries is not lost.
5. AI Service Configuration with Retrieval Augmentation
package com.badao.ai.config;

import com.badao.ai.service.RAGAssistant;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.store.embedding.EmbeddingStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RagAiConfig {

    @Bean
    public RAGAssistant ragAssistant(ChatModel chatModel,
                                     EmbeddingStore<TextSegment> embeddingStore,
                                     EmbeddingModel embeddingModel) {
        var contentRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(embeddingStore)
                .embeddingModel(embeddingModel)
                .maxResults(3)  // return up to 3 relevant chunks
                .minScore(0.7)  // minimum similarity threshold
                .build();
        var retrievalAugmentor = DefaultRetrievalAugmentor.builder()
                .contentRetriever(contentRetriever)
                .build();
        return AiServices.builder(RAGAssistant.class)
                .chatModel(chatModel)
                .retrievalAugmentor(retrievalAugmentor)
                .build();
    }
}
Key components: EmbeddingStoreContentRetriever fetches similar segments; DefaultRetrievalAugmentor inserts them into the prompt; AiServices.builder(...).retrievalAugmentor(...) binds the augmentor to the AI service.
6. Assistant Interface
package com.badao.ai.service;

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;

public interface RAGAssistant {

    @SystemMessage("You are a knowledge-base assistant. Answer based on the provided context. If the answer is not found, state that clearly.")
    String chat(@UserMessage String userMessage);
}
7. REST Controller
package com.badao.ai.controller;

import com.badao.ai.service.RAGAssistant;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/rag")
public class RAGController {

    private final RAGAssistant ragAssistant;

    public RAGController(RAGAssistant ragAssistant) {
        this.ragAssistant = ragAssistant;
    }

    @GetMapping("/ask")
    public String ask(@RequestParam String question) {
        return ragAssistant.chat(question);
    }
}
8. Testing the System
Create knowledge.txt under src/main/resources/knowledge with sample policies, e.g., work hours and annual leave rules. Start the SpringBoot application and call:

http://localhost:885/rag/ask?question=员工几点下班?

(The query asks, in Chinese, "What time do employees get off work?") The LLM returns an answer grounded in the indexed policy.
Common Issues and Solutions
Local embedding model fails to load (AccessDeniedException): switch to Ollama’s nomic-embed-text model to avoid DLL problems.
Irrelevant retrieval results: adjust chunk size or try a different embedding model such as bge-m3 or all-MiniLM-L6-v2.
LLM ignores retrieved context: add explicit instructions in the @SystemMessage to “answer based on the provided material”.
Slow responses: use a faster embedding model, enable GPU, or cache frequent retrieval results.
Data loss after restart: switch from in‑memory storage to a persistent vector database like Redis or Chroma.
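Before adopting a full vector database, the in-memory store can be snapshotted to disk as a stopgap. The serialize/deserialize methods below belong to LangChain4j's InMemoryEmbeddingStore; verify they exist with these signatures in your version:

```java
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import java.nio.file.Files;
import java.nio.file.Path;

public class StorePersistence {
    // Save the in-memory store on shutdown so re-indexing is not needed
    // after every restart. A real vector database (Redis, Chroma, ...)
    // remains the better long-term option.
    public static void save(InMemoryEmbeddingStore<TextSegment> store, Path file) {
        store.serializeToFile(file);
    }

    // Reload a previously saved store on startup, or create a fresh one.
    public static InMemoryEmbeddingStore<TextSegment> loadOrCreate(Path file) {
        return Files.exists(file)
                ? InMemoryEmbeddingStore.fromFile(file)
                : new InMemoryEmbeddingStore<>();
    }
}
```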
The Dominant Programmer
Resources and tutorials for programmers' advanced learning journey. Advanced tracks in Java, Python, and C#. Blog: https://blog.csdn.net/badao_liumang_qizhi