Java RAG Tutorial: Vector Search and Knowledge‑Base Integration

This article explains how to equip a Java application with Retrieval‑Augmented Generation (RAG) so large language models can access private PDFs, Word files, and internal documents, covering the core architecture, two implementation paths using LangChain4j and Spring AI, vector‑store options, and practical tuning techniques.

Coder Trainee

Jun 20, 2026

Introduction

The previous two installments demonstrated basic calls and tool calls with Spring AI, but a large language model cannot read private data on its own. This article introduces Retrieval‑Augmented Generation (RAG) to give Java applications an "external brain" that can draw on PDFs, Word files, and internal documents.

Core RAG Architecture

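The fundamental idea of RAG is to retrieve first and generate second: relevant chunks are fetched from a vector store and injected into the prompt before the model answers. The complete pipeline is illustrated below: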

┌────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                       Full RAG pipeline                                        │
├────────────────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                                │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐                                          │
│   │  Load docs  │ → │  Split docs │ → │  Vectorize  │                                          │
│   │  PDF/Word   │   │  Chunking   │   │  Embedding  │                                          │
│   └─────────────┘   └─────────────┘   └──────┬──────┘                                          │
│                                              │                                                 │
│                                              ▼                                                 │
│                                       ┌─────────────┐                                          │
│                                       │  Vector DB  │                                          │
│                                       │  Chroma/MV  │                                          │
│                                       └──────┬──────┘                                          │
│                                              │                                                 │
│                                              ▼                                                 │
│        User question ─► Embedding ─► Similarity search ─► Context injection ─► LLM generation  │
└────────────────────────────────────────────────────────────────────────────────────────────────┘
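
To make the two halves of the diagram concrete, here is a minimal, dependency-free sketch of the same flow. It is not how LangChain4j or Spring AI work internally: the class name `ToyRagPipeline` is hypothetical, the "embedding" is just a bag-of-words count map standing in for a real embedding model, and the store is a plain list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy end-to-end RAG flow: embed chunks, store them, retrieve top-k, build the prompt. */
final class ToyRagPipeline {

    /** A stored chunk together with its (sparse) vector. */
    record Chunk(String text, Map<String, Integer> vector) {}

    private final List<Chunk> store = new ArrayList<>();

    /** Stand-in embedding (assumption): lowercase word counts instead of a real model. */
    static Map<String, Integer> embed(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine similarity between two sparse count vectors. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int count : b.values()) {
            normB += count * count;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Indexing side: vectorize a chunk and keep it in the store. */
    void index(String chunk) {
        store.add(new Chunk(chunk, embed(chunk)));
    }

    /** Query side: vectorize the question and return the k most similar chunks. */
    List<String> retrieve(String question, int k) {
        Map<String, Integer> q = embed(question);
        return store.stream()
                .sorted(Comparator.comparingDouble((Chunk c) -> -cosine(q, c.vector())))
                .limit(k)
                .map(Chunk::text)
                .toList();
    }

    /** Context injection: place the retrieved chunks in front of the question. */
    String buildPrompt(String question, int k) {
        return "Answer using only this context:\n"
                + String.join("\n", retrieve(question, k))
                + "\n\nQuestion: " + question;
    }

    public static void main(String[] args) {
        ToyRagPipeline rag = new ToyRagPipeline();
        rag.index("Refunds are processed within 14 days of the return request.");
        rag.index("The office cafeteria opens at 8 am on weekdays.");
        // The resulting prompt is what would be sent to the LLM
        System.out.println(rag.buildPrompt("How long do refunds take?", 1));
    }
}
```

The top row of the diagram corresponds to `index`, the bottom row to `retrieve` and `buildPrompt`. A real setup swaps in an embedding model and a vector database but keeps the same shape.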

Option 1 – Quick Start with LangChain4j

LangChain4j is one of the more mature RAG frameworks in the Java ecosystem, offering rich features and a declarative programming style.

2.1 Core Dependencies

<!-- pom.xml -->
<properties>
  <langchain4j.version>0.35.0</langchain4j.version>
</properties>

<dependencies>
  <!-- LangChain4j core -->
  <dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j</artifactId>
    <version>${langchain4j.version}</version>
  </dependency>

  <!-- OpenAI integration -->
  <dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-open-ai</artifactId>
    <version>${langchain4j.version}</version>
  </dependency>

  <!-- Local embedding model (all-MiniLM-L6-v2, runs in-process); the in-memory store itself ships with langchain4j core -->
  <dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId>
    <version>${langchain4j.version}</version>
  </dependency>

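  <!-- Document parser for Word/Excel/PowerPoint (Apache POI) -->
  <dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-document-parser-apache-poi</artifactId>
    <version>${langchain4j.version}</version>
  </dependency>

  <!-- Document parser for PDF (Apache PDFBox) -->
  <dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-document-parser-apache-pdfbox</artifactId>
    <version>${langchain4j.version}</version>
  </dependency>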
</dependencies>

2.2 Document Ingestion Pipeline

// config/DocumentIngestionConfig.java
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.parser.TextDocumentParser;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.nio.file.Paths;
import java.util.List;

@Configuration
public class DocumentIngestionConfig {

    @Bean
    public EmbeddingStore<TextSegment> embeddingStore() {
        // In‑memory store for development; replace with Chroma, PgVector, etc. in production
        return new InMemoryEmbeddingStore<>();
    }

    @Bean
    public EmbeddingStoreIngestor embeddingStoreIngestor(
            EmbeddingStore<TextSegment> embeddingStore,
            EmbeddingModel embeddingModel) {
        return EmbeddingStoreIngestor.builder()
                .documentSplitter(DocumentSplitters.recursive(500, 100))
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build();
    }

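    @Bean
    public Boolean loadKnowledgeBase(EmbeddingStoreIngestor ingestor,
                                    @Value("${app.knowledge-base-path:./knowledge}") String path) {
        // Load every file in the directory. TextDocumentParser only reads plain text;
        // for PDF or Word files, pass ApachePdfBoxDocumentParser or ApachePoiDocumentParser
        // (from the matching langchain4j-document-parser-* module) instead.
        List<Document> documents = FileSystemDocumentLoader.loadDocuments(
                Paths.get(path), new TextDocumentParser());
        ingestor.ingest(documents);
        System.out.println("✅ Knowledge base loaded: " + documents.size() + " document(s)");
        // Returning a Boolean bean is simply a way to run ingestion once at startup
        return true;
    }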
}

2.3 Declarative AI Service

// service/KnowledgeBaseAssistant.java
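import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;

public interface KnowledgeBaseAssistant {

    // The @UserMessage parameter is sent as the user message as-is,
    // so no @V template variable is needed here.
    @SystemMessage("""
        You are a professional customer-service assistant who answers questions about the company's products and services.
        Answer strictly from the provided material. If the material contains no relevant information, say clearly that you cannot answer.
        Keep answers friendly, concise, and accurate.
        """)
    String answerQuestion(@UserMessage String question);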
}

2.4 Assemble the AI Service (Inject RAG Capability)

// config/AIConfig.java
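import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.store.embedding.EmbeddingStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AIConfig {

    // ChatLanguageModel and EmbeddingModel beans are assumed to be defined elsewhere,
    // for example by a LangChain4j Spring Boot starter or your own @Bean methods.
    @Bean
    public KnowledgeBaseAssistant knowledgeBaseAssistant(
            ChatLanguageModel chatLanguageModel,
            EmbeddingStore<TextSegment> embeddingStore,
            EmbeddingModel embeddingModel) {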

        EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(embeddingStore)
                .embeddingModel(embeddingModel)
                .maxResults(3)
                .build();

        return AiServices.builder(KnowledgeBaseAssistant.class)
                .chatLanguageModel(chatLanguageModel)
                .contentRetriever(retriever)
                .build();
    }
}

2.5 Expose a REST API

// controller/AssistantController.java
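import org.springframework.http.HttpStatus;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.server.ResponseStatusException;
import java.util.Map;

@RestController
@RequestMapping("/api/assistant")
public class AssistantController {

    private final KnowledgeBaseAssistant assistant;

    public AssistantController(KnowledgeBaseAssistant assistant) {
        this.assistant = assistant;
    }

    @PostMapping("/ask")
    public Map<String, String> askQuestion(@RequestBody Map<String, String> request) {
        String question = request.get("question");
        // Map.of rejects null values, so reject a missing or blank question up front
        if (question == null || question.isBlank()) {
            throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "question is required");
        }
        String answer = assistant.answerQuestion(question);
        return Map.of("question", question, "answer", answer);
    }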
}

With the above configuration, the KnowledgeBaseAssistant interface is dynamically proxied by LangChain4j, automatically executing the full "retrieve → augment → generate" RAG flow. Only the interface and beans need to be defined, keeping the code footprint minimal.
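
For readers curious what "dynamically proxied" means in practice, the following is a rough sketch built on the JDK's `java.lang.reflect.Proxy`. It is not LangChain4j's actual implementation: `ProxyRagSketch`, the nested `Assistant` interface, and the two `Function` parameters are hypothetical stand-ins for the retriever and the chat model.

```java
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.function.Function;

/** Sketch of how an interface-only assistant could be backed by a dynamic proxy. */
final class ProxyRagSketch {

    /** Mirrors the shape of the KnowledgeBaseAssistant interface. */
    interface Assistant {
        String answerQuestion(String question);
    }

    /**
     * Builds a proxy whose method call runs retrieve -> augment -> generate.
     * retriever and llm are plain functions standing in for the real components.
     */
    static Assistant create(Function<String, List<String>> retriever,
                            Function<String, String> llm) {
        return (Assistant) Proxy.newProxyInstance(
                Assistant.class.getClassLoader(),
                new Class<?>[] {Assistant.class},
                (proxy, method, args) -> {
                    if (!method.getName().equals("answerQuestion")) {
                        throw new UnsupportedOperationException(method.getName());
                    }
                    String question = (String) args[0];
                    List<String> context = retriever.apply(question);   // retrieve
                    String prompt = "Context:\n" + String.join("\n", context)
                            + "\nQuestion: " + question;                 // augment
                    return llm.apply(prompt);                            // generate
                });
    }

    public static void main(String[] args) {
        Assistant assistant = create(
                q -> List.of("Support is available 9:00-18:00 on weekdays."),
                prompt -> "[fake model received " + prompt.length() + " chars]");
        System.out.println(assistant.answerQuestion("When is support available?"));
    }
}
```

The caller only ever sees the interface. Everything between the method call and the returned string is wired in when the proxy is built, which is why the application code stays so small.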

Option 2 – Spring AI VectorStore Integration

Spring AI provides a VectorStore abstraction that feels natural to developers already familiar with the Spring ecosystem.

3.1 Dependency Configuration (MariaDB example)

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-vector-store-mariadb</artifactId>
</dependency>

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

3.2 Configuration File

spring:
  datasource:
    url: jdbc:mariadb://localhost:3306/db
    username: myUser
    password: myPassword
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
    vectorstore:
      mariadb:
        initialize-schema: true
        distance-type: COSINE
        dimensions: 1536   # must match the embedding model's output size

3.3 VectorStore Operations

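// service/KnowledgeBaseSearchService.java (class name is illustrative)
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.Map;

@Service
public class KnowledgeBaseSearchService {

    // Provided by the vector-store starter's auto-configuration
    private final VectorStore vectorStore;

    public KnowledgeBaseSearchService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    // Embeds the documents and writes them to the store
    public void addDocuments() {
        List<Document> documents = List.of(
            new Document("Spring AI's vector store is very convenient", Map.of("source", "docs")),
            new Document("RAG stands for Retrieval-Augmented Generation", Map.of("source", "docs"))
        );
        vectorStore.add(documents);
    }

    // Returns at most 5 documents whose similarity to the query is at least 0.7
    public List<Document> search(String query) {
        return vectorStore.similaritySearch(
            SearchRequest.builder()
                .query(query)
                .topK(5)
                .similarityThreshold(0.7)
                .build()
        );
    }
}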

Spring AI also offers starters for other mainstream vector databases, allowing flexible switching based on business needs.

Supported Vector Databases

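The scale figures below are rough rules of thumb rather than hard limits.

InMemoryStore (InMemoryEmbeddingStore in LangChain4j, SimpleVectorStore in Spring AI) – embedded, suitable for development, comfortable with a few thousand vectors, data lost on restart.

MariaDB – native vector support from version 11.7, lightweight, no extra deployment if MariaDB is already in use, a reasonable fit for around ten thousand vectors.

Chroma – deployed as a separate service, open‑source, friendly for beginners, a reasonable fit for around one hundred thousand vectors.

Milvus – distributed, production‑grade, designed for collections of up to billions of vectors.

Pinecone – cloud‑managed service, pay‑as‑you‑go, scaling handled by the provider.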

Practical RAG Tuning Strategies

5.1 Document Chunking Strategy

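// Fixed-length chunking cuts at a character count; semantic chunking cuts at natural boundaries.
// Recommended: split on semantic boundaries (paragraphs, sections).
// recursive(maxSegmentSizeInChars, maxOverlapSizeInChars) tries paragraphs first, then
// falls back to lines, sentences, and words when a piece is still too long.
DocumentSplitter splitter = DocumentSplitters.recursive(500, 100);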
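
To show what the two numbers control, here is a simplified sketch of fixed-length chunking with overlap. It is deliberately not the recursive, boundary-aware splitter from LangChain4j: the hypothetical `OverlapChunker` below just slides a character window, which makes the effect of size and overlap easy to see.

```java
import java.util.ArrayList;
import java.util.List;

/** Fixed-length chunking with overlap (a simplification, not LangChain4j's recursive splitter). */
final class OverlapChunker {

    /** Splits text into windows of at most maxChars; each window repeats the last `overlap` chars of the previous one. */
    static List<String> split(String text, int maxChars, int overlap) {
        if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
            throw new IllegalArgumentException("need 0 <= overlap < maxChars");
        }
        List<String> chunks = new ArrayList<>();
        int step = maxChars - overlap;               // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChars, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break;                               // the last window reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 1200 characters with size 500 and overlap 100 -> windows start at 0, 400, 800
        System.out.println(split("x".repeat(1200), 500, 100).size()); // prints 3
        System.out.println(split("abcdefghij", 4, 1));               // prints [abcd, defg, ghij]
    }
}
```

The overlap keeps a sentence that straddles a cut visible in both neighbouring chunks, at the cost of storing and embedding some text twice.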

5.2 Hybrid Retrieval

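Pure vector search may miss exact keyword matches. Combining it with BM25 full‑text search can significantly improve recall. The Elasticsearch query below is a simplified illustration: it ranks by the BM25 score of the match clause and applies a weight of 0.7 to documents that contain a vector field. It does not run a vector search by itself, so a full hybrid setup also issues a kNN query and merges the two result lists.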

{
  "query": {
    "function_score": {
      "query": {
        "match": { "content": "Java RAG" }
      },
      "functions": [
        { "filter": { "exists": { "field": "vector" } }, "weight": 0.7 }
      ]
    }
  }
}
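
One common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). This is a technique swapped in here for illustration, not something the query above performs. In the sketch below, `ReciprocalRankFusion` and the document IDs are hypothetical, and the constant 60 is a conventional default rather than a tuned value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Merges several ranked ID lists with reciprocal rank fusion. */
final class ReciprocalRankFusion {

    /** Each list contributes 1 / (k + rank) to a document's score, with rank starting at 1. */
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first; ties are broken by ID so the order is deterministic
        fused.sort(Comparator.comparingDouble((String id) -> -scores.get(id))
                .thenComparing(Comparator.naturalOrder()));
        return fused;
    }

    public static void main(String[] args) {
        List<String> bm25 = List.of("doc-keyword", "doc-both", "doc-other");
        List<String> vector = List.of("doc-both", "doc-semantic", "doc-keyword");
        System.out.println(fuse(List.of(bm25, vector), 60));
        // prints [doc-both, doc-keyword, doc-semantic, doc-other]
    }
}
```

Because RRF only looks at ranks, it avoids the problem of BM25 scores and cosine similarities living on different scales.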

5.3 Metadata Filtering

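// Only documents whose "author" metadata equals the given value are considered
SearchRequest request = SearchRequest.builder()
    .query("Spring")
    .topK(5)
    .filterExpression("author == 'Zhang San'")
    .build();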
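
Conceptually, a filtered search narrows the candidates by metadata, drops weak matches, and then keeps the best few. The sketch below illustrates that order of operations on an in-memory list. It is an assumption-laden illustration rather than how any particular vector store executes filters: `FilteredSearchSketch` and `Scored` are hypothetical, and the similarity scores are taken as already computed.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Illustrates metadata filter -> similarity threshold -> top-k on an in-memory list. */
final class FilteredSearchSketch {

    /** A candidate with metadata and a precomputed similarity to the query. */
    record Scored(String text, Map<String, String> metadata, double similarity) {}

    static List<String> search(List<Scored> candidates, String key, String value,
                               double threshold, int topK) {
        return candidates.stream()
                .filter(c -> value.equals(c.metadata().get(key)))   // metadata filter
                .filter(c -> c.similarity() >= threshold)           // similarity threshold
                .sorted(Comparator.comparingDouble(Scored::similarity).reversed())
                .limit(topK)                                        // top-k
                .map(Scored::text)
                .toList();
    }

    public static void main(String[] args) {
        List<Scored> candidates = List.of(
                new Scored("A", Map.of("author", "alice"), 0.90),
                new Scored("B", Map.of("author", "bob"), 0.95),
                new Scored("C", Map.of("author", "alice"), 0.60),
                new Scored("D", Map.of("author", "alice"), 0.80));
        System.out.println(search(candidates, "author", "alice", 0.7, 5)); // prints [A, D]
    }
}
```

Note that "B" is excluded despite having the highest score: the metadata filter restricts the search space before ranking happens.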

5.4 Batch Processing and Token Control

Spring AI includes a TokenCountBatchingStrategy that groups documents into batches based on estimated token counts, so each embedding request stays within the model's input token limit instead of failing on oversized batches.
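
The idea behind token-based batching can be sketched in a few lines. This is not Spring AI's implementation: `TokenBudgetBatcher` is a hypothetical class, and counting whitespace-separated words is only a crude stand-in for a real tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Greedy batching under a token budget (an illustration, not Spring AI's implementation). */
final class TokenBudgetBatcher {

    /** Crude estimate (assumption): one token per whitespace-separated word. */
    static int estimateTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Packs texts, in order, into batches whose estimated total stays within maxTokens. */
    static List<List<String>> batch(List<String> texts, int maxTokens) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int used = 0;
        for (String text : texts) {
            int tokens = estimateTokens(text);
            if (tokens > maxTokens) {
                throw new IllegalArgumentException("single text exceeds the token budget: " + tokens);
            }
            if (used + tokens > maxTokens && !current.isEmpty()) {
                batches.add(current);            // close the full batch and start a new one
                current = new ArrayList<>();
                used = 0;
            }
            current.add(text);
            used += tokens;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(batch(List.of("a b c", "d e", "f g h i", "j"), 5));
        // prints [[a b c, d e], [f g h i, j]]
    }
}
```

A text that is larger than the budget on its own cannot be batched at all, which is one more reason to chunk documents before embedding them.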

Next Episode Preview

The upcoming article will cover practical Java Function Calling, including multi‑tool collaboration, complex parameter parsing, and tool‑chain orchestration.

💡 The code repository was provided in the first episode.
