
Implementing Retrieval‑Augmented Generation (RAG) with LangChain4j in Java

This article provides a step‑by‑step guide for Java engineers on building a Retrieval‑Augmented Generation (RAG) application using the LangChain4j framework, covering RAG fundamentals, environment setup, Maven integration, document loading, splitting, embedding with OpenAI, vector store management with Chroma, and prompt‑based LLM interaction.

JD Tech

Introduction

The rapid rise of large language models (LLMs) such as ChatGPT has highlighted the need for Retrieval‑Augmented Generation (RAG) to keep responses up‑to‑date and incorporate private enterprise data securely.

What is RAG?

RAG combines traditional information retrieval (IR) with generative LLMs: relevant document fragments are first retrieved from a knowledge base and then supplied as context to the LLM, improving answer accuracy and relevance.

LangChain4j Overview

LangChain4j is the Java implementation of the LangChain framework. It abstracts LLM interaction, prompt handling, document splitting, embedding, and vector‑store operations, allowing developers to focus on business logic.

Differences Between LLM Development and Traditional Java Development

LLM development emphasizes data preparation, model selection/fine‑tuning, prompt engineering, and integration of external knowledge, whereas traditional Java development focuses on system architecture, modular design, and algorithm implementation.

Environment Setup

Install Python (the Chroma server is distributed as a Python package) and verify the installation with python --version. Install Chroma with pip install chromadb and start the server with chroma run. On macOS, Python can be installed via Homebrew (brew install python); then repeat the same verification steps.

Integrating LangChain4j

<properties>
    <langchain4j.version>0.31.0</langchain4j.version>
</properties>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-core</artifactId>
    <version>${langchain4j.version}</version>
</dependency>
... (additional dependencies for OpenAI, embeddings, Chroma, etc.)
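For reference, the OpenAI and Chroma integrations live in separate LangChain4j artifacts. The coordinates below are a sketch matching the 0.31.x line; verify them against the release notes for your version:

```xml
<!-- OpenAI chat, tokenizer, and embedding models -->
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-open-ai</artifactId>
    <version>${langchain4j.version}</version>
</dependency>
<!-- Chroma vector store integration -->
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-chroma</artifactId>
    <version>${langchain4j.version}</version>
</dependency>
```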

Project Structure

A typical Maven layout is shown, with source code under src/main/java (classes ChatWithMemory, Constants, Main, RagChat, Utils) and resources such as log4j2.xml and 笑话.txt ("jokes.txt", the sample knowledge file used throughout the article).

Knowledge Acquisition

URL docUrl = Main.class.getClassLoader().getResource("笑话.txt");
if (docUrl == null) {
    log.error("Resource file 笑话.txt not found");
    return; // bail out early instead of passing a null URL on
}
Document document = getDocument(docUrl);

Document Splitting

Use DocumentSplitters.recursive(150, 10, new OpenAiTokenizer()) to split the text into segments of at most 150 tokens, with a 10-token overlap between neighbouring segments so that semantic continuity is preserved across segment boundaries.
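To make the overlap idea concrete, here is a self-contained toy splitter. Unlike LangChain4j's recursive splitter it works on characters rather than tokens and ignores sentence boundaries, but it shows how each chunk repeats the tail of its predecessor:

```java
import java.util.ArrayList;
import java.util.List;

public class ToySplitter {

    // Splits text into chunks of at most maxLen characters; each chunk
    // starts `overlap` characters before the end of the previous one,
    // so content cut at a boundary keeps some surrounding context.
    static List<String> split(String text, int maxLen, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = maxLen - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxLen, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 26 characters, chunk size 10, overlap 3:
        for (String c : split("abcdefghijklmnopqrstuvwxyz", 10, 3)) {
            System.out.println(c);
        }
    }
}
```

Each consecutive pair of chunks shares its last/first three characters, which is exactly the role the 10-token overlap plays in the real splitter.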

Token Basics

Tokens are the units produced by a tokenizer (e.g., BPE for OpenAI models). For example, tokenizing the Chinese sentence "我喜欢吃苹果" ("I like to eat apples") yields four tokens.

Embedding (Vectorisation)

OpenAiEmbeddingModel embeddingModel = new OpenAiEmbeddingModel.OpenAiEmbeddingModelBuilder()
    .apiKey(API_KEY)
    .baseUrl(BASE_URL)
    .build();
String text = "两只眼睛"; // "two eyes"
Embedding embedding = embeddingModel.embed(text).content();
log.info("Vector dimension: {}", embedding.dimension());

The model text-embedding-ada-002 returns a 1536‑dimensional vector.
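Similarity between two such vectors is typically measured with cosine similarity. A minimal stand-alone implementation, independent of LangChain4j, looks like this:

```java
public class CosineSimilarity {

    // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1];
    // 1.0 means the vectors point in exactly the same direction.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] v1 = {1f, 0f, 1f};
        float[] v2 = {1f, 0f, 1f};
        float[] v3 = {0f, 1f, 0f};
        System.out.println(cosine(v1, v2)); // identical vectors -> 1.0
        System.out.println(cosine(v1, v3)); // orthogonal vectors -> 0.0
    }
}
```

The same computation, applied to 1536-dimensional embedding vectors, is what the vector store uses to rank stored segments against a query.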

Vector Store (Chroma) Setup

// The Chroma admin client creates the collection up front.
Client client = new Client(CHROMA_URL);
EmbeddingFunction embeddingFunction = new OpenAIEmbeddingFunction(API_KEY, OPEN_AI_MODULE_NAME);
client.createCollection(CHROMA_DB_DEFAULT_COLLECTION_NAME, null, true, embeddingFunction);
// LangChain4j's store handles reads and writes against that collection.
EmbeddingStore<TextSegment> embeddingStore = ChromaEmbeddingStore.builder()
    .baseUrl(CHROMA_URL)
    .collectionName(CHROMA_DB_DEFAULT_COLLECTION_NAME)
    .build();
// Embed each segment and persist the vector together with its original text.
segments.forEach(segment -> {
    Embedding embedding = embeddingModel.embed(segment).content();
    embeddingStore.add(embedding, segment);
});

Retrieval from Vector Store

Embedding queryEmbedding = embeddingModel.embed(qryText).content();
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(1) // return only the single best match
    .build();
EmbeddingSearchResult<TextSegment> result = embeddingStore.search(request);
TextSegment matched = result.matches().get(0).embedded(); // assumes at least one match was found
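Conceptually, what the vector store does for this request can be mimicked with a brute-force search: score every stored segment against the query vector and keep the top-k. The sketch below is a toy in-memory equivalent, not Chroma's actual (index-based) implementation:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class ToyVectorSearch {

    record Hit(String text, double score) {}

    // Cosine similarity between two vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Brute-force nearest-neighbour search: score every stored segment
    // against the query and return the top-k hits by cosine similarity.
    static List<Hit> search(Map<String, float[]> store, float[] query, int k) {
        List<Hit> hits = new ArrayList<>();
        store.forEach((text, vec) -> hits.add(new Hit(text, cosine(vec, query))));
        hits.sort(Comparator.comparingDouble(Hit::score).reversed());
        return hits.subList(0, Math.min(k, hits.size()));
    }

    public static void main(String[] args) {
        Map<String, float[]> store = Map.of(
                "joke about eyes", new float[]{0.9f, 0.1f},
                "joke about apples", new float[]{0.1f, 0.9f});
        float[] query = {1f, 0f}; // query vector closest to "joke about eyes"
        System.out.println(search(store, query, 1).get(0).text());
    }
}
```

With maxResults(1), the real store returns exactly the analogue of this list's first element.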

LLM Interaction

String promptTemplate = "Answer based on the following information:\n{{context}}\nQuestion:\n{{question}}";
PromptTemplate prompt = PromptTemplate.from(promptTemplate);
Map<String, Object> vars = new HashMap<>();
vars.put("context", matched.text());
vars.put("question", QUESTION);
Prompt promptInst = prompt.apply(vars);
UserMessage userMessage = promptInst.toUserMessage();
OpenAiChatModel chatModel = OpenAiChatModel.builder()
    .apiKey(API_KEY)
    .baseUrl(BASE_URL)
    .modelName(OPEN_AI_MODULE_NAME)
    .temperature(0.0)
    .build();
Response<AiMessage> response = chatModel.generate(userMessage);
String answer = response.content().text(); // content() returns an AiMessage; text() extracts the string
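The {{context}} and {{question}} placeholders are plain template substitution. A stand-alone sketch of what PromptTemplate.apply does (simplified; the real class also validates that all variables are supplied) could look like:

```java
import java.util.Map;

public class ToyPromptTemplate {

    // Replaces each {{name}} placeholder with the corresponding map value.
    static String apply(String template, Map<String, Object> vars) {
        String result = template;
        for (Map.Entry<String, Object> e : vars.entrySet()) {
            result = result.replace("{{" + e.getKey() + "}}", String.valueOf(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        String t = "Answer based on the following information:\n{{context}}\nQuestion:\n{{question}}";
        System.out.println(apply(t, Map.of(
                "context", "A joke about two eyes",
                "question", "Tell me the joke")));
    }
}
```

The filled-in string is what ultimately reaches the model as the user message, with the retrieved segment inlined as context.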

Testing and Results

The article demonstrates both a direct LLM query and a RAG‑enhanced query using the stored jokes. The RAG version returns a response that incorporates the retrieved joke fragment, illustrating the benefit of contextual retrieval.

Conclusion & Outlook

Through a concrete Java example, the guide shows how to build a RAG‑enabled LLM application with LangChain4j, highlighting the importance of document preprocessing, embedding, vector storage, and prompt engineering. Continued exploration of RAG will unlock more intelligent and efficient solutions across business scenarios.

Tags: Java, LLM, RAG, vector database, embedding, LangChain4j
Written by

JD Tech

Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.
