Practical Guide to Building Retrieval‑Augmented Generation (RAG) Applications with LangChain4j in Java
This article provides a step‑by‑step tutorial for Java engineers on using the LangChain4j framework to implement Retrieval‑Augmented Generation (RAG) with large language models, covering concepts, environment setup, code integration, document splitting, embedding, vector‑store operations, and prompt engineering.
1. Introduction
ChatGPT and similar large language models are pre-trained, so their knowledge stops at a fixed training-data cutoff (for GPT-4o, in 2023). To obtain up-to-date answers, Retrieval-Augmented Generation (RAG) is required. RAG is also needed for private, on-premise data that cannot be uploaded to the internet.
2. Core Concepts
2.1 What is RAG?
RAG (Retrieval‑Augmented Generation) combines traditional information retrieval (IR) with generative large models. Before generating an answer, the system retrieves relevant document fragments from a knowledge base and feeds them as additional context to the LLM, improving accuracy and richness.
The workflow consists of four steps:
Receive request: the user asks a question.
Retrieve (R): the system searches a large document collection for the most relevant fragments.
Augment (A): retrieved fragments are combined with the original query and passed to the LLM using a prompt such as "Please answer the question based on the following context: ...".
Generate (G): the LLM produces the final answer.
Although a relational database or full‑text engine (e.g., MySQL, Elasticsearch) can be used for retrieval, vector databases are preferred because they excel at similarity search rather than exact matching.
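To make the similarity-versus-exact-matching point concrete, here is a minimal, dependency-free Java sketch of cosine similarity, the measure most vector stores use to rank fragments. The vectors are toy values for illustration, not real embeddings:

```java
public class CosineSimilarity {

    // Cosine similarity: dot(a, b) / (|a| * |b|); closer to 1 means more similar
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional "embeddings" (real ones have hundreds or thousands of dimensions)
        double[] query = {1.0, 0.5, 0.0};
        double[] docA  = {0.9, 0.4, 0.1};  // semantically close to the query
        double[] docB  = {0.0, 0.1, 1.0};  // unrelated
        System.out.printf("query vs docA: %.3f%n", cosine(query, docA));
        System.out.printf("query vs docB: %.3f%n", cosine(query, docB));
    }
}
```

A keyword index would score docA and docB identically if neither shares a term with the query; ranking by vector similarity is what lets RAG surface semantically related fragments.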
2.2 LangChain4j Overview
LangChain4j is the Java implementation of the LangChain framework. "Lang" stands for Large Language Model and "Chain" denotes the modular, chained execution of LLM‑related functions. The library abstracts LLM integration details, simplifying development and improving productivity for Java engineers.
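The "chain" idea can be pictured with plain java.util.function composition: each stage transforms the output of the previous one, which is roughly how LangChain4j wires retrieval, prompt building, and generation together. The stage names below are hypothetical illustrations, not library API:

```java
import java.util.function.Function;

public class ChainSketch {

    // Hypothetical stages -- illustrative names, not LangChain4j classes
    static final Function<String, String> RETRIEVE = q -> q + " | context: <retrieved fragment>";
    static final Function<String, String> BUILD_PROMPT = c -> "Answer using: " + c;
    static final Function<String, String> CALL_MODEL = p -> "LLM answer for [" + p + "]";

    static String run(String question) {
        // Chained execution: each stage consumes the previous stage's output
        return RETRIEVE.andThen(BUILD_PROMPT).andThen(CALL_MODEL).apply(question);
    }

    public static void main(String[] args) {
        System.out.println(run("What is RAG?"));
    }
}
```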
2.3 Large‑Model Development vs. Traditional Java Development
Large‑model development focuses on data preparation, model selection, fine‑tuning, prompt engineering, and integrating the LLM into existing systems.
Traditional Java development emphasizes system architecture, module design, and algorithm implementation, with the business logic written directly in code.
3. Hands‑On Experience
3.1 Environment Setup
3.1.1 Vector Store (Chroma)
Windows:
Install Python from the official site and verify with:
python --version
Then install Chroma following the official guide and start it with:
chroma run
macOS:
brew install python
Or download from python.org, then verify with python --version and install Chroma similarly.
3.1.2 Integrate LangChain4j
<properties>
<langchain4j.version>0.31.0</langchain4j.version>
</properties>
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-core</artifactId>
<version>${langchain4j.version}</version>
</dependency>
... (other required dependencies) ...
3.2 Programming Steps
3.2.1 Project Structure
LangChain
├── core
│ ├── src
│ │ ├── main
│ │ │ ├── java
│ │ │ │ └── cn.jdl.tech_and_data.ka
│ │ │ │ ├── ChatWithMemory
│ │ │ │ ├── Constants
│ │ │ │ ├── Main
│ │ │ │ ├── RagChat
│ │ │ │ └── Utils
│ │ │ └── resources
│ │ │ ├── log4j2.xml
│ │ │ └── 笑话.txt
│ └── pom.xml
├── parent [learn.langchain.parent]
└── pom.xml
3.2.2 Knowledge Acquisition
Load a local text file (e.g., 笑话.txt ) as the knowledge source:
URL docUrl = Main.class.getClassLoader().getResource("笑话.txt");
if (docUrl == null) {
    log.error("Failed to locate file");
    return; // bail out early: nothing to load
}
Document document = getDocument(docUrl);
if (document == null) {
    log.error("Failed to load file");
    return; // bail out early: splitting would fail on null
}
3.2.3 Document Splitting
Use DocumentSplitters.recursive to split the document into overlapping chunks. The parameters are the maximum chunk size (in tokens), the overlap size (in tokens), and the tokenizer.
DocumentSplitter splitter = DocumentSplitters.recursive(150, 10, new OpenAiTokenizer());
List<TextSegment> segments = splitter.split(document);
Tokens are the basic units produced by tokenization; for Chinese text, a token may correspond to a whole character or even part of one. The article explains token-to-character ratios and shows a logging example.
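The effect of the chunk-size and overlap parameters can be illustrated with a dependency-free sketch. This is a naive character-based splitter, not the actual DocumentSplitters.recursive implementation, which is token-based and hierarchy-aware:

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveSplitter {

    // Split text into windows of `size` characters, each sharing `overlap`
    // characters with the previous chunk (illustrative only)
    static List<String> split(String text, int size, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < text.length(); start += step) {
            chunks.add(text.substring(start, Math.min(start + size, text.length())));
            if (start + size >= text.length()) break;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Size 4 with overlap 2: consecutive chunks share 2 characters
        List<String> chunks = split("abcdefghij", 4, 2);
        System.out.println(chunks);
    }
}
```

The overlap keeps a sentence that straddles a chunk boundary at least partially intact in both neighboring chunks, which improves retrieval recall.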
3.2.4 Text Vectorization
Convert each text segment into an embedding vector using OpenAI’s text-embedding-ada-002 model (1536 dimensions):
OpenAiEmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
        .apiKey(API_KEY)
        .baseUrl(BASE_URL)
        .modelName("text-embedding-ada-002")
        .build();
Embedding embedding = embeddingModel.embed(text).content();
log.info("Embedding dimension: {}", embedding.dimension());
3.2.5 Vector Store Storage
Connect to a Chroma vector database, create a collection, and store each TextSegment together with its embedding:
Client client = new Client(CHROMA_URL);
EmbeddingFunction embeddingFunction = new OpenAIEmbeddingFunction(API_KEY, OPEN_AI_MODULE_NAME);
client.createCollection(CHROMA_DB_DEFAULT_COLLECTION_NAME, null, true, embeddingFunction);
EmbeddingStore<TextSegment> store = ChromaEmbeddingStore.builder()
        .baseUrl(CHROMA_URL)
        .collectionName(CHROMA_DB_DEFAULT_COLLECTION_NAME)
        .build();
segments.forEach(s -> {
    Embedding e = embeddingModel.embed(s).content();
    store.add(e, s);
});
3.2.6 Vector Store Retrieval
Embed the user query, then search the collection for the most similar segment:
Embedding queryEmbedding = embeddingModel.embed(queryText).content();
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .maxResults(1)
        .build();
EmbeddingSearchResult<TextSegment> result = store.search(request);
TextSegment matched = result.matches().get(0).embedded();
3.2.7 Interaction with the LLM
Build a prompt that injects the retrieved context and the original question, then call the chat model:
String prompt = "Based on the following information answer the question:\n{{context}}\nQuestion:\n{{question}}";
PromptTemplate template = PromptTemplate.from(prompt);
Map<String, Object> vars = new HashMap<>();
vars.put("context", matched.text());
vars.put("question", userQuestion);
Prompt finalPrompt = template.apply(vars);
UserMessage userMessage = finalPrompt.toUserMessage();
Response<AiMessage> response = openAiChatModel.generate(userMessage);
String answer = response.content().text();
3.3 Testing and Verification
The article shows log output for both a plain LLM call (without RAG) and an RAG‑enhanced call, demonstrating how the retrieved joke fragment is incorporated into the final answer.
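The contrast between the two calls can be reproduced offline with a stub model, a hypothetical stand-in rather than a LangChain4j class, that only "knows" what appears in its prompt:

```java
public class RagVsPlainDemo {

    // Stub "LLM": answers from the prompt's context if one is present, otherwise punts.
    // Real models behave similarly for knowledge they were never trained on.
    static String generate(String prompt) {
        if (prompt.contains("context:")) {
            return "Answer based on: "
                    + prompt.substring(prompt.indexOf("context:") + 8).trim();
        }
        return "I don't have information about that.";
    }

    public static void main(String[] args) {
        String question = "Tell me the joke from 笑话.txt";
        // Plain call: no retrieved context, so the model cannot answer
        System.out.println(generate(question));
        // RAG call: the retrieved fragment is injected into the prompt
        String retrieved = "a short joke retrieved from the vector store";
        System.out.println(generate(question + "\ncontext: " + retrieved));
    }
}
```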
4. Conclusion and Outlook
This hands‑on guide demonstrates how Java engineers can leverage LangChain4j to build RAG‑enabled applications, covering everything from environment preparation to prompt engineering. Continued exploration of RAG will unlock more intelligent, efficient solutions across diverse business scenarios.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.