Boost Your Java Apps with LangChain4j: A Hands‑On RAG Guide

This article walks Java developers through the fundamentals of Retrieval‑Augmented Generation (RAG), explains the LangChain4j framework, compares large‑model development with traditional Java coding, and provides step‑by‑step code examples for environment setup, document splitting, embedding, vector‑store operations, and LLM interaction.

JD Cloud Developers

Jan 9, 2025

Introduction

ChatGPT and other large language models (LLMs) are pre-trained on a fixed corpus, so on their own they know nothing about private data or about anything published after their training cutoff. Retrieval-Augmented Generation (RAG) closes that gap by retrieving relevant private or recent documents and passing them to the model before it generates an answer.

What is RAG

RAG combines traditional information retrieval (IR) with generative LLMs. The workflow consists of four steps: receive a user request, retrieve relevant document fragments, augment the original query with these fragments, and let the LLM generate a final answer.
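
The four steps can be sketched in plain Java. This is a toy illustration, not LangChain4j code: the class RagWorkflowSketch, the word-overlap ranking, and the lambda that stands in for the LLM are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Toy walk-through of the four RAG steps; no real LLM or vector store is involved. */
final class RagWorkflowSketch {

    // Step 2: rank fragments by how many query words they share (a crude stand-in for retrieval).
    static List<String> retrieve(String query, List<String> fragments, int maxResults) {
        Set<String> queryWords = words(query);
        return fragments.stream()
                .sorted(Comparator.comparingLong((String f) -> overlap(queryWords, f)).reversed())
                .limit(maxResults)
                .collect(Collectors.toList());
    }

    // Step 3: augment the original question with the retrieved fragments.
    static String augment(String query, List<String> context) {
        return "Based on the following information answer the question:\n"
                + String.join("\n", context)
                + "\nQuestion:\n" + query;
    }

    // Steps 1 and 4: accept the request, then hand the augmented prompt to the model.
    static String answer(String query, List<String> fragments, Function<String, String> llm) {
        return llm.apply(augment(query, retrieve(query, fragments, 1)));
    }

    private static long overlap(Set<String> queryWords, String fragment) {
        return words(fragment).stream().filter(queryWords::contains).count();
    }

    private static Set<String> words(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+")));
    }

    public static void main(String[] args) {
        List<String> fragments = Arrays.asList(
                "The cafeteria opens at 8 am.",
                "Refunds are processed within 7 days.");
        // The lambda stands in for a chat model: it simply echoes the prompt it receives.
        System.out.println(answer("When are refunds processed?", fragments, prompt -> prompt));
    }
}
```

In a real pipeline the word-overlap ranking is replaced by a vector-store search and the lambda by a chat model call; the order of the steps stays the same.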

Retrieval can use relational databases, full‑text search engines, or vector databases; vector stores are preferred because they support similarity search rather than exact keyword matching.
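
To make "similarity search" concrete, the sketch below computes cosine similarity, one common way to compare two embedding vectors. The class name and the two-dimensional vectors are illustrative assumptions; real embeddings have hundreds or thousands of dimensions, and a vector database may use a different metric.

```java
/** Cosine similarity between two embedding vectors of equal length. */
final class CosineSimilaritySketch {

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0; // treat similarity with a zero vector as 0
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        System.out.println(cosine(query, new float[]{2f, 0f})); // same direction
        System.out.println(cosine(query, new float[]{0f, 3f})); // orthogonal
    }
}
```

Vectors that point the same way score close to 1 regardless of length, which is why two texts with different wording but similar meaning can still match.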

LangChain4j Overview

LangChain4j is a Java library inspired by the LangChain framework rather than an official port of it. It abstracts LLM interaction, prompt handling, document splitting, embedding, and vector-store management, allowing developers to focus on business logic.

Large‑Model Development vs. Traditional Java Development

In large‑model development, the focus is on data preparation, model selection/fine‑tuning, prompt engineering, and integrating LLMs into existing systems. Traditional Java development emphasizes system architecture, modular design, and algorithm implementation.

Practical Experience

Environment Setup

Windows: Install Python and verify it with python --version. Then install the Chroma vector store with pip install chromadb and start a local server with chroma run.

macOS: Install Python via Homebrew (brew install python) or download it from python.org, verify it with python3 --version, then install and start Chroma the same way.

Integrating LangChain4j

<properties>
    <langchain4j.version>0.31.0</langchain4j.version>
</properties>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-core</artifactId>
    <version>${langchain4j.version}</version>
</dependency>
... (additional dependencies for OpenAI, embeddings, Chroma, etc.)

Project Structure

LangChain
├── core
│   ├── src/main/java/cn/jdl/tech_and_data/ka
│   │   ├── ChatWithMemory
│   │   ├── Constants
│   │   ├── Main
│   │   ├── RagChat
│   │   └── Utils
│   └── resources
│       ├── log4j2.xml
│       └── 笑话.txt
├── pom.xml
└── parent [learn.langchain.parent]

Knowledge Acquisition

Load a local text file (e.g., 笑话.txt, a file of jokes) from the classpath as the knowledge base:

// getDocument is the article's own helper (not shown here); it turns the resource into a Document
URL docUrl = Main.class.getClassLoader().getResource("笑话.txt");
Document document = getDocument(docUrl);

Document Splitting

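Use DocumentSplitters.recursive(150, 10, new OpenAiTokenizer()) to split the document into segments of at most 150 tokens, with up to 10 tokens of overlap between neighboring segments so that text cut at a boundary still appears in one piece somewhere.

Tokens are the basic units a model sees after tokenization (e.g., the BPE tokens used by GPT-4o). Splitting is necessary because both embedding models and LLMs limit the length of their input, and because smaller segments let retrieval return only the part of the document that is relevant.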
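
The effect of the size and overlap parameters can be seen in a simplified fixed-window splitter. This is a sketch under loud assumptions: it treats whitespace-separated words as tokens and ignores paragraph and sentence boundaries, so it is not how DocumentSplitters.recursive works internally; OverlapSplitSketch and its parameter names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Fixed-size sliding window over whitespace "tokens" (a stand-in for real tokenization). */
final class OverlapSplitSketch {

    static List<String> split(String text, int maxTokens, int overlapTokens) {
        if (maxTokens <= 0 || overlapTokens < 0 || overlapTokens >= maxTokens) {
            throw new IllegalArgumentException("Require 0 <= overlapTokens < maxTokens");
        }
        List<String> segments = new ArrayList<>();
        if (text.trim().isEmpty()) {
            return segments; // nothing to split
        }
        String[] tokens = text.trim().split("\\s+");
        int step = maxTokens - overlapTokens; // how far the window advances each time
        for (int start = 0; start < tokens.length; start += step) {
            int end = Math.min(start + maxTokens, tokens.length);
            segments.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
            if (end == tokens.length) {
                break; // the last window already reached the end of the text
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        // Windows of 4 words that share 1 word with the previous window.
        split("a b c d e f g", 4, 1).forEach(System.out::println);
    }
}
```

With a window of 4 and an overlap of 1, the word "d" ends the first segment and starts the second, which is the same idea as the 10-token overlap above at a smaller scale.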

Embedding

Create an OpenAI embedding model (text‑embedding‑ada‑002, 1536‑dimensional) and embed each segment:

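// builder() is the static factory for the same builder class
OpenAiEmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
    .apiKey(API_KEY)
    .baseUrl(BASE_URL)
    .build();
// embed(...) returns a Response<Embedding>; content() unwraps the vector
Embedding embedding = embeddingModel.embed(text).content();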

Vector Store Storage

Start a Chroma instance, create a collection, and store each TextSegment together with its embedding:

// Create the collection up front through a Chroma client
Client client = new Client(CHROMA_URL);
EmbeddingFunction embeddingFunction = new OpenAIEmbeddingFunction(API_KEY, OPEN_AI_MODULE_NAME);
client.createCollection(CHROMA_DB_DEFAULT_COLLECTION_NAME, null, true, embeddingFunction);
// LangChain4j store bound to the same Chroma instance and collection
EmbeddingStore<TextSegment> store = ChromaEmbeddingStore.builder()
    .baseUrl(CHROMA_URL)
    .collectionName(CHROMA_DB_DEFAULT_COLLECTION_NAME)
    .build();
// Embed each segment and store the vector together with its original text
segments.forEach(s -> {
    Embedding e = embeddingModel.embed(s).content();
    store.add(e, s);
});

Vector Store Retrieval

Embed the user query, then search the collection for the most similar segment:

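// Embed the question with the same model that embedded the segments
Embedding queryEmbedding = embeddingModel.embed(queryText).content();
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(1) // keep only the single best match
    .build();
EmbeddingSearchResult<TextSegment> result = store.search(request);
// get(0) throws IndexOutOfBoundsException if nothing matched, so check matches() in real code
TextSegment matched = result.matches().get(0).embedded();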
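
What a similarity search does with maxResults can be illustrated with a small in-memory store. This is a sketch, not Chroma or LangChain4j code: TopKSearchSketch, its Entry and Match types, and the minScore cutoff are assumptions for the example, and the two-dimensional vectors are invented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Minimal in-memory vector store: linear scan, cosine score, top-k with a score cutoff. */
final class TopKSearchSketch {

    static final class Entry {
        final String text;
        final float[] vector;
        Entry(String text, float[] vector) { this.text = text; this.vector = vector; }
    }

    static final class Match {
        final String text;
        final double score;
        Match(String text, double score) { this.text = text; this.score = score; }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(float[] vector, String text) {
        entries.add(new Entry(text, vector));
    }

    // Score every entry, drop weak matches, and return the best maxResults in descending order.
    List<Match> search(float[] query, int maxResults, double minScore) {
        return entries.stream()
                .map(e -> new Match(e.text, cosine(query, e.vector)))
                .filter(m -> m.score >= minScore)
                .sorted(Comparator.comparingDouble((Match m) -> m.score).reversed())
                .limit(maxResults)
                .collect(Collectors.toList());
    }

    private static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        TopKSearchSketch store = new TopKSearchSketch();
        store.add(new float[]{1f, 0f}, "ice cream");
        store.add(new float[]{0f, 1f}, "weather");
        store.add(new float[]{0.9f, 0.1f}, "dessert");
        List<Match> hits = store.search(new float[]{1f, 0f}, 2, 0.5);
        hits.forEach(m -> System.out.println(m.text + " " + m.score));
    }
}
```

A score cutoff lets the caller handle "nothing relevant found" explicitly instead of always passing the nearest segment to the LLM, however weak the match. A production vector database replaces the linear scan with an index, but the contract is the same.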

LLM Interaction

Build a prompt that injects the retrieved context and the original question, then call the OpenAI chat model:

// Line breaks are written as \n because a plain Java string literal cannot span several lines
PromptTemplate template = PromptTemplate.from(
    "Based on the following information, answer the question:\n"
        + "{{context}}\n"
        + "Question:\n"
        + "{{question}}");
Map<String, Object> vars = new HashMap<>();
vars.put("context", matched.text());
vars.put("question", QUESTION);
Prompt prompt = template.apply(vars);
UserMessage userMessage = prompt.toUserMessage();
OpenAiChatModel chatModel = OpenAiChatModel.builder()
    .apiKey(API_KEY)
    .baseUrl(BASE_URL)
    .modelName(OPEN_AI_MODULE_NAME)
    .temperature(0.0) // the builder takes a Double, so an int literal 0 does not compile
    .build();
Response<AiMessage> response = chatModel.generate(userMessage);
// content() is an AiMessage; text() extracts the answer string
String answer = response.content().text();
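
The {{context}} and {{question}} placeholders are filled by simple name-based substitution. The sketch below shows that idea in plain Java; it is an illustration, not LangChain4j's PromptTemplate implementation, and TemplateFillSketch and its error handling are assumptions for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Replaces {{name}} placeholders with values from a map. */
final class TemplateFillSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{\\s*(\\w+)\\s*\\}\\}");

    static String fill(String template, Map<String, Object> vars) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            if (!vars.containsKey(name)) {
                throw new IllegalArgumentException("Missing value for {{" + name + "}}");
            }
            // quoteReplacement keeps '$' and '\' in the value from being read as regex syntax
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(vars.get(name))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new HashMap<>();
        vars.put("context", "A cone costs $5.");
        vars.put("question", "How much is a cone?");
        System.out.println(fill("{{context}}\nQuestion:\n{{question}}", vars));
    }
}
```

Failing fast on a missing variable is a deliberate choice here: a prompt that silently ships with an unfilled placeholder tends to produce confusing answers that are hard to trace back.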

Testing

The article demonstrates both a plain LLM call (without RAG) and a RAG‑enhanced call using the ice‑cream joke dataset, showing how the retrieved segment improves answer relevance.

Conclusion and Outlook

The hands-on example covers the complete RAG pipeline with LangChain4j, from environment preparation through document splitting, embedding, and vector-store retrieval to LLM generation. The same building blocks can be extended to larger knowledge bases and more demanding enterprise scenarios.
