Practical Guide to Building Retrieval‑Augmented Generation (RAG) Applications with LangChain4j in Java
This article provides a step‑by‑step tutorial for Java engineers on using the LangChain4j framework to implement Retrieval‑Augmented Generation (RAG) with large language models, covering concepts, environment setup, code integration, document splitting, embedding, vector‑store operations, and prompt engineering.
1. Introduction
ChatGPT and similar large language models are pre-trained, so their knowledge stops at a fixed training-data cutoff (for GPT-4o, in 2023). To obtain up-to-date answers, Retrieval-Augmented Generation (RAG) is required. RAG is also needed for private, on-premise data that cannot be uploaded to the internet.
2. Core Concepts
2.1 What is RAG?
RAG (Retrieval‑Augmented Generation) combines traditional information retrieval (IR) with generative large models. Before generating an answer, the system retrieves relevant document fragments from a knowledge base and feeds them as additional context to the LLM, improving accuracy and richness.
The workflow consists of four steps:
Receive request: the user asks a question.
Retrieve (R): the system searches a large document collection for the most relevant fragments.
Augment (A): retrieved fragments are combined with the original query and passed to the LLM using a prompt such as "Please answer the question based on the following context: ...".
Generate (G): the LLM produces the final answer.
Although a relational database or full‑text engine (e.g., MySQL, Elasticsearch) can be used for retrieval, vector databases are preferred because they excel at similarity search rather than exact matching.
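To make the similarity-versus-exact-matching point concrete, here is a minimal, dependency-free Java sketch of cosine similarity, the measure most vector stores use to rank fragments. The vectors are toy values for illustration, not real embeddings:

```java
public class CosineSimilarity {

    // Cosine similarity: dot(a, b) / (|a| * |b|); closer to 1 means more similar
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional "embeddings" (real ones have hundreds or thousands of dimensions)
        double[] query = {1.0, 0.5, 0.0};
        double[] docA  = {0.9, 0.4, 0.1};  // semantically close to the query
        double[] docB  = {0.0, 0.1, 1.0};  // unrelated
        System.out.printf("query vs docA: %.3f%n", cosine(query, docA));
        System.out.printf("query vs docB: %.3f%n", cosine(query, docB));
    }
}
```

A keyword index would score docA and docB identically if neither shares a term with the query; ranking by vector similarity is what lets RAG surface semantically related fragments.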
2.2 LangChain4j Overview
LangChain4j is the Java implementation of the LangChain framework. "Lang" stands for Large Language Model and "Chain" denotes the modular, chained execution of LLM‑related functions. The library abstracts LLM integration details, simplifying development and improving productivity for Java engineers.
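The "chain" idea can be pictured with plain java.util.function composition: each stage transforms the output of the previous one, which is roughly how LangChain4j wires retrieval, prompt building, and generation together. The stage names below are hypothetical illustrations, not library API:

```java
import java.util.function.Function;

public class ChainSketch {

    // Hypothetical stages -- illustrative names, not LangChain4j classes
    static final Function<String, String> RETRIEVE = q -> q + " | context: <retrieved fragment>";
    static final Function<String, String> BUILD_PROMPT = c -> "Answer using: " + c;
    static final Function<String, String> CALL_MODEL = p -> "LLM answer for [" + p + "]";

    static String run(String question) {
        // Chained execution: each stage consumes the previous stage's output
        return RETRIEVE.andThen(BUILD_PROMPT).andThen(CALL_MODEL).apply(question);
    }

    public static void main(String[] args) {
        System.out.println(run("What is RAG?"));
    }
}
```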
2.3 Large‑Model Development vs. Traditional Java Development
Large‑model development focuses on data preparation, model selection, fine‑tuning, prompt engineering, and integrating the LLM into existing systems.
Traditional Java development emphasizes system architecture, module design, and algorithm implementation, with the business logic written directly in code.
3. Hands‑On Experience
3.1 Environment Setup
3.1.1 Vector Store (Chroma)
Windows:
Install Python from the official site and verify with:
python --version
Then install Chroma following the official guide and start it with:
chroma run
macOS:
brew install python
Or download from python.org, then verify with python --version and install Chroma similarly.
3.1.2 Integrate LangChain4j
<properties>
<langchain4j.version>0.31.0</langchain4j.version>
</properties>
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-core</artifactId>
<version>${langchain4j.version}</version>
</dependency>
... (other required dependencies) ...
3.2 Programming Steps
3.2.1 Project Structure
LangChain
├── core
│ ├── src
│ │ ├── main
│ │ │ ├── java
│ │ │ │ └── cn.jdl.tech_and_data.ka
│ │ │ │ ├── ChatWithMemory
│ │ │ │ ├── Constants
│ │ │ │ ├── Main
│ │ │ │ ├── RagChat
│ │ │ │ └── Utils
│ │ │ └── resources
│ │ │ ├── log4j2.xml
│ │ │ └── 笑话.txt
│ └── pom.xml
├── parent [learn.langchain.parent]
└── pom.xml
3.2.2 Knowledge Acquisition
Load a local text file (e.g., 笑话.txt ) as the knowledge source:
URL docUrl = Main.class.getClassLoader().getResource("笑话.txt");
if (docUrl == null) {
    log.error("Failed to locate file");
    return; // bail out early: nothing to load
}
Document document = getDocument(docUrl);
if (document == null) {
    log.error("Failed to load file");
    return; // bail out early: splitting would fail on null
}
3.2.3 Document Splitting
Use DocumentSplitters.recursive to split the document into overlapping chunks. The parameters are the maximum chunk size (in tokens), the overlap size (in tokens), and the tokenizer.
DocumentSplitter splitter = DocumentSplitters.recursive(150, 10, new OpenAiTokenizer());
List<TextSegment> segments = splitter.split(document);
Tokens are the basic units produced by tokenization; for Chinese text, a token may correspond to a whole character or even part of one. The article explains token-to-character ratios and shows a logging example.
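The effect of the chunk-size and overlap parameters can be illustrated with a dependency-free sketch. This is a naive character-based splitter, not the actual DocumentSplitters.recursive implementation, which is token-based and hierarchy-aware:

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveSplitter {

    // Split text into windows of `size` characters, each sharing `overlap`
    // characters with the previous chunk (illustrative only)
    static List<String> split(String text, int size, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < text.length(); start += step) {
            chunks.add(text.substring(start, Math.min(start + size, text.length())));
            if (start + size >= text.length()) break;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Size 4 with overlap 2: consecutive chunks share 2 characters
        List<String> chunks = split("abcdefghij", 4, 2);
        System.out.println(chunks);
    }
}
```

The overlap keeps a sentence that straddles a chunk boundary at least partially intact in both neighboring chunks, which improves retrieval recall.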
3.2.4 Text Vectorization
Convert each text segment into an embedding vector using OpenAI’s text-embedding-ada-002 model (1536 dimensions):
OpenAiEmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
        .apiKey(API_KEY)
        .baseUrl(BASE_URL)
        .modelName("text-embedding-ada-002")
        .build();
Embedding embedding = embeddingModel.embed(text).content();
log.info("Embedding dimension: {}", embedding.dimension());
3.2.5 Vector Store Storage
Connect to a Chroma vector database, create a collection, and store each TextSegment together with its embedding:
Client client = new Client(CHROMA_URL);
EmbeddingFunction embeddingFunction = new OpenAIEmbeddingFunction(API_KEY, OPEN_AI_MODULE_NAME);
client.createCollection(CHROMA_DB_DEFAULT_COLLECTION_NAME, null, true, embeddingFunction);
EmbeddingStore<TextSegment> store = ChromaEmbeddingStore.builder()
        .baseUrl(CHROMA_URL)
        .collectionName(CHROMA_DB_DEFAULT_COLLECTION_NAME)
        .build();
segments.forEach(s -> {
    Embedding e = embeddingModel.embed(s).content();
    store.add(e, s);
});
3.2.6 Vector Store Retrieval
Embed the user query, then search the collection for the most similar segment:
Embedding queryEmbedding = embeddingModel.embed(queryText).content();
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .maxResults(1)
        .build();
EmbeddingSearchResult<TextSegment> result = store.search(request);
TextSegment matched = result.matches().get(0).embedded();
3.2.7 Interaction with the LLM
Build a prompt that injects the retrieved context and the original question, then call the chat model:
String prompt = "Based on the following information answer the question:\n{{context}}\nQuestion:\n{{question}}";
PromptTemplate template = PromptTemplate.from(prompt);
Map<String, Object> vars = new HashMap<>();
vars.put("context", matched.text());
vars.put("question", userQuestion);
Prompt finalPrompt = template.apply(vars);
UserMessage userMessage = finalPrompt.toUserMessage();
Response<AiMessage> response = openAiChatModel.generate(userMessage);
String answer = response.content().text();
3.3 Testing and Verification
The article shows log output for both a plain LLM call (without RAG) and an RAG‑enhanced call, demonstrating how the retrieved joke fragment is incorporated into the final answer.
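The contrast between the two calls can be reproduced offline with a stub model, a hypothetical stand-in rather than a LangChain4j class, that only "knows" what appears in its prompt:

```java
public class RagVsPlainDemo {

    // Stub "LLM": answers from the prompt's context if one is present, otherwise punts.
    // Real models behave similarly for knowledge they were never trained on.
    static String generate(String prompt) {
        if (prompt.contains("context:")) {
            return "Answer based on: "
                    + prompt.substring(prompt.indexOf("context:") + 8).trim();
        }
        return "I don't have information about that.";
    }

    public static void main(String[] args) {
        String question = "Tell me the joke from 笑话.txt";
        // Plain call: no retrieved context, so the model cannot answer
        System.out.println(generate(question));
        // RAG call: the retrieved fragment is injected into the prompt
        String retrieved = "a short joke retrieved from the vector store";
        System.out.println(generate(question + "\ncontext: " + retrieved));
    }
}
```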
4. Conclusion and Outlook
This hands‑on guide demonstrates how Java engineers can leverage LangChain4j to build RAG‑enabled applications, covering everything from environment preparation to prompt engineering. Continued exploration of RAG will unlock more intelligent, efficient solutions across diverse business scenarios.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.