Implementing Retrieval‑Augmented Generation (RAG) with LangChain4j in Java
This article provides a step‑by‑step guide for Java engineers on building a Retrieval‑Augmented Generation (RAG) application using the LangChain4j framework, covering RAG fundamentals, environment setup, Maven integration, document loading, splitting, embedding with OpenAI, vector store management with Chroma, and prompt‑based LLM interaction.
Introduction
The rapid rise of large language models (LLMs) such as ChatGPT has highlighted the need for Retrieval‑Augmented Generation (RAG) to keep responses up‑to‑date and incorporate private enterprise data securely.
What is RAG?
RAG combines traditional information retrieval (IR) with generative LLMs: relevant document fragments are first retrieved from a knowledge base and then supplied as context to the LLM, improving answer accuracy and relevance.
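The retrieve-then-generate flow can be sketched in plain Java. Everything below is an illustrative toy (the class, the keyword-overlap retriever, and the prompt builder are not LangChain4j APIs); it only shows the shape of the pipeline: pick the most relevant fragment, prepend it to the question, and send the result to the LLM.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RagFlowSketch {

    // Toy retriever: scores each fragment by word overlap with the question.
    // A real system compares embedding vectors instead.
    static String retrieve(List<String> knowledgeBase, String question) {
        Set<String> queryWords = new HashSet<>(Arrays.asList(question.toLowerCase().split("\\W+")));
        String best = "";
        long bestScore = -1;
        for (String fragment : knowledgeBase) {
            long score = Arrays.stream(fragment.toLowerCase().split("\\W+"))
                    .filter(queryWords::contains)
                    .count();
            if (score > bestScore) {
                bestScore = score;
                best = fragment;
            }
        }
        return best;
    }

    // The retrieved fragment is prepended to the question as context.
    static String buildPrompt(String context, String question) {
        return "Answer based on the following information:\n" + context
                + "\nQuestion:\n" + question;
    }

    public static void main(String[] args) {
        List<String> kb = List.of(
                "The warehouse in Beijing ships within 24 hours.",
                "Returns are accepted within 7 days of delivery.");
        String question = "How fast does the Beijing warehouse ship?";
        String prompt = buildPrompt(retrieve(kb, question), question);
        System.out.println(prompt); // this augmented prompt would then be sent to the LLM
    }
}
```

The rest of the article replaces each toy part with a production component: an embedding model for relevance scoring, Chroma as the fragment store, and an OpenAI chat model as the generator.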
LangChain4j Overview
LangChain4j is the Java implementation of the LangChain framework. It abstracts LLM interaction, prompt handling, document splitting, embedding, and vector‑store operations, allowing developers to focus on business logic.
Differences Between LLM Development and Traditional Java Development
LLM development emphasizes data preparation, model selection/fine‑tuning, prompt engineering, and integration of external knowledge, whereas traditional Java development focuses on system architecture, modular design, and algorithm implementation.
Environment Setup
Install Python (required by Chroma) and verify with `python --version`. Install the Chroma server with `pip install chromadb` and start it with `chroma run`. On macOS, Python can be installed via Homebrew (`brew install python`); then repeat the verification steps.
Integrating LangChain4j
<properties>
    <langchain4j.version>0.31.0</langchain4j.version>
</properties>

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-core</artifactId>
    <version>${langchain4j.version}</version>
</dependency>
... (additional dependencies for OpenAI, embeddings, Chroma, etc.)

Project Structure
A typical Maven layout is shown, with source code under src/main/java (classes ChatWithMemory, Constants, Main, RagChat, Utils) and resources such as log4j2.xml and 笑话.txt (a file of jokes used as the knowledge base).
Knowledge Acquisition
URL docUrl = Main.class.getClassLoader().getResource("笑话.txt");
if (docUrl == null) {
    log.error("File not found"); // original log message: 未获取到文件
}
Document document = getDocument(docUrl);

Document Splitting
Use DocumentSplitters.recursive(150, 10, new OpenAiTokenizer()) to split the text into segments of at most 150 tokens with a 10-token overlap between adjacent segments, preserving semantic continuity across segment boundaries.
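The effect of the size and overlap parameters can be illustrated with a simplified character-based splitter. Note the differences from the real thing: DocumentSplitters.recursive counts tokens, not characters, and prefers paragraph and sentence boundaries; this sketch uses a fixed stride, which is just enough to show how each segment re-includes the tail of the previous one.

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapSplitter {

    static List<String> split(String text, int maxChars, int overlapChars) {
        List<String> segments = new ArrayList<>();
        // Each segment starts overlapChars before the previous one ended,
        // so context at the boundary appears in both segments.
        int step = maxChars - overlapChars;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChars, text.length());
            segments.add(text.substring(start, end));
            if (end == text.length()) break;
        }
        return segments;
    }

    public static void main(String[] args) {
        // Segment size 4, overlap 2 on a 10-character string:
        System.out.println(split("abcdefghij", 4, 2)); // [abcd, cdef, efgh, ghij]
    }
}
```

The overlap is what keeps a sentence that straddles a boundary retrievable: whichever of the two neighbouring segments is matched, the boundary text is present in it.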
Token Basics
Tokens are the units produced by a tokenizer (e.g., BPE for OpenAI models). Example tokenization of the Chinese sentence "我喜欢吃苹果" ("I like eating apples") yields four tokens.
Embedding (Vectorisation)
OpenAiEmbeddingModel embeddingModel = new OpenAiEmbeddingModel.OpenAiEmbeddingModelBuilder()
    .apiKey(API_KEY)
    .baseUrl(BASE_URL)
    .build();
String text = "两只眼睛"; // "two eyes"
Embedding embedding = embeddingModel.embed(text).content();
log.info("Vector dimension: {}", embedding.dimension());

The model text-embedding-ada-002 returns a 1536-dimensional vector.
Vector Store (Chroma) Setup
Client client = new Client(CHROMA_URL);
EmbeddingFunction embeddingFunction = new OpenAIEmbeddingFunction(API_KEY, OPEN_AI_MODULE_NAME);
client.createCollection(CHROMA_DB_DEFAULT_COLLECTION_NAME, null, true, embeddingFunction);

EmbeddingStore<TextSegment> embeddingStore = ChromaEmbeddingStore.builder()
    .baseUrl(CHROMA_URL)
    .collectionName(CHROMA_DB_DEFAULT_COLLECTION_NAME)
    .build();

segments.forEach(segment -> {
    Embedding embedding = embeddingModel.embed(segment).content();
    embeddingStore.add(embedding, segment);
});

Retrieval from Vector Store
Embedding queryEmbedding = embeddingModel.embed(qryText).content();
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(1)
    .build();
EmbeddingSearchResult<TextSegment> result = embeddingStore.search(request);
TextSegment matched = result.matches().get(0).embedded();

LLM Interaction
String promptTemplate = "Answer based on the following information:\n{{context}}\nQuestion:\n{{question}}";
PromptTemplate prompt = PromptTemplate.from(promptTemplate);
Map<String, Object> vars = new HashMap<>();
vars.put("context", matched.text());
vars.put("question", QUESTION);
Prompt promptInst = prompt.apply(vars);
UserMessage userMessage = promptInst.toUserMessage();

OpenAiChatModel chatModel = OpenAiChatModel.builder()
    .apiKey(API_KEY)
    .baseUrl(BASE_URL)
    .modelName(OPEN_AI_MODULE_NAME)
    .temperature(0.0)
    .build();

Response<AiMessage> response = chatModel.generate(userMessage);
String answer = response.content().text(); // content() returns an AiMessage; text() extracts the reply

Testing and Results
The article demonstrates both a direct LLM query and a RAG‑enhanced query using the stored jokes. The RAG version returns a response that incorporates the retrieved joke fragment, illustrating the benefit of contextual retrieval.
Conclusion & Outlook
Through a concrete Java example, the guide shows how to build a RAG‑enabled LLM application with LangChain4j, highlighting the importance of document preprocessing, embedding, vector storage, and prompt engineering. Continued exploration of RAG will unlock more intelligent and efficient solutions across business scenarios.
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.