Mastering RAG with LangChain4j: From Simple Setup to Advanced Retrieval‑Augmented Generation
This article explains how to extend large language models with domain‑specific knowledge using Retrieval‑Augmented Generation (RAG) in LangChain4j, covering the concepts of RAG, its indexing and retrieval stages, simple RAG setup, detailed API usage, and advanced customization options such as query transformers and content injectors.
What is Retrieval‑Augmented Generation (RAG)
RAG injects relevant information from a user‑provided data source into the prompt before it is sent to a large language model (LLM). This reduces hallucinations and lets the LLM answer with up‑to‑date, domain‑specific facts.
RAG pipeline
The pipeline consists of two stages: Indexing and Retrieval. LangChain4j supplies utilities for both.
Indexing
Documents are loaded, optionally filtered, parsed (Apache Tika), split into TextSegments (by default 300 tokens with a 30‑token overlap), embedded with an EmbeddingModel, and stored in an EmbeddingStore (vector database). Indexing can be performed offline (e.g., as a nightly batch) or online, whenever users upload new files.
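The splitting step can be sketched in plain Java. This is an illustrative, word-based approximation (LangChain4j's real splitters count tokens with a tokenizer), but it shows the fixed-size-chunks-with-overlap idea behind the 300/30 defaults:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Word-based sketch of overlapping chunking. Each chunk holds up to
// `chunkSize` words, and consecutive chunks share `overlap` words so
// that context is not lost at chunk boundaries.
public class OverlappingSplitter {

    static List<String> split(String text, int chunkSize, int overlap) {
        String[] words = text.split("\\s+");
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // how far the window advances each time
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) break; // last chunk reached
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 10 words, chunks of 4 with 1-word overlap -> 3 chunks
        String doc = "w1 w2 w3 w4 w5 w6 w7 w8 w9 w10";
        System.out.println(split(doc, 4, 1));
        // [w1 w2 w3 w4, w4 w5 w6 w7, w7 w8 w9 w10]
    }
}
```

Note how each chunk repeats the last word of the previous one; with 300-token segments and a 30-token overlap the same sliding-window logic applies at a larger scale.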
Retrieval
At query time the user question is embedded, a similarity search is executed against the EmbeddingStore, and the most relevant segments are appended to the user message before calling the LLM.
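The core of the similarity search can be sketched in plain Java: the query embedding is compared to every stored embedding by cosine similarity and the top-scoring segments win. (Real EmbeddingStore implementations use approximate-nearest-neighbor indexes instead of a full scan; the store and vectors below are illustrative.)

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of vector similarity search: brute-force cosine
// similarity between the query vector and every stored vector.
public class SimilaritySearch {

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns the k segment texts most similar to the query vector.
    static List<String> topK(double[] query, Map<String, double[]> store, int k) {
        return store.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> -cosine(query, e.getValue())))
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        Map<String, double[]> store = new LinkedHashMap<>();
        store.put("segment about RAG",     new double[]{0.9, 0.1});
        store.put("segment about cooking", new double[]{0.1, 0.9});
        System.out.println(topK(new double[]{1.0, 0.0}, store, 1));
        // [segment about RAG]
    }
}
```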
Simple RAG with LangChain4j
The langchain4j‑easy‑rag module hides the low‑level details. The following steps illustrate a minimal setup.
Dependency
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-easy-rag</artifactId>
<version>0.34.0</version>
</dependency>
Load documents
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/path/to/docs");
FileSystemDocumentLoader uses Apache Tika to detect the file type and parse the content.
Ingest into an in‑memory vector store
InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
EmbeddingStoreIngestor.ingest(documents, store);
The ingestor splits each document into 300‑token segments with a 30‑token overlap, embeds them with the configured EmbeddingModel, and stores each Embedding together with its original segment.
Build the assistant service
ChatLanguageModel chatModel = OpenAiChatModel.builder()
.apiKey(System.getenv("OPENAI_API_KEY"))
.modelName(GPT_4_O_MINI)
.build();
Assistant assistant = AiServices.builder(Assistant.class)
.chatLanguageModel(chatModel)
.chatMemory(MessageWindowChatMemory.withMaxMessages(10))
.contentRetriever(EmbeddingStoreContentRetriever.from(store))
.build();
The assistant keeps the last 10 messages in memory and retrieves relevant content from the in‑memory store for every query.
Query the assistant
String answer = assistant.chat("How to use LangChain4j for simple RAG?");
Access retrieved sources
To see which segments were retrieved for an answer, wrap the return type in Result<String> and call result.sources(). For streaming responses, use the onRetrieved() callback.
RAG API overview
Document
The Document class represents an unstructured text file (PDF, DOCX, HTML, etc.) and carries optional metadata such as file_name, url, or custom fields.
Metadata
Metadata is a Map<String, Object> of primitive values (String, Integer, Long, Float, Double). It can be used for filtering during retrieval or for enriching prompts.
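The filtering use case can be sketched in plain Java. The Segment record and filterBy helper below are illustrative stand-ins, not LangChain4j types; they show how a metadata key such as userId narrows retrieval to one user's documents:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of metadata-based filtering during retrieval: each segment
// carries a key-value map, and a filter keeps only segments whose
// metadata matches (here: a per-user "userId" field).
public class MetadataFilter {

    // Illustrative stand-in for a text segment with metadata.
    record Segment(String text, Map<String, Object> metadata) {}

    static List<Segment> filterBy(List<Segment> segments, String key, Object value) {
        return segments.stream()
                .filter(s -> value.equals(s.metadata().get(key)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
                new Segment("Alice's notes", Map.of("userId", "12345", "file_name", "a.pdf")),
                new Segment("Bob's notes",   Map.of("userId", "67890", "file_name", "b.pdf")));
        // Only Alice's segment survives the filter.
        System.out.println(filterBy(segments, "userId", "12345").size()); // 1
    }
}
```

The same idea appears later in the advanced example, where EmbeddingStoreContentRetriever is built with a metadataKey("userId").isEqualTo(...) filter.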
Document loaders
FileSystemDocumentLoader
UrlDocumentLoader
AmazonS3DocumentLoader
AzureBlobStorageDocumentLoader
GitHubDocumentLoader
TencentCosDocumentLoader
Embedding model and store
EmbeddingModel converts text or a TextSegment into a numeric Embedding. EmbeddingStore (vector DB) stores embeddings and provides search() for similarity lookup. Implementations include an in‑memory store, Pinecone, Milvus, and others.
EmbeddingStoreIngestor
Ingests Documents into an EmbeddingStore using a configured EmbeddingModel. Optional DocumentTransformer, DocumentSplitter, and TextSegmentTransformer components can be supplied to clean, chunk, or enrich the data before embedding.
Retrieval augmentors
The RetrievalAugmentor is the entry point of the RAG pipeline. It receives a Query, optionally transforms it, routes it to one or more ContentRetrievers, aggregates the results, and injects them into the user prompt.
Query transformers
CompressingQueryTransformer – uses an LLM to compress a follow‑up question together with its conversation context into a standalone query.
ExpandingQueryTransformer – generates multiple reformulations of the original query.
Custom transformers (e.g., HyDE) can be implemented.
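The transformer contract is simple: one query in, several reformulations out. The sketch below is illustrative plain Java (LangChain4j's ExpandingQueryTransformer delegates the reformulation to an LLM; fixed string templates stand in for it here, and the interface and names are assumptions for the example):

```java
import java.util.List;

// Illustrative sketch of the query-transformer idea: produce several
// variants of the user query so retrieval has more chances to match.
public class QueryExpansion {

    // Stand-in for the transformer contract: one query in, N queries out.
    interface QueryTransformer {
        List<String> transform(String query);
    }

    // A template-based expander; a real implementation would ask an LLM
    // for semantically diverse reformulations instead.
    static final QueryTransformer TEMPLATE_EXPANDER = query -> List.of(
            query,
            "Explain: " + query,
            "Step-by-step guide: " + query);

    public static void main(String[] args) {
        System.out.println(TEMPLATE_EXPANDER.transform("configure RAG in LangChain4j"));
    }
}
```

Each generated variant is then embedded and searched independently, and the results are merged downstream by the content aggregator.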
Content retrievers
EmbeddingStoreContentRetriever – vector‑search‑based retrieval.
WebSearchContentRetriever – fetches results from a web search engine (e.g., Google Custom Search).
SqlDatabaseContentRetriever (experimental) – uses an LLM to generate SQL from a natural‑language query and executes it.
AzureAiSearchContentRetriever, Neo4jContentRetriever, and others.
Query router
DefaultQueryRouter forwards the query to all configured retrievers. LanguageModelQueryRouter can use an LLM to decide which retriever(s) to invoke.
Content aggregator
Aggregates results from multiple retrievers. Implementations include DefaultContentAggregator and ReRankingContentAggregator (re‑orders results with a secondary model).
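The merge-then-reorder idea can be sketched in plain Java. This is not LangChain4j's implementation (a real ReRankingContentAggregator calls a scoring/re-ranking model); the record and scores below are illustrative:

```java
import java.util.Comparator;
import java.util.List;

// Sketch of content aggregation with re-ranking: results from several
// retrievers are merged into one list, then re-ordered by score and
// truncated to a limit.
public class ReRankingSketch {

    // Illustrative stand-in for a retrieved piece of content with a
    // relevance score (which a re-ranking model would assign).
    record ScoredContent(String text, double score) {}

    static List<String> aggregate(List<List<ScoredContent>> perRetriever, int limit) {
        return perRetriever.stream()
                .flatMap(List::stream)                                   // merge all retrievers
                .sorted(Comparator.comparingDouble(
                        (ScoredContent c) -> -c.score()))                // best score first
                .limit(limit)                                            // keep top results
                .map(ScoredContent::text)
                .toList();
    }

    public static void main(String[] args) {
        List<ScoredContent> vector = List.of(
                new ScoredContent("doc A", 0.81), new ScoredContent("doc B", 0.62));
        List<ScoredContent> web = List.of(new ScoredContent("page C", 0.74));
        System.out.println(aggregate(List.of(vector, web), 2)); // [doc A, page C]
    }
}
```

The key point: a web result can outrank a vector result (and vice versa) because scoring happens after the merge, across all sources.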
Content injector
The default injector appends retrieved contents to the original user message using the template:
{{userMessage}}
Answer using the following information:
{{contents}}
Parallel execution
If multiple queries or retrievers are present, DefaultRetrievalAugmentor runs routing and retrieval in parallel using a cached thread pool (keep‑alive 1 s).
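The concurrent fan-out can be sketched with standard java.util.concurrent primitives. This is an illustrative simplification, not DefaultRetrievalAugmentor's actual code: each retriever is modeled as a Callable, submitted to a pool, and the results are collected once all futures complete:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of running several retrievers concurrently and merging
// their results once all of them have finished.
public class ParallelRetrieval {

    static List<String> retrieveAll(List<Callable<List<String>>> retrievers) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            List<String> all = new ArrayList<>();
            // invokeAll blocks until every retriever has completed and
            // returns the futures in the order the retrievers were given.
            for (Future<List<String>> f : pool.invokeAll(retrievers)) {
                all.addAll(f.get());
            }
            return all;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Callable<List<String>> vectorRetriever = () -> List.of("vector hit");
        Callable<List<String>> webRetriever    = () -> List.of("web hit");
        System.out.println(retrieveAll(List.of(vectorRetriever, webRetriever)));
        // [vector hit, web hit]
    }
}
```

A cached thread pool fits this workload well: retrieval calls are I/O-bound and bursty, so threads are created on demand and released shortly after the burst ends.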
Advanced RAG pipeline example
EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
.embeddingModel(embeddingModel)
.embeddingStore(embeddingStore)
.documentTransformer(doc -> {
doc.metadata().put("userId", "12345");
return doc;
})
.documentSplitter(DocumentSplitters.recursive(1000, 200, new OpenAiTokenizer()))
.textSegmentTransformer(seg -> TextSegment.from(
    seg.metadata("file_name") + "\n" + seg.text(), // prepend file name to each segment
    seg.metadata()))
.build();
ContentRetriever vectorRetriever = EmbeddingStoreContentRetriever.builder()
.embeddingStore(embeddingStore)
.embeddingModel(embeddingModel)
.maxResults(3)
.minScore(0.75)
.filter(metadataKey("userId").isEqualTo("12345"))
.build();
ContentRetriever webRetriever = WebSearchContentRetriever.builder()
.webSearchEngine(googleSearchEngine)
.maxResults(3)
.build();
RetrievalAugmentor augmentor = DefaultRetrievalAugmentor.builder()
.queryTransformer(new ExpandingQueryTransformer(chatModel))
.contentRetrievers(vectorRetriever, webRetriever)
.contentAggregator(new ReRankingContentAggregator(scoringModel))
.contentInjector(new DefaultContentInjector())
.build();
Assistant assistant = AiServices.builder(Assistant.class)
.chatLanguageModel(chatModel)
.retrievalAugmentor(augmentor)
.build();
This configuration indexes documents with custom metadata, expands the query, retrieves both vector‑based and web‑based results, re‑ranks the combined list with a scoring model, and injects the final context into the LLM prompt.
Key classes and interfaces
Document – raw text plus metadata.
Metadata – key‑value map for filtering and enrichment.
EmbeddingModel – creates an Embedding from text.
EmbeddingStore – vector database with add(), search(), and removal APIs.
EmbeddingStoreIngestor – pipelines documents → segments → embeddings → store.
ContentRetriever – fetches Content for a Query.
RetrievalAugmentor – orchestrates query transformation, routing, aggregation, and injection.
ContentInjector – formats the final prompt (default template shown above).
Conclusion
LangChain4j provides a complete toolbox for building RAG pipelines, from a quick “simple RAG” starter to fully customizable pipelines with query transformation, multiple retrievers, content aggregation, and parallel execution. By combining vector stores, embedding models, and optional web or database retrievers, developers can give LLMs access to up‑to‑date, domain‑specific knowledge while keeping the integration code concise and type‑safe.
JavaEdge
First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
