Mastering RAG with LangChain4j: From Simple Setup to Advanced Retrieval‑Augmented Generation

This article explains how to extend large language models with domain‑specific knowledge using Retrieval‑Augmented Generation (RAG) in LangChain4j, covering the concepts of RAG, its indexing and retrieval stages, simple RAG setup, detailed API usage, and advanced customization options such as query transformers and content injectors.

JavaEdge
What is Retrieval‑Augmented Generation (RAG)

RAG injects relevant information from a user‑provided data source into the prompt before it is sent to a large language model (LLM). This reduces hallucinations and lets the LLM answer with up‑to‑date, domain‑specific facts.

RAG pipeline

The pipeline consists of two stages: indexing and retrieval. LangChain4j supplies utilities for both.

Indexing

Documents are loaded, optionally filtered, parsed (with Apache Tika), split into TextSegments (by default 300 tokens with a 30-token overlap), embedded with an EmbeddingModel, and stored in an EmbeddingStore (a vector database). Indexing can be performed offline (e.g., as a nightly batch) or online, when users upload new files.

Retrieval

At query time the user question is embedded, a similarity search is executed against the EmbeddingStore, and the most relevant segments are appended to the user message before calling the LLM.

Simple RAG with LangChain4j

The langchain4j‑easy‑rag module hides the low‑level details. The following steps illustrate a minimal setup.

Dependency

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-easy-rag</artifactId>
    <version>0.34.0</version>
</dependency>

Load documents

List<Document> documents = FileSystemDocumentLoader.loadDocuments("/path/to/docs");

FileSystemDocumentLoader uses Apache Tika to detect file type and parse content.
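The loader also accepts a PathMatcher for filtering, and a recursive variant that walks subdirectories. A minimal sketch (the directory path is a placeholder):

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.util.List;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;

// Glob matcher that keeps only PDF files; other file types are skipped.
PathMatcher onlyPdfs = FileSystems.getDefault().getPathMatcher("glob:**.pdf");

// Walks "/path/to/docs" and all of its subdirectories.
List<Document> documents =
        FileSystemDocumentLoader.loadDocumentsRecursively("/path/to/docs", onlyPdfs);

// Tika populates the "file_name" metadata entry for each loaded document.
documents.forEach(doc -> System.out.println(doc.metadata().getString("file_name")));
```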

Ingest into an in‑memory vector store

InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
EmbeddingStoreIngestor.ingest(documents, store);

The ingestor splits each document into 300‑token segments with 30‑token overlap, embeds them with the configured EmbeddingModel, and stores the Embedding together with the original segment.

Build the assistant service

ChatLanguageModel chatModel = OpenAiChatModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName(GPT_4_O_MINI)
    .build();

Assistant assistant = AiServices.builder(Assistant.class)
    .chatLanguageModel(chatModel)
    .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
    .contentRetriever(EmbeddingStoreContentRetriever.from(store))
    .build();

The assistant keeps the last 10 messages and retrieves relevant content from the in‑memory store.

Query the assistant

String answer = assistant.chat("How to use LangChain4j for simple RAG?");

Access retrieved sources

Wrap the return type in Result<String> and call result.sources(). For streaming responses use onRetrieved() callbacks.
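For the synchronous case, the service interface can be sketched as follows (the interface name matches the Assistant used above):

```java
import java.util.List;

import dev.langchain4j.rag.content.Content;
import dev.langchain4j.service.Result;

// Declaring the return type as Result<String> exposes the retrieved sources
// alongside the answer.
interface Assistant {
    Result<String> chat(String userMessage);
}

// After building the assistant as shown above:
// Result<String> result = assistant.chat("How to use LangChain4j for simple RAG?");
// String answer = result.content();          // the LLM's reply
// List<Content> sources = result.sources();  // the segments that were injected
// sources.forEach(c -> System.out.println(c.textSegment().text()));
```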

RAG API overview

Document

The Document class represents an unstructured text file (PDF, DOCX, HTML, etc.) and carries optional metadata such as file_name, url, or custom fields.

Metadata

Metadata is a Map<String, Object> of primitive values (String, Integer, Long, Float, Double). It can be used for filtering during retrieval or for enriching prompts.
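A small sketch of attaching metadata to a document (the key names and values here are hypothetical):

```java
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.Metadata;

// put(...) returns the Metadata instance, so entries can be chained.
Metadata metadata = new Metadata()
        .put("file_name", "employee-handbook.pdf")
        .put("year", 2024);

Document document = Document.from("Vacation policy: ...", metadata);

// Typed getters retrieve values later, e.g. for filtering.
System.out.println(document.metadata().getInteger("year"));
```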

Document loaders

FileSystemDocumentLoader

UrlDocumentLoader

AmazonS3DocumentLoader

AzureBlobStorageDocumentLoader

GitHubDocumentLoader

TencentCosDocumentLoader

Embedding model and store

EmbeddingModel

An EmbeddingModel converts text or a TextSegment into a numeric Embedding. An EmbeddingStore (vector database) stores embeddings and provides search() for similarity lookup. Implementations include the in-memory store, Pinecone, Milvus, and others.
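A sketch of a direct similarity search, assuming an embeddingModel and embeddingStore are already configured (the query text is an example):

```java
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;

// Embed the query text.
Embedding queryEmbedding = embeddingModel.embed("How do I reset my password?").content();

// Search the store for the closest segments.
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .maxResults(5)
        .minScore(0.7)
        .build();

EmbeddingSearchResult<TextSegment> result = embeddingStore.search(request);
for (EmbeddingMatch<TextSegment> match : result.matches()) {
    System.out.println(match.score() + " -> " + match.embedded().text());
}
```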

EmbeddingStoreIngestor

Ingests Documents into an EmbeddingStore using a configured EmbeddingModel. An optional DocumentTransformer, DocumentSplitter, and TextSegmentTransformer can be supplied to clean, chunk, or enrich data before embedding.

Retrieval augmentors

The RetrievalAugmentor is the entry point of the RAG pipeline. It receives a Query, optionally transforms it, routes it to one or more ContentRetrievers, aggregates the results, and injects them into the user prompt.

Query transformers

CompressingQueryTransformer – uses an LLM to compress a follow-up question together with its conversation context into a standalone query.

ExpandingQueryTransformer – generates multiple reformulations of the original query.

Custom transformers (e.g., HyDE) can be implemented.
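A HyDE-style transformer can be sketched like this: the LLM writes a hypothetical answer, and that answer (rather than the question) is used for retrieval. The class name and prompt wording are illustrative:

```java
import java.util.Collection;
import java.util.List;

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.rag.query.Query;
import dev.langchain4j.rag.query.transformer.QueryTransformer;

// HyDE sketch: embed a hypothetical answer instead of the raw question,
// since the answer tends to be closer to the indexed passages in vector space.
class HydeQueryTransformer implements QueryTransformer {

    private final ChatLanguageModel model;

    HydeQueryTransformer(ChatLanguageModel model) {
        this.model = model;
    }

    @Override
    public Collection<Query> transform(Query query) {
        String hypotheticalAnswer = model.generate(
                "Write a short passage that plausibly answers: " + query.text());
        // Keep the original query metadata so downstream routing still works.
        return List.of(Query.from(hypotheticalAnswer, query.metadata()));
    }
}
```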

Content retrievers

EmbeddingStoreContentRetriever – vector-search-based retrieval.

WebSearchContentRetriever – fetches results from a web search engine (e.g., Google Custom Search).

SqlDatabaseContentRetriever (experimental) – generates SQL from a natural-language query via an LLM and executes it.

AzureAiSearchContentRetriever, Neo4jContentRetriever, and others.

Query router

DefaultQueryRouter – forwards the query to all configured retrievers.

LanguageModelQueryRouter – uses an LLM to decide which retriever(s) to invoke.
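An LLM-based router can be sketched as follows, assuming chatModel, vectorRetriever, and webRetriever are already built (the descriptions are examples and tell the LLM what each retriever is good for):

```java
import java.util.Map;

import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.query.router.LanguageModelQueryRouter;
import dev.langchain4j.rag.query.router.QueryRouter;

// Each retriever is described in plain language; the LLM picks the relevant ones.
Map<ContentRetriever, String> retrieverDescriptions = Map.of(
        vectorRetriever, "internal product documentation",
        webRetriever, "current events and general web knowledge");

QueryRouter router = new LanguageModelQueryRouter(chatModel, retrieverDescriptions);
```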

Content aggregator

Aggregates results from multiple retrievers. Implementations include DefaultContentAggregator and ReRankingContentAggregator (re‑orders results with a secondary model).
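Note that ReRankingContentAggregator expects a ScoringModel (a dedicated reranker), not a chat model. A sketch using Cohere's rerank endpoint (requires the langchain4j-cohere module; the model name is one of Cohere's rerank models):

```java
import dev.langchain4j.model.cohere.CohereScoringModel;
import dev.langchain4j.model.scoring.ScoringModel;
import dev.langchain4j.rag.content.aggregator.ContentAggregator;
import dev.langchain4j.rag.content.aggregator.ReRankingContentAggregator;

// A ScoringModel assigns a relevance score to each (query, segment) pair.
ScoringModel scoringModel = CohereScoringModel.builder()
        .apiKey(System.getenv("COHERE_API_KEY"))
        .modelName("rerank-multilingual-v3.0")
        .build();

ContentAggregator aggregator = new ReRankingContentAggregator(scoringModel);
```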

Content injector

The default injector appends retrieved contents to the original user message using the template:

{{userMessage}}

Answer using the following information:
{{contents}}
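Both the template and the metadata shown alongside each segment can be customized. A sketch that includes each segment's file_name in the injected context:

```java
import java.util.List;

import dev.langchain4j.model.input.PromptTemplate;
import dev.langchain4j.rag.content.injector.ContentInjector;
import dev.langchain4j.rag.content.injector.DefaultContentInjector;

// Include the source file name next to each segment and override the template.
ContentInjector injector = DefaultContentInjector.builder()
        .metadataKeysToInclude(List.of("file_name"))
        .promptTemplate(PromptTemplate.from(
                "{{userMessage}}\n\nAnswer using the following information:\n{{contents}}"))
        .build();
```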

Parallel execution

If multiple queries or retrievers are present, DefaultRetrievalAugmentor runs routing and retrieval in parallel using a cached thread pool (keep‑alive 1 s).

Advanced RAG pipeline example

EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
    .embeddingModel(embeddingModel)
    .embeddingStore(embeddingStore)
    .documentTransformer(doc -> {
        doc.metadata().put("userId", "12345");
        return doc;
    })
    .documentSplitter(DocumentSplitters.recursive(1000, 200, new OpenAiTokenizer()))
    .textSegmentTransformer(seg -> TextSegment.from(
        seg.metadata().getString("file_name") + "\n" + seg.text(),
        seg.metadata()))
    .build();

ContentRetriever vectorRetriever = EmbeddingStoreContentRetriever.builder()
    .embeddingStore(embeddingStore)
    .embeddingModel(embeddingModel)
    .maxResults(3)
    .minScore(0.75)
    .filter(metadataKey("userId").isEqualTo("12345"))
    .build();

ContentRetriever webRetriever = WebSearchContentRetriever.builder()
    .webSearchEngine(googleSearchEngine)
    .maxResults(3)
    .build();

RetrievalAugmentor augmentor = DefaultRetrievalAugmentor.builder()
    .queryTransformer(new ExpandingQueryTransformer(chatModel))
    .queryRouter(new DefaultQueryRouter(vectorRetriever, webRetriever))
    .contentAggregator(new ReRankingContentAggregator(scoringModel)) // a ScoringModel, e.g. Cohere rerank
    .contentInjector(new DefaultContentInjector())
    .build();

Assistant assistant = AiServices.builder(Assistant.class)
    .chatLanguageModel(chatModel)
    .retrievalAugmentor(augmentor)
    .build();

This configuration indexes documents with custom metadata, retrieves both vector‑based and web‑based results, expands the query, re‑ranks the combined list, and injects the final context into the LLM prompt.

Key classes and interfaces

Document – raw text plus metadata.

Metadata – key-value map for filtering and enrichment.

EmbeddingModel – creates an Embedding from text.

EmbeddingStore – vector database with add(), search(), and removal APIs.

EmbeddingStoreIngestor – pipelines documents → segments → embeddings → store.

ContentRetriever – fetches Content for a Query.

RetrievalAugmentor – orchestrates query transformation, routing, aggregation, and injection.

ContentInjector – formats the final prompt (default template shown above).

Conclusion

LangChain4j provides a complete toolbox for building RAG pipelines, from a quick “simple RAG” starter to fully customizable pipelines with query transformation, multiple retrievers, content aggregation, and parallel execution. By combining vector stores, embedding models, and optional web or database retrievers, developers can give LLMs access to up‑to‑date, domain‑specific knowledge while keeping the integration code concise and type‑safe.

Written by

JavaEdge

Hands-on development experience at multiple leading tech firms; now a software architect at a Shanghai state-owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
