Build a Retrieval‑Augmented Generation (RAG) System with Langchain4j and Llama 3 on Ollama

This guide explains why Retrieval‑Augmented Generation matters, outlines the core Langchain4j components for working with Ollama, and provides a complete Java example (Maven setup, document ingestion, embedding creation, similarity search, prompt construction, and response generation) that demonstrates a functional RAG pipeline using the Llama 3 model.


Why Retrieval‑Augmented Generation matters

RAG combines external document retrieval with generative language models, dramatically improving answer accuracy and contextual relevance. Unlike pure generation, it can pull up‑to‑date domain knowledge that the model’s pre‑training may miss, reducing hallucinations and boosting reliability for precision‑critical applications.

Key Langchain4j and Ollama components

EmbeddingStore: Stores vector embeddings extracted from documents and supports similarity search over them.

EmbeddingStoreIngestor: Splits documents, generates embeddings for the chunks, and loads them into the store.

OllamaEmbeddingModel: Produces embeddings from text via an Ollama‑served model.

OllamaLanguageModel: Generates responses using the retrieved context.

Step‑by‑step example

First, ensure Ollama is running locally and that the llama3 model has been pulled (ollama pull llama3). Then add the Maven dependency:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-ollama</artifactId>
    <version>0.33.0</version>
</dependency>

Create a text file dictionary.txt containing a fantasy creature description (the “Shadowmire”).

The Shadowmire is a mysterious and ancient creature that dwells in the darkest, most secluded swamps of Middle‑earth.
It has the body of a large, sleek panther, but its fur is a deep, iridescent black that seems to absorb light.
Its eyes are a piercing emerald green, glowing with an eerie luminescence that can be seen from afar.

Place the file in the Maven project’s resources directory.

Then implement the RAG pipeline:

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.DocumentSplitter;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.parser.TextDocumentParser;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.input.Prompt;
import dev.langchain4j.model.input.PromptTemplate;
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;
import dev.langchain4j.model.ollama.OllamaLanguageModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.util.List;
import java.util.Map;

public class RAGIngestor {
    private static final Duration timeout = Duration.ofSeconds(900);
    public static void main(String[] args) throws Exception {
        EmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3")
                .build();
        // In-memory vector store holding the embedded text segments
        EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
        URL fileUrl = RAGIngestor.class.getResource("/dictionary.txt");
        Path path = Paths.get(fileUrl.toURI());
        Document document = FileSystemDocumentLoader.loadDocument(path, new TextDocumentParser());
        // Split into segments of at most 600 characters, with no overlap
        DocumentSplitter splitter = DocumentSplitters.recursive(600, 0);
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .documentSplitter(splitter)
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build();
        ingestor.ingest(document);
        // Embed the user query and retrieve the single most similar segment
        Embedding queryEmbedding = embeddingModel.embed("What is the Shadowmire?").content();
        List<EmbeddingMatch<TextSegment>> relevant = embeddingStore.findRelevant(queryEmbedding, 1);
        EmbeddingMatch<TextSegment> embeddingMatch = relevant.get(0);
        String information = embeddingMatch.embedded().text();
        Prompt prompt = PromptTemplate.from("""
                Tell me about {{name}}?
                Use the following information to answer the question:
                {{information}}
                """).apply(Map.of("name", "Shadowmire", "information", information));
        OllamaLanguageModel model = OllamaLanguageModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3")
                .timeout(timeout)
                .build();
        String answer = model.generate(prompt).content();
        System.out.println("Answer: " + answer);
    }
}
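The pipeline above relies on DocumentSplitters.recursive(600, 0), which caps each segment at 600 characters with no overlap. As a rough, self-contained illustration of what character-based chunking does (the library's actual splitter is recursive, falling back from paragraphs to sentences to words; the SimpleSplitter class here is purely hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleSplitter {

    // Split text into chunks of at most maxChars characters, preferring to
    // break on a space so words are not cut mid-way. A simplification of
    // what a document splitter does before embedding each chunk.
    static List<String> split(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + maxChars, text.length());
            if (end < text.length()) {
                int lastSpace = text.lastIndexOf(' ', end);
                if (lastSpace > start) {
                    end = lastSpace; // back up to the nearest word boundary
                }
            }
            chunks.add(text.substring(start, end).trim());
            start = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "The Shadowmire dwells in the darkest swamps. ".repeat(30);
        for (String chunk : split(text, 600)) {
            System.out.println(chunk.length() + " chars");
        }
    }
}
```

Each chunk is what gets embedded and stored, so smaller chunks give more precise retrieval at the cost of less surrounding context per match.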

Code walkthrough

Initialize embedding model: Connects to the local Ollama service and selects the llama3 model.

Initialize embedding store: Creates an in‑memory vector store.

Load and parse document: Reads dictionary.txt from the classpath and converts it to a Document object.

Split document: Uses a recursive splitter to break the text into manageable chunks (at most 600 characters, no overlap).

Ingest document: Generates an embedding for each chunk and stores it.

Create query embedding: Embeds the user query "What is the Shadowmire?".

Retrieve relevant information: Performs a similarity search in the embedding store and keeps the best match.

Prepare prompt: Combines the retrieved snippet with a prompt template.

Initialize language model: Sets up OllamaLanguageModel for generation, with a generous timeout.

Generate response: Calls model.generate and prints the answer.
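The similarity search in the retrieval step is worth unpacking: findRelevant ranks stored segments by how close their embedding vectors are to the query vector, and in-memory stores typically derive this score from cosine similarity. A self-contained sketch of that computation (illustrative only; the tiny three-dimensional vectors and the CosineSimilarity class are made up for the example, and real embedding vectors have far more dimensions):

```java
public class CosineSimilarity {

    // Cosine similarity between two embedding vectors: the dot product
    // divided by the product of their magnitudes. Values near 1 mean the
    // vectors point in nearly the same direction, i.e. similar meaning.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] query = {0.9f, 0.1f, 0.0f};          // "What is the Shadowmire?"
        float[] shadowmireChunk = {0.8f, 0.2f, 0.1f}; // on-topic segment
        float[] unrelatedChunk = {0.0f, 0.1f, 0.9f};  // off-topic segment
        System.out.printf("relevant:  %.3f%n", cosine(query, shadowmireChunk));
        System.out.printf("unrelated: %.3f%n", cosine(query, unrelatedChunk));
    }
}
```

The segment whose vector scores highest against the query is the one whose text ends up in the prompt, which is why embedding quality directly determines answer quality.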

Running the program takes a few minutes and produces a context‑aware answer, as shown in the screenshot below.

RAG answer screenshot

Conclusion

By integrating retrieval with generation, the Langchain4j and Ollama stack offers a powerful way to boost the accuracy and relevance of natural‑language applications. This tutorial provides a baseline framework that can be customized and extended for specific datasets and use cases.

Tags: Java, LLM, RAG, Embedding, Ollama, Langchain4j
Written by

JakartaEE China Community

JakartaEE China Community, official website: jakarta.ee/zh/community/china; gitee.com/jakarta-ee-china; space.bilibili.com/518946941; reply "Join group" to get QR code
