From Scratch to Production: Java + Spring Boot RAG Pipeline for Enterprise GenAI

This article walks through building a production‑ready Retrieval‑Augmented Generation (RAG) system using Java, Spring Boot, LangChain4j, Chroma vector store, and Ollama LLM, covering architecture, key dependencies, configuration, document ingestion, retrieval APIs, scoring, and security considerations.

LuTiao Programming

Apr 12, 2026

Why a RAG pipeline is needed

Large language models can generate inaccurate answers when they lack access to specific enterprise data, creating a knowledge gap. Retrieval‑Augmented Generation (RAG) feeds enterprise knowledge into the model to close this gap.

RAG workflow

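A query passes through four steps: the user query is converted into an embedding vector; that vector is used to search the vector database for the top‑K most similar document fragments; the fragments are concatenated with the question into a prompt; and the LLM generates the final answer from that prompt.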
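The workflow above can be sketched without any framework. This is a toy illustration only: `embed` is a hypothetical hashed bag-of-words function standing in for a real embedding model, an in-memory list stands in for the vector database, and the final LLM call is left out. All class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy walk-through of the RAG steps; not the article's implementation.
class RagWorkflowSketch {

    // Hypothetical stand-in for an embedding model: hashed bag-of-words, 64 dimensions.
    static double[] embed(String text) {
        double[] v = new double[64];
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                v[Math.floorMod(token.hashCode(), v.length)] += 1.0;
            }
        }
        return v;
    }

    // Cosine similarity between two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    // Steps 1-2: embed the query and return the top-K most similar fragments.
    static List<String> topK(String query, List<String> fragments, int k) {
        double[] q = embed(query);
        List<String> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((String f) -> cosine(q, embed(f))).reversed());
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    // Step 3: concatenate the fragments and the question into one prompt.
    static String buildPrompt(String query, List<String> context) {
        return "Answer using only this context:\n"
                + String.join("\n---\n", context)
                + "\n\nQuestion: " + query;
    }
}
```

In the real pipeline, the string returned by `buildPrompt` would be sent to the chat model, which produces the answer (step 4).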

System architecture

Web layer – Spring Boot

Orchestration layer – LangChain4j

Embedding model – HuggingFace sentence‑transformers/all‑MiniLM‑L6‑v2

Vector store – Chroma (local persistence)

LLM – Ollama (local Llama 3.2 3B model)

Key Maven dependencies

<groupId>com.icoderoad</groupId>
<artifactId>rag-embeddings-poc</artifactId>

Essential dependencies:

langchain4j-spring-boot-starter – core entry point

langchain4j-hugging-face – embedding model

langchain4j-chroma – vector store

Application configuration

spring:
  application:
    name: rag-system
  servlet:
    multipart:
      max-file-size: 50MB
server:
  port: 8080
  servlet:
    context-path: /rag-system

langchain4j:
  embeddings:
    hugging-face:
      model-id: sentence-transformers/all-MiniLM-L6-v2
  vector-store:
    chroma:
      base-url: http://localhost:8000
      collection-name: tourist-knowledge
  chat-model:
    ollama:
      base-url: http://localhost:11434
      model-name: llama3.2:3b
  chunk-size: 500
  chunk-overlap: 50
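To make the last two settings concrete, here is a minimal sliding-window splitter. The article does not say which splitter is used or whether sizes are counted in characters or tokens; this sketch assumes characters, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkingSketch {

    // Splits text into windows of at most chunkSize characters; consecutive
    // windows share `overlap` characters. Character units are an assumption.
    static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // how far each new window advances
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // last window reached the end of the text
            }
        }
        return chunks;
    }
}
```

Under these assumptions, `split(text, 500, 50)` on a 1,000-character document yields three chunks starting at offsets 0, 450, and 900. The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks.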

Running the vector store (Chroma)

docker run -d \
  -p 8000:8000 \
  -v /data/chroma:/chroma/chroma \
  chromadb/chroma:0.5.4

Verify with curl http://localhost:8000/api/v1/version.

Running the local LLM (Ollama)

export OLLAMA_HOST=127.0.0.1:11434
nohup ollama serve > /data/logs/ollama.log 2>&1 &

ollama pull llama3.2:3b

Verify with curl http://localhost:11434/api/tags.

Spring Boot bean configuration

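package com.icoderoad.config;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Lazy;
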
@Configuration
public class RagConfig {
    @Bean @Lazy
    public EmbeddingStore<TextSegment> embeddingStore() { /* ... */ }
    @Bean @Lazy
    public EmbeddingModel embeddingModel() { /* ... */ }
    @Bean @Lazy
    public ChatModel chatModel() { /* ... */ }
}

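The @Lazy annotation defers creation of each bean until it is first needed, so the application can start even when an external service such as Chroma or Ollama is unavailable. A connection failure then surfaces on the first request that uses the bean rather than at startup.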
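As a plain-Java analogy for deferred creation (this is not how Spring implements @Lazy, and `LazyHolder` is a hypothetical name), a holder can postpone running its factory until the first call to `get()`:

```java
import java.util.function.Supplier;

// Analogy only: Spring's @Lazy relies on its own bean-factory and proxy
// mechanics, which this class does not reproduce.
class LazyHolder<T> {
    private final Supplier<T> factory;
    private T instance;

    LazyHolder(Supplier<T> factory) {
        this.factory = factory; // nothing is created yet
    }

    synchronized T get() {
        if (instance == null) {
            // First use: any connection error would be thrown here, not at
            // construction time. A failed attempt is retried on the next call.
            instance = factory.get();
        }
        return instance;
    }
}
```

The trade-off is the same as with lazy beans: startup is more forgiving, but a misconfigured service is only discovered when the first request arrives.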

Document ingestion API

Controller package:

/src/main/java/com/icoderoad/controller

POST /api/admin/rag/upload – upload PDF/TXT/JSON files (size limit, chunk‑level fault tolerance)

DELETE /api/admin/rag/collection – delete a collection

Embedding embedding = embeddingModel.embed(segment.text()).content();
embeddingStore.add(embedding, segment);
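One way to read "chunk-level fault tolerance" is that a failure on one chunk is recorded and the loop moves on instead of aborting the whole upload. The sketch below assumes that interpretation. `embedAndStore` is a hypothetical callback standing in for the two calls above, and `IngestResult` is an illustrative type, not part of the article's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class IngestionSketch {

    // Outcome of one upload: how many chunks were stored and which ones failed.
    record IngestResult(int stored, List<Integer> failedIndexes) {}

    // Processes every chunk independently; one bad chunk does not stop the rest.
    static IngestResult ingest(List<String> chunks, Consumer<String> embedAndStore) {
        int stored = 0;
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            try {
                embedAndStore.accept(chunks.get(i));
                stored++;
            } catch (RuntimeException e) {
                failed.add(i); // record the failure and keep going
            }
        }
        return new IngestResult(stored, failed);
    }
}
```

Returning the failed indexes lets the upload endpoint report a partial success and lets a caller retry only the chunks that did not make it into the store.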

Retrieval API

GET /api/v1/retrieve/embedded-chunks

GET /api/v1/retrieve/embedded-chunks-with-score

Embedding queryEmbedding = embeddingModel.embed(question).content();
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .minScore(minScoreThreshold)
    .maxResults(fetchLimit)
    .build();
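// search(...) returns an EmbeddingSearchResult; matches() exposes the scored hits
List<EmbeddingMatch<TextSegment>> results = embeddingStore.search(request).matches().stream()
    .sorted(Comparator.comparingDouble((EmbeddingMatch<TextSegment> m) -> m.score()).reversed())
    .toList();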

Scoring model

public class RetrievedChunk {
    private String text;
    private double score; // similarity 0~1
    private int rank;    // order after sorting
    private int textLength; // auxiliary metric
}
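A minimal sketch of how these fields might be filled from raw matches: filter by the threshold, sort by score in descending order, cap the list, then assign a 1-based rank. `Match` and `RankedChunk` are hypothetical record types used so the example runs on its own; `RankedChunk` simply mirrors the fields of RetrievedChunk.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ScoringSketch {

    // Hypothetical raw match: chunk text plus its similarity score.
    record Match(String text, double score) {}

    // Record form of the RetrievedChunk fields.
    record RankedChunk(String text, double score, int rank, int textLength) {}

    static List<RankedChunk> rank(List<Match> matches, double minScore, int maxResults) {
        List<Match> kept = matches.stream()
                .filter(m -> m.score() >= minScore)
                .sorted(Comparator.comparingDouble((Match m) -> m.score()).reversed())
                .limit(maxResults)
                .toList();
        List<RankedChunk> ranked = new ArrayList<>();
        for (int i = 0; i < kept.size(); i++) {
            Match m = kept.get(i);
            // rank is the 1-based position after sorting
            ranked.add(new RankedChunk(m.text(), m.score(), i + 1, m.text().length()));
        }
        return ranked;
    }
}
```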

Practical guidance: set minScore to at least 0.6; lower thresholds tend to admit chunks that are only weakly related to the query.
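To put the threshold in perspective, it helps to know how the 0~1 score relates to raw cosine similarity. The sketch below assumes the score is computed as (cosine + 1) / 2, the mapping used by LangChain4j's RelevanceScore helper. Whether your store and version apply exactly this mapping is an assumption worth verifying.

```java
class ScoreScaleSketch {

    // Maps cosine similarity in [-1, 1] to a relevance score in [0, 1] (assumed mapping).
    static double scoreFromCosine(double cosine) {
        return (cosine + 1.0) / 2.0;
    }

    // Inverse mapping: the cosine similarity that a given score corresponds to.
    static double cosineFromScore(double score) {
        return 2.0 * score - 1.0;
    }
}
```

Under that assumption, a minScore of 0.6 corresponds to a cosine similarity of about 0.2, so the threshold is less strict than the number alone suggests.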

Full run steps

Build and start the application:

mvn clean install
nohup java -jar target/rag-embeddings-poc.jar > /data/logs/rag.log 2>&1 &

Upload a document (e.g., PDF):

curl -F "file=@/data/docs/tourist.pdf" http://localhost:8080/rag-system/api/admin/rag/upload

Query the knowledge base:

curl "http://localhost:8080/rag-system/api/v1/retrieve/embedded-chunks?question=South%20India%20temples"
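The same query can be issued from Java with the JDK's built-in HTTP client. This is a sketch that assumes the service is running locally on the port and context path configured earlier; the class name is hypothetical and the response is returned as raw text because the article does not specify its shape.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class RetrieveClientSketch {

    static final String BASE = "http://localhost:8080/rag-system";

    // Builds the retrieval URL, percent-encoding the question.
    static URI retrieveUri(String question) {
        String encoded = URLEncoder.encode(question, StandardCharsets.UTF_8)
                .replace("+", "%20"); // URLEncoder emits '+' for spaces
        return URI.create(BASE + "/api/v1/retrieve/embedded-chunks?question=" + encoded);
    }

    // Sends the GET request and returns the raw response body.
    static String retrieve(String question) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(retrieveUri(question)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling `retrieveUri("South India temples")` produces the same URL as the curl command.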

Security considerations

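The upload and collection‑management endpoints should be protected with role-based access control (RBAC) backed by OAuth2/JWT authentication; otherwise any caller who can reach the service can upload documents or delete the collection.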
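The access rule itself is simple enough to state as a framework-free sketch. In a real deployment it would be enforced by Spring Security with roles taken from a validated JWT. The `ADMIN` role name and the choice to leave non-admin paths open are assumptions made for this example.

```java
import java.util.Set;

class AdminGuardSketch {

    // Returns true when a caller holding the given roles may access the path.
    // Paths under /api/admin/ require the (hypothetical) ADMIN role.
    static boolean isAllowed(String path, Set<String> roles) {
        if (path.startsWith("/api/admin/")) {
            return roles.contains("ADMIN");
        }
        return true; // assumed policy: other endpoints stay open; adjust as needed
    }
}
```

With this rule, a caller holding only a `USER` role can query `/api/v1/retrieve/embedded-chunks` but is refused on `/api/admin/rag/collection`.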
