From Scratch to Production: Java + Spring Boot RAG Pipeline for Enterprise GenAI
This article walks through building a production‑ready Retrieval‑Augmented Generation (RAG) system using Java, Spring Boot, LangChain4j, Chroma vector store, and Ollama LLM, covering architecture, key dependencies, configuration, document ingestion, retrieval APIs, scoring, and security considerations.
Why a RAG pipeline is needed
Large language models can produce inaccurate or fabricated answers when a question depends on enterprise data they never saw during training; this is the knowledge gap. Retrieval‑Augmented Generation (RAG) closes it by retrieving relevant enterprise documents at query time and supplying them to the model as context.
RAG workflow
The user query is vectorized; the query vector searches a vector database for the top‑K relevant document fragments; the fragments are concatenated into a prompt; the LLM generates the final answer.
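To make the flow above concrete, here is a self-contained, in-memory sketch of the same four steps. It does not use LangChain4j. A toy `embed` function (a hashed bag of words) stands in for all-MiniLM-L6-v2, a plain list stands in for Chroma, and the prompt template is a hypothetical example. Only the shape of the pipeline carries over to the real system.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Toy end-to-end RAG flow: embed the query, find the top-K fragments, build a prompt. */
final class ToyRagFlow {
    static final int DIM = 64; // toy embedding size (assumption, not MiniLM's 384)

    /** Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalized. */
    static double[] embed(String text) {
        double[] v = new double[DIM];
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                v[Math.floorMod(token.hashCode(), DIM)] += 1.0;
            }
        }
        double norm = 0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        if (norm > 0) {
            for (int i = 0; i < DIM; i++) v[i] /= norm;
        }
        return v;
    }

    /** For L2-normalized vectors, cosine similarity is simply the dot product. */
    static double cosine(double[] a, double[] b) {
        double dot = 0;
        for (int i = 0; i < a.length; i++) dot += a[i] * b[i];
        return dot;
    }

    /** Step 2: keep the K fragments whose vectors are most similar to the query vector. */
    static List<String> topK(double[] query, List<String> fragments, int k) {
        return fragments.stream()
                // re-embeds fragments while sorting; a real vector store keeps precomputed vectors
                .sorted(Comparator.comparingDouble((String f) -> cosine(query, embed(f))).reversed())
                .limit(k)
                .toList();
    }

    /** Step 3: concatenate the retrieved fragments and the question (hypothetical template). */
    static String buildPrompt(String question, List<String> context) {
        StringBuilder sb = new StringBuilder("Answer using only the context below.\n\nContext:\n");
        for (String c : context) sb.append("- ").append(c).append('\n');
        return sb.append("\nQuestion: ").append(question).toString();
    }

    public static void main(String[] args) {
        List<String> knowledgeBase = List.of(
                "Meenakshi Temple in Madurai is a major temple in South India.",
                "Backwater cruises are popular in Kerala.",
                "The Eiffel Tower is in Paris.");
        String question = "Which temples are in South India?";
        List<String> hits = topK(embed(question), knowledgeBase, 2); // steps 1 and 2
        String prompt = buildPrompt(question, hits);                 // step 3
        System.out.println(prompt); // step 4 would send this prompt to the LLM (Ollama here)
    }
}
```

In the real pipeline, steps 1 and 2 are the embedding model plus the vector store's similarity search, and step 4 is a call to the chat model.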
System architecture
Web layer – Spring Boot
Orchestration layer – LangChain4j
Embedding model – HuggingFace sentence‑transformers/all‑MiniLM‑L6‑v2
Vector store – Chroma (local persistence)
LLM – Ollama (local Llama 3.2 3B model)
Key Maven dependencies
<groupId>com.icoderoad</groupId>
<artifactId>rag-embeddings-poc</artifactId>
Essential starters:
langchain4j-spring-boot-starter – core entry point
langchain4j-hugging-face – embedding model
langchain4j-chroma – vector store
Application configuration
spring:
  application:
    name: rag-system
  servlet:
    multipart:
      max-file-size: 50MB
      max-request-size: 50MB # must be raised too; Spring Boot's default request limit is 10MB
server:
  port: 8080
  servlet:
    context-path: /rag-system
langchain4j:
  embeddings:
    hugging-face:
      model-id: sentence-transformers/all-MiniLM-L6-v2
  vector-store:
    chroma:
      base-url: http://localhost:8000
      collection-name: tourist-knowledge
  chat-model:
    ollama:
      base-url: http://localhost:11434
      model-name: llama3.2:3b
  chunk-size: 500
  chunk-overlap: 50
Running the vector store (Chroma)
docker run -d \
  -p 8000:8000 \
  -v /data/chroma:/chroma/chroma \
  chromadb/chroma:0.5.4
Verify with curl http://localhost:8000/api/v1/version.
Running the local LLM (Ollama)
export OLLAMA_HOST=127.0.0.1:11434
nohup ollama serve > /data/logs/ollama.log 2>&1 &
ollama pull llama3.2:3b
Verify with curl http://localhost:11434/api/tags.
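The two curl checks can also be run from Java, for example before the application first touches the Chroma or Ollama beans. The sketch below uses only `java.net.http.HttpClient`; the `ServiceProbe` class and the idea of probing at startup are illustrative assumptions, not part of the original project. Any 2xx response counts as healthy, and any I/O failure (connection refused, timeout) counts as unhealthy.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical readiness probe for the Chroma and Ollama endpoints used in this article. */
final class ServiceProbe {
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();

    /** Returns true if a GET on the URL answers with a 2xx status within the timeout. */
    static boolean isUp(String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(3))
                .GET()
                .build();
        try {
            HttpResponse<Void> response = CLIENT.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() / 100 == 2;
        } catch (IOException e) {
            return false; // connection refused, timeout, etc.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Chroma up: " + isUp("http://localhost:8000/api/v1/version"));
        System.out.println("Ollama up: " + isUp("http://localhost:11434/api/tags"));
    }
}
```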
Spring Boot bean configuration
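package com.icoderoad.config;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Lazy;

@Configuration
public class RagConfig {
    @Bean @Lazy
    public EmbeddingStore<TextSegment> embeddingStore() { /* ... */ }
    @Bean @Lazy
    public EmbeddingModel embeddingModel() { /* ... */ }
    @Bean @Lazy
    public ChatModel chatModel() { /* ... */ }
}
The @Lazy annotation defers creating these beans until they are first needed, so the application can start even when Chroma or Ollama is not yet reachable. Note that a lazy bean is still created at startup if an eagerly created bean, such as a controller, injects it directly; marking the injection point with @Lazy as well injects a proxy and defers creation until first use.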
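The deferral idea behind @Lazy can be illustrated without Spring. The sketch below is plain Java and is not how Spring implements @Lazy: a hypothetical `LazyRef` runs an expensive factory (for example, one that connects to Chroma) only on the first `get()`, caches a successful result, and, because a failed attempt caches nothing, simply tries again on the next call.

```java
import java.util.function.Supplier;

/** Hypothetical thread-safe lazy holder: create on first use, cache on success, retry after failure. */
final class LazyRef<T> implements Supplier<T> {
    private final Supplier<T> factory; // assumed never to return null
    private T value;                   // guarded by this

    LazyRef(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public synchronized T get() {
        if (value == null) {
            // If the factory throws, value stays null, so the next get() tries again.
            value = factory.get();
        }
        return value;
    }

    public static void main(String[] args) {
        LazyRef<String> store = new LazyRef<>(() -> {
            System.out.println("connecting to vector store...");
            return "store-client";
        });
        System.out.println("application started"); // nothing has connected yet
        System.out.println(store.get());            // connects here, on first use
        System.out.println(store.get());            // cached, no second connection
    }
}
```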
Document ingestion API
Controller package: /src/main/java/com/icoderoad/controller
POST /api/admin/rag/upload – upload PDF/TXT/JSON files (size limit, chunk‑level fault tolerance)
DELETE /api/admin/rag/collection – delete a collection
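The upload endpoint's responsibilities can be sketched with the JDK alone: reject unsupported types and oversized files, split the text into 500-character chunks with 50 characters of overlap (the chunk-size and chunk-overlap values from the configuration, interpreted here as characters), then embed and store each chunk independently so that one failing chunk does not abort the whole upload. The `Embedder` and `Store` interfaces and the `IngestReport` record are hypothetical stand-ins for LangChain4j's `EmbeddingModel` and `EmbeddingStore`; PDF text extraction is out of scope.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical ingestion flow: validate, chunk, then embed and store each chunk with fault tolerance. */
final class IngestionSketch {
    static final long MAX_BYTES = 50L * 1024 * 1024; // mirrors max-file-size: 50MB
    static final Set<String> ALLOWED = Set.of("pdf", "txt", "json");

    interface Embedder { float[] embed(String text); }          // stand-in for EmbeddingModel
    interface Store { void add(float[] vector, String text); }  // stand-in for EmbeddingStore

    record IngestReport(int stored, List<Integer> failedChunks) {}

    /** Throws IllegalArgumentException for unsupported extensions or files over the size limit. */
    static void validate(String filename, long sizeBytes) {
        int dot = filename.lastIndexOf('.');
        String ext = dot < 0 ? "" : filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (!ALLOWED.contains(ext)) {
            throw new IllegalArgumentException("Unsupported file type: " + filename);
        }
        if (sizeBytes > MAX_BYTES) {
            throw new IllegalArgumentException("File too large: " + sizeBytes + " bytes");
        }
    }

    /** Fixed-size character windows; consecutive chunks share `overlap` characters. */
    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("require size > 0 and 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // the last window reached the end of the text
        }
        return chunks;
    }

    /** Embeds and stores every chunk; a failing chunk is recorded and skipped, not fatal. */
    static IngestReport ingest(List<String> chunks, Embedder embedder, Store store) {
        int stored = 0;
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            try {
                store.add(embedder.embed(chunks.get(i)), chunks.get(i));
                stored++;
            } catch (RuntimeException e) {
                failed.add(i); // a real service would also log e with the chunk index
            }
        }
        return new IngestReport(stored, failed);
    }

    public static void main(String[] args) {
        validate("tourist.txt", 1_200);
        List<String> chunks = chunk("x".repeat(1_200), 500, 50);
        System.out.println(chunks.size() + " chunks"); // windows start at 0, 450 and 900
        IngestReport report = ingest(chunks, text -> new float[] {text.length()}, (v, t) -> {});
        System.out.println(report);
    }
}
```

A fixed character window is the simplest possible splitter. LangChain4j's `DocumentSplitters.recursive(500, 50)` takes the same two numbers but tries to break on paragraph and sentence boundaries, which usually produces more coherent chunks.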
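Each text segment produced from the uploaded file is embedded and then stored together with the segment:
Embedding embedding = embeddingModel.embed(segment.text()).content();
embeddingStore.add(embedding, segment);
Retrieval API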
GET /api/v1/retrieve/embedded-chunks
GET /api/v1/retrieve/embedded-chunks-with-score
Embedding queryEmbedding = embeddingModel.embed(question).content();
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .minScore(minScoreThreshold)
        .maxResults(fetchLimit)
        .build();
// search() returns an EmbeddingSearchResult; the hits are in matches()
List<EmbeddingMatch<TextSegment>> results = embeddingStore.search(request)
        .matches()
        .stream()
        .sorted(Comparator.comparingDouble((EmbeddingMatch<TextSegment> m) -> m.score()).reversed())
        .toList();
Scoring model
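public class RetrievedChunk {
    private String text;
    private double score;   // similarity score in [0, 1]; higher means more relevant
    private int rank;       // position after sorting by score
    private int textLength; // auxiliary metric: length of the chunk text
}
Practical guidance: in the author's experience, a minScore of 0.6 or higher keeps the results meaningful; the right threshold depends on the embedding model and the corpus, so tune it against your own queries.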
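The scoring rules above (minScore threshold, maxResults cap, descending sort, then a rank and text length per chunk) can be sketched over an in-memory list. Scores here are cosine similarities mapped into [0, 1] with (cos + 1) / 2; that mapping is an assumption about how a 0~1 similarity is produced, so check what your vector store actually returns. `Candidate`, `retrieve` and the 1-based rank are hypothetical choices, and `RetrievedChunk` is modelled as a record for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory version of the retrieval scoring: threshold, sort, cap, rank. */
final class ScoringSketch {
    record Candidate(String text, double[] vector) {}
    record RetrievedChunk(String text, double score, int rank, int textLength) {}
    private record Scored(String text, double score) {}

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Assumed convention: map cosine similarity in [-1, 1] to a score in [0, 1]. */
    static double relevance(double cosine) {
        return (cosine + 1) / 2;
    }

    static List<RetrievedChunk> retrieve(double[] query, List<Candidate> candidates,
                                         double minScore, int maxResults) {
        List<Scored> kept = candidates.stream()
                .map(c -> new Scored(c.text(), relevance(cosine(query, c.vector()))))
                .filter(s -> s.score() >= minScore)                           // minScore threshold
                .sorted(Comparator.comparingDouble(Scored::score).reversed()) // best match first
                .limit(maxResults)                                            // maxResults cap
                .toList();
        List<RetrievedChunk> chunks = new ArrayList<>();
        for (int i = 0; i < kept.size(); i++) {
            Scored s = kept.get(i);
            // rank is 1-based in this sketch; textLength is the auxiliary metric
            chunks.add(new RetrievedChunk(s.text(), s.score(), i + 1, s.text().length()));
        }
        return chunks;
    }

    public static void main(String[] args) {
        double[] query = {1, 0};
        List<Candidate> candidates = List.of(
                new Candidate("same direction", new double[] {2, 0}), // cosine 1    -> score 1.0
                new Candidate("45 degrees", new double[] {1, 1}),     // cosine 0.71 -> score 0.85
                new Candidate("orthogonal", new double[] {0, 1}),     // cosine 0    -> score 0.5 (dropped)
                new Candidate("opposite", new double[] {-1, 0}));     // cosine -1   -> score 0.0 (dropped)
        retrieve(query, candidates, 0.6, 5).forEach(System.out::println);
    }
}
```

Under this mapping, minScore = 0.6 corresponds to a cosine similarity of at least 2 × 0.6 − 1 = 0.2, so the threshold is looser than the number suggests; that is one more reason to tune it per embedding model.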
Full run steps
Build and start the application:
mvn clean install
nohup java -jar target/rag-embeddings-poc.jar > /data/logs/rag.log 2>&1 &
Upload a document (e.g., PDF):
curl -F "file=@/data/docs/tourist.pdf" http://localhost:8080/rag-system/api/admin/rag/upload
Query the knowledge base:
curl "http://localhost:8080/rag-system/api/v1/retrieve/embedded-chunks?question=South%20India%20temples"
Security considerations
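The upload and collection‑management endpoints should be protected with role‑based access control (RBAC) backed by OAuth2/JWT authentication; otherwise anyone who can reach the service could, for example, delete the entire collection.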
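A framework-free sketch of the access rule being recommended: admin endpoints (upload and collection delete) require an admin role, retrieval requires an authenticated caller, and everything else is denied by default. In a Spring Boot service this would normally be expressed with Spring Security, with roles taken from a validated JWT; the `ADMIN` and `USER` role names and the `Caller` record are hypothetical, and token validation itself is not shown.

```java
import java.util.Set;

/** Hypothetical path-based RBAC decision applied after a JWT has been validated. */
final class RagAccessPolicy {
    /** Identity extracted from a validated token; roles would come from a token claim. */
    record Caller(String subject, Set<String> roles) {}

    /** Paths are relative to the context path and assumed to be normalized already. */
    static boolean isAllowed(Caller caller, String method, String path) {
        if (caller == null) {
            return false; // unauthenticated: nothing is allowed
        }
        if (path.startsWith("/api/admin/")) {
            return caller.roles().contains("ADMIN"); // upload and DELETE collection
        }
        if (path.startsWith("/api/v1/retrieve/")) {
            boolean reader = caller.roles().contains("USER") || caller.roles().contains("ADMIN");
            return "GET".equals(method) && reader;
        }
        return false; // deny by default
    }

    public static void main(String[] args) {
        Caller analyst = new Caller("alice", Set.of("USER"));
        Caller admin = new Caller("bob", Set.of("ADMIN"));
        System.out.println(isAllowed(analyst, "DELETE", "/api/admin/rag/collection"));       // false
        System.out.println(isAllowed(admin, "DELETE", "/api/admin/rag/collection"));         // true
        System.out.println(isAllowed(analyst, "GET", "/api/v1/retrieve/embedded-chunks"));  // true
    }
}
```

Denying unknown paths by default means a newly added admin endpoint stays closed until a rule explicitly opens it.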
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
LuTiao Programming
LuTiao Programming is a friendly community offering free programming lessons. We inspire learners to explore new ideas and technologies and quickly acquire job-ready skills.