Getting Started with LangChain4j: Building Java AI Agents with a Mature LLM Framework

LangChain4j fills a long‑standing gap for Java developers: a Java‑native, enterprise‑grade LLM framework that abstracts model calls, prompts, memory, tools, RAG, streaming, and structured output. It enables quick setup, clean AI Service definitions, and seamless integration into Spring Boot or Quarkus applications.


What LangChain4j Solves

Since 2023, most backend teams have faced the question of whether, and how, to integrate a large language model (LLM). Implementing LLM calls by hand means writing HTTP requests, managing prompts and token limits, handling streaming responses, maintaining conversation history, building RAG pipelines, and designing function‑calling schemas. LangChain4j abstracts these repetitive steps into a unified, object‑oriented API that lets developers focus on business logic.

Relation to Python LangChain

Although inspired by the Python LangChain project, LangChain4j is not a line‑by‑line port. It embraces Java conventions such as strong typing, annotation‑driven configuration, dependency injection, and explicit interfaces, providing annotations like @AiService, @SystemMessage, @UserMessage, and @Tool to declare AI capabilities in a Java‑idiomatic way.

Suitable Projects

Enterprise internal Q&A / knowledge bases (RAG)

Customer‑service or marketing chatbots with multi‑turn memory

AI assistants for business tasks (e.g., auto‑generating reports, extracting contract fields, writing SQL)

Agent / Copilot style applications where the model calls back‑end tools

If your system already runs on Spring Boot or Quarkus, LangChain4j is the most convenient choice.

Quick Start

Environment

JDK 17+ (21 recommended)

Maven 3.8+ or Gradle 7+

An LLM endpoint (OpenAI API key, Azure OpenAI, Ollama, etc.)

Dependencies

<properties>
  <langchain4j.version>0.35.0</langchain4j.version>
</properties>

<dependencies>
  <!-- Core abstraction -->
  <dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j</artifactId>
    <version>${langchain4j.version}</version>
  </dependency>
  <!-- OpenAI provider -->
  <dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-open-ai</artifactId>
    <version>${langchain4j.version}</version>
  </dependency>
</dependencies>
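
The RAG examples later in this article also use a local embedding model and a PDF parser; these ship as separate modules (artifact IDs follow LangChain4j 0.x naming; verify against your version):

<dependencies>
  <!-- Local ONNX embedding model used in the RAG examples -->
  <dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId>
    <version>${langchain4j.version}</version>
  </dependency>
  <!-- Apache PDFBox parser for loading PDF documents -->
  <dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-document-parser-apache-pdfbox</artifactId>
    <version>${langchain4j.version}</version>
  </dependency>
</dependencies>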

First Program

import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.model.chat.ChatLanguageModel;

public class HelloLangChain4j {
    public static void main(String[] args) {
        ChatLanguageModel model = OpenAiChatModel.builder()
            .apiKey(System.getenv("OPENAI_API_KEY"))
            .modelName("gpt-4o-mini")
            .build();
        String answer = model.generate("Explain LangChain4j in one sentence.");
        System.out.println(answer);
    }
}

The key point is that ChatLanguageModel is an interface; swapping OpenAI for Ollama or another provider only requires changing the builder.
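
For example, here is a minimal sketch of swapping in a local Ollama model (this assumes the langchain4j-ollama dependency on the classpath and a running Ollama server; the model name is illustrative):

import dev.langchain4j.model.ollama.OllamaChatModel;

// Same ChatLanguageModel interface, different provider; nothing else in the app changes.
ChatLanguageModel model = OllamaChatModel.builder()
    .baseUrl("http://localhost:11434") // Ollama's default endpoint
    .modelName("llama3")
    .build();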

Declarative AI Services

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.AiServices;

interface Translator {
    @SystemMessage("You are a senior Chinese‑English translator. Produce natural, accurate translations.")
    String translate(@UserMessage String text);
}

public class TranslatorDemo {
    public static void main(String[] args) {
        ChatLanguageModel model = OpenAiChatModel.withApiKey(System.getenv("OPENAI_API_KEY"));
        Translator translator = AiServices.create(Translator.class, model);
        System.out.println(translator.translate("LangChain4j makes building LLM apps feel native to Java."));
    }
}

You write no HTTP requests or JSON handling; the framework generates a dynamic proxy that turns each method call into an LLM request.

ChatMemory

interface Assistant {
    String chat(@MemoryId String userId, @UserMessage String message);
}

// Each @MemoryId value gets its own sliding window of the 20 most recent messages.
Assistant assistant = AiServices.builder(Assistant.class)
    .chatLanguageModel(model)
    .chatMemoryProvider(memoryId -> MessageWindowChatMemory.withMaxMessages(20))
    .build();

Memory can be a sliding window (MessageWindowChatMemory) or token‑budgeted (TokenWindowChatMemory) and can be persisted via ChatMemoryStore (Redis, MySQL, etc.).
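
For instance, a token‑budgeted memory can be created like this (a sketch; OpenAiTokenizer counts tokens for the given model name):

import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.memory.chat.TokenWindowChatMemory;
import dev.langchain4j.model.openai.OpenAiTokenizer;

// Keeps as many recent messages as fit into a 1000-token budget.
ChatMemory memory = TokenWindowChatMemory.withMaxTokens(1000, new OpenAiTokenizer("gpt-4o-mini"));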

Tools / Function Calling

class OrderTools {

    private final OrderRepository orderRepository; // your existing data-access layer, injected as usual

    OrderTools(OrderRepository orderRepository) {
        this.orderRepository = orderRepository;
    }

    @Tool("Get order status by order ID")
    public String getOrderStatus(@P("the order ID") String orderId) {
        return orderRepository.findById(orderId)
            .map(o -> "Order " + orderId + " status: " + o.getStatus())
            .orElse("Order not found");
    }

    @Tool("Get latest order by phone number")
    public String getLatestOrder(@P("the customer's phone number") String phone) {
        return orderRepository.findLatestByPhone(phone).toString();
    }
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatLanguageModel(model)
    .tools(new OrderTools())
    .build();

The framework automatically generates JSON schemas, sends function‑call requests, invokes the Java methods via reflection, and feeds the results back to the model.
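
In use, a single natural‑language question drives the whole loop. A sketch (the order ID is made up):

// The model decides to call getOrderStatus("10023"); the framework executes it
// via reflection and feeds the result back before the final answer is produced.
String reply = assistant.chat("user-42", "What is the status of order 10023?");
System.out.println(reply);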

RAG (Retrieval‑Augmented Generation)

// Indexing pipeline: load -> split -> embed -> store
EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel(); // runs in-process, no API key needed
EmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
Document document = FileSystemDocumentLoader.loadDocument(
    Path.of("docs/company-handbook.pdf"), new ApachePdfBoxDocumentParser());
DocumentSplitter splitter = DocumentSplitters.recursive(500, 50); // max 500 chars per segment, 50-char overlap
List<TextSegment> segments = splitter.split(document);
List<Embedding> embeddings = embeddingModel.embedAll(segments).content();
store.addAll(embeddings, segments);

ContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
    .embeddingModel(embeddingModel)
    .embeddingStore(store)
    .maxResults(5)
    .minScore(0.6)
    .build();

Assistant assistant = AiServices.builder(Assistant.class)
    .chatLanguageModel(model)
    .contentRetriever(retriever)
    .build();

String answer = assistant.chat("What is the company’s vacation policy?");

The indexing pipeline (load → split → embed → store) runs offline, while the retrieval‑generation pipeline runs online, sharing the same embedding model and vector store.
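
When answers look wrong, it helps to test retrieval in isolation. A minimal sketch using the RAG types directly (the query text is illustrative):

import dev.langchain4j.rag.content.Content;
import dev.langchain4j.rag.query.Query;

// Inspect what the retriever returns before anything reaches the model.
List<Content> hits = retriever.retrieve(Query.from("vacation policy"));
hits.forEach(c -> System.out.println(c.textSegment().text()));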

Streaming Output

StreamingChatLanguageModel streamingModel = OpenAiStreamingChatModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName("gpt-4o-mini")
    .build();

streamingModel.generate("Tell a 100‑word programmer joke", new StreamingResponseHandler<AiMessage>() {
    public void onNext(String token) { System.out.print(token); }
    public void onComplete(Response<AiMessage> resp) { System.out.println("
[END]"); }
    public void onError(Throwable error) { error.printStackTrace(); }
});

In web applications, combine with Spring WebFlux / SSE to push tokens to the front‑end.
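
A minimal Spring MVC sketch of such an endpoint (the /chat/stream path is illustrative; SseEmitter and the standard Spring Web imports are assumed):

@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public SseEmitter stream(@RequestParam String q) {
    SseEmitter emitter = new SseEmitter(60_000L); // 60-second timeout
    streamingModel.generate(q, new StreamingResponseHandler<AiMessage>() {
        @Override
        public void onNext(String token) {
            try { emitter.send(token); } catch (IOException e) { emitter.completeWithError(e); }
        }
        @Override
        public void onComplete(Response<AiMessage> resp) { emitter.complete(); }
        @Override
        public void onError(Throwable error) { emitter.completeWithError(error); }
    });
    return emitter; // generate() is callback-based, so the request thread returns immediately
}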

Structured Output

record Resume(String name, int age, List<String> skills, String summary) {}

interface ResumeParser {
    @UserMessage("Extract structured info from the following resume text:
{{it}}")
    Resume parse(String resumeText);
}

Resume r = AiServices.create(ResumeParser.class, model)
    .parse("Li Lei, 29, Java/Spring/Kafka, worked on …");

The framework adds a JSON schema to the prompt, receives a JSON response, and deserializes it into the POJO, with a fallback self‑repair step on failure.
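
Return types are not limited to records; primitives, enums, and lists work too. A sketch with an enum:

enum Sentiment { POSITIVE, NEUTRAL, NEGATIVE }

interface SentimentAnalyzer {
    @UserMessage("Classify the sentiment of the following text: {{it}}")
    Sentiment analyze(String text);
}

Sentiment s = AiServices.create(SentimentAnalyzer.class, model)
    .analyze("The new release fixed every bug I reported. Fantastic!");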

Overall Architecture

LangChain4j follows a clean, layered design:

Application layer : business‑level services (chatbots, agents, RAG assistants).

Advanced API layer (AI Services) : declarative interfaces that hide prompt engineering.

Core abstraction layer : ChatLanguageModel, EmbeddingModel, ChatMemory, ContentRetriever, ToolExecutor, OutputParser.

Integration layer : adapters for LLM providers, vector stores, and tool implementations.

Infrastructure layer : HTTP client, JSON handling, retry, observability (Micrometer, OpenTelemetry), and Spring/Quarkus starters.

This separation ensures each layer depends only on the abstraction of the layer below, making it trivial to swap OpenAI for Ollama, Pinecone for PgVector, or any other implementation without changing business code.
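
For instance, replacing the in‑memory vector store with PostgreSQL/PgVector is a wiring change only (a sketch; assumes the langchain4j-pgvector dependency and your own connection details):

import dev.langchain4j.store.embedding.pgvector.PgVectorEmbeddingStore;

// Drop-in replacement for InMemoryEmbeddingStore; retriever and service code stay untouched.
EmbeddingStore<TextSegment> store = PgVectorEmbeddingStore.builder()
    .host("localhost").port(5432)
    .database("kb").user("kb_user").password("secret")
    .table("embeddings")
    .dimension(384) // matches AllMiniLmL6V2EmbeddingModel's output size
    .build();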

Complete Knowledge‑Base Chatbot Example

@Configuration
public class AiConfig {
    @Bean
    public EmbeddingModel embeddingModel() { return new AllMiniLmL6V2EmbeddingModel(); }
    @Bean
    public EmbeddingStore<TextSegment> embeddingStore() { return new InMemoryEmbeddingStore<>(); }
    @Bean
    public ContentRetriever contentRetriever(EmbeddingModel em, EmbeddingStore<TextSegment> store) {
        return EmbeddingStoreContentRetriever.builder()
            .embeddingModel(em)
            .embeddingStore(store)
            .maxResults(6)
            .minScore(0.55)
            .build();
    }
    @Bean
    public KnowledgeAssistant knowledgeAssistant(ChatLanguageModel chatModel, ContentRetriever retriever) {
        return AiServices.builder(KnowledgeAssistant.class)
            .chatLanguageModel(chatModel)
            .contentRetriever(retriever)
            .chatMemoryProvider(uid -> MessageWindowChatMemory.withMaxMessages(30))
            .build();
    }
}

public interface KnowledgeAssistant {
    @SystemMessage("""
        You are an internal knowledge‑base assistant. Follow these rules:
        1. Use only provided documents; do not fabricate.
        2. If insufficient data, reply "No relevant information."
        3. Cite source documents at the end.
        """)
    String chat(@MemoryId String userId, @UserMessage String question);
}

@Component
public class KnowledgeIndexer {
    private final EmbeddingModel embeddingModel;
    private final EmbeddingStore<TextSegment> store;
    public KnowledgeIndexer(EmbeddingModel em, EmbeddingStore<TextSegment> store) { this.embeddingModel = em; this.store = store; }
    @PostConstruct
    public void indexAtStartup() throws IOException {
        DocumentSplitter splitter = DocumentSplitters.recursive(500, 80);
        try (Stream<Path> files = Files.walk(Path.of("./knowledge"))) {
            files.filter(p -> p.toString().endsWith(".pdf"))
                 .forEach(p -> {
                     Document doc = FileSystemDocumentLoader.loadDocument(p, new ApachePdfBoxDocumentParser());
                     List<TextSegment> segs = splitter.split(doc);
                     List<Embedding> embs = embeddingModel.embedAll(segs).content();
                     store.addAll(embs, segs);
                 });
        }
    }
}

@RestController
@RequestMapping("/chat")
public class ChatController {
    private final KnowledgeAssistant assistant;
    public ChatController(KnowledgeAssistant assistant) { this.assistant = assistant; }
    @PostMapping
    public Map<String, String> chat(@RequestHeader("X-User-Id") String userId, @RequestBody Map<String, String> body) {
        String answer = assistant.chat(userId, body.get("q"));
        return Map.of("answer", answer);
    }
}

This minimal yet production‑ready service indexes PDFs at startup, provides per‑user memory, supports RAG, and exposes a REST endpoint. Swapping the in‑memory store for Milvus or PgVector and persisting ChatMemoryStore to Redis completes the enterprise setup.
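
A minimal sketch of such a persistent ChatMemoryStore (a ConcurrentHashMap stands in for Redis; the JSON serializer helpers are part of LangChain4j):

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.ChatMessageDeserializer;
import dev.langchain4j.data.message.ChatMessageSerializer;
import dev.langchain4j.store.memory.chat.ChatMemoryStore;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

class PersistentChatMemoryStore implements ChatMemoryStore {

    // Swap this map for a Redis client; keys are memory IDs, values are JSON blobs.
    private final Map<Object, String> db = new ConcurrentHashMap<>();

    @Override
    public List<ChatMessage> getMessages(Object memoryId) {
        String json = db.get(memoryId);
        return json == null ? new ArrayList<>() : ChatMessageDeserializer.messagesFromJson(json);
    }

    @Override
    public void updateMessages(Object memoryId, List<ChatMessage> messages) {
        db.put(memoryId, ChatMessageSerializer.messagesToJson(messages));
    }

    @Override
    public void deleteMessages(Object memoryId) {
        db.remove(memoryId);
    }
}

Wire it in through the provider, e.g. MessageWindowChatMemory.builder().id(uid).maxMessages(30).chatMemoryStore(store).build().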

Engineering Best Practices

Separate prompts from code; manage them with PromptTemplate or external configuration.

Make LLM calls idempotent and add retry with exponential back‑off (see the sketch after this list).

Instrument calls with Micrometer (token count, latency, error rate) and optionally OpenTelemetry for end‑to‑end tracing.

Enforce content moderation via SystemMessage and pre‑filter user inputs.

Configure model selection per use‑case (small model for classification, large model for complex QA).

Focus on chunking strategy for RAG; segment size and overlap impact retrieval quality more than the vector store choice.

Protect proprietary data by using private LLM endpoints (Azure OpenAI, self‑hosted Ollama, etc.) when handling confidential documents.
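
A hand‑rolled sketch of the retry idea (in production, Resilience4j or Spring Retry are better choices; all names here are illustrative):

import java.util.function.Supplier;

static String withRetry(Supplier<String> call, int maxAttempts) {
    long delayMs = 500;
    for (int attempt = 1; ; attempt++) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            if (attempt >= maxAttempts) throw e; // give up after the last attempt
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw e;
            }
            delayMs *= 2; // exponential back-off: 500 ms, 1 s, 2 s, ...
        }
    }
}

// Usage: String answer = withRetry(() -> model.generate(prompt), 3);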

Conclusion

LangChain4j’s strength lies not in novel algorithms but in providing Java developers with a set of well‑designed abstractions: ChatLanguageModel for model‑agnostic calls, declarative AI Services for clean prompt handling, built‑in memory, tool integration, RAG pipelines, streaming, and structured output, all wired into Spring Boot / Quarkus starters for near‑zero integration friction.
