How to Infuse AI into Your Java Applications with LangChain4j and Quarkus
This article shows Java developers how to add generative AI to enterprise applications using the open‑source LangChain4j library and Quarkus, covering a chatbot example, streaming responses, chat memory management, and structured output, all without leaving Java.
Artificial intelligence (AI) is becoming ubiquitous, and Java developers can start building AI‑enabled applications without learning another language. By using the open‑source LangChain4j library, developers can manage interactions with large language models (LLMs), store and retrieve chat memory, and keep requests efficient and low‑cost.
Building a simple chatbot with LangChain4j and Quarkus
The first version of the spaceship‑rental chatbot consists of two Java files:

CustomerSupportAgent.java – an AI service that builds a prompt, supplies planet information, and sends the request to the LLM.
ChatWebSocket.java – a WebSocket endpoint that receives user messages from the UI.
The AI service is declared as an interface and annotated with Quarkus‑specific annotations:
@SessionScoped
@RegisterAiService
public interface CustomerSupportAgent {

    @SystemMessage("""
            You are a friendly, but terse customer service agent for Rocket's
            Cosmic Cruisers, a spaceship rental shop.
            You answer questions from potential guests about the different planets
            they can visit.
            If asked about the planets, only use info from the fact sheet below.
            """ + PlanetInfo.PLANET_FACT_SHEET)
    String chat(String userMessage);
}

@SessionScoped keeps a session alive for the duration of a WebSocket connection, preserving chat memory. @RegisterAiService tells LangChain4j to generate an implementation of the interface, and @SystemMessage provides the system‑level prompt that guides the LLM.
When a user types a message, the WebSocket endpoint forwards it to CustomerSupportAgent.chat(). The generated implementation wraps the method argument in a user message, combines it with the system message, and sends the resulting prompt to the LLM. The LLM's response is then returned to the UI.
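ChatWebSocket.java itself is not reproduced in the article; a minimal sketch of what it could look like, assuming the quarkus-websockets-next extension and a /chat path (the package and path are assumptions):

```java
package com.example;

import io.quarkus.websockets.next.OnTextMessage;
import io.quarkus.websockets.next.WebSocket;
import jakarta.inject.Inject;

// Hypothetical sketch of the WebSocket endpoint that feeds the AI service.
@WebSocket(path = "/chat")
public class ChatWebSocket {

    @Inject
    CustomerSupportAgent customerSupportAgent;

    @OnTextMessage
    public String onTextMessage(String message) {
        // Forward the raw user message to the AI service; LangChain4j
        // turns it into the user message of the prompt.
        return customerSupportAgent.chat(message);
    }
}
```

The return value of onTextMessage() is sent back over the WebSocket to the browser, so the endpoint stays a thin bridge between the UI and the AI service.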
Creating the illusion of memory
LLMs are stateless, so LangChain4j replays previous user and assistant messages as chat memory with each request. The Quarkus LangChain4j extension holds this memory in‑process by default, discarding or summarizing old entries when needed, which gives the chatbot a sense of continuity without inflating token usage.
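Quarkus and LangChain4j manage this windowing automatically, but the idea behind a message-window memory can be illustrated in plain Java (a simplified sketch of the concept, not LangChain4j's actual implementation):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Simplified illustration of a message-window chat memory: keep only the
// most recent N messages so the prompt replayed to the LLM stays small.
class WindowedChatMemory {
    private final int maxMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    WindowedChatMemory(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    void add(String message) {
        messages.addLast(message);
        // Evict the oldest entries once the window is full.
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    List<String> messages() {
        return List.copyOf(messages);
    }
}
```

LangChain4j's own MessageWindowChatMemory follows the same principle, trading older context for a bounded token count per request.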
Streaming responses for a more responsive UI
To avoid waiting for the full response, the CustomerSupportAgent interface is extended with a streaming method that returns a Multi<String> of tokens:
@SessionScoped
@RegisterAiService
@SystemMessage("""
        You are a friendly, but terse customer service agent for Rocket's
        Cosmic Cruisers, a spaceship rental shop.
        You answer questions from potential guests about the different planets
        they can visit.
        If asked about the planets, only use info from the fact sheet below.
        """ + PlanetInfo.PLANET_FACT_SHEET)
public interface CustomerSupportAgent {

    String chat(String userMessage);

    Multi<String> streamChat(String userMessage);
}

The @SystemMessage annotation is moved to the interface level, so it applies to both methods. A new WebSocket endpoint, ChatWebSocketStream, at path /chat/stream calls streamChat(), allowing the UI to display each token as it arrives.
Returning structured output
Beyond plain text, the AI service can return POJOs. A record Spaceship models fleet data, and a record SpaceshipQuery captures user requirements:
record Spaceship(String name, int maxPassengers, boolean hasCargoBay,
        List<String> allowedDestinations) {}

@Description("A request for a compatible spaceship")
public record SpaceshipQuery(int passengers, boolean hasCargo,
        List<String> destinations) {}

The service adds methods to extract a SpaceshipQuery from the user message, determine whether the message concerns spaceships, and suggest compatible ships:
    @SystemMessage("""
            You are a friendly, but terse customer service agent for Rocket's
            Cosmic Cruisers, a spaceship rental shop.
            Respond with 'true' if the user message is regarding spaceships in our
            rental fleet, and 'false' otherwise.
            """)
    boolean isSpaceshipQuery(String userMessage);

    SpaceshipQuery extractSpaceshipAttributes(String userMessage);

    String suggestSpaceships(String message, List<Spaceship> compatibleSpaceships);

    Multi<String> streamSuggestSpaceships(String message, List<Spaceship> compatibleSpaceships);

The WebSocket handler now checks isSpaceshipQuery(). If it returns true, the handler extracts a SpaceshipQuery, finds matching ships via a Fleet class, and returns either a plain or streaming suggestion. Otherwise it falls back to the generic chat() method.
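The Fleet class itself is not shown in the article; a minimal pure-Java sketch of the matching logic (the sample fleet data and the exact filtering rules are assumptions, and the records are repeated so the sketch is self-contained):

```java
import java.util.List;

// Assumed shapes, matching the records shown earlier.
record Spaceship(String name, int maxPassengers, boolean hasCargoBay,
        List<String> allowedDestinations) {}

record SpaceshipQuery(int passengers, boolean hasCargo,
        List<String> destinations) {}

// Hypothetical Fleet class: filters a hard-coded fleet down to ships that
// satisfy every constraint extracted from the user's message.
class Fleet {
    private static final List<Spaceship> FLEET = List.of(
            new Spaceship("Comet Cruiser", 4, false, List.of("Mars", "Venus")),
            new Spaceship("Galaxy Hauler", 8, true, List.of("Mars", "Jupiter")));

    static List<Spaceship> findCompatibleSpaceships(SpaceshipQuery query) {
        return FLEET.stream()
                // Enough seats for the requested party size.
                .filter(ship -> ship.maxPassengers() >= query.passengers())
                // A cargo bay is only required if the user asked for one.
                .filter(ship -> !query.hasCargo() || ship.hasCargoBay())
                // The ship must be cleared for every requested destination.
                .filter(ship -> ship.allowedDestinations()
                        .containsAll(query.destinations()))
                .toList();
    }
}
```

Because the LLM has already distilled the free-form message into a typed SpaceshipQuery, the matching itself is ordinary, deterministic Java.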
@OnTextMessage
public String onTextMessage(String message) {
    boolean isSpaceshipQuery = customerSupportAgent.isSpaceshipQuery(message);
    if (isSpaceshipQuery) {
        SpaceshipQuery userQuery = customerSupportAgent.extractSpaceshipAttributes(message);
        List<Spaceship> spaceships = Fleet.findCompatibleSpaceships(userQuery);
        return customerSupportAgent.suggestSpaceships(message, spaceships);
    } else {
        return customerSupportAgent.chat(message);
    }
}

LLM fundamentals: prompts, memory, and tokens
Prompt engineering is crucial: the system message sets context, while the user message carries the actual request. Including too much history inflates token count, raising cost and hitting length limits. LangChain4j’s memory provider helps balance context and token usage.
Tokens are the billing unit for most hosted LLMs; a token can be a whole word or part of a word. Managing how many tokens are sent (including memory) directly impacts runtime cost.
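A rough, widely quoted rule of thumb is that one token corresponds to about four characters of English text. A back-of-the-envelope estimator (an approximation only; real tokenizers such as BPE vary per model and language):

```java
// Rough token estimate using the ~4 characters-per-token rule of thumb.
// Useful for ballpark cost estimates, not for exact billing.
class TokenEstimator {
    static int estimateTokens(String text) {
        return (int) Math.ceil(text.length() / 4.0);
    }
}
```

Running such an estimate over the system message plus the replayed chat memory gives a quick sense of how much each request will cost before it is sent.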
LangChain4j and Quarkus overview
LangChain4j is an open‑source Java framework that abstracts LLM interaction, handling prompt construction, chat memory, and JSON‑schema‑based structured output. Quarkus is a cloud‑native, container‑optimized Java framework that offers dev mode, live reload, low startup time, and a low memory footprint. Their integration simplifies building AI‑enabled Java services; LangChain4j itself also integrates with other Java stacks such as Spring Boot and Micronaut.
Conclusion
Injecting AI into Java applications enriches functionality and improves user experience. By leveraging Quarkus and LangChain4j, Java developers can interact with LLMs efficiently, use streaming for responsive chats, and obtain structured data that integrates cleanly with existing Java code, all while benefiting from Java’s performance, security, and observability features.
JakartaEE China Community
JakartaEE China Community, official website: jakarta.ee/zh/community/china; gitee.com/jakarta-ee-china; space.bilibili.com/518946941