Build and Integrate a Local LLM with Spring Boot, LangChain4j, and Ollama

This guide walks through installing Ollama on Windows, downloading a Qwen2.5‑7B model, configuring Spring Boot with LangChain4j dependencies, setting up application.yml, defining AI service interfaces, adding conversation memory, creating REST and streaming controllers, and testing the end‑to‑end local LLM workflow.


Scenario

Integrate LangChain4j into a Spring Boot project to provide AI chat with per-conversation memory and isolation, running a local large language model (LLM) via Ollama.

1. Ollama Basics

1.1 What is Ollama?

Ollama is an open‑source platform that runs LLMs locally, handling model download, inference service, and API exposure.

1.2 Core Features

One‑click model run: ollama run <model>

Model management: pull, list, cp, rm

Standard OpenAI‑compatible HTTP API (default port 11434)

Cross‑platform support: Windows, macOS, Linux

Hardware acceleration for NVIDIA (CUDA) and AMD (ROCm) GPUs

1.3 Common Commands

ollama serve – start the Ollama service
ollama pull <model> – download a model
ollama list – list downloaded models
ollama run <model> – start an interactive chat
ollama cp <src> <dst> – create an alias for a model

2. Installing Ollama on Windows

System requirements: Windows 10/11 64‑bit, ≥16 GB RAM (for 7B models), ≥10 GB disk space.

Download OllamaSetup.exe from https://ollama.com/, run the installer (default to C:\Program Files\Ollama), and verify with ollama --version.

Optional: set OLLAMA_MODELS environment variable to change the model storage directory.

Download a model, e.g., ollama pull qwen2:7b. Chinese users may use a mirror such as ollama pull modelscope.cn/Qwen/Qwen2.5-7B-Instruct-GGUF.

Run the model: ollama run modelscope.cn/Qwen/Qwen2.5-7B-Instruct-GGUF and test by typing a question.
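Besides the interactive chat, the running Ollama service can be exercised over its HTTP API on port 11434. The sketch below (plain JDK `java.net.http`, no extra libraries) builds a request against Ollama's /api/generate endpoint using the example model pulled above; the actual send is left commented out because it requires a running Ollama instance.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class OllamaApiSketch {
    public static void main(String[] args) {
        // JSON body for Ollama's /api/generate endpoint; "stream": false returns a single JSON object
        String body = """
                {"model": "qwen2:7b", "prompt": "Why is the sky blue?", "stream": false}""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);

        // With Ollama running locally, send it like this:
        // java.net.http.HttpClient.newHttpClient()
        //         .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```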

3. Spring Boot Integration

3.1 Maven Dependencies (pom.xml)

<parent>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-parent</artifactId>
  <version>3.2.5</version>
</parent>
<groupId>com.example</groupId>
<artifactId>spring-langchain4j-ollama</artifactId>
<version>1.0</version>
<properties>
  <java.version>17</java.version>
  <langchain4j.version>1.0.0-beta3</langchain4j.version>
</properties>
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>dev.langchain4j</groupId>
      <artifactId>langchain4j-bom</artifactId>
      <version>${langchain4j.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
<dependencies>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>
  <dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-ollama-spring-boot-starter</artifactId>
  </dependency>
  <!-- Required for streaming endpoints that return Flux<String> -->
  <dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-reactor</artifactId>
  </dependency>
</dependencies>

Note: the langchain4j-reactor dependency is required when an AI service returns Flux<String>; without it, startup fails with IllegalConfigurationException.

3.2 application.yml Configuration

langchain4j:
  ollama:
    chat-model:
      base-url: http://localhost:11434
      model-name: modelscope.cn/Qwen/Qwen2.5-7B-Instruct-GGUF:latest
      temperature: 0.7
      timeout: PT120S          # total timeout 120 seconds
      connect-timeout: PT10S   # connection timeout 10 seconds
      read-timeout: PT120S     # read timeout 120 seconds
      log-requests: true
      log-responses: true
    streaming-chat-model:
      base-url: http://localhost:11434
      model-name: modelscope.cn/Qwen/Qwen2.5-7B-Instruct-GGUF:latest
      temperature: 0.7
      timeout: PT120S
      connect-timeout: PT10S
      read-timeout: PT120S

Parameters explained: timeout is the overall request timeout (ISO‑8601, e.g., PT120S = 120 s); read-timeout is crucial for slow models; log-requests/responses help debugging.
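The timeout values use the ISO-8601 duration format that java.time.Duration parses, so they can be sanity-checked with the standard library alone:

```java
import java.time.Duration;

public class TimeoutCheck {
    public static void main(String[] args) {
        // The ISO-8601 durations used in application.yml
        Duration total = Duration.parse("PT120S");
        Duration connect = Duration.parse("PT10S");

        System.out.println(total.getSeconds());   // 120
        System.out.println(connect.getSeconds()); // 10

        // Equivalent spellings parse to the same value: PT2M is also 120 seconds
        System.out.println(Duration.parse("PT2M").equals(total)); // true
    }
}
```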

4. Writing the AI Service Code

4.1 Define the AI Service Interface

import dev.langchain4j.service.AiService;
import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;

@AiService
public interface Assistant {
    // System prompt (Chinese): "You are a knowledgeable AI assistant; please answer the user's questions in friendly Chinese."
    @SystemMessage("你是一位知识渊博的AI助手,请用中文友好地回答用户的问题。")
    String chat(@MemoryId String memoryId, @UserMessage String userMessage);
}
@AiService – generates an implementation and registers it as a Spring bean.
@MemoryId – marks the conversation identifier; the framework manages separate memory per ID.
@SystemMessage – sets the system prompt.
@UserMessage – marks the user‑message parameter (optional).

4.2 Configure Conversation Memory (optional for multi‑turn dialogue)

import dev.langchain4j.memory.chat.ChatMemoryProvider;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AiConfig {
    @Bean
    public ChatMemoryProvider chatMemoryProvider() {
        // Allocate a separate ChatMemory for each memoryId, keeping the latest 10 messages
        return memoryId -> MessageWindowChatMemory.withMaxMessages(10);
    }
}

If no memory provider is configured, each call is stateless. The memory can be persisted with Redis by implementing ChatMemoryStore.
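MessageWindowChatMemory keeps only the most recent N messages for each memoryId. The stdlib-only sketch below illustrates that eviction and per-ID isolation; the map, method names, and String messages are illustrative stand-ins, not LangChain4j's actual types.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class MemorySketch {
    static final int MAX_MESSAGES = 10;

    // One message window per memoryId, mirroring ChatMemoryProvider's per-ID allocation
    static final Map<Object, Deque<String>> memories = new HashMap<>();

    static void add(Object memoryId, String message) {
        Deque<String> window = memories.computeIfAbsent(memoryId, id -> new ArrayDeque<>());
        window.addLast(message);
        if (window.size() > MAX_MESSAGES) {
            window.removeFirst(); // evict the oldest message, as a message window does
        }
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 12; i++) add("user-1", "msg-" + i);
        add("user-2", "hello");

        System.out.println(memories.get("user-1").size());      // 10: window capped
        System.out.println(memories.get("user-1").peekFirst()); // msg-3: oldest two evicted
        System.out.println(memories.get("user-2").size());      // 1: isolated per memoryId
    }
}
```

Persisting this per-ID map in Redis instead of on the heap is exactly what a custom ChatMemoryStore implementation would do.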

4.3 Create a REST Controller for Normal Chat

import com.badao.ai.config.Assistant;
import org.springframework.web.bind.annotation.*;

@RestController
public class ChatController {
    private final Assistant assistant;
    public ChatController(Assistant assistant) { this.assistant = assistant; }

    @GetMapping("/chat")
    public String chat(@RequestParam("message") String message,
                       @RequestHeader("X-User-Id") String userId) {
        // Pass userId as memoryId
        return assistant.chat(userId, message);
    }
}

4.4 Streaming Output Interface (requires langchain4j-reactor)

import dev.langchain4j.service.AiService;
import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import reactor.core.publisher.Flux;

@AiService
public interface StreamingAssistant {
    // System prompt (Chinese): "You are a friendly, helpful AI assistant. Please answer in Chinese."
    @SystemMessage("你是一个友好、乐于助人的智能AI助手。请用中文回答。")
    Flux<String> chat(@MemoryId String userId, @UserMessage String message);
}

The matching controller exposes the Flux as a Server‑Sent Events stream:

import com.badao.ai.config.StreamingAssistant;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Flux;

@RestController
public class StreamingChatController {
    private final StreamingAssistant streamingAssistant;
    public StreamingChatController(StreamingAssistant streamingAssistant) { this.streamingAssistant = streamingAssistant; }

    @GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> streamChat(@RequestParam("message") String message,
                                   @RequestHeader("X-User-Id") String userId) {
        return streamingAssistant.chat(userId, message);
    }
}

5. Testing and Verification

5.1 Start Services

Ensure Ollama is running: ollama serve
Confirm the model is downloaded: ollama list
Run the Spring Boot application.

5.2 Test Normal Endpoint

curl -G -H "X-User-Id: 1" --data-urlencode "message=我叫张三" "http://localhost:8080/chat"       # "My name is Zhang San"
curl -G -H "X-User-Id: 1" --data-urlencode "message=我叫什么名字?" "http://localhost:8080/chat"  # "What is my name?"

The controller shown maps /chat on Spring Boot's default port 8080 (adjust if server.port or a context path is configured). The second call should answer with the name given in the first, confirming that conversation memory works for X-User-Id 1.

5.3 Test Streaming Endpoint

curl -N -H "X-User-Id: 1" "http://localhost:8080/chat/stream?message=你好"  # "Hello"

The -N flag disables curl's output buffering so the Server‑Sent Events tokens appear as they arrive.

6. Common Issues and Solutions

model 'xxx' not found – Verify the model name with ollama list and correct the configuration.

SocketTimeoutException: Read timed out – Increase read-timeout beyond 120 s for slow models.

IllegalConfigurationException: Please import langchain4j-reactor – Add the langchain4j-reactor dependency.

NoClassDefFoundError: ClientHttpRequestFactorySettings – Use LangChain4j version 1.0.0‑beta3 consistently with Spring Boot 3.2.x.

JDK not set in IDE – Configure JDK 17 in project structure.

Slow model download – Use a mirror (e.g., the ModelScope community) or configure an Ollama mirror source.

Written by The Dominant Programmer

Resources and tutorials for programmers' advanced learning journey. Advanced tracks in Java, Python, and C#. Blog: https://blog.csdn.net/badao_liumang_qizhi
