Build and Integrate a Local LLM with Spring Boot, LangChain4j, and Ollama
This guide walks through installing Ollama on Windows, downloading a Qwen2.5‑7B model, configuring Spring Boot with LangChain4j dependencies, setting up application.yml, defining AI service interfaces, adding conversation memory, creating REST and streaming controllers, and testing the end‑to‑end local LLM workflow.
Scenario
Integrate LangChain4j into a Spring Boot project to call the Alibaba Bailian platform with AI conversation memory and per-user isolation, and to run a local large language model (LLM) for chat.
1. Ollama Basics
1.1 What is Ollama?
Ollama is an open‑source platform that runs LLMs locally, handling model download, inference service, and API exposure.
1.2 Core Features
One‑click model run: ollama run <model>
Model management: ollama pull, ollama list, ollama cp, ollama rm
Standard OpenAI‑compatible HTTP API (default port 11434; see the sketch after this list)
Cross‑platform support: Windows, macOS, Linux
Hardware acceleration for NVIDIA (CUDA) and AMD (ROCm) GPUs
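Because the API is OpenAI‑compatible, any HTTP client can call it directly. A minimal smoke‑test sketch using the JDK's built‑in HttpClient, assuming the qwen2:7b model from section 2 is already pulled:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaSmokeTest {
    public static void main(String[] args) throws Exception {
        // Chat completion request against Ollama's OpenAI-compatible endpoint
        String body = """
                {"model": "qwen2:7b",
                 "messages": [{"role": "user", "content": "Hello"}]}""";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/v1/chat/completions"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // raw JSON containing the assistant's reply
    }
}

If this prints JSON containing a reply, the Ollama service itself is healthy.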
1.3 Common Commands
ollama serve – start the Ollama service
ollama pull <model> – download a model
ollama list – list downloaded models
ollama run <model> – start an interactive chat
ollama cp <src> <dst> – create an alias for a model
2. Installing Ollama on Windows
System requirements: Windows 10/11 64‑bit, ≥16 GB RAM (for 7B models), ≥10 GB disk space.
Download OllamaSetup.exe from https://ollama.com/, run the installer (default to C:\Program Files\Ollama), and verify with ollama --version.
Optional: set OLLAMA_MODELS environment variable to change the model storage directory.
Download a model, e.g., ollama pull qwen2:7b. Chinese users may use a mirror such as ollama pull modelscope.cn/Qwen/Qwen2.5-7B-Instruct-GGUF.
Run the model: ollama run modelscope.cn/Qwen/Qwen2.5-7B-Instruct-GGUF and test by typing a question.
3. Spring Boot Integration
3.1 Maven Dependencies (pom.xml)
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.2.5</version>
</parent>
<groupId>com.example</groupId>
<artifactId>spring-langchain4j-ollama</artifactId>
<version>1.0</version>
<properties>
<java.version>17</java.version>
<langchain4j.version>1.0.0-beta3</langchain4j.version>
</properties>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-bom</artifactId>
<version>${langchain4j.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>

Note: add langchain4j-reactor to avoid an IllegalConfigurationException when returning Flux<String> from an AI service.
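The snippet above omits the <dependencies> section. A plausible minimal set, assuming the standard LangChain4j Spring Boot starter artifact IDs; if the BOM does not manage the Spring starters in your release, give them an explicit ${langchain4j.version}:

<dependencies>
    <!-- Web layer for the REST controllers; WebFlux backs Flux<String> streaming -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
    <!-- @AiService support and auto-configuration -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-spring-boot-starter</artifactId>
    </dependency>
    <!-- Ollama integration (reads base-url, model-name, etc. from application.yml) -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-ollama-spring-boot-starter</artifactId>
    </dependency>
    <!-- Reactive return types (Flux<String>) for the streaming endpoint -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-reactor</artifactId>
    </dependency>
</dependencies>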
3.2 application.yml Configuration
langchain4j:
  ollama:
    chat-model:
      base-url: http://localhost:11434
      model-name: modelscope.cn/Qwen/Qwen2.5-7B-Instruct-GGUF:latest
      temperature: 0.7
      timeout: PT120S          # total request timeout: 120 seconds
      connect-timeout: PT10S   # connection timeout: 10 seconds
      read-timeout: PT120S     # read timeout: 120 seconds
      log-requests: true
      log-responses: true
    streaming-chat-model:
      base-url: http://localhost:11434
      model-name: modelscope.cn/Qwen/Qwen2.5-7B-Instruct-GGUF:latest
      temperature: 0.7
      timeout: PT120S
      connect-timeout: PT10S
      read-timeout: PT120S

Parameters explained: timeout is the overall request timeout in ISO‑8601 duration format (PT120S = 120 seconds); read-timeout is crucial for slow models; log-requests/log-responses help with debugging.
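The starter builds the model beans from these properties. If you prefer explicit configuration, the same settings map onto the builder API; a minimal sketch, assuming the beta3 line where the chat interface is still named ChatLanguageModel (later releases rename it to ChatModel):

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import java.time.Duration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class OllamaModelConfig {

    @Bean
    public ChatLanguageModel chatLanguageModel() {
        // Mirrors the application.yml settings shown above
        return OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("modelscope.cn/Qwen/Qwen2.5-7B-Instruct-GGUF:latest")
                .temperature(0.7)
                .timeout(Duration.ofSeconds(120))
                .logRequests(true)
                .logResponses(true)
                .build();
    }
}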
4. Writing the AI Service Code
4.1 Define the AI Service Interface
import dev.langchain4j.service.AiService;
import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
@AiService
public interface Assistant {
@SystemMessage("你是一位知识渊博的AI助手,请用中文友好地回答用户的问题。")
String chat(@MemoryId Long memoryId, @UserMessage String userMessage);
} @AiService– generates an implementation and registers it as a Spring bean. @MemoryId – marks the conversation identifier; the framework manages separate memory per ID. @SystemMessage – sets the system prompt. @UserMessage – marks the user‑message parameter (optional).
4.2 Configure Conversation Memory (optional for multi‑turn dialogue)
import dev.langchain4j.memory.chat.ChatMemoryProvider;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class AiConfig {
@Bean
public ChatMemoryProvider chatMemoryProvider() {
// Allocate a separate ChatMemory for each memoryId, keeping the latest 10 messages
return memoryId -> MessageWindowChatMemory.withMaxMessages(10);
}
}

If no memory provider is configured, each call is stateless. The memory can be persisted with Redis by implementing ChatMemoryStore, as sketched below.
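A minimal sketch of such a Redis-backed store, assuming Spring Data Redis (StringRedisTemplate) is on the classpath and using LangChain4j's built-in JSON serializers; the key prefix is an illustrative choice:

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.ChatMessageDeserializer;
import dev.langchain4j.data.message.ChatMessageSerializer;
import dev.langchain4j.store.memory.chat.ChatMemoryStore;
import java.util.List;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Component;

@Component
public class RedisChatMemoryStore implements ChatMemoryStore {

    private final StringRedisTemplate redis;

    public RedisChatMemoryStore(StringRedisTemplate redis) {
        this.redis = redis;
    }

    private String key(Object memoryId) {
        return "chat-memory:" + memoryId; // illustrative key prefix
    }

    @Override
    public List<ChatMessage> getMessages(Object memoryId) {
        String json = redis.opsForValue().get(key(memoryId));
        return json == null ? List.of() : ChatMessageDeserializer.messagesFromJson(json);
    }

    @Override
    public void updateMessages(Object memoryId, List<ChatMessage> messages) {
        redis.opsForValue().set(key(memoryId), ChatMessageSerializer.messagesToJson(messages));
    }

    @Override
    public void deleteMessages(Object memoryId) {
        redis.delete(key(memoryId));
    }
}

To use it, pass the store when building the memory in the provider, e.g. memoryId -> MessageWindowChatMemory.builder().id(memoryId).maxMessages(10).chatMemoryStore(store).build().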
4.3 Create a REST Controller for Normal Chat
import com.badao.ai.config.Assistant;
import org.springframework.web.bind.annotation.*;
@RestController
public class ChatController {
private final Assistant assistant;
public ChatController(Assistant assistant) { this.assistant = assistant; }
@GetMapping("/chat")
public String chat(@RequestParam("message") String message,
@RequestHeader("X-User-Id") String userId) {
// Pass userId as memoryId
return assistant.chat(userId, message);
}
}

4.4 Streaming Output Interface (requires langchain4j-reactor)
import dev.langchain4j.service.AiService;
import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import reactor.core.publisher.Flux;
@AiService
public interface StreamingAssistant {
@SystemMessage("你是一个友好、乐于助人的智能AI助手。请用中文回答。")
Flux<String> chat(@MemoryId String userId, @UserMessage String message);
}

import com.badao.ai.config.StreamingAssistant;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Flux;
@RestController
public class StreamingChatController {
private final StreamingAssistant streamingAssistant;
public StreamingChatController(StreamingAssistant streamingAssistant) { this.streamingAssistant = streamingAssistant; }
@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamChat(@RequestParam("message") String message,
@RequestHeader("X-User-Id") String userId) {
return streamingAssistant.chat(userId, message);
}
}

5. Testing and Verification
5.1 Start Services
Ensure Ollama is running: ollama serve
Confirm the model is downloaded: ollama list
Run the Spring Boot application.
5.2 Test Normal Endpoint
curl -H "X-User-Id: 1" "http://localhost:885/ai/chat?message=我叫张三"
curl -H "X-User-Id: 1" "http://localhost:885/ai/chat?message=我叫什么名字?"5.3 Test Streaming Endpoint
6. Common Issues and Solutions
model 'xxx' not found – Verify the model name with ollama list and correct the configuration.
SocketTimeoutException: Read timed out – Increase read-timeout beyond 120 s for slow models.
IllegalConfigurationException: Please import langchain4j-reactor – Add the langchain4j-reactor dependency.
NoClassDefFoundError: ClientHttpRequestFactorySettings – Use LangChain4j version 1.0.0‑beta3 consistently with Spring Boot 3.2.x.
JDK not set in IDE – Configure JDK 17 in project structure.
Slow model download – Use a mirror (e.g., the ModelScope community shown in section 2) or configure an Ollama mirror source.