How Spring AI Simplifies LLM Integration with Ollama: A Hands‑On Guide
This article explains Spring AI's architecture, including strategy, template, advisor, function‑calling, and RAG patterns, and provides a step‑by‑step tutorial for building a zero‑cost, privacy‑preserving AI assistant with the local Ollama model in a Spring Boot project.
Introduction
Spring AI brings a vendor‑agnostic programming model to the Java ecosystem, allowing developers to integrate large language models (LLMs) without dealing with each provider’s API differences.
Core Architecture Patterns
1. Strategy Pattern – Vendor‑agnostic Model Interfaces
Core interfaces : ChatModel, EmbeddingModel, ImageModel.
Concrete strategies : OpenAiChatModel, OllamaChatModel, AzureOpenAiChatModel, etc.
Switching providers : Change only the configuration keys (e.g., replace spring.ai.openai.api-key with spring.ai.ollama.base-url) – no Java code changes are required.
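In application.properties, such a swap looks roughly like this (property keys as documented for Spring AI's OpenAI and Ollama starters; the model names are illustrative):

```properties
# OpenAI configuration...
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o-mini

# ...becomes a local Ollama configuration; the Java code is untouched.
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=qwen3:4b
```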
2. Template Method & Fluent API – ChatClient
ChatClient wraps low‑level ChatModel calls, handling prompt construction, system messages and streaming responses.
String content = chatClient
        .prompt("Hello, what large language model are you?")
        .system("Answer as an LLM technical architect; responses should reflect architectural thinking")
        .call()
        .content();

3. Interceptor Chain – Advisors
Purpose : Intercept requests before they reach the model and responses after they return.
Typical advisors :
MessageChatMemoryAdvisor – automatic conversation history.
PromptChatMemoryAdvisor – context‑window optimisation.
Custom advisors – e.g., sensitive‑word filtering, token‑usage monitoring.
Composition : Advisors are attached to a ChatClient instance like plug‑ins.
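The chain mechanics can be sketched without Spring AI at all. The following is a minimal, framework-free model of the idea; the Advisor interface and compose helper here are invented for illustration and do not match Spring AI's real advisor API.

```java
import java.util.List;
import java.util.function.Function;

// Simplified stand-in for an around-call advisor: see the prompt on the way in,
// the response on the way out, and decide whether/how to call the rest of the chain.
interface Advisor {
    String aroundCall(String prompt, Function<String, String> next);
}

public class AdvisorChainSketch {

    // Wrap the terminal "model call" in each advisor, outermost advisor first.
    static Function<String, String> compose(List<Advisor> advisors, Function<String, String> model) {
        Function<String, String> chain = model;
        for (int i = advisors.size() - 1; i >= 0; i--) {
            Advisor a = advisors.get(i);
            Function<String, String> next = chain;
            chain = prompt -> a.aroundCall(prompt, next);
        }
        return chain;
    }

    public static void main(String[] args) {
        // A sensitive-word filter on the way in, and a usage logger around the whole call.
        Advisor filter = (prompt, next) -> next.apply(prompt.replace("secret", "[redacted]"));
        Advisor logger = (prompt, next) -> {
            String response = next.apply(prompt);
            System.out.println("chars in/out: " + prompt.length() + "/" + response.length());
            return response;
        };
        Function<String, String> model = prompt -> "echo: " + prompt; // stand-in for the LLM
        String out = compose(List.of(logger, filter), model).apply("my secret plan");
        System.out.println(out); // echo: my [redacted] plan
    }
}
```

The point of the pattern is that the model call stays untouched while cross-cutting behaviour is layered around it, which is exactly how advisors plug into a ChatClient.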
4. Converter Pattern – Function Calling
Java → LLM : Method signatures annotated with @Tool are converted into JSON‑Schema tool definitions automatically.
LLM → Java : When the model invokes a tool, Spring AI deserialises the arguments and calls the corresponding Java method, returning the result to the model.
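The Java → LLM direction can be illustrated with plain reflection. The @Tool annotation and the schema format below are deliberately simplified stand-ins; Spring AI's real @Tool support generates a full JSON Schema for the parameters.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.StringJoiner;

// Simplified stand-in for Spring AI's @Tool annotation -- illustration only.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Tool { String description(); }

public class ToolSchemaSketch {

    @Tool(description = "Look up the current weather for a city")
    public String getWeather(String city) { return "sunny in " + city; }

    // Build a minimal JSON-ish tool definition from an annotated method.
    // (Without the -parameters compiler flag, parameter names appear as arg0, arg1, ...)
    static String describe(Method m) {
        Tool tool = m.getAnnotation(Tool.class);
        StringJoiner params = new StringJoiner(",");
        for (Parameter p : m.getParameters()) {
            params.add("\"" + p.getName() + "\":\"" + p.getType().getSimpleName().toLowerCase() + "\"");
        }
        return "{\"name\":\"" + m.getName() + "\",\"description\":\"" + tool.description()
                + "\",\"parameters\":{" + params + "}}";
    }

    public static void main(String[] args) throws Exception {
        Method m = ToolSchemaSketch.class.getMethod("getWeather", String.class);
        System.out.println(describe(m));
    }
}
```

The reverse direction is the mirror image: the framework parses the model's tool-call arguments, invokes the matching method reflectively, and feeds the return value back into the conversation.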
5. Resource Abstraction – Retrieval‑Augmented Generation (RAG)
Loading : Unified Resource abstraction reads PDFs, URLs, Markdown, etc.
Splitting : a TextSplitter implementation such as TokenTextSplitter breaks long texts into manageable chunks.
Storage : VectorStore interface connects to vector databases such as Pinecone, Milvus or PGVector.
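The whole pipeline fits in a few lines when every stage is a toy: the character-frequency "embedding" below is a deliberate stand-in for a real EmbeddingModel, and the in-memory list stands in for a VectorStore, but the split → embed → store → retrieve flow is the same.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy end-to-end RAG pipeline: split -> embed -> store -> retrieve by cosine similarity.
public class RagSketch {

    // Fixed-size character chunks (real splitters work on tokens, not characters).
    static List<String> split(String text, int size) {
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < text.length(); i += size) {
            chunks.add(text.substring(i, Math.min(text.length(), i + size)));
        }
        return chunks;
    }

    // Toy embedding: a 26-dimensional letter-frequency vector.
    static double[] embed(String s) {
        double[] v = new double[26];
        for (char c : s.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') v[c - 'a']++;
        }
        return v;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // "Vector store" lookup: return the chunk most similar to the query.
    static String retrieve(List<String> chunks, String query) {
        double[] q = embed(query);
        return chunks.stream().max(Comparator.comparingDouble(c -> cosine(embed(c), q))).orElseThrow();
    }

    public static void main(String[] args) {
        List<String> chunks = split("Spring AI supports RAG. Ollama runs models locally. Vectors enable search.", 30);
        System.out.println(retrieve(chunks, "local models"));
    }
}
```

In a real application the retrieved chunks are prepended to the prompt as context before the model is called; that last step is what turns retrieval into retrieval-augmented generation.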
Hands‑On Tutorial: Building a Local AI Assistant with Ollama
Environment preparation
Install Ollama (macOS, Linux or Windows). By default it runs at http://localhost:11434.
Project setup
Create a Spring Boot project with group com.demo and artifact demo1. Add the following Maven dependencies:
spring-boot-starter-webflux
spring-boot-starter-webmvc
spring-ai-starter-model-ollama
lombok (optional)
spring-boot-starter-webflux-test (test scope)
spring-boot-starter-webmvc-test (test scope)
pom.xml snippet
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>4.0.3</version>
</parent>
<groupId>com.demo</groupId>
<artifactId>demo1</artifactId>
<version>0.0.1-SNAPSHOT</version>
<properties>
<java.version>25</java.version>
<spring-ai.version>2.0.0-M2</spring-ai.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-webflux</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-webmvc</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-webflux-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-webmvc-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
</project>

application.properties
spring.application.name=demo1
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=qwen3:4b

Core code implementation
Define a configuration class that creates a ChatClient bean and starts a streaming conversation in a background thread.
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.context.SmartLifecycle;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import reactor.core.publisher.Flux;

@Configuration
public class ChatConfig implements SmartLifecycle {

    private ChatClient chatClient;
    private volatile boolean running;

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        this.chatClient = builder.build();
        return this.chatClient;
    }

    @Override
    public void start() {
        running = true;
        new Thread(() -> {
            System.out.println("==================");
            System.out.println("Starting chat: Hello, what large language model are you?");
            Flux<String> flux = chatClient
                    .prompt("Hello, what large language model are you?")
                    .system("Answer as an LLM technical architect; responses should reflect architectural thinking")
                    .stream()
                    .content();
            // Print tokens as they arrive; emit the closing separator only once the stream completes.
            flux.doOnComplete(() -> System.out.println("\n=================="))
                .subscribe(System.out::print);
        }).start();
    }

    @Override
    public void stop() { running = false; }

    @Override
    public boolean isRunning() { return running; }
}

Run and test
Execute the Spring Boot main class (e.g., Demo1Application.java).
The console prints the streaming response generated by the local Ollama model.
Architectural Benefits Demonstrated
Zero‑intrusion model switching : Replace the Ollama starter with spring-ai-starter-model-openai and update application.properties with the OpenAI API key and URL – Java code remains unchanged.
Easy extensibility via function calling : Adding a method annotated with @Tool automatically exposes it to the LLM without manual prompt engineering.
Enterprise‑grade cross‑cutting concerns : Advisors can be plugged in for logging, token‑limit enforcement, or custom preprocessing, preparing prototypes for production.
Conclusion
Spring AI abstracts LLM integration behind familiar Spring patterns (strategy, template method, interceptor chain, converter, resource abstraction). Developers can build low‑cost, privacy‑preserving AI applications locally with Ollama and later migrate to cloud providers by changing only configuration and Maven dependencies, embodying a “write once, run everywhere” approach for AI‑enabled Java services.