How Spring AI Simplifies LLM Integration with Ollama: A Hands‑On Guide

This article explains Spring AI's architecture, including strategy, template, advisor, function‑calling, and RAG patterns, and provides a step‑by‑step tutorial for building a zero‑cost, privacy‑preserving AI assistant with the local Ollama model in a Spring Boot project.

Cognitive Technology Team

Introduction

Spring AI brings a vendor‑agnostic programming model to the Java ecosystem, allowing developers to integrate large language models (LLMs) without dealing with each provider’s API differences.

Core Architecture Patterns

1. Strategy Pattern – Vendor‑agnostic Model Interfaces

Core interfaces: ChatModel, EmbeddingModel, ImageModel.

Concrete strategies: OpenAiChatModel, OllamaChatModel, AzureOpenAiChatModel, etc.

Switching providers: Change only the configuration keys (e.g., replace spring.ai.openai.api-key with spring.ai.ollama.base-url) – no Java code changes are required.
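The strategy pattern behind this can be sketched in plain Java. The interface and registry below are simplified stand-ins for Spring AI's real ChatModel and auto-configuration, not its actual API; the point is that callers depend only on the interface, while configuration picks the concrete strategy:

```java
import java.util.Map;

public class ModelStrategyDemo {
    // Simplified stand-in for Spring AI's ChatModel strategy interface.
    interface SimpleChatModel {
        String call(String prompt);
    }

    // Concrete strategies, analogous to OpenAiChatModel / OllamaChatModel.
    static final SimpleChatModel OPENAI = prompt -> "[openai] " + prompt;
    static final SimpleChatModel OLLAMA = prompt -> "[ollama] " + prompt;

    // Provider selection driven purely by a configuration key, not code changes.
    static final Map<String, SimpleChatModel> REGISTRY = Map.of(
            "openai", OPENAI,
            "ollama", OLLAMA);

    public static String respond(String provider, String prompt) {
        return REGISTRY.get(provider).call(prompt);
    }

    public static void main(String[] args) {
        // Swapping "ollama" for "openai" changes the backend, not the caller.
        System.out.println(respond("ollama", "hello"));
    }
}
```

In Spring AI the registry role is played by starter auto-configuration: whichever starter is on the classpath contributes the ChatModel bean.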

2. Template Method & Fluent API – ChatClient

ChatClient wraps low-level ChatModel calls, handling prompt construction, system messages, and streaming responses.

String content = chatClient
    .prompt("Hello, what large language model are you?")
    .system("Answer as a large-model technical architect; responses should reflect architectural thinking")
    .call()
    .content();

3. Interceptor Chain – Advisors

Purpose: Intercept requests before they reach the model and responses after they return.

Typical advisors:

MessageChatMemoryAdvisor – automatic conversation history.

PromptChatMemoryAdvisor – context-window optimisation.

Custom advisors – e.g., sensitive-word filtering, token-usage monitoring.

Composition: Advisors are attached to a ChatClient instance like plug-ins.
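The interceptor-chain idea can be sketched in plain Java. The names below are illustrative, not Spring AI's real advisor contract: each advisor may rewrite the request before the model sees it and the response after it returns.

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class AdvisorChainDemo {
    public interface Advisor {
        String beforeRequest(String prompt);
        String afterResponse(String response);
    }

    // Example advisor: masks a sensitive word on the way in.
    public static final Advisor REDACTOR = new Advisor() {
        public String beforeRequest(String p) { return p.replace("secret", "***"); }
        public String afterResponse(String r) { return r; }
    };

    public static String call(List<Advisor> advisors, UnaryOperator<String> model, String prompt) {
        for (Advisor a : advisors) prompt = a.beforeRequest(prompt);     // request side
        String response = model.apply(prompt);
        for (Advisor a : advisors) response = a.afterResponse(response); // response side
        return response;
    }

    public static void main(String[] args) {
        // The "model" here is a stub lambda standing in for a real LLM call.
        System.out.println(call(List.of(REDACTOR), p -> "echo: " + p, "my secret plan"));
        // prints: echo: my *** plan
    }
}
```

Spring AI applies the same composition when advisors are registered on a ChatClient builder; cross-cutting concerns stay out of the calling code.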

4. Converter Pattern – Function Calling

Java → LLM: Method signatures annotated with @Tool are converted into JSON-Schema tool definitions automatically.

LLM → Java: When the model invokes a tool, Spring AI deserialises the arguments and calls the corresponding Java method, returning the result to the model.
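Both directions can be sketched with plain reflection. The @Tool annotation below is a local stand-in for Spring AI's real one, and the string "schema" is a simplification of the JSON-Schema definition Spring AI actually generates:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class ToolDispatchDemo {
    // Local stand-in for Spring AI's @Tool annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Tool { String description(); }

    public static class WeatherTools {
        @Tool(description = "Look up the weather for a city")
        public String weather(String city) { return "Sunny in " + city; }
    }

    // Java -> LLM direction: describe annotated methods (simplified schema).
    public static List<String> describeTools(Object target) {
        List<String> defs = new ArrayList<>();
        for (Method m : target.getClass().getMethods()) {
            Tool t = m.getAnnotation(Tool.class);
            if (t != null) defs.add(m.getName() + ": " + t.description());
        }
        return defs;
    }

    // LLM -> Java direction: the model names a tool; we invoke the matching method.
    public static String invokeTool(Object target, String name, String arg) {
        try {
            for (Method m : target.getClass().getMethods()) {
                if (m.isAnnotationPresent(Tool.class) && m.getName().equals(name)) {
                    return (String) m.invoke(target, arg);
                }
            }
            throw new NoSuchMethodException(name);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        WeatherTools tools = new WeatherTools();
        System.out.println(describeTools(tools));
        System.out.println(invokeTool(tools, "weather", "Paris")); // Sunny in Paris
    }
}
```

Spring AI additionally deserialises structured JSON arguments into typed parameters; the single-string argument here keeps the sketch minimal.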

5. Resource Abstraction – Retrieval‑Augmented Generation (RAG)

Loading: a unified Resource abstraction reads PDFs, URLs, Markdown, etc.

Splitting: a splitter such as TokenTextSplitter breaks long texts into manageable chunks.

Storage: the VectorStore interface connects to vector databases such as Pinecone, Milvus, or PGVector.
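The split → embed → store → retrieve pipeline can be sketched end to end in plain Java. The bag-of-characters "embedding" below is a deterministic toy standing in for a real embedding model, and the class names are illustrative, not Spring AI's API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MiniRagDemo {
    // Splitting: break text into fixed-size chunks (real splitters count tokens).
    public static List<String> split(String text, int size) {
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < text.length(); i += size)
            chunks.add(text.substring(i, Math.min(text.length(), i + size)));
        return chunks;
    }

    // Toy "embedding": letter-frequency vector (a stand-in for a real model).
    static double[] embed(String s) {
        double[] v = new double[26];
        for (char c : s.toLowerCase().toCharArray())
            if (c >= 'a' && c <= 'z') v[c - 'a']++;
        return v;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-9);
    }

    // Storage + retrieval: return the stored chunk most similar to the query.
    public static String mostSimilar(List<String> chunks, String query) {
        return chunks.stream()
                .max(Comparator.comparingDouble((String c) -> cosine(embed(c), embed(query))))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<String> store = split("Ollama runs models locally. Spring manages beans and wiring.", 30);
        System.out.println(mostSimilar(store, "local model runtime"));
    }
}
```

In a real pipeline the retrieved chunks are appended to the prompt as context; Spring AI's VectorStore implementations perform the same similarity search at database scale.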

Hands‑On Tutorial: Building a Local AI Assistant with Ollama

Environment preparation

Install Ollama (macOS, Linux or Windows) and pull the model used below, e.g. ollama pull qwen3:4b. By default the Ollama server runs at http://localhost:11434.

Project setup

Create a Spring Boot project with group com.demo and artifact demo1. Add the following Maven dependencies:

spring-boot-starter-webflux

spring-boot-starter-webmvc

spring-ai-starter-model-ollama

lombok (optional)

spring-boot-starter-webflux-test (test scope)

spring-boot-starter-webmvc-test (test scope)

pom.xml snippet

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>4.0.3</version>
  </parent>
  <groupId>com.demo</groupId>
  <artifactId>demo1</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <properties>
    <java.version>25</java.version>
    <spring-ai.version>2.0.0-M2</spring-ai.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-webmvc</artifactId>
    </dependency>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-starter-model-ollama</artifactId>
    </dependency>
    <dependency>
      <groupId>org.projectlombok</groupId>
      <artifactId>lombok</artifactId>
      <optional>true</optional>
    </dependency>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-webflux-test</artifactId>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-webmvc-test</artifactId>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-bom</artifactId>
        <version>${spring-ai.version}</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>
</project>

application.properties

spring.application.name=demo1
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=qwen3:4b

Core code implementation

Define a configuration class that creates a ChatClient bean and starts a streaming conversation in a background thread.

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.context.SmartLifecycle;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import reactor.core.publisher.Flux;

@Configuration
public class ChatConfig implements SmartLifecycle {
    private ChatClient chatClient;
    private volatile boolean running;

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        this.chatClient = builder.build();
        return this.chatClient;
    }

    @Override
    public void start() {
        running = true;
        new Thread(() -> {
            System.out.println("==================");
            System.out.println("Starting chat: Hello, what large language model are you?");
            Flux<String> flux = chatClient
                .prompt("Hello, what large language model are you?")
                .system("Answer as a large-model technical architect; responses should reflect architectural thinking")
                .stream()
                .content();
            // Print each streamed token as it arrives; print the footer only when
            // the stream completes, since subscribe() returns immediately.
            flux.doOnComplete(() -> System.out.println("\n=================="))
                .subscribe(System.out::print);
        }).start();
    }

    @Override public void stop() { running = false; }
    @Override public boolean isRunning() { return running; }
}

Run and test

Execute the Spring Boot main class (e.g., Demo1Application.java).

The console prints the streaming response generated by the local Ollama model.

Architectural Benefits Demonstrated

Zero-intrusion model switching: Replace the Ollama starter with spring-ai-starter-model-openai and update application.properties with the OpenAI API key and model – Java code remains unchanged.

Easy extensibility via function calling: Adding a method annotated with @Tool automatically exposes it to the LLM without manual prompt engineering.

Enterprise-grade cross-cutting concerns: Advisors can be plugged in for logging, token-limit enforcement, or custom preprocessing, preparing prototypes for production.
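The first benefit – provider switching – amounts to a dependency swap plus a configuration change along these lines (the spring.ai.openai.* keys follow the pattern shown earlier; the API key and model name are placeholders):

```properties
# Before: local Ollama
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=qwen3:4b

# After: hosted OpenAI (key supplied via environment variable)
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o-mini
```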

Conclusion

Spring AI abstracts LLM integration behind familiar Spring patterns (strategy, template method, interceptor chain, converter, resource abstraction). Developers can build low‑cost, privacy‑preserving AI applications locally with Ollama and later migrate to cloud providers by changing only configuration and Maven dependencies, embodying a “write once, run everywhere” approach for AI‑enabled Java services.
