Master Spring AI: From Hello World to Advanced RAG, Tool Calling, and Agent Development
This step‑by‑step guide shows Java developers how to set up Spring AI, configure model providers, build basic and streaming chat APIs, enable multi‑turn memory, implement RAG with vector stores, add tool‑calling and multimodal capabilities, integrate MCP, and build agents. Along the way it compares ChatModel with ChatClient and outlines the framework's strengths, weaknesses, and ideal use cases.
Core Concepts
Spring AI is a full‑stack AI application framework for Java that applies Spring’s portability, modular design, and POJO programming model to the AI domain. It abstracts the APIs of major LLM providers (OpenAI, Alibaba DashScope, Ollama, etc.) behind a unified high‑level interface, so switching models only requires a configuration change.
Chat Completion: conversation, code generation, content creation (OpenAI, DashScope, DeepSeek, Claude)
Embedding: text vectorization, semantic search (same providers)
Multimodal: image/video/audio understanding (GPT‑4o, Qwen‑VL)
Function Calling: external API integration, tool invocation (supported by major models)
Vector Database: similarity search, Retrieval‑Augmented Generation (PGVector, Milvus, Redis, Chroma)
MCP: Model Context Protocol for standardized tool integration (native support)
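To make the portability claim concrete, here is a hypothetical, framework‑free sketch of the idea: application code targets one interface, and the concrete provider is chosen by configuration. ChatModelLike, ModelFactory, and the bracketed replies are illustrative inventions, not Spring AI types.

```java
import java.util.Map;

// Illustration only: application code depends on one interface, so the
// provider behind it can change without touching business logic.
interface ChatModelLike {
    String call(String prompt);
}

class ModelFactory {
    // Stand-ins for real provider clients, keyed by a config value
    private static final Map<String, ChatModelLike> PROVIDERS = Map.<String, ChatModelLike>of(
            "openai", prompt -> "[openai] " + prompt,
            "ollama", prompt -> "[ollama] " + prompt);

    static ChatModelLike fromConfig(String provider) {
        // In Spring AI this choice is made by a starter dependency plus properties
        return PROVIDERS.get(provider);
    }
}
```

Swapping "openai" for "ollama" here is the toy analogue of changing the starter and editing application.yml.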
Environment Setup
1. Add Spring AI BOM and Model Starter
In pom.xml import the BOM and include the desired starter. The Spring Milestones repository is only needed for milestone or snapshot versions; GA releases are available from Maven Central:
<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
    </repository>
</repositories>
<properties>
    <spring-ai.version>1.1.2</spring-ai.version>
</properties>
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>
</dependencies>
2. Configure API Keys
For OpenAI:
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
For Alibaba DashScope (domestic use):
spring:
  ai:
    dashscope:
      api-key: ${AI_DASHSCOPE_API_KEY}
3. Run Ollama Locally (no API key)
# Install and start Ollama
ollama pull qwen2.5:latest
ollama serve
Add the Ollama starter dependency and configure the base URL:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: qwen2.5:latest
Basic Conversation – Hello World
Using ChatClient (recommended)
@RestController
@RequestMapping("/api/ai")
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.defaultSystem("You are a professional Java expert").build();
    }

    @GetMapping("/chat")
    public String chat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}
Streaming Output (typewriter effect)
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestParam String message) {
    return chatClient.prompt()
            .user(message)
            .stream()
            .content();
}
Choosing Between ChatModel and ChatClient
API Complexity: ChatModel requires manual prompt construction (high); ChatClient offers a fluent DSL (low).
Advisor Support: unavailable in ChatModel; built‑in for ChatClient.
Memory Management: manual in ChatModel; automatic session memory in ChatClient.
Streaming Output: extra wrapper needed for ChatModel; one‑line enable in ChatClient.
Error Handling: manual catch in ChatModel; unified exception handling in ChatClient.
Recommendation: ChatClient is the preferred choice for most scenarios; use ChatModel only for special low‑level cases.
Multi‑Turn Conversation – Adding Memory
Use the Advisor mechanism to retain conversation context, e.g., for intelligent customer‑service bots.
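Conceptually, a chat memory is just a bounded, per‑session message log that gets replayed into each prompt. A framework‑free sketch of the windowing behavior (WindowMemory is a hypothetical class, not the Spring AI implementation):

```java
import java.util.*;

// Hypothetical illustration of message-window memory: keeps only the
// most recent maxMessages entries per conversation id.
class WindowMemory {
    private final int maxMessages;
    private final Map<String, Deque<String>> store = new HashMap<>();

    WindowMemory(int maxMessages) { this.maxMessages = maxMessages; }

    void add(String conversationId, String message) {
        Deque<String> history = store.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        history.addLast(message);
        if (history.size() > maxMessages) history.removeFirst(); // evict the oldest message
    }

    List<String> get(String conversationId) {
        return new ArrayList<>(store.getOrDefault(conversationId, new ArrayDeque<>()));
    }
}
```

The advisor in the service below does this bookkeeping automatically, keyed by the conversation id parameter.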
@Service
public class ChatMemoryService {

    private final ChatClient chatClient;
    private final ChatMemory chatMemory;

    public ChatMemoryService(ChatClient.Builder builder) {
        // MessageWindowChatMemory keeps the most recent messages per conversation
        this.chatMemory = MessageWindowChatMemory.builder().maxMessages(20).build();
        this.chatClient = builder
                .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
                .build();
    }

    public String chat(String sessionId, String message) {
        return chatClient.prompt()
                .user(message)
                .advisors(advisor -> advisor.param(ChatMemory.CONVERSATION_ID, sessionId))
                .call()
                .content();
    }
}
Retrieval‑Augmented Generation (RAG) – Knowledge‑Base Q&A
RAG combines vector similarity search with LLM generation. Spring AI provides VectorStore implementations and a QuestionAnswerAdvisor that automatically enriches prompts with retrieved documents.
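For intuition about what the vector store does during retrieval, here is a toy, framework‑free sketch: letter‑frequency "embeddings" and cosine scoring stand in for a real embedding model and the store's COSINE_DISTANCE search. ToyRetriever and its fake embedding are illustrative only.

```java
import java.util.*;

// Toy illustration of the retrieval half of RAG: embed the query and every
// document, then return the document with the highest cosine similarity.
class ToyRetriever {
    // Fake "embedding": a 26-dimensional letter-frequency vector
    static double[] embed(String text) {
        double[] v = new double[26];
        for (char c : text.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') v[c - 'a']++;
        }
        return v;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    static String topDocument(String query, List<String> docs) {
        double[] q = embed(query);
        return docs.stream()
                .max(Comparator.comparingDouble((String d) -> cosine(q, embed(d))))
                .orElseThrow();
    }
}
```

A real pipeline embeds documents once at ingestion time and uses an index (HNSW in the configuration below) so the search does not scan every document.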
Add RAG Dependency
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
</dependency>
Configure PGVector Store
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/vectordb
    username: postgres
    password: password
  ai:
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1536
Implement RAG Service
@Service
public class RagService {

    private final VectorStore vectorStore;
    private final ChatClient chatClient;

    public RagService(VectorStore vectorStore, ChatClient.Builder builder) {
        this.vectorStore = vectorStore;
        this.chatClient = builder.build();
    }

    public String ask(String question) {
        // QuestionAnswerAdvisor performs vector retrieval and prompt enrichment
        return chatClient.prompt()
                .user(question)
                .advisors(new QuestionAnswerAdvisor(vectorStore))
                .call()
                .content();
    }
}
Tool Calling – Exposing Java Methods to the Model
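Before defining a tool, it helps to see the loop the framework runs for you: the model emits a structured tool‑call request, the application executes the matching Java method, and the result is sent back to the model for a final answer. A simulated, LLM‑free sketch of that loop (ToolLoop and its toy "model" are illustrative inventions):

```java
import java.util.*;
import java.util.function.Function;

// Simulated tool-calling loop: a fake "model" decides whether to request a
// tool; the application executes it and feeds the observation back.
class ToolLoop {
    private final Map<String, Function<String, String>> tools = new HashMap<>();

    void register(String name, Function<String, String> fn) { tools.put(name, fn); }

    // Fake model: asks for the weather tool when the question mentions weather.
    // A real model returns structured tool-call messages instead of strings.
    private String model(String question, String observation) {
        if (observation == null && question.contains("weather")) {
            return "CALL getWeather " + question.replaceAll(".*in ", "").replace("?", "");
        }
        return observation == null ? "I don't know" : "The weather is: " + observation;
    }

    String ask(String question) {
        String reply = model(question, null);
        if (reply.startsWith("CALL ")) {
            String[] parts = reply.split(" ", 3);                 // CALL <tool> <arg>
            String result = tools.get(parts[1]).apply(parts[2]);  // execute the Java method
            reply = model(question, result);                      // second model round
        }
        return reply;
    }
}
```

Spring AI handles the request parsing, method dispatch, and second round automatically once a @Tool method is registered.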
Define a Tool
@Component
public class WeatherTools {

    @Tool(description = "Get current weather by city name")
    public String getWeather(@ToolParam(description = "City name, e.g., 'Beijing'") String city) {
        // Mock data; replace with a real weather API in production
        Map<String, String> weatherMap = Map.of(
                "Beijing", "Sunny, 25°C, North wind level 2",
                "Shanghai", "Cloudy, 22°C, East wind level 3",
                "Shenzhen", "Showers, 28°C, South wind level 2");
        return weatherMap.getOrDefault(city, "Weather data not available");
    }
}
Register the Tool with ChatClient
@RestController
@RequestMapping("/api/tools")
public class ToolController {

    private final ChatClient chatClient;
    private final WeatherTools weatherTools;

    public ToolController(ChatClient.Builder builder, WeatherTools weatherTools) {
        this.chatClient = builder.defaultSystem("You are a professional weather assistant").build();
        this.weatherTools = weatherTools;
    }

    @GetMapping("/weather")
    public String askWeather(@RequestParam String question) {
        // The model decides when to invoke the tool
        return chatClient.prompt()
                .user(question)
                .tools(weatherTools) // use the injected bean rather than a new instance
                .call()
                .content();
    }
}
Multimodal – Sending Images to the Model
Using a multimodal model such as GPT‑4o, Spring AI can send an image together with a textual prompt.
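The example below passes MimeTypeUtils.IMAGE_JPEG explicitly; when accepting arbitrary uploads you first need to know the type. A small hypothetical helper that sniffs JPEG/PNG from their magic bytes (ImageType is not part of Spring AI):

```java
// Hypothetical helper: detect the MIME type of an uploaded image from its
// leading magic bytes, so the right MimeType can be passed to media(...).
class ImageType {
    static String sniff(byte[] bytes) {
        // JPEG files start with FF D8 FF
        if (bytes.length >= 3 && (bytes[0] & 0xFF) == 0xFF
                && (bytes[1] & 0xFF) == 0xD8 && (bytes[2] & 0xFF) == 0xFF) {
            return "image/jpeg";
        }
        // PNG files start with 89 50 4E 47 ("\x89PNG")
        if (bytes.length >= 4 && (bytes[0] & 0xFF) == 0x89
                && bytes[1] == 0x50 && bytes[2] == 0x4E && bytes[3] == 0x47) {
            return "image/png";
        }
        return "application/octet-stream"; // unknown -- let the caller decide
    }
}
```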
@Service
public class MultimodalService {

    private final ChatClient chatClient;

    public MultimodalService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public String analyzeReceipt(byte[] imageBytes) {
        return chatClient.prompt()
                .user(userSpec -> userSpec
                        .text("Please analyze this receipt image and extract merchant name, total amount, and item list as JSON.")
                        .media(MimeTypeUtils.IMAGE_JPEG, new ByteArrayResource(imageBytes)))
                .call()
                .content();
    }
}
MCP Integration – Standardized Tool Ecosystem
The Model Context Protocol (MCP) defines a uniform way for LLMs to discover and invoke external tools. Spring AI supplies native MCP client and server support.
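Under the hood, MCP messages are JSON‑RPC 2.0 exchanged over a transport such as stdio. For intuition, this hypothetical helper builds simplified versions of the two requests a client sends to discover and invoke tools; real clients also handle framing, id allocation, and capability negotiation:

```java
// Illustrative sketch of the JSON-RPC 2.0 requests behind MCP tool use.
// Simplified: real messages are produced by the MCP SDK, not hand-built.
class McpMessages {
    // Ask the server which tools it exposes
    static String toolsListRequest(int id) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id + ",\"method\":\"tools/list\"}";
    }

    // Invoke one tool by name with a JSON object of arguments
    static String toolsCallRequest(int id, String tool, String argsJson) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id + ",\"method\":\"tools/call\","
                + "\"params\":{\"name\":\"" + tool + "\",\"arguments\":" + argsJson + "}}";
    }
}
```

Spring AI's MCP support generates and parses these messages for you and surfaces each discovered tool as an ordinary tool callback.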
Add MCP Dependency
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-client</artifactId>
</dependency>
Wire MCP Tools into ChatClient
@Configuration
public class McpConfig {

    @Bean
    public McpSyncClient mcpClient() {
        var stdioParams = ServerParameters.builder("npx")
                .args("-y", "@modelcontextprotocol/server-brave-search")
                .addEnvVar("BRAVE_API_KEY", System.getenv("BRAVE_API_KEY"))
                .build();
        McpSyncClient client = McpClient.sync(new StdioClientTransport(stdioParams)).build();
        client.initialize(); // handshake with the MCP server
        return client;
    }

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder, McpSyncClient mcpClient) {
        // Expose every tool published by the MCP server as a callable tool
        return builder
                .defaultToolCallbacks(new SyncMcpToolCallbackProvider(mcpClient))
                .build();
    }
}
Agent Development – Multi‑Agent Workflows
Spring AI Alibaba (1.1.2.x) provides a ReAct‑based (Reason + Act) agent framework that can cut agent development time from days to hours.
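The ReAct pattern itself is simple: the model alternates between "act" steps that invoke a tool and a final answer, with each tool observation appended back into the context. A framework‑free sketch (MiniReactAgent and its string protocol are illustrative, not the Spring AI Alibaba API):

```java
import java.util.*;
import java.util.function.Function;

// Framework-free sketch of a ReAct loop: the "model" function sees the
// growing context and either acts ("ACT <tool> <input>") or finishes.
class MiniReactAgent {
    private final Function<List<String>, String> model; // stand-in for an LLM
    private final Map<String, Function<String, String>> tools;

    MiniReactAgent(Function<List<String>, String> model, Map<String, Function<String, String>> tools) {
        this.model = model;
        this.tools = tools;
    }

    String call(String question) {
        List<String> context = new ArrayList<>(List.of("QUESTION: " + question));
        for (int step = 0; step < 5; step++) {              // cap the loop to avoid runaways
            String out = model.apply(context);
            if (out.startsWith("FINISH: ")) return out.substring(8);
            String[] parts = out.split(" ", 3);             // ACT <tool> <input>
            String observation = tools.get(parts[1]).apply(parts[2]);
            context.add("OBSERVATION: " + observation);     // feed the result back
        }
        return "gave up";
    }
}
```

ReactAgent runs this reason/act/observe cycle for you, with the real model deciding which registered tool to call at each step.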
Add Spring AI Alibaba Dependency
<dependency>
    <groupId>com.alibaba.cloud.ai</groupId>
    <artifactId>spring-ai-alibaba-starter-dashscope</artifactId>
    <version>1.1.2.0</version>
</dependency>
Create a ReactAgent
@Component
public class WeatherAgent {

    private final ReactAgent agent;

    public WeatherAgent(ChatModel chatModel, WeatherTools weatherTools) {
        this.agent = ReactAgent.builder()
                .name("weather_assistant")
                .model(chatModel)
                .tools(weatherTools)
                .systemPrompt("You are a professional weather assistant helping users query weather information")
                .build();
    }

    public String ask(String question) {
        return agent.call(question);
    }
}
Advantages, Limitations, and Ideal Use Cases
Advantages
Deep Spring Boot integration: follows familiar configuration and development habits.
Zero‑cost model switching: unified API lets you change providers by editing config.
Enterprise‑grade features: supports transactions, security, monitoring, and other Spring ecosystem capabilities.
Modular design: include only needed modules to avoid bloated dependencies.
Comprehensive documentation: maintained by the Spring team; 1.0 GA is stable.
Limitations
Learning curve: developers must understand abstractions such as ChatModel, ChatClient, and Advisor.
Java version requirement: JDK 17 or newer.
Young ecosystem: community size lags behind Python‑centric frameworks like LangChain.
Typical Scenarios
Enterprises already using Spring Boot/Cloud stacks.
Enterprise‑level RAG knowledge‑base systems.
AI‑enhanced customer‑service or content‑generation applications.
Projects needing multi‑model switching or domestic model integration.
Conclusion
The hands‑on examples illustrate six core capabilities of Spring AI:
ChatClient – unified conversational API with synchronous and streaming support.
Multi‑turn conversation – session memory via Advisor.
RAG – vector‑store backed knowledge‑base Q&A.
Tool calling – enabling AI to invoke external services.
Multimodal – handling images, video, and audio inputs.
Agent development – building sophisticated multi‑agent workflows with ReactAgent and graph orchestration.
Start with the basic chat example, then progressively add RAG and tool calling. For local or data‑sensitive deployments, consider Ollama; for production‑grade reliability, use Alibaba DashScope, Azure OpenAI, or another managed provider.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.