Master Spring AI: From Hello World to Advanced RAG, Tool Calling, and Agent Development
This step‑by‑step guide shows Java developers how to set up Spring AI, configure model providers, build basic and streaming chat APIs, enable multi‑turn memory, implement RAG with vector stores, add tool‑calling and multimodal capabilities, integrate MCP, and build agents. Along the way it compares ChatModel with ChatClient and outlines the framework's strengths, weaknesses, and ideal use cases.
Core Concepts
Spring AI is a full‑stack AI application framework for Java that applies Spring’s portability, modular design, and POJO programming model to the AI domain. It abstracts the APIs of major LLM providers (OpenAI, Alibaba DashScope, Ollama, etc.) behind a unified high‑level interface, so switching models only requires a configuration change.
Chat Completion: conversation, code generation, content creation (OpenAI, DashScope, DeepSeek, Claude)
Embedding: text vectorization, semantic search (same providers)
Multimodal: image/video/audio understanding (GPT‑4o, Qwen‑VL)
Function Calling: external API integration, tool invocation (supported by major models)
Vector Database: similarity search, Retrieval‑Augmented Generation (PGVector, Milvus, Redis, Chroma)
MCP: Model Context Protocol for standardized tool integration (native support)
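To make the portability claim concrete, here is a hypothetical, framework‑free sketch of the idea: application code targets one interface, and the concrete provider is chosen by configuration. ChatModelLike, ModelFactory, and the bracketed replies are illustrative inventions, not Spring AI types.

```java
import java.util.Map;

// Illustration only: application code depends on one interface, so the
// provider behind it can change without touching business logic.
interface ChatModelLike {
    String call(String prompt);
}

class ModelFactory {
    // Stand-ins for real provider clients, keyed by a config value
    private static final Map<String, ChatModelLike> PROVIDERS = Map.<String, ChatModelLike>of(
            "openai", prompt -> "[openai] " + prompt,
            "ollama", prompt -> "[ollama] " + prompt);

    static ChatModelLike fromConfig(String provider) {
        // In Spring AI this choice is made by a starter dependency plus properties
        return PROVIDERS.get(provider);
    }
}
```

Swapping "openai" for "ollama" here is the toy analogue of changing the starter and editing application.yml.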
Environment Setup
1. Add Spring AI BOM and Model Starter
In pom.xml import the BOM and include the desired starter. The Spring Milestones repository is only needed for milestone or snapshot versions; GA releases are available from Maven Central:
<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
    </repository>
</repositories>
<properties>
    <spring-ai.version>1.1.2</spring-ai.version>
</properties>
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>
</dependencies>
2. Configure API Keys
For OpenAI:
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
For Alibaba DashScope (domestic use):
spring:
  ai:
    dashscope:
      api-key: ${AI_DASHSCOPE_API_KEY}
3. Run Ollama Locally (no API key)
# Install and start Ollama
ollama pull qwen2.5:latest
ollama serve
Add the Ollama starter dependency and configure the base URL:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: qwen2.5:latest
Basic Conversation – Hello World
Using ChatClient (recommended)
@RestController
@RequestMapping("/api/ai")
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.defaultSystem("You are a professional Java expert").build();
    }

    @GetMapping("/chat")
    public String chat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}
Streaming Output (typewriter effect)
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestParam String message) {
    return chatClient.prompt()
            .user(message)
            .stream()
            .content();
}
Choosing Between ChatModel and ChatClient
API Complexity: ChatModel requires manual prompt construction (high); ChatClient offers a fluent DSL (low).
Advisor Support: unavailable in ChatModel; built‑in for ChatClient.
Memory Management: manual in ChatModel; automatic session memory in ChatClient.
Streaming Output: extra wrapper needed for ChatModel; one‑line enable in ChatClient.
Error Handling: manual catch in ChatModel; unified exception handling in ChatClient.
Recommendation: ChatClient is the preferred choice for most scenarios; use ChatModel only for special low‑level cases.
Multi‑Turn Conversation – Adding Memory
Use the Advisor mechanism to retain conversation context, e.g., for intelligent customer‑service bots.
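Conceptually, a chat memory is just a bounded, per‑session message log that gets replayed into each prompt. A framework‑free sketch of the windowing behavior (WindowMemory is a hypothetical class, not the Spring AI implementation):

```java
import java.util.*;

// Hypothetical illustration of message-window memory: keeps only the
// most recent maxMessages entries per conversation id.
class WindowMemory {
    private final int maxMessages;
    private final Map<String, Deque<String>> store = new HashMap<>();

    WindowMemory(int maxMessages) { this.maxMessages = maxMessages; }

    void add(String conversationId, String message) {
        Deque<String> history = store.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        history.addLast(message);
        if (history.size() > maxMessages) history.removeFirst(); // evict the oldest message
    }

    List<String> get(String conversationId) {
        return new ArrayList<>(store.getOrDefault(conversationId, new ArrayDeque<>()));
    }
}
```

The advisor in the service below does this bookkeeping automatically, keyed by the conversation id parameter.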
@Service
public class ChatMemoryService {

    private final ChatClient chatClient;
    private final ChatMemory chatMemory;

    public ChatMemoryService(ChatClient.Builder builder) {
        // MessageWindowChatMemory keeps the most recent messages per conversation
        this.chatMemory = MessageWindowChatMemory.builder().maxMessages(20).build();
        this.chatClient = builder
                .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
                .build();
    }

    public String chat(String sessionId, String message) {
        return chatClient.prompt()
                .user(message)
                .advisors(advisor -> advisor.param(ChatMemory.CONVERSATION_ID, sessionId))
                .call()
                .content();
    }
}
Retrieval‑Augmented Generation (RAG) – Knowledge‑Base Q&A
RAG combines vector similarity search with LLM generation. Spring AI provides VectorStore implementations and a QuestionAnswerAdvisor that automatically enriches prompts with retrieved documents.
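For intuition about what the vector store does during retrieval, here is a toy, framework‑free sketch: letter‑frequency "embeddings" and cosine scoring stand in for a real embedding model and the store's COSINE_DISTANCE search. ToyRetriever and its fake embedding are illustrative only.

```java
import java.util.*;

// Toy illustration of the retrieval half of RAG: embed the query and every
// document, then return the document with the highest cosine similarity.
class ToyRetriever {
    // Fake "embedding": a 26-dimensional letter-frequency vector
    static double[] embed(String text) {
        double[] v = new double[26];
        for (char c : text.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') v[c - 'a']++;
        }
        return v;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    static String topDocument(String query, List<String> docs) {
        double[] q = embed(query);
        return docs.stream()
                .max(Comparator.comparingDouble((String d) -> cosine(q, embed(d))))
                .orElseThrow();
    }
}
```

A real pipeline embeds documents once at ingestion time and uses an index (HNSW in the configuration below) so the search does not scan every document.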
Add RAG Dependency
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
</dependency>
Configure PGVector Store
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/vectordb
    username: postgres
    password: password
  ai:
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1536
Implement RAG Service
@Service
public class RagService {

    private final VectorStore vectorStore;
    private final ChatClient chatClient;

    public RagService(VectorStore vectorStore, ChatClient.Builder builder) {
        this.vectorStore = vectorStore;
        this.chatClient = builder.build();
    }

    public String ask(String question) {
        // QuestionAnswerAdvisor performs vector retrieval and prompt enrichment
        return chatClient.prompt()
                .user(question)
                .advisors(new QuestionAnswerAdvisor(vectorStore))
                .call()
                .content();
    }
}
Tool Calling – Exposing Java Methods to the Model
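Before defining a tool, it helps to see the loop the framework runs for you: the model emits a structured tool‑call request, the application executes the matching Java method, and the result is sent back to the model for a final answer. A simulated, LLM‑free sketch of that loop (ToolLoop and its toy "model" are illustrative inventions):

```java
import java.util.*;
import java.util.function.Function;

// Simulated tool-calling loop: a fake "model" decides whether to request a
// tool; the application executes it and feeds the observation back.
class ToolLoop {
    private final Map<String, Function<String, String>> tools = new HashMap<>();

    void register(String name, Function<String, String> fn) { tools.put(name, fn); }

    // Fake model: asks for the weather tool when the question mentions weather.
    // A real model returns structured tool-call messages instead of strings.
    private String model(String question, String observation) {
        if (observation == null && question.contains("weather")) {
            return "CALL getWeather " + question.replaceAll(".*in ", "").replace("?", "");
        }
        return observation == null ? "I don't know" : "The weather is: " + observation;
    }

    String ask(String question) {
        String reply = model(question, null);
        if (reply.startsWith("CALL ")) {
            String[] parts = reply.split(" ", 3);                 // CALL <tool> <arg>
            String result = tools.get(parts[1]).apply(parts[2]);  // execute the Java method
            reply = model(question, result);                      // second model round
        }
        return reply;
    }
}
```

Spring AI handles the request parsing, method dispatch, and second round automatically once a @Tool method is registered.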
Define a Tool
@Component
public class WeatherTools {

    @Tool(description = "Get current weather by city name")
    public String getWeather(@ToolParam(description = "City name, e.g., 'Beijing'") String city) {
        // Mock data; replace with a real weather API in production
        Map<String, String> weatherMap = Map.of(
                "Beijing", "Sunny, 25°C, North wind level 2",
                "Shanghai", "Cloudy, 22°C, East wind level 3",
                "Shenzhen", "Showers, 28°C, South wind level 2");
        return weatherMap.getOrDefault(city, "Weather data not available");
    }
}
Register the Tool with ChatClient
@RestController
@RequestMapping("/api/tools")
public class ToolController {

    private final ChatClient chatClient;
    private final WeatherTools weatherTools;

    public ToolController(ChatClient.Builder builder, WeatherTools weatherTools) {
        this.chatClient = builder.defaultSystem("You are a professional weather assistant").build();
        this.weatherTools = weatherTools;
    }

    @GetMapping("/weather")
    public String askWeather(@RequestParam String question) {
        // The model decides when to invoke the tool
        return chatClient.prompt()
                .user(question)
                .tools(weatherTools) // use the injected bean rather than a new instance
                .call()
                .content();
    }
}
Multimodal – Sending Images to the Model
Using a multimodal model such as GPT‑4o, Spring AI can send an image together with a textual prompt.
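The example below passes MimeTypeUtils.IMAGE_JPEG explicitly; when accepting arbitrary uploads you first need to know the type. A small hypothetical helper that sniffs JPEG/PNG from their magic bytes (ImageType is not part of Spring AI):

```java
// Hypothetical helper: detect the MIME type of an uploaded image from its
// leading magic bytes, so the right MimeType can be passed to media(...).
class ImageType {
    static String sniff(byte[] bytes) {
        // JPEG files start with FF D8 FF
        if (bytes.length >= 3 && (bytes[0] & 0xFF) == 0xFF
                && (bytes[1] & 0xFF) == 0xD8 && (bytes[2] & 0xFF) == 0xFF) {
            return "image/jpeg";
        }
        // PNG files start with 89 50 4E 47 ("\x89PNG")
        if (bytes.length >= 4 && (bytes[0] & 0xFF) == 0x89
                && bytes[1] == 0x50 && bytes[2] == 0x4E && bytes[3] == 0x47) {
            return "image/png";
        }
        return "application/octet-stream"; // unknown -- let the caller decide
    }
}
```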
@Service
public class MultimodalService {

    private final ChatClient chatClient;

    public MultimodalService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public String analyzeReceipt(byte[] imageBytes) {
        return chatClient.prompt()
                .user(userSpec -> userSpec
                        .text("Please analyze this receipt image and extract merchant name, total amount, and item list as JSON.")
                        .media(MimeTypeUtils.IMAGE_JPEG, new ByteArrayResource(imageBytes)))
                .call()
                .content();
    }
}
MCP Integration – Standardized Tool Ecosystem
The Model Context Protocol (MCP) defines a uniform way for LLMs to discover and invoke external tools. Spring AI supplies native MCP client and server support.
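Under the hood, MCP messages are JSON‑RPC 2.0 exchanged over a transport such as stdio. For intuition, this hypothetical helper builds simplified versions of the two requests a client sends to discover and invoke tools; real clients also handle framing, id allocation, and capability negotiation:

```java
// Illustrative sketch of the JSON-RPC 2.0 requests behind MCP tool use.
// Simplified: real messages are produced by the MCP SDK, not hand-built.
class McpMessages {
    // Ask the server which tools it exposes
    static String toolsListRequest(int id) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id + ",\"method\":\"tools/list\"}";
    }

    // Invoke one tool by name with a JSON object of arguments
    static String toolsCallRequest(int id, String tool, String argsJson) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id + ",\"method\":\"tools/call\","
                + "\"params\":{\"name\":\"" + tool + "\",\"arguments\":" + argsJson + "}}";
    }
}
```

Spring AI's MCP support generates and parses these messages for you and surfaces each discovered tool as an ordinary tool callback.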
Add MCP Dependency
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-client</artifactId>
</dependency>
Wire MCP Tools into ChatClient
@Configuration
public class McpConfig {

    @Bean
    public McpSyncClient mcpClient() {
        var stdioParams = ServerParameters.builder("npx")
                .args("-y", "@modelcontextprotocol/server-brave-search")
                .addEnvVar("BRAVE_API_KEY", System.getenv("BRAVE_API_KEY"))
                .build();
        McpSyncClient client = McpClient.sync(new StdioClientTransport(stdioParams)).build();
        client.initialize(); // handshake with the MCP server
        return client;
    }

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder, McpSyncClient mcpClient) {
        // Expose every tool published by the MCP server as a callable tool
        return builder
                .defaultToolCallbacks(new SyncMcpToolCallbackProvider(mcpClient))
                .build();
    }
}
Agent Development – Multi‑Agent Workflows
Spring AI Alibaba (1.1.2.x) provides a ReAct‑based (Reason + Act) agent framework that can cut agent development time from days to hours.
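The ReAct pattern itself is simple: the model alternates between "act" steps that invoke a tool and a final answer, with each tool observation appended back into the context. A framework‑free sketch (MiniReactAgent and its string protocol are illustrative, not the Spring AI Alibaba API):

```java
import java.util.*;
import java.util.function.Function;

// Framework-free sketch of a ReAct loop: the "model" function sees the
// growing context and either acts ("ACT <tool> <input>") or finishes.
class MiniReactAgent {
    private final Function<List<String>, String> model; // stand-in for an LLM
    private final Map<String, Function<String, String>> tools;

    MiniReactAgent(Function<List<String>, String> model, Map<String, Function<String, String>> tools) {
        this.model = model;
        this.tools = tools;
    }

    String call(String question) {
        List<String> context = new ArrayList<>(List.of("QUESTION: " + question));
        for (int step = 0; step < 5; step++) {              // cap the loop to avoid runaways
            String out = model.apply(context);
            if (out.startsWith("FINISH: ")) return out.substring(8);
            String[] parts = out.split(" ", 3);             // ACT <tool> <input>
            String observation = tools.get(parts[1]).apply(parts[2]);
            context.add("OBSERVATION: " + observation);     // feed the result back
        }
        return "gave up";
    }
}
```

ReactAgent runs this reason/act/observe cycle for you, with the real model deciding which registered tool to call at each step.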
Add Spring AI Alibaba Dependency
<dependency>
    <groupId>com.alibaba.cloud.ai</groupId>
    <artifactId>spring-ai-alibaba-starter-dashscope</artifactId>
    <version>1.1.2.0</version>
</dependency>
Create a ReactAgent
@Component
public class WeatherAgent {

    private final ReactAgent agent;

    public WeatherAgent(ChatModel chatModel, WeatherTools weatherTools) {
        this.agent = ReactAgent.builder()
                .name("weather_assistant")
                .model(chatModel)
                .tools(weatherTools)
                .systemPrompt("You are a professional weather assistant helping users query weather information")
                .build();
    }

    public String ask(String question) {
        return agent.call(question);
    }
}
Advantages, Limitations, and Ideal Use Cases
Advantages
Deep Spring Boot integration: follows familiar configuration and development habits.
Zero‑cost model switching: unified API lets you change providers by editing config.
Enterprise‑grade features: supports transactions, security, monitoring, and other Spring ecosystem capabilities.
Modular design: include only needed modules to avoid bloated dependencies.
Comprehensive documentation: maintained by the Spring team; 1.0 GA is stable.
Limitations
Learning curve: developers must understand abstractions such as ChatModel, ChatClient, and Advisor.
Java version requirement: JDK 17 or newer.
Young ecosystem: community size lags behind Python‑centric frameworks like LangChain.
Typical Scenarios
Enterprises already using Spring Boot/Cloud stacks.
Enterprise‑level RAG knowledge‑base systems.
AI‑enhanced customer‑service or content‑generation applications.
Projects needing multi‑model switching or domestic model integration.
Conclusion
The hands‑on examples illustrate six core capabilities of Spring AI:
ChatClient – unified conversational API with synchronous and streaming support.
Multi‑turn conversation – session memory via Advisor.
RAG – vector‑store backed knowledge‑base Q&A.
Tool calling – enabling AI to invoke external services.
Multimodal – handling images, video, and audio inputs.
Agent development – building sophisticated multi‑agent workflows with ReactAgent and graph orchestration.
Start with the basic chat example, then progressively add RAG and tool calling. For local or data‑sensitive deployments, consider Ollama; for production‑grade reliability, use Alibaba DashScope, Azure OpenAI, or another managed provider.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.