How Spring AI Turns a Java Backend into a Powerful AI Service Layer
This article reviews Spring AI's evolution, architecture, and key features—including unified APIs, prompt engineering, streaming, RAG, multi‑modal support, and agent workflows—showing how Java back‑ends can efficiently expose AI capabilities without handling low‑level model logic.
Introduction
Java is unlikely to become a mainstream language for AI model development due to language constraints, but it remains a strong candidate for serving AI capabilities on the server side. Spring AI, a Java-centric framework, aims to simplify integrating AI models into business-facing (B-end) and consumer-facing (C-end) applications.
Spring AI Development Timeline
0.8.1 – March 2024 – First public release, basic AI model integration.
1.0.0‑M1 – May 2024 – Added ChatClient API, structured output, conversation memory.
1.0.0‑M2 – August 2024 – Extended provider support, tool calling.
1.0.0‑M6 – February 2025 – @Tool annotation, MCP protocol integration.
1.0.0‑M7 – April 2025 – Independent RAG module, modular architecture.
1.0.0‑RC1 – 13 May 2025 – API lock, production‑ready.
1.0.0 GA – 20 May 2025 – First production version.
1.0.3 – October 2025 – GraalVM native image support.
1.1.0‑M3 – 15 October 2025 – MCP SDK upgrade, multi‑document support.
Core Architecture
┌─────────────────────────────────────────────────────────┐
│ Application Layer (Application) │
│ ChatClient API | Prompt Templates | Structured Output│
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Integration Layer (Integration) │
│ Spring Boot Auto‑Config | Dependency Injection │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Core Layer (Core) │
│ Model Abstraction | Embedding | Vector Store | Memory │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Extension Layer (Extension) │
│ RAG | Agent | MCP | Tool Calling | Observability │
└─────────────────────────────────────────────────────────┘
Unified API
Spring AI offers a unified ChatClient (and, for embeddings, EmbeddingModel, which was called EmbeddingClient before the 1.0 milestones) that hides provider-specific details. It supports over 20 model providers, including OpenAI, Anthropic, Google Gemini, AWS Bedrock, Alibaba Tongyi, DeepSeek, Zhipu AI, and Ollama, so developers can switch models via configuration without changing business code.
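The switch-by-configuration idea can be illustrated without any framework. The sketch below is plain JDK code, not Spring AI's implementation: `SimpleChatModel`, `ModelRegistry`, and `SummaryService` are hypothetical names chosen for illustration, and the "providers" are stand-in lambdas rather than real model clients.

```java
import java.util.Map;

/** Hypothetical provider-neutral abstraction; not a Spring AI type. */
interface SimpleChatModel {
    String call(String userText);
}

/** Resolves an implementation from a configuration key, so callers never name a vendor. */
final class ModelRegistry {
    private final Map<String, SimpleChatModel> providers;

    ModelRegistry(Map<String, SimpleChatModel> providers) {
        this.providers = Map.copyOf(providers);
    }

    SimpleChatModel forProvider(String configuredName) {
        SimpleChatModel model = providers.get(configuredName);
        if (model == null) {
            throw new IllegalArgumentException("Unknown provider: " + configuredName);
        }
        return model;
    }
}

/** Business code depends only on the interface, so swapping providers needs no change here. */
final class SummaryService {
    private final SimpleChatModel model;

    SummaryService(SimpleChatModel model) {
        this.model = model;
    }

    String summarize(String text) {
        return model.call("Summarize the following text:\n" + text);
    }
}
```

In this sketch, changing the string passed to `forProvider` (for example, a value read from a properties file) is the only thing that differs between deployments; `SummaryService` stays untouched.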
return chatClient.prompt()
        .system("You are a professional text summarizer.")
        .user("Summarize the following text:\n" + text)
        // Spring AI 1.0 builders use temperature(...) / maxTokens(...);
        // earlier milestones used withTemperature(...) / withMaxTokens(...).
        .options(OpenAiChatOptions.builder()
                .temperature(0.3)
                .maxTokens(300)
                .build())
        .call()
        .content();
Prompt Engineering & Templates
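Prompt templates keep the fixed instruction text separate from per-request variables. As a rough illustration of the `{placeholder}` substitution idea only, here is a plain JDK sketch; `SimpleTemplate` is a hypothetical class and does not reproduce Spring AI's actual template renderer.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical minimal template: replaces {name} placeholders with supplied values. */
final class SimpleTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");
    private final String template;

    SimpleTemplate(String template) {
        this.template = template;
    }

    String render(Map<String, String> variables) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = variables.get(name);
            if (value == null) {
                // Fail fast instead of sending a half-filled prompt to the model.
                throw new IllegalArgumentException("Missing variable: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated as regex syntax.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Failing on a missing variable is a deliberate choice in this sketch: a prompt with an unresolved placeholder tends to produce confusing model output that is harder to debug than an exception.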
SystemPromptTemplate sysTpl = new SystemPromptTemplate("Answer in {role} style");
Message sysMsg = sysTpl.createMessage(Map.of("role", "Technical Expert"));
UserMessage userMsg = new UserMessage("Explain Spring Boot auto‑configuration");
Prompt prompt = new Prompt(List.of(sysMsg, userMsg));
String result = chatClient.prompt(prompt).call().content();
Streaming & Asynchronous Handling
Built on Reactor, Spring AI supports synchronous, streaming (SSE/Flux), and asynchronous call modes, making long‑text conversations straightforward.
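The difference between the three call modes can be sketched with plain JDK types. This is an assumption-laden illustration, not Reactor or Spring AI code: `CallModes` and its fake `generateChunks` "model" are hypothetical, a `Consumer<String>` callback stands in for a `Flux` subscriber, and `CompletableFuture` stands in for the asynchronous mode.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

final class CallModes {
    /** Hypothetical fake model: produces the reply as a sequence of chunks. */
    static List<String> generateChunks(String msg) {
        return List.of("Echo", ": ", msg);
    }

    /** Synchronous: block until the whole reply has been assembled. */
    static String call(String msg) {
        return String.join("", generateChunks(msg));
    }

    /** Streaming: hand each chunk to the consumer as soon as it is available. */
    static void stream(String msg, Consumer<String> onChunk) {
        for (String chunk : generateChunks(msg)) {
            onChunk.accept(chunk);
        }
    }

    /** Asynchronous: return immediately; the full reply arrives through the future. */
    static CompletableFuture<String> callAsync(String msg) {
        return CompletableFuture.supplyAsync(() -> call(msg));
    }
}
```

Streaming matters most for long replies: the caller can start rendering the first chunk while the rest is still being generated, instead of waiting for the complete text.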
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ChatResponse> stream(@RequestParam String msg) {
    return chatClient.prompt()
            .user(msg)
            .stream()
            .chatResponse();
}
RAG (Retrieval‑Augmented Generation)
Spring AI abstracts vector stores, supporting PGVector, Milvus, Pinecone, Redis, Chroma, and other databases. It enables retrieval, augmentation, and generation pipelines with minimal code.
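To make the retrieve-then-augment steps concrete, here is a minimal in-memory sketch in plain Java. It is not how any of the listed vector databases work internally: `TinyVectorStore` is a hypothetical class, embeddings are supplied by hand, and ranking uses brute-force cosine similarity over non-zero vectors of equal length.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class TinyVectorStore {
    /** A stored text together with its (precomputed) embedding. */
    record Entry(String text, double[] embedding) {}

    private final List<Entry> entries;

    TinyVectorStore(List<Entry> entries) {
        this.entries = List.copyOf(entries);
    }

    /** Cosine similarity; assumes both vectors are non-zero and have the same length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Retrieval: the k texts whose embeddings are most similar to the query. */
    List<String> topK(double[] query, int k) {
        return entries.stream()
                .sorted(Comparator.comparingDouble((Entry e) -> -cosine(query, e.embedding())))
                .limit(k)
                .map(Entry::text)
                .collect(Collectors.toList());
    }

    /** Augmentation: place the retrieved context in front of the question. */
    static String augment(String question, List<String> context) {
        return "Based on the following content:\n" + String.join("\n", context)
                + "\nQuestion: " + question;
    }
}
```

The generation step is then an ordinary chat call with the augmented prompt. A real store replaces the linear scan with an approximate nearest-neighbour index, but the contract (query in, ranked documents out) stays the same.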
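import java.util.List;
import java.util.stream.Collectors;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class RagService {
    @Autowired private VectorStore vectorStore;
    // Spring AI auto-configures ChatClient.Builder; a ChatClient bean is assumed to be defined elsewhere.
    @Autowired private ChatClient chatClient;

    public String query(String question) {
        // Retrieve the documents most similar to the question.
        List<Document> docs = vectorStore.similaritySearch(question);
        String context = docs.stream()
                .map(Document::getText) // getText() in 1.0; earlier milestones exposed getContent()
                .collect(Collectors.joining("\n"));
        // Augment the prompt with the retrieved context, then generate.
        return chatClient.prompt()
                .user("Based on the following content:\n" + context + "\nQuestion: " + question)
                .call()
                .content();
    }
}
Conversation Memory & Context Management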
Spring AI provides a ChatMemory interface for multi‑turn dialogue, with a default in‑memory implementation that can be extended to persistent stores.
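The core of an in-memory conversation store is a per-session message list with a bounded size. The following plain JDK sketch shows that idea under stated assumptions: `WindowedMemory` and `Turn` are hypothetical names, not Spring AI types, and the window simply keeps the most recent N turns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class WindowedMemory {
    /** One message in a conversation, e.g. role "user" or "assistant". */
    record Turn(String role, String text) {}

    private final int maxTurns;
    private final Map<String, List<Turn>> store = new ConcurrentHashMap<>();

    WindowedMemory(int maxTurns) {
        if (maxTurns < 1) {
            throw new IllegalArgumentException("maxTurns must be >= 1");
        }
        this.maxTurns = maxTurns;
    }

    /** Appends a turn; compute() makes the read-modify-write atomic per session. */
    void add(String sessionId, Turn turn) {
        store.compute(sessionId, (id, turns) -> {
            List<Turn> updated = new ArrayList<>();
            if (turns != null) {
                updated.addAll(turns);
            }
            updated.add(turn);
            // Drop the oldest turns once the window is exceeded.
            int from = Math.max(0, updated.size() - maxTurns);
            return List.copyOf(updated.subList(from, updated.size()));
        });
    }

    /** Returns an immutable snapshot of the session history (empty if unknown). */
    List<Turn> get(String sessionId) {
        return store.getOrDefault(sessionId, List.of());
    }
}
```

Bounding the window keeps the prompt within the model's context limit; moving to a persistent store would mean replacing the map with a database-backed lookup while keeping the same `add`/`get` contract.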
@Autowired private ChatMemory chatMemory;
@Autowired private ChatClient chatClient;

public String chat(String sessionId, String msg) {
    // Store the user turn first so it becomes part of the saved history;
    // the list returned by get(...) should not be mutated directly.
    chatMemory.add(sessionId, new UserMessage(msg));
    List<Message> history = chatMemory.get(sessionId);
    String reply = chatClient.prompt(new Prompt(history)).call().content();
    chatMemory.add(sessionId, new AssistantMessage(reply));
    return reply;
}

public class InMemoryChatMemory implements ChatMemory {
    private final Map<String, List<Message>> conversationHistory = new ConcurrentHashMap<>();
    // implementation omitted for brevity
}
Multi‑Modal Support
Beyond text, Spring AI can handle images, audio, etc. Example of image description:
// Attach the image through the fluent user spec, so text and media travel in one user message.
String description = chatClient.prompt()
        .user(u -> u.text("Describe this image")
                .media(MimeTypeUtils.IMAGE_PNG, new ClassPathResource("photo.png")))
        .call()
        .content();
Agent Workflows
Agents combine multiple tools (e.g., weather service, database) to accomplish complex tasks.
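Underneath an agent, tool use comes down to a lookup: the model names a tool, and the runtime executes it only if that agent was granted it. The sketch below shows that dispatch step in plain Java; `ToolBox` is a hypothetical class, and the tools are simple string functions rather than real services.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-agent tool set: name -> function taking and returning text. */
final class ToolBox {
    private final Map<String, Function<String, String>> tools;

    ToolBox(Map<String, Function<String, String>> tools) {
        this.tools = Map.copyOf(tools);
    }

    /** Runs the requested tool, or reports that this agent does not have it. */
    String invoke(String toolName, String argument) {
        Function<String, String> tool = tools.get(toolName);
        if (tool == null) {
            // Returning an error string lets the model see the refusal and adjust its plan.
            return "ERROR: tool '" + toolName + "' is not available";
        }
        return tool.apply(argument);
    }
}
```

Giving each agent its own `ToolBox` is what makes a read-only agent safe in this sketch: a request for a database tool it was never handed cannot be executed, regardless of what the model asks for.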
// Note: Agent is used here as an application-level wrapper around a ChatClient and its tools;
// it is assumed to be defined by the application or an extension library, not by Spring AI core.

// Agent A: weather + database tools
@Bean(name = "opsAgent")
public Agent opsAgent(ChatClient.Builder builder, WeatherTool weatherTool, DatabaseTool databaseTool) {
    ChatClient client = builder.defaultSystem("You are an intelligent assistant that can call tools.").build();
    return Agent.builder()
            .chatClient(client)
            .tools(List.of(weatherTool, databaseTool))
            .systemPrompt("You are an intelligent assistant that can call tools to complete tasks.")
            .build();
}

// Agent B: read-only weather tool
@Bean(name = "readonlyWeatherAgent")
public Agent readonlyWeatherAgent(ChatClient.Builder builder, WeatherTool weatherTool) {
    ChatClient client = builder.defaultSystem("You are a weather advisor; only provide weather info, no persistence.").build();
    return Agent.builder()
            .chatClient(client)
            .tools(List.of(weatherTool))
            .systemPrompt("You are a weather advisor; only provide weather info, no persistence.")
            .build();
}
Additional Features
Structured Output Mapping: Directly map LLM responses to Java POJOs, enums, lists, etc.
Annotation‑Driven Development: Use @AiPrompt and other annotations for declarative AI calls.
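For the structured-output item, the essential step is converting free-form model text into a typed value and rejecting anything that does not fit. Here is a minimal plain JDK sketch for the enum and list cases; `ReplyConverters` and `Sentiment` are hypothetical names, and the sketch assumes the model was instructed to answer with a single label or a comma-separated list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.stream.Collectors;

final class ReplyConverters {
    enum Sentiment { POSITIVE, NEGATIVE, NEUTRAL }

    /** Maps a one-word reply to an enum constant; empty if the model answered something else. */
    static Optional<Sentiment> toSentiment(String reply) {
        try {
            return Optional.of(Sentiment.valueOf(reply.trim().toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    /** Splits a comma-separated reply into trimmed, non-empty items. */
    static List<String> toList(String reply) {
        return Arrays.stream(reply.split(","))
                .map(String::trim)
                .filter(item -> !item.isEmpty())
                .collect(Collectors.toList());
    }
}
```

Returning `Optional.empty()` rather than throwing leaves the retry-or-fallback decision to the caller, which is useful because model output is never guaranteed to follow the requested format.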
Conclusion
Spring AI has matured into a production‑ready solution for exposing AI capabilities from Java services. While it is not intended for training or deploying models, it provides a robust first‑layer API for downstream applications, and combined with AI‑assisted coding it can dramatically boost development efficiency.
Rare Earth Juejin Tech Community
Juejin, a tech community that helps developers grow.
