How to Build Enterprise‑Ready AI Monitoring with Spring AI and Micrometer
This article explains why observability is essential for Spring AI applications, outlines common cost‑control and performance challenges, and provides a step‑by‑step guide—including Maven setup, client configuration, service implementation, metric exposure, Zipkin tracing, and architecture insights—to create a fully observable, enterprise‑grade AI translation service.
In the era of explosive AI application growth, Spring AI 1.0 brings revolutionary observability features. This article explores how to use Spring AI + Micrometer to build an enterprise‑grade AI monitoring system for cost control, performance optimization, and end‑to‑end tracing.
Why do Spring AI applications urgently need observability?
AI service cost‑control pain points
Opaque token consumption: no precise view of what each AI call costs.
Uncontrolled cost growth: at scale, AI service fees can grow exponentially.
Hard-to-locate performance bottlenecks: complex AI call chains make troubleshooting difficult.
Unreasonable resource usage: optimization decisions lack data to drive them.
Value of Spring AI observability
Spring AI’s observability features address these pain points directly:
Precise token monitoring: real-time tracking of input/output token consumption per call.
Intelligent cost control: formulate cost-optimization strategies based on usage statistics.
Deep performance analysis: identify AI call bottlenecks and optimize response times.
Full-chain tracing: end-to-end recording of request flow within Spring AI applications.
Hands‑on: Build an observable Spring AI translation app
Step 1: Initialize Spring AI project
Create a Spring Boot project on start.spring.io [1] and add the core Spring AI dependencies:
<code><dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <!-- Spring AI DeepSeek integration -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-deepseek</artifactId>
    </dependency>
    <!-- Spring Boot Web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Spring Boot Actuator for monitoring -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
</dependencies></code>
Step 2: Spring AI client configuration
Expose a ChatClient bean; Spring AI auto-instruments it with Micrometer, so no extra metrics code is needed:
<code>@SpringBootApplication
public class SpringAiTranslationApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringAiTranslationApplication.class, args);
    }

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder.build();
    }
}</code>
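The builder can also carry defaults shared by every call. A minimal optional variant (not required for observability; the system prompt text is illustrative):
<code>// Optional variant: attach a default system prompt to every call.
@Bean
public ChatClient chatClient(ChatClient.Builder builder) {
    return builder
            .defaultSystem("You are a professional translation assistant.")
            .build();
}</code>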
application.yml (Spring AI observability settings):
<code># Spring AI observability configuration
management:
  endpoints:
    web:
      exposure:
        include: "*"
  endpoint:
    health:
      show-details: always
  # Spring Boot 3.x property path; Prometheus export also requires
  # micrometer-registry-prometheus on the classpath
  prometheus:
    metrics:
      export:
        enabled: true

spring:
  threads:
    virtual:
      enabled: true
  ai:
    deepseek:
      api-key: ${DEEPSEEK_API_KEY}
      chat:
        options:
          model: deepseek-chat
          temperature: 0.8</code>
Set the environment variable:
<code>export DEEPSEEK_API_KEY=your-deepseek-api-key</code>
Step 3: Build the Spring AI translation service
Controller and DTO definitions:
<code>@RestController
@RequestMapping("/api/v1")
@RequiredArgsConstructor
@Slf4j
public class SpringAiTranslationController {

    // Use the ChatClient bean from Step 2 so calls are recorded under
    // the spring.ai.chat.client.operation metric shown below.
    private final ChatClient chatClient;

    @PostMapping("/translate")
    public TranslationResponse translate(@RequestBody TranslationRequest request) {
        log.info("Spring AI translation request: {} -> {}",
                request.getSourceLanguage(), request.getTargetLanguage());

        String prompt = String.format(
                "As a professional translation assistant, translate the following %s text to %s, preserving tone and style:\n%s",
                request.getSourceLanguage(), request.getTargetLanguage(), request.getText());

        String translatedText = chatClient.prompt()
                .user(prompt)
                .call()
                .content();

        return TranslationResponse.builder()
                .originalText(request.getText())
                .translatedText(translatedText)
                .sourceLanguage(request.getSourceLanguage())
                .targetLanguage(request.getTargetLanguage())
                .timestamp(System.currentTimeMillis())
                .build();
    }
}

@Data
@Builder
@NoArgsConstructor  // needed so Jackson can deserialize the request body
@AllArgsConstructor // required by @Builder once a no-args constructor is added
class TranslationRequest {
    private String text;
    private String sourceLanguage;
    private String targetLanguage;
}

@Data
@Builder
class TranslationResponse {
    private String originalText;
    private String translatedText;
    private String sourceLanguage;
    private String targetLanguage;
    private Long timestamp;
}</code>
Step 4: Test the Spring AI translation API
Example curl request and response:
<code>curl -X POST http://localhost:8080/api/v1/translate \
  -H "Content-Type: application/json" \
  -d '{
        "text": "Spring AI makes AI integration incredibly simple and powerful",
        "sourceLanguage": "English",
        "targetLanguage": "Chinese"
      }'

# Response example
{
  "originalText": "Spring AI makes AI integration incredibly simple and powerful",
  "translatedText": "Spring AI让AI集成变得极其简单而强大",
  "sourceLanguage": "English",
  "targetLanguage": "Chinese",
  "timestamp": 1704067200000
}</code>
Spring AI monitoring metrics deep dive
Core metric 1: Spring AI operation performance
Endpoint: /actuator/metrics/spring.ai.chat.client.operation
<code>{
  "name": "spring.ai.chat.client.operation",
  "description": "Spring AI ChatClient operation performance metric",
  "baseUnit": "seconds",
  "measurements": [
    {"statistic": "COUNT", "value": 15},
    {"statistic": "TOTAL_TIME", "value": 8.456780293},
    {"statistic": "MAX", "value": 2.123904083}
  ],
  "availableTags": [
    {"tag": "gen_ai.operation.name", "values": ["framework"]},
    {"tag": "spring.ai.kind", "values": ["chat_client"]}
  ]
}</code>
Business value:
Monitor Spring AI translation service call frequency.
Analyze response time distribution.
Identify performance bottlenecks.
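You don't have to poll the HTTP endpoint for dashboards or alerts; the same statistics are available in code through Micrometer's MeterRegistry. A minimal sketch (the bean name and one-minute schedule are illustrative):
<code>import java.util.concurrent.TimeUnit;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Logs the same statistics the actuator endpoint exposes, once a minute.
// Requires @EnableScheduling on a configuration class.
@Component
@RequiredArgsConstructor
@Slf4j
class AiOperationMetricsReporter {

    private final MeterRegistry meterRegistry;

    @Scheduled(fixedRate = 60_000)
    void logChatClientStats() {
        Timer timer = meterRegistry.find("spring.ai.chat.client.operation").timer();
        if (timer == null) {
            return; // the meter appears only after the first AI call
        }
        log.info("AI calls: count={}, totalTime={}s, max={}s",
                timer.count(),
                timer.totalTime(TimeUnit.SECONDS),
                timer.max(TimeUnit.SECONDS));
    }
}</code>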
Core metric 2: Precise token usage tracking
Endpoint: /actuator/metrics/gen_ai.client.token.usage
<code>{
  "name": "gen_ai.client.token.usage",
  "description": "Spring AI token usage statistics",
  "measurements": [{"statistic": "COUNT", "value": 1250}],
  "availableTags": [
    {"tag": "gen_ai.response.model", "values": ["deepseek-chat"]},
    {"tag": "gen_ai.request.model", "values": ["deepseek-chat"]},
    {"tag": "gen_ai.token.type", "values": ["output", "input", "total"]}
  ]
}</code>
Cost‑control value:
Accurately calculate Spring AI service costs.
Optimize prompt design to reduce token consumption.
Define budget strategies based on usage.
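To turn these counters into a budget signal, read them programmatically. The sketch below is illustrative only: the per-token prices are placeholders, not DeepSeek's actual pricing, and the counter name and tag match the endpoint above:
<code>import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Component;

// Estimates spend from the token counters exposed above.
@Component
@RequiredArgsConstructor
class AiCostEstimator {

    // Placeholder prices per 1M tokens; substitute your provider's real rates.
    private static final double INPUT_USD_PER_1M_TOKENS = 0.27;
    private static final double OUTPUT_USD_PER_1M_TOKENS = 1.10;

    private final MeterRegistry meterRegistry;

    public double estimatedCostUsd() {
        return tokens("input") / 1_000_000 * INPUT_USD_PER_1M_TOKENS
                + tokens("output") / 1_000_000 * OUTPUT_USD_PER_1M_TOKENS;
    }

    // Sums the counter across all models for the given token type tag.
    private double tokens(String tokenType) {
        return meterRegistry.find("gen_ai.client.token.usage")
                .tag("gen_ai.token.type", tokenType)
                .counters()
                .stream()
                .mapToDouble(Counter::count)
                .sum();
    }
}</code>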
Spring AI call‑chain tracing practice
Step 1: Integrate Zipkin tracing
<code><dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>
<dependency>
    <groupId>io.zipkin.reporter2</groupId>
    <artifactId>zipkin-reporter-brave</artifactId>
</dependency></code>
Step 2: Start Zipkin service
<code>docker run -d \
--name zipkin-spring-ai \
-p 9411:9411 \
-e STORAGE_TYPE=mem \
openzipkin/zipkin:latest</code>
Step 3: Spring AI tracing configuration
<code>management:
  zipkin:
    tracing:
      endpoint: http://localhost:9411/api/v2/spans
  tracing:
    sampling:
      probability: 1.0  # trace every request; lower this in production</code>
Step 4: Trace visualization
The Zipkin UI shows the complete Spring AI call chain, including ChatClient latency and the DeepSeek API response time.
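To enrich the trace with business context, you can wrap the translation call in a custom observation via Micrometer's Observation API. A minimal sketch, assuming an injected ObservationRegistry (the observation name and key value are illustrative):
<code>import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;

// Inside the controller, with a `private final ObservationRegistry observationRegistry` field:
String translatedText = Observation
        .createNotStarted("translation.request", observationRegistry)
        .lowCardinalityKeyValue("translation.target", request.getTargetLanguage())
        .observe(() -> chatClient.prompt()
                .user(prompt)
                .call()
                .content());</code>
This creates a parent span around the automatic Spring AI span, so the Zipkin UI groups the model call under your business operation.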
Spring AI observability source architecture analysis
Core components of Spring AI observability:
ChatClientObservationConvention: defines the observation names and key values (tags) recorded for each ChatClient call.
ChatClientObservationContext: carries the per-call observation state, such as the request and response.
ObservationRegistry: Micrometer's central registry through which every observation is published to metrics and tracing backends.
TracingObservationHandler: a Micrometer Tracing handler that turns observations into spans for trace propagation.
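These pieces are extensible: any ObservationHandler bean you declare is registered with the auto-configured ObservationRegistry and sees every observation. A minimal sketch (the name filter and logging are illustrative):
<code>import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationHandler;
import lombok.extern.slf4j.Slf4j;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Slf4j
@Configuration
class AiObservationConfig {

    // Logs every Spring AI chat client observation as it completes.
    @Bean
    ObservationHandler<Observation.Context> aiCallLogger() {
        return new ObservationHandler<>() {
            @Override
            public boolean supportsContext(Observation.Context context) {
                // Restrict to Spring AI chat client observations.
                return context.getName().startsWith("spring.ai");
            }

            @Override
            public void onStop(Observation.Context context) {
                log.info("AI observation finished: {}", context.getName());
            }
        };
    }
}</code>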
Reference links
[1] start.spring.io: https://start.spring.io