How to Build Enterprise‑Ready AI Monitoring with Spring AI and Micrometer
This article explains why observability is essential for Spring AI applications, outlines common cost‑control and performance challenges, and provides a step‑by‑step guide—including Maven setup, client configuration, service implementation, metric exposure, Zipkin tracing, and architecture insights—to create a fully observable, enterprise‑grade AI translation service.
In the era of explosive AI application growth, Spring AI 1.0 brings revolutionary observability features. This article explores how to use Spring AI + Micrometer to build an enterprise‑grade AI monitoring system for cost control, performance optimization, and end‑to‑end tracing.
Why do Spring AI applications urgently need observability?
AI service cost‑control pain points
Opaque token consumption: no precise view of what each AI call costs.
Uncontrolled cost growth: at scale, AI service fees can grow exponentially.
Hard-to-locate performance bottlenecks: complex AI call chains make troubleshooting difficult.
Unreasonable resource usage: optimization decisions lack data to drive them.
Value of Spring AI observability
Spring AI’s observability features address these pain points directly:
Precise token monitoring: real-time tracking of input/output token consumption per call.
Intelligent cost control: formulate cost-optimization strategies based on usage statistics.
Deep performance analysis: identify AI call bottlenecks and optimize response times.
Full-chain tracing: end-to-end recording of request flow within Spring AI applications.
Hands‑on: Build an observable Spring AI translation app
Step 1: Initialize Spring AI project
Create a Spring Boot project on start.spring.io [1] and add the core Spring AI dependencies:
<code><dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <!-- Spring AI DeepSeek integration -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-deepseek</artifactId>
    </dependency>
    <!-- Spring Boot Web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Spring Boot Actuator for monitoring -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
</dependencies></code>
Step 2: Spring AI client configuration
Expose a ChatClient bean; Spring AI auto-instruments it with Micrometer, so no extra metrics code is needed:
<code>@SpringBootApplication
public class SpringAiTranslationApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringAiTranslationApplication.class, args);
    }

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder.build();
    }
}</code>
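The builder can also carry defaults shared by every call. A minimal optional variant (not required for observability; the system prompt text is illustrative):
<code>// Optional variant: attach a default system prompt to every call.
@Bean
public ChatClient chatClient(ChatClient.Builder builder) {
    return builder
            .defaultSystem("You are a professional translation assistant.")
            .build();
}</code>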
application.yml (Spring AI observability settings):
<code># Spring AI observability configuration
management:
  endpoints:
    web:
      exposure:
        include: "*"
  endpoint:
    health:
      show-details: always
  # Spring Boot 3.x property path; Prometheus export also requires
  # micrometer-registry-prometheus on the classpath
  prometheus:
    metrics:
      export:
        enabled: true

spring:
  threads:
    virtual:
      enabled: true
  ai:
    deepseek:
      api-key: ${DEEPSEEK_API_KEY}
      chat:
        options:
          model: deepseek-chat
          temperature: 0.8</code>
Set the environment variable:
<code>export DEEPSEEK_API_KEY=your-deepseek-api-key</code>
Step 3: Build the Spring AI translation service
Controller and DTO definitions:
<code>@RestController
@RequestMapping("/api/v1")
@RequiredArgsConstructor
@Slf4j
public class SpringAiTranslationController {

    // Use the ChatClient bean from Step 2 so calls are recorded under
    // the spring.ai.chat.client.operation metric shown below.
    private final ChatClient chatClient;

    @PostMapping("/translate")
    public TranslationResponse translate(@RequestBody TranslationRequest request) {
        log.info("Spring AI translation request: {} -> {}",
                request.getSourceLanguage(), request.getTargetLanguage());

        String prompt = String.format(
                "As a professional translation assistant, translate the following %s text to %s, preserving tone and style:\n%s",
                request.getSourceLanguage(), request.getTargetLanguage(), request.getText());

        String translatedText = chatClient.prompt()
                .user(prompt)
                .call()
                .content();

        return TranslationResponse.builder()
                .originalText(request.getText())
                .translatedText(translatedText)
                .sourceLanguage(request.getSourceLanguage())
                .targetLanguage(request.getTargetLanguage())
                .timestamp(System.currentTimeMillis())
                .build();
    }
}

@Data
@Builder
@NoArgsConstructor  // needed so Jackson can deserialize the request body
@AllArgsConstructor // required by @Builder once a no-args constructor is added
class TranslationRequest {
    private String text;
    private String sourceLanguage;
    private String targetLanguage;
}

@Data
@Builder
class TranslationResponse {
    private String originalText;
    private String translatedText;
    private String sourceLanguage;
    private String targetLanguage;
    private Long timestamp;
}</code>
Step 4: Test the Spring AI translation API
Example curl request and response:
<code>curl -X POST http://localhost:8080/api/v1/translate \
  -H "Content-Type: application/json" \
  -d '{
        "text": "Spring AI makes AI integration incredibly simple and powerful",
        "sourceLanguage": "English",
        "targetLanguage": "Chinese"
      }'

# Response example
{
  "originalText": "Spring AI makes AI integration incredibly simple and powerful",
  "translatedText": "Spring AI让AI集成变得极其简单而强大",
  "sourceLanguage": "English",
  "targetLanguage": "Chinese",
  "timestamp": 1704067200000
}</code>
Spring AI monitoring metrics deep dive
Core metric 1: Spring AI operation performance
Endpoint: /actuator/metrics/spring.ai.chat.client.operation
<code>{
  "name": "spring.ai.chat.client.operation",
  "description": "Spring AI ChatClient operation performance metric",
  "baseUnit": "seconds",
  "measurements": [
    {"statistic": "COUNT", "value": 15},
    {"statistic": "TOTAL_TIME", "value": 8.456780293},
    {"statistic": "MAX", "value": 2.123904083}
  ],
  "availableTags": [
    {"tag": "gen_ai.operation.name", "values": ["framework"]},
    {"tag": "spring.ai.kind", "values": ["chat_client"]}
  ]
}</code>
Business value:
Monitor Spring AI translation service call frequency.
Analyze response time distribution.
Identify performance bottlenecks.
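You don't have to poll the HTTP endpoint for dashboards or alerts; the same statistics are available in code through Micrometer's MeterRegistry. A minimal sketch (the bean name and one-minute schedule are illustrative):
<code>import java.util.concurrent.TimeUnit;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Logs the same statistics the actuator endpoint exposes, once a minute.
// Requires @EnableScheduling on a configuration class.
@Component
@RequiredArgsConstructor
@Slf4j
class AiOperationMetricsReporter {

    private final MeterRegistry meterRegistry;

    @Scheduled(fixedRate = 60_000)
    void logChatClientStats() {
        Timer timer = meterRegistry.find("spring.ai.chat.client.operation").timer();
        if (timer == null) {
            return; // the meter appears only after the first AI call
        }
        log.info("AI calls: count={}, totalTime={}s, max={}s",
                timer.count(),
                timer.totalTime(TimeUnit.SECONDS),
                timer.max(TimeUnit.SECONDS));
    }
}</code>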
Core metric 2: Precise token usage tracking
Endpoint: /actuator/metrics/gen_ai.client.token.usage
<code>{
  "name": "gen_ai.client.token.usage",
  "description": "Spring AI token usage statistics",
  "measurements": [{"statistic": "COUNT", "value": 1250}],
  "availableTags": [
    {"tag": "gen_ai.response.model", "values": ["deepseek-chat"]},
    {"tag": "gen_ai.request.model", "values": ["deepseek-chat"]},
    {"tag": "gen_ai.token.type", "values": ["output", "input", "total"]}
  ]
}</code>
Cost‑control value:
Accurately calculate Spring AI service costs.
Optimize prompt design to reduce token consumption.
Define budget strategies based on usage.
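To turn these counters into a budget signal, read them programmatically. The sketch below is illustrative only: the per-token prices are placeholders, not DeepSeek's actual pricing, and the counter name and tag match the endpoint above:
<code>import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Component;

// Estimates spend from the token counters exposed above.
@Component
@RequiredArgsConstructor
class AiCostEstimator {

    // Placeholder prices per 1M tokens; substitute your provider's real rates.
    private static final double INPUT_USD_PER_1M_TOKENS = 0.27;
    private static final double OUTPUT_USD_PER_1M_TOKENS = 1.10;

    private final MeterRegistry meterRegistry;

    public double estimatedCostUsd() {
        return tokens("input") / 1_000_000 * INPUT_USD_PER_1M_TOKENS
                + tokens("output") / 1_000_000 * OUTPUT_USD_PER_1M_TOKENS;
    }

    // Sums the counter across all models for the given token type tag.
    private double tokens(String tokenType) {
        return meterRegistry.find("gen_ai.client.token.usage")
                .tag("gen_ai.token.type", tokenType)
                .counters()
                .stream()
                .mapToDouble(Counter::count)
                .sum();
    }
}</code>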
Spring AI call‑chain tracing practice
Step 1: Integrate Zipkin tracing
<code><dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>
<dependency>
    <groupId>io.zipkin.reporter2</groupId>
    <artifactId>zipkin-reporter-brave</artifactId>
</dependency></code>
Step 2: Start Zipkin service
<code>docker run -d \
--name zipkin-spring-ai \
-p 9411:9411 \
-e STORAGE_TYPE=mem \
openzipkin/zipkin:latest</code>
Step 3: Spring AI tracing configuration
<code>management:
  zipkin:
    tracing:
      endpoint: http://localhost:9411/api/v2/spans
  tracing:
    sampling:
      probability: 1.0  # trace every request; lower this in production</code>
Step 4: Trace visualization
The Zipkin UI shows the complete Spring AI call chain, including ChatClient latency and the DeepSeek API response time.
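To enrich the trace with business context, you can wrap the translation call in a custom observation via Micrometer's Observation API. A minimal sketch, assuming an injected ObservationRegistry (the observation name and key value are illustrative):
<code>import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;

// Inside the controller, with a `private final ObservationRegistry observationRegistry` field:
String translatedText = Observation
        .createNotStarted("translation.request", observationRegistry)
        .lowCardinalityKeyValue("translation.target", request.getTargetLanguage())
        .observe(() -> chatClient.prompt()
                .user(prompt)
                .call()
                .content());</code>
This creates a parent span around the automatic Spring AI span, so the Zipkin UI groups the model call under your business operation.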
Spring AI observability source architecture analysis
Core components of Spring AI observability:
ChatClientObservationConvention: defines the observation names and key values (tags) recorded for each ChatClient call.
ChatClientObservationContext: carries the per-call observation state, such as the request and response.
ObservationRegistry: Micrometer's central registry through which every observation is published to metrics and tracing backends.
TracingObservationHandler: a Micrometer Tracing handler that turns observations into spans for trace propagation.
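These pieces are extensible: any ObservationHandler bean you declare is registered with the auto-configured ObservationRegistry and sees every observation. A minimal sketch (the name filter and logging are illustrative):
<code>import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationHandler;
import lombok.extern.slf4j.Slf4j;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Slf4j
@Configuration
class AiObservationConfig {

    // Logs every Spring AI chat client observation as it completes.
    @Bean
    ObservationHandler<Observation.Context> aiCallLogger() {
        return new ObservationHandler<>() {
            @Override
            public boolean supportsContext(Observation.Context context) {
                // Restrict to Spring AI chat client observations.
                return context.getName().startsWith("spring.ai");
            }

            @Override
            public void onStop(Observation.Context context) {
                log.info("AI observation finished: {}", context.getName());
            }
        };
    }
}</code>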
Reference links
[1] start.spring.io: https://start.spring.io