Building Full Observability for Spring Cloud Microservices with Micrometer, Prometheus, and Grafana
After solving distributed transactions with Seata, this tutorial shows how to add complete observability to Spring Cloud microservices by integrating Micrometer, Prometheus, and Grafana, covering metrics pillars, configuration, custom business metrics, dashboard setup, alert rules, validation steps, and common pitfalls.
Goal
Complete the observability stack for the Spring Cloud demo project using Micrometer, Prometheus, and Grafana.
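Why Observability?
Observability is commonly described in terms of three pillars; this tutorial implements the first, metrics: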
Metrics – system runtime data (Micrometer + Prometheus)
Logging – event records (ELK / Loki)
Tracing – request flow (SkyWalking / Jaeger)
Metrics Types
Counter : ever‑increasing count (e.g., total requests, total errors)
Gauge : value that can go up or down (e.g., active connections, memory usage)
Timer : duration and count of timed events (e.g., request latency)
DistributionSummary : distribution of non-time values (e.g., request payload size)
Environment Preparation
Add the Prometheus and Grafana services to docker-compose.yml:
# docker-compose.yml addition (both entries go under the existing services: key)
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus-teaching
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    networks:
      - teaching-network
  grafana:
    image: grafana/grafana:latest
    container_name: grafana-teaching
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_INSTALL_PLUGINS=grafana-piechart-panel
    volumes:
      # named volume; it must also be declared under the top-level volumes: key
      - grafana-data:/var/lib/grafana
    networks:
      - teaching-network
The Prometheus scrape configuration (prometheus.yml) defines three jobs, one each for order-service, stock-service, and point-service, which expose /actuator/prometheus on ports 8081‑8083.
Service Integration
Add the following Maven dependencies to each service:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
Enable the Prometheus endpoint in application.yml:
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,metrics
  metrics:
    export:
      prometheus:
        # Spring Boot 2.x key; Spring Boot 3.x uses management.prometheus.metrics.export.enabled
        enabled: true
info:
  app:
    name: ${spring.application.name}
    version: 1.0.0
    description: Spring Cloud teaching project
Verify the endpoint with curl http://localhost:8081/actuator/prometheus. The output should include lines such as:
# HELP http_server_requests_seconds Duration of HTTP server requests.
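The manual check above can also be scripted. The following is a minimal sketch, assuming the scraped text has already been fetched into a string; the class ExpositionCheck and its method metricFamilies are hypothetical names introduced here, not part of the demo project. It collects metric names from the "# HELP" lines of the Prometheus text format so that a test can assert an expected metric is present.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper (not from the article): extracts metric family names
// from Prometheus text-format output by reading its "# HELP <name> <text>" lines.
final class ExpositionCheck {

    static Set<String> metricFamilies(String body) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : body.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("# HELP ")) {
                // Layout: "#", "HELP", metric name, free-text description
                String[] parts = trimmed.split("\\s+", 4);
                if (parts.length >= 3) {
                    names.add(parts[2]);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample text standing in for the curl output
        String sample = String.join("\n",
                "# HELP http_server_requests_seconds Duration of HTTP server requests.",
                "# TYPE http_server_requests_seconds summary",
                "# HELP jvm_memory_used_bytes The amount of used memory",
                "# TYPE jvm_memory_used_bytes gauge");
        Set<String> families = metricFamilies(sample);
        System.out.println(families.contains("http_server_requests_seconds")); // prints true
    }
}
```

In a real check the sample string would be replaced by the body returned from /actuator/prometheus.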
Core Metrics Details
http_server_requests_seconds – HTTP request latency (Timer)
jvm_memory_used_bytes – JVM memory usage (Gauge)
jvm_gc_pause_seconds – GC pause time (Timer)
system_cpu_usage – CPU usage (Gauge)
process_uptime_seconds – Process uptime (Gauge)
Custom Business Metrics
Define a component that registers counters, a timer, and a gauge for order processing:
package com.teaching.order.metrics;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

@Component
public class OrderMetrics {
    private final Counter orderCreateCounter;
    private final Counter orderSuccessCounter;
    private final Counter orderFailureCounter;
    private final Timer orderCreateTimer;
    private final AtomicLong pendingOrders;

    public OrderMetrics(MeterRegistry registry) {
        this.orderCreateCounter = Counter.builder("order.create.total")
                .description("Total number of orders created")
                .register(registry);
        this.orderSuccessCounter = Counter.builder("order.create.success")
                .description("Number of orders created successfully")
                .register(registry);
        this.orderFailureCounter = Counter.builder("order.create.failure")
                .description("Number of failed order creations")
                .register(registry);
        this.orderCreateTimer = Timer.builder("order.create.duration")
                .description("Time taken to create an order")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(registry);
        // registry.gauge returns the AtomicLong it was given; keeping it in a field
        // holds a strong reference so the gauge keeps reporting
        this.pendingOrders = registry.gauge("order.pending.count", new AtomicLong(0));
    }

    public void recordCreate() { orderCreateCounter.increment(); }

    public void recordSuccess() { orderSuccessCounter.increment(); }

    public void recordFailure() { orderFailureCounter.increment(); }

    // Times the callable and returns its result; checked exceptions propagate to the caller
    public <T> T recordTimer(Callable<T> callable) throws Exception {
        return orderCreateTimer.recordCallable(callable);
    }

    public void setPendingOrders(long count) { pendingOrders.set(count); }
}
Using Metrics in Business Code
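import com.teaching.order.metrics.OrderMetrics;
import io.seata.spring.annotation.GlobalTransactional; // Seata 1.x package name
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
@Slf4j
public class OrderService {
    private final OrderMetrics orderMetrics;

    // recordTimer declares "throws Exception", so this method must declare it as well
    @GlobalTransactional(name = "create-order", rollbackFor = Exception.class)
    public void createOrder(OrderCreateDTO request) throws Exception {
        orderMetrics.recordCreate();
        try {
            orderMetrics.recordTimer(() -> {
                // business logic (doCreateOrder is defined elsewhere in the service)
                doCreateOrder(request);
                return null;
            });
            orderMetrics.recordSuccess();
        } catch (Exception e) {
            orderMetrics.recordFailure();
            throw e; // rethrow so the global transaction rolls back
        }
    }
}
Grafana Dashboard Configuration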
Open http://localhost:3000 and log in with admin/admin.
Navigate to Configuration → Data Sources → Add data source → Prometheus (recent Grafana versions list this under Connections → Data sources).
Set URL to http://prometheus:9090 and save.
Import the official Spring Boot dashboard (ID 12900) and select the Prometheus data source.
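The data source step can also be automated through Grafana's HTTP API instead of the UI. The sketch below is one possible approach, not the article's method: the class GrafanaDataSourceSetup and its helper names are hypothetical, and it assumes Grafana is reachable at http://localhost:3000 with the admin/admin credentials from the compose file. It sends a POST to /api/datasources using Basic authentication.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper (not from the article): registers Prometheus as a
// Grafana data source via POST /api/datasources.
final class GrafanaDataSourceSetup {

    // Builds the value of the HTTP Basic "Authorization" header
    static String basicAuth(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // JSON payload describing a Prometheus data source proxied through Grafana
    static String dataSourceJson(String name, String prometheusUrl) {
        return "{\"name\":\"" + name + "\",\"type\":\"prometheus\",\"url\":\""
                + prometheusUrl + "\",\"access\":\"proxy\",\"isDefault\":true}";
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:3000/api/datasources")) // assumed Grafana address
                .header("Content-Type", "application/json")
                .header("Authorization", basicAuth("admin", "admin"))
                .POST(HttpRequest.BodyPublishers.ofString(
                        dataSourceJson("Prometheus", "http://prometheus:9090")))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The URL http://prometheus:9090 resolves only inside the compose network, which is why it is passed to Grafana rather than called from the host.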
Alert Rules (Prometheus)
# prometheus/alerts.yml
# Mount this file into the Prometheus container and reference it under rule_files in prometheus.yml.
groups:
  - name: service_alerts
    rules:
      - alert: ServiceDown
        expr: up{job=~"order-service|stock-service|point-service"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.job }} is down"
          description: "Service {{ $labels.job }} has not responded for over 1 minute"
      - alert: HighErrorRate
        # Aggregate by job so that {{ $labels.job }} is populated in the annotations
        expr: |
          sum by (job) (rate(http_server_requests_seconds_count{status=~"5.."}[2m])) /
          sum by (job) (rate(http_server_requests_seconds_count[2m])) > 0.05
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.job }} error rate too high"
          description: "Error rate exceeds 5%"
      - alert: SlowResponse
        # Needs histogram buckets, e.g.
        # management.metrics.distribution.percentiles-histogram.http.server.requests=true
        expr: |
          histogram_quantile(0.99, sum(rate(http_server_requests_seconds_bucket[5m])) by (le, job)) > 2
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.job }} response too slow"
          description: "P99 latency exceeds 2 seconds"
      - alert: HighMemoryUsage
        expr: |
          (sum by (job) (jvm_memory_used_bytes{area="heap"}) / sum by (job) (jvm_memory_max_bytes{area="heap"})) > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.job }} JVM memory high"
          description: "Heap usage over 85%"
Verification of Observability
Start all services: docker-compose up -d.
Generate test traffic (100 POST requests to /api/order/create with a 0.1 s pause).
Query Prometheus for QPS and error rate, filtering on the job label set by the scrape configuration, for example:
sum(rate(http_server_requests_seconds_count{job="order-service"}[1m]))
Open Grafana at http://localhost:3000 and check QPS trends, latency distribution, and JVM metrics.
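The test traffic in the steps above could be generated with a small client like the following. This is a sketch under stated assumptions: the class OrderTrafficGenerator is hypothetical, the JSON body fields are placeholders because the article does not show OrderCreateDTO, and order-service is assumed to listen on http://localhost:8081.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical load generator: 100 POST requests with a 0.1 s pause between them.
final class OrderTrafficGenerator {

    static HttpRequest buildRequest(String baseUrl, int sequence) {
        // Placeholder payload; the real OrderCreateDTO fields are not shown in the article
        String json = "{\"productId\": 1, \"count\": 1, \"requestNo\": " + sequence + "}";
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/order/create"))
                .timeout(Duration.ofSeconds(5))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8081"; // assumed address
        HttpClient client = HttpClient.newHttpClient();
        int failures = 0;
        for (int i = 1; i <= 100; i++) {
            try {
                HttpResponse<String> response =
                        client.send(buildRequest(baseUrl, i), HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() >= 500) {
                    failures++; // server-side errors feed the HighErrorRate alert
                }
            } catch (IOException e) {
                failures++; // connection refused, timeout, and similar transport errors
            }
            Thread.sleep(100); // 0.1 s pause between requests
        }
        System.out.println("sent=100 failed=" + failures);
    }
}
```

At roughly ten requests per second the run lasts about ten seconds, so a 1m rate window in Prometheus will show only a brief bump; looping the generator for a few minutes gives a clearer QPS trend.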
Common Issues & Pitfalls
Pitfall 1: /actuator/prometheus returns 404
Cause : Prometheus endpoint not exposed.
Fix :
management:
  endpoints:
    web:
      exposure:
        include: prometheus,metrics
Pitfall 2: Prometheus cannot scrape targets
Check http://localhost:9090/targets – ensure targets are UP.
Verify container network connectivity.
Confirm the metrics path ( /actuator/prometheus) is correct.
Pitfall 3: Custom metrics not appearing
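Make sure MeterRegistry is injected.
Confirm the custom metric methods are invoked.
Search by the Prometheus name: dots become underscores and counters gain a _total suffix, so order.create.success appears as order_create_success_total.
Wait for at least one scrape interval (15 s if prometheus.yml sets scrape_interval: 15s; the built-in Prometheus default is 1m).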
Next Episode Preview
Spring Cloud Microservices in Practice – Revised Edition (Part 10): Full Docker‑Compose Deployment, covering one‑click service startup, orchestration optimisation, environment isolation, and production‑grade configuration.
Coder Trainee
Experienced in Java and Python, we share and learn together. For submissions or collaborations, DM us.
