Layered Architecture of Microservice Monitoring and Key Practices
This article describes a five-layer architecture for microservice monitoring, from infrastructure up to end-user experience. It also covers five essential monitoring points (logs, metrics, tracing, alerts, and health checks) and presents a typical monitoring stack built from agents, Kafka, ELK, and InfluxDB.
Monitoring is a crucial part of microservice governance; how complete the monitoring system is directly affects service quality, reliability, and stability.
A well‑designed microservice monitoring system can be divided into five hierarchical layers:
1. Infrastructure Monitoring
This layer is usually handled by operations staff and covers low-level network hardware such as switches and routers. Core metrics such as traffic volume, packet loss, error rates, and connection counts are monitored to ensure the stability of higher-level services.
2. System Monitoring
This layer includes physical machines, virtual machines, and operating systems. Typical metrics are CPU usage, memory usage, disk I/O, and network bandwidth.
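As a rough illustration of system-level collection, the sketch below takes one snapshot of a few host metrics using only the Python standard library. The function name `sample_system_metrics` and the dictionary keys are made up for this example. The standard library does not portably expose memory usage or disk I/O, so a real agent would typically rely on a dedicated library or an existing collector for those.

```python
import os
import shutil
import time


def sample_system_metrics(path="/"):
    """Take one snapshot of a few host-level metrics (illustrative names)."""
    disk = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_pct = 100.0 * disk.used / disk.total if disk.total else 0.0
    sample = {
        "timestamp": time.time(),
        "cpu_count": os.cpu_count(),
        "disk_used_pct": round(used_pct, 2),
    }
    # os.getloadavg() exists only on Unix-like systems and may fail there too.
    if hasattr(os, "getloadavg"):
        try:
            sample["load_avg_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    return sample


print(sample_system_metrics())
```

An agent would call a function like this on a fixed interval and ship each snapshot to the metrics store.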
3. Application Monitoring
This layer is tied to the services themselves. For each service it tracks per-URL request counts, QPS, response times, and error rates, along with slow SQL queries and cache hit rates.
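To make the application-level metrics concrete, here is a minimal sketch that aggregates one time window of request records into per-URL QPS, error rate, and 95th-percentile latency. The `RequestRecord` type, the choice to count only 5xx responses as errors, and the nearest-rank percentile are assumptions made for this example, not something the article prescribes.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    url: str
    status: int        # HTTP status code
    latency_ms: float  # time taken to serve the request


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(records, window_seconds):
    """Aggregate one time window of requests into per-URL metrics."""
    by_url = {}
    for r in records:
        by_url.setdefault(r.url, []).append(r)
    summary = {}
    for url, reqs in by_url.items():
        latencies = sorted(r.latency_ms for r in reqs)
        errors = sum(1 for r in reqs if r.status >= 500)  # assumption: 5xx = error
        summary[url] = {
            "requests": len(reqs),
            "qps": len(reqs) / window_seconds,
            "error_rate": errors / len(reqs),
            "p95_latency_ms": percentile(latencies, 95),
        }
    return summary
```

Percentiles are used here rather than averages because a handful of slow requests can hide behind a healthy-looking mean.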
4. Business Monitoring
Business monitoring focuses on key business indicators, such as user login, registration, order placement, and payment success rates for an e‑commerce site, providing data for operational and strategic decision‑making.
5. End‑User Experience Monitoring
This layer tracks client‑side performance, return codes, geographic distribution, carrier conditions, device OS, browser versions, and other factors that affect the end‑user experience.
The five essential monitoring points are:
Log monitoring
Metrics monitoring
Tracing (call‑chain) monitoring
Alerting system
Health checks
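The last two points often work together: a health check produces a signal and the alerting system decides when that signal deserves attention. The sketch below assumes a JSON health payload with a top-level "status" field equal to "UP", which is the shape Spring Boot Actuator's health endpoint returns. The `is_healthy` function, the `ConsecutiveFailureAlert` class, and the default threshold of three are hypothetical names and values chosen for illustration.

```python
import json


def is_healthy(body):
    """Return True if a JSON health payload reports status "UP".

    Anything that cannot be parsed is treated as unhealthy.
    """
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"


class ConsecutiveFailureAlert:
    """Fire once when a check fails `threshold` times in a row."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def observe(self, healthy):
        """Record one result; return True only when the threshold is first reached."""
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        return self.failures == self.threshold
```

Requiring several consecutive failures before alerting is one common way to avoid paging on a single dropped probe; the right threshold depends on the check interval and how quickly an outage must be noticed.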
A typical monitoring architecture places an agent beside each microservice to collect metrics and logs. The agents forward the data through a message queue such as Kafka, which decouples collection from storage and improves availability. Logs are stored with the ELK stack (Elasticsearch, Logstash, Kibana), and metrics go to a time-series database such as InfluxDB. Frameworks such as Spring Boot expose health-check endpoints that can be monitored by tools like Nagios or Zabbix.
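The data flow described above can be sketched in a single process. In this simplified example an in-memory `queue.Queue` stands in for a Kafka topic, and two plain lists stand in for Elasticsearch and InfluxDB. None of the real client libraries are used, and the envelope fields (`service`, `kind`, `ts`, `payload`) are assumptions for illustration only.

```python
import json
import queue
import time


def agent_emit(bus, service, kind, payload):
    """Agent side: wrap a log line or metric sample in an envelope and enqueue it."""
    event = {"service": service, "kind": kind, "ts": time.time(), "payload": payload}
    bus.put(json.dumps(event))


def drain(bus, log_store, metric_store):
    """Consumer side: route each queued event to the store for its kind."""
    routed = 0
    while True:
        try:
            raw = bus.get_nowait()
        except queue.Empty:
            return routed
        event = json.loads(raw)
        if event["kind"] == "log":
            log_store.append(event)
        elif event["kind"] == "metric":
            metric_store.append(event)
        else:
            continue  # unknown kinds are dropped in this sketch
        routed += 1


bus = queue.Queue()               # stands in for a Kafka topic
log_store, metric_store = [], []  # stand in for Elasticsearch and InfluxDB
agent_emit(bus, "order-service", "log", {"level": "ERROR", "msg": "payment timeout"})
agent_emit(bus, "order-service", "metric", {"name": "qps", "value": 120.0})
print(drain(bus, log_store, metric_store), len(log_store), len(metric_store))
```

Because the agent only writes to the queue, a slow or unavailable store does not block the service being monitored, which is the decoupling the queue is there to provide.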
Author: Chen Yuzhe. Source: https://juejin.im/post/6844903846192349191