Design and Implementation of a Sonar Monitoring System for Spring Boot Applications
This article presents the design, architecture, and technology choices of a Sonar monitoring system for Spring Boot microservices, covering time‑series database selection, deployment topology, client collection strategies, and future plans for advanced analytics and AI‑driven alerting.
1. Introduction
The Sonar system is an integrated solution for data collection, storage, visualization, and alerting. This article describes how it is applied to Spring Boot projects, its design and system architecture, and future plans. Topics include time‑series databases, the choice between Graphite and InfluxDB, a scalable high‑availability deployment, a collection client designed for performance and stability, and ideas for deeper data mining.
2. Background
In the micro‑service era, monolithic applications are split into many services, which makes real‑time monitoring essential. Spring Boot Actuator exposes rich runtime metrics but offers no persistence or historical visualization, which prompted the introduction of a time‑series database.
The Eagle Eye system (log collection, alarm, analysis) ingests massive event logs; metric‑type logs (e.g., error counts per hour) are better suited for time‑series storage due to simpler queries, lower storage cost, and longer retention.
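The difference between event‑type and metric‑type data can be sketched as follows. This is only an illustration, not the Eagle Eye implementation: the event fields (`ts`, `service`, `level`) and the function name are assumptions made for the example. Many individual log events collapse into one small count per service and hour, which is the shape a time‑series database stores cheaply.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_error_counts(events):
    """Collapse individual log events into one error count per (service, hour).

    Each event is a dict with hypothetical keys: 'ts' (Unix seconds),
    'service' and 'level'.
    """
    counts = Counter()
    for event in events:
        if event["level"] != "ERROR":
            continue
        moment = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
        hour = moment.replace(minute=0, second=0, microsecond=0)
        counts[(event["service"], hour.isoformat())] += 1
    return dict(counts)


# Four raw events become two metric points.
events = [
    {"ts": 1700000000, "service": "order", "level": "ERROR"},
    {"ts": 1700000600, "service": "order", "level": "ERROR"},
    {"ts": 1700000900, "service": "order", "level": "INFO"},
    {"ts": 1700003600, "service": "order", "level": "ERROR"},
]
result = hourly_error_counts(events)
```

The raw events remain useful for troubleshooting, while the hourly counts are what gets charted and retained over the long term.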
Business data such as order statistics require time‑series storage for trend analysis, which event‑type logs cannot replace.
3. Case Study
The Sonar dashboard displays runtime metrics of Spring Boot projects, revealing issues like unreleased DB connections and hot data access.
Key panels show high‑frequency data such as system load, thread count, QPS, GC, etc., with expandable sub‑panels for deeper analysis.
Memory detail panels and year‑over‑year and month‑over‑month comparison alerts help identify hotspots and business anomalies.
The open‑source Grafana plugin graph-compare-panel (https://github.com/AutohomeCorp/graph-compare-panel) was developed to support comparative visualizations.
4. Glossary
Time Series Data: Data indexed by timestamps, e.g., temperature per minute.
Metric: The quantity being measured, e.g., temperature.
Tag: A dimension of a metric, indexed so that queries can filter and group by it.
Field: The actual recorded value of a metric, e.g., 21.5.
Timestamp: The moment a metric is recorded.
statsd: A daemon that aggregates data sent by clients; supports the gauge, counter, timing, and set metric types.
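To make the glossary terms concrete, the following minimal sketch formats one point in InfluxDB's line protocol, where the metric name, tags, fields, and timestamp each have a fixed position. The helper names and the sample measurement, tags, and values are assumptions for illustration; a real client library would normally handle this.

```python
def _escape(text, chars):
    """Backslash-escape the characters that are special in this position."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build 'measurement,tag=value field=value timestamp' for one point."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys give a stable series key
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


line = to_line_protocol(
    "temperature",                           # metric
    {"room": "lab 1", "host": "sensor01"},   # tags (indexed dimensions)
    {"value": 21.5},                         # field (the recorded value)
    1700000000000000000,                     # timestamp in nanoseconds
)
print(line)  # temperature,host=sensor01,room=lab\ 1 value=21.5 1700000000000000000
```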
5. Technical Selection
Key requirements: high write performance (≈200 M points/day), scalability, high availability, multi‑protocol support (statsd, InfluxDB, Graphite), and both TCP and HTTP transport.
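As a rough sizing check, the daily volume can be converted into a sustained write rate. The sketch below assumes writes are spread evenly over the day, which real traffic is not; `peak_factor` is a hypothetical multiplier for bursts and is not a figure from the original design.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_write_rate(points_per_day):
    """Average points per second if writes were spread evenly over a day."""
    return points_per_day / SECONDS_PER_DAY


def peak_write_rate(points_per_day, peak_factor):
    """Average rate scaled by an assumed burst multiplier."""
    return average_write_rate(points_per_day) * peak_factor


print(round(average_write_rate(200_000_000)))  # 2315
```

So 200 M points/day corresponds to roughly 2,300 points per second on average, and the storage tier has to absorb whatever peak sits above that.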
5.1 Database Selection
InfluxDB ranks first among time‑series databases and, compared with Graphite, offers better write performance, better extensibility, and a SQL‑like query language. In tests, InfluxDB's storage cost was about 1/11 of Graphite's for the same number of metrics, so InfluxDB was selected.
5.2 Deployment Architecture Selection
Statsd cannot write directly to InfluxDB, so Telegraf replaces it for metric aggregation. Uber’s statsrelay provides TCP/UDP load balancing with consistent hashing, while Consul handles health checking and dynamic load configuration. Influx‑relay lacks query support, so InfluxDB‑proxy is used to proxy both writes and queries; it supports sharding, replication, and failover.
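Consistent hashing matters here because an aggregator can only sum or average a metric correctly if every sample for that metric reaches the same instance. The sketch below is a generic hash ring with virtual nodes, not statsrelay's actual algorithm; the class name, backend names, and replica count are assumptions. Calling `remove` stands in for a backend being taken out after a failed health check.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring that maps metric names to backends."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas
        self._ring = []  # sorted list of (hash, node) virtual points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        """Place `replicas` virtual points for the node on the ring."""
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        """Drop every virtual point that belongs to the node."""
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key):
        """Return the first node clockwise from the key's hash."""
        if not self._ring:
            raise LookupError("no backends available")
        index = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if index == len(self._ring):
            index = 0  # wrap around to the start of the ring
        return self._ring[index][1]


ring = HashRing(["telegraf-1", "telegraf-2", "telegraf-3"])
keys = [f"order.api.latency.{i}" for i in range(200)]
before = {key: ring.node_for(key) for key in keys}

ring.remove("telegraf-2")  # e.g. the backend failed its health check
after = {key: ring.node_for(key) for key in keys}

# Only keys that lived on the removed backend are reassigned.
moved = [key for key in keys if before[key] != after[key]]
```

With a plain `hash(key) % n` scheme, removing one backend would remap most keys; with the ring, only the keys owned by the removed backend move.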
5.3 Collector Client Selection
To support TCP transport, a log4j‑based socket appender is reused to emit metrics in the Graphite, Statsd, and InfluxDB protocols. Micrometer is chosen over the native Actuator metrics because it works with both Spring Boot 1.x and 2.x and provides stable fixed‑interval collection and unified metric handling.
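Part of what makes multi‑protocol emission practical is that the text formats are simple. As a hedged illustration in Python (the actual client is a Java log4j appender whose code is not shown here), the sketch below encodes one sample in the Graphite plaintext format and the StatsD format; the function names and metric names are assumptions.

```python
# StatsD type suffixes for the four metric types.
STATSD_TYPES = {"counter": "c", "gauge": "g", "timing": "ms", "set": "s"}


def to_graphite(path, value, timestamp_s):
    """Graphite plaintext: '<dotted.path> <value> <unix seconds>' plus newline."""
    return f"{path} {value} {timestamp_s}\n"


def to_statsd(name, value, kind):
    """StatsD: '<name>:<value>|<type>'; the receiving daemon adds the timestamp."""
    if kind not in STATSD_TYPES:
        raise ValueError(f"unknown statsd metric type: {kind}")
    return f"{name}:{value}|{STATSD_TYPES[kind]}"


graphite_line = to_graphite("order.jvm.threads", 42, 1700000000)
statsd_line = to_statsd("order.requests", 1, "counter")
print(repr(graphite_line))  # 'order.jvm.threads 42 1700000000\n'
print(statsd_line)          # order.requests:1|c
```

Graphite carries the timestamp in every line, whereas a StatsD sample is timestamped by whichever daemon aggregates it, so a client that supports both has to decide where the clock lives.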
6. Data Applications and Planning
Current data sources include Spring Boot runtime info, Solr, Consul, Sonar clusters, and business order metrics.
Alerting scenarios cover comparative (e.g., PV change >20%), threshold‑based (e.g., heap usage >80%), and state‑based alerts (e.g., process exit code).
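The three alert types can be expressed as small predicates. This is a minimal sketch rather than the rules engine Sonar uses: the function names are hypothetical, the default limits simply mirror the examples above, and treating any nonzero exit code as a failure is an assumption.

```python
def comparative_alert(current, baseline, max_change=0.20):
    """Fire when the value moved more than max_change relative to a baseline,
    e.g. today's PV against the same period last week."""
    if baseline == 0:
        return current != 0  # any traffic against a zero baseline is a change
    return abs(current - baseline) / abs(baseline) > max_change


def threshold_alert(value, limit=0.80):
    """Fire when a ratio such as heap usage exceeds a fixed limit."""
    return value > limit


def state_alert(exit_code):
    """Fire when a process reports a failure state (assumed: nonzero exit)."""
    return exit_code != 0


print(comparative_alert(130, 100))  # True: PV changed by 30%
print(threshold_alert(0.85))        # True: heap usage above 80%
print(state_alert(0))               # False: clean exit
```

Comparative rules depend on history being retained, which is one reason the long retention of a time‑series store matters for alerting and not only for dashboards.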
Future work aims to incorporate AI for baseline learning, root‑cause analysis, alarm convergence, intelligent decision‑making, and predictive modeling.