Design and Implementation of a Sonar Monitoring System for Spring Boot Applications

This article presents the design, architecture, and technology choices of a Sonar monitoring system for Spring Boot microservices, covering time‑series database selection, deployment topology, client collection strategies, and future plans for advanced analytics and AI‑driven alerting.

HomeTech

Mar 7, 2019

1. Introduction

The Sonar system is an integrated solution for data collection, storage, visualization, and alerting. This article describes how it is applied to Spring Boot projects, along with its design, system architecture, and future plans. Topics include time-series databases, the choice between Graphite and InfluxDB, a scalable and highly available deployment, a client designed for performance and stability, and ideas for deeper data mining.

2. Background

In the microservice era, monolithic applications are split into many services, which makes real-time monitoring essential. Spring Boot Actuator exposes rich metrics but offers neither persistence nor historical visualization, which is why a time-series database was introduced.

The Eagle Eye system (log collection, alarm, analysis) ingests massive event logs; metric‑type logs (e.g., error counts per hour) are better suited for time‑series storage due to simpler queries, lower storage cost, and longer retention.
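As a rough illustration of why metric-type logs are cheaper to keep, the sketch below rolls raw events up into one error count per service and hour. The event fields (`ts`, `service`, `level`) are hypothetical and are not taken from the Eagle Eye log format.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_error_counts(events):
    """Roll raw event logs up into one error count per (service, hour)."""
    counts = Counter()
    for event in events:
        if event["level"] != "ERROR":
            continue
        ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(event["service"], hour.isoformat())] += 1
    return dict(counts)


# Hypothetical events: four log lines collapse into two time-series points.
events = [
    {"ts": 1551916800, "service": "order", "level": "ERROR"},
    {"ts": 1551918000, "service": "order", "level": "ERROR"},
    {"ts": 1551918000, "service": "order", "level": "INFO"},
    {"ts": 1551920400, "service": "order", "level": "ERROR"},
]
counts = hourly_error_counts(events)
```

Each hour yields a single number per service regardless of how many events occurred, which is what keeps storage small and queries simple.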

Business data such as order statistics require time‑series storage for trend analysis, which event‑type logs cannot replace.

3. Case Study

The Sonar dashboard displays runtime metrics of Spring Boot projects, revealing issues like unreleased DB connections and hot data access.

Key panels show high‑frequency data such as system load, thread count, QPS, GC, etc., with expandable sub‑panels for deeper analysis.

Memory details, together with year-over-year and month-over-month comparison alerts, help identify hotspots and business anomalies.

The open‑source Grafana plugin graph-compare-panel (https://github.com/AutohomeCorp/graph-compare-panel) was developed to support comparative visualizations.

4. Glossary

Time Series Data: Data indexed by timestamps, e.g., temperature per minute.

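Metric: The name of the quantity being measured, e.g., temperature.

Tag: A dimension of a metric, such as city or host; tags are indexed and used to filter and group queries.

Field: The value actually recorded for a metric at a given time, e.g., 21.5.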

Timestamp: The moment a metric is recorded.

StatsD: A daemon that aggregates data sent by clients; it supports the gauge, counter, timing, and set metric types.
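To make the terms above concrete, here is a minimal sketch that renders one data point in InfluxDB line protocol (`measurement,tag=value field=value timestamp`) and one in the StatsD wire format (`name:value|type`). The function names and sample values are illustrative only; they are not part of the Sonar client.

```python
def _escape(text, chars):
    """Backslash-escape each of the given characters."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(metric, tags, fields, timestamp_ns):
    """Render one point as InfluxDB line protocol."""
    head = _escape(metric, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    rendered = []
    for key in sorted(fields):
        value = fields[key]
        if isinstance(value, bool):          # check bool before int
            text = "true" if value else "false"
        elif isinstance(value, int):
            text = f"{value}i"               # integers carry an "i" suffix
        elif isinstance(value, float):
            text = repr(value)
        else:                                # strings are double-quoted
            text = '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'
        rendered.append(_escape(key, ",= ") + "=" + text)
    return f"{head} {','.join(rendered)} {timestamp_ns}"


def to_statsd(name, value, kind):
    """Render one StatsD message; kind is c (counter), g (gauge), ms (timing) or s (set)."""
    if kind not in ("c", "g", "ms", "s"):
        raise ValueError(f"unknown StatsD type: {kind}")
    return f"{name}:{value}|{kind}"


line = to_line_protocol(
    "temperature",                            # metric
    {"city": "beijing", "room": "lab 1"},     # tags
    {"value": 21.5},                          # field
    1551916800000000000,                      # timestamp in nanoseconds
)
# line == r"temperature,city=beijing,room=lab\ 1 value=21.5 1551916800000000000"
message = to_statsd("api.requests", 1, "c")
# message == "api.requests:1|c"
```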

5. Technical Selection

Key requirements: high write performance (≈200 million points/day), scalability, high availability, multi-protocol support (StatsD, InfluxDB, Graphite), and both TCP and HTTP transport.
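For a sense of scale, the daily volume can be converted to an average write rate. This is simple arithmetic on the stated figure; real traffic is unlikely to be evenly spread, so peak rates would be higher than this average.

```python
POINTS_PER_DAY = 200_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Average rate if writes were spread evenly over the day.
average_rate = POINTS_PER_DAY / SECONDS_PER_DAY
print(f"average write rate: {average_rate:.0f} points/s")  # about 2315 points/s
```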

5.1 Database Selection

At the time of writing, InfluxDB ranked first among time-series databases, and compared with Graphite it offers better write performance, better extensibility, and a SQL-like query language. In the team's tests, InfluxDB needed about 1/11 of the storage that Graphite did for the same number of metrics, so InfluxDB was selected.

5.2 Deployment Architecture Selection

StatsD cannot write directly to InfluxDB, so Telegraf takes over StatsD's role of aggregating metrics. Uber's statsrelay provides TCP/UDP load balancing with consistent hashing, while Consul handles health checks and dynamic updates to the load-balancing configuration. Influx-relay does not support queries, so InfluxDB-proxy is used to proxy both writes and queries; it supports sharding, replication, and failover.
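Consistent hashing matters here because an aggregator only produces correct totals if every sample of a given metric reaches the same node. The following is a generic hash-ring sketch of that idea; it is not statsrelay's actual algorithm, and the node names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        # Each node owns several points so load spreads more evenly.
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        # E.g. called when a health check reports the node as down.
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, metric_name):
        if not self._ring:
            raise LookupError("no healthy nodes")
        # First point clockwise from the key's hash, wrapping around.
        index = bisect.bisect_right(self._ring, (self._hash(metric_name), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["telegraf-1", "telegraf-2", "telegraf-3"])
target = ring.node_for("order.create.qps")  # always the same node for this name
```

When a node is removed, only the metrics that were mapped to it move to another node; all other metrics keep their previous destination, so their aggregation state is undisturbed.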

5.3 Collector Client Selection

To support TCP transport, the client reuses a log4j-based socket appender to emit metrics in the Graphite, StatsD, and InfluxDB protocols. Micrometer was chosen over the native Actuator metrics because it works with both Spring Boot 1.x and 2.x and provides stable fixed-interval collection and unified metric handling.
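The sketch below illustrates fixed-interval collection in general terms: gauges are sampled once per tick, formatted in the Graphite plaintext protocol (`path value timestamp`), and placed in a bounded buffer. The drop-oldest buffering policy is an assumption made for illustration, since the article does not describe how the Sonar client buffers data, and all class and metric names are hypothetical.

```python
from collections import deque


class MetricBuffer:
    """Bounded buffer: when full, the oldest line is dropped so that
    monitoring never blocks the application (assumed policy)."""

    def __init__(self, capacity=1000):
        self._lines = deque(maxlen=capacity)
        self.dropped = 0

    def offer(self, line):
        if len(self._lines) == self._lines.maxlen:
            self.dropped += 1  # deque discards the oldest entry on append
        self._lines.append(line)

    def drain(self):
        """Return and clear everything buffered so far."""
        batch = list(self._lines)
        self._lines.clear()
        return batch


def to_graphite(path, value, timestamp_s):
    """Graphite plaintext protocol line: '<path> <value> <timestamp>'."""
    return f"{path} {value} {timestamp_s}"


def collect(gauges, buffer, timestamp_s):
    """Sample every gauge once; intended to run on a fixed interval."""
    for path, read in gauges.items():
        buffer.offer(to_graphite(path, read(), timestamp_s))


# Hypothetical gauges; a real client would read JVM or framework state.
gauges = {
    "app.order.threads": lambda: 42,
    "app.order.heap_used_mb": lambda: 512,
}
buffer = MetricBuffer(capacity=1000)
collect(gauges, buffer, 1551916800)
payload = "\n".join(buffer.drain()) + "\n"  # what would be written to the TCP socket
```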

6. Data Applications and Planning

Current data sources include Spring Boot runtime info, Solr, Consul, Sonar clusters, and business order metrics.

Alerting scenarios cover comparative (e.g., PV change >20%), threshold‑based (e.g., heap usage >80%), and state‑based alerts (e.g., process exit code).
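The three alert types can be expressed as simple predicates. This is a minimal sketch using the example thresholds from the text (20% change, 80% heap usage); the function names and the handling of a zero baseline are assumptions, not Sonar's actual rule engine.

```python
def comparative_alert(current, baseline, max_change=0.20):
    """Fire when the value moved more than max_change relative to the
    comparison period (e.g. the same hour yesterday)."""
    if baseline == 0:
        return current != 0  # assumed: any traffic from a zero baseline is notable
    return abs(current - baseline) / abs(baseline) > max_change


def threshold_alert(used, total, max_ratio=0.80):
    """Fire when usage exceeds a fixed share of capacity."""
    return used / total > max_ratio


def state_alert(exit_code):
    """Fire on an abnormal state, here a non-zero process exit code."""
    return exit_code != 0


pv_alert = comparative_alert(current=1300, baseline=1000)   # 30% change
heap_alert = threshold_alert(used=850, total=1000)          # 85% of heap
exit_alert = state_alert(exit_code=0)                       # clean exit
```

In this example `pv_alert` and `heap_alert` are true, while `exit_alert` is false.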

Future work aims to incorporate AI for baseline learning, root‑cause analysis, alarm convergence, intelligent decision‑making, and predictive modeling.
