Design and Implementation of an Intelligent Security Monitoring and Alert System
This article presents a comprehensive design of a real‑time security monitoring and alert platform, detailing challenges in high‑concurrency risk control, an architecture that replaces OLAP polling with scalable compute services, event‑time processing, dynamic thresholding using fbprophet, and practical optimizations with Redis and ClickHouse.
The article uses 58's information security "wind‑chime" monitoring and alert system as its prototype. It follows the entire pipeline, from online data generation through metric calculation and storage to threshold configuration, and shows how each stage addresses the challenges of high‑concurrency security risk control.
In security risk scenarios, malicious actors constantly evade detection, making rapid and dynamic detection of evolving attack methods a critical challenge.
The initial design, in which alerts were produced by polling an OLAP engine, could not sustain high QPS and did not scale. It was replaced by an architecture built around active computation: compute services calculate the metrics, a coordinator tracks task completion, and Redis stores the aggregated results. All components are deployed on a cloud platform so that capacity can be scaled flexibly.
Real‑time metric computation covers two kinds of calculation: independent, per‑record operations (map, filter) and aggregating operations that combine records (reduce). A Redis cluster holds the aggregates, and a coordinator tracks completion across compute nodes so that a metric is published only once every node has contributed, keeping results both accurate and timely.
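The division of labor between compute nodes, the shared store, and the coordinator can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual code: `InMemoryStore` stands in for the Redis cluster, and the `Coordinator` class, the `risk_score` field, the 0.8 cutoff, and the key names are all hypothetical.

```python
from collections import defaultdict


class InMemoryStore:
    """Stand-in for the Redis cluster: shared counters keyed by metric and window."""

    def __init__(self):
        self.counters = defaultdict(int)

    def incr_by(self, key, amount):
        self.counters[key] += amount
        return self.counters[key]


class Coordinator:
    """Tracks which compute nodes have finished a given window."""

    def __init__(self, node_ids):
        self.expected = set(node_ids)
        self.done = defaultdict(set)

    def report_done(self, window, node_id):
        self.done[window].add(node_id)

    def is_complete(self, window):
        return self.done[window] == self.expected


def compute_partition(events, window, node_id, store, coordinator):
    # Independent step: each event is filtered on its own (hypothetical rule).
    flagged = [e for e in events if e["risk_score"] >= 0.8]
    # Aggregating step: merge this node's partial count into the shared store.
    store.incr_by(f"flagged:{window}", len(flagged))
    # Tell the coordinator this node has finished the window.
    coordinator.report_done(window, node_id)


store = InMemoryStore()
coordinator = Coordinator(["node-a", "node-b"])

compute_partition([{"risk_score": 0.9}, {"risk_score": 0.1}],
                  "12:00", "node-a", store, coordinator)
partial_ready = coordinator.is_complete("12:00")  # node-b has not reported yet

compute_partition([{"risk_score": 0.95}, {"risk_score": 0.85}],
                  "12:00", "node-b", store, coordinator)
final_ready = coordinator.is_complete("12:00")
final_count = store.counters["flagged:12:00"]
```

The metric for a window is read only after `is_complete` returns true, which is what keeps a partially aggregated value from reaching the alert logic.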
Event‑time processing leverages Flink’s event‑time semantics, watermarks, and window assigners to handle out‑of‑order data and trigger window calculations based on timestamps.
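To make the watermark idea concrete, here is a small pure‑Python sketch of a tumbling event‑time window. It imitates the behavior described above rather than using Flink's API; the class name, the integer timestamps in seconds, and the policy of dropping late events are assumptions made for illustration.

```python
from collections import defaultdict


class EventTimeWindowCounter:
    """Counts events per tumbling window and fires a window once the watermark passes its end."""

    def __init__(self, window_size, max_out_of_orderness):
        self.window_size = window_size
        self.max_out_of_orderness = max_out_of_orderness
        self.max_timestamp = float("-inf")
        self.open_windows = defaultdict(int)  # window start -> event count
        self.late_events = 0

    def process(self, timestamp):
        """Add one event; return the (window_start, count) pairs that fire as a result."""
        self.max_timestamp = max(self.max_timestamp, timestamp)
        # The watermark trails the largest timestamp seen by the allowed disorder.
        watermark = self.max_timestamp - self.max_out_of_orderness

        window_start = timestamp - timestamp % self.window_size
        if window_start + self.window_size <= watermark:
            # The window this event belongs to has already fired: treat it as late.
            self.late_events += 1
        else:
            self.open_windows[window_start] += 1

        fired = []
        for start in sorted(self.open_windows):
            if start + self.window_size <= watermark:
                fired.append((start, self.open_windows.pop(start)))
        return fired


counter = EventTimeWindowCounter(window_size=60, max_out_of_orderness=10)
fired = []
for ts in [5, 50, 62, 58, 71, 30]:  # 58 arrives out of order, 30 arrives too late
    fired.extend(counter.process(ts))
```

In this run the event at 58 still lands in the first window because the watermark is only at 52 when it arrives. The event at 71 moves the watermark to 61, which fires the window covering 0 to 60, so the event at 30 that follows is counted as late.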
Optimizations include combiner‑style aggregation at the map stage and the use of Redis HyperLogLog for efficient UV metric estimation.
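A rough sketch of both optimizations follows. The `incrby`, `pfadd`, and `pfcount` method names match the redis-py client, but `FakeRedis` is an exact in‑memory stand‑in used here so the example runs without a server (a real HyperLogLog returns an approximate count). The event fields and key layout are hypothetical.

```python
from collections import Counter, defaultdict


class FakeRedis:
    """In-memory stand-in exposing the three redis-py calls used below."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.sets = defaultdict(set)
        self.calls = 0

    def incrby(self, name, amount=1):
        self.calls += 1
        self.counters[name] += amount
        return self.counters[name]

    def pfadd(self, name, *values):
        self.calls += 1
        self.sets[name].update(values)

    def pfcount(self, name):
        return len(self.sets[name])  # exact here; approximate in real Redis


def combine(events):
    """Pre-aggregate one batch locally, like a combiner at the map stage."""
    pv = Counter()
    users = defaultdict(set)
    for event in events:
        pv[event["metric"]] += 1
        users[event["metric"]].add(event["user"])
    return pv, users


def flush(pv, users, client, window):
    """Send one write per metric instead of one write per event."""
    for metric, count in pv.items():
        client.incrby(f"pv:{metric}:{window}", count)
    for metric, members in users.items():
        client.pfadd(f"uv:{metric}:{window}", *members)


events = [
    {"metric": "login_fail", "user": "u1"},
    {"metric": "login_fail", "user": "u1"},
    {"metric": "login_fail", "user": "u2"},
    {"metric": "post", "user": "u1"},
]
client = FakeRedis()
pv, users = combine(events)
flush(pv, users, client, window="12:00")

login_fail_pv = client.counters["pv:login_fail:12:00"]
login_fail_uv = client.pfcount("uv:login_fail:12:00")
```

Four events produce four store calls here (two `incrby`, two `pfadd`) rather than eight, and the saving grows with batch size because the number of writes depends on the number of distinct metrics, not the number of events.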
Aggregated metric results are stored in ClickHouse, enabling fast, flexible queries for monitoring dashboards.
The alert mechanism supports three rule types: alerts after N consecutive threshold breaches, alerts after M breaches within N minutes (tracked with Redis hashes and ZSETs), and failure alerts that fire when a metric stays at zero for a specified period.
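The three rule types can be sketched in plain Python as below. This is an in‑memory approximation, not the Redis‑backed implementation: the streak counter plays the role of a hash field, the deque of breach timestamps plays the role of a ZSET scored by time, and all class names are hypothetical. Timestamps are assumed to be in seconds and to arrive in non‑decreasing order.

```python
from collections import deque


class ConsecutiveRule:
    """Fires once the threshold has been breached N times in a row."""

    def __init__(self, n):
        self.n = n
        self.streak = 0

    def observe(self, breached):
        self.streak = self.streak + 1 if breached else 0
        return self.streak >= self.n


class WindowedRule:
    """Fires when at least `times` breaches fall within the last `minutes` minutes."""

    def __init__(self, minutes, times):
        self.window_seconds = minutes * 60
        self.times = times
        self.breaches = deque()

    def observe(self, timestamp, breached):
        if breached:
            self.breaches.append(timestamp)
        # Evict breaches that have slid out of the window.
        while self.breaches and self.breaches[0] <= timestamp - self.window_seconds:
            self.breaches.popleft()
        return len(self.breaches) >= self.times


class FailureRule:
    """Fires when the metric has been zero for at least `period_seconds`."""

    def __init__(self, period_seconds):
        self.period_seconds = period_seconds
        self.zero_since = None

    def observe(self, timestamp, value):
        if value != 0:
            self.zero_since = None
            return False
        if self.zero_since is None:
            self.zero_since = timestamp
        return timestamp - self.zero_since >= self.period_seconds


consecutive = ConsecutiveRule(n=3)
consecutive_results = [consecutive.observe(b)
                       for b in [True, True, False, True, True, True]]

windowed = WindowedRule(minutes=5, times=2)
windowed_results = [windowed.observe(ts, b)
                    for ts, b in [(0, True), (120, False), (240, True), (600, True)]]

failure = FailureRule(period_seconds=180)
failure_results = [failure.observe(ts, v)
                   for ts, v in [(0, 0), (60, 0), (180, 0), (240, 5)]]
```

With a real ZSET the eviction step would correspond to removing members by score range and the check to reading the set's cardinality, which keeps the rule state outside any single compute node.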
Threshold setting is discussed in two categories: static thresholds, which are simple but require manual tuning, and dynamic thresholds generated by the fbprophet time‑series forecasting algorithm, which automatically derives upper and lower bounds from historical data.
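A dynamic threshold of this kind might be produced along the following lines. This is a hedged sketch rather than the article's code: the function names, the per‑minute frequency, and the 0.95 interval width are assumptions. The library was published as `fbprophet` before version 1.0 and as `prophet` afterwards; the import below uses the newer name.

```python
def to_prophet_rows(points):
    """Convert (timestamp, value) pairs into the ds/y records Prophet expects."""
    return [{"ds": ts, "y": float(value)} for ts, value in points]


def forecast_bounds(points, periods, freq="min", interval_width=0.95):
    """Fit on historical points and return forecast rows with lower and upper bounds."""
    # Imported lazily so the helper above can be used without these packages.
    import pandas as pd
    from prophet import Prophet  # `from fbprophet import Prophet` on old installs

    history = pd.DataFrame(to_prophet_rows(points))
    history["ds"] = pd.to_datetime(history["ds"])

    model = Prophet(interval_width=interval_width)
    model.fit(history)

    # Only future timestamps are needed for thresholding.
    future = model.make_future_dataframe(periods=periods, freq=freq,
                                         include_history=False)
    forecast = model.predict(future)
    # yhat is the prediction; yhat_lower and yhat_upper bound the uncertainty interval.
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]


rows = to_prophet_rows([("2024-01-01 00:00", 3), ("2024-01-01 00:01", 5)])
```

The `yhat_lower` and `yhat_upper` columns then serve as the lower and upper alert thresholds for each future timestamp, so no per‑metric manual tuning is required beyond choosing the interval width.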
Evaluation shows that fbprophet provides accurate predictions with confidence intervals that serve as effective alert thresholds, and the system allows customizable scaling of these intervals to suit different data volatility levels.
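One simple way to scale the interval is to stretch each bound away from the prediction by a configurable factor. The article does not give the exact formula, so the one below is an assumption, as are the function names; `yhat`, `lower`, and `upper` correspond to the forecast value and its interval bounds.

```python
def scale_band(yhat, lower, upper, factor):
    """Stretch (factor > 1) or shrink (factor < 1) the interval around the prediction."""
    return yhat - factor * (yhat - lower), yhat + factor * (upper - yhat)


def check(value, yhat, lower, upper, factor=1.0):
    """Classify an observed value against the scaled interval."""
    scaled_lower, scaled_upper = scale_band(yhat, lower, upper, factor)
    if value > scaled_upper:
        return "above"
    if value < scaled_lower:
        return "below"
    return "ok"


band = scale_band(yhat=100.0, lower=90.0, upper=115.0, factor=2.0)
strict = check(125.0, yhat=100.0, lower=90.0, upper=115.0, factor=1.0)
relaxed = check(125.0, yhat=100.0, lower=90.0, upper=115.0, factor=2.0)
```

A volatile metric can be given a factor above 1 to suppress noisy alerts, while a stable metric can use a factor at or below 1 to catch smaller deviations.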
The conclusion highlights future challenges such as handling unstructured image and text data, emphasizing the need for a data‑driven intelligent risk control system that balances content and behavior security.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.