Building a High‑Performance Monitoring Alert System with Akka, Dubbo, and Ignite

This article describes how G Bank replaced a commercial monitoring suite, whose alert collector was single‑threaded, with a self‑developed alert system built on open‑source components. Akka handles parallel collection, Apache Dubbo handles distributed processing, and Apache Ignite provides in‑memory storage. The result is capacity for millions of alerts, processing latency under 100 ms, and linear scalability.

dbaplus Community

Sep 6, 2020

Background

Traditional monitoring in the bank used a commercial suite that performed alert collection with a single‑threaded process and stored alerts in an in‑memory database. Under alert storms the collector dropped data, the database blocked, and latency grew to minutes.

Problems

Data loss and processing blockage during high‑volume alert storms; latency up to minutes.

Simple processing logic could not handle complex, high‑concurrency scenarios.

Solution Overview

The new generation alert system is built entirely on open‑source components to achieve massive concurrent alert handling, flexible rule configuration, and full lifecycle management.

Alert Lifecycle Management

The system follows a closed‑loop lifecycle: generation & ingestion → pre‑processing → storage → notification → post‑recovery closure.
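The closed loop can be read as a small state machine. The sketch below is illustrative only: the stage names mirror the lifecycle described above, while the `Stage` enum and the `advance` and `run_lifecycle` helpers are hypothetical names, not part of G Bank's implementation.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages, in the order the article lists them."""
    INGESTED = 1
    PREPROCESSED = 2
    STORED = 3
    NOTIFIED = 4
    CLOSED = 5


def advance(stage):
    """Move an alert to the next stage; a closed alert cannot advance."""
    if stage is Stage.CLOSED:
        raise ValueError("alert is already closed")
    return Stage(stage.value + 1)


def run_lifecycle():
    """Walk one alert through the whole loop and return the visited stages."""
    stage = Stage.INGESTED
    visited = [stage]
    while stage is not Stage.CLOSED:
        stage = advance(stage)
        visited.append(stage)
    return visited
```

Making `CLOSED` a terminal state is one way to express "closed loop": once recovery has been verified, a recurrence of the same problem would enter as a new alert rather than reopening the old one. That choice is an assumption of this sketch.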

Core Functionalities

Unified ingestion and agile access for heterogeneous sources.

Reduced latency and timely notification.

Root‑cause recommendation and assistance.

Tracking, recovery verification and closure.

Key Architectural Features

The system acts as an alert Manager‑of‑Managers (MOM) and must ingest alerts from infrastructure, middleware, databases, cloud platforms, and business applications.

Technical Design

1. Akka‑Based Parallel Collection

Akka provides a high‑concurrency, distributed, fault‑tolerant runtime based on the Actor model. The collector consists of the following actors:

Data Collection Actor: obtains raw alerts, either by actively polling sources that must be queried or by passively receiving alerts from sources that push them.

Raw Data Dispatch Actor: routes raw alerts to analysis actors and performs overall flow control.

Data Analysis Actor: a configurable pool of actors that execute user‑defined processing logic in parallel.

Persisted Data Dispatch Actor: forwards processed data to persistence actors and applies back‑pressure when the storage layer is slow.

Data Persistence Actor: a configurable set of actors that write alerts to the storage backend.
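The shape of this pipeline can be sketched without Akka. The example below is a rough analogy, not the article's code: it substitutes standard‑library threads for actors and bounded queues for the two dispatch actors, so a full queue blocks the upstream stage and acts as back‑pressure. The names `run_pipeline` and `analyse`, the pool sizes, and the buffer size are all assumptions made for illustration.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream for a worker


def analyse(raw):
    """Stand-in for user-defined processing logic (hypothetical)."""
    return {**raw, "severity": raw["severity"].upper()}


def run_pipeline(raw_alerts, analysers=4, persisters=2, buffer_size=8):
    """Push alerts through analysis and persistence pools; return what was stored."""
    to_analyse = queue.Queue(maxsize=buffer_size)  # collection -> analysis pool
    to_persist = queue.Queue(maxsize=buffer_size)  # analysis -> persistence pool
    stored, lock = [], threading.Lock()

    def analysis_worker():
        while (item := to_analyse.get()) is not SENTINEL:
            # put() blocks when persistence lags, which slows analysis in turn.
            to_persist.put(analyse(item))

    def persistence_worker():
        while (item := to_persist.get()) is not SENTINEL:
            with lock:
                stored.append(item)  # stand-in for a write to the storage backend

    a_threads = [threading.Thread(target=analysis_worker) for _ in range(analysers)]
    p_threads = [threading.Thread(target=persistence_worker) for _ in range(persisters)]
    for t in a_threads + p_threads:
        t.start()

    for alert in raw_alerts:   # collection: blocks while the buffer is full
        to_analyse.put(alert)
    for _ in a_threads:        # one sentinel per analysis worker
        to_analyse.put(SENTINEL)
    for t in a_threads:
        t.join()
    for _ in p_threads:        # analysis has finished, so stop the persisters
        to_persist.put(SENTINEL)
    for t in p_threads:
        t.join()
    return stored
```

Calling `run_pipeline([{"id": 1, "severity": "critical"}])` returns the alert with its severity upper‑cased. Because the queues are bounded, a slow persistence stage eventually stalls collection instead of letting memory grow, which is the behavior the dispatch actors are described as enforcing.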

2. Apache Dubbo Distributed Framework

Dubbo supplies high‑performance RPC, intelligent fault tolerance, load balancing, and automatic service registration/discovery. Two services are exposed:

Data Processing Service: create, read, update, and delete APIs, plus operations such as alert compression and recovery, for collectors and other applications.

Data Synchronization Service: periodic and incremental backup between primary and backup clusters.
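To make "compression" and "recovery" concrete, here is a minimal in‑process sketch. It assumes compression means merging repeated occurrences of the same alert into one record with a counter, which is a common reading but is not spelled out in the article. The class, its method names, and the deduplication key are hypothetical; in the real system the equivalent operations would be exposed over Dubbo RPC rather than called locally.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    source: str
    resource: str
    event: str
    count: int = 1       # how many occurrences were merged into this record
    active: bool = True  # False once the alert has recovered


class DataProcessingService:
    """In-process stand-in for the RPC-exposed service (names are hypothetical)."""

    def __init__(self):
        self._alerts = {}  # dedup key -> Alert

    @staticmethod
    def _key(source, resource, event):
        return (source, resource, event)

    def report(self, source, resource, event):
        """Create an alert, or compress a repeat into the existing active record."""
        key = self._key(source, resource, event)
        alert = self._alerts.get(key)
        if alert is not None and alert.active:
            alert.count += 1
        else:
            alert = Alert(source, resource, event)
            self._alerts[key] = alert
        return alert

    def recover(self, source, resource, event):
        """Mark the matching alert as recovered; return False if none is active."""
        alert = self._alerts.get(self._key(source, resource, event))
        if alert is None or not alert.active:
            return False
        alert.active = False
        return True

    def active_alerts(self):
        return [a for a in self._alerts.values() if a.active]
```

Reporting the same `(source, resource, event)` twice leaves one active record with `count == 2`; after `recover`, a later report starts a fresh record with `count == 1`.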

3. APP‑Based Processing for High Configurability

Each processing node runs modular APP containers. An APP represents a logical processing unit (e.g., maintenance window handling, enrichment, notification). APPs can be hot‑plugged, written as scripts or in Scala/Java, and upgraded gracefully.

Stream APP: runs on every node, processes real‑time alerts that match its criteria.

Scheduled Batch APP: a single instance scheduled by the cluster’s scheduler to process a batch of alerts at a fixed interval.

Subscription Batch APP: subscribes to output of Stream or Scheduled Batch APPs for further aggregation.

Broadcast Batch APP: runs on all nodes, processes data assigned by a scheduler for distributed batch work.

RESTful APP: dynamically generates REST endpoints to expose internal APP data.

APP containers support hot‑swapping. Scripts are compiled to bytecode using ANTLR together with Java dynamic compilation. Upgrades use a graceful stop‑start: an APP that is being updated finishes its in‑flight processing before it shuts down.
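The graceful stop‑start behavior can be illustrated with a small container that swaps its APP only once no alert is mid‑processing. This is a sketch under assumptions: `AppContainer`, `process`, and `swap` are hypothetical names, an APP is modeled as a plain callable, and a condition variable stands in for whatever coordination the real container uses.

```python
import threading


class AppContainer:
    """Holds one APP and swaps it only when no alert is mid-processing."""

    def __init__(self, app):
        self._app = app
        self._in_flight = 0       # calls currently running inside the APP
        self._swapping = False    # True while an upgrade is draining old work
        self._cond = threading.Condition()

    def process(self, alert):
        with self._cond:
            # New work waits while a swap is pending, so the old APP can drain.
            self._cond.wait_for(lambda: not self._swapping)
            app = self._app
            self._in_flight += 1
        try:
            return app(alert)
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()

    def swap(self, new_app):
        with self._cond:
            # Only one upgrade at a time.
            self._cond.wait_for(lambda: not self._swapping)
            self._swapping = True
            # Let in-flight calls on the old APP finish before replacing it.
            self._cond.wait_for(lambda: self._in_flight == 0)
            self._app = new_app
            self._swapping = False
            self._cond.notify_all()
```

A call that started on the old APP always completes on the old APP, and calls arriving during the upgrade are held briefly and then served by the new one, so no alert is processed by a half‑replaced unit.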

4. Apache Ignite Distributed In‑Memory Storage

Ignite provides a partitioned, distributed in‑memory cache across five nodes, each with 128 GB of memory. Caches use Ignite's ATOMIC atomicity mode, which gives up multi‑key transactions in exchange for higher throughput.

SQL tables for active alerts, historical alerts, notification archives, and configuration data.

Key‑value caches for lookup data (CMDB, resource metadata) used during enrichment and pre‑processing.

Per‑node memory allocation: active alerts 16 GB, resource data 8 GB, historical alerts 52 GB, notifications 16 GB.
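The per‑node figures can be turned into a quick budget check. The arithmetic below uses only the numbers stated above; the variable names are illustrative, and the cluster totals assume data is not duplicated as backup copies, which the article does not specify.

```python
NODES = 5
NODE_RAM_GB = 128
REGIONS_GB = {"active": 16, "resource": 8, "history": 52, "notification": 16}

# Memory committed to alert data on each node, and what is left over
# for the operating system, JVM overhead, and anything else on the host.
per_node_gb = sum(REGIONS_GB.values())
headroom_gb = NODE_RAM_GB - per_node_gb

# Cluster-wide totals, assuming every region is purely partitioned
# (no backup copies) across the five nodes.
cluster_gb = {name: size * NODES for name, size in REGIONS_GB.items()}
cluster_total_gb = per_node_gb * NODES

print(per_node_gb, headroom_gb, cluster_total_gb)  # 92 36 460
```

Under these assumptions each node commits 92 GB of its 128 GB, leaving 36 GB of headroom, and the cluster holds up to 460 GB in total, of which 260 GB is historical alerts.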

Performance Results

Active alert capacity: tens of millions (≈200× previous system).

Historical storage: billions of records.

Write throughput: 11,653 ops/s (≈10× previous).

Alert processing latency: <100 ms (30‑50× improvement).

Scalability: roughly 2,000 additional ops/s for each server added.
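Read as a linear model, the throughput and scalability figures give a rough capacity‑planning rule. The sketch below is an extrapolation from the two reported numbers, not a measured result; the function names are hypothetical, and real clusters may stop scaling linearly well before large server counts.

```python
import math

BASELINE_OPS = 11_653    # reported write throughput of the measured cluster
OPS_PER_SERVER = 2_000   # reported gain per additional server


def projected_ops(extra_servers):
    """Projected write throughput after adding servers, under the linear model."""
    return BASELINE_OPS + OPS_PER_SERVER * extra_servers


def servers_needed(target_ops):
    """Extra servers required to reach target_ops under the linear model."""
    deficit = target_ops - BASELINE_OPS
    return max(0, math.ceil(deficit / OPS_PER_SERVER))
```

For example, `projected_ops(3)` gives 17,653 ops/s, and `servers_needed(20_000)` gives 5, since four extra servers would reach only 19,653 ops/s.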

Future Directions

Micro‑service‑based ingestion with webhook interfaces for easier integration.

AI‑driven root‑cause analysis and alert convergence.

Deeper correlation of alerts with performance, configuration, and KPI data.

