
Design and Evolution of a Distributed Scheduling System for Real‑time Alerts in the Beidou Monitoring Platform

This article details the background, design choices, and architectural evolution of a distributed scheduling system—from a simple Redlock‑based implementation for real‑time alerts to a robust Bull‑powered task queue supporting complex scenarios, load balancing, persistence, and reliable execution across multiple Node.js servers.

58 Tech

The Beidou front‑end monitoring system consists of data collection (SDK), data processing (Java), storage (Druid), analysis (Node.js), and presentation (React). To support real‑time alerts, a scheduling component was added to the Node.js analysis layer.

Background: After the first phase, the platform could collect many metrics but lacked data‑driven applications built on top of them, which prompted the addition of real‑time alerting.

Schedule 1.0 – Simple Scenario: A distributed lock based on Redlock. Producers generate alert metrics; consumers compete for the lock, and the winner computes thresholds and sends notifications. This guaranteed a single execution per interval across the cluster, which was sufficient for the limited early requirements.

Problems with 1.0: As the system grew, issues emerged: uneven task distribution across instances, growing complexity as the number of tasks increased, no ordering guarantees, no persistence, and no retry mechanism.

Schedule 2.0 – Architecture Upgrade: Introduced a task‑queue layer built on Bull (Redis‑backed), which provides priorities, concurrency control, delayed jobs, rate limiting, pause/resume, repeatable jobs, atomic operations, persistence, and UI support. The new design separates producers (which push jobs) from consumers (which process them), enabling ordered, reliable, and scalable execution.
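Most of those requirements map directly onto Bull's per‑job options. A minimal sketch, assuming a hypothetical `taskToJobOptions` helper and task‑descriptor shape (the option names themselves — `jobId`, `priority`, `attempts`, `backoff`, `delay`, `repeat`, `removeOnComplete` — are Bull's own):

```javascript
// Translate a scheduling-layer task descriptor into Bull job options.
// Defaults here (priority 10, 3 attempts, 5s exponential backoff) are
// illustrative, not values from the article.
function taskToJobOptions(task) {
  const opts = {
    jobId: task.id,                      // consistent id: duplicate adds are ignored
    priority: task.priority || 10,       // lower number = higher priority in Bull
    attempts: task.retries ?? 3,         // the retry mechanism 1.0 lacked
    backoff: { type: 'exponential', delay: 5000 },
    removeOnComplete: true,              // keep the Redis backlog compact
  };
  if (task.delayMs) opts.delay = task.delayMs;      // delayed job
  if (task.cron) opts.repeat = { cron: task.cron }; // repeatable job
  return opts;
}

// A producer would then enqueue with, e.g.:
//   const Queue = require('bull');
//   const queue = new Queue('beidou:alerts', { redis: { host: '127.0.0.1', port: 6379 } });
//   queue.add(task.payload, taskToJobOptions(task));
```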

Producer Design: Uses node‑schedule to trigger jobs on a schedule, generates a consistent JobId so that multiple producer instances enqueue the same job only once, and adds the job to Bull, relying on Redis lists for durability.
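The key producer idea is the deterministic JobId: if every instance derives the same id from the metric and the time window, Bull deduplicates the concurrent adds. A sketch under assumed names (`makeJobId`, the one‑minute window, and the wiring comments are illustrative):

```javascript
// One id per metric per minute window. Every producer instance that fires
// in the same window computes the same id, and Bull ignores an add whose
// jobId already exists, so the job is enqueued exactly once.
function makeJobId(metric, date) {
  const window = Math.floor(date.getTime() / 60000); // minute bucket
  return `alert:${metric}:${window}`;
}

// With the real dependencies it would look roughly like:
//   const schedule = require('node-schedule');
//   const Queue = require('bull');
//   const queue = new Queue('beidou:alerts');
//   schedule.scheduleJob('* * * * *', (fireDate) => {
//     queue.add({ metric: 'jsError' }, { jobId: makeJobId('jsError', fireDate) });
//   });
```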

Consumer Design: Listens with BRPOPLPUSH, processes jobs concurrently, and handles completion, failure, and end‑of‑batch events, ensuring atomic state updates in Redis.
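On the consumer side, Bull's worker loop (BRPOPLPUSH under the hood) hands jobs to a processor function, and lifecycle events hook in the state updates. The threshold logic below is a hypothetical stand‑in; the concurrency value and event names in the comments follow Bull's API:

```javascript
// Processor: receives a Bull job, compares the metric value against its
// threshold, and returns a result object for the 'completed' handler.
// The data shape { metric, value, threshold } is an assumption.
async function processAlertJob(job) {
  const { metric, value, threshold } = job.data;
  if (value > threshold) {
    return { metric, fired: true };  // a real consumer would notify here
  }
  return { metric, fired: false };
}

// Wiring with a real queue would look roughly like:
//   queue.process(5, processAlertJob);                 // up to 5 concurrent jobs
//   queue.on('completed', (job, result) => { /* mark success in Redis */ });
//   queue.on('failed', (job, err) => { /* Bull re-queues per `attempts` */ });
//   queue.on('drained', () => { /* the current batch has finished */ });
```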

Result: The upgraded system now supports real‑time alerts, sampling analysis, data caching, weekly reports, and other scheduled tasks with high reliability and scalability, while remaining extensible for future needs.

Tags: distributed scheduling, Backend Development, Redis, Node.js, Bull, Task Queue
Written by

58 Tech

Official tech channel of 58, a platform for tech innovation, sharing, and communication.
