
Design and Implementation of a Lightweight Service Monitoring and Traffic Management System

This article presents the design and implementation of a lightweight, robust, low‑intrusion monitoring and traffic management system for microservices. It covers data collection via client‑side filters, structured storage in Redis, and the alerting, rate‑limiting, degradation, and authorization mechanisms built on top, then discusses performance optimizations and future improvements.

JD Tech

Microservice architectures are now mature and widely adopted, but service monitoring and traffic management remain major pain points for high‑QPS services. This article describes a lightweight monitoring management system that addresses these challenges with minimal development effort, high robustness, and low intrusion.

The traffic management process consists of three steps: collecting and sending monitoring data, storing the data in a structured way, and using the data for alerts and control. These steps involve a client component, a data store, and a management console.

Client: Integrated into all service applications via a Maven dependency, the client configures a filter and a startup bean. It records each request’s service name, method name, source application (IP converted to app name), and timestamp. To reduce storage, the system aggregates counts per 4‑second interval instead of per second.
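The per-request recording step can be sketched as below. The class and method names here are illustrative, not the actual JD client API; the point is that the filter's hot path only does a map lookup and an increment, keyed by service, method, and source application:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Minimal sketch of the client-side counter: each request increments an
// in-memory counter keyed by "service|method|sourceApp". LongAdder keeps
// contention low under high QPS.
public class RequestCounter {
    private static final ConcurrentMap<String, LongAdder> COUNTS =
            new ConcurrentHashMap<>();

    // Called from the request filter; cheap enough for high-QPS paths.
    public static void record(String service, String method, String sourceApp) {
        String key = service + "|" + method + "|" + sourceApp;
        COUNTS.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    public static long get(String service, String method, String sourceApp) {
        LongAdder a = COUNTS.get(service + "|" + method + "|" + sourceApp);
        return a == null ? 0 : a.sum();
    }
}
```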

Data Storage: Currently uses a Redis cluster with three shards (1 GB each), storing two days of monitoring data for over 100 services, consuming about 40 % of the cluster capacity.

Management Console: A standalone application that displays charts, sends alert notifications, and provides configuration for degradation, rate limiting, and authorization.

Data Collection Details: Each service instance maintains a static concurrent map where the key is a combination of service and source, and the value is an atomic counter. A monitoring thread runs every 4 seconds, precisely aligned to timestamps (0 s, 4 s, …, 56 s), dumps the counters to an asynchronous queue, and resets them, ensuring minimal impact on the service flow.
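The boundary alignment and the dump-and-reset step can be sketched as follows (a simplified model, not the original source; the real client would run this on a scheduled thread and hand the snapshot to an async queue):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the 4-second collection cycle.
public class IntervalCollector {
    static final long INTERVAL_MS = 4_000;
    final ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Millis to sleep so the next run lands exactly on a 0s/4s/.../56s boundary.
    static long delayToNextBoundary(long nowMillis) {
        return INTERVAL_MS - (nowMillis % INTERVAL_MS);
    }

    // Atomically detach each counter's value and reset it to zero, so the
    // service threads keep incrementing without blocking on the collector.
    Map<String, Long> dumpAndReset() {
        Map<String, Long> snapshot = new HashMap<>();
        for (Map.Entry<String, AtomicLong> e : counters.entrySet()) {
            long v = e.getValue().getAndSet(0);
            if (v > 0) snapshot.put(e.getKey(), v);
        }
        return snapshot;
    }
}
```

Aligning every instance to the same wall-clock boundaries means counters from different nodes land in the same bucket when summed in Redis.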

Sending Monitoring Data: The async thread pushes the aggregated statistics to the Redis cluster using INCRBY operations. To mitigate the high write load, the implementation adds random delays and later adopts Redis pipelining, achieving roughly a ten‑fold performance boost.
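The pipelining optimization can be sketched as below. `RedisPipe` here is a local stand-in for a real client pipeline (e.g. Jedis's `Pipeline`), used only to show the shape of the batching: commands are buffered and flushed in one round trip instead of paying one round trip per INCRBY.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-in for a real Redis pipeline: commands are buffered locally and
// sent in a single round trip on sync(), instead of one RTT per INCRBY.
class RedisPipe {
    final List<String> buffered = new ArrayList<>();
    void incrBy(String key, long delta) {
        buffered.add("INCRBY " + key + " " + delta);
    }
    List<String> sync() { return buffered; } // a real client flushes the socket here
}

public class StatsSender {
    // Push one 4-second snapshot in a single batch. In the real system a
    // random start delay also spreads the write load from many instances
    // across the interval.
    static List<String> send(Map<String, Long> snapshot, RedisPipe pipe) {
        for (Map.Entry<String, Long> e : snapshot.entrySet()) {
            pipe.incrBy(e.getKey(), e.getValue());
        }
        return pipe.sync();
    }
}
```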

Structured Storage Approaches:

Simple key‑value where the key encodes system‑interface‑method‑caller‑timestamp; easy but storage‑heavy.

Numeric ID mapping for interface and method names, shortening keys but still consuming space.

Hash structures storing timestamps and counts as fields, reducing storage by more than half; this is the chosen method.
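A possible shape for the chosen hash layout is sketched below (the key and field formats are illustrative assumptions, not the documented schema): one hash per interface, caller, and day, with the 4-second-aligned second-of-day as the field, so a single HINCRBY both creates and updates a bucket.

```java
// Sketch of a hash-based layout: one Redis hash per service/method/caller/day,
// fields keyed by the aligned second-of-day. A full day then needs at most
// 86400 / 4 = 21600 fields per hash.
public class HashKeySchema {
    // e.g. "mon:orderService:create:appA:20240101"
    static String key(String service, String method, String caller, String day) {
        return "mon:" + service + ":" + method + ":" + caller + ":" + day;
    }

    // Field is the second-of-day aligned down to the 4-second boundary.
    static String field(int secondOfDay) {
        return String.valueOf(secondOfDay - (secondOfDay % 4));
    }
}
```

Packing many small buckets into one hash is what yields the storage win: Redis stores small hashes compactly, and the repeated key prefix is paid once per day instead of once per data point.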

For reading and chart generation, the system aggregates second‑level data into minute‑level buckets to avoid excessive query results. Batch reads also use pipelining for speed.
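The second-to-minute roll-up is a straightforward grouping; a minimal sketch (illustrative, assuming buckets are keyed by second-of-day):

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: roll 4-second buckets (keyed by second-of-day) up into
// minute buckets, so a multi-hour chart stays at a manageable size.
public class MinuteAggregator {
    static Map<Integer, Long> toMinutes(Map<Integer, Long> secondBuckets) {
        Map<Integer, Long> minutes = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : secondBuckets.entrySet()) {
            int minute = e.getKey() / 60;   // second-of-day -> minute-of-day
            minutes.merge(minute, e.getValue(), Long::sum);
        }
        return minutes;
    }
}
```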

Alerting and Control: The management console provides multi‑dimensional dashboards showing source and TPS information. It supports three control actions:

Degradation: When a circuit‑breaker condition is met, the filter returns failures immediately, protecting the service.

Rate Limiting: A simple per‑node limit blocks requests exceeding a configured count within a 4‑second window.

Traffic Whitelisting: callers on an interface’s whitelist are allowed through; all others are denied.
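The per-node rate limit described above can be sketched as a single counter that resets each 4-second window (an approximate, illustrative limiter; the concurrent reset has a benign race, which is acceptable for coarse per-node protection):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the per-node limiter: one counter per protected interface,
// reset at each 4-second window boundary; requests beyond the limit fail.
public class WindowRateLimiter {
    private final long limit;
    private final long windowMs;
    private final AtomicLong windowStart;
    private final AtomicLong count = new AtomicLong();

    WindowRateLimiter(long limit, long windowMs, long nowMs) {
        this.limit = limit;
        this.windowMs = windowMs;
        this.windowStart = new AtomicLong(nowMs - nowMs % windowMs);
    }

    // Returns true if the request at time nowMs is allowed.
    boolean tryAcquire(long nowMs) {
        long start = windowStart.get();
        if (nowMs - start >= windowMs
                && windowStart.compareAndSet(start, nowMs - nowMs % windowMs)) {
            count.set(0); // new window: reset the counter
        }
        return count.incrementAndGet() <= limit;
    }
}
```

A rejected request would be answered by the same client filter that does the counting, so no extra round trip is needed on the hot path.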

Configuration for degradation, rate limiting, and whitelisting is stored in Redis and cached locally on the client, refreshed every minute.
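The local cache can be sketched as a volatile snapshot refreshed by a background task (names and the loader shape are assumptions; the loader would wrap the actual Redis read, e.g. an HGETALL of the switch hash):

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch of the client-side config cache: control switches live in Redis,
// but the request path only ever reads a local volatile snapshot,
// refreshed once a minute by a background task.
public class ControlConfigCache {
    private volatile Map<String, String> snapshot = Collections.emptyMap();
    private final Supplier<Map<String, String>> loader; // e.g. wraps a Redis HGETALL

    ControlConfigCache(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    void refresh() { snapshot = loader.get(); }

    void start(ScheduledExecutorService pool) {
        pool.scheduleAtFixedRate(this::refresh, 0, 60, TimeUnit.SECONDS);
    }

    String get(String key) { return snapshot.get(key); }
}
```

Reading a volatile field per request keeps the control checks essentially free, at the cost of up to a minute of propagation delay for new settings.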

The management side also runs periodic tasks such as aggregating second‑level data to minute‑level, converting IPs to application names via an internal operations platform, and synchronizing various control switches.

Conclusion: While the current solution is fast to implement and covers many common monitoring needs, it has limitations: it only monitors the provider side, client‑side controls could be expanded, Redis write pressure may grow with larger deployments, and alternative pipelines like Kafka or HBase could be explored for further scalability.

Tags: Monitoring, Microservices, Operations, Redis, Traffic Management