Backend Development · 9 min read

Dynamic Rate Limiting Based on System Load: Principles, Implementation, and Test Results

This article explains how dynamic rate limiting can automatically compute throttling thresholds using system load metrics such as CPU usage, load average, and response time, and demonstrates its effectiveness through real‑world testing on a flight‑price service.

Qunar Tech Salon

When a system’s processing capacity is exceeded, request queues grow and response times spike; uncontrolled resource consumption can even cause crashes, so services need rate limiting to stay stable under traffic bursts.

Rate limiting falls into two categories: single‑machine (e.g., Guava RateLimiter, Java Semaphore) and global (e.g., Redis counters or a shared blocking component). All of these require a manually configured fixed threshold, which is hard to size accurately, goes stale after upgrades, cannot track performance fluctuations, and is distorted by VM‑level resource stealing.
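For reference, the single‑machine variant with a fixed threshold is straightforward. Here is a minimal sketch built on `java.util.concurrent.Semaphore`, one of the tools named above; the class and method names are illustrative, not from the article:

```java
import java.util.concurrent.Semaphore;

// Fixed-threshold, single-machine limiter: at most `maxConcurrent`
// requests are admitted at once. The threshold is static, which is
// exactly the weakness dynamic rate limiting addresses.
public class SemaphoreLimiter {
    private final Semaphore permits;

    public SemaphoreLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Non-blocking admission check; callers must call release() when done.
    public boolean tryAcquire() {
        return permits.tryAcquire();
    }

    public void release() {
        permits.release();
    }
}
```

A caller would wrap each request in `tryAcquire()`/`release()` and reject (or queue) requests when `tryAcquire()` returns false.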

To overcome these drawbacks, a dynamic rate‑limiting approach calculates thresholds automatically based on system load, using three common monitoring metrics: CPU usage, load average, and service response time.

The goal of dynamic rate limiting is to find a reasonable threshold that maximizes processing capacity while keeping the system robust. Two load thresholds (e.g., 50 % and 70 % CPU) define health states: healthy, unhealthy, and deteriorating. When the system is healthy, no limiting is applied; when it becomes unhealthy, the recent QPS is used to set an initial limit; when it deteriorates, the limit decays by a factor derived from the ratio of current to healthy QPS.
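The three health states above can be sketched as a simple classifier over the two configurable load thresholds; the names and boundary handling here are assumptions, not the article's code:

```java
// Maps a load reading (e.g. CPU usage as a fraction of 1.0) to one of
// the three health states described in the article, using two
// configurable thresholds (e.g. 0.50 and 0.70).
public class HealthClassifier {
    public enum State { HEALTHY, UNHEALTHY, DETERIORATING }

    private final double threshold1; // e.g. 0.50 — below this: healthy
    private final double threshold2; // e.g. 0.70 — above this: deteriorating

    public HealthClassifier(double threshold1, double threshold2) {
        this.threshold1 = threshold1;
        this.threshold2 = threshold2;
    }

    public State classify(double load) {
        if (load < threshold1) return State.HEALTHY;
        if (load < threshold2) return State.UNHEALTHY;
        return State.DETERIORATING;
    }
}
```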

The initial limit is calculated as H × √(C / H) — equivalently √(H × C) — where H is the healthy‑state QPS and C is the current QPS. Taking the square root pulls the limit toward H, which prevents it from converging too slowly when the current QPS is a poor starting estimate.
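The formula is a one‑liner; the class name below is illustrative:

```java
// Initial rate limit per the article's formula: H * sqrt(C / H),
// where H is the healthy-state QPS and C is the current QPS.
public class InitialLimit {
    public static double compute(double healthyQps, double currentQps) {
        return healthyQps * Math.sqrt(currentQps / healthyQps);
    }
}
```

For example, with H = 1000 QPS and C = 250 QPS the initial limit is 1000 × √0.25 = 500, halfway (geometrically) between the two rather than anchored at the low current value.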

After the initial limit is set, the system continuously adjusts it: a lower load threshold (threshold 0), fixed at 70 % of the first load threshold, triggers a rapid raise of the limit when load drops too low, while near the convergence range only fine‑grained adjustments are made to avoid large jumps.
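The adjustment step can be sketched as below. The two‑band logic (fast raise below threshold 0, fine adjustment near the target) follows the description above, but the concrete step factors are assumptions chosen for illustration:

```java
// One adjustment step of the dynamic limit. Below the lower threshold
// (70% of threshold 1) the limit is raised aggressively; otherwise it is
// nudged by small factors to avoid large jumps near convergence.
public class LimitAdjuster {
    private final double lowerLoad; // threshold 0 = 0.7 * threshold 1
    private final double upperLoad; // threshold 1

    public LimitAdjuster(double threshold1) {
        this.lowerLoad = 0.7 * threshold1;
        this.upperLoad = threshold1;
    }

    public double adjust(double currentLoad, double currentLimit) {
        if (currentLoad < lowerLoad) {
            return currentLimit * 1.5;  // load far too low: raise quickly (factor assumed)
        }
        if (currentLoad > upperLoad) {
            return currentLimit * 0.95; // above target: small decrement (factor assumed)
        }
        return currentLimit * 1.05;     // inside band: fine-grained increase (factor assumed)
    }
}
```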

In practice, the load thresholds can be configured flexibly; the default for threshold 0 is 70 % of threshold 1.

When the system recovers to a healthy state and can handle all requests, rate limiting is disabled.

Testing on a 4‑core CPU service showed that load‑based dynamic limiting brought CPU usage down from a sustained 99 % to a more stable range, but a short limit‑update interval (1 s) caused load volatility; lengthening the interval to 10 s smoothed the metrics.

Switching to CPU‑based limiting with thresholds of 70 %–90 % kept CPU usage stable and reduced load; search latency rose from 70 ms to 150 ms, a trade‑off accepted for the higher resource utilization.

Comparing CPU‑based and load‑based limiting showed that CPU‑based limiting processed 164 successful searches versus 134 for load‑based, confirming its efficiency.

For latency‑sensitive services, time‑based limiting (thresholds 140 ms–200 ms) kept response times stable while maintaining steady CPU and load metrics.

All the described load‑based dynamic limiting logic has been packaged into a reusable dynamic‑limiter API for easy integration across multiple systems.

Dynamic rate limiting is suitable when a single service dominates resource usage (use CPU or load metrics) or when strict response‑time guarantees are required (use time metrics); multiple metrics can be combined for multi‑constraint throttling.

Tags: backend, performance, cpu, rate limiting, system load, load average, dynamic throttling
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
