Operations · 11 min read

Understanding Rate Limiting, Degradation, and Circuit Breaking in Distributed Systems

This article explains rate limiting, service degradation, and circuit breaking. It covers passive and active throttling strategies, synchronous‑to‑asynchronous conversion, and practical examples such as Alibaba Sentinel's adaptive rules, token‑based throttling, and Hystrix, to help engineers design resilient, highly available systems.


Part 1 – Rate Limiting: Self‑Awareness and Insight

Systems must recognize their own capacity and the capacity of downstream services; when traffic exceeds these limits, protective mechanisms become essential.

1.1 Passive Rate Limiting (Self‑Awareness)

Define clear capacity limits and reject excess requests. Two common approaches are static thresholds/rules and adaptive strategies that adjust limits based on real‑time load, CPU usage, average response time, concurrent threads, or QPS. Alibaba’s open‑source Sentinel implements such adaptive limits.

Load‑based: triggers protection when the system's load1 (the one‑minute load average) exceeds a preset value and the number of concurrent threads surpasses the estimated capacity.

CPU usage: triggers when CPU usage exceeds a configurable threshold (0.0‑1.0).

Average RT: triggers when average request latency reaches a defined millisecond threshold.

Concurrent threads: triggers when the number of concurrent threads on a machine reaches a limit.

Ingress QPS: triggers when inbound QPS exceeds a threshold.
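The simplest of these rules, a static ingress QPS threshold, can be sketched as a fixed‑window counter. The class below is an illustrative in‑process sketch, not Sentinel's API; Sentinel's adaptive mode additionally adjusts the effective limit from load, RT, and thread counts.

```java
// Fixed-window QPS limiter: admit at most maxQps requests per one-second
// window, reject the rest. A sketch of the static-threshold strategy only.
class QpsLimiter {
    private final int maxQps;
    private int count = 0;
    private long windowStart = System.currentTimeMillis();

    QpsLimiter(int maxQps) { this.maxQps = maxQps; }

    /** Returns true if the request is admitted within the current 1s window. */
    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= 1000) {   // roll over to a new window
            windowStart = now;
            count = 0;
        }
        return ++count <= maxQps;
    }
}
```

A fixed window is easy to reason about but allows bursts at window boundaries; sliding windows or token buckets smooth this out, which is one reason adaptive limiters track averages rather than raw counts.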

1.2 Active Rate Limiting (Insight)

When downstream services have limited capacity, callers should proportionally reduce requests. Combining cluster‑wide and single‑node throttling is advisable, especially when downstream instances differ significantly in capability.

One solution collects request logs from service nodes, compares them with configured thresholds, and feeds back to each node for proportional throttling (post‑throttling).

Another solution uses a central token‑issuing service; nodes must acquire a token before proceeding, providing precise and elegant pre‑throttling.
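The token‑issuing approach can be sketched with a token bucket. Here the bucket is in‑process for illustration; in the centralized design described above it would live in the shared token service, and callers would acquire tokens over the network. Capacity and refill rate are illustrative values.

```java
// Token bucket for pre-throttling: a caller must obtain a token before
// invoking the downstream service, so the call rate never exceeds the
// configured refill rate (plus an initial burst of `capacity`).
class TokenBucket {
    private final long capacity;
    private final double refillPerMs;   // tokens added per millisecond
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerMs = refillPerSecond / 1000.0;
        this.tokens = capacity;         // start full to allow an initial burst
        this.lastRefill = System.currentTimeMillis();
    }

    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerMs);
        lastRefill = now;
        if (tokens >= 1) { tokens -= 1; return true; }
        return false;                   // no token: back off or shed the request
    }
}
```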

1.3 Synchronous‑to‑Asynchronous Conversion

When downstream processing is slower (e.g., third‑party payment settlement), the front‑end can complete the user action immediately and defer the final confirmation to an asynchronous workflow, reducing peak load and improving overall availability.
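This conversion can be sketched as a bounded buffer between the fast front end and the slow downstream: the user‑facing call enqueues the task and returns at once, while a background worker drains the queue at the downstream's own pace. Class and method names below are illustrative.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Synchronous-to-asynchronous conversion: accept on the hot path, settle later.
class AsyncSettlement {
    // Bounded buffer; the bound itself acts as a back-pressure limit.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

    /** Hot path: accept the order immediately and defer the real settlement. */
    boolean submit(String orderId) {
        return queue.offer(orderId);    // non-blocking; false if the buffer is full
    }

    /** Background worker: drain at the pace the downstream can sustain. */
    String pollNext() {
        return queue.poll();            // null when there is nothing to settle yet
    }
}
```

The user sees "payment accepted" as soon as `submit` returns; the final confirmation arrives later via the asynchronous workflow, which flattens the peak the downstream has to absorb.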

Part 2 – Degradation: Sacrificing Minor Features to Preserve Core Functionality

During traffic spikes, non‑essential services can be degraded to protect primary business flows.

1. Page Degradation: Hide or disable UI elements (e.g., a points‑deduction entry) via a feature‑toggle platform.

2. Storage Degradation: Replace frequent DB writes with cache writes and asynchronous MQ messages, as commonly done in flash‑sale systems.

3. Read Degradation: Disable non‑critical read requests (e.g., avatar fetching in a red‑packet list) under high load.

4. Write Degradation: Block certain write operations entirely when the system is under pressure.
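All four patterns hang off the same primitive: a switch that callers consult before doing optional work. The sketch below uses a hypothetical in‑memory switch; real systems push toggle state from a configuration or feature‑toggle platform.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Degradation switch: operators flip features off under pressure, callers
// check the switch before performing non-essential reads, writes, or rendering.
class DegradationSwitch {
    private final ConcurrentMap<String, Boolean> degraded = new ConcurrentHashMap<>();

    void degrade(String feature) { degraded.put(feature, true); }
    void restore(String feature) { degraded.put(feature, false); }

    boolean isDegraded(String feature) {
        return degraded.getOrDefault(feature, false);
    }
}
```

A caller then guards optional work with something like `if (!sw.isDegraded("avatar-fetch")) loadAvatars();`, so flipping the toggle sheds that load without a redeploy.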

In short, degradation trades a small loss of user experience for overall system stability.

Part 3 – Circuit Breaking: Maintaining a Global View

Circuit breaking prevents cascading failures and service avalanches by temporarily halting calls to an unhealthy downstream service while monitoring its recovery.

The Hystrix circuit‑breaker flow includes three key steps: determining when to open the circuit (algorithm), providing fallback logic during the open state, and detecting recovery to close the circuit.

In practice, monitoring downstream storage errors can trigger a switch that routes traffic to a fallback message queue, ensuring most requests continue processing while the primary storage recovers.
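The three steps above map onto a small state machine: trip open after repeated failures, fail fast to the fallback while open, then move to half‑open after a cool‑down to probe recovery. The sketch below is a minimal illustration; the thresholds are illustrative values, not Hystrix defaults.

```java
// Minimal circuit breaker: CLOSED -> OPEN on consecutive failures,
// OPEN -> HALF_OPEN after a cool-down, HALF_OPEN -> CLOSED on a successful probe.
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;      // cool-down before probing recovery
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Should the call be attempted, or routed straight to the fallback? */
    synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= openMillis) {
                state = State.HALF_OPEN;   // cool-down elapsed: let one probe through
                return true;
            }
            return false;                  // fail fast to the fallback
        }
        return true;
    }

    synchronized void onSuccess() {
        failures = 0;
        state = State.CLOSED;              // downstream recovered: close the circuit
    }

    synchronized void onFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;            // trip (or re-trip) the breaker
            openedAt = System.currentTimeMillis();
        }
    }
}
```

When `allowRequest()` returns false, the caller runs its fallback, for example writing to the fallback message queue mentioned above, instead of hammering the unhealthy downstream.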

Recommended Reading:

How to Ensure MQ Message Order?

MySQL Open‑Source Tool Collection

What Is a Bloom Filter? Solving High‑Concurrency Cache Penetration

Using Binlog for Cross‑System Data Synchronization

High‑Concurrency Service Optimization: Detailed RPC Call Process

Designing a High‑Performance Flash‑Sale System

Follow the public WeChat account “Internet Full‑Stack Architecture” for more valuable insights.

Written by Full-Stack Internet Architecture, introducing full-stack Internet architecture technologies centered on Java.
