How to Prevent Proxy Overload and Cascading Failures in Backend Systems
This article explains what overload means in server development, analyzes a concrete proxy overload case, identifies root causes, and presents a multi‑layer protection strategy—including self‑protection, downstream safeguards, concurrency control, and improved circuit‑breaker designs—to keep services stable under heavy traffic.
What is Overload
In server development, overload occurs when the incoming request volume exceeds the system’s maximum processing capacity, causing response times to increase dramatically and potentially leading to a cascading failure (snowball effect).
Overload Symptoms
During overload, each request takes longer to respond; if no protection is in place, accumulated time‑outs create a vicious cycle where the system appears completely unavailable.
Case Study: Proxy Overload
A reverse proxy forwards requests from upstream clients to downstream modules. Module A uses a Reactor pattern with a fixed pool of threads that forward requests to Module B, so its capacity is roughly proportional to thread count divided by downstream latency. Under normal conditions, Module A can handle 1,000 QPS, but when Module B's processing latency rises from 10 ms to 40 ms, Module A's effective capacity drops fourfold to 250 QPS while upstream traffic remains at 800 QPS.
Consequently, requests pile up in the kernel's Recv‑Q at 550 per second, filling the buffer in about 4 seconds. A request at the back of the queue then waits roughly 91 seconds before being processed, far beyond any client timeout, so the proxy's effective external throughput drops to zero.
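The arithmetic above can be sketched as a back-of-the-envelope model. The Recv‑Q depth used here is an illustrative assumption, not a figure from the original incident:

```python
# Back-of-the-envelope model of the proxy overload described above.
# The Recv-Q capacity (in requests) is an assumed, illustrative value.

def overload_model(capacity_qps, arrival_qps, buffer_requests):
    """Return (surplus per second, seconds to fill buffer, queue wait)."""
    surplus = arrival_qps - capacity_qps          # requests piling up each second
    fill_time = buffer_requests / surplus         # time until the Recv-Q is full
    queue_wait = buffer_requests / capacity_qps   # wait for a request at the back
    return surplus, fill_time, queue_wait

surplus, fill_time, wait = overload_model(
    capacity_qps=250,        # Module A after B's latency rises to 40 ms
    arrival_qps=800,         # upstream traffic
    buffer_requests=2200,    # assumed Recv-Q depth, for illustration only
)
print(surplus, fill_time, wait)  # 550 4.0 8.8
```

Whatever the actual buffer depth, the pattern is the same: once arrival rate exceeds capacity, queue wait grows with backlog size and quickly exceeds any reasonable client timeout.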
Root Causes of Overload
Downstream module B experiences large‑scale failures or latency spikes.
Upstream traffic surges (e.g., a cache stampede or a flash‑sale spike).
Other modules on the same machine consume excessive CPU or network resources.
All of these reduce to the same condition: request volume > processing capacity.
Proxy Overload Protection Design
1. Protect the Proxy Itself
Use resource isolation to prevent a single downstream failure from affecting the whole proxy. Two approaches:
Thread‑level isolation: Allocate a fixed number of threads to each downstream service; when a service reaches its limit, reject new requests.
Deployment‑level isolation: Deploy separate proxy instances for high‑traffic or high‑SLA downstream services, achieving physical separation.
In practice, deployment isolation is often simpler and more effective.
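Thread-level isolation can be sketched as a per-downstream concurrency budget with fail-fast rejection. The class and service names below are illustrative, not from the original system:

```python
# Sketch of thread-level isolation: each downstream service gets a fixed
# concurrency budget; requests beyond the budget are rejected immediately
# instead of queueing and starving other services' threads.
import threading

class IsolatedPool:
    def __init__(self, limits):
        # limits, e.g. {"service-b": 10, "service-c": 4}
        self._slots = {name: threading.Semaphore(n) for name, n in limits.items()}

    def submit(self, service, fn, *args):
        sem = self._slots[service]
        if not sem.acquire(blocking=False):   # budget exhausted: fail fast
            return None                        # caller sees an immediate rejection
        try:
            return fn(*args)
        finally:
            sem.release()

pool = IsolatedPool({"service-b": 2})
print(pool.submit("service-b", lambda: "ok"))  # ok
```

The key property is that a slow downstream can exhaust only its own budget; requests bound for healthy downstreams still find free slots.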
2. Protect Downstream Services
Choose an appropriate load‑balancing strategy to route traffic to healthier instances.
Configure retry policies carefully: retry only on connection failures, not on timeouts, since retrying requests against an already slow service multiplies its load.
Apply concurrency control: set a maximum concurrent request limit per proxy instance (single‑machine control) or globally via a shared counter (e.g., Redis). Each method has trade‑offs regarding simplicity and coordination.
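The single-machine variant of concurrency control can be sketched as a cap on in-flight requests. For the global variant, a shared counter (e.g., Redis INCR/DECR on a key) would replace the local counter at the cost of extra coordination; the names below are illustrative:

```python
# Minimal single-machine concurrency limiter: cap the number of in-flight
# requests this proxy instance sends downstream, shedding the excess.
import threading
from contextlib import contextmanager

class ConcurrencyLimit:
    def __init__(self, max_in_flight):
        self._lock = threading.Lock()
        self._in_flight = 0
        self._max = max_in_flight

    @contextmanager
    def slot(self):
        with self._lock:
            if self._in_flight >= self._max:
                raise RuntimeError("over capacity")  # shed the request
            self._in_flight += 1
        try:
            yield
        finally:
            with self._lock:
                self._in_flight -= 1

limit = ConcurrencyLimit(max_in_flight=100)
with limit.slot():
    pass  # forward the request downstream here
```

Shedding at the limit keeps queue wait bounded, which is exactly what the Recv‑Q scenario above lacked.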
3. Ensure Downstream Recovery
Recovery requires request volume to fall below processing capacity. Two main techniques:
Rate‑limit or shed load aggressively (degrade service).
Use a circuit‑breaker that monitors downstream latency and error rate, opening when thresholds are exceeded and gradually restoring traffic when the service stabilizes.
Traditional circuit‑breakers stop all traffic when opened, which can cause brief outages. An improved approach throttles traffic instead of cutting it off completely, then gradually ramps up as latency improves.
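A throttling breaker of this kind can be sketched as a pass rate that ramps down while the downstream is slow and back up as it recovers. The thresholds, step size, and floor below are illustrative assumptions:

```python
# Sketch of a throttling (rather than on/off) circuit breaker: instead of
# rejecting all traffic when unhealthy, it admits a fraction of requests
# and adjusts that fraction as downstream latency degrades or improves.
import random

class ThrottlingBreaker:
    def __init__(self, latency_slo_ms=50.0, step=0.1):
        self.pass_rate = 1.0              # fraction of requests admitted
        self.latency_slo_ms = latency_slo_ms
        self.step = step

    def allow(self):
        # Probabilistically admit requests at the current pass rate.
        return random.random() < self.pass_rate

    def record(self, latency_ms):
        if latency_ms > self.latency_slo_ms:
            # Downstream is slow: throttle harder, but keep a trickle of
            # probe traffic so recovery can be detected.
            self.pass_rate = max(0.1, self.pass_rate - self.step)
        else:
            # Downstream looks healthy: ramp traffic back up gradually.
            self.pass_rate = min(1.0, self.pass_rate + self.step)

breaker = ThrottlingBreaker()
for _ in range(5):
    breaker.record(latency_ms=200)   # sustained slowness
print(round(breaker.pass_rate, 1))   # 0.5
```

Unlike an open/closed breaker, this design never drops to zero traffic, so the downstream keeps receiving probe requests and the proxy avoids the brief total outage that a hard open state causes.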
Other Protection Ideas
Beyond code‑level measures, product and operations teams should guide users during overload (e.g., friendly error pages) and implement comprehensive alerting and auto‑scaling mechanisms to expand capacity when thresholds are reached.
Baidu Maps Tech Team
Want to see the Baidu Maps team's technical insights, learn how top engineers tackle tough problems, or join the team? Follow the Baidu Maps Tech Team to get the answers you need.