Handling Sudden Traffic Spikes in Backend Systems
The article outlines a comprehensive approach for backend engineers to manage a sudden 100‑fold increase in traffic, covering emergency response, traffic analysis, robust system design, rate limiting, circuit breaking, scaling, sharding, pooling, caching, asynchronous processing, and stress testing to ensure system stability and performance.
Preface
Hello, I am Tianluo. This article shares a ByteDance interview scenario: your business system experiences a sudden traffic surge (e.g., QPS increases 100-fold). How would you handle it?
Some candidates immediately suggest adding machines or scaling up, but that answer earns only partial credit in the interview; it is far from complete.
As a competent backend developer, you should consider the problem from multiple dimensions to provide a complete and correct answer.
Emergency response: quick stop‑bleeding
Calm analysis: why did traffic surge? Is it reasonable?
Robust design: strengthen system resilience
Stress testing: evaluate system’s pressure tolerance
1. Emergency Response – Quick Stop‑Bleeding
1.1 Rate Limiting
Implement rate limiting to protect the system by discarding excess requests.
Rate limiting controls the request rate at a network interface, preventing DoS attacks and limiting web crawlers. It ensures system stability under high concurrency.
You can use Guava's RateLimiter for single-node limiting, Redis for distributed limiting, or Alibaba's open-source Sentinel component.
Token‑bucket and leaky‑bucket algorithms can also be applied to drop requests that exceed the threshold.
Token‑bucket: tokens are added to a bucket at a fixed rate; a request must acquire a token, otherwise it is throttled. Leaky‑bucket: requests flow into a bucket that drains at a constant rate; overflow triggers throttling.
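The token-bucket rule above can be sketched in a few lines. This is an illustrative single-process version (the class and parameter names are ours, not from Guava, Sentinel, or any other library):

```python
import time

class TokenBucket:
    """Token-bucket limiter: tokens accrue at a fixed rate up to a cap."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity        # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, clamped to the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                  # no token available: throttle the request

bucket = TokenBucket(rate=5, capacity=10)
results = [bucket.allow() for _ in range(15)]
# The first 10 calls drain the burst capacity; subsequent calls are
# throttled until tokens refill at 5 per second.
```

A leaky bucket is the mirror image: requests accumulate in the bucket and drain at the constant rate, so it smooths bursts rather than permitting them.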
1.2 Circuit Breaking & Degradation
Circuit breaking protects the system from cascading failures (service avalanche) when a downstream service becomes unavailable.
Example call chain: A → B → C. If service C slows down due to a bug or a slow SQL query, B and then A will also experience latency, consuming threads, I/O, and CPU, eventually causing a system-wide collapse.
To mitigate a 100‑fold traffic surge, apply circuit breaking:
Circuit breaking: enable circuit breaking (e.g., with Hystrix) for non-core services such as recommendations or comments, so they fail fast and preserve resources for core services like payment.
Service degradation: disable non-critical features (e.g., analytics, verbose logging) and return fallback data (e.g., cached product info) to reduce backend load.
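The trip-then-fail-fast behavior both bullets rely on can be shown with a minimal breaker state machine. This is a hedged sketch of the general pattern, not Hystrix's actual API; the thresholds and function names are illustrative:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: trip after repeated failures, fail fast while open."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to stay open before retrying
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:      # open: skip the downstream entirely
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback()
            self.opened_at = None           # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            return fallback()
        self.failures = 0                   # a success closes the circuit
        return result

calls = 0
def flaky_comments_service():
    global calls
    calls += 1
    raise RuntimeError("service C timed out")

breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60)
replies = [breaker.call(flaky_comments_service, lambda: "cached comments")
           for _ in range(5)]
# After two failures the breaker opens; the remaining calls return the
# fallback without touching the failing service at all.
```

Note how the fallback doubles as the degradation path: while the breaker is open, users see cached data instead of errors, and threads are never tied up waiting on the slow dependency.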
1.3 Elastic Scaling
Beyond limiting and degradation, you can scale the system to serve more user requests:
Scaling: add read replicas, upgrade hardware, or add MySQL/Redis instances to raise throughput.
Traffic shifting: in multi-datacenter deployments, route traffic to a less-loaded region during spikes.
1.4 Message Queues – Spike Smoothing
During high‑traffic events like Double‑11 sales, introduce a message queue to absorb bursts. If the system can handle 2k requests per second but receives 5k, the queue buffers excess requests, allowing the backend to process at its sustainable rate.
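The buffering idea can be sketched with a bounded in-process queue standing in for a real broker (Kafka, RocketMQ, etc.); the names and the 5,000-request burst are illustrative:

```python
import queue
import threading

# Bounded buffer standing in for a message queue.
buffer = queue.Queue(maxsize=5000)
processed = []

def worker():
    # The backend drains the queue at its own sustainable pace,
    # regardless of how fast the requests arrived.
    while True:
        item = buffer.get()
        if item is None:            # sentinel: shut the worker down
            break
        processed.append(item)      # stand-in for real order handling
        buffer.task_done()

t = threading.Thread(target=worker)
t.start()

# A burst of 5,000 requests arrives almost at once; enqueueing is cheap,
# so the front end can acknowledge users immediately.
for i in range(5000):
    buffer.put(f"request-{i}")

buffer.put(None)
t.join()
# Every buffered request is eventually handled, just not all at once.
```

In a real system the `maxsize` bound matters: when even the queue overflows, you fall back to the rate limiting described in 1.1 rather than letting memory grow without limit.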
2. Calm Analysis – Why Did Traffic Surge?
Determine whether the surge is caused by promotional activities, bugs, or malicious attacks.
Analyze logs and monitoring; if it’s a bug, assess impact and fix quickly.
If it’s an attack, restrict IPs, add blacklists, and apply risk control.
For legitimate promotions, evaluate the scope (single API vs. all APIs), check system bottlenecks (CPU, memory, disk), and decide on urgent measures.
3. Design Phase – Building a Robust System
3.1 Horizontal Expansion (Divide‑and‑Conquer)
Deploy multiple servers and distribute traffic to avoid single‑point failures and increase overall concurrency.
3.2 Microservice Decomposition
Split a monolithic application into independent services (e.g., user, order, product) to spread load and improve scalability.
3.3 Database Sharding & Partitioning
When traffic spikes, a single MySQL instance may hit connection limits ("too many connections"). Split data across multiple databases and tables to handle tens of millions of rows and alleviate connection pressure.
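Routing a key to its shard is typically a hash-modulo computation. A hedged sketch, with made-up shard counts and table names:

```python
import hashlib

NUM_DBS = 4            # illustrative: 4 databases...
TABLES_PER_DB = 8      # ...each holding 8 order tables

def route(user_id: int):
    """Map a user ID to a (database, table) pair by hashing the key."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    slot = int(digest, 16) % (NUM_DBS * TABLES_PER_DB)
    return f"db_{slot // TABLES_PER_DB}", f"orders_{slot % TABLES_PER_DB}"

db, table = route(123456789)   # every lookup for this user hits the same shard
```

One caveat worth mentioning in an interview: plain modulo routing reshuffles almost every key when the shard count changes, so production systems plan resharding (or use consistent hashing) up front.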
3.4 Connection Pooling
Use connection pools for databases, HTTP, Redis, etc., to reuse connections and avoid the overhead of creating/destroying them on each request.
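The payoff of pooling is that the expensive connect happens a fixed number of times, not once per request. A minimal sketch (the pool class and `fake_connect` are illustrative, not a real driver's API):

```python
import queue

class ConnectionPool:
    """Tiny fixed-size pool: connections are created once and reused."""

    def __init__(self, size: int, connect):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())       # pay the connection cost up front

    def acquire(self, timeout: float = 1.0):
        return self._pool.get(timeout=timeout)   # blocks if all are in use

    def release(self, conn):
        self._pool.put(conn)

created = 0
def fake_connect():
    global created
    created += 1
    return object()      # stand-in for a real DB/Redis/HTTP connection

pool = ConnectionPool(size=3, connect=fake_connect)
for _ in range(100):     # 100 requests, but only 3 connections ever built
    conn = pool.acquire()
    pool.release(conn)
```

The blocking `acquire` also acts as a natural back-pressure valve: when all connections are busy, callers wait instead of piling more load onto the database.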
3.5 Caching
Leverage caches such as Redis, local JVM caches, or Memcached to serve read‑heavy traffic and reduce backend load.
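The usual read pattern here is cache-aside: check the cache, and only on a miss go to the database and populate the cache. A sketch with a plain dict standing in for Redis (the product data is fabricated for illustration):

```python
cache = {}          # stand-in for Redis or a local in-process cache
db_reads = 0

def load_product(product_id):
    """Cache-aside read: try the cache first, fall back to the database."""
    global db_reads
    if product_id in cache:
        return cache[product_id]      # cache hit: backend untouched
    db_reads += 1                     # cache miss: one real database read
    product = {"id": product_id, "name": f"product-{product_id}"}  # fake DB row
    cache[product_id] = product       # populate for subsequent readers
    return product

for _ in range(1000):
    load_product(42)
# A thousand reads of a hot product cost a single database query.
```

In production you would add a TTL and think about stampedes (many misses for the same hot key at once), but the hit-rate arithmetic above is the core of why caching absorbs read-heavy spikes.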
3.6 Asynchronous Processing
Synchronous calls block the caller until the callee finishes, which can cause bottlenecks under high concurrency. Asynchronous calls return immediately and notify the caller later via callbacks or events.
Implement async processing with message queues: enqueue the massive burst of flash-sale ("seckill") requests, respond to users quickly, and process the orders later, freeing resources for additional traffic.
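The key difference from the synchronous path is that the user-facing call returns as soon as the request is safely enqueued, not when the work is done. A sketch of that accept-then-process split (all names are illustrative):

```python
import itertools
import queue
import threading

orders = queue.Queue()
order_ids = itertools.count(1)
completed = []

def submit_order(user):
    """Synchronous part: validate, enqueue, and answer the user immediately."""
    order_id = next(order_ids)
    orders.put((order_id, user))    # the heavy work is deferred to the worker
    return {"order_id": order_id, "status": "accepted"}

def worker():
    # Asynchronous part: stock deduction, payment, notification, etc.
    while True:
        item = orders.get()
        if item is None:            # sentinel: shut down
            break
        completed.append(item[0])

t = threading.Thread(target=worker)
t.start()
replies = [submit_order(f"user-{i}") for i in range(100)]
orders.put(None)
t.join()
# All 100 users got an instant "accepted"; all 100 orders were processed
# afterward at the worker's own pace.
```

The user later learns the final outcome via a status query or push notification, which is exactly the callback/event notification the paragraph above describes.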
4. Stress Testing – Verifying System Capacity
Conduct load testing before release to determine the maximum concurrent load and identify bottlenecks across network, Nginx, services, and middleware.
Tools such as LoadRunner or JMeter can be used for performance testing.
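For intuition, the core of what those tools measure can be sketched as a closed-loop driver: ramp up concurrency and watch for the knee where throughput stops growing. This toy version pressures a local function rather than a real deployment; the numbers are illustrative only:

```python
import threading
import time

def handler():
    time.sleep(0.001)    # simulate ~1 ms of service work per request

def stress(concurrency: int, requests_per_thread: int) -> float:
    """Drive the handler from N concurrent workers; return requests/sec."""
    def run():
        for _ in range(requests_per_thread):
            handler()
    threads = [threading.Thread(target=run) for _ in range(concurrency)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    return (concurrency * requests_per_thread) / elapsed

# Ramp concurrency and record throughput; the point where it plateaus
# (or latency explodes) is the capacity limit to find before release.
throughputs = {c: stress(c, 50) for c in (1, 5, 10)}
```

Real load tests add warm-up, latency percentiles, and realistic request mixes, which is why dedicated tools like JMeter are used rather than a loop like this.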
5. Final Checklist
Apply rate limiting, circuit breaking, scaling, and spike‑smoothing for quick stop‑bleeding.
After stabilizing, diagnose the root cause (bug, attack, promotion).
Strengthen the system through horizontal expansion, service decomposition, sharding, pooling, caching, async processing, and thorough stress testing.
Always have fallback plans for critical components (e.g., alternative locking mechanisms if Redis fails).
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.