Handling Sudden Traffic Spikes in Backend Systems
The article outlines a comprehensive approach for backend engineers to manage a sudden 100‑fold increase in traffic, covering emergency response, traffic analysis, robust system design, rate limiting, circuit breaking, scaling, sharding, pooling, caching, asynchronous processing, and stress testing to ensure system stability and performance.
Preface
Hello, I am Tianluo. This article shares a ByteDance interview scenario: your business system experiences a sudden traffic surge (e.g., QPS increases 100-fold). How would you handle it?
Some candidates immediately suggest adding machines or scaling up, but that answer earns only partial credit in the interview; it is far from complete.
As a competent backend developer, you should consider the problem from multiple dimensions to provide a complete and correct answer.
Emergency response: quick stop‑bleeding
Calm analysis: why did traffic surge? Is it reasonable?
Robust design: strengthen system resilience
Stress testing: evaluate system’s pressure tolerance
1. Emergency Response – Quick Stop‑Bleeding
1.1 Rate Limiting
Implement rate limiting to protect the system by discarding excess requests.
Rate limiting controls the request rate at a network interface, preventing DoS attacks and limiting web crawlers. It ensures system stability under high concurrency.
You can use Guava's RateLimiter for single-node limiting, Redis for distributed limiting, or Alibaba's open-source Sentinel component.
Token‑bucket and leaky‑bucket algorithms can also be applied to drop requests that exceed the threshold.
Token‑bucket: tokens are added to a bucket at a fixed rate; a request must acquire a token, otherwise it is throttled. Leaky‑bucket: requests flow into a bucket that drains at a constant rate; overflow triggers throttling.
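The token-bucket rule above can be sketched in a few lines. This is an illustrative single-process version (the class and parameter names are ours, not from Guava, Sentinel, or any other library):

```python
import time

class TokenBucket:
    """Token-bucket limiter: tokens accrue at a fixed rate up to a cap."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity        # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, clamped to the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                  # no token available: throttle the request

bucket = TokenBucket(rate=5, capacity=10)
results = [bucket.allow() for _ in range(15)]
# The first 10 calls drain the burst capacity; subsequent calls are
# throttled until tokens refill at 5 per second.
```

A leaky bucket is the mirror image: requests accumulate in the bucket and drain at the constant rate, so it smooths bursts rather than permitting them.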
1.2 Circuit Breaking & Degradation
Circuit breaking protects the system from cascading failures (service avalanche) when a downstream service becomes unavailable.
Example call chain: A → B → C. If service C slows down due to a bug or a slow SQL query, B and then A will also experience latency, consuming threads, I/O, and CPU, eventually causing a system-wide collapse.
To mitigate a 100‑fold traffic surge, apply circuit breaking:
Circuit breaking: enable circuit breaking (e.g., with Hystrix) for non-core services such as recommendations or comments, so they fail fast and preserve resources for core services like payment.
Service degradation: disable non-critical features (e.g., analytics, verbose logging) and return fallback data (e.g., cached product info) to reduce backend load.
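The trip-then-fail-fast behavior both bullets rely on can be shown with a minimal breaker state machine. This is a hedged sketch of the general pattern, not Hystrix's actual API; the thresholds and function names are illustrative:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: trip after repeated failures, fail fast while open."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to stay open before retrying
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:      # open: skip the downstream entirely
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback()
            self.opened_at = None           # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            return fallback()
        self.failures = 0                   # a success closes the circuit
        return result

calls = 0
def flaky_comments_service():
    global calls
    calls += 1
    raise RuntimeError("service C timed out")

breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60)
replies = [breaker.call(flaky_comments_service, lambda: "cached comments")
           for _ in range(5)]
# After two failures the breaker opens; the remaining calls return the
# fallback without touching the failing service at all.
```

Note how the fallback doubles as the degradation path: while the breaker is open, users see cached data instead of errors, and threads are never tied up waiting on the slow dependency.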
1.3 Elastic Scaling
Beyond limiting and degradation, you can scale the system to serve more user requests:
Scaling: add read replicas, upgrade hardware, or add MySQL/Redis instances to raise throughput.
Traffic shifting: in multi-datacenter deployments, route traffic to a less-loaded region during spikes.
1.4 Message Queues – Spike Smoothing
During high‑traffic events like Double‑11 sales, introduce a message queue to absorb bursts. If the system can handle 2k requests per second but receives 5k, the queue buffers excess requests, allowing the backend to process at its sustainable rate.
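The buffering idea can be sketched with a bounded in-process queue standing in for a real broker (Kafka, RocketMQ, etc.); the names and the 5,000-request burst are illustrative:

```python
import queue
import threading

# Bounded buffer standing in for a message queue.
buffer = queue.Queue(maxsize=5000)
processed = []

def worker():
    # The backend drains the queue at its own sustainable pace,
    # regardless of how fast the requests arrived.
    while True:
        item = buffer.get()
        if item is None:            # sentinel: shut the worker down
            break
        processed.append(item)      # stand-in for real order handling
        buffer.task_done()

t = threading.Thread(target=worker)
t.start()

# A burst of 5,000 requests arrives almost at once; enqueueing is cheap,
# so the front end can acknowledge users immediately.
for i in range(5000):
    buffer.put(f"request-{i}")

buffer.put(None)
t.join()
# Every buffered request is eventually handled, just not all at once.
```

In a real system the `maxsize` bound matters: when even the queue overflows, you fall back to the rate limiting described in 1.1 rather than letting memory grow without limit.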
2. Calm Analysis – Why Did Traffic Surge?
Determine whether the surge is caused by promotional activities, bugs, or malicious attacks.
Analyze logs and monitoring; if it’s a bug, assess impact and fix quickly.
If it’s an attack, restrict IPs, add blacklists, and apply risk control.
For legitimate promotions, evaluate the scope (single API vs. all APIs), check system bottlenecks (CPU, memory, disk), and decide on urgent measures.
3. Design Phase – Building a Robust System
3.1 Horizontal Expansion (Divide‑and‑Conquer)
Deploy multiple servers and distribute traffic to avoid single‑point failures and increase overall concurrency.
3.2 Microservice Decomposition
Split a monolithic application into independent services (e.g., user, order, product) to spread load and improve scalability.
3.3 Database Sharding & Partitioning
When traffic spikes, a single MySQL instance may hit connection limits ("too many connections"). Split data across multiple databases and tables to handle tens of millions of rows and alleviate connection pressure.
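Routing a key to its shard is typically a hash-modulo computation. A hedged sketch, with made-up shard counts and table names:

```python
import hashlib

NUM_DBS = 4            # illustrative: 4 databases...
TABLES_PER_DB = 8      # ...each holding 8 order tables

def route(user_id: int):
    """Map a user ID to a (database, table) pair by hashing the key."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    slot = int(digest, 16) % (NUM_DBS * TABLES_PER_DB)
    return f"db_{slot // TABLES_PER_DB}", f"orders_{slot % TABLES_PER_DB}"

db, table = route(123456789)   # every lookup for this user hits the same shard
```

One caveat worth mentioning in an interview: plain modulo routing reshuffles almost every key when the shard count changes, so production systems plan resharding (or use consistent hashing) up front.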
3.4 Connection Pooling
Use connection pools for databases, HTTP, Redis, etc., to reuse connections and avoid the overhead of creating/destroying them on each request.
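The payoff of pooling is that the expensive connect happens a fixed number of times, not once per request. A minimal sketch (the pool class and `fake_connect` are illustrative, not a real driver's API):

```python
import queue

class ConnectionPool:
    """Tiny fixed-size pool: connections are created once and reused."""

    def __init__(self, size: int, connect):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())       # pay the connection cost up front

    def acquire(self, timeout: float = 1.0):
        return self._pool.get(timeout=timeout)   # blocks if all are in use

    def release(self, conn):
        self._pool.put(conn)

created = 0
def fake_connect():
    global created
    created += 1
    return object()      # stand-in for a real DB/Redis/HTTP connection

pool = ConnectionPool(size=3, connect=fake_connect)
for _ in range(100):     # 100 requests, but only 3 connections ever built
    conn = pool.acquire()
    pool.release(conn)
```

The blocking `acquire` also acts as a natural back-pressure valve: when all connections are busy, callers wait instead of piling more load onto the database.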
3.5 Caching
Leverage caches such as Redis, local JVM caches, or Memcached to serve read‑heavy traffic and reduce backend load.
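The usual read pattern here is cache-aside: check the cache, and only on a miss go to the database and populate the cache. A sketch with a plain dict standing in for Redis (the product data is fabricated for illustration):

```python
cache = {}          # stand-in for Redis or a local in-process cache
db_reads = 0

def load_product(product_id):
    """Cache-aside read: try the cache first, fall back to the database."""
    global db_reads
    if product_id in cache:
        return cache[product_id]      # cache hit: backend untouched
    db_reads += 1                     # cache miss: one real database read
    product = {"id": product_id, "name": f"product-{product_id}"}  # fake DB row
    cache[product_id] = product       # populate for subsequent readers
    return product

for _ in range(1000):
    load_product(42)
# A thousand reads of a hot product cost a single database query.
```

In production you would add a TTL and think about stampedes (many misses for the same hot key at once), but the hit-rate arithmetic above is the core of why caching absorbs read-heavy spikes.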
3.6 Asynchronous Processing
Synchronous calls block the caller until the callee finishes, which can cause bottlenecks under high concurrency. Asynchronous calls return immediately and notify the caller later via callbacks or events.
Implement async processing with message queues: enqueue the massive burst of flash-sale ("seckill") requests, respond to users quickly, and process the orders later, freeing resources for additional traffic.
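The key difference from the synchronous path is that the user-facing call returns as soon as the request is safely enqueued, not when the work is done. A sketch of that accept-then-process split (all names are illustrative):

```python
import itertools
import queue
import threading

orders = queue.Queue()
order_ids = itertools.count(1)
completed = []

def submit_order(user):
    """Synchronous part: validate, enqueue, and answer the user immediately."""
    order_id = next(order_ids)
    orders.put((order_id, user))    # the heavy work is deferred to the worker
    return {"order_id": order_id, "status": "accepted"}

def worker():
    # Asynchronous part: stock deduction, payment, notification, etc.
    while True:
        item = orders.get()
        if item is None:            # sentinel: shut down
            break
        completed.append(item[0])

t = threading.Thread(target=worker)
t.start()
replies = [submit_order(f"user-{i}") for i in range(100)]
orders.put(None)
t.join()
# All 100 users got an instant "accepted"; all 100 orders were processed
# afterward at the worker's own pace.
```

The user later learns the final outcome via a status query or push notification, which is exactly the callback/event notification the paragraph above describes.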
4. Stress Testing – Verifying System Capacity
Conduct load testing before release to determine the maximum concurrent load and identify bottlenecks across network, Nginx, services, and middleware.
Tools such as LoadRunner or JMeter can be used for performance testing.
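For intuition, the core of what those tools measure can be sketched as a closed-loop driver: ramp up concurrency and watch for the knee where throughput stops growing. This toy version pressures a local function rather than a real deployment; the numbers are illustrative only:

```python
import threading
import time

def handler():
    time.sleep(0.001)    # simulate ~1 ms of service work per request

def stress(concurrency: int, requests_per_thread: int) -> float:
    """Drive the handler from N concurrent workers; return requests/sec."""
    def run():
        for _ in range(requests_per_thread):
            handler()
    threads = [threading.Thread(target=run) for _ in range(concurrency)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    return (concurrency * requests_per_thread) / elapsed

# Ramp concurrency and record throughput; the point where it plateaus
# (or latency explodes) is the capacity limit to find before release.
throughputs = {c: stress(c, 50) for c in (1, 5, 10)}
```

Real load tests add warm-up, latency percentiles, and realistic request mixes, which is why dedicated tools like JMeter are used rather than a loop like this.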
5. Final Checklist
Apply rate limiting, circuit breaking, scaling, and spike‑smoothing for quick stop‑bleeding.
After stabilizing, diagnose the root cause (bug, attack, promotion).
Strengthen the system through horizontal expansion, service decomposition, sharding, pooling, caching, async processing, and thorough stress testing.
Always have fallback plans for critical components (e.g., alternative locking mechanisms if Redis fails).
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.